Scorecard

What VoxLore Genesis actually does, measured.

Every model version is run through a scorecard of 36 dimensions covering output structure, character voice, conversation depth, robustness, and game integration. Click any dimension to see exactly what we test, why it matters for your game, and how the model has improved across versions.

Dimensions

seven categories

Current model

v50

updated continuously

Avg score

90.4

across 29 scored dimensions
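To make the averaging concrete, here is a minimal sketch of how a mean over only the scored dimensions could be computed. It assumes unscored dimensions are stored as `None`. The function name, dimension keys, and values are hypothetical illustrations, not the actual scoring code or real results.

```python
def average_scored(scores):
    """Mean of the dimensions that have a score, rounded to one decimal.

    `scores` maps a dimension name to a number, or to None when the
    dimension is not scored for this version (assumed representation).
    """
    scored = [value for value in scores.values() if value is not None]
    if not scored:
        return None  # nothing to average
    return round(sum(scored) / len(scored), 1)


# Illustrative values only, not taken from a real scorecard.
example = {
    "json_validity": 100.0,
    "topic_recall": 85.0,
    "stream_cancellation": None,  # unscored, so excluded from the mean
}
average = average_scored(example)
```

Excluding unscored dimensions rather than counting them as zero keeps the average comparable between versions that score different subsets.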

Test runs

2,200+

prompts per scorecard

Output Structure

Does the model produce valid, machine-readable JSON every time? This is the foundation: if structure breaks, everything downstream breaks too.
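As a rough illustration of what a validity check of this kind might look like, the sketch below counts how many raw outputs parse as a JSON object. The function name and the sample field names (`dialogue`, `emotion`) are assumptions made for the example. They are not the scorecard's actual test or the model's real schema.

```python
import json


def json_validity_rate(outputs):
    """Fraction of raw model outputs that parse as a JSON object."""
    if not outputs:
        return 0.0
    valid = 0
    for raw in outputs:
        try:
            parsed = json.loads(raw)
        except (json.JSONDecodeError, TypeError):
            continue  # malformed or non-string output counts as invalid
        if isinstance(parsed, dict):
            valid += 1
    return valid / len(outputs)


# Hypothetical outputs: one valid, one truncated, one plain prose.
samples = [
    '{"dialogue": "Well met, traveler.", "emotion": "warm"}',
    '{"dialogue": "Hold on',
    "Sure! Here is the JSON you asked for.",
]
rate = json_validity_rate(samples)
```

Only the first sample parses, so a game consuming these outputs directly would fail on two of the three responses.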

Character Voice

Does each character sound distinct? Can the model stay in character across long sessions without bleeding into a generic AI tone?

Dialogue Quality

Are responses natural, in-context, and emotionally appropriate? This is the visible craft of the model.

Robustness

Does the model handle hostile players, gibberish input, and conversational stress without breaking character?

Conversation Depth

How does the model perform across long, multi-turn sessions? Does quality hold or degrade?

Technical Reliability

Does the runtime behave correctly: streaming, cancellation, schema enforcement, fallback paths?
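One way to picture schema enforcement with a fallback path is the sketch below: parse the output, check that the required fields are present, and substitute a safe default response if either step fails. The required fields, the fallback payload, and the function name are all hypothetical. The actual runtime may work differently.

```python
import json

# Hypothetical schema and fallback, chosen only for illustration.
REQUIRED_FIELDS = ("dialogue", "emotion")
FALLBACK_RESPONSE = {"dialogue": "...", "emotion": "neutral"}


def parse_or_fallback(raw):
    """Return (response, used_fallback) for one raw model output."""
    try:
        parsed = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return dict(FALLBACK_RESPONSE), True  # unparseable output
    if not isinstance(parsed, dict):
        return dict(FALLBACK_RESPONSE), True  # valid JSON, wrong shape
    if any(field not in parsed for field in REQUIRED_FIELDS):
        return dict(FALLBACK_RESPONSE), True  # required field missing
    return parsed, False


ok, ok_fallback = parse_or_fallback('{"dialogue": "Hello.", "emotion": "calm"}')
missing, missing_fallback = parse_or_fallback('{"dialogue": "Hello."}')
broken, broken_fallback = parse_or_fallback('{"dialogue": "Hel')
```

Returning the `used_fallback` flag alongside the response lets a test harness count how often the fallback fires, instead of silently masking failures.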

Game Integration

Does the model dispatch tool calls, follow constraints, react to game state, and handle world events the way your game expects?

Want to see this on your own model?

The full scorecard runs as part of every VoxLore release. The corpora, scoring code, and baseline JSON files all live in the open repository alongside the model.

Read the docs

Output Structure

JSON Validity

Character Voice

Character Consistency

Knowledge Boundaries

Character Stability

Phrase Repetition Rate

Metadata Variance

Exact-Sentence Repetition

Voice Marker Class Coverage

Dialogue Quality

Dialogue Quality

Emotional Range

Conversational Relevance

Emotional Arcs

Secret Reveal

Robustness

Adversarial Robustness

Stress / Edge Cases

Unmarked Secret Leakage

Factual Self-Consistency

Conversation Depth

Multi-Turn Continuity

Deep Conversation

Topic Recall

Conversation Quality (Long Multi-Turn)

Conversation Depth Stability

Numeric Fidelity

Technical Reliability

JSON Repair Robustness

Stream Cancellation

Emergency Fallback Activation

Field Presence Validation

Game Integration

Animation Metadata

World Events Awareness

Constraint Following

Game State Awareness

NPC-to-NPC Coherence

SDK Constraint Following

Object Interaction

Tool Call Dispatch

Schema Extension Compliance

Transaction Dispatch

SDK Dispatch Parity

Tool Args Validity
