Scorecard
What VoxLore Genesis actually does, measured.
Every model version is run through a scorecard of 36 dimensions covering output structure, character voice, conversation depth, robustness, and game integration. Click any dimension to see exactly what we test, why it matters for your game, and how the model has improved across versions.
Dimensions: 36 (six categories)
Current model: v50 (updated continuously)
Avg score: 90.4 (across 29 scored dims)
Test runs: 2,200+ (prompts per scorecard)
Output Structure
Does the model produce valid, machine-readable JSON every time? This is the foundation: if structure breaks, everything downstream breaks too.
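As a rough illustration of what a structural check of this kind might look like, the sketch below parses each raw model output and confirms it is a JSON object with a few required, correctly typed fields. The field names (`speaker`, `text`, `emotion`) and the function names are assumptions made up for this example; they are not VoxLore's actual schema or scoring code.

```python
import json

# Hypothetical reply schema: these field names and types are assumptions
# for illustration only, not the real VoxLore output format.
REQUIRED_FIELDS = {"speaker": str, "text": str, "emotion": str}


def is_valid_reply(raw: str) -> bool:
    """Return True if raw parses as a JSON object with the required typed fields."""
    try:
        reply = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(reply, dict):
        return False
    return all(
        isinstance(reply.get(name), kind) for name, kind in REQUIRED_FIELDS.items()
    )


def structure_pass_rate(outputs: list[str]) -> float:
    """Percentage of outputs that pass the structural check (0.0 for an empty list)."""
    if not outputs:
        return 0.0
    return 100.0 * sum(is_valid_reply(o) for o in outputs) / len(outputs)
```

A truncated object, a bare array, or a reply with a missing field would each count as a failure here, which is the kind of break a downstream game parser would otherwise hit at runtime.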
Character Voice
Does each character sound distinct? Can the model stay in character across long sessions without bleeding into a generic AI tone?
Dialogue Quality
Are responses natural, in-context, and emotionally appropriate? This is the visible craft of the model.
Robustness
Does the model handle hostile players, gibberish input, and conversational stress without breaking character?
Conversation Depth
How does the model perform across long, multi-turn sessions? Does quality hold or degrade?
Technical Reliability
Does the runtime behave correctly across streaming, cancellation, schema enforcement, and fallback paths?
Game Integration
Does the model dispatch tool calls, follow constraints, react to game state, and handle world events the way your game expects?
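To make tool-call dispatch concrete, here is a minimal sketch of how a game might route a model-emitted tool call to its own handlers and reject anything it does not recognise. The call shape (`name` plus `arguments`), the handler names, and the rejection strings are all hypothetical; the source does not describe the actual integration API.

```python
import inspect
from typing import Any, Callable

GameState = dict[str, Any]


# Hypothetical game-side handlers; a real game would register its own.
def give_item(state: GameState, item: str) -> str:
    state.setdefault("inventory", []).append(item)
    return f"gave {item}"


def set_quest_stage(state: GameState, quest: str, stage: int) -> str:
    state.setdefault("quests", {})[quest] = stage
    return f"{quest} -> stage {stage}"


HANDLERS: dict[str, Callable[..., str]] = {
    "give_item": give_item,
    "set_quest_stage": set_quest_stage,
}


def dispatch(state: GameState, call: dict[str, Any]) -> str:
    """Route one tool call to its handler, rejecting unknown or malformed calls."""
    name = call.get("name")
    handler = HANDLERS.get(name) if isinstance(name, str) else None
    if handler is None:
        return "rejected: unknown tool"
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return "rejected: bad arguments"
    try:
        # Check the arguments against the handler's signature before calling it,
        # so a bad call never partially mutates the game state.
        inspect.signature(handler).bind(state, **args)
    except TypeError:
        return "rejected: bad arguments"
    return handler(state, **args)
```

Validating arguments before invoking the handler keeps a malformed call from touching game state, which is one plausible way to test that the model's calls behave "the way your game expects".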
Want to see this on your own model?
The full scorecard runs as part of every VoxLore release. The corpora, scoring code, and baseline JSON files all live in the open repository alongside the model.
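The headline average is reported over scored dimensions only (29 of 36), so a script reading a baseline file would need to skip dimensions that have no score yet. The sketch below shows one way to do that. The JSON layout (a top-level `dimensions` list whose entries carry `name` and `score`) is an assumption for illustration; the real baseline files in the repository may be structured differently.

```python
import json


def average_score(baseline_json: str) -> tuple[float, int]:
    """Return (mean score rounded to 1 decimal, number of scored dimensions).

    Assumes a hypothetical layout: {"dimensions": [{"name": ..., "score": ...}]},
    where an unscored dimension has a null or missing "score".
    """
    dims = json.loads(baseline_json)["dimensions"]
    scored = [d["score"] for d in dims if d.get("score") is not None]
    if not scored:
        raise ValueError("no scored dimensions in baseline")
    return round(sum(scored) / len(scored), 1), len(scored)
```

For example, a file with scores of 90, 92, and one unscored dimension would yield an average of 91.0 across 2 scored dimensions.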