Tokenmaxxing

Tokenmaxxing is Inferoa's discipline for spending inference tokens where they change the outcome. It is not only compression. It combines prompt stability, context selection, routing, endpoint evidence, and verification.

Surfaces

| Surface | What Inferoa Tracks | Why It Matters |
| --- | --- | --- |
| Prompt prefix | Prompt epochs, section hashes, tool schema hash | Avoid invalidating reusable prefixes |
| Context | Thresholds, protected recent loops, summaries | Keep the next turn focused |
| Tools | Deterministic schemas and bounded outputs | Reduce schema churn and output bloat |
| Endpoint | Provider, model, usage, request ids, cache fields | Make inference behavior inspectable |
| Artifacts | Managed resources for generated media and evidence | Avoid pasting large payloads into prompts |
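
To illustrate why a deterministic tool schema matters for the prompt prefix, here is a minimal sketch of how a tool schema hash could be computed. It is not Inferoa's implementation: the `schema_hash` function and the `read_file` and `grep` tool definitions are hypothetical, and the approach (canonical JSON fed into SHA-256) is an assumption chosen only because it makes the digest independent of key order and registration order.

```python
import hashlib
import json


def schema_hash(tools: list[dict]) -> str:
    """Return a stable SHA-256 digest for a list of tool schemas (hypothetical helper)."""
    # Sort tools by name so registration order does not affect the digest.
    ordered = sorted(tools, key=lambda tool: tool["name"])
    # Canonical JSON: sorted keys and fixed separators give byte-identical text.
    canonical = json.dumps(ordered, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical tool definitions, used only for illustration.
read_file = {"name": "read_file", "parameters": {"path": "string", "max_bytes": "integer"}}
same_tool_reordered = {"parameters": {"max_bytes": "integer", "path": "string"}, "name": "read_file"}
grep = {"name": "grep", "parameters": {"pattern": "string"}}

baseline = schema_hash([read_file])
print(baseline == schema_hash([same_tool_reordered]))  # True: key order alone does not change the digest
print(baseline == schema_hash([read_file, grep]))      # False: adding a tool changes the digest
```

Under these assumptions, a digest that changes between turns is a signal that the tool section of the prefix changed and any cached prefix covering it may no longer be reusable.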

Reading Token Pressure

Open the tokenmaxxing view from the TUI:

/tokenmaxxing

The view reports recent token usage, cache evidence when the endpoint exposes it, RTK savings, context pressure, and model-selection pressure. Cache fields are shown only when the provider returns enough usage detail to make them meaningful. The view is also reachable through friendly aliases: /cache, /rtk, /activity, /evidence, and /history. See Slash commands for the full registry.

Interpreting The View

The tokenmaxxing view groups signals into four areas:

| Area | What It Shows | What To Watch For |
| --- | --- | --- |
| Token usage | Recent prompt and completion tokens per turn | Sudden spikes may mean context is not being compressed or a large file was read without bounding |
| Cache evidence | Cached prompt tokens when the endpoint reports them | A low cache ratio across turns suggests the prompt epoch is changing too often |
| RTK savings | Tokens saved by RTK context optimization | Zero savings may mean RTK is disabled or the workspace has not been indexed |
| Model selection | Which model handled recent turns | Unexpected model switches may indicate routing pressure or endpoint fallback |

Practical Examples

Stable prefix, good cache reuse:

prompt_tokens: 12480 cached: 11200 completion: 340
prompt_tokens: 12520 cached: 11200 completion: 280
prompt_tokens: 12610 cached: 11200 completion: 410

The cached token count stays constant while prompt tokens grow slowly: the mutable section is absorbing task progress without disturbing the prefix.
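
The cache ratio for each turn in the example above can be worked out as cached tokens divided by prompt tokens. The sketch below assumes usage lines in exactly the `prompt_tokens: … cached: … completion: …` shape shown in the example; the `cache_ratios` helper and the `USAGE_LINE` pattern are hypothetical and not part of Inferoa.

```python
import re

# Matches one usage line in the illustrative format used on this page.
USAGE_LINE = re.compile(r"prompt_tokens:\s*(\d+)\s+cached:\s*(\d+)\s+completion:\s*(\d+)")


def cache_ratios(log: str) -> list[float]:
    """Return cached/prompt for every usage line found in the log text."""
    ratios = []
    for match in USAGE_LINE.finditer(log):
        prompt, cached, _completion = (int(group) for group in match.groups())
        # Guard against a zero prompt count to avoid dividing by zero.
        ratios.append(cached / prompt if prompt else 0.0)
    return ratios


stable_log = """\
prompt_tokens: 12480 cached: 11200 completion: 340
prompt_tokens: 12520 cached: 11200 completion: 280
prompt_tokens: 12610 cached: 11200 completion: 410
"""

for ratio in cache_ratios(stable_log):
    print(f"{ratio:.1%}")  # prints 89.7%, 89.5%, 88.8% on successive lines
```

The ratio drifts down only slightly as the mutable section grows, which is the pattern to expect when the prefix is stable.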

Prefix invalidation between turns:

prompt_tokens: 12480 cached: 11200 completion: 340
prompt_tokens: 12520 cached: 0 completion: 280
prompt_tokens: 12610 cached: 0 completion: 410

The cached count drops to zero from the second turn onward. This usually means the tool schema or a system prompt section changed between turns. Check whether tools were added or removed mid-session.
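
The invalidation pattern above can also be spotted mechanically. This is a minimal sketch, assuming you already have the per-turn cached token counts as a list of integers; the `invalidated_turns` helper is hypothetical, and a drop to zero is only a hint that the prefix changed, not proof of the cause.

```python
def invalidated_turns(cached_per_turn: list[int]) -> list[int]:
    """Return zero-based indexes of turns where cached tokens fell to zero
    immediately after a turn that had cache hits."""
    return [
        index
        for index in range(1, len(cached_per_turn))
        if cached_per_turn[index - 1] > 0 and cached_per_turn[index] == 0
    ]


print(invalidated_turns([11200, 0, 0]))          # [1]: the second turn lost the cached prefix
print(invalidated_turns([11200, 11200, 11200]))  # []: the prefix stayed reusable
```

Only the first zero after a cache hit is reported, since that is the turn whose preceding change (for example, a tool added or removed) is worth inspecting.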

Relationship To Other Concepts

Tokenmaxxing depends on Prefix cache discipline to keep the stable prefix reusable, and on Context optimization to reduce the mutable section. Together, these three disciplines control how much inference work each turn actually costs.