Skip to main content

Inferoa

Inferoa is an Inference-native Tokenmaxxing Agent Harness for Loop Engineering. It is built for recursive long-horizon goals in coding and research work across the vLLM ecosystem.

That is what inference-native means here: Inferoa starts from the inference stack and co-designs loop engineering around tokenmaxxing: prefix-cache discipline, context optimization, intelligent routing through vLLM Semantic Router, high-throughput vLLM serving, vLLM Omni multimodal capability, and RTK/CodeGraph-backed context selection.

Most agents treat inference as a black-box chat API. Inferoa starts from the opposite direction: the agent loop is designed around the optimization surfaces that modern inference systems expose. Long sessions, prefix-cache discipline, context pressure, model routing, self-hosted serving signals, multimodal artifacts, and verification belong to one durable harness.

What Inferoa Coordinates

  • Goal-driven loop engineering keeps recursive long-horizon goals, horizons, completion reflection, and evidence attached to a durable session.
  • Plan and autoresearch modes support approved scope and repeated measurement inside the same long-running loop.
  • Prefix-cache discipline keeps the stable parts of the prompt stable, so reusable prefixes are not invalidated by avoidable churn.
  • Context optimization selects the evidence needed for the next turn using summaries, code intelligence, bounded tool output, and RTK.
  • Intelligent routing can choose model paths by cost, safety, privacy, capability, and current session pressure.
  • Self-hosted serving uses vLLM Engine-compatible endpoints as first-class inference surfaces instead of opaque chat backends.
  • Multimodal execution routes image, video, audio, and speech work through vLLM-Omni-compatible endpoints and stores produced media as managed artifacts.

Why Coding First

Coding is a high-pressure long-horizon task: large repositories, tool failures, context limits, repeated model calls, and proof through tests all appear in the same workflow. That makes it a strong first domain for co-designing agent behavior with inference behavior.

Documentation Map

Current Implementation

Inferoa is a TypeScript and Node.js terminal application. It stores local state under ~/.inferoa/ by default and keeps raw endpoint secrets in the local vault instead of plain configuration files. Node.js 24 or newer is required; the npm package is published as inferoa from the agentic-in organization.