<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <id>https://inferoa.agentic-in.ai/blog</id>
    <title>Inferoa Blog</title>
    <updated>2026-06-08T00:00:00.000Z</updated>
    <generator>https://github.com/jpmonette/feed</generator>
    <link rel="alternate" href="https://inferoa.agentic-in.ai/blog"/>
    <subtitle>Inferoa Blog</subtitle>
    <icon>https://inferoa.agentic-in.ai/img/inferoa-favicon.svg</icon>
    <entry>
        <title type="html"><![CDATA[Inferoa: Inference-native Tokenmaxxing Agent Harness for Loop Engineering]]></title>
        <id>https://inferoa.agentic-in.ai/blog/announcing-inferoa</id>
        <link href="https://inferoa.agentic-in.ai/blog/announcing-inferoa"/>
        <updated>2026-06-08T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Inferoa is an Inference-native Tokenmaxxing Agent Harness for Loop Engineering: goal loops, verification, memory, prefix-cache discipline, context optimization, routing, and high-throughput model serving.]]></summary>
        <content type="html"><![CDATA[<p><img decoding="async" loading="lazy" alt="Inferoa: Inference-native Tokenmaxxing Agent Harness for Loop Engineering" src="https://inferoa.agentic-in.ai/assets/images/inferoa-banner-cbfd0562bb7aeb25cb0f9c78d549e8ef.png" width="1672" height="941" class="img_ev3q"></p>
<p>The most interesting agent work is moving from better prompts to better loops.</p>
<p><strong>Loop Engineering</strong> means giving the model a goal, feedback, verification, memory,
and tools, then letting it self-correct until the work is proven. Primitives
like <code>/goal</code>, rubric-driven outcomes, verifier sub-agents, and memory-backed
sessions matter because they move the work from "prompt the next answer" to
"design the system that keeps improving."</p>
<p>That loop is also an inference workload. As turns accumulate, prompt prefixes
drift, cache reuse collapses, stale evidence fills context, model routing gets
harder, and serving choices start to matter.</p>
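<p>To make the prefix-drift point concrete, here is a minimal sketch that measures how much of a new prompt repeats the previous one from the first character. The function name <code>shared_prefix_ratio</code> and the toy prompts are assumptions for illustration. Real prefix caches work on tokens or cache blocks rather than characters, so treat this as an intuition aid, not as how any serving engine computes reuse.</p>
<pre>
```python
from os.path import commonprefix


def shared_prefix_ratio(previous_prompt: str, next_prompt: str) -> float:
    """Fraction of the next prompt that repeats the previous prompt's prefix."""
    if not next_prompt:
        return 0.0
    # commonprefix compares the strings character by character.
    shared = commonprefix([previous_prompt, next_prompt])
    return len(shared) / len(next_prompt)


stable_system = "SYSTEM: you are a coding agent.\nTOOLS: read, write, run\n"
turn_1 = stable_system + "USER: fix the failing test\n"

# Appending keeps all earlier text as a reusable prefix.
turn_2_appended = turn_1 + "TOOL: 3 tests failed\n"

# Injecting a timestamp at the top changes the very first characters.
turn_2_drifted = "TIME: 12:00:01\n" + turn_1 + "TOOL: 3 tests failed\n"

print(round(shared_prefix_ratio(turn_1, turn_2_appended), 2))
print(round(shared_prefix_ratio(turn_1, turn_2_drifted), 2))
```
</pre>
<p>In this toy case the appended turn shares most of its text with the previous turn, while the drifted turn shares none of its prefix, even though both carry the same information.</p>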
<p><img decoding="async" loading="lazy" alt="Loop Engineering as a recursive system of goals, tools, feedback, memory, verification, reflection, and proof" src="https://inferoa.agentic-in.ai/assets/images/inferoa-loop-inference-workload-448a6b05eb1fd43b965bdbd8332c08b5.png" width="1672" height="941" class="img_ev3q"></p>
<p>That is where Loop Engineering has to become inference-native. A long-horizon
loop needs to see the substrate it is consuming: tokens, cache, context, routes,
endpoints, and model capacity. Tokenmaxxing is the discipline of keeping those
surfaces explicit so every horizon can reuse, compress, route, and recover
instead of sending another blind chat turn.</p>
<p>That is the gap Inferoa is built around. The name is deliberately literal:</p>
<p>Inferoa = <strong>Infer</strong> (Inference-native) + <strong>o</strong> (Tokenmaxxing Loop
Engineering) + <strong>a</strong> (Agent Harness).</p>
<p>Inferoa is an <strong>Inference-native Tokenmaxxing Agent Harness for Loop
Engineering</strong>. It brings the pieces a serious loop needs into one runtime:
goal/rubric feedback, verification evidence, memory and context control,
prefix-cache discipline, intelligent routing through
<a href="https://github.com/vllm-project/semantic-router" target="_blank" rel="noopener noreferrer" class="">vLLM Semantic Router</a>,
high-throughput serving with
<a href="https://github.com/vllm-project/vllm" target="_blank" rel="noopener noreferrer" class="">vLLM Engine</a>, multimodal capability through
<a href="https://github.com/vllm-project/vllm-omni" target="_blank" rel="noopener noreferrer" class="">vLLM Omni</a>, and tokenmaxxing observability across every turn.</p>
<p><img decoding="async" loading="lazy" alt="Inferoa welcome session" src="https://inferoa.agentic-in.ai/assets/images/welcome-c6cfc1ba62eccb15647a4a5c59316e95.gif" width="1864" height="1080" class="img_ev3q"></p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-breaks">What Breaks<a href="https://inferoa.agentic-in.ai/blog/announcing-inferoa#what-breaks" class="hash-link" aria-label="Direct link to What Breaks" title="Direct link to What Breaks" translate="no">​</a></h2>
<p>Long-horizon agents are not one prompt. They are loops: plan, act, observe,
verify, remember, and decide whether to continue. If the runtime treats every
turn as generic chat traffic, it loses both sides of the optimization surface:
the feedback that drives self-correction and the inference signals that keep the
workload efficient.</p>
<p><img decoding="async" loading="lazy" alt="What breaks when loop engineering cannot see inference signals" src="https://inferoa.agentic-in.ai/assets/images/inferoa-what-breaks-2bf3415b4d15e8ec724d3188fee2f2df.png" width="1672" height="941" class="img_ev3q"></p>
<p>The failure modes are familiar:</p>
<ul>
<li class="">the goal is present, but the feedback loop is too weak to drive correction;</li>
<li class="">grading is collapsed into self-critique instead of independent evidence;</li>
<li class="">memory becomes a folder of notes rather than a reusable outer loop;</li>
<li class="">prompt shape drifts, so prefix cache cannot be reused reliably;</li>
<li class="">context selection becomes "paste more" instead of "select better";</li>
<li class="">cheap, private, or mechanical turns still take expensive model paths;</li>
<li class="">serving and cache signals arrive too late to shape the next action.</li>
</ul>
<p>These are runtime design problems, not analytics problems.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-changes">What Changes<a href="https://inferoa.agentic-in.ai/blog/announcing-inferoa#what-changes" class="hash-link" aria-label="Direct link to What Changes" title="Direct link to What Changes" translate="no">​</a></h2>
<p>Inferoa makes inference behavior visible while the loop is still running. The
point is not to add another dashboard. The point is to let the runtime choose
better prompts, better context, better routes, and better recovery behavior
before the next turn is sent.</p>
<p><img decoding="async" loading="lazy" alt="What changes when inference signals become native to the agent loop" src="https://inferoa.agentic-in.ai/assets/images/inferoa-what-changes-d06084e1b0d64855136724ff9dd79d1a.png" width="1672" height="941" class="img_ev3q"></p>
<table><thead><tr><th>Surface</th><th>Substrate</th><th>What Inferoa Makes Native</th><th>Why It Matters</th></tr></thead><tbody><tr><td>Loop Engineering</td><td><a href="https://inferoa.agentic-in.ai/docs/workflows/goal-mode" target="_blank" rel="noopener noreferrer" class="">Inferoa Goal Mode</a></td><td>Recursive long-horizon goals, horizons, candidate work, reflection, and completion evidence</td><td>The engineering loop keeps running until the work is proven</td></tr><tr><td>Agent Harness</td><td><a href="https://github.com/agentic-in/inferoa" target="_blank" rel="noopener noreferrer" class="">Inferoa</a></td><td>Sessions, tools, plans, autoresearch, resources, recovery, and prefix-cache discipline</td><td>Long work gets a durable runtime while preserving reusable prompt prefixes</td></tr><tr><td>Context Optimization</td><td><a href="https://www.npmjs.com/package/@colbymchenry/codegraph" target="_blank" rel="noopener noreferrer" class="">CodeGraph</a>, <a href="https://github.com/rtk-ai/rtk" target="_blank" rel="noopener noreferrer" class="">RTK</a></td><td>Compression, graph-shaped repo context, bounded tool output, and evidence selection</td><td>The model sees evidence, not raw sprawl</td></tr><tr><td>Intelligent Routing</td><td><a href="https://github.com/vllm-project/semantic-router" target="_blank" rel="noopener noreferrer" class="">vLLM Semantic Router</a></td><td>Model paths respond to cost, safety, privacy, capability, and session pressure</td><td>Turns can route between self-hosted vLLM models and external frontier models</td></tr><tr><td>Model Serving</td><td><a href="https://github.com/vllm-project/vllm" target="_blank" rel="noopener noreferrer" class="">vLLM Engine</a>, <a href="https://github.com/vllm-project/vllm-omni" target="_blank" rel="noopener noreferrer" class="">vLLM Omni</a></td><td>High-throughput, memory-efficient serving and multimodal endpoints stay visible to the harness</td><td>Self-hosted paths make cost, safety, privacy, and data 
sovereignty controllable when an external frontier model is unnecessary</td></tr></tbody></table>
<p>This is the core design: the agent is not merely calling an inference system;
the loop is shaped by it.</p>
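<p>As one illustration of the routing row in the table, the sketch below picks a route per turn from a few signals. Everything here is hypothetical: the <code>Turn</code> fields, the <code>choose_route</code> function, and the route labels are assumptions made for the example and are not the vLLM Semantic Router API or Inferoa's actual policy.</p>
<pre>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    """Hypothetical per-turn signals; the field names are assumptions."""
    contains_private_data: bool
    is_mechanical: bool  # for example formatting or renaming
    needs_frontier_capability: bool


def choose_route(turn: Turn) -> str:
    """Return an illustrative route label for one turn."""
    # Privacy is checked first: private turns stay on the self-hosted path.
    if turn.contains_private_data:
        return "self-hosted"
    # Mechanical turns do not need an expensive model.
    if turn.is_mechanical:
        return "self-hosted"
    # Escalate only when the turn needs capability the local model lacks.
    if turn.needs_frontier_capability:
        return "external-frontier"
    return "self-hosted"


print(choose_route(Turn(False, False, True)))  # external-frontier
print(choose_route(Turn(True, False, True)))   # self-hosted
```
</pre>
<p>The ordering of the checks is the design choice worth noticing: a private turn never leaves the self-hosted path in this sketch, even when it would benefit from a stronger model.</p>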
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="goal-mode-loop-engineering-for-long-horizon-work">Goal Mode: Loop Engineering For Long-Horizon Work<a href="https://inferoa.agentic-in.ai/blog/announcing-inferoa#goal-mode-loop-engineering-for-long-horizon-work" class="hash-link" aria-label="Direct link to Goal Mode: Loop Engineering For Long-Horizon Work" title="Direct link to Goal Mode: Loop Engineering For Long-Horizon Work" translate="no">​</a></h2>
<p>Prompt engineering improves the next answer. Loop engineering designs the
system that decides what to do after that answer. In Inferoa, <code>/goal</code> is the
entry point: it starts a recursive long-horizon loop, expands work through
horizons, preserves evidence, uses reflection as a checkpoint, and requires
proof before completion.</p>
<p><img decoding="async" loading="lazy" alt="Inferoa goal mode" src="https://inferoa.agentic-in.ai/assets/images/goal-90ce768aee9e75af9839be25d56f671b.gif" width="1660" height="1080" class="img_ev3q"></p>
<p>Goal Mode is deliberately not just a persistent note in the prompt. It gives the
runtime a durable outcome, a visible Horizon 0 orientation, a strategy,
candidate work, step status, verifier-ready evidence, reflection decisions, and
a completion report. That is the difference between asking for the next step and
engineering the loop that can keep going.</p>
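<p>The shape of such a loop can be sketched in a few lines. This is not Inferoa's implementation: <code>run_goal_loop</code>, <code>propose</code>, <code>verify</code>, and <code>max_horizons</code> are hypothetical names, and the proposer and verifier are toy stand-ins. The sketch only shows the idea that completion comes from a separate verifier and that every horizon leaves evidence behind.</p>
<pre>
```python
from typing import Callable, List, Tuple


def run_goal_loop(
    propose: Callable[[List[str]], str],
    verify: Callable[[str], Tuple[bool, str]],
    max_horizons: int = 5,
) -> Tuple[bool, List[str]]:
    """Iterate until the verifier accepts a candidate or the budget runs out."""
    evidence: List[str] = []
    for horizon in range(max_horizons):
        # The proposer sees all earlier evidence, so it can self-correct.
        candidate = propose(evidence)
        # The verifier is a separate callable, not the proposer's self-critique.
        passed, note = verify(candidate)
        evidence.append(f"horizon {horizon}: {candidate!r} -> {note}")
        if passed:
            return True, evidence
    return False, evidence


# Toy stand-ins: the proposer counts prior attempts, the verifier wants "3".
def toy_propose(evidence: List[str]) -> str:
    return str(len(evidence) + 1)


def toy_verify(candidate: str) -> Tuple[bool, str]:
    ok = candidate == "3"
    return ok, "accepted" if ok else "rejected"


done, trail = run_goal_loop(toy_propose, toy_verify)
print(done, len(trail))  # True 3
```
</pre>
<p>If the budget runs out first, the loop returns <code>False</code> together with the full evidence trail, so an unfinished goal is reported as unfinished rather than silently treated as done.</p>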
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="inferoa-at-a-glance">Inferoa At A Glance<a href="https://inferoa.agentic-in.ai/blog/announcing-inferoa#inferoa-at-a-glance" class="hash-link" aria-label="Direct link to Inferoa At A Glance" title="Direct link to Inferoa At A Glance" translate="no">​</a></h2>
<p>The product surface is terminal-first, but it is not just a shell. Each mode
exposes a different part of the loop while the agent works.</p>
<p>Run <code>/goal</code> to start a long-horizon recursive goal. The agent can decompose
work, update steps, attach evidence, reflect between horizons, and avoid
mistaking an empty checklist for a finished outcome.</p>
<p>Plan mode turns ambiguous scope into an inspectable decision. A plan can stay in
drafting, move to approval, or become executable context without blocking the
runtime on process overhead.</p>
<p><img decoding="async" loading="lazy" alt="Inferoa plan mode" src="https://inferoa.agentic-in.ai/assets/images/plan-7ffc3bc78df51fb34e073c06ff4fec05.gif" width="1884" height="1080" class="img_ev3q"></p>
<p>Autoresearch mode makes the evaluation loop native: define the experiment, run
the harness, record failures, patch the implementation, and keep the metric
trail in the same session.</p>
<p><img decoding="async" loading="lazy" alt="Inferoa autoresearch iteration" src="https://inferoa.agentic-in.ai/assets/images/research-a235db555c704536d347c73b71fe19e4.gif" width="1864" height="1080" class="img_ev3q"></p>
<p>The tokenmaxxing report is the savings ledger for prefix-cache reuse, context optimization,
<a href="https://github.com/rtk-ai/rtk" target="_blank" rel="noopener noreferrer" class="">RTK</a> tool-output savings, recent turn usage, and
model-selection pressure. It shows whether the loop is actually becoming more
efficient, not just how many tokens were spent.</p>
<p><img decoding="async" loading="lazy" alt="Inferoa tokenmaxxing report" src="https://inferoa.agentic-in.ai/assets/images/tokenmaxxing-61155fb440155901f7e98a9294e9f329.png" width="3840" height="2100" class="img_ev3q"></p>
<p>The command surface stays small: <code>/goal</code> for durable objectives, <code>/plan</code> for
inspectable scope, <code>/autoresearch</code> for metric-driven iteration, and
<code>/tokenmaxxing</code> for the savings ledger across prefix cache,
<a href="https://www.npmjs.com/package/@colbymchenry/codegraph" target="_blank" rel="noopener noreferrer" class="">CodeGraph</a> and
<a href="https://github.com/rtk-ai/rtk" target="_blank" rel="noopener noreferrer" class="">RTK</a> context savings, recent turn usage, and
model-selection cost pressure.</p>
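<p>A savings ledger of this kind can be pictured as a running comparison between what a surface would have cost and what it actually cost. The sketch below is an assumption-laden illustration: <code>SavingsLedger</code>, its methods, the surface name, and the token counts are all made up for the example and do not describe the real <code>/tokenmaxxing</code> report format.</p>
<pre>
```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class SavingsLedger:
    """Tracks baseline vs. actual token counts per surface (names assumed)."""
    entries: Dict[str, Tuple[int, int]] = field(default_factory=dict)

    def record(self, surface: str, baseline_tokens: int, actual_tokens: int) -> None:
        """Accumulate one observation for a surface."""
        base, actual = self.entries.get(surface, (0, 0))
        self.entries[surface] = (base + baseline_tokens, actual + actual_tokens)

    def saved_fraction(self, surface: str) -> float:
        """Share of baseline tokens that were avoided on this surface."""
        base, actual = self.entries[surface]
        return 0.0 if base == 0 else (base - actual) / base


ledger = SavingsLedger()
# Illustrative numbers only; they are not taken from the public eval.
ledger.record("tool-output", baseline_tokens=1000, actual_tokens=400)
ledger.record("tool-output", baseline_tokens=500, actual_tokens=200)
print(f"{ledger.saved_fraction('tool-output'):.1%}")  # 60.0%
```
</pre>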
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="proof-of-value">Proof Of Value<a href="https://inferoa.agentic-in.ai/blog/announcing-inferoa#proof-of-value" class="hash-link" aria-label="Direct link to Proof Of Value" title="Direct link to Proof Of Value" translate="no">​</a></h2>
<p>The value story is not one benchmark score. It is whether the tokenmaxxing path
stays stable, measurable, and cheaper as the horizon grows. The public eval is
split into measured stress runs and calibrated projections: measured runs check
runtime invariants and continuity; projections ask what happens if the measured
shape is carried to 1k-10k loops.</p>
<p>Key results:</p>
<ul>
<li class=""><strong>Prefix cache and continuity</strong>: measured profiles kept <strong>one prompt epoch,
one tool schema hash, and one cache salt</strong> while cache reuse improved after
warmup. A <strong>256-turn compression regression</strong> preserved continuity markers and
archive pointers, and the 1k-10k projections were calibrated from the measured tail
slope rather than claimed as live 10k-request runs.</li>
<li class=""><strong>CodeGraph context reduction</strong>:
<a href="https://www.npmjs.com/package/@colbymchenry/codegraph" target="_blank" rel="noopener noreferrer" class="">CodeGraph</a>-style
symbol/range selection saved <strong>80.8%</strong> of inspected context.</li>
<li class=""><strong>RTK tool-output reduction</strong>: <a href="https://github.com/rtk-ai/rtk" target="_blank" rel="noopener noreferrer" class="">RTK</a> command
records saved <strong>61.4%</strong> of command-token footprint.</li>
</ul>
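<p>The "one tool schema hash" invariant in the first result can be illustrated with a canonical hash over the tool definitions. This is a minimal sketch under assumptions: <code>schema_hash</code> and the toy tool schemas are hypothetical, and the post does not say how Inferoa computes its hash. The point is only that a stable fingerprint makes prompt-shape drift detectable.</p>
<pre>
```python
import hashlib
import json


def schema_hash(tool_schemas: list) -> str:
    """Hash tool schemas in a canonical form so key order cannot change it."""
    canonical = json.dumps(tool_schemas, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


tools_a = [{"name": "read_file", "parameters": {"path": "string"}}]
# Same schema with the keys written in a different order.
tools_b = [{"parameters": {"path": "string"}, "name": "read_file"}]
# A genuinely different schema: one extra parameter.
tools_c = [{"name": "read_file", "parameters": {"path": "string", "limit": "int"}}]

print(schema_hash(tools_a) == schema_hash(tools_b))  # True
print(schema_hash(tools_a) == schema_hash(tools_c))  # False
```
</pre>
<p>Sorting keys before hashing means cosmetic reordering does not register as a change, while any real change to the tool surface does.</p>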
<p><img decoding="async" loading="lazy" alt="Inferoa tokenmaxxing surfaces" src="data:image/svg+xml;base64,PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4KPHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSI5ODAiIGhlaWdodD0iMzgwIiB2aWV3Qm94PSIwIDAgOTgwIDM4MCIgcm9sZT0iaW1nIiBhcmlhLWxhYmVsbGVkYnk9InRpdGxlIGRlc2MiPgogIDx0aXRsZSBpZD0idGl0bGUiPlRva2VubWF4eGluZyByZWR1Y2VzIHRva2VuIHByZXNzdXJlPC90aXRsZT4KICA8ZGVzYyBpZD0iZGVzYyI+SG9yaXpvbnRhbCBiYXIgY2hhcnQgc2hvd2luZyBwcmVmaXggY2FjaGUsIENvZGVHcmFwaCwgYW5kIFJUSyB0b2tlbiBvciBjb250ZXh0IHJlZHVjdGlvbnMuPC9kZXNjPgogIDxyZWN0IHdpZHRoPSI5ODAiIGhlaWdodD0iMzgwIiBmaWxsPSIjZmZmZmZmIi8+CiAgPHRleHQgeD0iMjgiIHk9IjM4IiBmb250LWZhbWlseT0iRGVqYVZ1IFNhbnMsIEFyaWFsLCBzYW5zLXNlcmlmIiBmb250LXNpemU9IjIxIiBmb250LXdlaWdodD0iNzAwIiBmaWxsPSIjMjYzMjM4Ij5Ub2tlbm1heHhpbmcgcmVkdWNlcyB0b2tlbiBwcmVzc3VyZTwvdGV4dD4KICA8ZyBmb250LWZhbWlseT0iRGVqYVZ1IFNhbnMsIEFyaWFsLCBzYW5zLXNlcmlmIj48bGluZSB4MT0iMjg2IiB5MT0iNzgiIHgyPSIyODYiIHkyPSIzMDgiIHN0cm9rZT0iI2U4ZWVmMiIvPjx0ZXh0IHg9IjI4NiIgeT0iMzMwIiB0ZXh0LWFuY2hvcj0ibWlkZGxlIiBmb250LXNpemU9IjEyIiBmaWxsPSIjNjA3ZDhiIj4wJTwvdGV4dD4KPGxpbmUgeDE9IjQzOSIgeTE9Ijc4IiB4Mj0iNDM5IiB5Mj0iMzA4IiBzdHJva2U9IiNlOGVlZjIiLz48dGV4dCB4PSI0MzkiIHk9IjMzMCIgdGV4dC1hbmNob3I9Im1pZGRsZSIgZm9udC1zaXplPSIxMiIgZmlsbD0iIzYwN2Q4YiI+MjUlPC90ZXh0Pgo8bGluZSB4MT0iNTkyIiB5MT0iNzgiIHgyPSI1OTIiIHkyPSIzMDgiIHN0cm9rZT0iI2U4ZWVmMiIvPjx0ZXh0IHg9IjU5MiIgeT0iMzMwIiB0ZXh0LWFuY2hvcj0ibWlkZGxlIiBmb250LXNpemU9IjEyIiBmaWxsPSIjNjA3ZDhiIj41MCU8L3RleHQ+CjxsaW5lIHgxPSI3NDUiIHkxPSI3OCIgeDI9Ijc0NSIgeTI9IjMwOCIgc3Ryb2tlPSIjZThlZWYyIi8+PHRleHQgeD0iNzQ1IiB5PSIzMzAiIHRleHQtYW5jaG9yPSJtaWRkbGUiIGZvbnQtc2l6ZT0iMTIiIGZpbGw9IiM2MDdkOGIiPjc1JTwvdGV4dD4KPGxpbmUgeDE9Ijg5OCIgeTE9Ijc4IiB4Mj0iODk4IiB5Mj0iMzA4IiBzdHJva2U9IiNlOGVlZjIiLz48dGV4dCB4PSI4OTgiIHk9IjMzMCIgdGV4dC1hbmNob3I9Im1pZGRsZSIgZm9udC1zaXplPSIxMiIgZmlsbD0iIzYwN2Q4YiI+MTAwJTwvdGV4dD48L2c+CiAgPGxpbmUgeDE9IjI4NiIgeTE9IjMwOCIgeDI9Ijg5OCIgeTI9IjMwOCIgc3Ryb2tlPSIjYjBiZWM1Ii8+CiAgPGcgZm9udC1mYW1pbHk9IkRlamFWdSBTYW5zLCBBc
mlhbCwgc2Fucy1zZXJpZiI+PHRleHQgeD0iMjY4IiB5PSIxMjEuMyIgdGV4dC1hbmNob3I9ImVuZCIgZm9udC1zaXplPSIxNCIgZmlsbD0iIzI2MzIzOCI+UHJlZml4IGNhY2hlIGNhY2hlZC10b2tlbiBkaXNjb3VudDwvdGV4dD4KPHJlY3QgeD0iMjg2IiB5PSI5Ny4zIiB3aWR0aD0iNTUwLjgiIGhlaWdodD0iMzguMCIgcng9IjQiIGZpbGw9IiMxOTc2ZDIiLz4KPHRleHQgeD0iODQ2LjgiIHk9IjEyMS4zIiBmb250LXNpemU9IjE1IiBmb250LXdlaWdodD0iNzAwIiBmaWxsPSIjMjYzMjM4Ij45MC4wJTwvdGV4dD4KPHRleHQgeD0iMjY4IiB5PSIxOTguMCIgdGV4dC1hbmNob3I9ImVuZCIgZm9udC1zaXplPSIxNCIgZmlsbD0iIzI2MzIzOCI+Q29kZUdyYXBoIGNvbnRleHQgcmVkdWNlZDwvdGV4dD4KPHJlY3QgeD0iMjg2IiB5PSIxNzQuMCIgd2lkdGg9IjQ5NC43IiBoZWlnaHQ9IjM4LjAiIHJ4PSI0IiBmaWxsPSIjMDA3OTZiIi8+Cjx0ZXh0IHg9Ijc5MC43IiB5PSIxOTguMCIgZm9udC1zaXplPSIxNSIgZm9udC13ZWlnaHQ9IjcwMCIgZmlsbD0iIzI2MzIzOCI+ODAuOCU8L3RleHQ+Cjx0ZXh0IHg9IjI2OCIgeT0iMjc0LjciIHRleHQtYW5jaG9yPSJlbmQiIGZvbnQtc2l6ZT0iMTQiIGZpbGw9IiMyNjMyMzgiPlJUSyB0b29sIG91dHB1dCByZWR1Y2VkPC90ZXh0Pgo8cmVjdCB4PSIyODYiIHk9IjI1MC43IiB3aWR0aD0iMzc1LjgiIGhlaWdodD0iMzguMCIgcng9IjQiIGZpbGw9IiM4ZTI0YWEiLz4KPHRleHQgeD0iNjcxLjgiIHk9IjI3NC43IiBmb250LXNpemU9IjE1IiBmb250LXdlaWdodD0iNzAwIiBmaWxsPSIjMjYzMjM4Ij42MS40JTwvdGV4dD48L2c+CiAgPHRleHQgeD0iMjg2IiB5PSIzNTgiIGZvbnQtZmFtaWx5PSJEZWphVnUgU2FucywgQXJpYWwsIHNhbnMtc2VyaWYiIGZvbnQtc2l6ZT0iMTMiIGZpbGw9IiM2MDdkOGIiPlNvdXJjZXM6IHByZWZpeC1jYWNoZSBjb3N0IG1vZGVsLCBDb2RlR3JhcGggcHJvamVjdGlvbiwgYW5kIFJUSyByZWNvcmRzLjwvdGV4dD4KPC9zdmc+Cg==" width="980" height="380" class="img_ev3q"></p>
<ul>
<li class=""><strong>Routing economics</strong>: the
<a href="https://routeworks.github.io/?p=/leaderboard" target="_blank" rel="noopener noreferrer" class="">Routeworks leaderboard</a> makes the
inference-cost tradeoff visible on a log scale. At similar accuracy, routed
paths can sit at <strong>1/10</strong> or even <strong>1/100</strong> of a frontier-heavy route's cost.</li>
</ul>
<p><img decoding="async" loading="lazy" alt="Routeworks routing leaderboard" src="https://inferoa.agentic-in.ai/assets/images/routeworks-routing-leaderboard-3f39d4d783a3b77e17e1adf0eee0d63a.png" width="2112" height="1298" class="img_ev3q"></p>
<p>The exact numbers will move with workload, model pricing, and the local RTK command
corpus. The direction is the important part: long-horizon loops need a runtime
that protects stability, preserves continuity through compression, and uses
every inference surface available.</p>
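<p>For readers wondering what a projection calibrated from a measured tail slope might look like, here is one plausible reading as a sketch. The post does not specify the method, so the least-squares fit, the linear extrapolation, the function names <code>tail_slope</code> and <code>project</code>, and the data points are all assumptions for illustration.</p>
<pre>
```python
from typing import List, Tuple


def tail_slope(points: List[Tuple[int, float]], tail: int) -> float:
    """Least-squares slope over the last `tail` (turn, value) points."""
    window = points[-tail:]
    n = len(window)
    mean_x = sum(x for x, _ in window) / n
    mean_y = sum(y for _, y in window) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in window)
    den = sum((x - mean_x) ** 2 for x, _ in window)
    return num / den


def project(points: List[Tuple[int, float]], tail: int, target_turn: int) -> float:
    """Extend the last measured value linearly using the tail slope."""
    last_x, last_y = points[-1]
    return last_y + tail_slope(points, tail) * (target_turn - last_x)


# Made-up data: cumulative tokens after each measured turn.
measured = [(1, 900.0), (2, 1500.0), (3, 1900.0), (4, 2300.0), (5, 2700.0)]
print(project(measured, tail=3, target_turn=1000))  # 400700.0
```
</pre>
<p>Fitting only the tail ignores the warmup turns, where growth is steeper in this toy data, which is why a projection like this should be labeled as a projection and not as a measured run.</p>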
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="built-with-the-inference-stack">Built With The Inference Stack<a href="https://inferoa.agentic-in.ai/blog/announcing-inferoa#built-with-the-inference-stack" class="hash-link" aria-label="Direct link to Built With The Inference Stack" title="Direct link to Built With The Inference Stack" translate="no">​</a></h2>
<p><img decoding="async" loading="lazy" alt="Inferoa built with the inference stack" src="https://inferoa.agentic-in.ai/assets/images/inferoa-stack-e103c330b4c3be4729f894c68af109ca.png" width="1672" height="941" class="img_ev3q"></p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="vllm-ecosystem">vLLM Ecosystem<a href="https://inferoa.agentic-in.ai/blog/announcing-inferoa#vllm-ecosystem" class="hash-link" aria-label="Direct link to vLLM Ecosystem" title="Direct link to vLLM Ecosystem" translate="no">​</a></h3>
<p>Inferoa starts with the vLLM ecosystem because vLLM exposes the right surfaces:
serving behavior, routing, multimodal paths, endpoint signals, and prefix-cache
economics.</p>
<ul>
<li class=""><a href="https://github.com/vllm-project/vllm" target="_blank" rel="noopener noreferrer" class=""><strong>vLLM Engine</strong></a> provides
high-performance OpenAI-compatible inference and the prefix-cache behavior
Inferoa protects across long sessions.</li>
<li class=""><a href="https://github.com/vllm-project/semantic-router" target="_blank" rel="noopener noreferrer" class=""><strong>vLLM Semantic Router</strong></a>
brings model routing into the agent loop so routes can respond to cost,
safety, privacy, capability, and session pressure.</li>
<li class=""><a href="https://github.com/vllm-project/vllm-omni" target="_blank" rel="noopener noreferrer" class=""><strong>vLLM Omni</strong></a> brings image,
video, and audio understanding or generation into the same durable agent
contract.</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="context-optimization">Context Optimization<a href="https://inferoa.agentic-in.ai/blog/announcing-inferoa#context-optimization" class="hash-link" aria-label="Direct link to Context Optimization" title="Direct link to Context Optimization" translate="no">​</a></h3>
<p>Inferoa also uses the context optimization projects that make long-horizon loops
practical:</p>
<ul>
<li class=""><a href="https://www.npmjs.com/package/@colbymchenry/codegraph" target="_blank" rel="noopener noreferrer" class=""><strong>CodeGraph</strong></a>
turns repository context into graph-shaped symbol and range evidence.</li>
<li class=""><a href="https://github.com/rtk-ai/rtk" target="_blank" rel="noopener noreferrer" class=""><strong>RTK</strong></a> rewrites command-heavy tool output
into compact records that preserve evidence while reducing token pressure.</li>
</ul>
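<p>To show what bounding tool output can mean in the simplest case, the sketch below keeps the first and last lines of a long command output and replaces the middle with a count. This is not RTK's algorithm or API: <code>bound_output</code> and its parameters are hypothetical, and a real tool would select lines by relevance instead of by position.</p>
<pre>
```python
def bound_output(text: str, head: int = 3, tail: int = 3) -> str:
    """Keep the first and last lines and replace the middle with a count."""
    lines = text.splitlines()
    # Short outputs pass through unchanged.
    if head + tail >= len(lines):
        return text
    omitted = len(lines) - head - tail
    # Slice from an explicit index so tail=0 keeps no trailing lines.
    kept = lines[:head] + [f"... {omitted} lines omitted ..."] + lines[len(lines) - tail:]
    return "\n".join(kept)


log = "\n".join(f"test_{i} ok" for i in range(100))
bounded = bound_output(log)
print(bounded)
print(len(bounded.splitlines()))  # 7
```
</pre>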
<p>Inferoa is the harness layer above that stack: the place where long-horizon
agent behavior and inference behavior meet.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="try-it">Try It<a href="https://inferoa.agentic-in.ai/blog/announcing-inferoa#try-it" class="hash-link" aria-label="Direct link to Try It" title="Direct link to Try It" translate="no">​</a></h2>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#d6dde8;--prism-background-color:#101419"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#d6dde8;background-color:#101419"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#d6dde8"><span class="token plain">npm install -g inferoa@dev</span><br></div><div class="token-line" style="color:#d6dde8"><span class="token plain">inferoa setup</span><br></div><div class="token-line" style="color:#d6dde8"><span class="token plain">inferoa</span><br></div></code></pre></div></div>
<p>The larger goal is simple: agents should not waste the inference stack they are
already paying for. Inferoa makes those signals native to the loop.</p>]]></content>
        <category label="inferoa" term="inferoa"/>
        <category label="tokenmaxxing" term="tokenmaxxing"/>
        <category label="agents" term="agents"/>
        <category label="inference" term="inference"/>
        <category label="vllm" term="vllm"/>
    </entry>
</feed>