Factory Router
By Factory - June 1, 2026 - 4 minute read -
Product
Research
New
Frontier performance at lower cost, customized for your organization. In private research preview today.
Engineers track leaderboards and spend critical hours evaluating incremental performance gains between models. Meanwhile, the most capable models get used for simple queries, and cost-conscious engineers have no effective way to step up to a stronger model when a task demands it. Models differ in performance, cost, reasoning capability, and latency, so each should be used where it shines.
Today, we're announcing Factory Router.
Factory Router cuts token spend by 20-25% while maintaining frontier performance. To give enterprises the best quality at the lowest cost with the highest reliability, it automatically selects the right model for each task and routes across providers if an endpoint degrades.
On our enterprise-grade engineering benchmarks, cost falls 20-25% while pass rate holds at 99% of Claude Opus 4.7 on Terminal-Bench 2 and 96% on Legacy-Bench.
At enterprise scale, those savings compound across every Droid session your engineers run. By tying spend to the work actually being done, lower per-session costs make continuous autonomous software engineering easier to scale across the organization.
Enterprise AI costs are rising, and usage alone does not prove value. A higher token bill does not mean more engineering work is getting done. Often, it means simple work is running through the most expensive models unnecessarily.
Engineers default to the most capable models for fear of losing performance. Simple questions, mechanical refactors, documentation updates, small bug fixes, search-heavy investigations, and other routine work end up on the same premium path as work that truly needs frontier performance.
As a result, organizations are seeing rapidly exhausted AI budgets without clear increases in organization-level output.
Instead of expecting every engineer to always choose the best model manually, Factory Router automatically picks the best model for each Droid session.
Automatic model selection. Factory Router chooses the optimal model for each task, drawing from a diverse pool of frontier and efficient models. If the selected model struggles to complete the task, Factory Router moves the session to a more capable model to reliably ensure high-quality outcomes.
Lower cost with frontier execution. Efficient models handle work that does not need frontier capability, while frontier models remain available for work that does.
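The selection-then-escalation behavior described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Factory Router's actual implementation: the model names, the integer capability scale, the relative costs, and the `attempt` callback are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Model:
    name: str             # hypothetical label, not a real model identifier
    capability: int       # illustrative scale: higher means more capable
    relative_cost: float  # cost relative to the most capable model


# Hypothetical pool, ordered from cheapest to most capable.
POOL = [
    Model("efficient", capability=1, relative_cost=0.2),
    Model("balanced", capability=2, relative_cost=0.5),
    Model("frontier", capability=3, relative_cost=1.0),
]


def select_model(required_capability: int) -> Model:
    """Pick the cheapest model whose capability meets the estimate."""
    for model in POOL:
        if model.capability >= required_capability:
            return model
    return POOL[-1]  # nothing qualifies: fall back to the most capable model


def run_with_escalation(
    required_capability: int,
    attempt: Callable[[Model], bool],
) -> Tuple[Optional[str], float]:
    """Try the selected model, then each more capable one, until one succeeds.

    Returns the name of the model that finished the task (or None) and the
    total relative cost spent across all attempts.
    """
    start = POOL.index(select_model(required_capability))
    spent = 0.0
    for model in POOL[start:]:
        spent += model.relative_cost
        if attempt(model):
            return model.name, spent
    return None, spent


# A task estimated as easy that turns out to need the mid-tier model.
finished_by, cost = run_with_escalation(1, lambda m: m.capability >= 2)
print(finished_by, round(cost, 2))
```

In this sketch a misjudged task pays for the failed cheap attempt plus the successful one, which is why the quality of the initial estimate matters as much as the escalation path.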
Provider routing for reliability. Droid sessions keep working when providers degrade, rate limits hit, or capacity is constrained. By routing across models, providers, and capacity sources, Factory Router provides 99.9%+ request reliability, more than a single-provider platform can offer.
Work keeps moving through provider issues, capacity limits, and model availability changes.
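As a rough illustration of why routing across providers raises reliability, the sketch below tries providers in order and also computes the combined availability of several endpoints. Everything here is an assumption for illustration: the provider callables, the `ProviderUnavailable` exception, the 99% availability figures, and the premise that providers fail independently (real outages are often correlated).

```python
from typing import Callable, List, Sequence, Tuple


class ProviderUnavailable(Exception):
    """Hypothetical error for rate limits, outages, or capacity limits."""


def call_with_failover(
    providers: Sequence[Tuple[str, Callable[[str], str]]],
    request: str,
) -> Tuple[str, str]:
    """Try each (name, call) pair in order and return the first success."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(request)
        except ProviderUnavailable as exc:
            errors.append(f"{name}: {exc}")
    raise ProviderUnavailable("all providers failed: " + "; ".join(errors))


def combined_availability(availabilities: Sequence[float]) -> float:
    """Chance that at least one endpoint is up, assuming independent failures."""
    all_down = 1.0
    for a in availabilities:
        all_down *= 1.0 - a
    return 1.0 - all_down


# Hypothetical stand-ins for real provider clients.
def degraded(request: str) -> str:
    raise ProviderUnavailable("rate limited")


def healthy(request: str) -> str:
    return f"ok: {request}"


print(call_with_failover([("primary", degraded), ("secondary", healthy)], "ping"))
# Two independent endpoints at an illustrative 99% each.
print(round(combined_availability([0.99, 0.99]), 6))
```

Under the independence assumption, two endpoints that are each up 99% of the time are both down only 0.01% of the time, which is how a multi-provider path can exceed the reliability of any one endpoint.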
Every organization has its own shape of work, and the "best" model for a task varies by context. Your team knows which workflows are routine, which codepaths require deeper reasoning, and which model choices fit your cost and performance goals. In Factory Router, admins can supply routing guidance so that automatic model selection reflects how work actually happens inside your organization.
Enterprise teams also need standard control over model availability. The same policy surfaces that govern other Factory models apply to Factory Router. Admins can manage access, compliance, and automatic-routing eligibility without creating a separate control plane.
Factory Router is in private research preview in the Factory CLI and Desktop App. Once enabled for your org, it appears in the model picker for every user with no setup required. Mission workers can use it too, which means long-running autonomous work benefits from the same automatic model selection and savings as interactive and headless sessions.
With Factory Router enabled, your Software Factory routes each Droid session to the model that fits the work, with provider routing for reliability and Enterprise Controls for governance. Enable Factory Router to make best quality and lowest cost the operating default across your organization.
If you'd like Factory Router enabled for your organization, reach out to our team.
Factory Router only routes a session to a cheaper model when that model can handle the work. The savings come from sessions that did not need a frontier model, and every session that does keeps it. The Pareto frontier traces how far that holds: the boundary of the cost/performance trade, or the best performance available at each level of cost. To map it, we ran Factory Router across the full range, from keeping every session on a frontier model to shifting as much work as possible to cheaper models, and plotted each result as pass rate against full-session cost, relative to a Claude Opus 4.7 baseline.
As cost comes down, performance doesn't fall evenly. Near the top the curve is nearly flat: cost drops sharply while performance barely moves, because the first work to leave the frontier model is the work cheaper models handle just as well. Lower down, the curve bends, and what's left is the work that genuinely needed the frontier model.
Factory Router operates on that flat stretch, just before the bend. As it ships today, cost falls 20-25% while pass rate holds at 99% of Opus on Terminal-Bench 2 and 96% on Legacy-Bench. Push past the bend and every further dollar saved costs far more performance, because cheaper models start taking on work they can't finish. The most aggressive routing we measured cut Terminal-Bench 2 cost to 56% of Opus but dropped pass rate to 81%; on Legacy-Bench, cutting cost to 30% of Opus left a pass rate of just 49%.
Any fixed model is one point on this curve, either too expensive for easy work or too weak for hard work. Staying on the frontier means choosing per session.
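To make the shape of the curve concrete, here is a small sketch that extracts a Pareto frontier from (cost, pass rate) points and measures how much pass rate each unit of saving costs. The Terminal-Bench 2 points come from the figures quoted above, with the shipping configuration approximated as 0.80 of Opus cost; the dominated point at (0.90, 0.95) is a hypothetical fixed mid-tier model added only to show filtering.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (cost relative to Opus, pass rate)


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Keep points that no other point beats on both cost and pass rate."""
    frontier: List[Point] = []
    best_rate = float("-inf")
    # Walk from cheapest to most expensive; a point survives only if it
    # improves on the best pass rate seen at any lower or equal cost.
    for cost, rate in sorted(points, key=lambda p: (p[0], -p[1])):
        if rate > best_rate:
            frontier.append((cost, rate))
            best_rate = rate
    return frontier


def loss_per_saving(frontier: List[Point]) -> List[float]:
    """Pass rate lost per unit of cost saved between adjacent frontier points."""
    pts = sorted(frontier, reverse=True)  # most expensive first
    return [
        (hi_rate - lo_rate) / (hi_cost - lo_cost)
        for (hi_cost, hi_rate), (lo_cost, lo_rate) in zip(pts, pts[1:])
    ]


points = [
    (1.00, 1.00),  # Opus baseline
    (0.80, 0.99),  # shipping configuration (cost approximated)
    (0.56, 0.81),  # most aggressive routing measured
    (0.90, 0.95),  # hypothetical fixed mid-tier model, dominated
]
frontier = pareto_frontier(points)
print(frontier)
print([round(s, 2) for s in loss_per_saving(frontier)])
```

With these inputs the first segment gives up about 0.05 of pass rate per unit of cost saved and the second about 0.75, a roughly fifteenfold jump that corresponds to the bend described above.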
The savings also hold when cost is charged only against tasks that finished: cost per successful run is 80.5% of Opus on Terminal-Bench 2 and 78.0% on Legacy-Bench. A router that saved money by abandoning hard sessions early would look worse by that measure, since it would still pay for attempts that never completed.
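The relationship between these figures and the headline savings can be checked with a little arithmetic. The sketch below assumes cost per successful run is total cost divided by the number of successful runs, with both systems attempting the same set of runs; under that assumption, the relative cost per success equals the relative total cost divided by the relative pass rate.

```python
def cost_per_success(total_cost: float, successes: int) -> float:
    """Total spend divided by the number of runs that finished successfully."""
    if successes <= 0:
        raise ValueError("cost per success is undefined with no successes")
    return total_cost / successes


def implied_total_cost_ratio(cps_ratio: float, pass_rate_ratio: float) -> float:
    """Recover relative total cost from relative cost per success.

    cps_ratio = (cost ratio) / (pass rate ratio), so
    cost ratio = cps_ratio * pass_rate_ratio.
    """
    return cps_ratio * pass_rate_ratio


# Figures quoted in the post, all relative to the Opus baseline.
terminal_bench = implied_total_cost_ratio(0.805, 0.99)
legacy_bench = implied_total_cost_ratio(0.780, 0.96)
print(round(terminal_bench, 3), round(legacy_bench, 3))

# A router that halves cost by abandoning half of the work saves nothing
# per success: 50 units of cost for 50 successes matches 100 for 100.
print(cost_per_success(50.0, 50) == cost_per_success(100.0, 100))
```

The implied total cost ratios come out near 0.80 and 0.75, which lines up with the 20-25% savings range reported above.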
Terminal-Bench 2 results average all 89 tasks, and Legacy-Bench results average its full suite. Both are averaged across multiple runs and reported relative to Claude Opus 4.7, with cost measured as full-session cost.