How it works
What InferOps actually does.
And what it does not.
No jargon. No overclaiming. Just an honest explanation of what InferOps does, what it finds, and where the boundaries are.
InferOps runs a continuous loop: detect waste in production traffic, verify each finding is safe to act on, confirm the saving from your own data when you fix it.
The problem
One invoice. No breakdown.
Every month your Anthropic or OpenAI invoice arrives. It shows a total. Most tools can show per-feature costs, but only if you have already tagged every AI call by hand. InferOps discovers your features automatically from day one.
Most teams have some visibility into total AI spend. Almost none have visibility into spend by feature, waste by feature, or confirmed savings by feature. That gap is what InferOps fills.
84%
of AI teams say API costs are cutting into gross margins by more than 6%
3-5x
more expensive per token for output than for input. Most teams optimise only one side.
< 1 in 3
teams can identify which AI features are generating return on their API costs
Installation
What happens when you install InferOps.
You add two lines to each application's startup code. Once per application. No feature registration. No configuration. No changes to your existing AI calls.
Within minutes
InferOps starts watching.
InferOps begins watching production traffic from every AI feature in your stack at once. If you want cleaner names in your dashboard, add optional feature tags to your code. No other configuration is required.
Within 24 hours
Your dashboard populates.
Every AI feature ranked by monthly API cost. The breakdown you have never had from your provider invoice.
Ongoing
InferOps keeps watching.
If something changes unexpectedly in your costs, InferOps surfaces it within hours. Not at the end of the month when the invoice arrives. As your product evolves, new calls are captured automatically.
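To make "every AI feature ranked by monthly API cost" concrete, here is a minimal sketch of that aggregation. It assumes each captured call is a record with a feature name, a model, and input and output token counts. It also uses placeholder per-million-token prices. The Call record, the PRICES table and its figures, and the 30-day extrapolation are all hypothetical illustrations. They are not InferOps' schema or any provider's current price list.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record for one captured API call (not InferOps' actual schema).
@dataclass
class Call:
    feature: str
    model: str
    input_tokens: int
    output_tokens: int


# Placeholder prices in USD per million tokens: (input, output). Illustrative only.
PRICES = {
    "model-large": (3.00, 15.00),
    "model-small": (0.25, 1.25),
}


def call_cost(call: Call) -> float:
    """Cost of one call, pricing input and output tokens separately."""
    price_in, price_out = PRICES[call.model]
    return (call.input_tokens * price_in + call.output_tokens * price_out) / 1_000_000


def rank_features(calls: list[Call], days_observed: float) -> list[tuple[str, float]]:
    """Sum cost per feature, extrapolate to a 30-day month, sort highest first."""
    totals: dict[str, float] = defaultdict(float)
    for call in calls:
        totals[call.feature] += call_cost(call)
    monthly = {feature: total / days_observed * 30 for feature, total in totals.items()}
    return sorted(monthly.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    calls = [
        Call("support-chat", "model-large", 2_000, 500),
        Call("support-chat", "model-large", 1_500, 800),
        Call("summariser", "model-small", 10_000, 300),
    ]
    for feature, cost in rank_features(calls, days_observed=1):
        print(f"{feature}: ${cost:.2f}/month")
```

Pricing the two sides separately matters here. In this toy data, most of support-chat's cost comes from its output tokens, even though it sends more input tokens than it receives.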
Detect
Waste found automatically
Verify
Zero risk or check first
Confirm
Saving measured from production
No other tool closes this loop automatically.
Detection
What InferOps finds.
InferOps analyses your production traffic and surfaces patterns where API costs are higher than they need to be. Every finding includes the specific feature, the estimated monthly saving, and what to change to capture it.
What a finding looks like
Each finding appears as a card in your dashboard. The card tells you which feature has the pattern, what InferOps detected, why it represents waste, and what the estimated saving would be if you address it.
InferOps does not make the decision for you. It surfaces the evidence and the saving estimate. You review it, decide whether it makes sense for your product, and, if you choose to, implement it in your own pipeline.
Some findings are high-confidence direct recommendations. Others are patterns InferOps flags for your review: InferOps identifies the pattern, but you know your feature better than we do.
What each finding includes
InferOps suggests. You decide.
Nothing is changed in your stack without your explicit action in your own pipeline.
Direct recommendations carry zero quality risk. Anything with a trade-off, including every investigation prompt, comes with explicit quality checks to complete before you act.
Confirmation
How InferOps confirms a saving.
This is the part that makes InferOps different from every other cost visibility tool. When you implement a change, InferOps measures the cost saving from your production data, automatically, without any additional steps from you.
You implement the change.
In your own codebase, in your own pipeline, on your own schedule. InferOps does not do this for you.
InferOps detects it automatically.
No trigger required. InferOps observes the change in your production event stream within hours of deployment.
A 7-day measurement window opens.
InferOps measures your actual production costs for 7 days after the change. It compares them against the 14 days before the change on that specific feature.
The saving is confirmed.
The confirmed saving in your dashboard is not a projection or an estimate. It is a measurement from your own production traffic. Exact figures. Not rounded.
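The baseline (14 days) and the measurement window (7 days) differ in length, so comparing raw totals would overstate the saving. A like-for-like comparison needs some normalisation. The sketch below assumes the simplest choice, average cost per day, and only illustrates that arithmetic. InferOps' actual calculation is not described here and may differ, for example by normalising per call to absorb traffic changes. The function name and the daily figures are hypothetical.

```python
from statistics import mean


def confirmed_daily_saving(
    baseline_daily: list[float], window_daily: list[float]
) -> tuple[float, float]:
    """Compare average daily cost before and after a change on one feature.

    baseline_daily: cost per day for the 14 days before the change.
    window_daily:   cost per day for the 7 days after the change.
    Returns (saving per day, saving as a fraction of the baseline).
    """
    if len(baseline_daily) != 14 or len(window_daily) != 7:
        raise ValueError("expected 14 baseline days and 7 measurement days")
    before = mean(baseline_daily)  # averaging removes the 14-vs-7 length mismatch
    after = mean(window_daily)
    if before == 0:
        raise ValueError("baseline cost is zero; no saving to measure")
    saving = before - after
    return saving, saving / before


if __name__ == "__main__":
    baseline = [40.0] * 14  # hypothetical: $40/day before the change
    window = [31.0] * 7     # hypothetical: $31/day after the change
    per_day, fraction = confirmed_daily_saving(baseline, window)
    print(f"saving: ${per_day}/day ({fraction:.1%} of baseline)")
```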
This confirmation mechanism is why InferOps exists. Cost visibility tools show you what you are spending. InferOps shows you what changed and confirms the cost saving from production data. During the measurement window it also watches simple regression proxies (the stop_reason mix, response length, and parse-failure rate) and flags changes in them. It confirms the saving, not the quality: on changes that carry a trade-off, verifying quality remains your call.
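The regression proxies named above can be computed from per-call metadata alone, without prompt or response text. Here is a minimal sketch under a few assumptions. Each call record is taken to carry a stop_reason string (values such as "end_turn" and "max_tokens" follow Anthropic's API), an output token count standing in for response length, and a flag for whether downstream parsing succeeded. The CallMeta record, the function names, and the 5-percentage-point and 20% thresholds are hypothetical. They are not InferOps settings.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean


# Hypothetical per-call metadata; no prompt or response text is needed.
@dataclass
class CallMeta:
    stop_reason: str    # e.g. "end_turn" or "max_tokens"
    output_tokens: int  # used as the response-length measure
    parsed_ok: bool     # did downstream parsing of the output succeed?


def summarise(calls: list[CallMeta]) -> tuple[dict[str, float], float, float]:
    """Return (stop_reason mix, mean output tokens, parse-failure rate) for one period."""
    if not calls:
        raise ValueError("no calls in period")
    n = len(calls)
    mix = {reason: count / n for reason, count in Counter(c.stop_reason for c in calls).items()}
    length = mean(c.output_tokens for c in calls)
    failure_rate = sum(not c.parsed_ok for c in calls) / n
    return mix, length, failure_rate


def regression_flags(before: list[CallMeta], after: list[CallMeta]) -> list[str]:
    """Flag proxies that moved past illustrative (hypothetical) thresholds."""
    mix_b, len_b, fail_b = summarise(before)
    mix_a, len_a, fail_a = summarise(after)
    flags = []
    # Compare each stop_reason's share of calls, including reasons seen in only one period.
    for reason in sorted(set(mix_b) | set(mix_a)):
        shift = mix_a.get(reason, 0.0) - mix_b.get(reason, 0.0)
        if abs(shift) > 0.05:
            flags.append(f"stop_reason '{reason}' share moved by {shift:+.0%}")
    if abs(len_a - len_b) > 0.2 * len_b:
        flags.append("mean response length shifted by more than 20%")
    if fail_a - fail_b > 0.05:
        flags.append("parse-failure rate rose")
    return flags


if __name__ == "__main__":
    before = [CallMeta("end_turn", 300, True)] * 20
    after = [CallMeta("end_turn", 120, True)] * 15 + [CallMeta("max_tokens", 512, False)] * 5
    for flag in regression_flags(before, after):
        print(flag)
```

Flags like these only point at a possible regression. On changes with a trade-off, whether output quality actually held up still needs your own checks.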
Where InferOps stops
Honest about the boundaries.
We think it is important to be specific about what InferOps covers, what it does not cover, and how your data is handled. These are not small-print caveats. They are design decisions we made deliberately.
API token costs only.
InferOps covers what you pay Anthropic, OpenAI, and compatible providers per API call. It does not cover engineering salaries, cloud infrastructure, GPU costs, or tooling licences. When InferOps shows you a cost figure, it is your API token spend. Nothing else.
Your content stays local by default.
By default, your prompt and response text never leaves your environment. InferOps analyses patterns locally and transmits only structured findings.
You implement every change.
InferOps detects patterns and suggests actions. It never deploys changes to your stack, modifies your prompts, or makes API calls on your behalf. Every recommendation is implemented by you, in your own codebase, on your own timeline.
Verify before you trust.
Run inferops inspect before installing. It prints to your terminal every field the SDK would transmit, without sending anything. You can see exactly what InferOps captures before a single event leaves your environment.
You are always in control.
Every finding can be declined with a reason. If a recommendation does not apply to your feature, dismiss it. If a pattern is intentional, mark it as such. Dismissed findings do not reappear unless your underlying data changes meaningfully.
Coming next
What comes after detection.
The first release establishes the detection and confirmation loop. Once that is running, the next release adds tools that help you act on what InferOps finds. These tools run entirely in your environment, using your credentials.
Agentic waste detection
Find the recurring waste in agent workflows.
Multi-step agents generate recurring structural waste: history inflation, tool-call loops, and redundant reasoning. This is on the roadmap as the next detection layer beyond single-shot features.
Prompt optimisation
Evolve prompts to work on cheaper models.
When InferOps identifies a feature as a candidate for model right-sizing, you can trigger an optimisation run that evolves the prompt to perform equivalently on a less expensive model. The run happens on your servers, using your API key. InferOps scores the quality of the evolved prompt against your real production examples so you can decide whether to deploy.
Shadow model testing
Test a cheaper model before committing.
InferOps sends parallel calls to a cheaper candidate model alongside your production traffic and measures how closely its outputs match before you commit to any switch. You see the quality distribution from your own traffic, not a synthetic benchmark.
Enhanced analysis
More precise recommendations when you need them.
At sign-up, you can choose to enable enhanced analysis. This allows InferOps to analyse your prompt content directly, producing more specific findings. Content is stored encrypted and deleted after 30 days. You can revoke access at any time.
Ready to see what InferOps finds in your stack?
We are onboarding teams with active Anthropic or OpenAI spend.
Join the waitlist