Stop debugging your agents.

With Kalibr, your agents learn from past executions and route around failing paths autonomously.

Kalibr captures step-level telemetry on your agentic systems and exposes routing recommendations your agents query before each run.

$ pip install kalibr

Agents running Kalibr see 10× fewer failures.

View benchmark methodology →

Providers

OpenAI
Anthropic
Google

Frameworks

LangChain
CrewAI
OpenAI Agents SDK

How Kalibr fits into your agent

Wrap your LLM and tool calls with the Kalibr SDK. Every step records latency, cost, and success or failure.

Before the next step, your agent queries Kalibr with a goal (for example: goal="book_meeting"). Kalibr returns which models and tools are currently succeeding. Your agent routes to the optimal path.

That's it.
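In code, that loop looks roughly like this. The `StepLog` class below is a toy in-memory stand-in for illustration, not the Kalibr SDK; the real wrapper and its method names may differ.

```python
import time

# Toy stand-in for the record-every-step pattern -- NOT the Kalibr SDK.
# The class and method names here are illustrative assumptions.
class StepLog:
    def __init__(self):
        self.steps = []

    def record_step(self, name, fn, *args, **kwargs):
        """Run one LLM or tool call, logging latency and success/failure."""
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
            ok = True
        except Exception:
            result, ok = None, False
        self.steps.append({
            "step": name,
            "latency_s": time.perf_counter() - start,
            "success": ok,
        })
        return result

log = StepLog()
text = log.record_step("draft", lambda: "Hello from the draft model")
log.record_step("flaky_tool", lambda: 1 / 0)  # failure is captured, not raised
```

The point is the shape: every step, success or not, lands in the log, so the next run has outcomes to route on.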

What Kalibr Does

Trace

Auto-instrument every LLM call. Latency, tokens, cost, success/failure — logged automatically with zero code changes.

  • OpenAI, Anthropic, Google
  • LangChain, CrewAI, Agents SDK
  • Cross-vendor execution graphs
  • Step-level cost attribution

Learn

Kalibr aggregates outcomes by model and task type. It knows which models succeed at "book_meeting" vs "write_code" — with statistical confidence.

  • Wilson score confidence intervals
  • Pareto-optimal model selection
  • Cost-quality tradeoff analysis
  • Failure pattern detection
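For intuition: a Wilson score lower bound rewards evidence, not just a raw success rate, so a model with 90/100 successes outranks one with 9/10. A sketch at 95% confidence (the z = 1.96 setting is an assumption for illustration; Kalibr's exact parameters aren't stated here):

```python
import math

# Illustrative math only -- assumes a 95% interval (z = 1.96).
def wilson_lower(successes, trials, z=1.96):
    """Lower bound of the Wilson score interval for a success rate."""
    if trials == 0:
        return 0.0
    p = successes / trials
    denom = 1 + z * z / trials
    center = p + z * z / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return (center - margin) / denom

# Same 90% rate, but more evidence gives a tighter (higher) lower bound.
print(round(wilson_lower(9, 10), 3))    # ~0.596
print(round(wilson_lower(90, 100), 3))  # ~0.826
```

Ranking models by this lower bound instead of the raw rate is what keeps a lucky 2-for-2 model from beating a proven 180-for-200 one.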

Recommend

Your agent calls get_policy("book_meeting") before acting. Kalibr returns the model with the best success rate for that goal.

  • Runtime API for agents
  • Goal-based model routing
  • Confidence scores and risk
  • Outcome feedback loop
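A toy in-memory version of that query-then-report loop, under the simplifying assumption that aggregation is a plain success-rate table (the real scoring, per the list above, uses Wilson intervals and Pareto selection):

```python
from collections import defaultdict

# Minimal sketch of the recommend/feedback loop -- not Kalibr's actual
# implementation. get_policy mirrors the call shown on this page.
class PolicyStub:
    def __init__(self):
        # (goal, model) -> [successes, attempts]
        self.stats = defaultdict(lambda: [0, 0])

    def report_outcome(self, goal, model, success):
        """Feed one execution outcome back into the aggregate."""
        s = self.stats[(goal, model)]
        s[0] += int(success)
        s[1] += 1

    def get_policy(self, goal):
        """Return the model with the best observed success rate for a goal."""
        rates = {m: s[0] / s[1] for (g, m), s in self.stats.items() if g == goal}
        if not rates:
            return None
        best = max(rates, key=rates.get)
        return {"model": best, "success_rate": rates[best]}

policy = PolicyStub()
for ok in (True, True, True, False):
    policy.report_outcome("book_meeting", "model-a", ok)
for ok in (True, False, False, False):
    policy.report_outcome("book_meeting", "model-b", ok)

print(policy.get_policy("book_meeting"))  # model-a wins at a 0.75 success rate
```

The agent calls `get_policy` before acting and `report_outcome` after, so every run sharpens the next recommendation.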

Not just observability. Decision support for agents.

Most tools show you logs after something breaks. Kalibr feeds execution outcomes back into your agents so they can choose better paths before they fail.

Observability data transformed into runtime optimization signals

Example trace: content-generation workflow (active)

Research Agent (parent) · Claude Sonnet · 2.4s · $0.12 · 8.2K tokens
  └ web_search · 0.8s · $0.003 · success
Draft Agent (child) · GPT-4o-mini · 1.2s · $0.018 · 3.1K tokens
Polish Agent (child) · Claude Sonnet · 1.8s · $0.09 · 5.4K tokens

Totals: 6.2s · $0.231 · 16.7K tokens

Cost by agent (last 24h):
  Research Agent   $58.20 · 41% of total
  Polish Agent     $42.15 · 30% of total
  Draft Agent      $28.40 · 20% of total
  Quality Check    $13.60 · 9% of total
  Total (24h)     $142.35
Intelligence: Optimization Recommendations

High Impact · $34/day savings
Switch draft model: GPT-4o-mini → Claude Haiku maintains 94% quality at 62% lower cost.
  Current $0.018/exec → Recommended $0.007/exec

Medium Impact · $12/day savings
Optimize agent order: start with cheaper models; 73% of tasks complete without the expensive polish step.
  Current 4 steps → Optimized 2.7 avg

Low Impact · -1.2s avg latency
Cache search results: 89% of searches repeat within 2 hours, so caching cuts lookup latency.
  Current 0.8s → With cache 0.02s
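That caching recommendation is a pattern you can apply directly. A minimal TTL-cache sketch: the 2-hour window comes from the card above, while the search function and names here are hypothetical.

```python
import time

# Minimal TTL cache illustrating the "cache search results" idea.
# The 2-hour window and fake_search are assumptions for illustration.
class TTLCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]              # fresh hit: skip the expensive call
        value = compute()              # miss or expired: recompute and store
        self._store[key] = (now, value)
        return value

calls = []
def fake_search(q):
    calls.append(q)
    return f"results for {q}"

cache = TTLCache(ttl_seconds=2 * 60 * 60)  # 2-hour window, per the card above
cache.get_or_compute("agent frameworks", lambda: fake_search("agent frameworks"))
cache.get_or_compute("agent frameworks", lambda: fake_search("agent frameworks"))
print(len(calls))  # second lookup is served from cache, so only 1 real search
```

Injecting the clock keeps expiry testable without waiting two hours.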
Learning in action: the same Content Agent across three sessions

A · Monday · Session #1247: used GPT-4o for all drafts · Cost $0.18 · Quality 87%
B · Tuesday · Session #1312: dashboard surfaced the Haiku pattern, so it switched · Cost $0.07 · Quality 89%
C · Today · Session #1385: routing optimized from learned patterns · Cost $0.05 · Quality 91%
Patterns

Learn from every workflow

Production patterns automatically influence future routing and execution decisions.

Pattern detection across thousands of runs
Model performance insights by task type
Actionable optimization recommendations

Getting Started

1

Simple Integration

Connect Kalibr to your agent systems. Every workflow is automatically tracked - no refactoring required.

2

Complete Visibility

See entire agent executions with all models, tools, costs, and decisions - not just isolated API calls.

3

Query Runtime API

Agents query Kalibr's runtime API before executing steps to decide which models, tools, or paths to use.

4

Continuous Improvement

Use production patterns to identify cost savings and performance improvements.

Built for Multi-Agent Systems

Workflow-Level Tracking

See complete agent workflows with all models, tools, and decisions - not isolated API calls.

Cross-Model Visibility

One workflow uses OpenAI, Anthropic, and Google? We track it all in a single unified view.

Multi-Agent Coordination

Agents see what other agents did and learn from their decisions - without manual handoff code.

Pattern Analysis

Detect which model combinations work best, which routing strategies succeed, where failures occur.

Cost Attribution

Know what complete workflows cost by agent and business outcome - not just API spend.

Production Optimization

Execution data influences future routing: which models to use, where to cut costs, how to improve performance.

What changes when you use Kalibr

Agents fail less in production

Model drift stops breaking workflows

Cost spikes become visible before they matter

You stop retuning prompts and routing rules manually

Pricing

Start free. Scale as you grow.

Free

$0/month
10K traces · 1M tokens
  • Full tracing
  • Cost tracking
  • Basic analytics
Get started

Enterprise

Custom
Unlimited
  • Everything in Pro
  • SSO
  • SLA
  • On-prem available
Contact us

Need more to evaluate?

Request trial credits to test Kalibr with your production workloads.

Request trial credits

Let your agents avoid production failures.

Start free. See what's failing. Let your agents handle it.

Get started