Get Kalibr routing your model calls in 5 minutes. Four steps.
Using an agent? Pick your setup path:
OpenClaw / Hermes (zero human steps):
```bash
pip install kalibr
kalibr prompt --openclaw --email you@example.com
```
Paste the output into your agent. The agent creates your account, installs dependencies, configures the plugin, and verifies the setup. No human steps after that.
Claude Code / Cursor / Windsurf:
```
Read https://kalibr.systems/llms.txt and integrate Kalibr into this project.
```
Python:
```bash
pip install kalibr
```
TypeScript:
```bash
npm install @kalibr/sdk
```
You also need provider SDKs for the models you want to route between:
```bash
# Install whichever providers you'll use
pip install openai     # for gpt-4o, o1, etc.
pip install anthropic  # for claude-sonnet, etc.
```
```bash
kalibr auth
# Opens your browser. Sign in or create an account, enter the code shown in your terminal.
# KALIBR_API_KEY and KALIBR_TENANT_ID saved to .env automatically.
```
Get your Kalibr credentials from dashboard.kalibr.systems/settings, then set them alongside your provider keys:
```bash
export KALIBR_API_KEY=sk_...        # from your Kalibr dashboard
export KALIBR_TENANT_ID=your-tenant # from your Kalibr dashboard
export OPENAI_API_KEY=sk-...        # if using OpenAI models
export ANTHROPIC_API_KEY=sk-ant-... # if using Anthropic models
export DEEPSEEK_API_KEY=sk-...      # if using DeepSeek models (deepseek-chat, deepseek-reasoner)
export HF_API_TOKEN=hf_...          # if using HuggingFace models (private or rate-limit bypass)
```
You need API keys for each provider in your paths. Using gpt-4o? You need OPENAI_API_KEY. Using claude-sonnet-4-20250514? You need ANTHROPIC_API_KEY.
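Before wiring up the Router, it can help to fail fast on missing keys rather than hit an auth error mid-request. Here is a minimal preflight sketch; it is not part of the Kalibr SDK, and the path-to-key mapping covers only the providers named in this guide:

```python
import os

# Required in every setup; provider keys depend on which paths you route between.
REQUIRED = ["KALIBR_API_KEY", "KALIBR_TENANT_ID"]
PROVIDER_KEY = {
    "gpt-4o": "OPENAI_API_KEY",
    "claude-sonnet-4-20250514": "ANTHROPIC_API_KEY",
    "deepseek-chat": "DEEPSEEK_API_KEY",
}

def missing_keys(paths, env=None):
    """Return the names of required env vars that are unset or empty."""
    env = os.environ if env is None else env
    needed = REQUIRED + [PROVIDER_KEY[p] for p in paths if p in PROVIDER_KEY]
    return [k for k in needed if not env.get(k)]

print(missing_keys(["gpt-4o"], env={"KALIBR_API_KEY": "sk_x"}))
# ['KALIBR_TENANT_ID', 'OPENAI_API_KEY']
```

Call `missing_keys(paths)` at startup and exit with a clear message if it returns anything.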
Before, hardcoded to one model (Python):

```python
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Extract the company: Hi from Stripe"}]
)
print(response.choices[0].message.content)
```

After, Kalibr picks the best model and learns from outcomes (Python):
```python
from kalibr import Router

router = Router(
    goal="extract_company",
    paths=["gpt-4o", "claude-sonnet-4-20250514"],
    success_when=lambda output: len(output) > 0
)
response = router.completion(
    messages=[{"role": "user", "content": "Extract the company: Hi from Stripe"}]
)
print(response.choices[0].message.content)
# That's it. Kalibr picked the model, made the call, and reported the outcome.
```

Before, hardcoded to one model (TypeScript):
```typescript
import OpenAI from 'openai';

const client = new OpenAI();
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Extract the company: Hi from Stripe' }],
});
```

After, Kalibr picks the best model and learns from outcomes (TypeScript):
```typescript
import { Router } from '@kalibr/sdk';

const router = new Router({
  goal: 'extract_company',
  paths: ['gpt-4o', 'claude-sonnet-4-20250514'],
  successWhen: (output) => output.length > 0,
});
const response = await router.completion([
  { role: 'user', content: 'Extract the company: Hi from Stripe' }
]);
console.log(response.choices[0].message.content);
// That's it. Kalibr picked the model, made the call, and reported the outcome.
```

What changed: you swapped three lines. `Router` instead of `OpenAI`, and `router.completion()` instead of `client.chat.completions.create()`. The response object is the same: `response.choices[0].message.content` works exactly like before.
`success_when` tells Kalibr how to auto-evaluate each response. For simple checks (non-empty, contains "@", valid JSON), this is all you need. For complex validation, skip `success_when` and call `router.report()` manually (see step 4).
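A plain function works anywhere a lambda does, which keeps checks readable and testable. The helper below is a hypothetical example (not part of the Kalibr SDK) implementing the "valid JSON" check; you would pass it as `success_when=is_valid_json`:

```python
import json

def is_valid_json(output: str) -> bool:
    """success_when-style check: does the model output parse as JSON?"""
    try:
        json.loads(output)
        return True
    except (ValueError, TypeError):
        return False

print(is_valid_json('{"company": "Stripe"}'))  # True
print(is_valid_json('Hi from Stripe'))         # False
```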
Want finer-grained quality signals? Add `score_when` for continuous scoring (0.0-1.0). This lets Kalibr distinguish "barely passed" from "excellent" and route toward higher quality, not just success:
```python
router = Router(
    goal="extract_company",
    paths=["gpt-4o", "claude-sonnet-4-20250514"],
    success_when=lambda output: len(output) > 0,
    score_when=lambda output: min(1.0, len(output) / 500),  # quality score 0-1
)
```

Even without `success_when` and `score_when`, Kalibr still auto-scores every completion using built-in heuristics (response length, structure, finish reason). You get routing intelligence from day one with zero evaluation code. Add `success_when` or `score_when` when you want custom quality signals.

Go to dashboard.kalibr.systems. You should see your goal (`extract_company`) registered.

After 20-50 outcomes per path, routing stabilizes and Kalibr will favor the model that works best for your goal. Early on, it explores both paths to gather data.
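The explore-then-exploit behavior can be pictured with a toy epsilon-greedy sketch. This illustrates the concept only; it is not Kalibr's actual routing algorithm:

```python
import random

def pick_path(stats, epsilon=0.2):
    """Toy router: explore untried paths, then mostly exploit the best one."""
    paths = list(stats)
    untried = [p for p in paths if stats[p]["calls"] == 0]
    if untried:
        return untried[0]            # explore: no data yet for this path
    if random.random() < epsilon:
        return random.choice(paths)  # keep exploring occasionally
    # exploit: highest observed success rate
    return max(paths, key=lambda p: stats[p]["successes"] / stats[p]["calls"])

stats = {
    "gpt-4o":                   {"calls": 40, "successes": 36},
    "claude-sonnet-4-20250514": {"calls": 40, "successes": 30},
}
print(pick_path(stats, epsilon=0.0))  # exploration off: picks the higher success rate
```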
That's it. You're routing.
Every call now goes through Kalibr. When a provider degrades, Kalibr reroutes to the one that's working before your users notice.
If your success criterion is too complex for a lambda (it needs API calls, multi-step checks, or human review), skip `success_when` and call `report()` yourself:
Python:
```python
router = Router(
    goal="book_meeting",
    paths=["gpt-4o", "claude-sonnet-4-20250514"]
    # no success_when, we'll report manually
)
response = router.completion(messages=[...])
result = response.choices[0].message.content

# ... your validation logic ...
meeting_booked = check_calendar(result)
if meeting_booked:
    router.report(success=True, score=0.9)
else:
    router.report(success=False, score=0.1, reason="meeting not found in calendar")
```

TypeScript:
```typescript
const router = new Router({
  goal: 'book_meeting',
  paths: ['gpt-4o', 'claude-sonnet-4-20250514']
  // no successWhen, we'll report manually
});
const response = await router.completion(messages);
const result = response.choices[0].message.content;

// ... your validation logic ...
const meetingBooked = await checkCalendar(result);
if (meetingBooked) {
  await router.report(true);
} else {
  await router.report(false, 'meeting not found in calendar');
}
```

Kalibr integrates with LangChain, CrewAI, and OpenAI Agents SDK. See Framework Integrations for setup instructions.
Or if you just want tracing without routing, add one line to the top of your entry point:
```python
import kalibr  # must be the first import; auto-patches OpenAI, Anthropic, Google SDKs
```
All LLM calls are now traced automatically to your dashboard. No other code changes needed.
When you called `completion()`, Kalibr picked a model (exploring early on, exploiting the best one later). When you reported the outcome via `report()` (or via `success_when` / `score_when`), Kalibr recorded it.

Router instances are not thread-safe. Create one Router instance per request context, not one shared instance per process.
In async Python applications, create the Router inside your async handler, not at module level:
```python
# Do this
async def handle_request(messages):
    router = Router(goal="my_goal", paths=["gpt-4o-mini", "claude-haiku"])
    return await router.completion(messages)

# Not this (shared instance, not safe in concurrent use)
router = Router(goal="my_goal", paths=["gpt-4o-mini"])  # module level

async def handle_request(messages):
    return await router.completion(messages)
```

Router creation is cheap. The path registration call is async and non-blocking.
Quick checklist:

- Install the provider SDK for every model in your paths (e.g. `pip install anthropic`)
- Report outcomes, either via `success_when` or a manual `report()` call
- `KALIBR_API_KEY` and `KALIBR_TENANT_ID` are required