Get Kalibr routing your LLM calls in 5 minutes. Four steps.
Python:
pip install kalibr
TypeScript:
npm install @kalibr/sdk
You also need the provider SDKs for whichever models you want to route between:
# Install whichever providers you'll use
pip install openai     # for gpt-4o, o1, etc.
pip install anthropic  # for claude-sonnet, etc.
kalibr auth
# Opens your browser. Sign in or create an account, enter the code shown in your terminal.
# KALIBR_API_KEY and KALIBR_TENANT_ID saved to .env automatically.
If you prefer to set credentials manually, get your Kalibr credentials from dashboard.kalibr.systems/settings, then set them alongside your provider keys:
export KALIBR_API_KEY=sk_...         # from your Kalibr dashboard
export KALIBR_TENANT_ID=your-tenant  # from your Kalibr dashboard
export OPENAI_API_KEY=sk-...         # if using OpenAI models
export ANTHROPIC_API_KEY=sk-ant-...  # if using Anthropic models
export DEEPSEEK_API_KEY=sk-...       # if using DeepSeek models
You need API keys for each provider in your paths. Using gpt-4o? You need OPENAI_API_KEY. Using claude-sonnet-4-20250514? You need ANTHROPIC_API_KEY.
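A small preflight check can catch a missing key before the first call. The sketch below is not part of Kalibr: the helper name `missing_keys` and the prefix-to-variable mapping are assumptions for illustration, built only from the model names and environment variables listed above.

```python
import os

# Hypothetical mapping from a model-name prefix to the env var its provider needs.
PROVIDER_KEYS = {
    "gpt-": "OPENAI_API_KEY",
    "o1": "OPENAI_API_KEY",
    "claude-": "ANTHROPIC_API_KEY",
    "deepseek-": "DEEPSEEK_API_KEY",
}


def missing_keys(paths, env=None):
    """Return the sorted env var names that the given paths need but env lacks."""
    env = os.environ if env is None else env
    # The Kalibr credentials are always required, whatever the paths are.
    needed = {"KALIBR_API_KEY", "KALIBR_TENANT_ID"}
    for path in paths:
        for prefix, var in PROVIDER_KEYS.items():
            if path.startswith(prefix):
                needed.add(var)
    return sorted(var for var in needed if not env.get(var))
```

Calling `missing_keys(["gpt-4o", "claude-sonnet-4-20250514"])` before constructing a router returns an empty list when everything is set, and the names of the missing variables otherwise.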
Before (hardcoded to one model):
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize: ..."}]
)
print(response.choices[0].message.content)
After (Kalibr picks the best model and learns from outcomes):
import kalibr # must be first import
from kalibr import Router
router = Router(
    goal="summarize",
    paths=["gpt-4o", "claude-sonnet-4-20250514", "deepseek-chat"],
    success_when=lambda out: len(out) > 50
)
response = router.completion(
    messages=[{"role": "user", "content": "Summarize: ..."}]
)
print(response.choices[0].message.content)
# router.report() is called automatically when success_when is set
The same thing in TypeScript:
import kalibr from "@kalibr/sdk";
import { Router } from "@kalibr/sdk";
const router = new Router({
  goal: "summarize",
  paths: ["gpt-4o", "claude-sonnet-4-20250514"],
  successWhen: (out) => out.length > 50,
});
const response = await router.completion([
  { role: "user", content: "Summarize: ..." }
]);
console.log(response.choices[0].message.content);
What changed: you replaced OpenAI with Router, and client.chat.completions.create() with router.completion(). The response object is the same. response.choices[0].message.content works exactly like before.
success_when tells Kalibr how to evaluate each response automatically. For simple checks (non-empty, contains a keyword, valid JSON), this is all you need. For complex validation, skip success_when and call router.report() manually instead.
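The simple checks mentioned above can be written as small named functions instead of inline lambdas. This is a minimal sketch, assuming (as in the example above) that the callable receives the response text as a string; the function names are illustrative and not part of Kalibr.

```python
import json


def non_empty(out: str) -> bool:
    """Pass when the output contains something other than whitespace."""
    return bool(out.strip())


def contains_keyword(keyword: str):
    """Build a check that passes when the keyword appears, ignoring case."""
    def check(out: str) -> bool:
        return keyword.lower() in out.lower()
    return check


def is_valid_json(out: str) -> bool:
    """Pass when the output parses as JSON."""
    try:
        json.loads(out)
        return True
    except (json.JSONDecodeError, TypeError):
        return False
```

Any of these could then be passed in place of the lambda, for example `success_when=is_valid_json` or `success_when=contains_keyword("summary")`.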
Use score_when for a 0.0 to 1.0 quality signal instead of binary pass/fail. This lets Kalibr distinguish "barely passed" from "excellent" and route toward higher quality.
score_when=lambda out: min(1.0, len(out) / 800)
Go to dashboard.kalibr.systems. You should see:
Your goal (summarize) registered
After 20 to 50 outcomes per path, routing stabilizes and Kalibr favors the model that works best for your goal. Early on, it explores all paths to gather data.
That's it. You're routing.
Every call now goes through Kalibr. When a model degrades, Kalibr reroutes to one that's working, before your users notice.
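To build intuition for why routing explores early and then settles on the best path, here is a toy simulation using Thompson sampling, a standard bandit technique. It is an illustration only, not Kalibr's actual routing algorithm; the success rates passed to `simulate` are made up, and every name in the block is hypothetical.

```python
import random


class PathStats:
    """Running success/failure counts for one path."""

    def __init__(self):
        self.successes = 0
        self.failures = 0


def pick_path(stats, rng):
    """Sample a plausible success rate per path and pick the highest draw."""
    draws = {
        path: rng.betavariate(s.successes + 1, s.failures + 1)
        for path, s in stats.items()
    }
    return max(draws, key=draws.get)


def simulate(true_rates, rounds=600, seed=0):
    """Route `rounds` simulated calls and return how often each path was picked."""
    rng = random.Random(seed)
    stats = {path: PathStats() for path in true_rates}
    picks = {path: 0 for path in true_rates}
    for _ in range(rounds):
        path = pick_path(stats, rng)
        picks[path] += 1
        # Simulated outcome: succeed with the path's (made-up) true rate.
        if rng.random() < true_rates[path]:
            stats[path].successes += 1
        else:
            stats[path].failures += 1
    return picks
```

With no data, every path has the same chance of being drawn, so all of them get tried. As outcomes accumulate, the draws for weaker paths rarely win and most traffic goes to the strongest one.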
If your success criteria are too complex for a lambda (they need API calls, multi-step validation, or human review), skip success_when and call report() yourself:
Python:
router = Router(
    goal="book_meeting",
    paths=["gpt-4o", "claude-sonnet-4-20250514"]
)
response = router.completion(messages=[...])
result = response.choices[0].message.content
# your validation logic
meeting_booked = check_calendar(result)
router.report(success=meeting_booked)
TypeScript:
const router = new Router({
  goal: "book_meeting",
  paths: ["gpt-4o", "claude-sonnet-4-20250514"],
});
const response = await router.completion([
  { role: "user", content: "Book a meeting with..." }
]);
const result = response.choices[0].message.content;
const meetingBooked = await checkCalendar(result);
await router.report(meetingBooked);
Pass healing=True to let Kalibr automatically recover from a failed call. If the response fails the success contract, Kalibr classifies the failure, repairs the meta prompt, or swaps to an alternative model, then retries, all in one call.
from kalibr import Router
router = Router(
    goal="summarize",
    paths=["gpt-4o", "claude-sonnet-4-20250514"],
    success_when=lambda out: len(out) > 50,
)
response = router.completion(
    messages=[{"role": "user", "content": "Summarize: ..."}],
    healing=True,
)
For finer control over retry behavior, pass a HealConfig:
from kalibr import Router, HealConfig
config = HealConfig(
    max_retries=2,
    gate2_enabled=True,
    meta_prompt_enabled=True,
)
response = router.completion(
    messages=[{"role": "user", "content": "Summarize: ..."}],
    healing=True,
    heal_config=config,
)
max_retries: how many heal attempts before giving up (default 2)
gate2_enabled: run an LLM-judge quality gate in addition to the structural gate
meta_prompt_enabled: let Kalibr repair the meta prompt before swapping models
Use router.pipeline() to run an end-to-end workflow where each step routes, evals, and heals on its own. Set "chain": True on any step to feed the previous step's output into it.
result = router.pipeline(
    [
        {"goal": "research", "messages": [...]},
        {"goal": "outreach_generation", "messages": [...], "chain": True},
    ],
    healing=True,
    pipeline_id="my-pipeline",
)
Every step runs the full self-healing loop independently. If one step fails irrecoverably, the pipeline returns the partial result with the failure attached so you can decide what to do next.
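The control flow described above (chain outputs forward, retry within a step, return a partial result on failure) can be sketched in plain Python. This is a simplified model under stated assumptions, not Kalibr's implementation: "healing" here only swaps to the next model, with no failure classification or meta prompt repair, and `run_step`, `run_pipeline`, and the step dictionary keys other than "goal" and "chain" are hypothetical.

```python
def run_step(models, prompt, success_when, max_retries=2):
    """Try models in order until one passes the check or retries run out."""
    attempts = 0
    last_output = None
    for model in models:
        if attempts > max_retries:  # one initial attempt plus max_retries
            break
        attempts += 1
        last_output = model(prompt)
        if success_when(last_output):
            return {"ok": True, "output": last_output, "attempts": attempts}
    return {"ok": False, "output": last_output, "attempts": attempts}


def run_pipeline(steps):
    """Run steps in order and stop at the first unrecoverable failure."""
    results = []
    previous = None
    for step in steps:
        prompt = step["prompt"]
        # A chained step receives the previous step's output with its prompt.
        if step.get("chain") and previous is not None:
            prompt = f"{prompt}\n\n{previous}"
        result = run_step(step["models"], prompt, step["success_when"])
        results.append({"goal": step["goal"], **result})
        if not result["ok"]:
            # Partial result: completed steps plus the failure that stopped the run.
            return {"completed": False, "steps": results, "failed_goal": step["goal"]}
        previous = result["output"]
    return {"completed": True, "steps": results, "failed_goal": None}
```

Here each entry in `models` is any callable that maps a prompt string to an output string, which makes the flow easy to exercise with stubs before wiring in real calls.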
Passing pipeline_id scopes outcome learning to that pipeline. Two agents that share a goal but live in different pipelines won't bleed routing signals into each other. This is useful when one pipeline runs on production data and another runs on tests, or when separate teams share goals but want isolated bandits.
router.completion(
    messages=[...],
    healing=True,
    pipeline_id="sales-outreach-prod",
)
Use the same pipeline_id across the calls that should share a learning context, and a different one for anything you want kept separate.
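One way to picture this scoping is an outcome store keyed by pipeline as well as goal and path. The class below is a conceptual sketch only; `OutcomeStore` and its methods are hypothetical and say nothing about how Kalibr stores outcomes internally.

```python
from collections import defaultdict


class OutcomeStore:
    """Keeps success/failure counts keyed by (pipeline_id, goal, path)."""

    def __init__(self):
        self._counts = defaultdict(lambda: [0, 0])  # [successes, failures]

    def record(self, pipeline_id, goal, path, success):
        """Add one outcome to the counts for this pipeline, goal, and path."""
        self._counts[(pipeline_id, goal, path)][0 if success else 1] += 1

    def success_rate(self, pipeline_id, goal, path):
        """Return the observed success rate, or None when there is no data."""
        successes, failures = self._counts.get((pipeline_id, goal, path), (0, 0))
        total = successes + failures
        return successes / total if total else None
```

Because the pipeline ID is part of the key, a failure recorded under a test pipeline never changes the rate seen by the production pipeline, even when both use the same goal and model.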
When you called completion(), Kalibr picked a model. Early on it explores all paths. Later it exploits the best one.
When you called report() (or via success_when / score_when), Kalibr recorded the outcome.
Router instances are not thread-safe. Create one Router per request context, not one shared instance per process.
In async Python, create the Router inside your handler:
# Correct: one router per request
async def handle_request(messages):
    router = Router(goal="my_goal", paths=["gpt-4o-mini", "deepseek-chat"])
    return await router.completion(messages)
# Wrong: shared instance across concurrent requests
router = Router(goal="my_goal", paths=["gpt-4o-mini"])
async def handle_request(messages):
    return await router.completion(messages)
Router creation is cheap. The path registration call is async and non-blocking.
Install the missing provider SDK, for example pip install anthropic.
Set success_when or call report().
KALIBR_API_KEY and KALIBR_TENANT_ID are required.