Quickstart

Get Kalibr routing your LLM calls in 5 minutes. Four steps.

1. Install

Python:

shell
pip install kalibr

TypeScript:

shell
npm install @kalibr/sdk

You also need the provider SDKs for whichever models you want to route between:

shell
# Install whichever providers you'll use
pip install openai        # for gpt-4o, o1, etc.
pip install anthropic     # for claude-sonnet, etc.

2. Set your keys

Option A: Link via terminal (recommended)

shell
kalibr auth
# Opens your browser. Sign in or create an account, enter the code shown in your terminal.
# KALIBR_API_KEY and KALIBR_TENANT_ID saved to .env automatically.

Option B: Manual setup

Get your Kalibr credentials from dashboard.kalibr.systems/settings, then set them alongside your provider keys:

shell
export KALIBR_API_KEY=sk_...           # from your Kalibr dashboard
export KALIBR_TENANT_ID=your-tenant    # from your Kalibr dashboard
export OPENAI_API_KEY=sk-...           # if using OpenAI models
export ANTHROPIC_API_KEY=sk-ant-...    # if using Anthropic models
export DEEPSEEK_API_KEY=sk-...         # if using DeepSeek models

You need API keys for each provider in your paths. Using gpt-4o? You need OPENAI_API_KEY. Using claude-sonnet-4-20250514? You need ANTHROPIC_API_KEY.
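A quick preflight check can catch a missing key before the first routed call. The sketch below is a standalone helper, not part of the Kalibr SDK; `missing_keys` and the prefix-to-variable mapping in `PROVIDER_KEYS` are assumptions inferred from the model names and environment variables used in this guide.

```python
import os

# Assumed mapping from model-name prefix to the env var its provider needs.
# Inferred from the examples in this guide; extend it for other providers.
PROVIDER_KEYS = {
    "gpt-": "OPENAI_API_KEY",
    "o1": "OPENAI_API_KEY",
    "claude-": "ANTHROPIC_API_KEY",
    "deepseek-": "DEEPSEEK_API_KEY",
}


def missing_keys(paths, env=None):
    """Return the sorted env var names that the given paths need but env lacks."""
    env = os.environ if env is None else env
    # The Kalibr credentials are always required.
    needed = {"KALIBR_API_KEY", "KALIBR_TENANT_ID"}
    for path in paths:
        for prefix, var in PROVIDER_KEYS.items():
            if path.startswith(prefix):
                needed.add(var)
    return sorted(var for var in needed if not env.get(var))
```

Calling `missing_keys(["gpt-4o", "claude-sonnet-4-20250514"])` before you build a Router returns an empty list when everything is set.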

3. Replace your LLM call

Before (hardcoded to one model):

python
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize: ..."}]
)
print(response.choices[0].message.content)

After (Kalibr picks the best model and learns from outcomes):

python
import kalibr  # must be first import
from kalibr import Router

router = Router(
    goal="summarize",
    paths=["gpt-4o", "claude-sonnet-4-20250514", "deepseek-chat"],
    success_when=lambda out: len(out) > 50
)
response = router.completion(
    messages=[{"role": "user", "content": "Summarize: ..."}]
)
print(response.choices[0].message.content)
# router.report() is called automatically when success_when is set

The same thing in TypeScript:

typescript
import kalibr from "@kalibr/sdk";
import { Router } from "@kalibr/sdk";

const router = new Router({
  goal: "summarize",
  paths: ["gpt-4o", "claude-sonnet-4-20250514"],
  successWhen: (out) => out.length > 50,
});
const response = await router.completion([
  { role: "user", content: "Summarize: ..." }
]);
console.log(response.choices[0].message.content);

What changed: you replaced OpenAI with Router, and client.chat.completions.create() with router.completion(). The response object is the same. response.choices[0].message.content works exactly like before.

success_when tells Kalibr how to evaluate each response automatically. For simple checks (non-empty, contains a keyword, valid JSON), this is all you need. For complex validation, skip success_when and call router.report() manually instead.
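The simple checks mentioned above can be written as small named functions instead of lambdas. This is a sketch that assumes, as the `len(out)` lambda suggests, that the callable receives the response text as a string; `is_valid_json` and `contains_all` are illustrative names, not SDK functions.

```python
import json


def is_valid_json(out):
    """Pass when the output parses as JSON."""
    try:
        json.loads(out)
    except (TypeError, ValueError):  # JSONDecodeError subclasses ValueError
        return False
    return True


def contains_all(*keywords):
    """Build a check that passes when every keyword appears (case-insensitive)."""
    lowered = [k.lower() for k in keywords]

    def check(out):
        text = out.lower()
        return all(k in text for k in lowered)

    return check
```

Under that assumption, either one can be passed directly, for example `success_when=is_valid_json` or `success_when=contains_all("revenue", "Q3")`.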

Continuous scoring: Use score_when for a 0.0 to 1.0 quality signal instead of binary pass/fail. This lets Kalibr distinguish "barely passed" from "excellent" and route toward higher quality.

python
score_when=lambda out: min(1.0, len(out) / 800)
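Length is only a rough proxy for quality. As one alternative, the sketch below scores an output by the fraction of required terms it mentions. It assumes the score_when callable receives the response text as a string, like the lambda above; `coverage_score` is a hypothetical helper, not part of the SDK.

```python
def coverage_score(required_terms):
    """Build a scorer returning the fraction of required terms found, in [0.0, 1.0]."""
    terms = [t.lower() for t in required_terms]

    def score(out):
        if not terms:
            return 0.0
        text = out.lower()
        hits = sum(1 for t in terms if t in text)
        return hits / len(terms)

    return score
```

For example, `coverage_score(["price", "date", "location"])` gives an output that mentions two of the three terms a score of about 0.67.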

4. Check your dashboard

Go to dashboard.kalibr.systems.

After 20 to 50 outcomes per path, routing stabilizes and Kalibr favors the model that works best for your goal. Early on, it explores all paths to gather data.
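To make the explore-then-favor behavior concrete, here is a deliberately simple explore-then-exploit selector. It is a generic illustration, not Kalibr's routing algorithm, which this guide does not specify; the class name `ExploreThenExploit` and the `min_outcomes` threshold are assumptions.

```python
from collections import defaultdict


class ExploreThenExploit:
    """Try every path until each has min_outcomes results, then favor the best."""

    def __init__(self, paths, min_outcomes=20):
        self.paths = list(paths)
        self.min_outcomes = min_outcomes
        self.successes = defaultdict(int)
        self.trials = defaultdict(int)

    def choose(self):
        # Exploration: the least-tried path goes first until it has enough data.
        least = min(self.paths, key=lambda p: self.trials[p])
        if self.trials[least] < self.min_outcomes:
            return least
        # Exploitation: pick the path with the best observed success rate.
        return max(self.paths, key=lambda p: self.successes[p] / self.trials[p])

    def report(self, path, success):
        self.trials[path] += 1
        self.successes[path] += int(success)
```

A production router would usually keep exploring a little after this point so that it can notice when a model degrades; this sketch omits that for brevity.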

That's it. You're routing.

Every call now goes through Kalibr. When a model degrades, Kalibr reroutes to one that's working, before your users notice.

Manual outcome reporting

If your success criteria are too complex for a lambda (they need API calls, multi-step validation, or human review), skip success_when and call report() yourself:

Python:

python
from kalibr import Router

router = Router(
    goal="book_meeting",
    paths=["gpt-4o", "claude-sonnet-4-20250514"]
)
response = router.completion(messages=[...])
result = response.choices[0].message.content
# Your validation logic: check_calendar is your own function, not part of Kalibr
meeting_booked = check_calendar(result)
router.report(success=meeting_booked)

TypeScript:

typescript
import { Router } from "@kalibr/sdk";

const router = new Router({
  goal: "book_meeting",
  paths: ["gpt-4o", "claude-sonnet-4-20250514"],
});
const response = await router.completion([
  { role: "user", content: "Book a meeting with..." }
]);
const result = response.choices[0].message.content;
const meetingBooked = await checkCalendar(result);
await router.report(meetingBooked);

Auto-healing

Pass healing=True to let Kalibr recover from a failed call automatically. If the response fails the success contract, Kalibr classifies the failure, then either repairs the meta prompt or swaps to an alternative model, and retries, all within the same call.

python
from kalibr import Router

router = Router(
    goal="summarize",
    paths=["gpt-4o", "claude-sonnet-4-20250514"],
    success_when=lambda out: len(out) > 50,
)

response = router.completion(
    messages=[{"role": "user", "content": "Summarize: ..."}],
    healing=True,
)

For finer control over retry behavior, pass a HealConfig:

python
from kalibr import Router, HealConfig

config = HealConfig(
    max_retries=2,
    gate2_enabled=True,
    meta_prompt_enabled=True,
)

response = router.completion(
    messages=[{"role": "user", "content": "Summarize: ..."}],
    healing=True,
    heal_config=config,
)

Multi-step pipelines

Use router.pipeline() to run an end-to-end workflow where each step routes, evaluates, and heals on its own. Set "chain": True on any step to feed the previous step's output into it.

python
result = router.pipeline(
    [
        {"goal": "research", "messages": [...]},
        {"goal": "outreach_generation", "messages": [...], "chain": True},
    ],
    healing=True,
    pipeline_id="my-pipeline",
)

Every step runs the full self-healing loop independently. If one step fails irrecoverably, the pipeline returns the partial result with the failure attached so you can decide what to do next.

Pipeline isolation with pipeline_id

Passing pipeline_id scopes outcome learning to that pipeline. Two agents that share a goal but live in different pipelines won't bleed routing signals into each other. This is useful when one pipeline runs on production data and another runs on tests, or when separate teams share goals but want isolated bandits.

python
router.completion(
    messages=[...],
    healing=True,
    pipeline_id="sales-outreach-prod",
)

Use the same pipeline_id across the calls that should share a learning context, and a different one for anything you want kept separate.
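Conceptually, isolation means outcome statistics are keyed by pipeline as well as by goal and path. The sketch below shows that idea with an in-memory store. It is an illustration only, not Kalibr's internal data model, and `OutcomeStore` is a hypothetical name.

```python
from collections import defaultdict


class OutcomeStore:
    """Track success rates separately for each (pipeline_id, goal, path)."""

    def __init__(self):
        # (pipeline_id, goal, path) -> [successes, trials]
        self._stats = defaultdict(lambda: [0, 0])

    def record(self, pipeline_id, goal, path, success):
        entry = self._stats[(pipeline_id, goal, path)]
        entry[0] += int(success)
        entry[1] += 1

    def success_rate(self, pipeline_id, goal, path):
        # .get avoids creating an entry for keys that were never recorded.
        successes, trials = self._stats.get((pipeline_id, goal, path), (0, 0))
        return successes / trials if trials else None
```

Because the key includes the pipeline, failures recorded under a test pipeline never lower the success rate seen by the production pipeline for the same goal and model.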

Thread safety

Router instances are not thread-safe. Create one Router per request context, not one shared instance per process.

In async Python, create the Router inside your handler:

python
from kalibr import Router

# Correct: one router per request
async def handle_request(messages):
    router = Router(goal="my_goal", paths=["gpt-4o-mini", "deepseek-chat"])
    return await router.completion(messages=messages)

# Wrong: one shared instance across concurrent requests
shared_router = Router(goal="my_goal", paths=["gpt-4o-mini"])

async def handle_request_shared(messages):
    return await shared_router.completion(messages=messages)

Router creation is cheap. The path registration call is async and non-blocking.
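To see why a shared instance is risky, consider a stand-in object that, like any router whose report() takes no call identifier, remembers only its most recent call. `StubRouter` below is entirely hypothetical and does not reflect Kalibr's internals; it shows one plausible way that outcomes get attributed to the wrong call when requests overlap.

```python
import asyncio


class StubRouter:
    """Hypothetical stand-in: keeps only the most recent call as state."""

    def __init__(self):
        self.last_call = None
        self.reported = []

    async def completion(self, call_id, delay):
        self.last_call = call_id
        await asyncio.sleep(delay)  # simulate waiting on the model
        return f"response for {call_id}"

    def report(self, success):
        # The outcome is attached to whichever call ran most recently.
        self.reported.append((self.last_call, success))


async def handle(router, call_id, delay, success):
    await router.completion(call_id, delay)
    router.report(success)


async def shared():
    router = StubRouter()  # one instance for both requests
    await asyncio.gather(
        handle(router, "A", 0.02, True),
        handle(router, "B", 0.01, False),
    )
    return router.reported


async def per_request():
    async def one(call_id, delay, success):
        router = StubRouter()  # fresh instance per request
        await handle(router, call_id, delay, success)
        return router.reported[0]

    return await asyncio.gather(one("A", 0.02, True), one("B", 0.01, False))
```

With the shared instance, request A's success is recorded against call B because B overwrote the state while A was waiting. With one instance per request, each outcome lands on its own call.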
