This page has moved. The developer quickstart is now at Path 1: Developer integration.

Quickstart

Get Kalibr routing your model calls in 5 minutes. Four steps.


Using an agent? Pick your setup path:

OpenClaw / Hermes (zero human steps):

terminal
pip install kalibr
kalibr prompt --openclaw --email you@example.com

Paste the output into your agent. It creates your account, installs dependencies, configures the plugin, and verifies the setup, with no human steps after that.

Claude Code / Cursor / Windsurf:

prompt
Read https://kalibr.systems/llms.txt and integrate Kalibr into this project.

1. Install

Python:

shell
pip install kalibr

TypeScript:

shell
npm install @kalibr/sdk

You also need provider SDKs for the models you want to route between:

shell
# Install whichever providers you'll use
pip install openai        # for gpt-4o, o1, etc.
pip install anthropic     # for claude-sonnet, etc.

2. Set your keys

Option A, Link via terminal (recommended)

shell
kalibr auth
# Opens your browser. Sign in or create an account, enter the code shown in your terminal.
# KALIBR_API_KEY and KALIBR_TENANT_ID saved to .env automatically.

Option B, Manual setup

Get your Kalibr credentials from dashboard.kalibr.systems/settings, then set them alongside your provider keys:

shell
export KALIBR_API_KEY=sk_...           # from your Kalibr dashboard
export KALIBR_TENANT_ID=your-tenant    # from your Kalibr dashboard
export OPENAI_API_KEY=sk-...           # if using OpenAI models
export ANTHROPIC_API_KEY=sk-ant-...    # if using Anthropic models
export DEEPSEEK_API_KEY=sk-...         # if using DeepSeek models (deepseek-chat, deepseek-reasoner)
export HF_API_TOKEN=hf_...             # if using HuggingFace models (private or rate-limit bypass)

You need an API key for each provider in your paths. Using gpt-4o? You need OPENAI_API_KEY. Using claude-sonnet-4-20250514? You need ANTHROPIC_API_KEY.
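That path-to-key mapping can be checked up front. A minimal preflight sketch (the REQUIRED_KEYS table and missing_keys helper are illustrative, not part of the Kalibr SDK):

```python
import os

# Map each model path to the provider env var it requires (illustrative).
REQUIRED_KEYS = {
    "gpt-4o": "OPENAI_API_KEY",
    "claude-sonnet-4-20250514": "ANTHROPIC_API_KEY",
    "deepseek-chat": "DEEPSEEK_API_KEY",
}

def missing_keys(paths):
    """Return the env vars required by `paths` that are not set."""
    return [
        REQUIRED_KEYS[p]
        for p in paths
        if p in REQUIRED_KEYS and not os.environ.get(REQUIRED_KEYS[p])
    ]
```

Run it against your paths before the first call and fail fast if anything is missing.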

3. Replace your LLM call

Before, hardcoded to one model (Python):

python
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Extract the company: Hi from Stripe"}]
)
print(response.choices[0].message.content)

After, Kalibr picks the best model and learns from outcomes (Python):

python
from kalibr import Router
router = Router(
    goal="extract_company",
    paths=["gpt-4o", "claude-sonnet-4-20250514"],
    success_when=lambda output: len(output) > 0
)
response = router.completion(
    messages=[{"role": "user", "content": "Extract the company: Hi from Stripe"}]
)
print(response.choices[0].message.content)
# That's it. Kalibr picked the model, made the call, and reported the outcome.

Before, hardcoded to one model (TypeScript):

typescript
import OpenAI from 'openai';
const client = new OpenAI();
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Extract the company: Hi from Stripe' }],
});

After, Kalibr picks the best model and learns from outcomes (TypeScript):

typescript
import { Router } from '@kalibr/sdk';
const router = new Router({
  goal: 'extract_company',
  paths: ['gpt-4o', 'claude-sonnet-4-20250514'],
  successWhen: (output) => output.length > 0,
});
const response = await router.completion([
  { role: 'user', content: 'Extract the company: Hi from Stripe' }
]);
console.log(response.choices[0].message.content);
// That's it. Kalibr picked the model, made the call, and reported the outcome.

What changed: you swapped three lines. Router instead of OpenAI, and router.completion() instead of client.chat.completions.create(). The response object has the same shape, so response.choices[0].message.content works exactly as before.

success_when tells Kalibr how to auto-evaluate each response. For simple checks (non-empty, contains "@", valid JSON), this is all you need. For complex validation, skip success_when and call router.report() manually; see Manual outcome reporting below.
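For reference, those simple checks are just plain callables. Any of the following could be passed as success_when (the function names are illustrative):

```python
import json

def non_empty(output: str) -> bool:
    # Passes when the model returned any non-whitespace content.
    return len(output.strip()) > 0

def contains_email(output: str) -> bool:
    # Cheap signal that an email address is present.
    return "@" in output

def valid_json(output: str) -> bool:
    # Passes when the output parses as JSON.
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False
```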

Want finer-grained quality signals? Add score_when for continuous scoring (0.0-1.0). This lets Kalibr distinguish between "barely passed" and "excellent", routing toward higher quality, not just success:

python
router = Router(
    goal="extract_company",
    paths=["gpt-4o", "claude-sonnet-4-20250514"],
    success_when=lambda output: len(output) > 0,
    score_when=lambda output: min(1.0, len(output) / 500),  # quality score 0-1
)
Auto-scoring: When you omit both success_when and score_when, Kalibr still auto-scores every completion using built-in heuristics (response length, structure, finish reason). You get routing intelligence from day one with zero evaluation code. Add success_when or score_when when you want custom quality signals.
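Kalibr's built-in heuristics aren't published on this page, but as intuition, a length-plus-structure heuristic looks roughly like this (purely illustrative, not Kalibr's actual scoring code):

```python
import json

def heuristic_score(output: str, finish_reason: str = "stop") -> float:
    """Illustrative auto-score in [0, 1] built from cheap signals."""
    score = 0.0
    # Length: more content earns more credit, capped at 500 chars.
    score += min(1.0, len(output) / 500) * 0.5
    # Structure: parseable JSON suggests a well-formed answer.
    try:
        json.loads(output)
        score += 0.3
    except json.JSONDecodeError:
        pass
    # Finish reason: truncated responses ("length") earn nothing here.
    if finish_reason == "stop":
        score += 0.2
    return min(1.0, score)
```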

4. Check your dashboard

Go to dashboard.kalibr.systems. You should see your goal listed, with each path and the outcomes recorded for it.

After 20-50 outcomes per path, routing stabilizes and Kalibr will favor the model that works best for your goal. Early on, it explores both paths to gather data.
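Kalibr's exact routing policy isn't documented here; as intuition for the explore-then-stabilize behavior, an epsilon-greedy bandit over observed success rates behaves the same way (a sketch, not Kalibr's implementation):

```python
import random

def pick_path(stats: dict, epsilon: float = 0.1) -> str:
    """Choose a model path from stats = {path: (successes, attempts)}.

    Unseen paths are always explored first; afterwards, explore with
    probability epsilon and otherwise exploit the best success rate.
    """
    unseen = [p for p, (_, attempts) in stats.items() if attempts == 0]
    if unseen:
        return random.choice(unseen)       # gather data on every path first
    if random.random() < epsilon:
        return random.choice(list(stats))  # keep exploring a little
    return max(stats, key=lambda p: stats[p][0] / stats[p][1])
```

With enough outcomes per path, the max() branch dominates and routing settles on the strongest path, which mirrors the stabilization described above.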


That's it. You're routing.

Every call now goes through Kalibr. When a provider degrades, Kalibr reroutes to the one that's working, before your users notice.

Manual outcome reporting

If your success criteria are too complex for a lambda (they need API calls, multi-step checks, or human review), skip success_when and call report() yourself:

Python:

python
router = Router(
    goal="book_meeting",
    paths=["gpt-4o", "claude-sonnet-4-20250514"]
    # no success_when, we'll report manually
)
response = router.completion(messages=[...])
result = response.choices[0].message.content
# ... your validation logic ...
meeting_booked = check_calendar(result)
if meeting_booked:
    router.report(success=True, score=0.9)
else:
    router.report(success=False, score=0.1, reason="meeting not found in calendar")

TypeScript:

typescript
const router = new Router({
  goal: 'book_meeting',
  paths: ['gpt-4o', 'claude-sonnet-4-20250514']
  // no successWhen, we'll report manually
});
const response = await router.completion(messages);
const result = response.choices[0].message.content;
// ... your validation logic ...
const meetingBooked = await checkCalendar(result);
if (meetingBooked) {
  await router.report(true);
} else {
  await router.report(false, 'meeting not found in calendar');
}

Using a framework?

Kalibr integrates with LangChain, CrewAI, and OpenAI Agents SDK. See Framework Integrations for setup instructions.

Or if you just want tracing without routing, add one line to the top of your entry point:

python
import kalibr  # must be the first import, auto-patches OpenAI, Anthropic, Google SDKs

All LLM calls are now traced automatically to your dashboard. No other code changes needed.

Thread safety

Router instances are not thread-safe. Create one Router instance per request context, not one shared instance per process.

In async Python applications, create the Router inside your async handler, not at module level:

python
# Do this
async def handle_request(messages):
    router = Router(goal="my_goal", paths=["gpt-4o-mini", "claude-haiku"])
    return await router.completion(messages)

# Not this (shared instance, not safe in concurrent use)
router = Router(goal="my_goal", paths=["gpt-4o-mini"])  # module level

async def handle_request(messages):
    return await router.completion(messages)

Router creation is cheap. The path registration call is async and non-blocking.
