API Reference

Python SDK v1.14.3 · TypeScript SDK v1.11.0

Router

What this does: Routes requests to the best model based on learned outcomes. Works across any modality.

When to use: Create one Router per goal. Reuse for multiple requests in the same thread/async context.

python

from kalibr import Router

router = Router(
    goal="extract_company",           # required
    paths=["gpt-4o", "claude-sonnet-4-20250514"],  # required
    success_when=lambda x: len(x) > 0,  # optional, bool
    score_when=lambda x: min(1.0, len(x) / 500),  # optional, float 0-1
    auto_register=True,               # optional, default True
)

typescript

import { Router } from '@kalibr/sdk';

const router = new Router({
  goal: 'extract_company',           // required
  paths: ['gpt-4o', 'claude-sonnet-4-20250514'], // required
  successWhen: (output) => output.length > 0, // optional, boolean
  scoreWhen: (output) => Math.min(1.0, output.length / 500), // optional, float 0-1
  autoRegister: true,                // optional, default true
});

Required arguments

Argument	Type	Description
goal	str / string	Name of the goal (e.g., "extract_company")
paths	list / array	List of models or path configs

Optional arguments

Python	TypeScript	Type	Default	Description
success_when	successWhen	callable / function	None	Function that takes output string and returns bool. Auto-calls report().
score_when	scoreWhen	callable / function	None	Function that takes output string and returns float (0.0-1.0). Provides continuous quality signal to routing. Takes priority over success_when when both are set. Success is derived as score >= 0.5. Score is clamped to [0, 1].
session_id	sessionId	str / string	None	Optional session identifier for session-aware routing. When provided, the intelligence service reads recent session momentum and may escalate the model if the session is widening (user frustrated). Falls back to the KALIBR_SESSION_ID environment variable when not passed.
prefer_cached	preferCached	bool / boolean	False	When True, future routing decisions will prefer providers that have warm prompt caches for this goal. Wired in for forward compatibility.
judge_model	—	str	None	Python only. Model ID to use as a Gate 2 quality judge (e.g. "deepseek-chat"). When set, Kalibr runs an LLM quality check on each response; if the score falls below judge_threshold, it tries the next path (or repairs the prompt if repair_prompt=True). Requires DEEPSEEK_API_KEY or OPENAI_API_KEY in env.
judge_threshold	—	float	0.7	Python only. Quality score threshold for Gate 2. Responses scoring below this value trigger a model swap (or prompt repair). Range: 0.0–1.0.
repair_prompt	—	bool	False	Python only. When True and a Gate 2 quality check fails, Kalibr rewrites the user prompt using the judge model before trying the next path. The rewritten prompt is derived from the original request and the bad output. Only active when a judge_model is also set.
exploration_rate	explorationRate	float / number	None (Python) / 0.1 (TypeScript)	Override the Thompson Sampling exploration rate (0.0–1.0). Higher values explore less-proven paths more aggressively. When None (Python default), the intelligence service controls exploration automatically.
auto_register	autoRegister	bool / boolean	True	Register paths on init
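
The interaction between success_when and score_when described in the table can be sketched in plain Python. This is an illustration of the documented rules only (score_when wins, the score is clamped to [0, 1], success is score >= 0.5), not the SDK's internal code; derive_outcome and the two sample callbacks are hypothetical names.

```python
from typing import Callable, Optional, Tuple


def derive_outcome(
    output: str,
    success_when: Optional[Callable[[str], bool]] = None,
    score_when: Optional[Callable[[str], float]] = None,
) -> Tuple[Optional[bool], Optional[float]]:
    """Return (success, score) following the documented callback rules."""
    if score_when is not None:
        # score_when takes priority; clamp its result to [0, 1].
        score = max(0.0, min(1.0, float(score_when(output))))
        # Success is derived from the score with the documented 0.5 cutoff.
        return score >= 0.5, score
    if success_when is not None:
        return bool(success_when(output)), None
    # Neither callback is set, so there is nothing to derive here.
    return None, None


non_empty = lambda text: len(text) > 0
length_score = lambda text: len(text) / 500  # deliberately left unclamped

print(derive_outcome("Stripe", success_when=non_empty))        # (True, None)
print(derive_outcome("x" * 100, non_empty, length_score))      # (False, 0.2)
print(derive_outcome("x" * 900, non_empty, length_score))      # (True, 1.0)
```

In the second call the output is non-empty, so success_when alone would report success, but the score of 0.2 takes priority and the outcome counts as a failure.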

Thread Safety

Router is NOT thread-safe. Internal state (trace_id) will be corrupted if used across threads/async contexts.

Python: Create one Router instance per thread.
TypeScript: Create one Router instance per async context (request handler, worker, etc.)
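
One way to get "one instance per thread" in Python is threading.local. The sketch below is a generic helper, not part of the SDK: PerThread is a hypothetical name, and a plain object() stands in for the Router so the example runs without Kalibr installed.

```python
import threading
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class PerThread(Generic[T]):
    """Lazily builds one instance per thread from a factory callable."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._local = threading.local()

    def get(self) -> T:
        # threading.local gives every thread its own attribute namespace,
        # so each thread creates and then reuses its own instance.
        if not hasattr(self._local, "instance"):
            self._local.instance = self._factory()
        return self._local.instance


# Stand-in factory. In real code this would build a Router, for example:
#   PerThread(lambda: Router(goal="extract_company", paths=["gpt-4o"]))
routers = PerThread(lambda: object())

seen: List[object] = []


def worker() -> None:
    seen.append(routers.get())


threads = [threading.Thread(target=worker) for _ in range(3)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(len({id(instance) for instance in seen}))  # 3
```

Each worker thread receives its own instance, while repeated get() calls inside one thread return the same object.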

Common mistakes

Sharing one Router across concurrent threads or async contexts. Create a separate instance per thread or request handler.
Forgetting to call report()
Using the same goal for different tasks

Router.completion()

What this does: Makes a completion request with intelligent routing.

When to call: Every time your agent needs a model response for this goal.

Example

python

response = router.completion(
    messages=[
        {"role": "system", "content": "Extract company names."},
        {"role": "user", "content": "Hi, I'm Sarah from Stripe."}
    ],
    max_tokens=100
)
print(response.choices[0].message.content)

typescript

const response = await router.completion(
  [
    { role: 'system', content: 'Extract company names.' },
    { role: 'user', content: "Hi, I'm Sarah from Stripe." },
  ],
  { maxTokens: 100 }
);

console.log(response.choices[0].message.content);

Required arguments

Argument	Type	Description
messages	list / array	OpenAI-format messages

Optional arguments

Python	TypeScript	Type	Description
force_model	forceModel	str / string	Override routing, use this model
max_tokens	maxTokens	int / number	Maximum tokens in response
healing	healing	bool / boolean	Default False. When True, the Router runs the structural gate after each call, classifies failures, repairs the meta prompt or swaps to the next path, and retries inside the same call. New in v1.14.
heal_config	healConfig	HealConfig	Optional. Tunes retry budget and which gates run during healing. See HealConfig below.
pipeline_id	pipelineId	str / string	Optional. Scopes outcome learning to this pipeline so routing signals don't bleed between unrelated agents that share a goal. New in v1.14.
**kwargs	options	any / object	Passed to provider (temperature, etc.)

Common mistakes

Passing model in kwargs (Kalibr picks the model; use force_model/forceModel to override)
Not handling exceptions (provider errors still raise)

Exceptions

Provider errors propagate to caller:

Python: openai.OpenAIError, anthropic.AnthropicError
TypeScript: OpenAI.APIError, Anthropic.APIError

Intelligence service failures do NOT raise -- Router falls back to first path.

HealConfig

What this does: Configures how router.completion(healing=True) retries failed calls. Pass via heal_config=.

When to use: When the default heal budget or gate set does not fit the goal. For example: to enable an LLM-judge quality gate, disable meta-prompt repair, or change the retry count.

python

from kalibr import Router, HealConfig

config = HealConfig(
    max_retries=2,            # heal attempts before giving up
    gate2_enabled=True,       # LLM-judge quality gate
    meta_prompt_enabled=True, # repair meta prompt before model swap
)

response = router.completion(
    messages=[{"role": "user", "content": "Summarize: ..."}],
    healing=True,
    heal_config=config,
)

Fields

Field	Type	Default	Description
max_retries	int	2	Maximum heal attempts before returning the last response.
gate2_enabled	bool	False	Run the LLM-judge quality gate in addition to the structural gate. Uses the caller's model/keys.
judge_model	str	"deepseek-chat"	Model used for the Gate 2 LLM judge when gate2_enabled=True.
repair_model	str or None	None	Optional override model for repair calls. When None, reuses the same model that produced the failing output.
meta_prompt_enabled	bool	False	Generate a task-specific system prompt via a cheap LLM before each heal step. Combined with repair prompts on retry. Fails open.

Router.pipeline()

What this does: Runs a sequence of routed, gated, healed steps as a single call. Each step picks its own goal and messages and runs the full self-healing loop independently.

When to use: Multi-step agent workflows (research → outreach, extract → enrich → classify) where you want every intermediate step to be evaluated and healed without writing orchestration code.

python

result = router.pipeline(
    [
        {"goal": "research", "messages": [...]},
        {"goal": "outreach_generation", "messages": [...], "chain": True},
    ],
    healing=True,
    pipeline_id="sales-outreach-prod",
)

Required arguments

Argument	Type	Description
steps	list[dict]	Ordered list of step specs. Each step requires goal and messages. Set "chain": True to feed the previous step's output into the current step.

Optional arguments

Argument	Type	Default	Description
healing	bool	False	Enable the self-healing loop for every step.
heal_config	HealConfig	None	Override default heal budget / gates for every step.
pipeline_id	str	None	Scope outcome learning to this pipeline so routing signals don't bleed between unrelated pipelines.

Returns: Pipeline result with per-step outputs and metadata. If a step fails after exhausting retries, the pipeline returns a partial result with the failure attached.

as_langchain()

Returns a LangChain-compatible LLM that uses Kalibr for routing. Use this to integrate with frameworks like CrewAI and LangChain.

python

from kalibr import Router

router = Router(
    goal="my_task",
    paths=["gpt-4o", "claude-sonnet-4-20250514"]
)

# Returns a LangChain BaseChatModel
llm = router.as_langchain()

# Use with any LangChain chain or CrewAI agent
from langchain_core.prompts import ChatPromptTemplate
chain = ChatPromptTemplate.from_template("{input}") | llm

Note: You still need to call router.report() after the chain completes to report the outcome.

Router.execute()

What this does: Routes any HuggingFace task with the same outcome-learning loop as completion(). Works for transcription, image generation, embeddings, classification, and the rest of the 17 supported HuggingFace task types.

When to call: When your agent needs to run a non-chat model task (audio, image, embedding, classification).

Parameter	Type	Required	Description
task	str	Yes	HuggingFace task type (e.g. "automatic_speech_recognition", "text_to_image")
input_data	Any	Yes	Task-appropriate input (audio bytes, text prompt, image, etc.)
**kwargs		No	Passed to HuggingFace provider

Returns: Task-native response (transcription text, PIL image, embedding vector, classification labels, etc.)

Examples

python

# Transcription
result = router.execute(task="automatic_speech_recognition", input_data=audio_bytes)

# Image generation
result = router.execute(task="text_to_image", input_data="a product photo of a laptop")

# Embeddings
result = router.execute(task="feature_extraction", input_data="search query text")

# Classification
result = router.execute(task="text_classification", input_data="This product is amazing!")

Supported task types: chat_completion, text_generation, summarization, translation, fill_mask, table_question_answering, automatic_speech_recognition, text_to_speech, audio_classification, text_to_image, image_to_text, image_classification, image_segmentation, object_detection, feature_extraction, text_classification, token_classification

Router.synthesize()

What this does: Routes a text-to-speech (TTS) request to the best voice model, with the same outcome-learning loop as completion(). Supports ElevenLabs, OpenAI TTS, and Deepgram Aura.

python

router = Router(
    goal="narrate",
    paths=["tts-1", "eleven_multilingual_v2"],  # OpenAI TTS or ElevenLabs
)
result = router.synthesize("Hello world", voice="alloy")
# result.audio: audio bytes
# result.cost_usd: cost of this call
# result.model: which model was selected
# result.kalibr_trace_id: for manual report() if needed

Provider detection: tts-* / whisper-* -- OpenAI · eleven_* -- ElevenLabs · nova-* / aura-* -- Deepgram

Required env vars: OPENAI_API_KEY for OpenAI TTS · ELEVENLABS_API_KEY for ElevenLabs · DEEPGRAM_API_KEY for Deepgram

Install: pip install kalibr[voice] (includes ElevenLabs + Deepgram SDK)
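
A pre-flight check can catch a missing provider key before the first voice call. The sketch below maps the model-name prefixes and environment variables listed above; env_var_for and missing_env_vars are hypothetical helpers, not SDK functions, and the SDK's own detection may differ in detail.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Prefix rules and variable names taken from the provider detection list.
PREFIX_TO_ENV = (
    (("tts-", "whisper-"), "OPENAI_API_KEY"),
    (("eleven_",), "ELEVENLABS_API_KEY"),
    (("nova-", "aura-"), "DEEPGRAM_API_KEY"),
)


def env_var_for(model_id: str) -> Optional[str]:
    """Return the env var a voice model needs, or None if the prefix is unknown."""
    for prefixes, env_var in PREFIX_TO_ENV:
        if model_id.startswith(prefixes):  # startswith accepts a tuple of prefixes
            return env_var
    return None


def missing_env_vars(paths: Iterable[str],
                     env: Mapping[str, str] = os.environ) -> List[str]:
    """List the required variables that are unset or empty for these paths."""
    needed = {env_var_for(model_id) for model_id in paths} - {None}
    return sorted(name for name in needed if not env.get(name))


print(missing_env_vars(["tts-1", "eleven_multilingual_v2"],
                       env={"OPENAI_API_KEY": "sk-example"}))
# ['ELEVENLABS_API_KEY']
```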

Router.transcribe()

What this does: Routes a speech-to-text (STT) request to the best model. Supports OpenAI Whisper and Deepgram Nova.

python

router = Router(
    goal="transcribe_meeting",
    paths=["whisper-1", "nova-2"],  # OpenAI Whisper or Deepgram Nova
)
result = router.transcribe(audio_bytes, audio_duration_seconds=120.0)
# result.text: transcribed text
# result.cost_usd: cost of this call
# result.kalibr_trace_id: for manual report() if needed

Provider detection: whisper-* -- OpenAI · nova-* / enhanced / base -- Deepgram

Router.report()

What this does: Reports the outcome of the last completion. This is how Kalibr learns.

When to call: After you know whether the task succeeded or failed.

Example

python

# Success
router.report(success=True)

# Failure with reason
router.report(success=False, reason="invalid_json")

# Success with quality score
router.report(success=True, score=0.8)

# Failure with structured category
router.report(success=False, failure_category="timeout", reason="Provider timed out after 30s")

typescript

// Success
await router.report(true);

// Failure with reason
await router.report(false, 'invalid_json');

// Success with score
await router.report(true, undefined, 0.8);

Required arguments

Argument	Type	Description
success	bool / boolean	Whether the task succeeded

Optional arguments

Argument	Type	Description
reason	str / string	Failure reason (for debugging)
score	float / number	Quality score 0.0-1.0. Feeds directly into routing. A path that consistently scores 0.85 will be preferred over one scoring 0.6, even if both "succeed." When provided, score is used as a continuous signal in the routing engine, giving finer-grained path selection than boolean alone. A score of 0.85 counts as 0.85 successes and 0.15 failures.
failure_category	str / string	Structured failure category for clustering. Valid: timeout, context_exceeded, tool_error, rate_limited, validation_failed, hallucination_detected, user_unsatisfied, empty_response, malformed_output, auth_error, provider_error, healed, unknown. Raises ValueError if invalid.
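
To make the "0.85 successes and 0.15 failures" rule concrete, here is a minimal Thompson Sampling sketch that credits scores fractionally. It is an illustration under stated assumptions, not Kalibr's routing engine: PathStats and choose_path are hypothetical names, and the uniform Beta(1, 1) prior is an assumption.

```python
import random
from typing import Dict, Optional


class PathStats:
    """Running fractional success/failure counts for one path."""

    def __init__(self) -> None:
        self.successes = 0.0
        self.failures = 0.0

    def record(self, success: bool, score: Optional[float] = None) -> None:
        # With a score, credit it fractionally; otherwise fall back to 0 or 1.
        credit = float(success) if score is None else max(0.0, min(1.0, score))
        self.successes += credit
        self.failures += 1.0 - credit


def choose_path(stats: Dict[str, PathStats], rng: random.Random) -> str:
    """Draw once from each path's Beta posterior and pick the highest draw."""
    draws = {
        # Beta(1 + successes, 1 + failures); the uniform prior is an assumption.
        name: rng.betavariate(1.0 + s.successes, 1.0 + s.failures)
        for name, s in stats.items()
    }
    return max(draws, key=draws.get)


stats = {"path_a": PathStats(), "path_b": PathStats()}
for _ in range(100):
    stats["path_a"].record(True, score=0.85)  # succeeds with high quality
    stats["path_b"].record(True, score=0.60)  # also "succeeds", lower quality

rng = random.Random(0)
picks = [choose_path(stats, rng) for _ in range(1000)]
print(picks.count("path_a"), picks.count("path_b"))
```

With boolean reporting alone both paths would look identical (100 successes each). With scores, path_a accumulates about 85 successes against path_b's 60, so sampling favors path_a almost every time.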

Common mistakes

Calling report() multiple times for one completion (second call is ignored)
Not calling report() at all (routing never improves)
Reporting success based on "response exists" instead of "task actually worked"

When to call

After you know if task succeeded/failed
Once per completion() call
For multi-turn, report once at end

Validation

Calling report() without a prior completion() raises an error. Calling report() twice for the same completion logs a warning and ignores the second call.
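
The guidance above (report once per completion, based on whether the task actually worked) can be sketched as a small wrapper. It uses only the completion() and report() calls documented on this page; complete_and_report and is_json_object are hypothetical helpers, and FakeRouter is a stand-in so the sketch runs without the SDK or provider keys.

```python
import json
from types import SimpleNamespace
from typing import Any, Callable, Dict, List, Optional, Tuple

Validator = Callable[[str], Tuple[bool, Optional[str]]]


def complete_and_report(router: Any, messages: List[Dict[str, str]],
                        validate: Validator, **kwargs: Any) -> str:
    """Call completion(), judge the output, then report exactly once."""
    # Provider errors are not caught here; they propagate to the caller.
    response = router.completion(messages=messages, **kwargs)
    text = response.choices[0].message.content
    ok, reason = validate(text)
    if ok:
        router.report(success=True)
    else:
        router.report(success=False, reason=reason)
    return text


def is_json_object(text: str) -> Tuple[bool, Optional[str]]:
    """Task-level check: the output must parse as a JSON object."""
    try:
        parsed = json.loads(text)
    except (TypeError, ValueError):
        return False, "invalid_json"
    if not isinstance(parsed, dict):
        return False, "invalid_json"
    return True, None


class FakeRouter:
    """Stand-in exposing the same two methods as Router."""

    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.reports: List[Dict[str, Any]] = []

    def completion(self, messages: List[Dict[str, str]], **kwargs: Any) -> Any:
        message = SimpleNamespace(content=self.reply)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])

    def report(self, success: bool, reason: Optional[str] = None) -> None:
        self.reports.append({"success": success, "reason": reason})


good = FakeRouter('{"company": "Stripe"}')
bad = FakeRouter("Stripe")
prompt = [{"role": "user", "content": "Hi, I'm Sarah from Stripe."}]
complete_and_report(good, prompt, is_json_object)
complete_and_report(bad, prompt, is_json_object)
print(good.reports)  # [{'success': True, 'reason': None}]
print(bad.reports)   # [{'success': False, 'reason': 'invalid_json'}]
```

Both fake responses "exist", but only the one that passes the task-level check is reported as a success.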

get_policy()

Get the recommended path for a goal without making a completion call. Useful for inspecting what Kalibr would choose or for custom routing logic.

python

from kalibr import get_policy

policy = get_policy(goal="book_meeting")

print(policy["recommended_model"])       # "gpt-4o"
print(policy["recommended_tool"])        # "calendar_api"
print(policy["outcome_success_rate"])    # 0.87
print(policy["confidence"])              # 0.92

typescript

import { getPolicy } from '@kalibr/sdk';

const policy = await getPolicy({ goal: 'book_meeting' });

console.log(policy.recommendedModel);      // "gpt-4o"
console.log(policy.recommendedTool);       // "calendar_api"
console.log(policy.outcomeSuccessRate);    // 0.87
console.log(policy.confidence);            // 0.92

With Constraints

python

policy = get_policy(
    goal="book_meeting",
    constraints={
        "max_cost_usd": 0.05,
        "max_latency_ms": 2000,
        "min_quality": 0.8
    }
)

typescript

const policy = await getPolicy({
    goal: 'book_meeting',
    constraints: {
        maxCostUsd: 0.05,
        maxLatencyMs: 2000,
        minQuality: 0.8,
    },
});

Kalibr will only recommend paths that meet all constraints. If no paths meet the constraints, the response will indicate no recommendation is available.
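
The "all constraints must hold" rule can be pictured as a simple filter. The sketch below is not the service's implementation: the per-path fields (cost_usd, latency_ms, quality) and every number are made up for illustration, and recommend is a hypothetical helper.

```python
from typing import Any, Dict, List, Optional


def meets_constraints(path: Dict[str, Any], constraints: Dict[str, float]) -> bool:
    """True only if the path satisfies every constraint that was supplied."""
    return (
        path["cost_usd"] <= constraints.get("max_cost_usd", float("inf"))
        and path["latency_ms"] <= constraints.get("max_latency_ms", float("inf"))
        and path["quality"] >= constraints.get("min_quality", 0.0)
    )


def recommend(paths: List[Dict[str, Any]],
              constraints: Dict[str, float]) -> Optional[Dict[str, Any]]:
    """Best-quality eligible path, or None when no path qualifies."""
    eligible = [p for p in paths if meets_constraints(p, constraints)]
    return max(eligible, key=lambda p: p["quality"]) if eligible else None


# Made-up numbers, for illustration only.
candidates = [
    {"model": "gpt-4o", "cost_usd": 0.04, "latency_ms": 1800, "quality": 0.90},
    {"model": "claude-sonnet-4-20250514", "cost_usd": 0.06, "latency_ms": 1500, "quality": 0.93},
    {"model": "deepseek-chat", "cost_usd": 0.01, "latency_ms": 900, "quality": 0.70},
]
limits = {"max_cost_usd": 0.05, "max_latency_ms": 2000, "min_quality": 0.8}

choice = recommend(candidates, limits)
print(choice["model"] if choice else "no recommendation")  # gpt-4o
```

The highest-quality candidate is excluded because it exceeds the cost ceiling, and the cheapest one because it misses the quality floor.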

Parameters

Argument	Required	Description
goal	Yes	Goal name
constraints	No	Object with max_cost_usd, max_latency_ms, min_quality

Returns

Field	Description
recommended_model	Best model for this goal
recommended_tool	Best tool (if tools are tracked)
recommended_params	Best parameters (if params are tracked)
outcome_success_rate	Historical success rate for this path
confidence	Statistical confidence (0-1)
alternatives	Other viable paths ranked by performance

decide()

Returns the routing decision for a goal. This is what Router.completion() calls internally, exposed here for low-level control.

python

from kalibr import decide

decision = decide(goal="book_meeting", task_risk_level="low")

print(decision["model_id"])    # "gpt-4o"
print(decision["tool_id"])     # "calendar_api" or None
print(decision["params"])      # {"temperature": 0.3} or {}
print(decision["trace_id"])    # "abc123...", pass this to report_outcome
print(decision["confidence"])  # 0.85

Parameters

Argument	Required	Default	Description
goal	Yes	-	Goal name
task_risk_level	No	"low"	Risk tolerance: "low", "medium", or "high"

report_outcome()

Report an execution outcome directly (without using Router). This is the feedback loop that teaches Kalibr what works.

python

from kalibr import report_outcome

report_outcome(
    trace_id="abc123",
    goal="book_meeting",
    success=True,
    score=0.95,                    # optional quality score 0-1
    failure_category=None,         # optional structured category, e.g. "timeout" when success=False
    model_id="gpt-4o",            # optional
)

Parameters

Argument	Required	Default	Description
trace_id	Yes	-	Trace ID from decide() or completion
goal	Yes	-	Goal name
success	Yes	-	Whether the goal was achieved
score	No	None	Quality score 0-1
failure_reason	No	None	Free-text failure reason
failure_category	No	None	Structured failure category (see FAILURE_CATEGORIES)
metadata	No	None	Additional context as dict
model_id	No	None	Model used
tool_id	No	None	Tool used
execution_params	No	None	Parameters used

update_outcome()

Update an existing outcome with a late-arriving signal. Use when the real success signal arrives after the initial report, for example, updating 48 hours later when a customer reopens a ticket that was initially reported as resolved.

Only fields that are explicitly passed (not None) will be updated. Other fields retain their original values.

python

from kalibr import update_outcome

# Customer reopened ticket 48 hours after "resolution"
result = update_outcome(
    trace_id="abc123",
    goal="resolve_ticket",
    success=False,
    failure_category="user_unsatisfied",
)
print(result["fields_updated"])  # ["success", "failure_category"]

Parameters

Argument	Required	Default	Description
trace_id	Yes	-	Trace ID of the outcome to update
goal	Yes	-	Goal (must match original outcome)
success	No	None	Updated success status
score	No	None	Updated quality score 0-1
failure_reason	No	None	Updated failure reason
failure_category	No	None	Updated failure category
metadata	No	None	Additional metadata to merge

Returns 404 if no outcome exists for the given trace_id + goal combination.
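
The partial-update rule (only fields that are not None are applied) can be sketched locally as follows. This mirrors the documented behavior, not the service's storage code; apply_update is a hypothetical helper, and reporting only fields whose value actually changed is an assumption.

```python
from typing import Any, Dict, List, Optional


def apply_update(outcome: Dict[str, Any], **fields: Optional[Any]) -> List[str]:
    """Apply non-None fields in place and return the names that changed."""
    updated: List[str] = []
    for name, value in fields.items():
        if value is None:
            continue  # not passed: the original value is retained
        if name == "metadata":
            # Metadata is merged into the existing dict rather than replaced.
            merged = {**(outcome.get("metadata") or {}), **value}
            if merged != outcome.get("metadata"):
                outcome["metadata"] = merged
                updated.append(name)
        elif outcome.get(name) != value:
            outcome[name] = value
            updated.append(name)
    return updated


stored = {"trace_id": "abc123", "goal": "resolve_ticket", "success": True,
          "score": 0.9, "failure_category": None}
changed = apply_update(stored, success=False, score=None,
                       failure_category="user_unsatisfied")
print(changed)          # ['success', 'failure_category']
print(stored["score"])  # 0.9
```

Because score was passed as None, the original 0.9 is kept.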

get_insights()

Get structured diagnostics about what Kalibr has learned. Returns machine-readable intelligence per goal, designed for coding agents that need to decide what to improve.

Response includes schema_version: "1.0" for forward compatibility.

python

from kalibr import get_insights

# All goals
insights = get_insights()

# Single goal, custom window
insights = get_insights(goal="resolve_ticket", window_hours=48)

for goal in insights["goals"]:
    print(f"{goal['goal']}: {goal['status']} ({goal['success_rate']:.0%})")
    for signal in goal["actionable_signals"]:
        if signal["severity"] == "critical":
            print(f"  {signal['type']}: {signal['data']}")

Parameters

Argument	Required	Default	Description
goal	No	None	Filter to a specific goal (returns all if None)
window_hours	No	168	Time window for analysis (default 1 week)

Response structure (per goal)

Field	Description
goal	Goal name
status	healthy, degrading, failing, or insufficient_data
success_rate	Overall success rate (0-1)
sample_count	Total outcomes in window
trend	improving, stable, or degrading
confidence	Statistical confidence (0-1)
top_failure_modes	Failure categories ranked by frequency
paths	Per-path performance (success_rate, trend, cost, latency)
param_sensitivity	Parameters that significantly affect outcomes
actionable_signals	Machine-readable signals (see below)

Actionable signal types

Signal	Description
path_underperforming	A path is >15pp below the best path
failure_mode_dominant	One failure category accounts for >50% of failures
param_sensitivity_detected	A parameter value significantly affects outcomes (>10pp spread)
drift_detected	Path performance is degrading over time
cost_inefficiency	A cheaper path has similar success rate (within 5pp)
low_confidence	Path has fewer than 20 samples
goal_healthy	No action needed, goal is performing well
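
Three of the thresholds in the table can be expressed as a short function. This is a sketch, not how the service computes signals: derive_signals is a hypothetical name, the per-path field names are assumptions based on the response structure above, and the sample numbers are made up.

```python
from typing import Dict, List, Tuple

PathMetrics = Dict[str, float]


def derive_signals(paths: Dict[str, PathMetrics]) -> List[Tuple[str, str]]:
    """Return (signal_type, path_name) pairs for three documented thresholds."""
    signals: List[Tuple[str, str]] = []
    best_rate = max(p["success_rate"] for p in paths.values())
    for name, p in paths.items():
        # low_confidence: fewer than 20 samples.
        if p["sample_count"] < 20:
            signals.append(("low_confidence", name))
        # path_underperforming: more than 15 percentage points below the best.
        if best_rate - p["success_rate"] > 0.15:
            signals.append(("path_underperforming", name))
        # cost_inefficiency: a cheaper path is within 5 percentage points.
        if any(
            other["cost"] < p["cost"]
            and abs(other["success_rate"] - p["success_rate"]) <= 0.05
            for other_name, other in paths.items()
            if other_name != name
        ):
            signals.append(("cost_inefficiency", name))
    return signals


# Made-up numbers, for illustration only.
example = {
    "gpt-4o": {"success_rate": 0.90, "sample_count": 120, "cost": 0.010},
    "claude-sonnet-4-20250514": {"success_rate": 0.88, "sample_count": 95, "cost": 0.004},
    "deepseek-chat": {"success_rate": 0.70, "sample_count": 12, "cost": 0.001},
}
for signal in derive_signals(example):
    print(signal)
```

Here gpt-4o is flagged as cost-inefficient because a cheaper path is within two points of it, and deepseek-chat is flagged both for low sample count and for trailing the best path by 20 points.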

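register_path()

Registers a path for a goal: a model, plus an optional tool, parameters, and risk level. The result includes the new path_id.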

python

from kalibr import register_path

result = register_path(
    goal="book_meeting",
    model_id="gpt-4o",
    tool_id="calendar_api",              # optional
    params={"temperature": 0.3},         # optional
    risk_level="low",                    # optional: "low", "medium", "high"
)
print(result["path_id"])

FAILURE_CATEGORIES

Constant containing all valid failure category values. Import and use for client-side validation.

python

from kalibr import FAILURE_CATEGORIES

print(FAILURE_CATEGORIES)
# ["timeout", "context_exceeded", "tool_error", "rate_limited",
#  "validation_failed", "hallucination_detected", "user_unsatisfied",
#  "empty_response", "malformed_output", "auth_error", "provider_error", "healed", "unknown"]

# Used by report() and report_outcome(); both raise ValueError if an invalid category is passed
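
Client-side validation against the constant might look like the sketch below. The values are copied from the list above so the example runs without the SDK; in real code, import FAILURE_CATEGORIES from kalibr instead. checked_category is a hypothetical helper.

```python
from typing import Optional

# Copied from the documented list; prefer `from kalibr import FAILURE_CATEGORIES`.
KNOWN_CATEGORIES = frozenset({
    "timeout", "context_exceeded", "tool_error", "rate_limited",
    "validation_failed", "hallucination_detected", "user_unsatisfied",
    "empty_response", "malformed_output", "auth_error", "provider_error",
    "healed", "unknown",
})


def checked_category(category: Optional[str]) -> Optional[str]:
    """Return the category unchanged, or raise ValueError if it is not valid."""
    if category is not None and category not in KNOWN_CATEGORIES:
        raise ValueError(f"unknown failure_category: {category!r}")
    return category


print(checked_category("timeout"))  # timeout
try:
    checked_category("too_slow")
except ValueError as error:
    print(error)  # unknown failure_category: 'too_slow'
```

Validating before the call lets a caller map its own error labels onto a valid category (or "unknown") instead of hitting the ValueError at report time.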

Intelligence API (TypeScript)

The TypeScript SDK exports convenience functions for direct access to the Intelligence API.

typescript

import {
  KalibrIntelligence,
  getPolicy,
  reportOutcome,
  registerPath,
  decide,
  getRecommendation,
  listPaths,
  disablePath,
  setExplorationConfig,
  getExplorationConfig,
} from '@kalibr/sdk';

// Initialize singleton
KalibrIntelligence.init({
  apiKey: process.env.KALIBR_API_KEY!,
  tenantId: process.env.KALIBR_TENANT_ID!,
});

// Get routing decision
const decision = await decide('extract_company');
console.log(decision.model_id, decision.confidence);

// Report outcome directly
await reportOutcome(traceId, 'extract_company', true, {
  score: 0.95,
  modelId: 'gpt-4o',
});

// List registered paths
const { paths } = await listPaths({ goal: 'extract_company' });


Auto-Instrumentation (TypeScript)

Wrap OpenAI or Anthropic clients to automatically trace all LLM calls.

typescript

import { createTracedOpenAI, createTracedAnthropic } from '@kalibr/sdk';

// Wrap OpenAI client, all calls auto-traced
const openai = createTracedOpenAI();
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello!' }],
});

// Wrap Anthropic client
const anthropic = createTracedAnthropic();
const message = await anthropic.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello!' }],
});

Context Management (TypeScript)

Run code within goal or trace contexts using async local storage.

typescript

import { withGoal, withTraceId, traceContext } from '@kalibr/sdk';

// Run code within a goal context
await withGoal('extract_company', async () => {
  // All Kalibr operations inherit this goal
  const response = await openai.chat.completions.create({...});
});

// Run code with a specific trace ID
await withTraceId('my-custom-trace-id', async () => {
  // All operations use this trace ID
});

// Combined trace context
await traceContext({ traceId: 'my-trace', goal: 'summarize' }, async () => {
  // Both trace ID and goal available
});

Environment Variables

Variable	Required	Default	Description
KALIBR_API_KEY	Yes	-	Your API key from dashboard
KALIBR_TENANT_ID	Yes	default	Your tenant ID
KALIBR_AUTO_INSTRUMENT	No	true	Auto-instrument OpenAI/Anthropic/Google SDKs
KALIBR_INTELLIGENCE_URL	No	https://kalibr-intelligence.fly.dev	Intelligence service endpoint
OPENAI_API_KEY	For OpenAI	-	OpenAI models (gpt-4o, tts-1, whisper-1)
ANTHROPIC_API_KEY	For Anthropic	-	Claude models
GOOGLE_API_KEY	For Google	-	Gemini models
DEEPSEEK_API_KEY	For DeepSeek	-	deepseek-chat, deepseek-reasoner, deepseek-coder
HF_API_TOKEN	For HuggingFace	-	Private models or free-tier rate limit bypass
ELEVENLABS_API_KEY	For ElevenLabs	-	ElevenLabs TTS (eleven_multilingual_v2, etc)
DEEPGRAM_API_KEY	For Deepgram	-	Deepgram STT (nova-2) and TTS (aura-*)

REST API Endpoints

Intelligence service: https://kalibr-intelligence.fly.dev

All endpoints except GET /api/v1/intelligence/health require two headers: X-API-Key and X-Tenant-ID.
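
For callers not using an SDK, the two required headers can be attached with the Python standard library. The sketch below only builds the request and does not send it; build_post is a hypothetical helper, and the key and tenant values are placeholders.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "https://kalibr-intelligence.fly.dev"


def build_post(path: str, payload: Dict[str, Any],
               api_key: str, tenant_id: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-API-Key": api_key,
            "X-Tenant-ID": tenant_id,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_post("/api/v1/routing/decide", {"goal": "extract_company"},
                     api_key="your-key", tenant_id="your-tenant")
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request, timeout=10)
```

urllib normalizes header capitalization internally (for example "X-api-key"); HTTP header names are case-insensitive, so this does not affect the request.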

POST /api/v1/routing/decide

Get a routing decision for a goal. Selects the best registered path based on outcome history. Returns a trace_id that must be passed to report-outcome.

curl

curl -X POST https://kalibr-intelligence.fly.dev/api/v1/routing/decide \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant" \
  -H "Content-Type: application/json" \
  -d '{"goal": "extract_company"}'

Request

Field	Required	Default	Description
goal	Yes	-	The goal to achieve (e.g. "extract_company")
task_risk_level	No	"low"	Risk level: "low", "medium", or "high"

Response

Field	Type	Description
trace_id	string	Unique ID -- pass this to report-outcome
path_id	string	The selected path identifier
model_id	string	Model to use (e.g. "gpt-4o")
tool_id	string \| null	Tool to use, if any
params	object	Execution parameters for this path
goal	string	Echo of the requested goal
reason	string	Why this path was chosen: "optimal" (best known path by success rate), "cost_optimized" (tied on quality, lower cost wins), or "fallback" (learning in progress, not enough data yet)
confidence	float	Confidence in this path (0-1)
exploration	bool	True if this is an exploration decision
success_rate	float	Historical success rate for this path

POST /api/v1/intelligence/report-outcome

Report execution outcome. This is the feedback loop that teaches Kalibr what works. Updates both ClickHouse (durable) and Redis (real-time).

curl

curl -X POST https://kalibr-intelligence.fly.dev/api/v1/intelligence/report-outcome \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant" \
  -H "Content-Type: application/json" \
  -d '{"trace_id": "abc123", "goal": "extract_company", "success": true}'

Request

Field	Required	Default	Description
trace_id	Yes	-	Trace ID from decide() or Router.completion()
goal	Yes	-	The goal this execution was trying to achieve
success	Yes	-	Whether the goal was achieved (boolean)
score	No	null	Quality score 0.0-1.0
model_id	No	null	Model that was used. If omitted, looked up from trace.
failure_reason	No	null	Free-text failure description
failure_category	No	null	Structured category -- must be a value from FAILURE_CATEGORIES
tool_id	No	null	Tool that was used
execution_params	No	null	Parameters used (e.g. {"temperature": 0.3})
metadata	No	null	Additional context as an object

Response

Field	Type	Description
status	string	Always "accepted" on success
trace_id	string	Echo of the submitted trace ID
goal	string	Echo of the submitted goal

POST /api/v1/intelligence/update-outcome

Update an existing outcome with a late-arriving signal. Only fields explicitly passed (not null) are updated. Use for async validation, user feedback, or downstream confirmation.

curl

curl -X POST https://kalibr-intelligence.fly.dev/api/v1/intelligence/update-outcome \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant" \
  -H "Content-Type: application/json" \
  -d '{"trace_id": "abc123", "goal": "resolve_ticket", "success": false, "failure_category": "user_unsatisfied"}'

Request

Field	Required	Default	Description
trace_id	Yes	-	Trace ID of the outcome to update
goal	Yes	-	Goal name -- must match the original
success	No	null	Updated success status
score	No	null	Updated quality score 0.0-1.0
failure_reason	No	null	Updated failure reason
failure_category	No	null	Updated failure category
metadata	No	null	Additional metadata to merge into existing

Response

Field	Type	Description
status	string	"updated" or "no_changes" if nothing changed
trace_id	string	Echo of trace ID
goal	string	Echo of goal
fields_updated	string[]	List of field names that were actually changed

GET /api/v1/intelligence/insights

Get structured diagnostics about what Kalibr has learned. Returns health status, failure mode breakdown, path comparisons, and actionable signals per goal.

curl

curl "https://kalibr-intelligence.fly.dev/api/v1/intelligence/insights?window_hours=168&goal=resolve_ticket" \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant"

Query params

Param	Default	Description
window_hours	168	Time window in hours (default 1 week)
goal	null	Filter to a specific goal. Omit for all goals.

Response shape

Field	Type	Description
schema_version	string	Always "1.0"
tenant_id	string	Your tenant ID
generated_at	string	ISO timestamp
goals	object[]	Per-goal insight objects
cross_goal_summary	object	Aggregate counts: total_goals, healthy, degrading, failing, insufficient_data, total_outcomes

Each goals[] entry contains: goal, status ("healthy" / "degrading" / "failing" / "insufficient_data"), success_rate, sample_count, trend, trend_delta, confidence, top_failure_modes, paths, and actionable_signals.

POST /api/v1/intelligence/policy

Get the historically best-performing execution path for a goal. Unlike /decide, this is deterministic -- returns the historically best path with no sampling. Returns 404 if no execution data exists yet.

curl

curl -X POST https://kalibr-intelligence.fly.dev/api/v1/intelligence/policy \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant" \
  -H "Content-Type: application/json" \
  -d '{"goal": "summarize_ticket"}'

Request

Field	Required	Default	Description
goal	Yes	-	The goal to get policy for
task_type	No	null	Optional task type filter
window_hours	No	168	Time window for pattern analysis
include_tools	No	true	Whether to include tool recommendations
include_params	No	[]	Parameter keys to include recommendations for
constraints	No	null	Cost/latency/quality constraints object

Response

Field	Type	Description
goal	string	Echo of the goal
recommended_model	string	Best model ID based on outcomes
recommended_provider	string	Provider name
outcome_success_rate	float	Historical success rate for this path
outcome_sample_count	int	Number of outcome reports used
confidence	float	Statistical confidence 0-1
risk_score	float	Risk score 0-1, lower is better
reasoning	string	Human-readable explanation
alternatives	object[]	Other viable paths
source	string	"realtime" or "historical"
recommended_tool	string \| null	Best tool for this goal
recommended_params	object \| null	Recommended parameter values

POST /api/v1/intelligence/get-alternative

Get the next-best path after a primary model fails. Use for retry logic: call /decide, execute, fail, then call this with the failed models in exclude_models. Returns 404 if all registered paths are exhausted.

curl

curl -X POST https://kalibr-intelligence.fly.dev/api/v1/intelligence/get-alternative \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant" \
  -H "Content-Type: application/json" \
  -d '{"goal": "extract_company", "exclude_models": ["gpt-4o"]}'

Request

Field	Required	Default	Description
goal	Yes	-	The goal to get an alternative for
exclude_models	Yes	-	List of model IDs already tried
task_type	No	null	Optional task type filter
window_hours	No	168	Time window for pattern analysis
constraints	No	null	Cost/latency/quality constraints

Response

Field	Type	Description
goal	string	Echo of the goal
recommended_model	string	Next-best model ID
recommended_provider	string	Provider name
outcome_success_rate	float	Historical success rate
confidence	float	Statistical confidence 0-1
reasoning	string	Why this alternative was chosen
remaining_alternatives	int	Number of other alternatives still available
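
The retry flow described for this endpoint (decide, execute, fail, ask for an alternative with the failed models excluded) can be sketched with injected callables. run_with_fallback is a hypothetical helper; decide and get_alternative stand for thin wrappers around the two endpoints, with None standing in for the 404 "paths exhausted" response. The fakes exist only so the sketch runs offline.

```python
from typing import Any, Callable, Dict, List, Optional


def run_with_fallback(
    goal: str,
    decide: Callable[[str], Dict[str, Any]],
    get_alternative: Callable[[str, List[str]], Optional[Dict[str, Any]]],
    execute: Callable[[str], Any],
    max_attempts: int = 3,
) -> Any:
    """Try the decided model first, then alternatives, excluding failed models."""
    tried: List[str] = []
    last_error: Optional[Exception] = None
    model = decide(goal)["model_id"]
    for _ in range(max_attempts):
        try:
            return execute(model)
        except Exception as error:  # broad on purpose: any failure triggers fallback
            last_error = error
            tried.append(model)
        alternative = get_alternative(goal, tried)
        if alternative is None:  # stands in for the 404 "paths exhausted" case
            break
        model = alternative["recommended_model"]
    raise RuntimeError(f"all attempts failed for {goal!r}; tried {tried}") from last_error


# Offline fakes, for illustration only.
def fake_decide(goal: str) -> Dict[str, Any]:
    return {"model_id": "gpt-4o"}


def fake_alternative(goal: str, exclude: List[str]) -> Optional[Dict[str, Any]]:
    remaining = [m for m in ["gpt-4o", "claude-sonnet-4-20250514"] if m not in exclude]
    return {"recommended_model": remaining[0]} if remaining else None


def fake_execute(model: str) -> str:
    if model == "gpt-4o":
        raise TimeoutError("provider timed out")
    return f"ok from {model}"


print(run_with_fallback("extract_company", fake_decide, fake_alternative, fake_execute))
# ok from claude-sonnet-4-20250514
```

The max_attempts cap is a design choice of this sketch; it bounds the loop even if the alternative lookup keeps returning paths.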

POST /api/v1/routing/paths

Register a new path for a goal.

curl

curl -X POST https://kalibr-intelligence.fly.dev/api/v1/routing/paths \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant" \
  -H "Content-Type: application/json" \
  -d '{"goal": "extract_company", "model_id": "gpt-4o"}'

Request

Field	Required	Default	Description
goal	Yes	-	The goal this path achieves
model_id	Yes	-	Model to use (e.g. "gpt-4o")
tool_id	No	null	Tool to use
params	No	{}	Execution parameters
risk_level	No	"low"	Risk level: "low", "medium", or "high"

GET /api/v1/routing/paths

List registered paths for a goal.

curl

curl "https://kalibr-intelligence.fly.dev/api/v1/routing/paths?goal=extract_company" \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant"

DELETE /api/v1/routing/paths/{path_id}

Disable a path. Soft-deleted -- marked disabled, traffic stops, outcome history preserved.

curl

curl -X DELETE https://kalibr-intelligence.fly.dev/api/v1/routing/paths/path_abc123 \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant"

GET /api/v1/routing/stats

Get goal-level routing statistics including success rates, sample counts, and path performance.

curl

curl "https://kalibr-intelligence.fly.dev/api/v1/routing/stats?goal=extract_company" \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant"

GET /api/v1/intelligence/health

Check service health. Does not require authentication.

curl

curl https://kalibr-intelligence.fly.dev/api/v1/intelligence/health

Returns {"status": "healthy", "clickhouse": "connected", "redis": "connected", "last_aggregation": "..."}.

Default Values

Parameter	Default	Description
min_samples	20	Outcomes needed per path before stable routing
success_when	None	Heuristic auto-scoring (response length, structure, finish reason)
score_when	None	Heuristic auto-scoring used when both callbacks are None
