API Overview
Pellet AI provides an OpenAI-compatible inference API with intelligent routing. Send your prompts to one endpoint and Pellet automatically selects the best model based on task type, complexity, and your routing preferences.
- 10+ models available
- ~200ms routing overhead
- $2.50 free tier credit
Base URL
https://getpellet.io/v1
Authentication
All API requests require a valid API key sent in the Authorization header. You can create API keys from the API Keys page.
curl https://getpellet.io/v1/chat/completions \
-H "Authorization: Bearer pk_live_your_api_key_here" \
-H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}'
Keep your API key secret. Do not expose it in client-side code, public repositories, or browser requests. All API calls should be made from your server.
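One way to follow that advice is to keep the key in a server-side environment variable (a sketch; the PELLET_API_KEY variable name is an assumption, not something Pellet mandates):

```python
import os

import requests

# Sketch: read the API key from the server environment so it never
# appears in source control. PELLET_API_KEY is an assumed variable name.
def auth_headers() -> dict:
    key = os.environ["PELLET_API_KEY"]
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }

def chat(prompt: str) -> dict:
    """POST a minimal chat completion request to Pellet."""
    resp = requests.post(
        "https://getpellet.io/v1/chat/completions",
        headers=auth_headers(),
        json={"messages": [{"role": "user", "content": prompt}]},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```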
POST /v1/chat/completions
/v1/chat/completions
OpenAI-compatible chat completions endpoint. Drop-in replacement for the OpenAI SDK. Supports text, vision (images), streaming, and intelligent routing.
Request body
- messages (array, required): List of message objects. Each message has a role (system, user, assistant, or tool) and content (a string, or an array of content parts for multimodal input). Tool-role messages must include a tool_call_id. Assistant messages may include a tool_calls array.
- model (string | null, default: null): Model ID to use. Set to null or omit for auto-routing (recommended). Specify a model ID (e.g. meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo) to bypass routing.
- stream (boolean, default: false): If true, responses are streamed as server-sent events (SSE).
- temperature (float, default: 0.7): Sampling temperature between 0 and 2. Lower values are more deterministic.
- max_tokens (integer, default: 1024): Maximum number of tokens to generate.
- top_p (float, default: 1.0): Nucleus sampling threshold.
- stop (string | array | null, default: null): Stop sequence(s). Generation stops when a sequence is encountered.
- tools (array | null, default: null): List of tool (function) definitions the model may call. Each tool has a type of function and a function object with name, description, and parameters (JSON Schema). See the Tool Use section below.
- tool_choice (string | object | null, default: null): Controls tool calling behavior. auto lets the model decide, required forces a tool call, none disables tools. Pass an object like {"type":"function","function":{"name":"..."}} to force a specific tool.
- pellet_config (object, default: {}): Pellet-specific routing configuration. See Routing Modes below.
- tags (string[] | null, default: null): Optional list of labels to attach to this request. Tags are stored with the request log and can be filtered in the dashboard. Max 10 tags, each up to 64 characters: lowercase alphanumeric with hyphens, underscores, dots, or colons. Tags are echoed in pellet_metadata for non-streaming responses.
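The examples elsewhere in this document use the string forms of tool_choice; as a sketch of the object form, which pins the model to one named tool (get_weather here is the illustrative tool from the Tool Use section):

```python
# Sketch: the object form of tool_choice forces one specific tool call.
# get_weather is an illustrative tool, not a built-in.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}

def forced_tool_payload(prompt: str) -> dict:
    """Build a request body that forces a get_weather call."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "tools": [WEATHER_TOOL],
        # "auto" would let the model decide; the object form forces this tool.
        "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
    }
```

POST this body to /v1/chat/completions as in the curl examples below; the response should then always contain a tool_calls entry for get_weather.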
Example — text completion
curl -X POST https://getpellet.io/v1/chat/completions \
-H "Authorization: Bearer pk_live_your_key" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain quantum computing in 3 sentences."}
],
"temperature": 0.5,
"max_tokens": 256
}'
Response
{
"id": "pel_a1b2c3d4e5f6g7h8",
"object": "chat.completion",
"created": 1710000000,
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Quantum computing uses quantum bits (qubits)..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 28,
"completion_tokens": 64,
"total_tokens": 92
},
"pellet_metadata": {
"routing_decision": "auto",
"task_type": "qa",
"complexity_score": 2.1,
"model_confidence": 0.87,
"latency_ms": 342,
"cost_usd": 0.000031
}
}
Tool Use (Function Calling)
Enable your model to call functions by passing tools in the request. When the model decides to use a tool, the response includes a tool_calls array instead of text content. You execute the function, then send the result back as a tool-role message to continue the conversation.
OpenAI-compatible. Tool use follows the same format as the OpenAI API. Existing code using the OpenAI SDK works without changes. For best results, use models with strong tool-use support such as meta-llama/Llama-3.3-70B-Instruct-Turbo or deepseek-ai/DeepSeek-V3.1.
Tool definition format
- type (string, required): Must be function.
- function.name (string, required): The function name the model will reference when calling this tool.
- function.description (string): Describes what the function does. Helps the model decide when to use it.
- function.parameters (object): A JSON Schema object defining the function's parameters.
Step 1 — Send tools with your request
curl -X POST https://getpellet.io/v1/chat/completions \
-H "Authorization: Bearer pk_live_your_key" \
-H "Content-Type: application/json" \
-d '{
"model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
"messages": [
{"role": "user", "content": "What is the weather in Delhi?"}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a city",
"parameters": {
"type": "object",
"properties": {
"city": {"type": "string", "description": "City name"}
},
"required": ["city"]
}
}
}
],
"tool_choice": "auto"
}'
Response — model calls the tool
{
"id": "pel_a1b2c3d4e5f6g7h8",
"object": "chat.completion",
"model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"tool_calls": [
{
"id": "call_abc123",
"type": "function",
"function": {
"name": "get_weather",
"arguments": "{\"city\": \"Delhi\"}"
}
}
]
},
"finish_reason": "tool_calls"
}
],
"usage": { "prompt_tokens": 85, "completion_tokens": 22, "total_tokens": 107 }
}
Step 2 — Send the tool result back
curl -X POST https://getpellet.io/v1/chat/completions \
-H "Authorization: Bearer pk_live_your_key" \
-H "Content-Type: application/json" \
-d '{
"model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
"messages": [
{"role": "user", "content": "What is the weather in Delhi?"},
{"role": "assistant", "tool_calls": [{"id": "call_abc123", "type": "function", "function": {"name": "get_weather", "arguments": "{\"city\": \"Delhi\"}"}}]},
{"role": "tool", "tool_call_id": "call_abc123", "content": "{\"temp_c\": 38, \"condition\": \"Sunny\"}"}
],
"tools": [...]
}'
Example — Python with OpenAI SDK
from openai import OpenAI
import json
client = OpenAI(
base_url="https://getpellet.io/v1",
api_key="pk_live_your_key"
)
tools = [{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a city",
"parameters": {
"type": "object",
"properties": {
"city": {"type": "string", "description": "City name"}
},
"required": ["city"]
}
}
}]
# First call — model may request a tool call
response = client.chat.completions.create(
model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
messages=[{"role": "user", "content": "What's the weather in Delhi?"}],
tools=tools,
tool_choice="auto",
)
message = response.choices[0].message
if message.tool_calls:
# Execute the function (your logic here)
tool_call = message.tool_calls[0]
result = {"temp_c": 38, "condition": "Sunny"}
# Second call — send the result back
final = client.chat.completions.create(
model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
messages=[
{"role": "user", "content": "What's the weather in Delhi?"},
message, # assistant message with tool_calls
{
"role": "tool",
"tool_call_id": tool_call.id,
"content": json.dumps(result),
},
],
tools=tools,
)
print(final.choices[0].message.content)
else:
    print(message.content)
Vision (Image Understanding)
/v1/chat/completions
Send images alongside text using the same chat completions endpoint. Use multimodal content parts in the content field. When image content is detected, Pellet automatically routes to a vision model.
Vision models: Pellet routes to Llama 3.2 11B Vision (fastest/auto) or Llama 3.2 90B Vision (quality mode). You can also specify a vision model directly via the model parameter.
Content parts format
- type (string, required): Either text or image_url.
- text (string): Text content (when type is text).
- image_url (object): Image object with a url field. Supports base64 data URLs: data:image/png;base64,...
Example — describe an image
curl -X POST https://getpellet.io/v1/chat/completions \
-H "Authorization: Bearer pk_live_your_key" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "data:image/png;base64,iVBORw0KGgo..."
}
},
{
"type": "text",
"text": "What is in this image? Extract any text you see."
}
]
}
],
"max_tokens": 512
}'
Example — Python with base64 image
import base64
import requests
# Read image and encode to base64
with open("receipt.png", "rb") as f:
b64 = base64.b64encode(f.read()).decode("utf-8")
resp = requests.post(
"https://getpellet.io/v1/chat/completions",
headers={"Authorization": "Bearer pk_live_your_key"},
json={
"messages": [{
"role": "user",
"content": [
{"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
{"type": "text", "text": "Extract all items and totals from this receipt as JSON."}
]
}],
"max_tokens": 1024
}
)
print(resp.json()["choices"][0]["message"]["content"])
POST /v1/audio/transcriptions
/v1/audio/transcriptions
OpenAI-compatible speech-to-text endpoint. Upload an audio file and receive a text transcription generated by Whisper models.
Request (multipart form)
- file (file, required): The audio file to transcribe. Max 25MB. Supported formats: mp3, wav, ogg, flac, m4a, webm.
- model (string, default: whisper-large-v3-turbo): STT model to use. Options: whisper-large-v3-turbo (fastest), whisper-large-v3 (highest quality), distil-whisper-large-v3-en (English-only, fast).
- language (string, default: auto-detect): ISO 639-1 language code (e.g. en, es, fr). Auto-detected if not specified.
- response_format (string, default: json): Response format: json (text only), verbose_json (with segments and metadata), text (plain text).
Example — cURL
curl -X POST https://getpellet.io/v1/audio/transcriptions \
  -H "Authorization: Bearer pk_live_your_key" \
  -F file=@meeting_recording.mp3 \
  -F model=whisper-large-v3-turbo \
  -F response_format=verbose_json
Example — Python
import requests
resp = requests.post(
"https://getpellet.io/v1/audio/transcriptions",
headers={"Authorization": "Bearer pk_live_your_key"},
files={"file": open("recording.mp3", "rb")},
data={"model": "whisper-large-v3-turbo"}
)
print(resp.json()["text"])
Response (verbose_json)
{
"text": "Hello, welcome to our weekly standup meeting...",
"task": "transcribe",
"language": "en",
"duration": 127.5,
"segments": [
{
"start": 0.0,
"end": 3.2,
"text": "Hello, welcome to our weekly standup meeting."
},
{
"start": 3.5,
"end": 7.1,
"text": "Let's start with updates from the engineering team."
}
],
"pellet_metadata": {
"id": "pel_a1b2c3d4e5f6g7h8",
"model": "whisper-large-v3-turbo",
"latency_ms": 1842
}
}
Available STT models:
- whisper-large-v3-turbo: fastest; multilingual.
- whisper-large-v3: highest accuracy; multilingual.
- distil-whisper-large-v3-en: English-only; very fast.
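The segments array from a verbose_json response can be post-processed directly; for example, a minimal sketch converting it to SRT subtitle text:

```python
def segments_to_srt(segments: list[dict]) -> str:
    """Convert verbose_json segments into SRT subtitle text (sketch)."""
    def ts(seconds: float) -> str:
        # SRT timestamps look like 00:01:02,345 (hours:minutes:seconds,ms).
        ms = int(round(seconds * 1000))
        h, rem = divmod(ms, 3600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(f"{i}\n{ts(seg['start'])} --> {ts(seg['end'])}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```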
Streaming
Set stream: true to receive responses as server-sent events (SSE). Works with both text and vision requests. Each event contains a delta chunk.
curl -X POST https://getpellet.io/v1/chat/completions \
-H "Authorization: Bearer pk_live_your_key" \
-H "Content-Type: application/json" \
-d '{
"messages": [{"role": "user", "content": "Write a haiku about APIs"}],
"stream": true
}'
SSE event format
data: {"id":"pel_...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Silent"},"index":0}]}
data: {"id":"pel_...","object":"chat.completion.chunk","choices":[{"delta":{"content":" endpoints"},"index":0}]}
data: [DONE]
Example — Python with OpenAI SDK
from openai import OpenAI
client = OpenAI(
base_url="https://getpellet.io/v1",
api_key="pk_live_your_key"
)
stream = client.chat.completions.create(
model=None, # auto-route
messages=[{"role": "user", "content": "Explain REST APIs"}],
stream=True
)
for chunk in stream:
if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
Routing Modes
Control how Pellet selects models using the pellet_config object. The routing engine classifies your task (12+ types), scores complexity (1-5), and picks the optimal model.
Full pellet_config options
{
"pellet_config": {
"routing_mode": "auto",
"max_latency_ms": 500,
"max_cost_per_token": 0.0001,
"model_allowlist": ["meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo"],
"model_blocklist": ["deepseek-ai/DeepSeek-R1"]
}
}
- routing_mode (string, default: auto): One of auto, fastest, cheapest, quality.
- max_latency_ms (integer | null): Maximum acceptable latency in milliseconds. Filters out slower models.
- max_cost_per_token (float | null): Maximum cost per token. Filters out more expensive models.
- model_allowlist (array | null): Only consider these models. Overrides routing engine selection.
- model_blocklist (array | null): Exclude these models from consideration.
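Putting the options together, a sketch of a request with routing constraints (the thresholds below are illustrative values, not recommendations):

```python
import requests

# Sketch: layer pellet_config routing constraints onto a chat request.
def build_routed_payload(prompt: str, mode: str = "auto", **config) -> dict:
    return {
        "messages": [{"role": "user", "content": prompt}],
        "pellet_config": {"routing_mode": mode, **config},
    }

payload = build_routed_payload(
    "Summarize this paragraph in one sentence.",
    mode="cheapest",
    max_latency_ms=800,                      # illustrative latency ceiling
    model_blocklist=["deepseek-ai/DeepSeek-R1"],
)

def send(payload: dict, api_key: str) -> dict:
    resp = requests.post(
        "https://getpellet.io/v1/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json=payload,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```

If you use the official OpenAI Python SDK instead of raw requests, non-standard fields such as pellet_config can be passed through the SDK's extra_body parameter, which merges additional keys into the request JSON.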
Task types detected
Detected task types appear as task_type in pellet_metadata and in /v1/routing/explain responses; examples seen in this document include qa, translation, and general_qa.
POST /v1/routing/explain
/v1/routing/explainPreview what model Pellet would select without making an inference call. Useful for debugging routing decisions and understanding model selection. No tokens are consumed.
curl -X POST https://getpellet.io/v1/routing/explain \
-H "Authorization: Bearer pk_live_your_key" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "Translate this to French: Hello world"}
]
}'
Response
{
"model": "Qwen/Qwen2.5-7B-Instruct-Turbo",
"confidence": 0.87,
"task_type": "translation",
"complexity_score": 1.5,
"alternatives": [
{"model": "llama-3.1-8b-instant", "confidence": 0.68}
],
"reasoning": "Detected task type 'translation' with complexity 1.5/5.0. Selected Qwen/Qwen2.5-7B-Instruct-Turbo with confidence 0.87."
}
GET /v1/models
/v1/modelsList all available models. Returns an OpenAI-compatible model list.
curl https://getpellet.io/v1/models \
  -H "Authorization: Bearer pk_live_your_key"
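A Python equivalent, assuming the standard OpenAI list shape ({"object": "list", "data": [{"id": ...}, ...]}):

```python
import requests

def model_ids(models_response: dict) -> list[str]:
    """Extract model IDs from an OpenAI-compatible model list."""
    return [m["id"] for m in models_response.get("data", [])]

def list_models(api_key: str) -> list[str]:
    resp = requests.get(
        "https://getpellet.io/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=30,
    )
    resp.raise_for_status()
    return model_ids(resp.json())
```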
Datasets
Curate training datasets from your production logs, JSONL uploads, or manual entries. Export as JSONL for fine-tuning. Accessed via the dashboard at /dashboard/datasets or programmatically via the dashboard API (JWT auth).
Creating datasets
Three ways to populate a dataset:
- From tagged logs — select tags when creating a dataset; the backend pulls matching logged requests and converts them to training examples (up to 1,000 per creation).
- JSONL upload — drop a .jsonl or .json file (max 10MB) where each line is an OpenAI-format example: {"messages": [...]}.
- Manual entry — add examples one at a time via the dashboard row editor.
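For the JSONL upload path, a file in the expected format can be produced with a few lines of Python (a sketch):

```python
import json

# Sketch: write dataset examples to a .jsonl upload file, one
# OpenAI-format example per line.
def write_jsonl(path: str, examples: list[dict]) -> None:
    with open(path, "w", encoding="utf-8") as f:
        for ex in examples:
            f.write(json.dumps(ex, ensure_ascii=False) + "\n")

examples = [
    {"messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is machine learning?"},
        {"role": "assistant", "content": "Machine learning is..."},
    ]},
]
write_jsonl("dataset.jsonl", examples)
```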
Example format
{
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is machine learning?"},
{"role": "assistant", "content": "Machine learning is..."}
]
}
REST API endpoints
All dataset endpoints use the dashboard API with JWT authentication (obtained via OAuth login). Base path: /api/datasets.
| Method | Path | Purpose |
|---|---|---|
| POST | /api/datasets | Create (empty or from tags) |
| GET | /api/datasets | List user's datasets |
| GET | /api/datasets/:id | Get dataset details |
| PATCH | /api/datasets/:id | Update name/description |
| DELETE | /api/datasets/:id | Delete dataset + examples |
| GET | /api/datasets/:id/examples | List examples (paginated) |
| POST | /api/datasets/:id/examples/manual | Add manual example |
| POST | /api/datasets/:id/upload | Upload JSONL file |
| PATCH | /api/datasets/:id/examples/:eid | Edit example messages |
| DELETE | /api/datasets/:id/examples/:eid | Delete example |
| GET | /api/datasets/:id/export | Download as JSONL |
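As a sketch of using these endpoints programmatically, downloading a dataset export (the JWT comes from the OAuth login flow; base_url and dataset_id below are placeholders for illustration):

```python
import requests

def export_url(base_url: str, dataset_id: str) -> str:
    """Build the JSONL export URL for a dataset."""
    return f"{base_url}/api/datasets/{dataset_id}/export"

# Sketch: fetch the export with JWT auth and return the raw JSONL bytes.
def export_dataset(base_url: str, jwt: str, dataset_id: str) -> bytes:
    resp = requests.get(
        export_url(base_url, dataset_id),
        headers={"Authorization": f"Bearer {jwt}"},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.content
```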
Organizations
Pellet uses a multi-tenant organization model. Each org is a shared workspace with its own API keys, datasets, billing wallet, and usage analytics. Users can belong to multiple orgs and switch between them via the dashboard sidebar.
Roles
| Role | Permissions |
|---|---|
| Admin | Everything — members, datasets, API keys, billing, settings, delete org |
| Editor | Dataset CRUD, fine-tuning; read-only API keys; no billing access |
| Viewer | Read-only datasets, logs, usage analytics; no API keys or billing |
Key Concepts
- Every user gets a Personal workspace on signup (cannot be deleted or left)
- Create additional orgs for team collaboration via the org switcher
- Invite members by email — pending invites auto-link when the invitee signs up
- All resource endpoints are org-scoped: /api/orgs/:slug/keys, /api/orgs/:slug/datasets, etc.
- Dashboard URLs are org-scoped: /dashboard/:slug/datasets
- Orgs are soft-deleted — data is preserved and the slug is freed for reuse
Invite Flow
Admins invite members via the Members page. If the invitee already has a Pellet account, they see a banner on their dashboard with Accept/Decline buttons. If the invitee doesn't have an account yet, the invitation waits — when they sign up with the matching email, they're automatically added to the org.
Error Handling
Pellet uses standard HTTP status codes. Errors return a JSON body with an error object.
- 400 Bad request — invalid parameters or unsupported file type
- 401 Unauthorized — missing or invalid API key
- 413 File too large — exceeds 25MB limit
- 422 Validation error — invalid request body or message role
- 429 Rate limited — too many requests
- 502 Upstream error — model returned an error
- 503 Service unavailable — feature not configured
Error response format
{
"detail": {
"error": {
"message": "Rate limit exceeded. Try again in 3 seconds.",
"type": "rate_limit_error"
}
}
}
Rate Limits
Rate limits are applied per API key using a sliding window. When rate limited, the API returns a 429 status code.
| Plan | Requests / min | Requests / day |
|---|---|---|
| Free | 100 | 1,000 |
| Developer | 100 | 50,000 |
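When a 429 arrives, back off and retry. A minimal sketch (honoring a Retry-After header is an assumption here; the docs above only specify the 429 status code):

```python
import time

import requests

def backoff_schedule(max_retries: int = 5, base: float = 1.0) -> list[float]:
    """Exponential backoff delays: 1s, 2s, 4s, ..."""
    return [base * (2 ** i) for i in range(max_retries)]

# Sketch: retry a request when the API answers 429. A Retry-After header
# is honored if present (assumption -- not documented above).
def post_with_retry(url: str, headers: dict, payload: dict) -> requests.Response:
    resp = None
    for delay in backoff_schedule():
        resp = requests.post(url, headers=headers, json=payload, timeout=30)
        if resp.status_code != 429:
            return resp
        time.sleep(float(resp.headers.get("Retry-After", delay)))
    return resp  # still 429 after all retries; caller decides what to do
```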
SDK / Libraries
Use the official Pellet SDKs for typed responses, intelligent routing configuration, and Pellet-specific features like pellet_metadata and routing.explain().
Python — pellet-ai on PyPI · GitHub
pip install pellet-ai
from pellet import Pellet
client = Pellet(api_key="pk_live_your_key")
response = client.chat.completions.create(
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is machine learning?"}
],
tags=["production", "onboarding-v2"]
)
print(response.choices[0].message.content)
print(response.pellet_metadata.task_type) # e.g. "general_qa"
print(response.pellet_metadata.cost_usd) # e.g. 0.00012
print(response.pellet_metadata.tags)     # ["production", "onboarding-v2"]
Node.js / TypeScript — pellet-ai on npm · GitHub
npm install pellet-ai
import Pellet from "pellet-ai";
const client = new Pellet({ apiKey: "pk_live_your_key" });
const response = await client.chat.completions.create({
messages: [
{ role: "user", content: "What is machine learning?" }
],
tags: ["production", "onboarding-v2"],
});
console.log(response.choices[0].message.content);
console.log(response.pelletMetadata?.taskType); // e.g. "general_qa"
console.log(response.pelletMetadata?.costUsd); // e.g. 0.00012
console.log(response.pelletMetadata?.tags);   // ["production", "onboarding-v2"]
cURL
curl -X POST https://getpellet.io/v1/chat/completions \
-H "Authorization: Bearer pk_live_your_key" \
-H "Content-Type: application/json" \
-d '{
"messages": [{"role": "user", "content": "Hello!"}],
"pellet_config": {"routing_mode": "fastest"}
}'
OpenAI SDK compatible: You can also use the official OpenAI Python or Node SDK with base_url="https://getpellet.io/v1".