Model Comparison

14 open-source models across Groq, Together, and Fireworks — text LLMs and speech-to-text. Pellet routes each request to the best model for cost, speed, and quality.

Text models are priced per 1M tokens (input / output); speech models are priced per hour of audio.

| # | Model | Model ID | Tier | Params | Avg Score | Price | Latency | Best At |
|---|-------|----------|------|--------|-----------|-------|---------|---------|
| 1 | Whisper Large v3 | `whisper-large-v3` | Speech | 1.5B | 95% | $0.111/hr | ~500ms (Med) | Speech |
| 2 | Whisper Large v3 Turbo | `whisper-large-v3-turbo` | Speech | 809M | 90% | $0.040/hr | ~300ms (Fast) | Speech |
| 3 | DeepSeek R1 | `deepseek-ai/DeepSeek-R1` | Flagship | 685B | 89% | $0.55 / $2.19 | ~2.5s (Slow) | Reasoning, Classification, Extraction |
| 4 | Llama 3.3 70B | `llama-3.3-70b-versatile` | Flagship | 70B | 88% | $0.59 / $0.79 | ~400ms (Fast) | Formatting, Structured, Reasoning |
| 5 | Llama 3.3 70B Turbo | `meta-llama/Llama-3.3-70B-Instruct-Turbo` | Flagship | 70B | 88% | $0.59 / $0.79 | ~1.5s (Slow) | Formatting, Structured, Reasoning |
| 6 | DeepSeek V3.1 | `deepseek-ai/DeepSeek-V3.1` | Flagship | 685B | 86% | $0.50 / $1.50 | ~2.0s (Slow) | Classification, Extraction, Moderation |
| 7 | Distil Whisper v3 EN | `distil-whisper-large-v3-en` | Speech | 756M | 85% | $0.020/hr | ~200ms (Fast) | Speech |
| 8 | Mistral Small 24B | `mistralai/Mistral-Small-24B-Instruct-2501` | Power | 24B | 80% | $0.10 / $0.30 | ~1.2s (Slow) | Code Gen, Reasoning, Q&A |
| 9 | Qwen 3.5 9B | `Qwen/Qwen3.5-9B` | Mid-Range | 9B | 78% | $0.06 / $0.10 | ~900ms (Med) | Reasoning, Code Gen, Q&A |
| 10 | Mixtral 8x7B | `mistralai/Mixtral-8x7B-Instruct-v0.1` | MoE | 47B | 72% | $0.24 / $0.24 | ~1.0s (Med) | Reasoning, Content Gen, Q&A |
| 11 | Qwen 2.5 7B | `Qwen/Qwen2.5-7B-Instruct-Turbo` | Mid-Range | 7B | 72% | $0.05 / $0.08 | ~800ms (Med) | Code Gen, Reasoning, Classification |
| 12 | Llama 3.1 8B | `llama-3.1-8b-instant` | Mid-Range | 8B | 71% | $0.05 / $0.08 | ~200ms (Fast) | Formatting, Structured, Reasoning |
| 13 | Llama 3 8B Lite | `meta-llama/Meta-Llama-3-8B-Instruct-Lite` | Mid-Range | 8B | 63% | $0.05 / $0.08 | ~800ms (Med) | Reasoning, Classification, Content Gen |
| 14 | Gemma 3n E4B | `google/gemma-3n-E4B-it` | Lightweight | 4B | 61% | $0.03 / $0.06 | ~800ms (Med) | Reasoning, Classification, Q&A |
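As a worked example of the pricing units above (text models billed per 1M tokens, speech models per hour of audio), here is a minimal sketch. The helper names are illustrative, not part of any Pellet API; the figures come from the comparison data:

```python
def text_cost(input_tokens, output_tokens, in_price_per_1m, out_price_per_1m):
    """Cost of one text request, priced in dollars per 1M tokens."""
    return (input_tokens / 1e6) * in_price_per_1m + (output_tokens / 1e6) * out_price_per_1m

def speech_cost(audio_seconds, price_per_hour):
    """Cost of one transcription request, priced in dollars per hour of audio."""
    return (audio_seconds / 3600) * price_per_hour

# DeepSeek R1 ($0.55 in / $2.19 out per 1M): 2,000 tokens in, 500 tokens out
print(round(text_cost(2_000, 500, 0.55, 2.19), 6))  # 0.002195
# Whisper Large v3 ($0.111/hr): a 10-minute clip
print(round(speech_cost(600, 0.111), 4))            # 0.0185
```

At these rates, even a long reasoning request costs a fraction of a cent, which is why per-request routing on price is worthwhile at volume.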

Routing Confidence Scores

How confidently Pellet routes each task type to each model. Higher = better fit.

Legend: 90+ Excellent · 80-89 Good · 70-79 Fair · <70 Limited

| Model | Params | Classification | Code Gen | Content Gen | Extraction | Formatting | Moderation | Q&A | Reasoning | Sentiment | Speech | Structured | Summary | Translation |
|-------|--------|---------------|----------|-------------|------------|------------|------------|-----|-----------|-----------|--------|------------|---------|-------------|
| Gemma 3n E4B | 4B | 65 | 40 | 60 | 62 | 58 | 60 | 65 | 83 | 62 | 0 | 58 | 60 | 58 |
| Llama 3.1 8B | 8B | 68 | 65 | 70 | 68 | 79 | 68 | 72 | 75 | 70 | 0 | 79 | 72 | 68 |
| Qwen 2.5 7B | 7B | 74 | 85 | 72 | 70 | 68 | 70 | 74 | 78 | 70 | 0 | 68 | 70 | 68 |
| Qwen 3.5 9B | 9B | 76 | 82 | 78 | 76 | 74 | 76 | 80 | 88 | 76 | 0 | 74 | 78 | 76 |
| Llama 3 8B Lite | 8B | 65 | 60 | 65 | 62 | 62 | 62 | 65 | 68 | 62 | 0 | 62 | 65 | 62 |
| Mixtral 8x7B | 47B | 72 | 70 | 74 | 72 | 70 | 72 | 74 | 76 | 72 | 0 | 70 | 74 | 72 |
| Mistral Small 24B | 24B | 81 | 85 | 80 | 79 | 78 | 79 | 82 | 85 | 79 | 0 | 78 | 80 | 79 |
| Llama 3.3 70B | 70B | 87 | 88 | 86 | 87 | 92 | 87 | 88 | 90 | 87 | 0 | 92 | 86 | 85 |
| Llama 3.3 70B Turbo | 70B | 87 | 88 | 86 | 87 | 92 | 87 | 88 | 90 | 87 | 0 | 92 | 86 | 85 |
| DeepSeek V3.1 | 685B | 89 | 87 | 84 | 89 | 82 | 89 | 89 | 88 | 85 | 0 | 82 | 84 | 82 |
| DeepSeek R1 | 685B | 91 | 89 | 86 | 91 | 85 | 91 | 91 | 95 | 88 | 0 | 85 | 86 | 91 |
| Whisper Large v3 Turbo | 809M | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 90 | 0 | 0 | 0 |
| Whisper Large v3 | 1.5B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 95 | 0 | 0 | 0 |
| Distil Whisper v3 EN | 756M | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 85 | 0 | 0 | 0 |
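The confidence scores can be read as a routing lookup: for a given task type, pick the model with the highest score. A minimal sketch, assuming a plain dict holds the table (only a few rows shown; the `route` function name is illustrative, not Pellet's actual API):

```python
# A few rows from the confidence table: model -> {task type: score, 0-100}.
SCORES = {
    "DeepSeek R1":      {"Reasoning": 95, "Classification": 91, "Speech": 0},
    "Llama 3.1 8B":     {"Reasoning": 75, "Classification": 68, "Speech": 0},
    "Whisper Large v3": {"Reasoning": 0,  "Classification": 0,  "Speech": 95},
}

def route(task):
    """Pick the model with the highest confidence score for a task type."""
    return max(SCORES, key=lambda model: SCORES[model].get(task, 0))

print(route("Reasoning"))  # DeepSeek R1
print(route("Speech"))     # Whisper Large v3
```

A production router would also weigh the price and latency columns from the comparison table, not confidence alone.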

Start building with Pellet

$2.50 in free credits. No credit card required.

Get Started Free