Model Comparison

14 open-source models across Groq, Together, and Fireworks — text LLMs and speech-to-text. Pellet routes each request to the best model for cost, speed, and quality.

Text models are priced per 1M tokens (input / output); speech models are priced per hour of audio.

| # | Model | Model ID | Tier | Params | Avg Score | Price | Latency | Best At |
|---|-------|----------|------|--------|-----------|-------|---------|---------|
| 1 | Whisper Large v3 | `whisper-large-v3` | Speech | 1.5B | 95% | $0.111/hr | ~500ms (Med) | Speech |
| 2 | Whisper Large v3 Turbo | `whisper-large-v3-turbo` | Speech | 809M | 90% | $0.040/hr | ~300ms (Fast) | Speech |
| 3 | DeepSeek R1 | `deepseek-ai/DeepSeek-R1` | Flagship | 685B | 89% | $0.55 / $2.19 | ~2.5s (Slow) | Reasoning, Classification, Extraction |
| 4 | Llama 3.3 70B | `llama-3.3-70b-versatile` | Flagship | 70B | 88% | $0.59 / $0.79 | ~400ms (Fast) | Formatting, Structured, Reasoning |
| 5 | Llama 3.3 70B Turbo | `meta-llama/Llama-3.3-70B-Instruct-Turbo` | Flagship | 70B | 88% | $0.59 / $0.79 | ~1.5s (Slow) | Formatting, Structured, Reasoning |
| 6 | DeepSeek V3.1 | `deepseek-ai/DeepSeek-V3.1` | Flagship | 685B | 86% | $0.50 / $1.50 | ~2.0s (Slow) | Classification, Extraction, Moderation |
| 7 | Distil Whisper v3 EN | `distil-whisper-large-v3-en` | Speech | 756M | 85% | $0.020/hr | ~200ms (Fast) | Speech |
| 8 | Mistral Small 24B | `mistralai/Mistral-Small-24B-Instruct-2501` | Power | 24B | 80% | $0.10 / $0.30 | ~1.2s (Slow) | Code Gen, Reasoning, Q&A |
| 9 | Qwen 3.5 9B | `Qwen/Qwen3.5-9B` | Mid-Range | 9B | 78% | $0.06 / $0.10 | ~900ms (Med) | Reasoning, Code Gen, Q&A |
| 10 | Mixtral 8x7B | `mistralai/Mixtral-8x7B-Instruct-v0.1` | MoE | 47B | 72% | $0.24 / $0.24 | ~1.0s (Med) | Reasoning, Content Gen, Q&A |
| 11 | Qwen 2.5 7B | `Qwen/Qwen2.5-7B-Instruct-Turbo` | Mid-Range | 7B | 72% | $0.05 / $0.08 | ~800ms (Med) | Code Gen, Reasoning, Classification |
| 12 | Llama 3.1 8B | `llama-3.1-8b-instant` | Mid-Range | 8B | 71% | $0.05 / $0.08 | ~200ms (Fast) | Formatting, Structured, Reasoning |
| 13 | Llama 3 8B Lite | `meta-llama/Meta-Llama-3-8B-Instruct-Lite` | Mid-Range | 8B | 63% | $0.05 / $0.08 | ~800ms (Med) | Reasoning, Classification, Content Gen |
| 14 | Gemma 3n E4B | `google/gemma-3n-E4B-it` | Lightweight | 4B | 61% | $0.03 / $0.06 | ~800ms (Med) | Reasoning, Classification, Q&A |
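As a worked example of the pricing units above (text models billed per 1M tokens, speech models per hour of audio), here is a minimal sketch. The helper names are illustrative, not part of any Pellet API; the figures come from the comparison data:

```python
def text_cost(input_tokens, output_tokens, in_price_per_1m, out_price_per_1m):
    """Cost of one text request, priced in dollars per 1M tokens."""
    return (input_tokens / 1e6) * in_price_per_1m + (output_tokens / 1e6) * out_price_per_1m

def speech_cost(audio_seconds, price_per_hour):
    """Cost of one transcription request, priced in dollars per hour of audio."""
    return (audio_seconds / 3600) * price_per_hour

# DeepSeek R1 ($0.55 in / $2.19 out per 1M): 2,000 tokens in, 500 tokens out
print(round(text_cost(2_000, 500, 0.55, 2.19), 6))  # 0.002195
# Whisper Large v3 ($0.111/hr): a 10-minute clip
print(round(speech_cost(600, 0.111), 4))            # 0.0185
```

At these rates, even a long reasoning request costs a fraction of a cent, which is why per-request routing on price is worthwhile at volume.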

Routing Confidence Scores

How confidently Pellet routes each task type to each model. Higher = better fit.

Legend: 90+ Excellent · 80-89 Good · 70-79 Fair · <70 Limited

| Model | Params | Classification | Code Gen | Content Gen | Extraction | Formatting | Moderation | Q&A | Reasoning | Sentiment | Speech | Structured | Summary | Translation |
|-------|--------|---------------|----------|-------------|------------|------------|------------|-----|-----------|-----------|--------|------------|---------|-------------|
| Gemma 3n E4B | 4B | 65 | 40 | 60 | 62 | 58 | 60 | 65 | 83 | 62 | 0 | 58 | 60 | 58 |
| Llama 3.1 8B | 8B | 68 | 65 | 70 | 68 | 79 | 68 | 72 | 75 | 70 | 0 | 79 | 72 | 68 |
| Qwen 2.5 7B | 7B | 74 | 85 | 72 | 70 | 68 | 70 | 74 | 78 | 70 | 0 | 68 | 70 | 68 |
| Qwen 3.5 9B | 9B | 76 | 82 | 78 | 76 | 74 | 76 | 80 | 88 | 76 | 0 | 74 | 78 | 76 |
| Llama 3 8B Lite | 8B | 65 | 60 | 65 | 62 | 62 | 62 | 65 | 68 | 62 | 0 | 62 | 65 | 62 |
| Mixtral 8x7B | 47B | 72 | 70 | 74 | 72 | 70 | 72 | 74 | 76 | 72 | 0 | 70 | 74 | 72 |
| Mistral Small 24B | 24B | 81 | 85 | 80 | 79 | 78 | 79 | 82 | 85 | 79 | 0 | 78 | 80 | 79 |
| Llama 3.3 70B | 70B | 87 | 88 | 86 | 87 | 92 | 87 | 88 | 90 | 87 | 0 | 92 | 86 | 85 |
| Llama 3.3 70B Turbo | 70B | 87 | 88 | 86 | 87 | 92 | 87 | 88 | 90 | 87 | 0 | 92 | 86 | 85 |
| DeepSeek V3.1 | 685B | 89 | 87 | 84 | 89 | 82 | 89 | 89 | 88 | 85 | 0 | 82 | 84 | 82 |
| DeepSeek R1 | 685B | 91 | 89 | 86 | 91 | 85 | 91 | 91 | 95 | 88 | 0 | 85 | 86 | 91 |
| Whisper Large v3 Turbo | 809M | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 90 | 0 | 0 | 0 |
| Whisper Large v3 | 1.5B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 95 | 0 | 0 | 0 |
| Distil Whisper v3 EN | 756M | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 85 | 0 | 0 | 0 |
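The confidence scores can be read as a routing lookup: for a given task type, pick the model with the highest score. A minimal sketch, assuming a plain dict holds the table (only a few rows shown; the `route` function name is illustrative, not Pellet's actual API):

```python
# A few rows from the confidence table: model -> {task type: score, 0-100}.
SCORES = {
    "DeepSeek R1":      {"Reasoning": 95, "Classification": 91, "Speech": 0},
    "Llama 3.1 8B":     {"Reasoning": 75, "Classification": 68, "Speech": 0},
    "Whisper Large v3": {"Reasoning": 0,  "Classification": 0,  "Speech": 95},
}

def route(task):
    """Pick the model with the highest confidence score for a task type."""
    return max(SCORES, key=lambda model: SCORES[model].get(task, 0))

print(route("Reasoning"))  # DeepSeek R1
print(route("Speech"))     # Whisper Large v3
```

A production router would also weigh the price and latency columns from the comparison table, not confidence alone.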

Start building with Pellet

$2.50 in free credits. No credit card required.

Get Started Free