Stagehand Evals | Stagehand

Stagehand Agent Evals

See how your favorite models perform on various computer-use benchmarks. Compare them on accuracy, cost, and speed.

Accuracy

Rank	Model	Provider	Accuracy (%)	Cost/Task ($)	Speed (s)
1	claude-fable-5	Anthropic	90.62	0.522	530.00
2	claude-opus-5	Anthropic	85.71	0.528	220.00
3	gpt-5.6-sol	OpenAI	85.71	0.947	168.00
4	claude-opus-4-8	Anthropic	83.33	0.448	184.68
5	gpt-5.6-terra	OpenAI	78.57	0.369	170.00
6	claude-sonnet-5	Anthropic	78.57	0.427	177.00
7	gpt-5.5	OpenAI	76.19	1.275	149.18
8	gemini-3.5-flash	Google	73.81	0.097	79.80
9	grok-4.5	xAI	73.81	0.611	309.00
10	gemini-3-flash-preview	Google	71.81	0.049	112.57
11	gpt-5.6-luna	OpenAI	71.43	0.191	130.00
12	claude-opus-4-7	Anthropic	69.05	0.768	116.45
13	claude-sonnet-4-6	Anthropic	66.67	0.483	161.11
14	claude-haiku-4-5	Anthropic	61.90	0.172	182.45
15	claude-sonnet-4-5	Anthropic	59.52	0.573	215.41
16	gemini-2.5-computer-use-preview-10-2025	Google	42.86	0.084	161.20
17	gpt-5.4	OpenAI	42.86	0.397	229.29
18	gpt-5.4-mini	OpenAI	38.10	0.058	67.05
19	grok-4.3	xAI	35.71	0.096	178.00

Top performers

Top accuracy: 90.6% (claude-fable-5)
Lowest cost: $0.0489 (gemini-3-flash-preview)
Fastest speed: 67.05s (gpt-5.4-mini)
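
The three headline figures can be recomputed from the leaderboard rows. The following is a minimal sketch, not part of Stagehand: the `ROWS` list and the `top_performers` helper are hypothetical names introduced here, and the values are copied from the table above (cost in dollars per task, speed in seconds, lower is better for both).

```python
# Rows copied from the leaderboard table:
# (model, accuracy %, cost per task $, speed in seconds)
ROWS = [
    ("claude-fable-5", 90.62, 0.522, 530.00),
    ("claude-opus-5", 85.71, 0.528, 220.00),
    ("gpt-5.6-sol", 85.71, 0.947, 168.00),
    ("claude-opus-4-8", 83.33, 0.448, 184.68),
    ("gpt-5.6-terra", 78.57, 0.369, 170.00),
    ("claude-sonnet-5", 78.57, 0.427, 177.00),
    ("gpt-5.5", 76.19, 1.275, 149.18),
    ("gemini-3.5-flash", 73.81, 0.097, 79.80),
    ("grok-4.5", 73.81, 0.611, 309.00),
    ("gemini-3-flash-preview", 71.81, 0.049, 112.57),
    ("gpt-5.6-luna", 71.43, 0.191, 130.00),
    ("claude-opus-4-7", 69.05, 0.768, 116.45),
    ("claude-sonnet-4-6", 66.67, 0.483, 161.11),
    ("claude-haiku-4-5", 61.90, 0.172, 182.45),
    ("claude-sonnet-4-5", 59.52, 0.573, 215.41),
    ("gemini-2.5-computer-use-preview-10-2025", 42.86, 0.084, 161.20),
    ("gpt-5.4", 42.86, 0.397, 229.29),
    ("gpt-5.4-mini", 38.10, 0.058, 67.05),
    ("grok-4.3", 35.71, 0.096, 178.00),
]


def top_performers(rows):
    """Return the model name that leads each of the three columns."""
    best_accuracy = max(rows, key=lambda r: r[1])  # highest accuracy
    lowest_cost = min(rows, key=lambda r: r[2])    # cheapest per task
    fastest = min(rows, key=lambda r: r[3])        # shortest time per task
    return {
        "top_accuracy": best_accuracy[0],
        "lowest_cost": lowest_cost[0],
        "fastest_speed": fastest[0],
    }


SUMMARY = top_performers(ROWS)
print(SUMMARY)
```

The same pattern extends to other comparisons, for example filtering `ROWS` by a cost ceiling before taking the maximum accuracy.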

We can help you train and evaluate your models.

Get a custom evaluation of your models on Stagehand and Browserbase.

