
hermes-4-70b

Nous Research's Hermes-4 flagship: a dense 70B model built on the Llama-3.1 base. vLLM's `hermes` tool parser was written for this model's tool-call format, so there is no parser mismatch. It is the primary baseline for the question "how well does a Nous-native model drive their own agent CLI?"

NousResearch/Hermes-4-70B-FP8 FP8 68 GB 70B active 70B total parser · hermes
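
To make "parser mismatch" concrete, here is a minimal sketch of what a Hermes-style tool-call parser has to do. It assumes the model emits each call as a JSON object wrapped in `<tool_call>...</tool_call>` tags. The function name `extract_tool_calls`, the sample text, and the `command` argument are hypothetical illustrations; this is not vLLM's implementation.

```python
import json
import re

# Assumption: tool calls arrive as JSON wrapped in <tool_call>...</tool_call>.
# Illustrative sketch only; not the parser that ships with vLLM.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text):
    """Return a list of (name, arguments) pairs found in model output."""
    calls = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip blocks that are not valid JSON
        if not isinstance(payload, dict):
            continue  # skip JSON that is not an object
        calls.append((payload.get("name"), payload.get("arguments", {})))
    return calls


# Hypothetical model output; the argument schema is made up for the example.
sample = (
    "Let me check.\n"
    "<tool_call>\n"
    '{"name": "terminal", "arguments": {"command": "ls /tmp"}}\n'
    "</tool_call>"
)
print(extract_tool_calls(sample))
```

A model that wraps its calls differently would yield an empty list here, which is the kind of mismatch this baseline avoids.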

Transcripts

simple

Reply with exactly the five words: hello from hermes on spark.
hello from hermes on spark
category · trivial elapsed · 59.46s exit · 0

math

What is 127 times 49? Answer with just the number.
⚠️ Reached maximum iterations (2). Requesting summary...
I have successfully calculated that 127 multiplied by 49 equals 6,223. This was accomplished by using the terminal tool to execute a simple arithmetic expression in bash.
category · reasoning elapsed · 56.37s exit · 0
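
The reported value can be checked independently of the transcript. The snippet below is a small sanity check, not part of the run; it verifies only the number, not whether the reply met the "just the number" instruction.

```python
# Independent check of the arithmetic reported in the transcript.
product = 127 * 49

# Same value via the shortcut 127 * 50 - 127 = 6350 - 127.
shortcut = 127 * 50 - 127

print(product, product == shortcut)  # prints: 6223 True
```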

reasoning

A farmer has 17 sheep. All but 9 die. How many remain? Answer with one short sentence.
Nine sheep remain.
category · reasoning elapsed · 28.14s exit · 0

tool_ls

Use the shell tool to list files in /tmp. Tell me only how many there are.
There are 453 files in the /tmp directory.
category · tool-use elapsed · 19.35s exit · 0
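
The count of 453 is specific to the host at the time of the run and cannot be reproduced elsewhere. As a rough sketch of how such a count could be obtained, the hypothetical helper `count_visible_entries` below counts the entries that a plain `ls` would list (files and directories, excluding dotfiles). The command the model actually ran is not shown in the transcript, so this is an assumption about the method.

```python
import os
import tempfile


def count_visible_entries(path):
    """Count entries whose names do not start with a dot, as plain `ls` lists them."""
    return sum(1 for name in os.listdir(path) if not name.startswith("."))


# Demo on a throwaway directory so the result does not depend on the host's /tmp.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("a.txt", "b.txt", "c.log", ".hidden"):
        open(os.path.join(demo_dir, name), "w").close()
    demo_count = count_visible_entries(demo_dir)

print(demo_count)  # prints: 3
```

Calling `count_visible_entries("/tmp")` on the test machine would give the comparable figure.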

code

Write a one-line Python expression that returns the sum of squares from 1 to 10.
The sum of squares from 1 to 10 is 385.
category · coding elapsed · 52.56s exit · 0
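
For reference, one expression that satisfies the prompt is shown below. It is not taken from the transcript, which reports only the value; the closed form n(n+1)(2n+1)/6 with n = 10 is included as a cross-check.

```python
# One-line expression: sum of squares of the integers 1 through 10.
sum_of_squares = sum(i * i for i in range(1, 11))

# Cross-check with the closed form n(n+1)(2n+1)/6 for n = 10.
closed_form = 10 * 11 * 21 // 6

print(sum_of_squares, closed_form)  # prints: 385 385
```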