Qwen/Qwen3.6-27B-FP8
FP8 · 28 GB · 27B active · 27B total · parser: qwen3_coder
The Qwen team's flagship dense 27 B model (April 2026): a Gated DeltaNet hybrid with
one Gated Attention layer every 4 blocks, native 262 K ctx (1 M via YaRN). Marketed
as approaching Claude Opus 4.5 on agentic coding while running on a single
consumer-class node. Official Qwen FP8 build (block-128 quant); it uses the same
`qwen3_coder` tool-call parser as our other Qwen3.x runs.
avg score: 0.0/5 · 48.3s total
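A minimal sketch of what "Gated Attention every 4 blocks" implies for the layer stack, assuming a 3:1 pattern in which the 4th layer of each group is attention and the other three are Gated DeltaNet (the Qwen3-Next layout). The 8-layer stack is a hypothetical toy size, not the real 27 B config, and `hybrid_layer_schedule` is an illustrative helper, not a Qwen API.

```python
def hybrid_layer_schedule(n_layers: int, attn_every: int = 4) -> list[str]:
    """Return the mixer type of each layer in a DeltaNet/attention hybrid.

    Assumption: every `attn_every`-th layer (1-indexed) is Gated Attention;
    all other layers are Gated DeltaNet (linear attention, fixed-size state).
    """
    return [
        "gated_attention" if (i + 1) % attn_every == 0 else "gated_deltanet"
        for i in range(n_layers)
    ]


layers = hybrid_layer_schedule(8)  # hypothetical toy depth
print(layers)
# Only the attention layers keep a KV cache that grows with context length.
print(layers.count("gated_attention"), "of", len(layers), "layers hold a growing KV cache")
```

Because the DeltaNet layers carry a constant-size recurrent state, long-context memory cost is driven mostly by the minority of attention layers.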
mistralai/Devstral-Small-2-24B-Instruct-2512
BF16 · 52 GB · 24B active · 24B total · parser: mistral
Mistral's Devstral-Small-2 (24 B dense, Apache-2.0). Co-designed with
All Hands AI for OpenHands agentic loops; the Aider community's top local pick
for multi-file refactors in 2026.
avg score: 0.0/5 · 10.9s total
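The GB figures on the spec lines roughly track total parameters × bits per weight. A back-of-envelope sketch, assuming the per-format bit costs in `BITS_PER_WEIGHT` below (scale overheads simplified, weights only). Listed sizes can run higher than the estimate, likely because embeddings, norms, and some layers often stay in higher precision and checkpoints carry metadata.

```python
# Rough per-weight storage cost in bits (assumptions, not exact checkpoint layouts):
#   bf16 / fp8: no scale overhead counted
#   nvfp4: 4-bit E2M1 value + one 8-bit E4M3 scale per 16 weights
#   w4a16: 4-bit weight + one 16-bit scale per 128-weight group (zero points ignored)
BITS_PER_WEIGHT = {
    "bf16": 16.0,
    "fp8": 8.0,
    "nvfp4": 4.0 + 8.0 / 16,
    "w4a16": 4.0 + 16.0 / 128,
}


def weight_gb(total_params: float, fmt: str) -> float:
    """Weights-only footprint in GB (1e9 bytes); ignores KV cache and activations."""
    return total_params * BITS_PER_WEIGHT[fmt] / 8 / 1e9


# (name, total params, format, size listed on the card in GB)
ENTRIES = [
    ("Devstral-Small-2-24B", 24e9, "bf16", 52),
    ("Qwen3-Next-80B-A3B NVFP4", 80e9, "nvfp4", 45),
    ("Hermes-4.3-36B NVFP4", 36e9, "nvfp4", 21),
    ("MiniMax-M2.7-REAP W4A16", 172e9, "w4a16", 92),
]

for name, params, fmt, listed in ENTRIES:
    print(f"{name:28s} est {weight_gb(params, fmt):6.1f} GB   listed {listed} GB")
```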
Firworks/GLM-4.5-Air-nvfp4
NVFP4 · 58 GB · 12B active · 106B total · parser: glm45
Zhipu AI's GLM-4.5-Air, a 106 B total / 12 B active MoE and a community favorite for
coding-agent loops and Claude-Code-style work. First model in our set with a
beefy active-parameter budget.
avg score: 0.0/5 · 123.2s total
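Why the active-parameter budget matters more than the total for speed: every expert must sit in memory, but each generated token only touches the active weights. A rough sketch using the common rule of thumb of ~2 FLOPs per active weight per token, and assuming batch-1 decode is bound by weight reads. `ASSUMED_BW` (273 GB/s) is a hypothetical bandwidth figure; substitute your own hardware's number. KV-cache reads and routing overhead are ignored.

```python
def moe_decode_estimates(active_params: float, total_params: float,
                         bits_per_weight: float, mem_bw_bytes_per_s: float) -> dict:
    """Back-of-envelope per-token costs for MoE decoding at batch size 1."""
    bytes_per_token = active_params * bits_per_weight / 8
    return {
        "active_fraction": active_params / total_params,
        # ~2 FLOPs (one multiply, one add) per active weight per generated token
        "flops_per_token": 2 * active_params,
        # each active weight is read roughly once per decode step
        "weight_bytes_per_token": bytes_per_token,
        # upper bound on tokens/s if decode is purely weight-bandwidth bound
        "max_tokens_per_s": mem_bw_bytes_per_s / bytes_per_token,
    }


ASSUMED_BW = 273e9  # hypothetical memory bandwidth, bytes/s
for name, active, total in [("GLM-4.5-Air", 12e9, 106e9), ("Qwen3-Next-80B-A3B", 3e9, 80e9)]:
    est = moe_decode_estimates(active, total, 4.5, ASSUMED_BW)  # 4.5 bits ~ NVFP4
    print(f"{name}: {est['active_fraction']:.1%} active, "
          f"{est['weight_bytes_per_token'] / 1e9:.2f} GB read/token, "
          f"<= {est['max_tokens_per_s']:.0f} tok/s")
```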
Qwen/Qwen3-Coder-Next-FP8
FP8 · 80 GB · 3B active · 80B total · parser: qwen3_coder
Qwen's official FP8 build of Qwen3-Coder-Next, run as the A/B partner to the NVFP4
variant (GadflyII/Qwen3-Coder-Next-NVFP4): FP8 kernels are better tested on SM120 than NVFP4 ones.
avg score: 0.0/5 · 129.1s total
nvidia/Qwen3-Next-80B-A3B-Instruct-NVFP4
NVFP4 · 45 GB · 3B active · 80B total · parser: hermes
Qwen3-Next MoE, 3 B active / 80 B total. Community default for reliable tool-calling;
NVIDIA's NVFP4 quant is Blackwell-native.
avg score: 5.0/5 · 97.9s total
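What "NVIDIA's NVFP4 quant" stores, in simplified form: weights are split into 16-value blocks, each block gets one scale, and each value becomes a 4-bit E2M1 number whose magnitudes are limited to 0, 0.5, 1, 1.5, 2, 3, 4 and 6. This sketch keeps the block scale as a Python float; real NVFP4 stores it as FP8 E4M3 and adds a per-tensor FP32 scale, both omitted here. The function names are illustrative only.

```python
# Magnitudes representable by E2M1 (FP4); a separate sign bit covers negatives.
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16


def quantize_block(block: list[float]) -> tuple[list[float], float]:
    """Simplified NVFP4-style quantization of one block of at most 16 values."""
    assert len(block) <= BLOCK_SIZE
    amax = max(abs(x) for x in block)
    scale = amax / 6.0 if amax > 0 else 1.0  # map the largest magnitude onto 6.0
    codes = []
    for x in block:
        # snap |x| / scale to the nearest representable E2M1 magnitude
        mag = min(E2M1_MAGNITUDES, key=lambda m: abs(abs(x) / scale - m))
        codes.append(-mag if x < 0 else mag)
    return codes, scale


def dequantize_block(codes: list[float], scale: float) -> list[float]:
    return [c * scale for c in codes]


codes, scale = quantize_block([12.0, 3.1, -0.9, 0.2])
print(codes, scale, dequantize_block(codes, scale))
```

Small values inside a block with one large outlier collapse toward zero, which is why the block size is kept as small as 16.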
MJPansa/MiniMax-M2.7-REAP-172B-A10B-AutoRound-W4A16
W4A16 · 92 GB · 10B active · 172B total · parser: minimax_m2
Same 172 B / 10 B-active REAP prune as saricles's working variant, but with AutoRound
W4A16 quantisation instead of NVFP4. The 7.5 GB saving is exactly enough to clear
the hermes-agent 64K ctx floor (toy probes were blocked on the saricles build because
we had to cut ctx to 32K there). Its kernel path also differs from NVFP4, so this is a
cleaner test of "is the M2.7 REAP quality good, separate from Blackwell-native quant choices?"
avg score: 0.0/5 · 118.3s total
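W4A16 means 4-bit weights with 16-bit activations: weights are stored as small integers plus a per-group scale and zero point, then dequantized on the fly inside the matmul. The sketch below is plain round-to-nearest with asymmetric groups of 128 (assumed group size). AutoRound instead tunes the rounding and clipping with signed-gradient steps on calibration data, so treat this as the baseline it improves on, not AutoRound itself. Function names are hypothetical.

```python
def quantize_w4_groups(weights: list[float], group_size: int = 128, bits: int = 4):
    """Asymmetric round-to-nearest quantization, one (scale, zero) per group."""
    qmax = (1 << bits) - 1  # 15 for 4-bit
    groups = []
    for start in range(0, len(weights), group_size):
        g = weights[start:start + group_size]
        lo, hi = min(g), max(g)
        scale = (hi - lo) / qmax if hi > lo else 1.0
        zero = round(-lo / scale)  # integer offset so that lo maps near code 0
        q = [min(qmax, max(0, round(x / scale) + zero)) for x in g]
        groups.append((q, scale, zero))
    return groups


def dequantize_w4_groups(groups) -> list[float]:
    """Recover approximate weights; at inference this happens inside the kernel."""
    out = []
    for q, scale, zero in groups:
        out.extend((v - zero) * scale for v in q)
    return out


ws = [((-1) ** i) * (i % 7) / 10 for i in range(300)]  # toy weights
deq = dequantize_w4_groups(quantize_w4_groups(ws))
print("max abs error:", max(abs(a - b) for a, b in zip(ws, deq)))
```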
GadflyII/Qwen3-Coder-Next-NVFP4
NVFP4 · 47 GB · 3B active · 80B total · parser: qwen3_coder
Qwen3-Coder-Next: 80 B total / 3 B active coding-agent MoE. The #1 community-cited
open coding-agent model; "DGX Spark practitioner favorite" per the NVIDIA forum.
Distinct from Qwen3-Next-80B-A3B-Instruct: this one is coder-tuned and uses the
`qwen3_coder` tool-call format.
avg score: 0.0/5 · 116.0s total
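The `qwen3_coder` tool-call format is XML-style rather than JSON: each call is a `<function=NAME>` element holding `<parameter=KEY>` elements inside `<tool_call>` tags. A simplified sketch assuming that layout; `parse_qwen3_coder_calls` is hypothetical and is not vLLM's parser (no streaming, no repair of malformed output, and every value stays a string where a real parser would convert types using the tool schema).

```python
import re

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
FUNCTION_RE = re.compile(r"<function=([^>]+)>(.*?)</function>", re.DOTALL)
PARAM_RE = re.compile(r"<parameter=([^>]+)>(.*?)</parameter>", re.DOTALL)


def parse_qwen3_coder_calls(text: str) -> list[dict]:
    """Extract {"name", "arguments"} dicts from XML-style tool calls."""
    calls = []
    for block in TOOL_CALL_RE.findall(text):
        for name, body in FUNCTION_RE.findall(block):
            args = {key.strip(): value.strip("\n") for key, value in PARAM_RE.findall(body)}
            calls.append({"name": name.strip(), "arguments": args})
    return calls


sample = """I'll open the file first.
<tool_call>
<function=read_file>
<parameter=path>
src/main.py
</parameter>
</function>
</tool_call>"""
print(parse_qwen3_coder_calls(sample))
```

A model that emits this layout while the server expects another parser's format simply yields no tool calls, which is one way a parser mismatch shows up.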
Firworks/Hermes-4.3-36B-nvfp4
NVFP4 · 21 GB · 36B active · 36B total · parser: hermes
Nous Research's Hermes-4.3, built on ByteDance Seed-OSS-36B. Dense 36 B with hybrid
`<think>`/`<tool_call>` training; its output format natively matches vLLM's `hermes` parser.
avg score: 2.8/5 · 61.6s total
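Roughly what pairing with vLLM's `hermes` parser implies: Hermes-style models wrap each call as a JSON object with `name` and `arguments` inside `<tool_call>` tags, and hybrid-reasoning output may also carry a `<think>` block. A simplified sketch assuming that layout; `parse_hermes_output` is hypothetical, not vLLM's implementation, and has no streaming or error recovery.

```python
import json
import re

THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def parse_hermes_output(text: str) -> tuple[str, list[dict]]:
    """Split model output into visible content and JSON tool calls."""
    visible = THINK_RE.sub("", text)  # drop the reasoning trace
    calls = [json.loads(payload) for payload in TOOL_CALL_RE.findall(visible)]
    content = TOOL_CALL_RE.sub("", visible).strip()
    return content, calls


sample = (
    "<think>The user wants the weather.</think>Checking now.\n"
    '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Paris"}}\n</tool_call>'
)
print(parse_hermes_output(sample))
```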
NousResearch/Hermes-4-70B-FP8
FP8 · 68 GB · 70B active · 70B total · parser: hermes
Nous Research's Hermes-4 flagship, a dense 70 B model on a Llama-3.1 base. It belongs to
the model family vLLM's `hermes` tool parser was written for, so there is zero parser
mismatch. Primary baseline for "how well does a Nous-native model drive Nous's own agent CLI?"
avg score: 0.0/5 · 198.8s total
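The parser named on each card presumably corresponds to vLLM's `--tool-call-parser` option. A sketch of assembling a `vllm serve` command for this card, assuming tool calling is enabled via `--enable-auto-tool-choice` and context is set via `--max-model-len`, with the hermes-agent 64K ctx floor taken as 65,536 tokens. The helper name is hypothetical, and the exact flags used for these runs are not recorded here.

```python
import shlex


def vllm_serve_argv(model: str, tool_parser: str, max_model_len: int) -> list[str]:
    """Build an argv list for `vllm serve` with tool calling enabled."""
    return [
        "vllm", "serve", model,
        "--enable-auto-tool-choice",         # let the model decide when to call tools
        "--tool-call-parser", tool_parser,   # must match the model's tool-call format
        "--max-model-len", str(max_model_len),
    ]


argv = vllm_serve_argv("NousResearch/Hermes-4-70B-FP8", "hermes", 65536)
print(shlex.join(argv))
```

Building an argv list rather than a single string avoids shell-quoting surprises if it is later passed to `subprocess.run`.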
nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4
NVFP4 · 75 GB · 12B active · 120B total · parser: qwen3_coder
NVIDIA's Nemotron-3-Super, a 120 B total / 12 B active MoE (Mamba2 + attention hybrid).
Natively trained in NVFP4 and packaged by NVIDIA specifically for a single DGX Spark.
Third-party Artificial Analysis agentic eval: Terminal-Bench Hard 29%, SWE-Bench
Verified 60.5%, PinchBench 85.6%.
avg score: 5.0/5 · 110.5s total
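In Mamba2 + attention hybrids, only the attention layers hold a KV cache that grows with context; Mamba2 layers keep a fixed-size state. A sketch of the standard KV-cache size formula under a hypothetical config (48 layers with 8 attention, 2 KV heads, head_dim 128, 16-bit cache, 64K tokens). These numbers are illustrative, not Nemotron's real architecture.

```python
def kv_cache_bytes(tokens: int, n_attn_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """KV-cache size for standard (GQA) attention layers.

    The leading 2 counts keys and values. Recurrent layers (Mamba2, DeltaNet)
    keep a fixed-size state instead and are not included here.
    """
    return 2 * n_attn_layers * n_kv_heads * head_dim * bytes_per_elem * tokens


# Hypothetical config: 8 attention layers out of 48, versus all 48 being attention.
hybrid = kv_cache_bytes(65536, n_attn_layers=8, n_kv_heads=2, head_dim=128)
dense = kv_cache_bytes(65536, n_attn_layers=48, n_kv_heads=2, head_dim=128)
print(f"hybrid: {hybrid / 2**30:.2f} GiB, all-attention: {dense / 2**30:.2f} GiB")
```

Under these assumptions the hybrid needs one sixth of the cache, which leaves more room beside 75 GB of weights for a long context.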