A comprehensive index of open-weight models people mention using with
hermes-agent and peer agentic CLIs (Cline, Roo-Code, Aider, OpenHands,
opencode, Goose, mini-SWE-agent, Qwen-Code). Every row is sourced. For each
entry we note where it fits on our own hardware roadmap — already
benchmarked, planned, or out-of-reach. Compiled 2026-04-22 from two parallel
research passes across GitHub issues, the NVIDIA DGX Spark forum, HuggingFace
discussions, r/LocalLLaMA, Hacker News, Unsloth/vLLM/SGLang recipe docs,
Latent.Space's top-local-models roundup, and agentic-CLI documentation.
Status legend
Where each model stands for us: does it fit on our current or planned
hardware, and at what phase of testing is it?
benchmarked
We've run the TBLite 20-task pilot against this model; the result link points to the transcripts.
planned
Has a slot on the hardware roadmap. Will run once the machine joins the rig.
candidate
Hasn't made our shortlist yet but fits our existing hardware — worth adding.
oversized
Does not fit any current or planned machine we control (single 119 GB Spark, 96 GB Dell, 238 GB 2× Spark, 512 GB Mac).
known-broken
Community reports say agentic tool-calling reliably fails, or the model architecture lacks tool support.
retired
Superseded by later versions and rarely cited in 2026 sources.
What the community converges on (and where it disagrees)
Convergence
Qwen3-Coder family is the default. Cline, Roo-Code, Aider,
opencode, Goose, Hermes Agent, and Qwen-Code docs all recommend
Qwen3-Coder-30B-A3B or Qwen3-Coder-Next first for local use. Qwen3-Coder-Next
specifically is the DGX Spark practitioner pick per the April 2026 NVIDIA
forum thread.
MoE with low active-param count (the A3B pattern) is the dominant
architecture for local agentic: Qwen3-Coder-30B-A3B,
Qwen3-Coder-Next-80B-A3B, GLM-4.7-Flash-30B-A3B, Nemotron-3-Nano-30B-A3B,
LongCat-Flash-Lite-68B-A3B, Qwen3.5-35B-A3B, Gemma-4-26B-A4B. Strongest
single trend in the corpus.
vLLM is the production default. Nearly every Hermes-agent,
OpenHands, Goose, and Roo-Code thread assumes vLLM or SGLang. Ollama is
treated as "easy mode" for smaller models; llama.cpp for Apple Silicon
fallback.
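All of these stacks expose the same OpenAI-compatible chat-completions API, which is what the agentic CLIs actually talk to. As a hedged sketch (the model name and the `run_shell` tool below are illustrative assumptions, not taken from the surveyed threads), this is roughly the payload an agentic CLI posts to a vLLM or SGLang endpoint:

```python
import json

# Illustrative payload for an OpenAI-compatible /v1/chat/completions endpoint
# as served by vLLM or SGLang. Model name and tool schema are assumptions.
payload = {
    "model": "Qwen/Qwen3-Coder-30B-A3B-Instruct",  # whatever the server loaded
    "messages": [
        {"role": "user", "content": "List the files in the repo root."},
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "run_shell",  # hypothetical tool exposed by the CLI
                "description": "Run a shell command and return stdout.",
                "parameters": {
                    "type": "object",
                    "properties": {"command": {"type": "string"}},
                    "required": ["command"],
                },
            },
        },
    ],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}

print(json.dumps(payload, indent=2))
```

Because Ollama and llama.cpp also expose OpenAI-compatible endpoints, swapping serving stacks under an agentic CLI is largely a matter of changing the base URL.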
Tool-call parsers matter more than raw model score.
Multiple Cline issues (#1828, #8130, #8365) attribute DeepSeek brokenness
to parser mismatch, not model quality. We've seen the same pattern:
wrong tool parser = 0/20 regardless of model capability.
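To make that failure mode concrete, here is a minimal sketch of what a Hermes-style tool-call parser does (vLLM selects one at serve time, e.g. `--tool-call-parser hermes` together with `--enable-auto-tool-choice`). The completion string and tool name below are invented for illustration; a parser expecting a different tag or JSON shape would extract zero calls from the same text, which is exactly the 0/20 signature:

```python
import json
import re

# Hermes-family models wrap each call in <tool_call>...</tool_call> tags
# containing one JSON object. Sample completion invented for illustration.
completion = (
    "I'll check the file first.\n"
    '<tool_call>{"name": "read_file", "arguments": {"path": "src/main.py"}}</tool_call>'
)

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def parse_hermes_tool_calls(text: str) -> list[dict]:
    """Extract the JSON payload of every <tool_call> block, in order."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            calls.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            # Parser/model mismatch usually surfaces here: the tags exist
            # but the body isn't the JSON shape this parser expects.
            continue
    return calls

print(parse_hermes_tool_calls(completion))
# prints [{'name': 'read_file', 'arguments': {'path': 'src/main.py'}}]
```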
Disagreement
GLM-4.7 vs Qwen3-Coder-Next. Z.AI benchmarks claim GLM-4.7
leads on SWE-Bench Multilingual + Terminal-Bench; Qwen blog + DGX Spark
practitioners prefer Qwen3-Coder-Next for "experience quality." Both camps
cite their own benchmarks.
DeepSeek: great model, broken agent. DeepSeek-V3.2 shows up
on leaderboards but Cline's issue tracker is full of tool-calling failures.
Community split between "parser will improve" and "move on to GLM/Qwen."
Gemma-4 reception. Google + HN call it a breakthrough
(86.4% tool calling vs Gemma-3's 6.6%); r/LocalLLaMA hasn't internalized it
yet as of April 2026 — mostly Google/Android-blog evangelism, thin
practitioner adoption.
Hermes-4.3 as driver vs Hermes-Agent as runtime. Some users
treat Hermes-4.3 as the natural driver; the Hermes Agent docs themselves
recommend Kimi-K2.6, MiniMax-M2.5, or GLM-4.7 as the driver, treating the
"Hermes" brand as the agent runtime only.
Rising vs falling
Rising: GLM-4.7 / GLM-5.1 (biggest mover in Q1 2026),
MiniMax-M2.5 / M2.7, Qwen3-Coder-Next / Qwen3.5-35B-A3B, Seed-OSS-36B
(real vLLM parser upstreamed), Hermes-4.3 (50× agentic-trace post-training
expansion), Apriel-Nemotron-15B-Thinker.
Falling: Llama 3.x as agent driver (still the
Hermes-agent README default, almost no 2026 anecdotes), DeepSeek-R1 / V3
self-hosted (known-broken tool-calling), Codestral / Magistral (displaced by
Devstral for agents), Yi-Coder / Yi-Lightning (absent from 2026 discussion),
Granite-Code (appears only in aggregator lists).
Nous Research
Model | Size | Status | Community framing | Source
NousResearch/Hermes-4-405B-FP8 | dense 405B · FP8 | planned (2× Spark) | Nous flagship; native <tool_call> tags, built-in vLLM/SGLang hermes parser. |
Based on mention frequency across the surveyed threads, the clearest gaps — models
we should be running but aren't — are:
Qwen3-Coder-Next 80B-A3B — the single most-cited open coding
agent, explicitly the DGX Spark practitioner pick, ~43 t/s FP8 on a single
Spark. Distinct from Qwen3-Next-80B-A3B-Instruct (which we have). Highest
priority.
GLM-4.7-Flash 30B-A3B — purpose-built 24 GB-class agentic
model; fits Dell Blackwell comfortably; our existing GLM-4.5-Air leader
ought to compare against it.
MiMo-V2-Flash 309B/15A — claimed #1 SWE-Bench Verified
among open weights (73.4%); fits 2× Spark.
Devstral 2 / Devstral-Small-24B — co-designed with All
Hands for OpenHands; strong tool-use pedigree. Small variant fits a single
Spark.
Gemma-4 26B — the tool-calling reliability leap makes it a
must-include lightweight baseline, even if the community hasn't
internalized it yet.
Nemotron-3-Nano-30B-A3B — same lineage as our current
TBLite leader (Nemotron-3-Super), but 4× smaller — useful to measure how
much of Super's 4/20 came from Nemotron training vs sheer parameter count.
OpenHands-LM-32B — domain-fine-tuned for agent loops;
valuable comparison to general drivers.
LongCat-Flash-Lite 68.5B/A3B — 256K ctx, longcat
parser already upstream in Hermes Agent, very little community
benchmarking yet. Potential surprise.