nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 · NVFP4 · 12B/120B · parser qwen3_coder · smoke probes · live

nemotron-3-super

NVIDIA's Nemotron-3-Super is a 120B-total / 12B-active MoE (a Mamba2 + attention hybrid), natively trained in NVFP4 and packaged by NVIDIA specifically to run on a single DGX Spark. Third-party agentic evals from Artificial Analysis: Terminal-Bench Hard 29%, SWE-Bench Verified 60.5%, PinchBench 85.6%.

4/20 tasks passed · 20% pass rate

4 pass · 1 grader fail · 15 timeout · 0 no tool calls
vLLM metrics · 2396 samples over 68m
gen tokens/s: 5.6
prompt tokens/s: 0
mean TTFT: -
mean TPOT: -
peak concurrency: 8
peak KV cache: -
total gen tokens: 23,027
total requests: 0
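The headline figures can be cross-checked against the totals above. The sketch below assumes the dashboard's "gen tokens/s" is simply total generated tokens divided by the 68-minute metrics window; the page does not state its exact formula, and vLLM's own rate metrics may be computed differently.

```python
# Cross-check the headline numbers from the totals shown on this page.
# Assumption: "gen tokens/s" = total generated tokens / whole 68-minute window.

status_counts = {"pass": 4, "grader fail": 1, "timeout": 15, "no tool calls": 0}
total_tasks = sum(status_counts.values())         # 4 + 1 + 15 + 0 tasks
pass_rate = status_counts["pass"] / total_tasks   # fraction of tasks passed

total_gen_tokens = 23_027
window_seconds = 68 * 60                          # 68 minutes in seconds
gen_tokens_per_s = total_gen_tokens / window_seconds

print(f"{status_counts['pass']}/{total_tasks} tasks passed, {pass_rate:.0%} pass rate")
print(f"approx. gen tokens/s: {gen_tokens_per_s:.1f}")
```

Both results agree with the page: 4/20 (20%) and roughly 5.6 generated tokens per second, which is consistent with most of the window being spent on long-running tasks that eventually timed out.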
Task · Category · Status · Turns · Time · Tool calls · What the model ended with
amuse-install · scientific-computing · ✓ pass · 60 · - · 60 · {"output": "Task completed: virtual environment created at /app/amuse/.venv and test_script.py runs successfully.", "exit_code": 0, "error": null}
basic-message-queue · backend · ✓ pass · 44 · - · 43 · The solution has been successfully created at `/workdir/solution.py`. The implementation meets all requirements: 1. **Core API endpoints**: - POST `/queues` - Creates new queues - POST `/queues/{name}/messages` - Sends messages - …
broken-python · software-engineering · ✓ pass · 23 · - · 22 · I've successfully fixed your pip installation. Here's what was done and verified: ## Problem Diagnosis - Your system Python 3.13.12 was installed but the `pip` module was missing/corrupted - Running `pip3` or `python3 -m pip` gave `ModuleN…
chained-forensic-extraction_20260101_011957 · security · ✓ pass · 33 · - · 32 · { "stage1_zip": { "first_zip_offset": 80, "second_zip_offset": 336 }, "stage2_png": { "png_text_offset": 33, "png_crc_valid": true, "png_extracted_text": "LSgsK3VvM28uKCApcRdxEW48GDExAmkMLhdpLmM5aSMscRgTI3FoETwrcT4…
acl-permissions-inheritance · system-administration · ⏱ timeout · 41 · 3566s · 41 · [tool_call] terminal({"command": "rm -rf /srv/shared"})
anomaly-detection-ranking · general · ⏱ timeout · 37 · 3218s · 37 · [tool_call] terminal({"command": "cd /workdir/data && head -10 movie_test.csv"})
api-endpoint-permission-canonicalizer · software-engineering · ⏱ timeout · 0 · - · 0 · no assistant output
application-debug · debugging · ⏱ timeout · 6 · 3090s · 6 · [tool_call] terminal({"command": "grep -n \"Exception occurred\" /app/logs/app_5ocyy3.log | head -5"})
auth_token_race_condition · backend-engineering · ✗ fail · 60 · - · 60 · {"output": "Fix applied successfully. The race condition in token_service.py has been resolved by adding proper lock acquisition and release in the refresh_tokens method.", "exit_code": 0, "error": null}
grader output (auth_token_race_condition):
grader verdict: FAIL (exit 1)
Starting test suite for concurrent token refresh...
Killing any existing uvicorn processes...
Waiting for port 8000 to be available...
Waiting for Redis to be ready...
/tests/test.sh: line 27: /workspace/start.sh: No such file or directory
Redis failed to start within 60 seconds
bandit-delayed-feedback · machine-learning · ⏱ timeout · 7 · 3224s · 7 · [tool_call] terminal({"command": "python3 -c \"import pandas as pd; print(pd.__version__)\""})
bash-log-processor-fix · bash_scripting · ⏱ timeout · 7 · 3535s · 7 · [tool_call] terminal({"command": "ls -la /workdir/data/input/"})
battery-charging-optimization · general · ⏱ timeout · 19 · 3459s · 19 · [tool_call] terminal({"command": "cd /workdir && head -5 answer.txt"})
bloom-filter-cache-penetration-prevention · software-engineering · ⏱ timeout · 28 · 3442s · 28 · [tool_call] terminal({"command": "cd /app && REDIS_URL=fakeredis:// uvicorn main:app --host 0.0.0.0 --port 8000", "background": true, "notify_on_complete": false})
book-portfolio-analysis · data_engineering · ⏱ timeout · 30 · 3483s · 30 · [tool_call] write_file({"path": "/workdir/test_fix.py", "content": "import json\nimport sys\n\ndef fix_missing_commas(text):\n lines = text.split('\\\\n')\n fixed_lines = []\n i = 0\n while i < len(lines):\n line = lines[…
bracket-sequence-restoration · general · ⏱ timeout · 0 · - · 0 · no assistant output
breast-cancer-mlflow · machine-learning · ⏱ timeout · 50 · 3480s · 50 · [tool_call] terminal({"command": "cd /app && timeout 5 python3 -m uvicorn api.app:app --host 0.0.0.0 --port 8000 2>&1", "timeout": 10})
build-merkle-tree-cli-sha512 · software-engineering · ⏱ timeout · 5 · 3566s · 5 · [tool_call] read_file({"path": "/app/merkle_cli.py", "offset": 151, "limit": 200})
build-system-task-ordering · build-and-dependency-management · ⏱ timeout · 6 · 3091s · 6 · [tool_call] write_file({"path": "/app/solution.py", "content": "import json\nimport re\nfrom collections import defaultdict, deque\n\ndef solve(lines):\n # Phase 1: Input Parsing and Validation\n blocks = [] # each block: (header_typ…
california-housing-api · machine-learning · ⏱ timeout · 51 · 3512s · 51 · [tool_call] terminal({"command": "cd /app && timeout 5 python -m uvicorn api.app:app --host 0.0.0.0 --port 8000"})
competitive-programming-solver · CP · ⏱ timeout · 36 · 3201s · 36 · [tool_call] terminal({"command": "cd /workdir"})