nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 · NVFP4 · 12B/120B · parser qwen3_coder · smoke probes
nemotron-3-super
NVIDIA's Nemotron-3-Super: a 120B-total / 12B-active MoE with a hybrid Mamba2 + attention architecture. Trained natively in NVFP4 and packaged by NVIDIA specifically for a single DGX Spark. Third-party agentic evals from Artificial Analysis: Terminal-Bench Hard 29%, SWE-Bench Verified 60.5, PinchBench 85.6%.
4/20 tasks passed · 20% pass rate
4 pass · 1 grader fail · 15 timeout · 0 no tool calls
vLLM metrics · 2396 samples over 68m
gen tokens/s 5.6 · prompt tokens/s 0 · mean TTFT — · mean TPOT — · peak concurrency 8 · peak KV cache — · total gen tokens 23,027 · total requests 0

| Task | Category | Status | Turns | Time | Tool calls | What the model ended with |
|---|---|---|---|---|---|---|
| amuse-install | scientific-computing | ✓ pass | 60 | — | 60 | {"output": "Task completed: virtual environment created at /app/amuse/.venv and test_script.py runs successfully.", "exit_code": 0, "error": null} |
| basic-message-queue | backend | ✓ pass | 44 | — | 43 | The solution has been successfully created at `/workdir/solution.py`. The implementation meets all requirements: 1. **Core API endpoints**: - POST `/queues` - Creates new queues - POST `/queues/{name}/messages` - Sends messages - … |
| broken-python | software-engineering | ✓ pass | 23 | — | 22 | I've successfully fixed your pip installation. Here's what was done and verified: ## Problem Diagnosis - Your system Python 3.13.12 was installed but the `pip` module was missing/corrupted - Running `pip3` or `python3 -m pip` gave `ModuleN… |
| chained-forensic-extraction_20260101_011957 | security | ✓ pass | 33 | — | 32 | { "stage1_zip": { "first_zip_offset": 80, "second_zip_offset": 336 }, "stage2_png": { "png_text_offset": 33, "png_crc_valid": true, "png_extracted_text": "LSgsK3VvM28uKCApcRdxEW48GDExAmkMLhdpLmM5aSMscRgTI3FoETwrcT4… |
| acl-permissions-inheritance | system-administration | ⏱ timeout | 41 | 3566s | 41 | [tool_call] terminal({"command": "rm -rf /srv/shared"}) |
| anomaly-detection-ranking | general | ⏱ timeout | 37 | 3218s | 37 | [tool_call] terminal({"command": "cd /workdir/data && head -10 movie_test.csv"}) |
| api-endpoint-permission-canonicalizer | software-engineering | ⏱ timeout | — | — | 0 | no assistant output |
| application-debug | debugging | ⏱ timeout | 6 | 3090s | 6 | [tool_call] terminal({"command": "grep -n \"Exception occurred\" /app/logs/app_5ocyy3.log \| head -5"}) |
| auth_token_race_condition | backend-engineering | ✗ fail | 60 | — | 60 | {"output": "Fix applied successfully. The race condition in token_service.py has been resolved by adding proper lock acquisition and release in the refresh_tokens method.", "exit_code": 0, "error": null} Grader output: grader verdict: FAIL (exit 1) Starting test suite for concurrent token refresh... Killing any existing uvicorn processes... Waiting for port 8000 to be available... Waiting for Redis to be ready... /tests/test.sh: line 27: /workspace/start.sh: No such file or directory Redis failed to start within 60 seconds |
| bandit-delayed-feedback | machine-learning | ⏱ timeout | 7 | 3224s | 7 | [tool_call] terminal({"command": "python3 -c \"import pandas as pd; print(pd.__version__)\""}) |
| bash-log-processor-fix | bash_scripting | ⏱ timeout | 7 | 3535s | 7 | [tool_call] terminal({"command": "ls -la /workdir/data/input/"}) |
| battery-charging-optimization | general | ⏱ timeout | 19 | 3459s | 19 | [tool_call] terminal({"command": "cd /workdir && head -5 answer.txt"}) |
| bloom-filter-cache-penetration-prevention | software-engineering | ⏱ timeout | 28 | 3442s | 28 | [tool_call] terminal({"command": "cd /app && REDIS_URL=fakeredis:// uvicorn main:app --host 0.0.0.0 --port 8000", "background": true, "notify_on_complete": false}) |
| book-portfolio-analysis | data_engineering | ⏱ timeout | 30 | 3483s | 30 | [tool_call] write_file({"path": "/workdir/test_fix.py", "content": "import json\nimport sys\n\ndef fix_missing_commas(text):\n lines = text.split('\\\\n')\n fixed_lines = []\n i = 0\n while i < len(lines):\n line = lines[… |
| bracket-sequence-restoration | general | ⏱ timeout | — | — | 0 | no assistant output |
| breast-cancer-mlflow | machine-learning | ⏱ timeout | 50 | 3480s | 50 | [tool_call] terminal({"command": "cd /app && timeout 5 python3 -m uvicorn api.app:app --host 0.0.0.0 --port 8000 2>&1", "timeout": 10}) |
| build-merkle-tree-cli-sha512 | software-engineering | ⏱ timeout | 5 | 3566s | 5 | [tool_call] read_file({"path": "/app/merkle_cli.py", "offset": 151, "limit": 200}) |
| build-system-task-ordering | build-and-dependency-management | ⏱ timeout | 6 | 3091s | 6 | [tool_call] write_file({"path": "/app/solution.py", "content": "import json\nimport re\nfrom collections import defaultdict, deque\n\ndef solve(lines):\n # Phase 1: Input Parsing and Validation\n blocks = [] # each block: (header_typ… |
| california-housing-api | machine-learning | ⏱ timeout | 51 | 3512s | 51 | [tool_call] terminal({"command": "cd /app && timeout 5 python -m uvicorn api.app:app --host 0.0.0.0 --port 8000"}) |
| competitive-programming-solver | CP | ⏱ timeout | 36 | 3201s | 36 | [tool_call] terminal({"command": "cd /workdir"}) |
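
The headline figures can be cross-checked against each other with a few lines of arithmetic. The sketch below is only a sanity check: the numbers are copied from the summary counts and the vLLM metrics on this page, and it assumes (the page does not say so) that "gen tokens/s" is the total generated tokens averaged over the full 68-minute sampling window. The variable names are illustrative.

```python
# Figures copied from the status summary and vLLM metrics on this page.
passed, grader_fail, timeout, no_tool_calls = 4, 1, 15, 0
total_gen_tokens = 23_027
window_minutes = 68  # "over 68m"; assumed to be the whole averaging window

# Pass rate: passed tasks over all tasks in the run.
total_tasks = passed + grader_fail + timeout + no_tool_calls
pass_rate = passed / total_tasks

# Mean generation throughput under the averaging assumption above.
gen_tokens_per_s = total_gen_tokens / (window_minutes * 60)

print(f"{total_tasks} tasks, pass rate {pass_rate:.0%}")
print(f"mean generation throughput {gen_tokens_per_s:.1f} tokens/s")
```

Under that assumption the result agrees with the reported 20% pass rate and 5.6 gen tokens/s, which suggests the throughput figure is a whole-run average rather than a per-request decode speed.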