GadflyII/Qwen3-Coder-Next-NVFP4 · NVFP4 · 3B/80B · parser qwen3_coder · smoke probes · live

qwen3-coder-next

Qwen3-Coder-Next: an 80B-total / 3B-active coding-agent MoE. The #1 community-cited open coding agent and a "DGX Spark practitioner favorite" per the NVIDIA forum. Distinct from qwen3-next-80b (instruct); this one is coder-tuned and uses the `qwen3_coder` tool-call format.

2/20 tasks passed · 10% pass rate

2 pass · 3 grader fail · 15 timeout · 0 no tool calls
vLLM metrics · 788 samples over 28m

- gen tokens/s: 36.5
- prompt tokens/s: 1500
- mean TTFT: 6.83s
- mean TPOT:
- peak concurrency: 8
- peak KV cache:
- total gen tokens: 61,949
- total requests: 174
Task · Category · Status · Turns · Time · Tool calls · What the model ended with
broken-python · software-engineering · ✓ pass · 15 turns · 14 tool calls · Your pip installation is now working! The issue was that the pip module files were missing from your Python installation, even though the pip metadata was present. **What I did:** 1. Removed the incomplete pip installation (`pip-25.3.dist-…
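The recovery route the summary describes can be reached from the standard library: CPython ships a bundled pip wheel via `ensurepip`, which is the usual way to restore a pip whose module files have been deleted. A minimal, side-effect-free sketch (the transcript's exact commands are truncated, so this is an assumed reconstruction):

```python
import ensurepip

# CPython bundles a pip wheel alongside the ensurepip module, so pip can
# be restored even when its installed package files are gone. Running
# `python -m ensurepip --upgrade` unpacks that bundled wheel again; here
# we only inspect the bundled version to keep the snippet read-only.
print("bundled pip version:", ensurepip.version())
```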
competitive-programming-solver · CP · ✓ pass · 18 turns · 17 tool calls · The solution has been written to `/workdir/sol.csv`. The answers for the 10 test cases are: - testcase1: YES - testcase2: NO - testcase3: YES - testcase4: NO - testcase5: NO - testcase6: YES - testcase7: YES - testcase8: NO - testcase9: YE…
acl-permissions-inheritance · system-administration · ✗ fail · 7 turns · 6 tool calls · The shared directory structure has been created successfully: **Created `/srv/shared` with:** - Group ownership: `research` - Permissions: `2770` (rwxrws---) with setgid bit set - ACLs configured for: - Group `research`: rwx (current and…
grader output
grader verdict: FAIL (exit 0)
PASSED ../tests/test_outputs.py::test_bob_can_create_and_alice_can_access
PASSED ../tests/test_outputs.py::test_subdirectory_inheritance
PASSED ../tests/test_outputs.py::test_outside_users_denied_access
PASSED ../tests/test_outputs.py::test_execute_permissions_on_scripts
FAILED ../tests/test_outputs.py::test_directory_exists_and_basic_permissions
FAILED ../tests/test_outputs.py::test_file_permissions_inheritance - Assertio...
========================= 2 failed, 7 passed in 0.52s ==========================
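The `2770` mode in the failed acl-permissions-inheritance summary decodes with Python's `stat` module as follows; this is an illustration of the mode bits only, not the task's grader:

```python
import stat

# Mode 2770: setgid bit (the leading 2) plus rwx for owner and group,
# nothing for others. On a directory, the setgid bit makes new files
# inherit the directory's group -- the inheritance behavior under test.
mode = 0o2770

assert mode & stat.S_ISGID          # setgid bit is set
assert not (mode & stat.S_IRWXO)    # "other" users get no access

# Rendered the way `ls -l` would show it for a directory:
print(stat.filemode(stat.S_IFDIR | mode))   # drwxrws---
```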
amuse-install · scientific-computing · ⏱ timeout · 46 turns · 947s · 46 tool calls · Excellent! All imports work. Let me also run the test script one more time to confirm everything works:
anomaly-detection-ranking · general · ⏱ timeout · 11 turns · 1139s · 19 tool calls · The results look good. Let me verify the anomaly detection logic by checking the actual values that were predicted as anomalies:
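For context on what "checking the values predicted as anomalies" typically involves, here is a generic z-score sketch; the task's real scoring method and data are not shown, so `zscore_anomalies`, the threshold, and the sample values are all hypothetical:

```python
from statistics import mean, pstdev

def zscore_anomalies(values, threshold=2.5):
    """Return (index, value) pairs whose |z-score| exceeds threshold."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [(i, v) for i, v in enumerate(values)
            if abs((v - mu) / sigma) > threshold]

data = [10.1, 9.8, 10.0, 10.2, 9.9, 42.0, 10.0, 9.7]
print(zscore_anomalies(data))   # only the 42.0 outlier is flagged
```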
api-endpoint-permission-canonicalizer · software-engineering · ⏱ timeout · 5 turns · 682s · 5 tool calls · Now let me create a test input file to verify the implementation works:
application-debug · debugging · ✗ fail · 7 turns · 10 tool calls · The analysis is complete. Both output files have been generated successfully: **Summary of Analysis:** - **4 log files** analyzed with **143 total exceptions** - **Critical Failures (ERROR+FATAL):** 143 - **Most Common Exception:** `java.u…
grader output
grader verdict: FAIL (exit 0)
PASSED ../tests/test_outputs.py::test_critical_failures_calculation
PASSED ../tests/test_outputs.py::test_stack_traces_correspond_to_top_exceptions
PASSED ../tests/test_outputs.py::test_stack_traces_include_line_numbers
PASSED ../tests/test_outputs.py::test_percentage_calculations
PASSED ../tests/test_outputs.py::test_anti_cheat_validation
FAILED ../tests/test_outputs.py::test_stack_traces_present - AssertionError: ...
========================= 1 failed, 12 passed in 0.13s =========================
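The kind of tally the application-debug summary reports (critical ERROR/FATAL lines, most common exception class) can be sketched with `collections.Counter`; the log lines and the regex below are hypothetical, since the task's actual log format is not shown:

```python
import re
from collections import Counter

# Hypothetical log lines standing in for the task's real files.
log_lines = [
    "2026-01-01 12:00:01 ERROR java.util.NoSuchElementException: empty queue",
    "2026-01-01 12:00:02 FATAL java.io.IOException: disk full",
    "2026-01-01 12:00:03 ERROR java.util.NoSuchElementException: empty queue",
    "2026-01-01 12:00:04 INFO  request served",
]

# Fully qualified Java exception classes: dotted packages + class name.
exc_pattern = re.compile(r"\b((?:[a-z]\w*\.)+[A-Z]\w*(?:Exception|Error))\b")

critical = [l for l in log_lines if " ERROR " in l or " FATAL " in l]
counts = Counter(m.group(1) for l in critical for m in exc_pattern.finditer(l))

print("critical failures:", len(critical))
print("most common:", counts.most_common(1))
```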
auth_token_race_condition · backend-engineering · ✗ fail · 10 turns · 9 tool calls · The fix has been successfully applied. The token service now properly handles concurrent requests by: 1. Acquiring a lock after validating user credentials (lines 47-51) 2. Releasing the lock in a finally block to ensure it's always releas…
grader output
grader verdict: FAIL (exit 1)
Starting test suite for concurrent token refresh...
Killing any existing uvicorn processes...
Waiting for port 8000 to be available...
Waiting for Redis to be ready...
/tests/test.sh: line 27: /workspace/start.sh: No such file or directory
Redis failed to start within 60 seconds
bandit-delayed-feedback · machine-learning · ⏱ timeout · 14 turns · 1117s · 14 tool calls · The delay model is correctly computed. Now let me verify the arm statistics are reasonable and check if there's anything that needs improvement:
bash-log-processor-fix · bash_scripting · ⏱ timeout · 20 turns · 1188s · 22 tool calls · Interesting - the backslash-n is being treated as literal `\n` instead of newline. This is because bash's echo builtin doesn't interpret `\n` by default unless `-e` is used. Let me use `printf` instead which is more reliable:
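The `echo` vs `printf` behavior the model ran into can be reproduced directly; this sketch assumes `bash` is on PATH:

```python
import subprocess

def bash(cmd):
    """Run a command under bash and return its stdout."""
    return subprocess.run(["bash", "-c", cmd],
                          capture_output=True, text=True).stdout

# bash's echo builtin leaves \n as two literal characters unless -e is given
print(repr(bash(r"echo 'a\nb'")))      # 'a\\nb\n'
# printf interprets the escape itself, which is why it is more reliable here
print(repr(bash(r"printf 'a\nb\n'")))  # 'a\nb\n'
```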
basic-message-queue · backend · ⏱ timeout · 48 turns · 1103s · 48 tool calls · I see - the server is starting but it's blocking. Let me try a different approach with a script:
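A common fix for "the server starts but blocks the session" is to launch it with `subprocess.Popen`, which returns immediately; the child command below is a stand-in, not the task's actual queue server:

```python
import subprocess
import sys

# Popen returns as soon as the child is spawned, unlike subprocess.run,
# which waits for the child to exit. The stand-in child prints a startup
# banner and then "serves" for a moment.
server = subprocess.Popen(
    [sys.executable, "-u", "-c",
     "import time; print('ready', flush=True); time.sleep(0.5)"],
    stdout=subprocess.PIPE, text=True)

line = server.stdout.readline().strip()  # block only for the banner
print("server said:", line)
server.terminate()
server.wait()
```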
battery-charging-optimization · general · ⏱ timeout · 8 turns · 1058s · 8 tool calls · Both approaches work but have too many moves. The key insight is that we need to stop when we reach 159 in A, but the simulation continues. Looking at the second approach, I can see it reaches A=159 at the end. Let me analyze the pattern m…
bloom-filter-cache-penetration-prevention · software-engineering · ⏱ timeout · 14 turns · 1170s · 14 tool calls · The Bloom Filter is loading from the file correctly. Now let me test the API endpoints:
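For reference, cache-penetration prevention means consulting a Bloom filter before the cache or database, so lookups for keys that were never inserted are rejected cheaply (no false negatives, tunable false positives). A minimal sketch; the task's filter file format and API are not shown:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter backed by a Python int used as a bitset."""

    def __init__(self, size_bits=1024, hashes=3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = 0

    def _positions(self, item):
        # Derive k independent positions by salting SHA-256 with an index.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means definitely absent; True means "probably present".
        return all(self.bits & (1 << pos) for pos in self._positions(item))

bf = BloomFilter()
for key in ("user:1", "user:2"):
    bf.add(key)
print(bf.might_contain("user:1"), bf.might_contain("user:999"))
```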
book-portfolio-analysis · data_engineering · ⏱ timeout · 13 turns · 718s · 13 tool calls · Now I have a good understanding of the data. Let me write a Python script to solve this problem. The data files have JSON format issues (missing commas), so I need to handle that:
bracket-sequence-restoration · general · ⏱ timeout · 6 turns · 974s · 6 tool calls · Let me test this:
breast-cancer-mlflow · machine-learning · ⏱ timeout · 44 turns · 1172s · 44 tool calls · The processes are exiting immediately. Let me check if there's an issue with the app or if it's a different problem.
build-merkle-tree-cli-sha512 · software-engineering · ⏱ timeout · 33 turns · 1186s · 33 tool calls · Now let's test with an empty directory:
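A Merkle tree over SHA-512, as the task name suggests, can be sketched as below; the odd-node and empty-input conventions are assumptions on my part, since the task's spec is not shown:

```python
import hashlib

def sha512(data: bytes) -> bytes:
    return hashlib.sha512(data).digest()

def merkle_root(leaves):
    """Root of a binary Merkle tree over leaf byte strings (SHA-512).

    Leaves are hashed, then adjacent pairs are concatenated and rehashed
    level by level; an odd node is promoted unchanged (one common
    convention -- the task's exact padding rule is an assumption here).
    """
    if not leaves:
        return sha512(b"")          # assumed convention for an empty dir
    level = [sha512(leaf) for leaf in leaves]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(sha512(level[i] + level[i + 1]))
        if len(level) % 2:
            nxt.append(level[-1])   # odd node carried up to the next level
        level = nxt
    return level[0]

root = merkle_root([b"file-a", b"file-b", b"file-c"])
print(root.hex()[:16])
```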
build-system-task-ordering · build-and-dependency-management · ⏱ timeout · 13 turns · 1190s · 13 tool calls · I need to fix the variable name issue:
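Build-system task ordering is a topological sort, which the standard library's `graphlib` handles directly. The dependency graph below is hypothetical:

```python
from graphlib import TopologicalSorter

# Hypothetical build graph: each task maps to the tasks it depends on.
deps = {
    "link":      {"compile-a", "compile-b"},
    "compile-a": {"codegen"},
    "compile-b": {"codegen"},
    "codegen":   set(),
}

# static_order yields a valid execution order and raises CycleError
# if the graph contains a dependency cycle.
order = list(TopologicalSorter(deps).static_order())
print(order)   # codegen first, link last
```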
california-housing-api · machine-learning · ⏱ timeout · 27 turns · 889s · 27 tool calls · Let me try a different approach - run the server directly in a subprocess:
chained-forensic-extraction_20260101_011957 · security · ⏱ timeout · 20 turns · 1144s · 20 tool calls · Key 0 means no XOR was applied, which is suspicious. Let me look at the raw data more carefully - maybe the encoding is different. Let me check if the "Comment\0" marker is part of the encoding or if the text after it is the encoded data:
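The probing in that last column is classic single-byte-XOR recovery: try every key and keep the one whose plaintext looks most like text. A generic brute-force sketch (the scoring heuristic and sample plaintext are illustrative, not the task's actual encoding):

```python
def xor_decode(data: bytes, key: int) -> bytes:
    return bytes(b ^ key for b in data)

def best_single_byte_key(data: bytes) -> int:
    """Pick the key whose plaintext looks most like letters and spaces."""
    def score(pt):
        return sum(b == 0x20 or 65 <= b <= 90 or 97 <= b <= 122 for b in pt)
    return max(range(256), key=lambda k: score(xor_decode(data, k)))

# Encode a sample with key 0x5A, then recover the key blind.
secret = xor_decode(b"the quick brown fox jumps over the lazy dog", 0x5A)
key = best_single_byte_key(secret)
print(hex(key), xor_decode(secret, key))   # 0x5a, original sentence
```

Note that "key 0" winning this search means the bytes already score as text, which is why the transcript treats it as suspicious.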