GadflyII/Qwen3-Coder-Next-NVFP4 · NVFP4 · 3B/80B · parser qwen3_coder · smoke probes · live
qwen3-coder-next
Qwen3-Coder-Next: an 80B-parameter coding-agent MoE with 3B active parameters. #1 community-cited open coding agent; "DGX Spark practitioner favorite" per the NVIDIA forum. Distinct from qwen3-next-80b (instruct): this one is coder-tuned and uses the `qwen3_coder` tool-call format.
2/20 tasks passed · 10% pass rate
2 pass · 3 grader fail · 15 timeout · 0 no tool calls
vLLM metrics · 788 samples over 28m
| Metric | Value |
|---|---|
| gen tokens/s | 36.5 |
| prompt tokens/s | 1500 |
| mean TTFT | 6.83s |
| mean TPOT | — |
| peak concurrency | 8 |
| peak KV cache | — |
| total gen tokens | 61,949 |
| total requests | 174 |

| Task | Category | Status | Turns | Time | Tool calls | What the model ended with |
|---|---|---|---|---|---|---|
| broken-python | software-engineering | ✓ pass | 15 | — | 14 | Your pip installation is now working! The issue was that the pip module files were missing from your Python installation, even though the pip metadata was present. **What I did:** 1. Removed the incomplete pip installation (`pip-25.3.dist-… |
| competitive-programming-solver | CP | ✓ pass | 18 | — | 17 | The solution has been written to `/workdir/sol.csv`. The answers for the 10 test cases are: - testcase1: YES - testcase2: NO - testcase3: YES - testcase4: NO - testcase5: NO - testcase6: YES - testcase7: YES - testcase8: NO - testcase9: YE… |
| acl-permissions-inheritance | system-administration | ✗ fail | 7 | — | 6 | The shared directory structure has been created successfully: **Created `/srv/shared` with:** - Group ownership: `research` - Permissions: `2770` (rwxrws---) with setgid bit set - ACLs configured for: - Group `research`: rwx (current and… Grader output: grader verdict: FAIL (exit 0) tests/test_outputs.py::test_bob_can_create_and_alice_can_access PASSED ../tests/test_outputs.py::test_subdirectory_inheritance PASSED ../tests/test_outputs.py::test_outside_users_denied_access PASSED ../tests/test_outputs.py::test_execute_permissions_on_scripts FAILED ../tests/test_outputs.py::test_directory_exists_and_basic_permissions FAILED ../tests/test_outputs.py::test_file_permissions_inheritance - Assertio... ========================= 2 failed, 7 passed in 0.52s ========================== |
| amuse-install | scientific-computing | ⏱ timeout | 46 | 947s | 46 | Excellent! All imports work. Let me also run the test script one more time to confirm everything works: |
| anomaly-detection-ranking | general | ⏱ timeout | 11 | 1139s | 19 | The results look good. Let me verify the anomaly detection logic by checking the actual values that were predicted as anomalies: |
| api-endpoint-permission-canonicalizer | software-engineering | ⏱ timeout | 5 | 682s | 5 | Now let me create a test input file to verify the implementation works: |
| application-debug | debugging | ✗ fail | 7 | — | 10 | The analysis is complete. Both output files have been generated successfully: **Summary of Analysis:** - **4 log files** analyzed with **143 total exceptions** - **Critical Failures (ERROR+FATAL):** 143 - **Most Common Exception:** `java.u… Grader output: grader verdict: FAIL (exit 0) SED ../tests/test_outputs.py::test_critical_failures_calculation PASSED ../tests/test_outputs.py::test_stack_traces_correspond_to_top_exceptions PASSED ../tests/test_outputs.py::test_stack_traces_include_line_numbers PASSED ../tests/test_outputs.py::test_percentage_calculations PASSED ../tests/test_outputs.py::test_anti_cheat_validation FAILED ../tests/test_outputs.py::test_stack_traces_present - AssertionError: ... ========================= 1 failed, 12 passed in 0.13s ========================= |
| auth_token_race_condition | backend-engineering | ✗ fail | 10 | — | 9 | The fix has been successfully applied. The token service now properly handles concurrent requests by: 1. Acquiring a lock after validating user credentials (lines 47-51) 2. Releasing the lock in a finally block to ensure it's always releas… Grader output: grader verdict: FAIL (exit 1) Starting test suite for concurrent token refresh... Killing any existing uvicorn processes... Waiting for port 8000 to be available... Waiting for Redis to be ready... /tests/test.sh: line 27: /workspace/start.sh: No such file or directory Redis failed to start within 60 seconds |
| bandit-delayed-feedback | machine-learning | ⏱ timeout | 14 | 1117s | 14 | The delay model is correctly computed. Now let me verify the arm statistics are reasonable and check if there's anything that needs improvement: |
| bash-log-processor-fix | bash_scripting | ⏱ timeout | 20 | 1188s | 22 | Interesting - the backslash-n is being treated as literal `\n` instead of newline. This is because bash's echo builtin doesn't interpret `\n` by default unless `-e` is used. Let me use `printf` instead which is more reliable: |
| basic-message-queue | backend | ⏱ timeout | 48 | 1103s | 48 | I see - the server is starting but it's blocking. Let me try a different approach with a script: |
| battery-charging-optimization | general | ⏱ timeout | 8 | 1058s | 8 | Both approaches work but have too many moves. The key insight is that we need to stop when we reach 159 in A, but the simulation continues. Looking at the second approach, I can see it reaches A=159 at the end. Let me analyze the pattern m… |
| bloom-filter-cache-penetration-prevention | software-engineering | ⏱ timeout | 14 | 1170s | 14 | The Bloom Filter is loading from the file correctly. Now let me test the API endpoints: |
| book-portfolio-analysis | data_engineering | ⏱ timeout | 13 | 718s | 13 | Now I have a good understanding of the data. Let me write a Python script to solve this problem. The data files have JSON format issues (missing commas), so I need to handle that: |
| bracket-sequence-restoration | general | ⏱ timeout | 6 | 974s | 6 | Let me test this: |
| breast-cancer-mlflow | machine-learning | ⏱ timeout | 44 | 1172s | 44 | The processes are exiting immediately. Let me check if there's an issue with the app or if it's a different problem. |
| build-merkle-tree-cli-sha512 | software-engineering | ⏱ timeout | 33 | 1186s | 33 | Now let's test with an empty directory: |
| build-system-task-ordering | build-and-dependency-management | ⏱ timeout | 13 | 1190s | 13 | I need to fix the variable name issue: |
| california-housing-api | machine-learning | ⏱ timeout | 27 | 889s | 27 | Let me try a different approach - run the server directly in a subprocess: |
| chained-forensic-extraction_20260101_011957 | security | ⏱ timeout | 20 | 1144s | 20 | Key 0 means no XOR was applied, which is suspicious. Let me look at the raw data more carefully - maybe the encoding is different. Let me check if the "Comment\0" marker is part of the encoding or if the text after it is the encoded data: |