saricles/MiniMax-M2.7-REAP-172B-A10B-NVFP4-GB10 · NVFP4 · 10B/172B · parser minimax_m2 · smoke probes · live
minimax-m2.7-reap-saricles
saricles's REAP-pruned MiniMax-M2.7, 172B total / 10B active parameters. Advertised as "single-Spark ready" at 99 GB. Recommended by the community over the dervig variant because it retains 25% more experts and comes with a cudagraph-disable recipe baked in.
0/20 tasks passed · 0% pass rate
0 pass · 6 grader fail · 14 timeout · 0 no tool calls
| Task | Category | Status | Turns | Time | Tool calls | What the model ended with |
|---|---|---|---|---|---|---|
| acl-permissions-inheritance | system-administration | ⏱ timeout | 41 | 965s | 41 | <think>Let me try a different approach. I'll use the setfacl with the correct format for setting ACLs. The format for setting ACLs might be different. </think> |
| amuse-install | scientific-computing | ✗ fail | 9 | — | 21 | <think>Let me read the installation documentation to understand how to install AMUSE. </think> grader output: grader verdict: FAIL (exit 0) 4c45f38c825f9549d2bc941de9fdc85 Expected file hash: eb181d4c3ea88b13e6486c731f2bdd7c84c45f38c825f9549d2bc941de9fdc85 =========================== short test summary info ============================ PASSED ../tests/test_outputs.py::test_test_script_unchanged FAILED ../tests/test_outputs.py::test_virtual_environment_created - Assertion... FAILED ../tests/test_outputs.py::test_test_script_executes - AssertionError: ... ========================= 2 failed, 1 passed in 0.04s ========================== |
| anomaly-detection-ranking | general | ✗ fail | 12 | — | 11 | <think>Empty. Let me look at the images folder. </think> grader output: grader verdict: FAIL (exit 0) ============ ____________________________ test_grader_validation ____________________________ ----------------------------- Captured stdout call ----------------------------- Feedback: results.json not found Score: 0.0 ✓ Grader validation completed. Score: 0.0 written to reward.txt =========================== short test summary info ============================ PASSED ../tests/test_outputs.py::test_grader_validation ============================== 1 passed in 0.36s =============================== |
| api-endpoint-permission-canonicalizer | software-engineering | ⏱ timeout | 2 | 127s | 2 | <think>Let me check if the directory exists and what's in it. </think> |
| application-debug | debugging | ✗ fail | 5 | — | 7 | <think>I have 4 log files generated: 1. app_jon7f5.log 2. app_tirsd2.log 3. app_uepnrv.log 4. app_w7kju6.log Now I need to analyze all these files. Let me read them all to understand the content and extract Java exceptions, stack traces, l… grader output: grader verdict: FAIL (exit 0) on - FileNot... FAILED ../tests/test_outputs.py::test_stack_traces_present - FileNotFoundErro... FAILED ../tests/test_outputs.py::test_stack_traces_correspond_to_top_exceptions FAILED ../tests/test_outputs.py::test_stack_traces_include_line_numbers - Fil... FAILED ../tests/test_outputs.py::test_percentage_calculations - FileNotFoundE... FAILED ../tests/test_outputs.py::test_anti_cheat_validation - FileNotFoundErr... ============================== 13 failed in 0.16s ============================== |
| auth_token_race_condition | backend-engineering | ✗ fail | 10 | — | 9 | <think>The file has been created with the fix. Let me verify the fix is correct by checking that: 1. After checking `user_id` and `username` (lines 49-54), we acquire a lock using `await self.acquire_lock(f"refresh_lock:{refresh_token}")` … grader output: grader verdict: FAIL (exit 1) Starting test suite for concurrent token refresh... Killing any existing uvicorn processes... Waiting for port 8000 to be available... Waiting for Redis to be ready... /tests/test.sh: line 27: /workspace/start.sh: No such file or directory Redis failed to start within 60 seconds |
| bandit-delayed-feedback | machine-learning | ✗ fail | 12 | — | 11 | <think>Let me run the script again. </think> grader output: grader verdict: FAIL (exit 0) reshold ERROR ../tests/test_outputs.py::TestDelayModel::test_conversion_rate_error_threshold FAILED ../tests/test_outputs.py::TestOutputFormat::test_predictions_file_exists FAILED ../tests/test_outputs.py::TestOutputFormat::test_delay_model_file_exists FAILED ../tests/test_outputs.py::TestOutputFormat::test_arm_statistics_columns FAILED ../tests/test_outputs.py::TestArmStatistics::test_conversion_rate_rmse_threshold ==================== 4 failed, 2 passed, 14 errors in 1.94s ==================== |
| bash-log-processor-fix | bash_scripting | ⏱ timeout | 4 | 1344s | 4 | <think>Now I understand the current script. Let me read the current processed.log to see what output exists: </think> |
| basic-message-queue | backend | ⏱ timeout | 38 | 2375s | 38 | <think>Let me write a proper complete FastAPI implementation for the message queue service. </think> |
| battery-charging-optimization | general | ⏱ timeout | — | — | 0 | no assistant output |
| bloom-filter-cache-penetration-prevention | software-engineering | ⏱ timeout | 9 | 146s | 9 | <think>I need to create the main.py file. Let me now implement the complete system. I'll start by creating the main.py file with all the required components. Let me first understand the requirements: 1. FastAPI service with specific struct… |
| book-portfolio-analysis | data_engineering | ⏱ timeout | 2 | 19s | 2 | <think>Good, I can see two JSON files. Let me read them to understand the data structure. </think> |
| bracket-sequence-restoration | general | ⏱ timeout | — | — | 0 | no assistant output |
| breast-cancer-mlflow | machine-learning | ✗ fail | 12 | — | 11 | <think>The script has some issues - I need to check the actual MLflow API and write a proper script. Let me check the MLflow sklearn API. </think> grader output: grader verdict: FAIL (exit 0) eNotFoundEr... FAILED ../../tests/test_outputs.py::test_test_predictions_precision_recall - ... FAILED ../../tests/test_outputs.py::test_val_metrics_threshold - FileNotFound... FAILED ../../tests/test_outputs.py::test_val_test_performance_consistency - F... FAILED ../../tests/test_outputs.py::test_class_prediction_distribution - File... FAILED ../../tests/test_outputs.py::test_model_reproducible_from_hyperparameters ======================== 31 failed, 3 passed in 10.25s ========================= |
| broken-python | software-engineering | ⏱ timeout | 15 | 209s | 15 | <think>Let me look for the actual pip module files. </think> |
| build-merkle-tree-cli-sha512 | software-engineering | ⏱ timeout | 3 | 53s | 3 | <think>Good, I can see there's a test_data directory with some files. Let me also check the subdir to understand the nested structure. </think> |
| build-system-task-ordering | build-and-dependency-management | ⏱ timeout | 8 | 2236s | 8 | <think>There are still issues with the code. Let me fix the Kosaraju's algorithm implementation properly. </think> |
| california-housing-api | machine-learning | ⏱ timeout | 11 | 1724s | 17 | <think>I need to create a proper FastAPI application with Pydantic models for request validation. Let me rewrite this properly. </think> Let me create a proper FastAPI application with correct Pydantic models and request/response schemas. |
| chained-forensic-extraction_20260101_011957 | security | ⏱ timeout | 6 | 503s | 6 | <think>Python's `zip` module is not available. Let me try using `zipfile` module instead. </think> |
| competitive-programming-solver | CP | ⏱ timeout | 5 | 146s | 13 | <think>Let me read all test cases to understand the pattern. </think> |
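
The summary counts at the top of the page can be reproduced from the Status column. The sketch below is only an illustration: the `statuses` list is transcribed by hand from the 20 rows above, and the `summarize` helper and its key names are hypothetical, not part of the benchmark harness.

```python
from collections import Counter

# Status of each of the 20 table rows, in table order (hand-transcribed).
# "fail" means a grader fail, "timeout" means the run timed out.
statuses = [
    "timeout", "fail", "fail", "timeout", "fail", "fail", "fail",
    "timeout", "timeout", "timeout", "timeout", "timeout", "timeout",
    "fail",
    "timeout", "timeout", "timeout", "timeout", "timeout", "timeout",
]


def summarize(statuses):
    """Tally per-status counts and compute the pass rate in percent."""
    counts = Counter(statuses)
    total = len(statuses)
    passed = counts.get("pass", 0)
    # Guard against an empty run so the rate is always defined.
    pass_rate = 100.0 * passed / total if total else 0.0
    return {
        "total": total,
        "pass": passed,
        "fail": counts.get("fail", 0),
        "timeout": counts.get("timeout", 0),
        "pass_rate": pass_rate,
    }


summary = summarize(statuses)
print(
    f"{summary['pass']}/{summary['total']} tasks passed · "
    f"{summary['pass_rate']:.0f}% pass rate"
)
```

With the rows as listed, the tally agrees with the header: 6 grader fails and 14 timeouts out of 20 tasks.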