saricles/MiniMax-M2.7-REAP-172B-A10B-NVFP4-GB10 · NVFP4 · 10B/172B · parser minimax_m2 · smoke probes · live

minimax-m2.7-reap-saricles

saricles's REAP-pruned MiniMax-M2.7, with 172B total / 10B active parameters. Advertised as "single-Spark ready" at 99 GB. The community recommends it over the dervig variant because it retains 25% more experts and ships with a cudagraph-disable recipe baked in.

0/20 tasks passed · 0% pass rate

0 pass · 6 grader fail · 14 timeout · 0 no tool calls
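The headline figure can be reproduced from the status counts above. The following is a minimal sketch, not the harness's actual code: the `summarize` helper and the bucket names are hypothetical, and it assumes the four buckets are mutually exclusive so that they sum to the task total.

```python
def summarize(counts):
    """Build a headline string from per-status task counts.

    `counts` maps a status label to the number of tasks in that bucket.
    Assumption: buckets are mutually exclusive, so their sum is the task total.
    """
    total = sum(counts.values())
    passed = counts.get("pass", 0)
    # Guard against an empty run so we never divide by zero.
    rate = 100.0 * passed / total if total else 0.0
    return f"{passed}/{total} tasks passed · {rate:.0f}% pass rate"


# Counts copied from the summary line on this page.
counts = {"pass": 0, "grader fail": 6, "timeout": 14, "no tool calls": 0}
summary = summarize(counts)
print(summary)
```

With these counts, 14 of the 20 tasks (70%) ended in a timeout rather than a grader verdict, so the 0% pass rate mostly reflects runs that never reached grading.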
Task · Category · Status · Turns · Time · Tool calls · What the model ended with
acl-permissions-inheritance system-administration ⏱ timeout 41 965s 41 <think>Let me try a different approach. I'll use the setfacl with the correct format for setting ACLs. The format for setting ACLs might be different. </think>
amuse-install scientific-computing ✗ fail 9 21 <think>Let me read the installation documentation to understand how to install AMUSE. </think>
grader output
grader verdict: FAIL (exit 0)
4c45f38c825f9549d2bc941de9fdc85
Expected file hash: eb181d4c3ea88b13e6486c731f2bdd7c84c45f38c825f9549d2bc941de9fdc85
=========================== short test summary info ============================
PASSED ../tests/test_outputs.py::test_test_script_unchanged
FAILED ../tests/test_outputs.py::test_virtual_environment_created - Assertion...
FAILED ../tests/test_outputs.py::test_test_script_executes - AssertionError: ...
========================= 2 failed, 1 passed in 0.04s ==========================
anomaly-detection-ranking general ✗ fail 12 11 <think>Empty. Let me look at the images folder. </think>
grader output
grader verdict: FAIL (exit 0)
============
____________________________ test_grader_validation ____________________________
----------------------------- Captured stdout call -----------------------------
Feedback: results.json not found
Score: 0.0
✓ Grader validation completed. Score: 0.0 written to reward.txt
=========================== short test summary info ============================
PASSED ../tests/test_outputs.py::test_grader_validation
============================== 1 passed in 0.36s ===============================
api-endpoint-permission-canonicalizer software-engineering ⏱ timeout 2 127s 2 <think>Let me check if the directory exists and what's in it. </think>
application-debug debugging ✗ fail 5 7 <think>I have 4 log files generated: 1. app_jon7f5.log 2. app_tirsd2.log 3. app_uepnrv.log 4. app_w7kju6.log Now I need to analyze all these files. Let me read them all to understand the content and extract Java exceptions, stack traces, l…
grader output
grader verdict: FAIL (exit 0)
on - FileNot...
FAILED ../tests/test_outputs.py::test_stack_traces_present - FileNotFoundErro...
FAILED ../tests/test_outputs.py::test_stack_traces_correspond_to_top_exceptions
FAILED ../tests/test_outputs.py::test_stack_traces_include_line_numbers - Fil...
FAILED ../tests/test_outputs.py::test_percentage_calculations - FileNotFoundE...
FAILED ../tests/test_outputs.py::test_anti_cheat_validation - FileNotFoundErr...
============================== 13 failed in 0.16s ==============================
auth_token_race_condition backend-engineering ✗ fail 10 9 <think>The file has been created with the fix. Let me verify the fix is correct by checking that: 1. After checking `user_id` and `username` (lines 49-54), we acquire a lock using `await self.acquire_lock(f"refresh_lock:{refresh_token}")` …
grader output
grader verdict: FAIL (exit 1)
Starting test suite for concurrent token refresh...
Killing any existing uvicorn processes...
Waiting for port 8000 to be available...
Waiting for Redis to be ready...
/tests/test.sh: line 27: /workspace/start.sh: No such file or directory
Redis failed to start within 60 seconds
bandit-delayed-feedback machine-learning ✗ fail 12 11 <think>Let me run the script again. </think>
grader output
grader verdict: FAIL (exit 0)
reshold
ERROR ../tests/test_outputs.py::TestDelayModel::test_conversion_rate_error_threshold
FAILED ../tests/test_outputs.py::TestOutputFormat::test_predictions_file_exists
FAILED ../tests/test_outputs.py::TestOutputFormat::test_delay_model_file_exists
FAILED ../tests/test_outputs.py::TestOutputFormat::test_arm_statistics_columns
FAILED ../tests/test_outputs.py::TestArmStatistics::test_conversion_rate_rmse_threshold
==================== 4 failed, 2 passed, 14 errors in 1.94s ====================
bash-log-processor-fix bash_scripting ⏱ timeout 4 1344s 4 <think>Now I understand the current script. Let me read the current processed.log to see what output exists: </think>
basic-message-queue backend ⏱ timeout 38 2375s 38 <think>Let me write a proper complete FastAPI implementation for the message queue service. </think>
battery-charging-optimization general ⏱ timeout 0 no assistant output
bloom-filter-cache-penetration-prevention software-engineering ⏱ timeout 9 146s 9 <think>I need to create the main.py file. Let me now implement the complete system. I'll start by creating the main.py file with all the required components. Let me first understand the requirements: 1. FastAPI service with specific struct…
book-portfolio-analysis data_engineering ⏱ timeout 2 19s 2 <think>Good, I can see two JSON files. Let me read them to understand the data structure. </think>
bracket-sequence-restoration general ⏱ timeout 0 no assistant output
breast-cancer-mlflow machine-learning ✗ fail 12 11 <think>The script has some issues - I need to check the actual MLflow API and write a proper script. Let me check the MLflow sklearn API. </think>
grader output
grader verdict: FAIL (exit 0)
eNotFoundEr...
FAILED ../../tests/test_outputs.py::test_test_predictions_precision_recall - ...
FAILED ../../tests/test_outputs.py::test_val_metrics_threshold - FileNotFound...
FAILED ../../tests/test_outputs.py::test_val_test_performance_consistency - F...
FAILED ../../tests/test_outputs.py::test_class_prediction_distribution - File...
FAILED ../../tests/test_outputs.py::test_model_reproducible_from_hyperparameters
======================== 31 failed, 3 passed in 10.25s =========================
broken-python software-engineering ⏱ timeout 15 209s 15 <think>Let me look for the actual pip module files. </think>
build-merkle-tree-cli-sha512 software-engineering ⏱ timeout 3 53s 3 <think>Good, I can see there's a test_data directory with some files. Let me also check the subdir to understand the nested structure. </think>
build-system-task-ordering build-and-dependency-management ⏱ timeout 8 2236s 8 <think>There are still issues with the code. Let me fix the Kosaraju's algorithm implementation properly. </think>
california-housing-api machine-learning ⏱ timeout 11 1724s 17 <think>I need to create a proper FastAPI application with Pydantic models for request validation. Let me rewrite this properly. </think> Let me create a proper FastAPI application with correct Pydantic models and request/response schemas.
chained-forensic-extraction_20260101_011957 security ⏱ timeout 6 503s 6 <think>Python's `zip` module is not available. Let me try using `zipfile` module instead. </think>
competitive-programming-solver CP ⏱ timeout 5 146s 13 <think>Let me read all test cases to understand the pattern. </think>