saricles/MiniMax-M2.7-REAP-172B-A10B-NVFP4-GB10 · NVFP4 · 10B/172B · parser minimax_m2 · smoke probes · live

minimax-m2.7-reap-saricles

saricles's REAP-pruned MiniMax-M2.7, 172B total / 10B active. Advertised as "single-Spark ready" at 99 GB. Recommended by the community over the dervig variant because it retains 25% more experts and comes with a cudagraph-disable recipe baked in.

0/20 tasks passed · 0% pass rate

0 pass · 6 grader fail · 14 timeout · 0 no tool calls

Task	Category	Status	Turns	Time	Tool calls	What the model ended with
acl-permissions-inheritance	system-administration	⏱ timeout	41	965s	41	<think>Let me try a different approach. I'll use the setfacl with the correct format for setting ACLs. The format for setting ACLs might be different. </think>
amuse-install	scientific-computing	✗ fail	9	—	21	<think>Let me read the installation documentation to understand how to install AMUSE. </think> grader output grader verdict: FAIL (exit 0) 4c45f38c825f9549d2bc941de9fdc85 Expected file hash: eb181d4c3ea88b13e6486c731f2bdd7c84c45f38c825f9549d2bc941de9fdc85 =========================== short test summary info ============================ PASSED ../tests/test_outputs.py::test_test_script_unchanged FAILED ../tests/test_outputs.py::test_virtual_environment_created - Assertion... FAILED ../tests/test_outputs.py::test_test_script_executes - AssertionError: ... ========================= 2 failed, 1 passed in 0.04s ==========================
anomaly-detection-ranking	general	✗ fail	12	—	11	<think>Empty. Let me look at the images folder. </think> grader output grader verdict: FAIL (exit 0) ============ ____________________________ test_grader_validation ____________________________ ----------------------------- Captured stdout call ----------------------------- Feedback: results.json not found Score: 0.0 ✓ Grader validation completed. Score: 0.0 written to reward.txt =========================== short test summary info ============================ PASSED ../tests/test_outputs.py::test_grader_validation ============================== 1 passed in 0.36s ===============================
api-endpoint-permission-canonicalizer	software-engineering	⏱ timeout	2	127s	2	<think>Let me check if the directory exists and what's in it. </think>
application-debug	debugging	✗ fail	5	—	7	<think>I have 4 log files generated: 1. app_jon7f5.log 2. app_tirsd2.log 3. app_uepnrv.log 4. app_w7kju6.log Now I need to analyze all these files. Let me read them all to understand the content and extract Java exceptions, stack traces, l… grader output grader verdict: FAIL (exit 0) on - FileNot... FAILED ../tests/test_outputs.py::test_stack_traces_present - FileNotFoundErro... FAILED ../tests/test_outputs.py::test_stack_traces_correspond_to_top_exceptions FAILED ../tests/test_outputs.py::test_stack_traces_include_line_numbers - Fil... FAILED ../tests/test_outputs.py::test_percentage_calculations - FileNotFoundE... FAILED ../tests/test_outputs.py::test_anti_cheat_validation - FileNotFoundErr... ============================== 13 failed in 0.16s ==============================
auth_token_race_condition	backend-engineering	✗ fail	10	—	9	<think>The file has been created with the fix. Let me verify the fix is correct by checking that: 1. After checking `user_id` and `username` (lines 49-54), we acquire a lock using `await self.acquire_lock(f"refresh_lock:{refresh_token}")` … grader output grader verdict: FAIL (exit 1) Starting test suite for concurrent token refresh... Killing any existing uvicorn processes... Waiting for port 8000 to be available... Waiting for Redis to be ready... /tests/test.sh: line 27: /workspace/start.sh: No such file or directory Redis failed to start within 60 seconds
bandit-delayed-feedback	machine-learning	✗ fail	12	—	11	<think>Let me run the script again. </think> grader output grader verdict: FAIL (exit 0) reshold ERROR ../tests/test_outputs.py::TestDelayModel::test_conversion_rate_error_threshold FAILED ../tests/test_outputs.py::TestOutputFormat::test_predictions_file_exists FAILED ../tests/test_outputs.py::TestOutputFormat::test_delay_model_file_exists FAILED ../tests/test_outputs.py::TestOutputFormat::test_arm_statistics_columns FAILED ../tests/test_outputs.py::TestArmStatistics::test_conversion_rate_rmse_threshold ==================== 4 failed, 2 passed, 14 errors in 1.94s ====================
bash-log-processor-fix	bash_scripting	⏱ timeout	4	1344s	4	<think>Now I understand the current script. Let me read the current processed.log to see what output exists: </think>
basic-message-queue	backend	⏱ timeout	38	2375s	38	<think>Let me write a proper complete FastAPI implementation for the message queue service. </think>
battery-charging-optimization	general	⏱ timeout	—	—	0	no assistant output
bloom-filter-cache-penetration-prevention	software-engineering	⏱ timeout	9	146s	9	<think>I need to create the main.py file. Let me now implement the complete system. I'll start by creating the main.py file with all the required components. Let me first understand the requirements: 1. FastAPI service with specific struct…
book-portfolio-analysis	data_engineering	⏱ timeout	2	19s	2	<think>Good, I can see two JSON files. Let me read them to understand the data structure. </think>
bracket-sequence-restoration	general	⏱ timeout	—	—	0	no assistant output
breast-cancer-mlflow	machine-learning	✗ fail	12	—	11	<think>The script has some issues - I need to check the actual MLflow API and write a proper script. Let me check the MLflow sklearn API. </think> grader output grader verdict: FAIL (exit 0) eNotFoundEr... FAILED ../../tests/test_outputs.py::test_test_predictions_precision_recall - ... FAILED ../../tests/test_outputs.py::test_val_metrics_threshold - FileNotFound... FAILED ../../tests/test_outputs.py::test_val_test_performance_consistency - F... FAILED ../../tests/test_outputs.py::test_class_prediction_distribution - File... FAILED ../../tests/test_outputs.py::test_model_reproducible_from_hyperparameters ======================== 31 failed, 3 passed in 10.25s =========================
broken-python	software-engineering	⏱ timeout	15	209s	15	<think>Let me look for the actual pip module files. </think>
build-merkle-tree-cli-sha512	software-engineering	⏱ timeout	3	53s	3	<think>Good, I can see there's a test_data directory with some files. Let me also check the subdir to understand the nested structure. </think>
build-system-task-ordering	build-and-dependency-management	⏱ timeout	8	2236s	8	<think>There are still issues with the code. Let me fix the Kosaraju's algorithm implementation properly. </think>
california-housing-api	machine-learning	⏱ timeout	11	1724s	17	<think>I need to create a proper FastAPI application with Pydantic models for request validation. Let me rewrite this properly. </think> Let me create a proper FastAPI application with correct Pydantic models and request/response schemas.
chained-forensic-extraction_20260101_011957	security	⏱ timeout	6	503s	6	<think>Python's `zip` module is not available. Let me try using `zipfile` module instead. </think>
competitive-programming-solver	CP	⏱ timeout	5	146s	13	<think>Let me read all test cases to understand the pattern. </think>
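
The summary counts above can be rechecked against the table with a short tally. This is a minimal sketch, not the harness's own code: the `STATUSES` mapping is transcribed by hand from the Status column, and the names `STATUSES` and `summarize` are hypothetical.

```python
from collections import Counter

# Status per task, transcribed by hand from the results table (20 rows).
# "fail" means the grader ran and rejected the output; "timeout" means the
# run was cut off before grading.
STATUSES = {
    "acl-permissions-inheritance": "timeout",
    "amuse-install": "fail",
    "anomaly-detection-ranking": "fail",
    "api-endpoint-permission-canonicalizer": "timeout",
    "application-debug": "fail",
    "auth_token_race_condition": "fail",
    "bandit-delayed-feedback": "fail",
    "bash-log-processor-fix": "timeout",
    "basic-message-queue": "timeout",
    "battery-charging-optimization": "timeout",
    "bloom-filter-cache-penetration-prevention": "timeout",
    "book-portfolio-analysis": "timeout",
    "bracket-sequence-restoration": "timeout",
    "breast-cancer-mlflow": "fail",
    "broken-python": "timeout",
    "build-merkle-tree-cli-sha512": "timeout",
    "build-system-task-ordering": "timeout",
    "california-housing-api": "timeout",
    "chained-forensic-extraction_20260101_011957": "timeout",
    "competitive-programming-solver": "timeout",
}


def summarize(statuses):
    """Count tasks per status and compute the pass rate as a fraction."""
    counts = Counter(statuses.values())
    total = len(statuses)
    passed = counts.get("pass", 0)
    return {
        "total": total,
        "pass": passed,
        "fail": counts.get("fail", 0),
        "timeout": counts.get("timeout", 0),
        "pass_rate": passed / total if total else 0.0,
    }


summary = summarize(STATUSES)
print(
    f"{summary['pass']}/{summary['total']} tasks passed, "
    f"{summary['pass_rate']:.0%} pass rate, "
    f"{summary['fail']} grader fail, {summary['timeout']} timeout"
)
```

Keeping the per-task mapping explicit makes it easy to spot a row whose status was transcribed wrongly, since the totals would stop matching the summary line.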