bloom-filter-cache-penetration-prevention
software-engineering fail 1 turns 0s 0 tool calls
grader output
from
/tests/test.sh in the task sandbox
grader verdict: FAIL (exit 0) ray_operations ERROR ../tests/test_outputs.py::test_double_hashing_formula_verification - Mo... ERROR ../tests/test_outputs.py::test_code_documentation_formulas - ModuleNotF... ERROR ../tests/test_outputs.py::test_redis_client_reuse - ModuleNotFoundError... ERROR ../tests/test_outputs.py::test_performance_bloom_filter_efficiency - Mo... ERROR ../tests/test_outputs.py::test_bloom_filter_false_positive_behavior - M... ======================= 81 warnings, 27 errors in 1.52s ========================
system · msg 0
You are an expert terminal agent. You MUST use the provided tools to complete tasks. Use the terminal tool to run shell commands, read_file to read files, write_file to write files, search_files to search, and patch to edit files. Do NOT write out solutions as text - execute them using the tools. Always start by exploring the environment with terminal commands.
user · msg 1
You are a backend engineer building a **Bloom Filter-based cache penetration prevention system** for an e-commerce platform.
**Core Objective**: The primary goal of this task is to **reduce database roundtrips** by using Bloom Filter to pre-filter non-existent product ID requests. When a product ID is not in the Bloom Filter, the system can immediately return 404 without querying the database, significantly reducing DB load and improving system performance.
**Problem Context:**
An e-commerce platform is experiencing high database load due to cache penetration attacks:
- Redis cache is in use
- Massive requests for non-existent `productId` values
- Redis miss → DB query → No result found
- DB becomes overloaded from these unnecessary queries
**Core Requirements:**
**1. FastAPI Service Setup**
- The service must be runnable with:
```
uvicorn main:app --host 0.0.0.0 --port 8000
```
- **CRITICAL**: The ASGI application must be importable as `main:app`
- Create `/app/main.py` file containing `app = FastAPI()`
- **CRITICAL**: The database and Bloom Filter must be initialized at module load time (when the module is imported), not in startup events, to work with FastAPI's TestClient
- **Data Preparation**: The test framework will provide `/app/data/products.txt` with initial product IDs. Your code must load data from this file if it exists.
**2. Bloom Filter Implementation**
- Implement a Bloom Filter from scratch (do NOT use external Bloom Filter libraries like `pybloom`, `bloom-filter`, etc.)
- **CRITICAL Design Requirements:**
- Expected data size (n): 1,000,000 product IDs
- False Positive rate (p): Design for p ≤ 0.01 (1%)
- **You must calculate the optimal bit array size (m) and hash function count (k) using standard Bloom Filter formulas**
- **The actual false positive probability must be calculated and reported correctly using the standard formula**
- **You MUST document these calculations in code comments, showing the formulas used and the values calculated**
- **CRITICAL Constraints:**
- False Positive is ALLOWED (expected behavior)
- False Negative is ABSOLUTELY FORBIDDEN (must never occur)
- Bloom Filter must be used ONLY as a pre-filter, not as the source of truth
- **Hash Function Implementation:**
- **CRITICAL**: You MUST implement double hashing technique to efficiently generate k hash indices
- Double hashing: Use two base hash functions (h1, h2) and derive k indices as: `h1 + i * h2 mod m` for i in [0, k-1]
- This is REQUIRED for performance - computing k independent hash functions is too slow for production use
- Base hash functions: Use efficient non-cryptographic hash functions (e.g., `hashlib.md5()`, `hashlib.sha256()` with different seeds, or Python's built-in `hash()` with different seeds)
- **Hash Function Requirements**: Hash functions must be deterministic (same input always produces same output) and provide good distribution across the bit array
**3. Request Processing Flow**
The system must follow this exact flow:
```
Client Request (productId)
↓
Bloom Filter Check
↓ false → Return 404 immediately (no DB query)
↓ true
Cache (Redis) Check
↓ hit → Return cached data
↓ miss
Database Query
↓
If found: Update cache + Return data
If not found: Cache negative result (with short TTL) + Return 404
```
**Important Notes:**
- **Negative Caching**: When a product is not found in the database, cache the negative result (e.g., `cache:product:{productId}` → `null` or special marker) with a short TTL (e.g., 60-300 seconds) to prevent repeated database queries for the same non-existent keys
- This complements the Bloom Filter by handling false positives that pass through the filter
**4. Redis Backend Integration**
- Connect to Redis using `REDIS_URL` environment variable (default: `fakeredis://`)
- **CRITICAL**: The code MUST use fakeredis for all Redis operations. When `REDIS_URL=fakeredis://`, use fakeredis library. This simplifies the implementation and eliminate
… (21747 more chars) assistant · msg 2
I'll implement the Bloom Filter-based cache penetration prevention system as per your detailed requirements. Let me start by exploring the environment and then building the solution step-by-step.