
Process thousands of files
Dataset: raw application logs
/workspace/shared/logs/raw/

import json
from pathlib import Path
from burla import remote_parallel_map

RAW_DIR = Path("/workspace/shared/logs/raw")
REPORT_DIR = Path("/workspace/shared/logs/reports")
FINAL_DIR = Path("/workspace/shared/logs/final")

Step 1: Build the work list
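A minimal sketch of what building the work list might look like. It assumes the raw logs are regular files with a `.log` suffix sitting directly under the raw directory; the suffix and the helper name `build_work_list` are illustrative assumptions, not something the original page specifies. The helper returns plain path strings so that each input is a simple value that is easy to hand to a worker.

```python
from pathlib import Path


def build_work_list(raw_dir: Path, pattern: str = "*.log") -> list[str]:
    """Return the files under raw_dir that match pattern (hypothetical helper)."""
    # Sorting keeps the order stable between runs, which makes reruns comparable.
    # Directories that happen to match the pattern are skipped.
    return sorted(str(p) for p in raw_dir.glob(pattern) if p.is_file())
```

With the constants defined above, a call such as `work_list = build_work_list(RAW_DIR)` would produce one entry per raw log file.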
Step 2: Process one file
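One possible shape for the per-file function, sketched under stated assumptions: the logs are plain text, each line may contain a level token such as `ERROR`, and the report is a small JSON file named after the input. The level names, the report fields, and the function name `process_one_file` are all hypothetical; the real log format is not shown on this page.

```python
import json
from collections import Counter
from pathlib import Path

LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR")  # assumed level tokens


def process_one_file(path: str, report_dir: str = "/workspace/shared/logs/reports") -> str:
    """Count lines per log level in one file and write a JSON report (sketch)."""
    counts = Counter()
    total = 0
    # errors="replace" keeps one badly encoded byte from failing the whole file.
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            total += 1
            # A line is counted under the first level in LEVELS that appears in it.
            for level in LEVELS:
                if level in line:
                    counts[level] += 1
                    break
    out_dir = Path(report_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / (Path(path).name + ".json")
    report = {"file": path, "lines": total, "levels": dict(counts)}
    out_path.write_text(json.dumps(report), encoding="utf-8")
    return str(out_path)
```

Keeping the function self-contained (one path in, one report path out, no shared state) is what lets the same function run once per file in parallel.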
Step 3: Smoke test a few files
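A smoke test here could be as simple as running the per-file function locally on a handful of inputs before fanning out to the full list. The helper below is a hypothetical sketch; `smoke_test` and its parameter names are not from the original page.

```python
from itertools import islice
from typing import Callable, Iterable


def smoke_test(func: Callable[[str], object], work_list: Iterable[str], n: int = 3) -> list:
    """Run func locally on the first n inputs and return the results (sketch)."""
    # islice avoids materializing a work list that may hold thousands of entries.
    sample = list(islice(work_list, n))
    # No try/except on purpose: any exception should surface immediately,
    # while it is still cheap to debug on a single machine.
    return [func(item) for item in sample]
```

If the sample runs cleanly, the same function and the full work list would presumably be passed to `remote_parallel_map`, which the imports above bring in from `burla`.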
Step 4: Reduce the reports
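The reduce step might look roughly like the following, assuming each per-file report is a JSON object with a `lines` count and a `levels` mapping of level name to count. That schema, the `summary.json` file name, and the function name `reduce_reports` are assumptions made for illustration.

```python
import json
from collections import Counter
from pathlib import Path


def reduce_reports(report_dir: Path, final_dir: Path) -> dict:
    """Merge per-file JSON reports into one summary file (sketch)."""
    level_totals = Counter()
    total_lines = 0
    file_count = 0
    for report_path in sorted(report_dir.glob("*.json")):
        report = json.loads(report_path.read_text(encoding="utf-8"))
        total_lines += report.get("lines", 0)
        level_totals.update(report.get("levels", {}))  # adds counts key by key
        file_count += 1
    summary = {"files": file_count, "lines": total_lines, "levels": dict(level_totals)}
    final_dir.mkdir(parents=True, exist_ok=True)
    (final_dir / "summary.json").write_text(json.dumps(summary, indent=2), encoding="utf-8")
    return summary
```

Because the reports are small, this step can usually run on a single machine; only the per-file processing needs to be spread out.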
What's the point?