> For the complete documentation index, see [llms.txt](https://docs.burla.dev/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.burla.dev/all-examples/data-processing-examples.md).

# Large-Scale Data Processing

Large-scale data processing examples for files, corpora, table scans, and ordinary Python data code.


| Example | Description | Page | File |
| --- | --- | --- | --- |
| Query 2.4TB Parquet in 76s | Run a DuckDB query over 1,000 Parquet files on 10,000 CPUs and combine the results. | /pages/98L1G4g9GrVTTy641yZv | /files/oFIAsVD6hNnlxdJ44ywi |
| Distill 571M Amazon reviews | Read 275 GB of JSONL with HTTP Range requests, deterministic scoring, heap reducers, and a second Unhinged Mode pass. | /pages/BWdAJuSYdRcWGvdRIoRu | /files/qRrthFNz7eVhoXp92NW9 |
| Scan 2.76B NYC taxi trips | Scan taxi and FHV month files, keep pickup counts small, and classify zones after the full scan. | /pages/pDdLW6LmaxrXMOGhY2Eh | /files/ogQ4STtDE5DOCK92IkHH |
| Map geotagged Flickr photos | Reverse-geocode public photos and build country signatures from user-written tags. | /pages/eZdCoQ5oWGW1QiFPYBzl | /files/4FhDbemo1UXKRMRN2E5j |
| Summarize 1M GitHub READMEs | Shard README Parquet by deterministic ids, score with inspectable rules, and reduce category stats. | /pages/KVTCMNqnZhZJzSFtu5mu | /files/xdBdtqWQyGYdEiQuHIk5 |
| Audit 5,000 Parquet files | Compute one QA row per object so bad shards are easy to triage. | /pages/FCMAhAAfS1i3ncq5HuHQ | /files/T5K2f8GlxT1eGkBNDF8o |
| Parallelize pandas apply | Partition by user id, keep the row function intact, and write enriched Parquet chunks. | /pages/fFrT39PFVRNoXkzIniJD | /files/S6jlzgy6Ye0nFakABUby |
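
Several of the examples listed above mention sharding by deterministic ids and reducing with heaps. The following is a minimal local sketch of that general pattern, not code taken from any of the linked pages. The function names (`shard_for`, `top_k`, `merge_top_k`), the choice of SHA-256, and the `(score, id)` record shape are all assumptions made for illustration. In a real run, each shard would presumably be processed by a separate worker rather than in one process.

```python
import hashlib
import heapq


def shard_for(record_id: str, num_shards: int) -> int:
    """Map an id to a shard with a stable hash.

    Python's built-in hash() is salted per process for strings, so it is
    not suitable when different workers must agree on the assignment.
    SHA-256 is an illustrative choice here, not a requirement.
    """
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def top_k(scored, k):
    """Return the k largest (score, id) pairs from one shard."""
    return heapq.nlargest(k, scored)


def merge_top_k(partials, k):
    """Combine per-shard top-k lists into a global top-k list."""
    return heapq.nlargest(k, (item for part in partials for item in part))


if __name__ == "__main__":
    # Hypothetical records: (score, id) pairs with a deterministic score.
    scored = [((i * 37) % 101, f"id-{i}") for i in range(200)]

    # Group records by shard; each group stands in for one worker's input.
    shards = {}
    for score, record_id in scored:
        shards.setdefault(shard_for(record_id, 4), []).append((score, record_id))

    partials = [top_k(rows, 5) for rows in shards.values()]
    print(merge_top_k(partials, 5))
```

Because every record in the global top-k is also in the top-k of its own shard, merging the small per-shard lists gives the same answer as sorting everything at once, while each worker only returns `k` rows.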