NDVI for 2,000 Sentinel-2 tiles

In this example we:

  • Process 2,000 Sentinel-2 tiles.

  • Read red and near-infrared bands from S3.

  • Compute NDVI and write per-tile GeoTIFF outputs.

  • Return a report with per-tile stats.

The first tile usually works. The full region is where missing bands, bad nodata values, CRS surprises, and requester-pays mistakes show up.

Dataset: Sentinel-2 tile ids

The input is a plain list of tile ids. Each worker owns one tile.

import io
from pathlib import Path

import boto3
import numpy as np
import pandas as pd
import rasterio
from burla import remote_parallel_map
from rasterio.io import MemoryFile

SRC_BUCKET = "sentinel-s2-l2a"
DST_BUCKET = "my-ndvi-outputs"
REPORT_PATH = Path("/workspace/shared/ndvi/ndvi_report.csv")

Step 1: Make one task per tile
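The code for this step is not shown on this page, so here is a minimal sketch of what "one task per tile" could look like. It assumes a tile id is a path-style string such as `10/S/DG/2023/6/1/0` and that the bands live under `tiles/<tile_id>/R10m/` in the source bucket. The key layout, the task field names, and the `ndvi/` output prefix are illustrative assumptions, not something the original states.

```python
SRC_BUCKET = "sentinel-s2-l2a"
DST_BUCKET = "my-ndvi-outputs"


def make_task(tile_id: str) -> dict:
    """Describe one tile's work as a small, self-contained dict."""
    # Assumed key layout: tiles/<tile_id>/R10m/<band>.jp2
    prefix = f"tiles/{tile_id}/R10m"
    return {
        "tile_id": tile_id,
        "src_bucket": SRC_BUCKET,
        "red_key": f"{prefix}/B04.jp2",  # red band, 10 m
        "nir_key": f"{prefix}/B08.jp2",  # near-infrared band, 10 m
        "dst_bucket": DST_BUCKET,
        # Hypothetical output naming: flatten the id into one file name.
        "dst_key": f"ndvi/{tile_id.replace('/', '_')}.tif",
    }


def make_tasks(tile_ids: list[str]) -> list[dict]:
    """Build exactly one task per distinct tile id, preserving input order."""
    unique_ids = list(dict.fromkeys(tile_ids))
    return [make_task(tile_id) for tile_id in unique_ids]
```

Dropping duplicate ids up front keeps the "each worker owns one tile" rule intact, so two workers never write the same output key.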

Step 2: Compute NDVI in the worker

The worker reads both bands, computes NDVI, writes a compressed GeoTIFF, and returns summary stats.
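The worker body is not shown on this page. The sketch below is one way it could be written, not the author's implementation. It assumes that 0 marks nodata in the source bands, that the task is a dict with `tile_id`, `src_bucket`, `red_key`, `nir_key`, `dst_bucket`, and `dst_key` fields (hypothetical names), and that the source bucket is requester-pays. `NODATA_OUT` is an arbitrary choice for the output nodata value. Reading JPEG 2000 bands also assumes the GDAL build behind rasterio includes a JP2 driver.

```python
import numpy as np

NODATA_IN = 0          # assumption: 0 marks missing pixels in the source bands
NODATA_OUT = -9999.0   # hypothetical nodata value for the NDVI output


def compute_ndvi(red, nir):
    """Return float32 NDVI, with NODATA_OUT wherever either band is missing."""
    if red.shape != nir.shape:
        raise ValueError(f"band shapes differ: {red.shape} vs {nir.shape}")
    red = red.astype("float32")
    nir = nir.astype("float32")
    total = nir + red
    # Skip nodata pixels and any pixel where the denominator would be zero.
    valid = (red != NODATA_IN) & (nir != NODATA_IN) & (total != 0)
    ndvi = np.full(red.shape, NODATA_OUT, dtype="float32")
    ndvi[valid] = (nir[valid] - red[valid]) / total[valid]
    return ndvi


def summarize(tile_id, ndvi):
    """Per-tile stats, small enough to return from every worker."""
    valid = ndvi[ndvi != NODATA_OUT]
    has_data = valid.size > 0
    return {
        "tile_id": tile_id,
        "valid_fraction": valid.size / max(ndvi.size, 1),
        "ndvi_mean": float(valid.mean()) if has_data else None,
        "ndvi_min": float(valid.min()) if has_data else None,
        "ndvi_max": float(valid.max()) if has_data else None,
    }


def ndvi_worker(task):
    """Read both bands, write the NDVI GeoTIFF, return stats for one tile."""
    # Imported here so the NumPy helpers above work without the cloud libraries.
    import boto3
    from rasterio.io import MemoryFile

    s3 = boto3.client("s3")

    def read_band(key):
        # RequestPayer is only needed when the source bucket is requester-pays.
        obj = s3.get_object(Bucket=task["src_bucket"], Key=key, RequestPayer="requester")
        with MemoryFile(obj["Body"].read()) as mem, mem.open() as src:
            return src.read(1), src.crs, src.transform

    red, crs, transform = read_band(task["red_key"])
    nir, nir_crs, _ = read_band(task["nir_key"])
    if crs != nir_crs:
        raise ValueError(f"CRS mismatch in {task['tile_id']}: {crs} vs {nir_crs}")

    ndvi = compute_ndvi(red, nir)
    with MemoryFile() as mem:
        with mem.open(driver="GTiff", height=ndvi.shape[0], width=ndvi.shape[1],
                      count=1, dtype="float32", crs=crs, transform=transform,
                      nodata=NODATA_OUT, compress="deflate") as dst:
            dst.write(ndvi, 1)
        s3.put_object(Bucket=task["dst_bucket"], Key=task["dst_key"], Body=mem.read())
    return summarize(task["tile_id"], ndvi)
```

Keeping the arithmetic in `compute_ndvi` and `summarize` separate from the S3 and rasterio calls means the nodata handling can be checked on small arrays without any cloud access. A missing band surfaces as an exception from `get_object` rather than as a silently empty tile.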

Step 3: Smoke test one tile

Run one tile with the same Docker image and cloud permissions you will use for the full region.
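The smoke-test code is not shown on this page. One possible shape, as a sketch: send a single task through the same map function used for the full run, then check the result before scaling up. `smoke_test`, `REQUIRED_FIELDS`, and the result field names are hypothetical. `map_fn` is assumed to take `(function, inputs)` and return the results, which is how `remote_parallel_map` is used here.

```python
REQUIRED_FIELDS = ("tile_id", "valid_fraction", "ndvi_mean")  # hypothetical schema


def smoke_test(worker, task, map_fn):
    """Run one task through map_fn and fail loudly if the result looks wrong."""
    results = list(map_fn(worker, [task]))
    if len(results) != 1:
        raise RuntimeError(f"expected 1 result, got {len(results)}")
    result = results[0]

    missing = [name for name in REQUIRED_FIELDS if name not in result]
    if missing:
        raise RuntimeError(f"result is missing fields: {missing}")
    if result["tile_id"] != task["tile_id"]:
        raise RuntimeError("result does not belong to the submitted tile")
    if not 0.0 < result["valid_fraction"] <= 1.0:
        raise RuntimeError("tile has no valid pixels; check bands and nodata")
    return result
```

Passing `remote_parallel_map` as `map_fn` would run the check on the cluster, so it exercises the image and cloud permissions rather than the local machine. A plain local map is enough to test the checks themselves.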

Step 4: Run the tiles

Each tile gets two CPUs and enough RAM for the bands.
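The call itself is not shown on this page. The sketch below assumes `remote_parallel_map` accepts `func_cpu` and `func_ram` keyword arguments, with RAM in GB. The value `func_ram=8` is an illustrative guess, not a number from the original. `estimate_band_ram_gib` is a hypothetical helper that does the rough arithmetic for a 10 m tile of 10980 x 10980 pixels, assuming two uint16 inputs, four float32 working arrays, and one boolean mask in memory at once.

```python
def estimate_band_ram_gib(side: int = 10980) -> float:
    """Rough peak array memory for one tile, in GiB (assumed array mix)."""
    pixels = side * side
    # 2 uint16 inputs (2 bytes each), 4 float32 arrays (4 bytes each), 1 bool mask.
    bytes_per_pixel = 2 * 2 + 4 * 4 + 1
    return pixels * bytes_per_pixel / 2**30


def run_tiles(worker, tasks, map_fn=None, func_cpu=2, func_ram=8):
    """Run every task with the same per-task CPU and RAM request."""
    if map_fn is None:
        # Imported lazily so the helper can be exercised with a local stand-in.
        from burla import remote_parallel_map
        map_fn = remote_parallel_map
    return list(map_fn(worker, tasks, func_cpu=func_cpu, func_ram=func_ram))
```

Under those assumptions the estimate comes out between 2 and 3 GiB per tile, so a request of a few GB more leaves headroom for temporaries and the encoded output.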

What's the point?

A pretty subset can produce a convincing map and still miss the data-quality problem.

For geospatial work, I want one task to own one tile, scene, or chip group. Keep the source reads and output writes inside the worker. Return enough stats that the report can catch suspicious tiles before they quietly enter a model.
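As a sketch of how returned stats can catch suspicious tiles, the helper below turns the per-tile results into a report and flags rows that deserve a look. The column names (`valid_fraction`, `ndvi_mean`, `ndvi_min`, `ndvi_max`) and the 0.5 coverage threshold are assumptions for illustration. Sensible thresholds depend on the region and season.

```python
import pandas as pd


def build_report(results, min_valid_fraction=0.5):
    """Build a per-tile report and flag tiles that look suspicious."""
    report = pd.DataFrame(results)
    # Workers may return None for empty tiles; coerce those to NaN.
    for col in ("valid_fraction", "ndvi_mean", "ndvi_min", "ndvi_max"):
        report[col] = pd.to_numeric(report[col], errors="coerce")

    report["flag_low_coverage"] = report["valid_fraction"] < min_valid_fraction
    report["flag_no_data"] = report["ndvi_mean"].isna()
    # NDVI outside [-1, 1] usually points at negative inputs or a bad nodata value.
    report["flag_out_of_range"] = (report["ndvi_min"] < -1.0) | (report["ndvi_max"] > 1.0)

    flag_cols = ["flag_low_coverage", "flag_no_data", "flag_out_of_range"]
    report["suspicious"] = report[flag_cols].any(axis=1)
    # Worst coverage first, so problem tiles sit at the top of the file.
    return report.sort_values("valid_fraction").reset_index(drop=True)
```

The returned frame can be written with `report.to_csv(REPORT_PATH, index=False)`, and the `suspicious` column filtered before any tile is fed to a model.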
