
Run batch LLM inference
Dataset: product reviews in Parquet
import json
from pathlib import Path
import pyarrow.dataset as ds
from burla import remote_parallel_map
DATASET = "s3://my-bucket/reviews/"
OUT_PATH = Path("/workspace/shared/batch-inference/review-sentiment.jsonl")
BATCH_SIZE = 10_000
MODEL_NAME = "cardiffnlp/twitter-roberta-base-sentiment-latest"

Step 1: Build batches
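One way to build batches is to split each Parquet file into row ranges of at most BATCH_SIZE rows, so every batch is a small, independent unit of work. The sketch below is only an illustration, not the page's original code: the `Batch` tuple, the `build_batches` helper, and the file names and row counts are all hypothetical. In a real run the row counts would come from the dataset itself, for example by counting rows per file with pyarrow.

```python
from typing import NamedTuple


class Batch(NamedTuple):
    """One unit of work: a half-open row range [start, stop) in one file."""
    path: str
    start: int
    stop: int


def build_batches(row_counts: dict[str, int], batch_size: int) -> list[Batch]:
    """Split every file into consecutive row ranges of at most batch_size rows."""
    batches = []
    for path, n_rows in row_counts.items():
        for start in range(0, n_rows, batch_size):
            stop = min(start + batch_size, n_rows)
            batches.append(Batch(path, start, stop))
    return batches


# Hypothetical file names and row counts, used only to show the shape of the output.
example_counts = {"part-0000.parquet": 25_000, "part-0001.parquet": 8_000}
batches = build_batches(example_counts, batch_size=10_000)
```

Keeping each batch as a plain tuple of path and row range means the list is cheap to send to workers; each worker reads only its own slice.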
Step 2: Write the worker function
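A worker for this kind of job usually takes one batch of reviews, scores every text, and returns plain dictionaries that can be written out as JSON. The following is a minimal sketch under stated assumptions: the `review_id` and `text` field names are hypothetical, and `keyword_classifier` is a toy placeholder standing in for the real model so the example runs without downloading anything. In practice the classifier would presumably be a model loaded from MODEL_NAME that returns a label and a score per text.

```python
from typing import Callable

# A classifier maps a list of texts to one {"label", "score"} dict per text.
Classifier = Callable[[list[str]], list[dict]]


def keyword_classifier(texts: list[str]) -> list[dict]:
    """Toy placeholder, NOT the RoBERTa model: labels texts by simple keywords."""
    results = []
    for text in texts:
        lowered = text.lower()
        if "great" in lowered or "love" in lowered:
            results.append({"label": "positive", "score": 1.0})
        elif "bad" in lowered or "broke" in lowered:
            results.append({"label": "negative", "score": 1.0})
        else:
            results.append({"label": "neutral", "score": 1.0})
    return results


def score_batch(reviews: list[dict], classify: Classifier = keyword_classifier) -> list[dict]:
    """Score one batch and return JSON-serializable records, one per review."""
    texts = [review["text"] for review in reviews]
    predictions = classify(texts)
    return [
        {"review_id": review["review_id"], "label": pred["label"], "score": float(pred["score"])}
        for review, pred in zip(reviews, predictions)
    ]


# Hypothetical sample records, used only to exercise the function.
sample_reviews = [
    {"review_id": "r1", "text": "Love it, works great"},
    {"review_id": "r2", "text": "Broke after two days"},
]
scored = score_batch(sample_reviews)
```

Passing the classifier in as an argument keeps the worker easy to test locally and leaves the choice of model to the caller.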
Step 3: Smoke test one batch
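Before fanning out to many machines, it helps to run the worker on a single batch locally and check its basic output contract. The sketch below is one possible shape for such a check, not the page's original code: `smoke_test` and `echo_worker` are hypothetical names, and the stand-in worker simply labels everything neutral.

```python
import json


def smoke_test(worker, batch):
    """Run the worker on one batch locally and check the basic output contract."""
    results = worker(batch)
    assert isinstance(results, list), "worker should return a list"
    assert len(results) == len(batch), "expected one result per input record"
    for record in results:
        json.dumps(record)  # raises TypeError if a record is not JSON-serializable
    return results


def echo_worker(batch):
    """Hypothetical stand-in worker: labels every review as neutral."""
    return [{"review_id": r["review_id"], "label": "neutral", "score": 0.0} for r in batch]


# Hypothetical single batch, small enough to inspect by eye.
one_batch = [{"review_id": "r1", "text": "ok"}, {"review_id": "r2", "text": "fine"}]
checked = smoke_test(echo_worker, one_batch)
```

Catching a wrong return type or a non-serializable value on one batch is far cheaper than discovering it after the full job has run.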
Step 4: Run the full scoring job
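The full job maps the worker over every batch and writes the results as one JSON object per line. The sketch below is an illustration with hypothetical helpers (`run_job`, `local_map`, `label_batch`) and a temporary output path rather than OUT_PATH. It takes the map function as a parameter so it can run locally; on a cluster one would presumably pass `remote_parallel_map` instead, on the assumption that it accepts a function and a list of inputs and returns one result per input.

```python
import json
import tempfile
from pathlib import Path


def run_job(worker, batches, out_path, map_fn):
    """Map worker over all batches and write every record as one JSONL line."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    n_written = 0
    with out_path.open("w", encoding="utf-8") as f:
        for batch_results in map_fn(worker, batches):
            for record in batch_results:
                f.write(json.dumps(record) + "\n")
                n_written += 1
    return n_written


def local_map(function, inputs):
    """Local stand-in with the assumed (function, inputs) call shape."""
    return [function(item) for item in inputs]


def label_batch(batch):
    """Hypothetical stand-in worker: a batch here is just a list of review ids."""
    return [{"review_id": review_id, "label": "neutral"} for review_id in batch]


# Write to a temporary directory so the example has no side effects elsewhere.
demo_out = Path(tempfile.mkdtemp()) / "review-sentiment.jsonl"
n_records = run_job(label_batch, [["r1", "r2"], ["r3"]], demo_out, map_fn=local_map)
```

Because the driver only depends on the call shape of the map function, the same code path can be exercised locally first and then pointed at the cluster.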
What's the point?