Hyperparameter Tune XGBoost with 1,000 CPUs
In this example we:
Download this 1.4GB Kaggle Dataset of commercial flight delays.
Train 36 XGBoost models with different parameters using 13 separate 80-CPU machines.
Identify the best model from training results.
Step 1: Upload your data to the cluster
Download the following CSV file:
Then upload it to your Burla cluster filesystem:

Any files uploaded here will appear in a network-attached folder at ./shared inside every container in the cluster. Conversely, any files your code leaves in this folder will appear in the "Filesystem" tab, where you can download them later or keep them for future work!
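For example, a function running on the cluster could persist its results back through this folder. A minimal sketch (the filename and results format are hypothetical, not from this guide):

```python
import json
import os

def save_results(results, path="./shared/results.json"):
    # ./shared is the network-attached folder mounted in every container;
    # anything written here appears in the dashboard's "Filesystem" tab.
    # (makedirs only matters when testing this sketch outside the cluster.)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        json.dump(results, f)
```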
Step 2: Boot some VMs
In the "Settings" tab, select the hardware and quantity of machines you want, then hit ⏻ Start ! Here we boot 13, 80-CPU VM's, these VM's delete themself after 15min. of inactivity.

Now that our machines are ready, we can call remote_parallel_map!
You may have noticed in the settings we're using the python:3.12 Docker image.
This is the image the code will run inside, and it doesn't come with any of the packages we need (like XGBoost, Pandas, etc.). That's fine, because Burla detects local packages at runtime and quickly installs them in all containers, usually in just a few seconds.
Step 3: Write a function to train one model
This function:
Loads Combined_Flights_2022.csv from the ./shared folder as a Pandas DataFrame.
Cleans and separates the data into train / test sets.
Trains one XGBoost model using the provided params dict and 80 CPUs.
Scores the model on the test set, then returns the AUC.
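The guide's actual function body isn't reproduced here, but a minimal sketch of train_model might look like the following. The column names, cleaning steps, and train/test split are assumptions for illustration, not the guide's real preprocessing:

```python
def train_model(params):
    # Imports live inside the function: Burla detects locally-installed
    # packages (pandas, xgboost, scikit-learn) and installs them in every
    # container at runtime.
    import pandas as pd
    from xgboost import XGBClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score

    # Load the uploaded dataset from the cluster's shared network folder.
    df = pd.read_csv("./shared/Combined_Flights_2022.csv")

    # Hypothetical cleaning / feature selection -- the real columns used
    # are not shown in this guide.
    df = df.dropna(subset=["DepDelayMinutes", "ArrDel15"])
    X = df[["DepDelayMinutes", "Distance", "DayOfWeek", "Month"]]
    y = df["ArrDel15"].astype(int)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    # Train one model on all 80 CPUs of this VM with the given parameters.
    model = XGBClassifier(n_jobs=80, **params)
    model.fit(X_train, y_train)

    # Score on the held-out test set and return the AUC.
    return roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
```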
To test out the function we call it on just one machine, by passing it one set of parameters:
We also pass func_cpu=80 to tell Burla this function call should have 80 CPUs made available to it.
We'll need this since we're passing n_jobs=80 to XGBoost inside the train_model function.
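A sketch of that single test call, assuming remote_parallel_map takes a list of inputs plus the func_cpu argument (the parameter values are hypothetical, and running this requires a live Burla cluster):

```python
def smoke_test():
    # Import stays inside the function: running it requires a live Burla
    # cluster, and train_model is the Step 3 function.
    from burla import remote_parallel_map

    # Hypothetical hyperparameters for the one test call.
    params = {"max_depth": 6, "learning_rate": 0.1, "n_estimators": 200}

    # One input -> one function call; func_cpu=80 gives the call a full
    # 80-CPU VM, matching the n_jobs=80 XGBoost uses inside train_model.
    [auc] = remote_parallel_map(train_model, [params], func_cpu=80)
    return auc
```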
Step 4: Call the function in parallel, on 13 separate VMs!
Here we pass 36 sets of parameters to train_model.
Because each function call requires 80 CPUs, and we have 13 80-CPU machines, this will immediately start 13 function calls and queue the remaining 23. Burla can reliably queue up to 10 million inputs.
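As a sketch, a 36-combination grid can be built with itertools.product and submitted in one call. The specific hyperparameter values below are assumptions, and the final max() implements the intro's "identify the best model from training results" step:

```python
from itertools import product

# Hypothetical 3 x 3 x 4 = 36-combination search space -- the actual
# values used in this example aren't shown in the guide.
param_sets = [
    {"max_depth": d, "learning_rate": lr, "n_estimators": n}
    for d, lr, n in product([6, 9, 12], [0.05, 0.1, 0.3], [100, 200, 400, 800])
]

def tune():
    # Requires a live Burla cluster; train_model is the Step 3 function.
    from burla import remote_parallel_map

    # 13 calls start immediately (one per 80-CPU VM); the rest queue.
    aucs = remote_parallel_map(train_model, param_sets, func_cpu=80)

    # Identify the best model: pair each AUC with its parameters.
    best_auc, best_params = max(zip(aucs, param_sets), key=lambda t: t[0])
    return best_auc, best_params
```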
Once submitted, we can monitor progress and view logs from the "Jobs" tab in the dashboard:
