Overview
burla.remote_parallel_map
burla.remote_parallel_map
Run any python function on many remote computers at the same time. See the API-Documentation
Basic use:
remote_parallel_map
requires two arguments:
# Arg 1: Any python function.
def my_function(my_input):
print(my_input)
return my_input * 2
# Arg 2: List of inputs for `my_function`.
my_inputs = [1, 2, 3]
Then remote_parallel_map
can be called like:
from burla import remote_parallel_map
outputs = remote_parallel_map(my_function, my_inputs)
print(list(outputs))
When run, remote_parallel_map
will call my_function
, on every object in my_inputs
, at the same time, each on a separate CPU in the cloud (the max #CPUs depends on your cluster settings).
In under 1 second, the three function calls are made, all at the same time:
my_function(1)
, my_function(2)
, my_function(3)
Stdout produced on the remote machines is streamed back to the client (your machine). The return values of each function are also collected and sent back to the client. The following displays in the client's terminal:
1
2
3
[2, 4, 6]
In the above example, each function call would have been made inside a separate container, each with their own isolated filesystem.
Other arguments:
remote_parallel_map
has a few optional arguments, see API-Reference for the full API-doc.
remote_parallel_map(
function_,
inputs,
func_cpu=1,
func_ram=4,
background=False,
spinner=True,
generator=False,
max_parallelism=None,
)
The func_cpu
and func_ram
arguments can be used to assign more resources to each individual function call, up to max amount your machine type supports.
background
makes remote_parallel_map
exit after having uploaded the function and inputs. After which the job will continue to run independantly on the cluster in the background.
spinner
can be used to turn off the spinner, which also displays status messages from the cluster, like the state of the current job.
generator
makes remote_parallel_map
return a python generator
object instead of a list
containing return values. This generator yields return values as they are produced instead of all at once.
max_parallelism
can be used to limit the number of function calls running at the same time.
By default, the cluster will execute as many parallel functions as possible given available resources.
This can be useful to avoid overloading external systems like databases or other web services.
Questions? Schedule a call with us, or email [email protected]. We're always happy to talk!
Last updated