Analyze 2.5 TB of Python Notebooks

In this example we:

  • Download and unzip the dataset, which is served as 27 zip files of between 20 and 200 GB each!

  • For each of the 5 million .ipynb files, extract the date and the Python packages used.

  • Graph trends in Python package popularity over time.
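
As a rough illustration of the package-extraction step, here is a minimal sketch using only the Python standard library. It is not the example's actual implementation. The function names `packages_in_source` and `packages_in_notebook` are hypothetical, the notebook is assumed to use the nbformat 4 layout (a top-level "cells" list), and date extraction is left out because the source of the date is not specified here.

```python
import ast
import json


def packages_in_source(source: str) -> set[str]:
    """Return the top-level package names imported by one code cell."""
    # Drop IPython magics (%...) and shell escapes (!...), which are not valid Python.
    lines = [
        line for line in source.splitlines()
        if not line.lstrip().startswith(("%", "!"))
    ]
    try:
        tree = ast.parse("\n".join(lines))
    except (SyntaxError, ValueError):
        # Skip cells that still do not parse, e.g. Python 2 code.
        return set()

    packages = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # "import pandas.io" counts as "pandas".
            packages.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) refer to local code, so they are ignored.
            packages.add(node.module.split(".")[0])
    return packages


def packages_in_notebook(notebook_json: str) -> set[str]:
    """Return the top-level packages imported anywhere in a notebook.

    Assumes the nbformat 4 layout, where cells live in a top-level "cells" list.
    """
    notebook = json.loads(notebook_json)
    packages = set()
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # "source" may be stored as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        packages |= packages_in_source(source)
    return packages
```

Parsing with `ast` rather than a regular expression avoids counting imports that appear only in strings or comments, at the cost of skipping cells that are not valid Python 3.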

Example coming soon!
