Photo by Taylor Vick on Unsplash

Overcoming Hardware Limitations

For my capstone project at Flatiron School, I worked on an NLP project. It ran into a few hardware limits, which forced me to find creative solutions, as the only alternative was to spend money on a new computer.

Photo by Vincent Botta on Unsplash

File Size Limitation (on GitHub)

There is a soft 50 MB limit, and a hard 100 MB limit on files allowed in repositories (see GitHub Docs).

I would not even have noticed it if it weren’t for this little badge!

Git LFS setup

  • Install Git LFS.
  • Run git lfs install once to set up Git LFS for your user account and check that the installation works.
  • cd to your local repository.
  • git lfs track [big_file] to tell Git LFS which file to manage.
  • Git LFS lists all the tracked files and their associated options in a .gitattributes file, so make sure Git tracks that file too: git add .gitattributes.
  • git add [big_file].
  • git commit -m "start tracking big_file".
  • git push origin main.
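
Before running the steps above, it can help to know which files actually need tracking. The following is a minimal sketch, not part of the original workflow: the function name find_large_files is hypothetical, and it assumes the limits are counted in units of 1024 × 1024 bytes. It walks a local repository and lists every file above a given size, skipping the .git folder.

```python
import os

SOFT_LIMIT_MB = 50   # size above which GitHub shows a warning
HARD_LIMIT_MB = 100  # size above which GitHub rejects the file


def find_large_files(repo_dir, limit_mb=SOFT_LIMIT_MB):
    """Return (relative_path, size_in_mb) pairs for files larger than limit_mb."""
    large = []
    for root, dirs, files in os.walk(repo_dir):
        # Do not descend into Git's own storage folder.
        dirs[:] = [d for d in dirs if d != ".git"]
        for name in files:
            path = os.path.join(root, name)
            # Assumption: 1 MB is taken as 1024 * 1024 bytes.
            size_mb = os.path.getsize(path) / (1024 * 1024)
            if size_mb > limit_mb:
                large.append((os.path.relpath(path, repo_dir), size_mb))
    # Largest files first.
    return sorted(large, key=lambda item: item[1], reverse=True)
```

Calling find_large_files(".") from the root of the repository returns the candidates for git lfs track, and comparing each size with HARD_LIMIT_MB shows which ones would block a push altogether.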
Photo by Erik Gazi on Unsplash

CPU and RAM Limitation

I managed to (patiently) do most of the work on my laptop. I tried the chunksize= parameter and dask to save some time (along with a few similar tricks). Overall, it was not too bad, even without these "accelerators". But when I needed to stitch the prepared data back together to start training the model, the notebook kernel kept dying. I later realised that this last preparation step alone requires about 45 GB of RAM, and then I had to train an XGBoost model on that dataset... I needed more computing power! As I couldn't justify spending thousands of pounds on a new (decent) machine, I had to find something else:
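
To illustrate the idea behind chunked processing, here is a minimal sketch that uses only the standard library rather than the pandas or dask calls from my project. The function names (iter_chunks, count_labels) and the column name in the usage note are hypothetical. The principle is the same as passing chunksize= to a pandas reader: only one slice of the file is held in memory at a time, and the results are accumulated as you go.

```python
import csv
from collections import Counter
from itertools import islice


def iter_chunks(path, chunksize):
    """Yield lists of at most `chunksize` rows (as dicts) from a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        while True:
            # Pull the next `chunksize` rows without reading the whole file.
            chunk = list(islice(reader, chunksize))
            if not chunk:
                break
            yield chunk


def count_labels(path, column, chunksize=10_000):
    """Count the values of one column while holding a single chunk in memory."""
    counts = Counter()
    for chunk in iter_chunks(path, chunksize):
        counts.update(row[column] for row in chunk)
    return counts
```

For example, count_labels("reviews.csv", "label") would tally a hypothetical label column chunk by chunk. This only helps for steps that can be computed incrementally; a step that needs the whole dataset at once, like the final stitching described above, still needs the RAM.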

Google Cloud Platform

I chose Google because they offer “free credit” to experiment with their products, and it turns out this $300 credit actually goes a long way (as long as you do not let a VM run when it is not in use). But Microsoft Azure or Amazon Web Services would have done the trick as well.
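
To get a feel for how far the credit goes, here is a rough back-of-the-envelope sketch. The hourly rate in the usage note is a made-up figure for illustration, not an actual GCP price, and both function names are hypothetical. Check the pricing of the machine type you pick.

```python
def hours_of_credit(credit_usd, hourly_rate_usd):
    """Number of VM hours a credit covers at a given hourly rate."""
    if hourly_rate_usd <= 0:
        raise ValueError("hourly_rate_usd must be positive")
    return credit_usd / hourly_rate_usd


def days_until_exhausted(credit_usd, hourly_rate_usd, hours_per_day):
    """Number of days the credit lasts if the VM runs hours_per_day each day."""
    if hours_per_day <= 0:
        raise ValueError("hours_per_day must be positive")
    return hours_of_credit(credit_usd, hourly_rate_usd) / hours_per_day
```

With a hypothetical rate of $0.50 per hour, hours_of_credit(300, 0.5) gives 600 hours. Left running around the clock, days_until_exhausted(300, 0.5, 24) is 25 days; used four hours a day and stopped afterwards, days_until_exhausted(300, 0.5, 4) is 150 days. That is why stopping the VM matters so much.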

Creating an Account

You need a Google account; then head to the Google Cloud website to register. Details of the free tier are listed on the same site.

Create a new project

Once in the Console, you will notice the banner mentions “My First Project”, the one created by default. You can always change it if needed, but it is not necessary right now.

Create a Notebook

Things are getting real now! It is time to create your first Virtual Machine, or instance in GCP jargon. It will cost you money, so I wouldn’t advise you to go for the 60 CPUs with 11 TB of RAM just yet. Spend those $300 wisely!

Creating a notebook in GCP.

First Start

Click the START button, then open JupyterLab. Open a terminal window (File > New... > Terminal) and run pip list. This command tells you which Python packages are available on your VM. If you need other packages, run !pip install [library] in a notebook, or use apt, the Debian package manager, for system packages: sudo apt-get install [package].
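
If you would rather check for specific packages from a notebook cell than scroll through the output of pip list, something like the following sketch should work on Python 3.8 or later. The helper name installed_versions is hypothetical; importlib.metadata is part of the standard library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            result[name] = None
    return result
```

For instance, installed_versions(["dask", "xgboost"]) returns a dictionary in which any package mapped to None still needs a pip install.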

First Push

After unleashing the power of cloud computing onto your project, it is time to save your work. In the Git menu (underlined in the following picture), you will find:

  • Git Command in Terminal which opens a terminal in the correct folder, if you prefer to run traditional git commands from the terminal window.
Git interfaces.

First Stop

As I mentioned, don’t forget to stop the VM before exiting the website. This picture shows the green check, indicating the instance is up and running (the STOP button is activated).

It's On!
Instance is running.
Stop That VM!
Instance is on stand-by.

Final Thoughts

If you are serious about AI, you might decide to build yourself a rig. The right hardware depends a lot on the type of work you do and the libraries you use. If you go down that road, head to Tim Dettmers’ blog for some sound advice!

Photo by Christine Sandu on Unsplash
