Virtualise Everything

Python environments: from virtual to contained environments

I have always liked to be in control and I have always enjoyed reading to understand how things work. So about a decade ago I decided to leave the Windows ecosystem and I hopped on the Linux bandwagon.

I experimented a fair bit with Damn Small Linux, Linux From Scratch, Gentoo, etc. Although they gave me near-total control, they were way too technical for my younger self. I then enjoyed Debian for a few months and settled on Ubuntu for a couple of years. Having already tried a few flavours, I started to appreciate how important package management is and how painful it can sometimes be.

About 6 years ago, I discovered Arch Linux and the concept of rolling release. I’ve never looked back since moving to that distribution as the documentation is absolutely brilliant, the community is very active and the package manager is near perfect. No more package management issues, latest versions of any packages at any time (there is a bit of a lag for some packages), exactly what I was looking for!

You can imagine my shock when I started to program with python. The python ecosystem goes against everything I was used to. Even the first python programming books I bought advised learning python 2, as most libraries/frameworks were written in python 2 and python 3 was too immature… Shocking! (for me at least)

Anyway, embracing python as a programming language meant I had to get my hands dirty and go back to actively managing packages.

I quickly realised that pip installs packages in different places, depending on how and what you install. (Have you ever run pip install --user [package]?) By default, outside a virtual environment, pip installs into the system-wide site-packages directory: the package is available to all users, and the install usually requires superuser privileges. The --user flag installs into a per-user site-packages directory inside your home directory instead: the package is available only to you and no superuser privilege is required.
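If you are curious where those two locations are on your own machine, the standard library can tell you. This is only a minimal sketch using the site and sysconfig modules; the paths it prints depend on your system, on any distribution-specific patches, and on whether a virtual environment is active.

```python
import site
import sysconfig

# Directory a plain "pip install" targets for pure-Python packages
# (system-wide, unless a virtual environment is active).
default_site = sysconfig.get_paths()["purelib"]

# Directory "pip install --user" targets: it lives under your home directory.
user_site = site.getusersitepackages()

print("default install location:", default_site)
print("--user install location: ", user_site)
```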

Virtual Environments

A virtual environment is a directory tree that contains a specific version of Python and its own set of packages. As said before, Python can run from different locations; a virtual environment ensures Python runs in a fully controlled manner, from a chosen directory.
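A quick way to check which of those locations Python is actually running from is to ask the interpreter itself. The sketch below assumes Python 3.3 or later (older virtualenv releases used a different marker), and the helper name is mine, not part of any library.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside an environment, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the Python it was created from.
    return sys.prefix != sys.base_prefix


print("interpreter:     ", sys.executable)
print("environment root:", sys.prefix)
print("inside a virtual environment:", in_virtual_environment())
```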

Venv

Although venv ships with python, I won’t be talking about it as virtualenv is a much more powerful alternative (comparison here). Just know you can create an environment with python -m venv [environmentName].
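For completeness, the same thing can be done from Python code with the venv module. This is a small sketch, not a recommendation: it builds the environment in a throwaway temporary directory (my choice, so nothing is left behind) and skips pip to keep it fast, whereas the command-line form installs pip by default.

```python
import tempfile
import venv
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Placeholder name, matching the [environmentName] used in the text.
    env_dir = Path(tmp) / "environmentName"

    # Roughly what "python -m venv environmentName" does, minus installing pip.
    venv.create(env_dir, with_pip=False)

    # pyvenv.cfg records which Python installation the environment is based on.
    config_text = (env_dir / "pyvenv.cfg").read_text()
    created = sorted(p.name for p in env_dir.iterdir())

print(created)
print(config_text)
```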

Virtualenv

virtualenv is one of the lowest-level tools for managing python virtual environments. It is not the most elegant way but it offers a lot of control. Because each project gets its own environment, package management is rather easy: you can simply freeze the packages at the versions that work for your project.

To get started, install virtualenv:

  • pip install virtualenv or even better, via your favourite package manager.
[Image: pacmanInstall]
  • mkdir VirtualEnvironments to create a folder with all your virtual environments.
  • cd VirtualEnvironments to get into the directory.
  • virtualenv [environmentName] to create your environment.
[Image: createEnv]
  • To activate the environment, run source [environmentName]/bin/activate from the VirtualEnvironments directory.
[Image: activateEnv]
  • To deactivate the environment: deactivate.

As you can see in the image above, the $PATH and the python interpreter change on deactivation: we are switching from the virtual environment env back to the system environment. This means that even if your system is up to date and running python3.8, you can still develop a project using python2.7, for example (provided python2.7 is installed). Just use the -p switch: virtualenv -p $(which python2.7) [environmentName] creates a python 2.7 environment.
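To make the $PATH change less magical, here is a rough sketch of what activation amounts to. It uses the standard library venv module as a stand-in for virtualenv, and activated_environ is a hypothetical helper of mine that only imitates the parts of bin/activate relevant here (setting VIRTUAL_ENV, prepending the environment's bin directory to PATH, unsetting PYTHONHOME).

```python
import os
import shutil
import subprocess
import tempfile
import venv
from pathlib import Path


def activated_environ(env_dir: Path) -> dict:
    """Return a copy of os.environ changed roughly the way bin/activate changes the shell."""
    bin_dir = env_dir / ("Scripts" if os.name == "nt" else "bin")
    env = dict(os.environ)
    env["VIRTUAL_ENV"] = str(env_dir)
    # Putting the environment's bin directory first makes "python" resolve there.
    env["PATH"] = str(bin_dir) + os.pathsep + env.get("PATH", "")
    env.pop("PYTHONHOME", None)
    return env


with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "environmentName"
    venv.create(env_dir, with_pip=False)  # stand-in for "virtualenv environmentName"

    env = activated_environ(env_dir)
    first_path_entry = env["PATH"].split(os.pathsep)[0]

    # Resolve "python" through the modified PATH, then ask it where it lives.
    python = shutil.which("python", path=env["PATH"])
    result = subprocess.run(
        [python, "-c", "import sys; print(sys.prefix)"],
        env=env, capture_output=True, text=True, check=True,
    )
    env_prefix = result.stdout.strip()

print("python found at:", python)
print("it reports sys.prefix =", env_prefix)
```

Deactivating simply undoes those changes, which is why the interpreter goes back to the system one.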

Once the environment is set up and active (the (environmentName) tag at the beginning of the command prompt tells you which environment is running), go ahead and install the packages you need for your project. They'll be installed in the environment (in its directory). You can then run pip freeze > requirements.txt to save the packages and their versions into the requirements.txt text file.

Freezing packages is particularly useful if you need to transfer your project. Simply create a new virtual environment on the machine you need to import the project/environment to (make sure to use the correct version of python), then run pip install -r requirements.txt to install the packages at the recorded versions.
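If you wonder what ends up in requirements.txt, the sketch below produces similar name==version lines from Python, assuming Python 3.8 or later for importlib.metadata. It is only an approximation of pip freeze: the real command also handles editable installs, packages installed from URLs, and a few other cases.

```python
from importlib.metadata import distributions


def freeze() -> list:
    """List installed distributions as 'name==version' lines, in the spirit of pip freeze."""
    pins = set()
    for dist in distributions():
        name = dist.metadata["Name"]
        version = dist.version
        if name and version:  # skip entries with incomplete metadata
            pins.add(f"{name}=={version}")
    return sorted(pins, key=str.lower)


requirements = freeze()
print("\n".join(requirements))
```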

Virtualenvwrapper

Another, higher-level way to organise your virtual environments is to use virtualenvwrapper: pip install virtualenvwrapper. Once installed, run which virtualenvwrapper.sh to check its path and add the following line to your .bashrc file, adjusting the path if yours differs:

source /usr/local/bin/virtualenvwrapper.sh

Now run source ~/.bashrc to load the new configuration.

  • Create a new environment with mkvirtualenv [environmentName]. If you have a requirements.txt file, mkvirtualenv -r requirements.txt [environmentName] installs the packages at the required versions.
  • List the installed packages with lssitepackages, but if you want to create the requirements.txt file, use pip freeze > requirements.txt
  • List your environments with workon
  • Activate a specific environment with workon [environmentName]
  • Change to the directory of the active environment with cdvirtualenv (you can check the path with pwd)
  • Deactivate the environment with deactivate
  • Remove an environment with rmvirtualenv [environmentName]

The commands are slightly different but the result is the same: virtualenvwrapper is a set of extensions to virtualenv.
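There is not much magic behind workon either: virtualenvwrapper keeps all its environments in one directory, $WORKON_HOME (~/.virtualenvs by default). The sketch below lists what looks like an environment in that directory. list_environments is a hypothetical helper, and checking for an activate script is my own heuristic, not how virtualenvwrapper itself does it.

```python
import os
from pathlib import Path


def list_environments(workon_home: Path) -> list:
    """Return the names of sub-directories that look like virtual environments."""
    if not workon_home.is_dir():
        return []
    names = []
    for child in workon_home.iterdir():
        # Heuristic: an environment has an activate script in bin/ (Scripts/ on Windows).
        if (child / "bin" / "activate").exists() or (child / "Scripts" / "activate").exists():
            names.append(child.name)
    return sorted(names)


# Fall back to the default location when WORKON_HOME is not set.
workon_home = Path(os.environ.get("WORKON_HOME", os.path.expanduser("~/.virtualenvs")))
print(list_environments(workon_home))
```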

Other useful packages for managing virtual environments are:

  • pipenv, which combines pip, virtualenv and Pipfile (another way to address requirements.txt).
  • pyenv, which aims to isolate Python versions; the virtual environments themselves are managed with virtualenv or pyenv-virtualenv.

Conda

Conda is very popular amongst data scientists for a few reasons:

  • It bundles Intel MKL which makes some libraries (like numpy) faster.
  • It manages packages locally so there is no need for superuser privilege.
  • It comes with a lot of industry standard packages.
  • Anaconda, Inc. is a company which offers support contracts.
  • It makes using python on Windows a lot easier.

Not only does it manage packages, it also allows for environment management.

  • conda create --name [environmentName] to create an environment, conda create --name [environmentName] python=2.7 if you need a specific version of python, or conda env create -f environment.yml to create an environment from an environment.yml file (similar to the requirements.txt file)
  • conda env export --name [environmentName] > environment.yml will create the environment.yml file.
  • conda install -n [environmentName] package=version to add packages; the version number is optional.
  • conda deactivate to deactivate an environment.
  • conda activate [environmentName] to activate an environment.
  • anaconda-navigator will start the Anaconda Navigator, which shows all the applications installed in the environment
[Image: CondaNavigator]
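If you have never opened an environment.yml file, it is a short YAML document with a name and a list of dependencies. The sketch below renders a minimal one by hand. The environment name and package list are placeholders, and a file produced by conda itself usually contains more (channels, exact builds, a prefix line).

```python
def render_environment_yml(name, dependencies):
    """Render a minimal environment.yml usable with 'conda env create -f environment.yml'."""
    lines = [f"name: {name}", "dependencies:"]
    lines += [f"  - {dep}" for dep in dependencies]
    return "\n".join(lines) + "\n"


# Hypothetical environment: both the name and the packages are placeholders.
text = render_environment_yml("environmentName", ["python=3.8", "numpy"])
print(text)
```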

As you can see, no matter what tool you are using, virtual environments work in the same way: creation, activation, installation of packages, creation of text file (so the environment can be replicated), deactivation.

Contained environments

Docker

Another powerful way to address the environment issue is to use container solutions such as Docker. For most situations, it is a little bit over the top but I will be addressing it because:

  • it is rather easy to implement
  • once you understand how Docker works, you get access to a lot of really useful images such as: Postgres, MongoDB, Redis, PySpark, etc.

First, install Docker.

[Image: Install]

Add your user to the docker group.

[Image: Group]

And make sure you activate the docker service.

[Image: Services]

You can then execute docker run hello-world.

[Image: Hello-World!]

There you go, you’ve just run your first container!

A few useful commands:

  • docker pull [image] to download an image.
  • docker run [image] to start a container from an image.
  • docker image ls to list the images available locally.
  • docker rm [container] to remove a stopped container from your machine (docker rmi [image] removes an image).

Make sure to visit the Docker Hub, which lists all the (shared) images available. I would also strongly advise reading the image descriptions as they will always explain how to run the image. For example, the PostgreSQL image description shows that a container starts with docker run --name dbName -e POSTGRES_PASSWORD=password -d postgres.
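If you end up starting containers from a script, it helps to build the command as a list of arguments rather than one long string. The sketch below is one way to do it. docker_run_command is a hypothetical helper of mine that covers only the handful of docker run options used in this article, and it only builds the command; it does not run it.

```python
import shlex


def docker_run_command(image, name=None, env=None, detach=False, ports=None):
    """Build the argument list for a 'docker run' invocation."""
    cmd = ["docker", "run"]
    if name:
        cmd += ["--name", name]
    for key, value in (env or {}).items():
        cmd += ["-e", f"{key}={value}"]  # environment variable passed to the container
    for host_port, container_port in (ports or []):
        cmd += ["-p", f"{host_port}:{container_port}"]  # publish a container port on the host
    if detach:
        cmd.append("-d")  # run in the background
    cmd.append(image)
    return cmd


cmd = docker_run_command(
    "postgres", name="dbName", env={"POSTGRES_PASSWORD": "password"}, detach=True
)
print(shlex.join(cmd))
```

On a machine where Docker is installed and the service is running, the list can be passed as is to subprocess.run(cmd, check=True).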

If you are interested in working with PySpark (note that you'll have to create an account on Docker Hub, then use docker login), run docker pull jupyter/pyspark-notebook to download the image, then docker run -d -p ***:*** -p 4040:4040 -p 4041:4041 jupyter/pyspark-notebook to start a container with a jupyter notebook.

If you are interested in building your own image, have a look here: this documentation will give you the basics to achieve your first build.

Conclusion

This article was a rather quick overview of the various solutions available to sort out the environment issue. Conda is interesting for all the apps it can manage (Spyder is quite a nice IDE, for example) but virtualenv should be sufficient for most scenarios. We went a little bit off the beaten track by talking about Docker but mastering this tool is really interesting as it opens a whole new world!

Antoine Ghilissen

Data Scientist
