Python environments: from virtual to contained environments
I have always liked to be in control and I have always enjoyed reading to understand how things work. So about a decade ago I decided to leave the Windows ecosystem and I hopped on the Linux bandwagon.
I experimented a fair bit with Damn Small Linux, Linux From Scratch, Gentoo, etc. And although having near total control, it was way too technical for my younger self. I then enjoyed Debian for a few months and I settled with Ubuntu for a couple of years. Having already tried a few flavours, I started to appreciate how important package management is and how painful it could sometimes be.
About 6 years ago, I discovered Arch Linux and the concept of rolling release. I’ve never looked back since moving to that distribution as the documentation is absolutely brilliant, the community is very active and the package manager is near perfect. No more package management issues, latest versions of any packages at any time (there is a bit of a lag for some packages), exactly what I was looking for!
You can imagine my shock when I started to program with python. The python ecosystem goes against everything I was used to. Even the first python programming books I bought were advising to learn python 2 as most libraries/frameworks were in python 2 and python 3 was too immature… Shocking! (for me at least)
Anyway, embracing python as a programming language meant I had to get my hands dirty and go back to actively manage packages.
I quickly realised that pip would install packages in different places, depending on how/what you install. (Have you ever done `pip install --user [package]`?) By default, pip installs system packages, available to all users. The `--user` flag instead installs packages into the user site-packages directory in your home folder, available only to you and requiring no superuser privileges.
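As a quick sketch of the two install modes (`requests` is just a stand-in package name here):

```shell
# System-wide install: lands in the global site-packages,
# visible to all users, usually needing superuser privileges.
sudo pip install requests

# Per-user install: lands under your home directory instead,
# no superuser privileges required.
pip install --user requests

# Show where the user site-packages directory actually is:
python -m site --user-site
```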
A virtual environment is a directory tree that contains a specific version of Python and a set of packages. As said before, Python can run from different locations; a virtual environment ensures Python runs in a fully controlled manner, from a chosen directory. Although it ships with python, I won't be talking about the `venv` module, as `virtualenv` is a much more powerful alternative. Just know you can create an environment with `python -m venv [environmentName]`.
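For completeness, the built-in workflow looks like this (`myproject-env` is a hypothetical name):

```shell
# Create an environment with the built-in venv module.
python -m venv myproject-env

# Activate it: python and pip now resolve inside the environment.
source myproject-env/bin/activate

# Anything installed now stays inside myproject-env/.
pip install requests

# Return to the system environment.
deactivate
```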
virtualenv is one of the lowest-level tools for managing python virtual environments. It is not the most elegant way, but it offers a lot of control. By creating a virtual environment per project, package management becomes rather easy, as you can just freeze the packages at the versions that work for your project.
To get started, install virtualenv:
- `pip install virtualenv`, or even better, via your favourite package manager.
- `mkdir VirtualEnvironments` to create a folder to hold all your virtual environments.
- `cd VirtualEnvironments` to get into the directory.
- `virtualenv [environmentName]` to create your environment.
- To activate the environment, run `source [environmentName]/bin/activate` from the `VirtualEnvironments` directory.
- To deactivate the environment, simply run `deactivate`.
If you compare the output before and after activation, the `$PATH` and the python interpreter are different: we are switching between the virtual environment and the system environment. This means that even if your system is up to date and running `python3.8`, you can still develop a project using `python2.7`, for example. Just use `virtualenv -p $(which python2.7) [environmentName]` to create a python 2.7 environment.
Once the environment is set up and active (the `(environmentName)` tag at the beginning of the command prompt tells you which environment is running), go ahead and install the packages you need for your project; they'll be installed in the environment (in its directory). You can then run `pip freeze > requirements.txt` to save the packages and their versions into the `requirements.txt` text file.
Freezing packages is particularly useful if you need to transfer your project. Simply create a new virtual environment on the machine you need to import the project/environment to (making sure to use the correct version of python), then run `pip install -r requirements.txt` to install the packages at the correct versions.
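For reference, the resulting `requirements.txt` is just a plain list of pinned packages, one per line; the packages and versions below are purely illustrative:

```
numpy==1.18.1
pandas==0.25.3
requests==2.22.0
```

`pip install -r` reads exactly this format and reproduces those versions in the new environment.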
Another, higher-level way to organise your virtual environments is to use virtualenvwrapper: `pip install virtualenvwrapper`. Once installed, run `which virtualenvwrapper.sh` to find its path, add the lines sourcing that script (and setting `WORKON_HOME`) to your `.bashrc`, then run `source ~/.bashrc` to load the new configuration.
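For reference, the lines added to `.bashrc` typically look like this (the exact path is whatever `which virtualenvwrapper.sh` returned on your system):

```shell
export WORKON_HOME=$HOME/.virtualenvs
source /usr/local/bin/virtualenvwrapper.sh
```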
- Create a new environment with `mkvirtualenv [environmentName]`. If you have a `requirements.txt` file, `mkvirtualenv [environmentName] -r requirements.txt` can be used to install the packages at the required versions.
- List the packages installed with `lssitepackages`; if you want to create the `requirements.txt` file, `pip freeze > requirements.txt` works as before.
- List your environments with `lsvirtualenv`.
- Activate a specific environment with `workon [environmentName]`.
- Change to the active environment's directory with `cdvirtualenv` (you can check the path with `pwd` once the environment is activated).
- Deactivate the environment with `deactivate`.
- Remove an environment with `rmvirtualenv [environmentName]`.
Slightly different commands, but the result is the same: virtualenvwrapper is a set of extensions to virtualenv that streamline working with several environments.
Other useful packages for managing virtual environments are:
- pipenv, which combines `pip` and `virtualenv` and uses a `Pipfile` (another way to address `requirements.txt`).
- pyenv, which aims to isolate Python versions; its virtual environments are managed with the `pyenv-virtualenv` plugin.
Conda is very popular amongst data scientists for a few reasons:
- It bundles Intel MKL which makes some libraries (like numpy) faster.
- It manages packages locally so there is no need for superuser privilege.
- It comes with a lot of industry standard packages.
- Anaconda Inc., the company behind it, offers support contracts.
- It makes using python on Windows a lot easier.
Not only does it manage packages, it also allows for environment management.
- `conda create --name [environmentName]`, or `conda create --name [environmentName] python=2.7` if you need a specific version of python, or `conda env create -f environment.yml` to create an environment from an `environment.yml` file (similar to the `requirements.txt` file).
- `conda env export -n [environmentName] > environment.yml` will create the `environment.yml` file.
- `conda install -n [environmentName] package=version` to add packages; the version number is optional.
- `conda activate [environmentName]` to activate an environment.
- `conda deactivate` to deactivate an environment.
- `anaconda-navigator` will start the Anaconda Navigator with all the applications installed in the environment.
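For reference, an `environment.yml` file is a small YAML document; a purely illustrative one might look like:

```yaml
name: myproject
dependencies:
  - python=3.8
  - numpy
  - pandas
```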
As you can see, no matter what tool you are using, virtual environments work in the same way: creation, activation, installation of packages, creation of a text file (so the environment can be replicated), deactivation.
Another powerful way to address the environment issue is to use container solutions such as Docker. For most situations it is a little bit over the top, but I will be addressing it because:
- it is rather easy to implement
- once you understand how Docker works, you get access to a lot of really useful images, such as the PostgreSQL and PySpark images covered below.
First, install Docker. Add your user to the docker group with `sudo usermod -aG docker $USER`, and make sure you activate the docker service with `sudo systemctl start docker.service`. You can then execute `docker run hello-world`. There you go, you've just run your first container!
A few useful commands:
- `docker pull [image]` to download a container image.
- `docker run [image]` to execute a container built from that image.
- `docker image ls` to list the images locally available.
- `docker rm [container]` to remove a stopped container from your machine (`docker image rm [image]` removes the image itself).
Make sure to visit the Docker Hub, which lists all the (shared) images available. I would also strongly advise reading an image's description, as it will always explain how to run the image. For example, the PostgreSQL page shows that the container starts with `docker run --name dbName -e POSTGRES_PASSWORD=password -d postgres`.
If you are interested in working with PySpark (note that you'll have to create an account on Docker Hub, then use `docker login`), run `docker pull jupyter/pyspark-notebook` to download the image, then `docker run -d -p ***:*** -p 4040:4040 -p 4041:4041 jupyter/pyspark-notebook` to start a container with a jupyter notebook.
If you are interested in building your own image, have a look at Docker's documentation on writing Dockerfiles; it will give you the basics to achieve your first build.
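To give a taste of what a build involves, a minimal, hypothetical `Dockerfile` for a python project could look like this:

```dockerfile
# Start from an official python base image.
FROM python:3.8-slim

# Install the pinned dependencies first so this layer gets cached.
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the project and set the default command.
COPY . .
CMD ["python", "main.py"]
```

`docker build -t myproject .` then produces an image you can `docker run` like any other.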
This article was a rather quick overview of the various solutions available to sort out the environment issue.
Conda is interesting for all the apps it can manage (Spyder is quite a nice IDE, for example), but virtualenv should be sufficient for most scenarios. We went a little bit off the beaten track by talking about Docker, but mastering this tool is really worthwhile as it opens a whole new world!