Useful pip commands for Data Science
A look at the most used package management system in Python
An in-depth article published in February 2020 by Sebastian Raschka et al. studies the role and importance of Python in the machine learning ecosystem. The paper, titled Machine Learning in Python: Main Developments and Technology Trends in Data Science, Machine Learning, and Artificial Intelligence, puts forward a fascinating observation, which I’d like to quote here:
Historically, a wide range of different programming languages and environments have been used to enable machine learning research and application development. However, as the general-purpose Python language has seen a tremendous growth of popularity within the scientific computing community within the last decade, most recent machine learning and deep learning libraries are now Python-based.
Python has truly changed the data science landscape and emerged as one of the most used languages in data science today. This is also evident from the sheer number of Python packages being created and used. As of July 2020, over 235,000 Python packages could be accessed through PyPI. So what is PyPI?
The Python Package Index (PyPI) is a repository of software for the Python programming language. This repository houses the packages created and shared by the ever-growing Python community. You can install any package from PyPI using pip, which is the package installer for Python. Every Python programmer, new or old, uses pip install <package name> multiple times. However, there are other pip commands that can be extremely useful, especially from a data science perspective. This article explains some of the commonly used pip commands along with their frequently used options.
To begin with, we’ll create a virtual environment. This way, it’ll be easier to show the various pip commands in action. Let’s use venv to create this new virtual environment and name it env. Python’s venv module is used for creating lightweight “virtual environments.”
# Creating virtual environment in Mac/Linux
python3 -m venv env
# Creating virtual environment in Windows
py -m venv env
Once the env environment has been created, we’ll activate it (source env/bin/activate on Mac/Linux, env\Scripts\activate on Windows), and then we are good to go.
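The same environment can also be created from Python itself through the standard library's venv module. The following is a minimal sketch rather than a recommended workflow: the helper name create_env and the temporary directory are assumptions made for illustration, and pip bootstrapping is switched off in the demo only to keep it quick.

```python
import tempfile
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a virtual environment at env_dir, similar to `python3 -m venv env`."""
    # with_pip=True asks venv to bootstrap pip into the new environment.
    venv.create(env_dir, with_pip=with_pip)
    return Path(env_dir)


# Demo in a throwaway directory; with_pip=False keeps the demo quick.
with tempfile.TemporaryDirectory() as tmp:
    env_path = create_env(Path(tmp) / "env", with_pip=False)
    # A virtual environment has a pyvenv.cfg file at its root.
    has_cfg = (env_path / "pyvenv.cfg").exists()

print(has_cfg)
```

For everyday use, the command-line form shown above is usually simpler; the function form is mostly handy inside setup scripts.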
Let’s start by checking whether pip is installed in our environment. Technically, if you are using Python 2 >=2.7.9 or Python 3 >=3.4, pip should already be installed. The pip --version command returns the location as well as the version of the installed pip.
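If you want to run this check from a script, one option is to call pip through the interpreter that is currently running. This is only a sketch: the function name pip_version is made up for this example, and it assumes Python 3.7 or later for the capture_output argument.

```python
import subprocess
import sys


def pip_version():
    """Return the output of `pip --version`, or None if pip is unavailable."""
    # `sys.executable -m pip` targets the pip that belongs to this interpreter.
    result = subprocess.run(
        [sys.executable, "-m", "pip", "--version"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


version_line = pip_version()
print(version_line or "pip is not installed for this interpreter")
```

Calling pip as a module this way avoids accidentally picking up a pip that belongs to a different Python installation on the PATH.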
Since everything is in place, let’s now look at some of the most important and most used pip commands, one by one.
1. pip help
If you type pip help in your terminal, you’ll get a single-page scrollable document.
It displays the various commands that can be used with pip, as well as how the commands can be used. Additionally, if you wish to see details about a single pip command, you can do:
>>> pip help <command_name>
For example:
>>> pip help install
This brings up information on the specific command you are interested in.
2. pip list
If you want to take a look at all the installed packages, you can run pip list, and it will output all the packages that are currently installed in the environment.
>>> pip list
The output shows that we currently have only two packages installed, and of these, pip itself is outdated.
pip list can be used with a number of options, for instance:
- --outdated / -o for listing all the outdated packages
>>> pip list --outdated or >>> pip list -o
It looks like both the installed packages are outdated.
- --uptodate / -u for listing all the up-to-date packages
>>> pip list --uptodate or >>> pip list -u
- --format selects the output format for displaying installed packages on the screen. The available options are columns (default), freeze, or json.
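The json format is convenient when the package list needs to be processed by a program rather than read by a person. Below is a rough sketch of how that output could be consumed from Python. The function names are hypothetical, and the sample string is made-up data in the shape pip emits (a list of objects with name and version keys), used so the example runs even where pip is unavailable.

```python
import json
import subprocess
import sys


def parse_pip_list(json_text):
    """Turn `pip list --format=json` output into a {name: version} dict."""
    return {entry["name"]: entry["version"] for entry in json.loads(json_text)}


def installed_packages():
    """Run pip list for the current interpreter and parse its JSON output."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "list", "--format=json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_pip_list(result.stdout)


# Hypothetical sample output, not taken from a real environment.
sample = (
    '[{"name": "pip", "version": "19.2.3"},'
    ' {"name": "setuptools", "version": "41.2.0"}]'
)
packages = parse_pip_list(sample)
print(packages)
```

In a real environment, installed_packages() would return the same kind of dictionary for whatever is actually installed.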
3. pip install
The pip install command is used to install a new package. Let’s install pandas, the bread-and-butter package for data science, in our virtual environment.
>>> pip install pandas
To check whether the pandas package has been installed, we can do a quick pip list to have a look at all the installed packages.
We can see that pandas, along with its dependencies, has been installed in the virtual environment.
pip install also has a few useful options:
- --upgrade / -U for upgrading all specified packages to the newest available version.
- --requirement <file> / -r for installing from the given requirements file. A requirements file is a list of all of a project’s dependencies. This text file contains all the required packages, including the specific version of each dependency. A requirements file typically lists one requirement per line, for example a package name pinned to an exact version with ==.
To install all the packages mentioned in the requirements.txt file, you can simply do:
>>> pip install -r requirements.txt
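To make the shape of a requirements file concrete, here is a small sketch that holds a sample file as a string and reads the pinned versions out of it. The file contents are hypothetical and the versions are illustrative only. The parser is deliberately simplified: it handles plain name==version lines, blank lines, and # comments, and none of the other syntax that pip itself accepts.

```python
# Hypothetical requirements file contents; versions are illustrative only.
SAMPLE_REQUIREMENTS = """\
# core data science stack
numpy==1.19.0
pandas==1.0.5

matplotlib==3.2.2
"""


def parse_pinned(text):
    """Parse simple `name==version` lines into a dict, skipping comments and blanks."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        name, sep, version = line.partition("==")
        if not sep:
            raise ValueError(f"not a pinned requirement: {line!r}")
        pins[name.strip()] = version.strip()
    return pins


pins = parse_pinned(SAMPLE_REQUIREMENTS)
print(pins)
```

A quick check like this can be useful for spotting unpinned entries before sharing a project, although pip remains the authority on what a valid requirements file is.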
4. pip show
This command shows information about the installed packages. One can choose the amount of information to be displayed on the screen. Let’s say we want to know details about the pandas package, which we know is installed in our environment. To show limited details, we can do pip show pandas:
>>> pip show pandas
In case you want the complete details, you can use the --verbose option with the pip show command:
>>> pip show --verbose pandas
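Similar details can be read from inside Python with the standard library's importlib.metadata module. This is an alternative route to the same metadata, not how pip show is implemented. The sketch assumes Python 3.8 or later, and the helper name describe is invented for this example.

```python
from importlib import metadata


def describe(package):
    """Return a few metadata fields for an installed package, or None if absent."""
    try:
        meta = metadata.metadata(package)
    except metadata.PackageNotFoundError:
        return None
    return {
        "Name": meta.get("Name"),
        "Version": meta.get("Version"),
        "Summary": meta.get("Summary"),
    }


info = describe("pandas")  # None unless pandas is installed in this environment
print(info)

# A name that should not exist, to show the "not installed" case.
missing = describe("surely-not-an-installed-package-xyz")
print(missing)
```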
5. pip uninstall
As the name suggests, pip uninstall will uninstall the desired package. As per the documentation, there are a few exceptions that cannot be uninstalled. They are:
- Pure distutils packages installed with python setup.py install, and
- Script wrappers installed by python setup.py develop.
We’ll now uninstall the pandas package that we recently installed. The process is pretty straightforward, as follows:
>>> pip uninstall pandas
pip uninstall has two options, namely:
- --requirement <file> / -r for uninstalling packages from the requirements file.
- --yes / -y. If selected, this option skips the confirmation prompt when uninstalling a package.
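The --yes option matters most in scripts, where nobody is around to answer the confirmation prompt. As a rough sketch, the two options above could be assembled into a command like this. The function names and parameters are hypothetical, and the demo only builds and prints the command, so nothing is uninstalled when the example runs.

```python
import subprocess
import sys


def uninstall_command(packages=(), requirements_file=None, assume_yes=False):
    """Build the argument list for a `pip uninstall` call."""
    cmd = [sys.executable, "-m", "pip", "uninstall"]
    if assume_yes:
        cmd.append("--yes")  # skip the confirmation prompt
    if requirements_file is not None:
        cmd += ["--requirement", str(requirements_file)]
    cmd += list(packages)
    return cmd


def uninstall(*packages):
    """Uninstall packages without prompting; raises if pip exits with an error."""
    subprocess.run(uninstall_command(packages, assume_yes=True), check=True)


cmd = uninstall_command(["pandas"], assume_yes=True)
print(" ".join(cmd[1:]))  # -m pip uninstall --yes pandas
```

Calling uninstall("pandas") would then run the command for real, so it is best tried inside a disposable virtual environment.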
6. pip freeze
In section 3, we touched upon the need for the requirements file in a project. Well, pip freeze lets you easily create one. It outputs all the installed packages and their version numbers in requirements format.
The output of the freeze command can then be redirected into a requirements file, as follows:
>>> pip freeze > requirements.txt
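To show what "requirements format" means, the sketch below renders a mapping of package names to versions as name==version lines. It is a simplified imitation for illustration, not pip's own implementation, and it ignores cases that pip freeze handles, such as editable installs. The function names are hypothetical, the demo versions are made up, and importlib.metadata requires Python 3.8 or later.

```python
from importlib import metadata


def freeze_lines(versions):
    """Render a {name: version} mapping as sorted `name==version` lines."""
    return [f"{name}=={versions[name]}" for name in sorted(versions, key=str.lower)]


def installed_versions():
    """Collect name -> version for distributions visible to this interpreter."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries with broken metadata
            found[name] = dist.version
    return found


# Hypothetical versions, so the output does not depend on the local machine.
lines = freeze_lines({"pandas": "1.0.5", "numpy": "1.19.0"})
print("\n".join(lines))
```

Passing installed_versions() to freeze_lines would give a comparable listing for the current environment, which could then be written to requirements.txt.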
Conclusion and additional resources
These were some of the useful pip commands that I use in my day-to-day activities. This article can be used as a handy resource to learn about pip. There are other commands too that have not been covered here. The official documentation is an excellent resource if you want to go deeper into the details.