Python

Conda-based Python stack

Conda is a Python package and environment manager that is installed through either Anaconda or Miniconda. Anaconda is a scientific computing distribution that bundles Conda with packages such as numpy, scipy, and matplotlib. Miniconda is a minimal installation that ships Conda with far fewer default packages, which makes it a good choice if you want Conda but don’t need Anaconda’s bundled packages. Either option works for creating Conda-based virtual environments.

Before starting

To check whether Conda is available on a Cyberinfrastructure Research Computing (CIRC) system, list the available modules and look for Miniconda or Anaconda in the output:

module spider

If Miniconda or Anaconda is available through the module system, load it by running module load <package name>. For example:

module load miniconda3
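After loading the module, a quick check along the following lines can confirm that the conda executable is on your PATH. This is a minimal sketch; the conda_status variable is a hypothetical name used only for illustration.

```shell
# Check whether the conda executable is on PATH after loading the module.
if command -v conda >/dev/null 2>&1; then
    conda_status="available"
    conda --version   # print the version provided by the loaded module
else
    conda_status="missing"
    echo "conda not found; check 'module spider' and load the matching module"
fi
echo "conda is ${conda_status}"
```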

Configuring and managing a Conda environment

Conda environments contain Python versions, files, packages, and their dependencies that don’t interact with other environments. Conda environments let you customize your Python environment for one application or workflow.

Initializing Conda

To use the conda command line interface, you first need to initialize it. This process adds some Conda setup code to your .bashrc. To initialize, run:

conda init bash

At this point, either log out and back in to the CIRC system or source your .bashrc file to use conda. You should see (base) prepended to your shell prompt.

To undo the changes to your .bashrc file, you can run:

conda init --reverse

Creating environments

The following command creates an environment named myenv:

conda create --name myenv

Replace myenv with your desired environment name.

Conda creates environments in the user-space directory $HOME/.conda/envs/. However, by default no packages are installed in a newly created environment. To install software and packages, first you need to activate the environment:

conda activate myenv

To install Python and other packages, use conda install. For example, the following commands install Python version 3.9, numpy, scipy, and matplotlib:

conda install python=3.9
conda install numpy scipy matplotlib

Conda has to resolve a potentially large graph of package dependencies, so installation can take a while.
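If you already know which packages you need, conda create also accepts package specifications, so an environment can be created and populated in one step. The sketch below uses --dry-run to preview the result without changing anything. The environment name myenv-onestep is a hypothetical example; remove --dry-run to actually create the environment.

```shell
env_name="myenv-onestep"   # hypothetical environment name for illustration

if command -v conda >/dev/null 2>&1; then
    # --dry-run reports what would be installed without creating anything
    conda create --dry-run --name "$env_name" python=3.9 numpy scipy matplotlib \
        || echo "dry run failed; check the package names and channels"
else
    echo "conda not found; load the Miniconda or Anaconda module first"
fi
```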

Once Python is installed, you can confirm your shell is accessing the correct python and pip executables by using which:

which pip
which python
python --version

You should see paths to pip and python from $HOME/.conda/envs/myenv.
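The same check can be scripted. The sketch below compares the location of python against CONDA_PREFIX, the variable that conda activate sets to the path of the active environment. The python_path and python_source variables are hypothetical names used only for illustration.

```shell
# Find the python executable the shell would run (empty if none is found).
python_path="$(command -v python || true)"

# CONDA_PREFIX is empty when no environment is active, so fall back to a
# path that cannot match.
case "$python_path" in
    "${CONDA_PREFIX:-/nonexistent}"/*) python_source="inside the active environment" ;;
    *)                                 python_source="outside the active environment" ;;
esac
echo "python resolves to: ${python_path:-not found} (${python_source})"
```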

Once an environment is created, it remains available to you (and your job scripts) until you delete it, and you can install any available software into it. Only one environment can be active at a time.
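Job scripts often run in non-interactive shells, which may not read your .bashrc. As a sketch under that assumption, the lines below load Conda’s shell functions directly and then activate the example environment. Scheduler directives are omitted because they depend on the system, and job_env is a hypothetical variable name.

```shell
job_env="myenv"   # example environment name from this page

if command -v conda >/dev/null 2>&1; then
    # Load conda's shell functions directly instead of relying on .bashrc
    source "$(conda info --base)/etc/profile.d/conda.sh"
    conda activate "$job_env" || echo "could not activate ${job_env}"
    python --version || echo "python not found in ${job_env}"
else
    echo "conda not found; load the Miniconda or Anaconda module first"
fi
```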

Searching for software

To see if a software package is available from the default channel, use the conda search <package_name> command. Any matches are displayed along with their versions. For example:

(myenv)$ conda search R
Loading channels: done
# Name                       Version           Build  Channel
r                              3.1.2               0  pkgs/r
r                              3.1.2               1  pkgs/r
r                              3.1.2               2  pkgs/r
r                              3.1.2               3  pkgs/r
r                              3.1.3               0  pkgs/r
r                              3.2.0               0  pkgs/r
r                              3.2.1               0  pkgs/r
r                              3.2.2               0  pkgs/r
r                              3.3.1        r3.3.1_0  pkgs/r
r                              3.3.1        r3.3.1_1  pkgs/r
r                              3.3.2        r3.3.2_0  pkgs/r
r                              3.4.1        r3.4.1_0  pkgs/r
r                              3.4.2      h65d9972_0  pkgs/r
r                              3.4.3        mro343_0  pkgs/r
r                              3.4.3          r343_0  pkgs/r
r                              3.5.0        mro350_0  pkgs/r
r                              3.5.0          r350_0  pkgs/r
r                              3.5.1        mro351_0  pkgs/r
r                              3.5.1          r351_0  pkgs/r
r                              3.6.0           r36_0  pkgs/r
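Long result lists can be narrowed by adding a version constraint to the package name. The sketch below searches for R at version 3.5 or later; the search_spec variable is a hypothetical name used only for illustration.

```shell
search_spec="r>=3.5"   # package name plus a version constraint

if command -v conda >/dev/null 2>&1; then
    # Quote the spec so the shell does not treat > as a redirect
    conda search "$search_spec" || echo "no matches for ${search_spec}"
else
    echo "conda not found; load the Miniconda or Anaconda module first"
fi
```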

conda-forge and bioconda

Conda can access channels that are not part of the main Conda package system. Two popular channels are conda-forge and bioconda. To install the package nextflow from bioconda, for example, you can run:

conda install -c bioconda nextflow
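Bioconda packages frequently depend on packages from conda-forge, so listing both channels is a common pattern. The sketch below previews such an install with --dry-run. The channel order shown is an assumption, and your system may recommend a different configuration.

```shell
pkg="nextflow"   # example package from this page

if command -v conda >/dev/null 2>&1; then
    # Preview the install; remove --dry-run to install for real
    conda install --dry-run -c conda-forge -c bioconda "$pkg" \
        || echo "dry run failed for ${pkg}"
else
    echo "conda not found; load the Miniconda or Anaconda module first"
fi
```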

Removing a package

To remove an installed package, use conda remove:

conda remove <package name>

The environment must be active when you run this command so that Conda removes the package from the right environment.
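As an alternative to activating the environment first, conda remove accepts a --name option that targets a specific environment. The sketch below previews the removal with --dry-run and then lists matching packages. The target_env and pkg values are the example names used on this page.

```shell
target_env="myenv"    # example environment name from this page
pkg="matplotlib"      # example package installed earlier

if command -v conda >/dev/null 2>&1; then
    # Preview the removal in the named environment without changing it
    conda remove --dry-run --name "$target_env" "$pkg" \
        || echo "dry run failed; is ${pkg} installed in ${target_env}?"
    # List matching packages in that environment
    conda list --name "$target_env" "$pkg" \
        || echo "could not list packages in ${target_env}"
else
    echo "conda not found; load the Miniconda or Anaconda module first"
fi
```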

Deactivating the Conda environment

To leave the active environment, run:

conda deactivate
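To confirm which environment, if any, is active, you can inspect CONDA_DEFAULT_ENV, which Conda sets on activation. The sketch below also lists all environments; active_env is a hypothetical variable name.

```shell
# CONDA_DEFAULT_ENV holds the name of the active environment, if any
active_env="${CONDA_DEFAULT_ENV:-none}"
echo "active environment: ${active_env}"

if command -v conda >/dev/null 2>&1; then
    conda env list   # the active environment is marked with an asterisk
fi
```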

Using and creating environment.yml files

Conda environments can be described in environment.yml files. This lets others (or future you) create a copy of your Conda environment. To create an environment.yml, take a snapshot of your Conda environment with conda env export:

conda env export > environment.yml

The > operator redirects the output of conda env export to a file.
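A full export pins every package, including build strings, which can make the file hard to reuse on another system. As a sketch, the --from-history option limits the export to the packages you explicitly requested. The output filename below is a hypothetical choice that avoids overwriting an existing environment.yml.

```shell
export_file="environment-from-history.yml"   # hypothetical filename

if command -v conda >/dev/null 2>&1; then
    # Export only the packages that were explicitly requested
    conda env export --from-history > "$export_file" \
        || echo "export failed; is an environment active?"
else
    echo "conda not found; load the Miniconda or Anaconda module first"
fi
```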

You can use an environment.yml file to create a new Conda environment. The -n option sets the name of the new environment, overriding any name recorded in the file. For example, to create an environment named myenv2, run the following command:

conda env create -n myenv2 -f environment.yml
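For reference, an environment.yml is a YAML file with a name, a list of channels, and a list of dependencies. The sketch below writes a hypothetical, hand-trimmed file to a scratch directory. A real file produced by conda env export will contain many more pinned entries.

```shell
# Write a hypothetical, hand-trimmed environment.yml to a scratch directory
workdir="$(mktemp -d)"
cat > "$workdir/environment.yml" <<'EOF'
name: myenv
channels:
  - defaults
dependencies:
  - python=3.9
  - numpy
  - scipy
  - matplotlib
EOF

# Show the file that was written
cat "$workdir/environment.yml"
```

A file like this can be passed to conda env create with the -f option.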

Installing mpi4py in a Conda environment

By default, using conda install to install mpi4py will cause Conda to install its own version of MPI. However, the Conda version of MPI is incompatible with CIRC systems. In order to install mpi4py in a Conda environment, you need to use pip install instead of conda install:

pip install mpi4py

pip install does not install its own MPI and will use the active system MPI.
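Because pip builds mpi4py against whichever MPI it finds, it can help to confirm that an MPI compiler wrapper is visible before installing and to check the linked library afterwards. The sketch below assumes the system MPI provides an mpicc wrapper; mpi_status is a hypothetical variable name.

```shell
# Before installing: confirm an MPI compiler wrapper is on PATH.
if command -v mpicc >/dev/null 2>&1; then
    mpi_status="found"
    echo "MPI compiler wrapper: $(command -v mpicc)"
else
    mpi_status="missing"
    echo "mpicc not found; load the system MPI module before pip install mpi4py"
fi

# After installing: report which MPI library mpi4py is linked against.
if command -v python >/dev/null 2>&1 && python -c "import mpi4py" 2>/dev/null; then
    python -c "from mpi4py import MPI; print(MPI.Get_library_version())"
fi
```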

You might get a compiler_compat warning when compiling mpi4py using system MPI with pip. Unfortunately, the only easy solution for this is deleting the compiler_compat directory from your environment. For example:

rm -r $HOME/.conda/envs/myenv/compiler_compat

This may have unintended consequences, so back up your environment before testing it.
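One low-risk way to test this is to rename the directory instead of deleting it, so it can be restored if something breaks. The sketch below assumes the example environment path from this page; compat_dir and compat_action are hypothetical variable names.

```shell
compat_dir="$HOME/.conda/envs/myenv/compiler_compat"   # example path from this page

if [ -d "$compat_dir" ]; then
    # Rename rather than delete so the change can be undone
    mv "$compat_dir" "${compat_dir}.bak"
    compat_action="renamed"
else
    compat_action="absent"
fi
echo "compiler_compat: ${compat_action}"
```

To restore the original state, rename compiler_compat.bak back to compiler_compat.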