Python

(From Wikipedia, the free encyclopedia)

Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation via the off-side rule.

Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly procedural), object-oriented and functional programming. It is often described as a “batteries included” language due to its comprehensive standard library.

Guido van Rossum began working on Python in the late 1980s as a successor to the ABC programming language and first released it in 1991 as Python 0.9.0. Python 2.0 was released in 2000. Python 3.0, released in 2008, was a major revision not completely backward-compatible with earlier versions. Python 2.7.18, released in 2020, was the last release of Python 2.

Python consistently ranks as one of the most popular programming languages.

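Development tools

Almost any text editor can be used for writing Python code. The interpreter itself offers several ways to run code interactively: running python3 without arguments starts an interactive prompt (REPL), python3 -i script.py runs a script and then drops into that prompt, and python3 -c "..." runs a one-off command.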

Python package management

There are several standards and recommendations on how to install and keep Python libraries. Unfortunately, each of them is either inconsistent, non-standard, or not accepted by everyone. In practice, you may run into a full range of issues when installing and maintaining Python libraries.

A good practice is not to mix modules from different sources, such as your Linux distribution, PyPI, or Conda.
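If you are unsure where an already installed module came from, the interpreter can tell you its file location. The sketch below uses importlib.util.find_spec to get that location and then guesses the installer from the path. The helper names (module_origin, guess_source) are hypothetical, and the path patterns are assumptions based on typical Ubuntu, pip, venv and Conda layouts, so treat the result as a hint rather than an authoritative answer.

```python
from __future__ import annotations

import importlib.util
import os
import site
import sys
import sysconfig


def module_origin(name: str) -> str | None:
    """Return where a top-level module would be imported from, or None if missing."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # For built-in or frozen modules this is a marker such as "built-in", not a path.
    return spec.origin


def _under(path: str, directory: str) -> bool:
    """True if *path* lies inside *directory* (directory resolved through symlinks)."""
    directory = os.path.realpath(directory)
    return path == directory or path.startswith(directory + os.sep)


def guess_source(origin: str) -> str:
    """Heuristically map a module's origin to the tool that probably installed it.

    Assumption: typical Ubuntu/pip/venv/Conda directory layouts; adjust for your system.
    """
    if not os.path.isabs(origin):
        return "built into the interpreter"
    path = os.path.realpath(origin)
    in_packages_dir = "site-packages" in path or "dist-packages" in path
    if not in_packages_dir and _under(path, sysconfig.get_paths()["stdlib"]):
        return "standard library"
    if _under(path, site.getusersitepackages()):
        return "pip --user"
    conda_prefix = os.environ.get("CONDA_PREFIX")
    if conda_prefix and _under(path, conda_prefix):
        return "Conda environment"
    if sys.prefix != sys.base_prefix and _under(path, sys.prefix):
        return "virtual environment"
    if path.startswith("/usr/local/"):
        return "pip, system-wide"
    if path.startswith("/usr/lib/"):
        return "Linux distribution package"
    return "unknown"


if __name__ == "__main__":
    for name in ("json", "pip", "numpy"):
        origin = module_origin(name)
        if origin is None:
            print(f"{name}: not installed")
        else:
            print(f"{name}: {origin} ({guess_source(origin)})")
```

Seeing the same module reported from two different sources (for example a distribution package and a pip --user copy) is a typical sign of the mixing described above.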

Package manager of your Linux distribution

This is the preferable way to install most base modules. In fact, many system utilities in modern Linux distributions are written in Python or use Python modules. Typical examples are tools around the APT package manager in Ubuntu, applets and configuration tools in GNOME and Xfce, and much of the MATE desktop environment.

As a result, you can expect to find many standard Python libraries either already present as part of your base Linux installation, or ready to install via the package manager.

In Ubuntu they usually have a python- or python3- prefix, e.g.:

sudo apt search python-numpy
...
python-numpy/focal,now 1:1.16.5-2ubuntu7 amd64 [installed,automatic]
  Numerical Python adds a fast array facility to the Python language
...

Pros:

  • Easy to install
  • No dependency issues
  • Fully compatible with your Linux distribution

Cons:

  • Some (or many) packages are missing
  • Packages are often outdated, or frozen at an “old stable” version

PyPI

Python Package Index (PyPI)

PyPI is the default Package Index for the Python community. It is open to all Python developers to consume and distribute their distributions.

pypi.org

pypi.org is the domain name for the Python Package Index (PyPI). It replaced the legacy index domain name, pypi.python.org, in 2017.

Easy_Install

easy_install, now deprecated, was released in 2004 as part of setuptools. It was notable at the time for installing packages from PyPI using requirement specifiers, and automatically installing dependencies. Generally, it’s better not to use it now. If you find any documentation or installation instructions where it’s mentioned, please don’t follow them.

PIP

pip came later, in 2008, as an alternative to easy_install, although it is still largely built on top of setuptools components. It was notable at the time for not installing packages as Eggs or from Eggs (but rather simply as ‘flat’ packages from sdists), and for introducing the idea of Requirements Files, which gave users the power to easily replicate environments.
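Requirements files are usually produced with pip freeze > requirements.txt and replayed with pip install -r requirements.txt. As a rough illustration of what such a file contains, the sketch below lists installed distributions in name==version form using only the standard library's importlib.metadata. It is not pip's implementation, the function name frozen_requirements is hypothetical, and its output can differ from pip freeze (for example in how editable installs or pip itself are listed).

```python
from __future__ import annotations

from importlib import metadata


def frozen_requirements() -> list[str]:
    """List installed distributions as 'name==version' lines, sorted by name."""
    # A set removes exact duplicates, e.g. when a directory appears twice on sys.path.
    lines: set[str] = set()
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        version = meta.get("Version") if meta is not None else None
        if name and version:  # skip distributions with broken metadata
            lines.add(f"{name}=={version}")
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    # Roughly comparable to the output of: pip freeze
    print("\n".join(frozen_requirements()))
```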

Currently pip is still the main installation tool for PyPI packages, but not every feature has survived. In particular, searching no longer works, because PyPI has disabled the search API that pip search relied on. If you run pip search PACKAGENAME, you’ll get an error:

PyPI no longer supports ‘pip search’ (or XML-RPC search). Please use https://pypi.org/search (via a browser) instead. See https://warehouse.pypa.io/api-reference/xml-rpc.html#deprecated-methods for more information.

Pros:

  • Still the only “standard” installation tool for Python Community packages

Cons:

  • By default installs packages system-wide (needs admin privileges)
  • Ignores the packages and repositories of your Linux distribution
  • May break your system Python libraries

PIP troubleshooting

When run with admin privileges, pip by default installs packages into /usr/local/{lib/pythonX.Y,bin,share}. So in the worst case you can remove the pip-installed content of these directories (check first that nothing else lives there) and get a clean, distribution-only Python back.

If you don’t want to (or cannot) install anything system-wide, which also makes sense if, e.g., you need to install an incompatible module, you can install packages into your home directory instead. Just run pip install --user PACKAGENAME and the package will be installed into ~/.local/lib/pythonX.Y/site-packages/
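To see both locations on your own machine, you can ask the interpreter directly. This is a small sketch using the standard sysconfig and site modules; the function name install_locations is hypothetical. On Debian and Ubuntu the default location typically resolves to /usr/local/lib/pythonX.Y/dist-packages, and inside a virtual environment user installs are normally disabled.

```python
from __future__ import annotations

import site
import sysconfig


def install_locations() -> dict[str, str]:
    """Where pip normally puts pure-Python packages for this interpreter, by install mode."""
    return {
        # Default target of "pip install" for this interpreter. It needs write access;
        # without it, recent pip versions fall back to a --user install.
        "default": sysconfig.get_paths()["purelib"],
        # Target of "pip install --user" (the per-user site directory, PEP 370).
        "user": site.getusersitepackages(),
    }


if __name__ == "__main__":
    for mode, path in install_locations().items():
        print(f"{mode:8} {path}")
    # False inside a typical virtual environment, where --user installs are not used.
    print("user site enabled:", site.ENABLE_USER_SITE)
```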

It’s not unusual for pip to break things or to fail to resolve package or library dependencies. Unfortunately, there is no simple way to fix a broken installation or roll back changes. Often only a full cleanup of locally installed packages restores Python functionality. Here are a few places to look at and clean up:

/usr/local/lib/pythonX.Y
~/.local/lib/pythonX.Y
~/.cache/pip

(X.Y is the major and minor version of your Python, e.g. python3.10)
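Before deleting anything, it helps to see which of these directories exist and how big they are. The sketch below is a dry run: it only reports and never deletes. The function names are hypothetical, and the paths are exactly the three listed above, built for the Python version that runs the script.

```python
from __future__ import annotations

import os
import sys
from pathlib import Path


def cleanup_candidates(home: Path | None = None) -> list[Path]:
    """Return which of the usual pip leftovers exist for the running Python version."""
    home = Path.home() if home is None else home
    pyver = f"python{sys.version_info.major}.{sys.version_info.minor}"  # e.g. python3.10
    candidates = [
        Path("/usr/local/lib") / pyver,
        home / ".local" / "lib" / pyver,
        home / ".cache" / "pip",
    ]
    return [path for path in candidates if path.is_dir()]


def dir_size_bytes(path: Path) -> int:
    """Total size of the files under *path*; symlinks are skipped, not followed."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.islink(file_path):
                continue
            try:
                total += os.path.getsize(file_path)
            except OSError:
                pass  # the file vanished or is unreadable; skip it
    return total


if __name__ == "__main__":
    # Dry run: report only. Delete manually after reviewing the output.
    for path in cleanup_candidates():
        print(f"{dir_size_bytes(path) / 2**20:10.1f} MiB  {path}")
```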

Also have a look at the PYTHONPATH variable (echo $PYTHONPATH): it might contain custom locations of extra Python libraries.
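The sketch below prints the PYTHONPATH entries next to sys.path, the search path the interpreter actually uses, so unexpected library locations are easy to spot. The helper name pythonpath_entries is hypothetical. To check whether a problem goes away without these entries, you can also start Python with the -E option, which ignores all PYTHON* environment variables, including PYTHONPATH.

```python
from __future__ import annotations

import os
import sys


def pythonpath_entries() -> list[str]:
    """Split $PYTHONPATH into its non-empty entries (os.pathsep is ':' on Linux)."""
    raw = os.environ.get("PYTHONPATH", "")
    return [entry for entry in raw.split(os.pathsep) if entry]


if __name__ == "__main__":
    print("PYTHONPATH:", pythonpath_entries() or "(not set)")
    # sys.path is the module search path in use; PYTHONPATH entries come right
    # after the script directory (or "" for the current directory) and before
    # the standard library and site-packages.
    for index, entry in enumerate(sys.path):
        print(index, entry or "(current directory)")
```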

Setup.py

setup.py is a Python file whose presence indicates that the module or package you are about to install has likely been packaged with setuptools or its predecessor distutils, the original standard for distributing Python modules. In most cases it’s enough to run, in the directory that contains setup.py:

$ pip install .

pip will then use setup.py to build and install your module. Avoid calling setup.py directly (e.g. python setup.py install).

distutils is the original build and distribution system, first added to the Python standard library in 1998. It laid the foundation for the current packaging and distribution infrastructure, but direct use of it has been phased out: it was deprecated in Python 3.10 and removed from the standard library in Python 3.12 (PEP 632), with setuptools providing a replacement. Its name has also lived on elsewhere, e.g. in the name of the mailing list historically used to coordinate Python packaging standards development.

A lot of Python software can still only be installed this way. Sometimes the installation process can be quite complex due to missing or improperly installed dependencies, or incompatible libraries. Generally, installation via setup.py is only recommended for experienced users who are familiar with Python internals.

Virtual Environments

To address the problem of incompatible modules, the Python maintainers introduced virtual environments.

The venv module supports creating lightweight “virtual environments”, each with their own independent set of Python packages installed in their site directories. A virtual environment is created on top of an existing Python installation, known as the virtual environment’s “base” Python, and may optionally be isolated from the packages in the base environment, so only those explicitly installed in the virtual environment are available.

When used from within a virtual environment, common installation tools such as pip will install Python packages into a virtual environment without needing to be told to do so explicitly.

As stated above, when you create a virtual environment and activate it, all packages are installed inside it, while the base Python installation stays clean. If something goes wrong, you can always remove the broken environment and re-create it. Virtual environments also help when you need to work around dependency issues.

See PEP-405 for technical details.
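PEP 405 also defines how code can tell that it is running inside a virtual environment: sys.prefix points to the environment, while sys.base_prefix still points to the base Python. The sketch below uses the standard venv module to create an environment from Python code (roughly what python -m venv does) and then asks the new interpreter about itself. The function names are hypothetical; with_pip=False is chosen only to keep the example fast and offline, and in normal use you would keep pip.

```python
import os
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def env_python(env_dir: Path) -> Path:
    """Path of the interpreter inside a virtual environment."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def create_env(env_dir: Path) -> Path:
    """Create a venv (without pip, to stay offline) and return its interpreter."""
    # Symlinks mirror the "python -m venv" default on non-Windows platforms.
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    return env_python(env_dir)


def is_virtual_env(python: Path) -> bool:
    """Ask *python* whether it runs inside a venv (PEP 405: sys.prefix != sys.base_prefix)."""
    code = "import sys; print(sys.prefix != sys.base_prefix)"
    result = subprocess.run(
        [str(python), "-c", code], capture_output=True, text=True, check=True
    )
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        python = create_env(Path(tmp) / "demo_env")
        print("new interpreter is in a venv:", is_virtual_env(python))
        print("this interpreter is in a venv:", sys.prefix != sys.base_prefix)
```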

Creation of a new virtual environment is pretty straightforward:

python -m venv /path/to/new/virtual/environment/env_name
source /path/to/new/virtual/environment/env_name/bin/activate

A more detailed description with examples can be found here.

Conda

https://docs.conda.io/en/latest/

Conda is an open source package management system and environment management system that runs on Windows, macOS, and Linux. Conda quickly installs, runs and updates packages and their dependencies. Conda easily creates, saves, loads and switches between environments on your local computer. It was created for Python programs, but it can package and distribute software for any language.

The aim of the Conda project is to give users a simple, easy-to-use tool for installing and maintaining packages.

Pros:

  • Easy start
  • Simple installation and maintenance of a large number of packages with dependencies
  • Doesn’t require admin privileges
  • Cross-platform (Linux, Windows, macOS, …)

Cons:

  • Base installation (and virtual environments) are very large
  • Ignores system libraries
  • Sub-optimal performance might be expected from pre-compiled modules
  • In Linux it may interfere with system Python
  • Painful maintenance in Windows Enterprise

General Conda recommendations

  1. Linux installation.

Consider using Miniconda, a lightweight Conda distribution: https://docs.conda.io/en/latest/miniconda.html

Download the corresponding installer, run it, and follow the prompts. Note the installation path (assumed below to be /path/to/conda).

To avoid issues in Linux, don’t add the init script to ~/.bashrc (the Linux installer already defaults to [no] for a reason, so don’t change it). After installation, define either an alias or a function in ~/.bashrc and run it only when you need Conda:

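# Option 1: an alias that loads Conda's shell hook into the current shell
alias myconda='eval "$(/path/to/conda/bin/conda shell.bash hook)"'

# Option 2: a function that puts Conda on PATH and optionally activates an environment
# usage: startconda [ENV_NAME]
startconda() {
    local CONDADIR="/path/to/conda"
    if [ -d "$CONDADIR/bin" ]; then
        case $PATH in
            *"$CONDADIR"*) echo "'$CONDADIR' already in path" 1>&2;;
            *)  export PATH="$CONDADIR/bin:$PATH"
                PS1="CONDA-${PS1}";;
        esac
    else
        echo "could not find conda installation at '$CONDADIR'" 1>&2
        return 1
    fi
    if [ $# -gt 0 ]; then
        source activate "$1"
    fi
}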

Keep it clean

Generally, Conda is just another package manager, and it’s easy to break it by mixing it with another one, e.g. pip. Here are some “words of wisdom”:

  • Keep the base Conda environment clean and small. Don’t install any extra packages into base: if you break it, all descendant virtual environments will be broken too.
  • Always create a new environment when trying a new Python package.
  • Don’t mix Conda channels in one environment (e.g. anaconda and conda-forge). Switching to conda-forge is usually safe and easy, but rolling back is hard.
  • If the instructions tell you to run pip install blahblah, that should usually be the last command, after you’ve created the environment, activated it, and installed all other dependencies with conda install.
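To check whether an environment already mixes channels, you can look at the channel recorded for each package. The sketch below assumes the JSON produced by conda list --json, where each entry is a dictionary with a channel field (for example conda-forge, pkgs/main for the default Anaconda channel, or pypi for packages installed with pip). These field names and channel labels are assumptions to verify against your Conda version, and the helper names are hypothetical.

```python
from __future__ import annotations

import json
import shutil
import subprocess
import sys
from collections import Counter


def channel_counts(packages: list[dict]) -> Counter[str]:
    """Count packages per channel in parsed `conda list --json` output (assumed format)."""
    return Counter(pkg.get("channel", "<unknown>") for pkg in packages)


def conda_channels(packages: list[dict]) -> list[str]:
    """Conda channels in use, ignoring pip-installed ('pypi') packages.

    More than one channel here suggests that channels have been mixed.
    """
    return sorted(ch for ch in channel_counts(packages) if ch != "pypi")


def conda_list(env_name: str) -> list[dict]:
    """Run `conda list --json` for one environment (needs conda on PATH)."""
    result = subprocess.run(
        ["conda", "list", "--name", env_name, "--json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    if shutil.which("conda") is None:
        print("conda not found on PATH")
    else:
        env = sys.argv[1] if len(sys.argv) > 1 else "base"
        packages = conda_list(env)
        channels = conda_channels(packages)
        print(f"{env}: {dict(channel_counts(packages))}")
        if len(channels) > 1:
            print("warning: packages come from several channels:", ", ".join(channels))
```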

Python/Conda Performance

An oxymoron (plurals: oxymorons and oxymora) is a figure of speech that juxtaposes concepts with opposite meanings within a word or in a phrase that is a self-contradiction. As a rhetorical device, an oxymoron illustrates a point to communicate and reveal a paradox

Python is an interpreted language, so you shouldn’t expect high performance from pure Python code. The power of Python is in its simple and clear syntax, strong community, good support, and the many libraries that do the heavy work in underlying C or Fortran code.
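A minimal illustration of this point, using only the standard library: both functions below compute the same sum, but the first runs every loop step in the interpreter, while the second hands the whole loop to the built-in sum(), which CPython implements in C. The function names are hypothetical and timings depend on your machine, but the built-in version is typically several times faster; NumPy and similar libraries apply the same idea to whole arrays.

```python
import timeit


def py_loop_sum(n: int) -> int:
    """Sum 0..n-1 with an explicit Python loop: every step runs in the interpreter."""
    total = 0
    for i in range(n):
        total += i
    return total


def builtin_sum(n: int) -> int:
    """Same result, but the loop runs inside sum(), implemented in C in CPython."""
    return sum(range(n))


if __name__ == "__main__":
    n = 1_000_000
    for fn in (py_loop_sum, builtin_sum):
        seconds = timeit.timeit(lambda: fn(n), number=10)
        print(f"{fn.__name__}: {seconds:.3f} s for 10 runs")
```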

There are several recommendations which may help to significantly improve native performance:

File system

Our Linux domain is configured with network storage (NFS) for home directories. Unfortunately, NFS performs poorly when accessing a large number of files. A typical Conda environment can be 5-10 GiB in size and may contain hundreds of thousands of files and directories. That can be noticeably slow even on a local spinning HDD.

  • Install Conda on a fast local drive (an SSD is preferable). On your desktop, this can be done with:
    sudo mkdir /home/USERNAME
    sudo chown USERNAME:USERNAME /home/USERNAME

    Then install Conda into /home/USERNAME/conda

  • Several pre-installed Conda environments are available on the HPCs. You can pick one with the need command.
  • If you need to create a custom Conda environment on a server, please ask the system administrator to create a directory on the HPC’s local drive.
  • Prefer Miniconda.
  • Don’t overload your environments with unnecessary packages.
  • Regularly review your installation, remove unneeded environments, and clean up the Conda cache with conda clean --all.
  • Keep your base Conda environment as small and clean as possible. Don’t install any non-standard modules into base.
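To see how many files an environment really contains before moving it or deciding what to clean up, a short standard-library script is enough. The sketch below walks a directory tree without following symlinks; the function name tree_stats is hypothetical, and defaulting to the active Conda environment (CONDA_PREFIX) when no directory is given is an assumption. Sizes are apparent sizes, and if you scan the whole Conda directory, files that Conda hard-links between its package cache and environments are counted once per link, so the total can exceed the real disk usage.

```python
from __future__ import annotations

import os
import sys


def tree_stats(root: str | os.PathLike[str]) -> tuple[int, int, int]:
    """Return (files, directories, total bytes) below *root*.

    Symlinks are not followed: a symlinked directory is counted but not entered,
    and a symlinked file contributes the size of the link itself.
    """
    n_files = n_dirs = total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        n_dirs += len(dirnames)
        for name in filenames:
            try:
                info = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # vanished or unreadable
            n_files += 1
            total_bytes += info.st_size
    return n_files, n_dirs, total_bytes


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else os.environ.get("CONDA_PREFIX")
    if target is None:
        print("usage: pass a directory, or run inside an activated Conda environment")
    else:
        files, dirs, size = tree_stats(target)
        print(f"{target}: {files} files, {dirs} directories, {size / 2**30:.2f} GiB")
```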

Memory

Python is ‘hardware agnostic’ by design: it generally assumes that memory (as well as storage and CPU) is unlimited. As a result, a Python process may keep growing until it consumes all available RAM, and memory released inside the process is not necessarily returned to the operating system. Please keep an eye on your running Python programs and their memory consumption. There have been many proposals and discussions attempting to resolve this, but it’s still a problem. You can find more details here.
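One way to keep an eye on memory from inside a program is the standard tracemalloc module. The sketch below measures the peak memory allocated through Python’s allocator while a function runs, and can list the source lines holding the most memory; the helper names and the build_strings workload are hypothetical. tracemalloc only sees allocations made through Python’s memory allocator, so its numbers will not exactly match the process size shown by top, and tracing slows the program down, so use it while investigating rather than in production runs.

```python
from __future__ import annotations

import tracemalloc


def peak_memory(func, *args, **kwargs):
    """Run func(*args, **kwargs); return (result, peak bytes traced while it ran)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()  # also frees tracemalloc's own bookkeeping
    return result, peak


def print_top_allocations(limit: int = 5) -> None:
    """Print the source lines holding the most traced memory (tracing must be on)."""
    snapshot = tracemalloc.take_snapshot()
    for stat in snapshot.statistics("lineno")[:limit]:
        print(stat)


def build_strings(n: int) -> list[str]:
    """Hypothetical workload: n small string objects."""
    return [str(i) for i in range(n)]


if __name__ == "__main__":
    items, peak = peak_memory(build_strings, 1_000_000)
    print(f"{len(items)} items, peak {peak / 2**20:.1f} MiB")

    tracemalloc.start()
    data = build_strings(200_000)  # keep a reference so it shows up in the snapshot
    print_top_allocations()
    tracemalloc.stop()
```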

GIL

The Python Global Interpreter Lock, or GIL, is, in simple words, a mutex (or lock) that allows only one thread to hold control of the Python interpreter.

This means that only one thread can execute Python bytecode at any point in time. The impact of the GIL isn’t visible to developers who execute single-threaded programs, but it can be a performance bottleneck in CPU-bound and multi-threaded code.

Since the GIL allows only one thread to execute at a time even in a multi-threaded architecture with more than one CPU core, the GIL has gained a reputation as an “infamous” feature of Python.

There are many ways to work around this behaviour (and regular proposals to remove the GIL; CPython 3.13 added an optional, experimental free-threaded build), but unfortunately there is no ultimate solution, so always check whether your Python code shows sub-optimal performance on a multi-core CPU.
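The effect is easy to observe with a CPU-bound, pure-Python function. The sketch below runs the same jobs one after another and then on a thread pool; the function names and job sizes are arbitrary choices for illustration. On a standard CPython build you should typically see little or no speed-up from the threads, because only one of them executes Python bytecode at a time. A common workaround for CPU-bound work is to use processes instead, for example concurrent.futures.ProcessPoolExecutor, since each process has its own interpreter and its own GIL; on platforms that start processes with spawn, that code must sit under an if __name__ == "__main__": guard.

```python
from __future__ import annotations

import time
from concurrent.futures import ThreadPoolExecutor


def cpu_bound(n: int) -> int:
    """Pure-Python busy work (sum of squares); it holds the GIL while running."""
    return sum(i * i for i in range(n))


def run_sequential(jobs: list[int]) -> tuple[list[int], float]:
    start = time.perf_counter()
    results = [cpu_bound(n) for n in jobs]
    return results, time.perf_counter() - start


def run_threaded(jobs: list[int], workers: int = 4) -> tuple[list[int], float]:
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(cpu_bound, jobs))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    jobs = [2_000_000] * 4
    seq, t_seq = run_sequential(jobs)
    thr, t_thr = run_threaded(jobs)
    assert seq == thr  # same work, same results
    print(f"sequential: {t_seq:.2f} s, 4 threads: {t_thr:.2f} s")
```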

https://wiki.python.org/moin/GlobalInterpreterLock

https://realpython.com/python-gil/