Numerical Python
April 27, 2015 — January 11, 2023
The matrix mathematics side of Python is not as comprehensive as MATLAB. Most of the tools built on core matrix libraries, numpy
and scipy
, do what I want mostly, but when, for example, I really need some fancy spline type I read about on the internet, or you want someone to have really put in some effort into ensuring such-and-such a recursive filter is stable, you might find you need to do it yourself. numpy
though gives us all the classic Fortran linear algebra libraries. There are several underlying numerics libraries which can be invoked from Python, as with any language with a decent FFI. For example tensorflow will invoke Eigen. PyArmadillo invokes Armadillo. See also the strange adjacent system of GPU libraries.
Aside: A lot of useful machine-learning-type functionality, which I won’t discuss in detail here, exists in the Python deep learning toolkits such as Tensorflow, pytorch and jax; you might want to check those pages too. Also graphing is a whole separate issue, as is optimisation.
1 Aesara
Aesara is a Python library that allows one to define, optimize/rewrite, and evaluate mathematical expressions, especially ones involving multi-dimensional arrays (e.g. numpy.ndarrays). Using Aesara, it is possible to attain speeds rivaling hand-crafted C implementations for problems involving large amounts of data.
Aesara combines aspects of a computer algebra system (CAS) with aspects of an optimising compiler. It can also generate customized code for multiple compiled languages and/or their Python-based interfaces, such as C, Numba, and JAX. This combination of CAS features with optimising compilation and transpilation is particularly useful for tasks in which complicated mathematical expressions are evaluated repeatedly and evaluation speed is critical. For situations where many different expressions are each evaluated once, Aesara can minimize the amount of compilation and analysis overhead, but still provide symbolic features such as automatic differentiation.
Aesara’s compiler applies many default optimizations of varying complexity. These optimizations include, but are not limited to:
- constant folding
- merging of similar sub-graphs, to avoid redundant calculations
- arithmetic simplifications (e.g. x * y / x -> y, -(-x) -> x)
- inserting efficient BLAS operations (e.g. GEMM) in a variety of contexts
- using memory aliasing to avoid unnecessary calculations
- using in-place operations wherever it does not interfere with aliasing
- loop fusion for element-wise sub-expressions
- improvements to numerical stability (e.g. (1+(x)) and (_i (x[i])))
It compiles to various backends including C, Numba and jax.
2 zarr
Zarr is a format for the storage of chunked, compressed, N-dimensional arrays. …
Highlights:
- Create N-dimensional arrays with any NumPy dtype.
- Chunk arrays along any dimension.
- Compress and/or filter chunks using any NumCodecs codec.
- Store arrays in memory, on disk, inside a Zip file, on S3, …
- Read an array concurrently from multiple threads or processes.
- Write to an array concurrently from multiple threads or processes.
- Organize arrays into hierarchies via groups.
Resembles HDF5 but makes fewer assumptions about the storage backend and more assumptions about the language frontend.
3 Dask
dask
is a flexible library for parallel computing in Python.
See the howto
for some examples of use cases.
Dask is composed of two parts:
Dynamic task scheduling optimized for computation. This is similar to Airflow, Luigi, Celery, or Make, but optimized for interactive computational workloads.
“Big Data” collections like parallel arrays, dataframes, and lists that extend common interfaces like NumPy, Pandas, or Python iterators to larger-than-memory or distributed environments. These parallel collections run on top of dynamic task schedulers.
Dask emphasizes the following virtues:
Familiar: Provides parallelized NumPy array and Pandas DataFrame objects
Flexible: Provides a task scheduling interface for more custom workloads and integration with other projects.
Native: Enables distributed computing in pure Python with access to the PyData stack.
Fast: Operates with low overhead, low latency, and minimal serialization necessary for fast numerical algorithms
Scales up: Runs resiliently on clusters with 1000s of cores
Scales down: Trivial to set up and run on a laptop in a single process
Responsive: Designed with interactive computing in mind, it provides rapid feedback and diagnostics to aid humans
4 HDF5
h5py provides a simple, easy and efficient interface to HDF5 files, which are an easy way of loading and saving arrays of numbers and indeed arbitrary data. I use this a lot but sometimes I get glitches on macOS with passing around serialized H5py objects, which works fine on Linux.
4.1 Pytables
PyTables is a package for managing hierarchical datasets and designed to efficiently and easily cope with extremely large amounts of data.
I am not sure what the value proposition of Pytables is compared to h5py, which is fast and easy. It seems to add a layer of complexity on top of h5py, and I am not sure what this gains me. Maybe better handling of string data or something?
5 xarray
xarray is a system which labels arrays, which is useful in keeping track of them for statistical analysis.
Xarray introduces labels in the form of dimensions, coordinates and attributes on top of raw NumPy-like multidimensional arrays, which allows for a more intuitive, more concise, and less error-prone developer experience.
This data model is borrowed from the netCDF file format, which also provides xarray with a natural and portable serialization format. NetCDF is very popular in the geosciences, and there are existing libraries for reading and writing netCDF in many programming languages, including Python.
This system seems to be a developing lingua franca for the differentiable learning frameworks.
6 Compiling
Warning: generic Python precompiled binaries are slow and horrible
Specific CPU and GPU support in Python precompiled binaries is weak and outdated. For maximum performance we should be compiling them.
e.g. for performance or invoking external binaries.
See compiling Python.
7 Displaying numbers legibly
Easy, but documentation is hard to find.
7.1 Floats
Sven Marnach distills everything adroitly, e.g.:
means “with 4 decimal points, align x
to fill 10 columns”.
All conceivable alternatives are displayed at pyformat.info.
7.2 Arrays
How I set my numpy
arrays to be displayed big and informative:
Reset to default:
np.set_printoptions(edgeitems=3, infstr='inf',
linewidth=75, nanstr='nan', precision=8,
suppress=False, threshold=1000, formatter=None)
There are a lot of ways to do this one.
See also np.array_str
for one-off formatting.
8 Einstein operations
The Einstein summation convention can be used to compute many multi-dimensional, linear algebraic array operations.
einsum
provides a succinct way of representing these.A non-exhaustive list of these operations, which can be computed by
einsum
, is shown below along with examples:
- Trace of an array,
numpy.trace
.- Return a diagonal,
numpy.diag
.- Array axis summations,
numpy.sum
.- Transpositions and permutations,
numpy.transpose
.- Matrix multiplication and dot product,
numpy.matmul
numpy.dot
.- Vector inner and outer products,
numpy.inner
numpy.outer
.- Broadcasting, element-wise and scalar multiplication,
numpy.multiply
.- Tensor contractions,
numpy.tensordot
.- Chained array operations, in efficient calculation order,
numpy.einsum_path
.The subscripts string is a comma-separated list of subscript labels, where each label refers to a dimension of the corresponding operand. Whenever a label is repeated it is summed, so
np.einsum('i,i', a, b)
is equivalent tonp.inner(a,b)
. If a label appears only once, it is not summed, sonp.einsum('i', a)
produces a view ofa
with no changes. A further examplenp.einsum('ij,jk', a, b)
describes traditional matrix multiplication and is equivalent tonp.matmul(a,b)
. Repeated subscript labels in one operand take the diagonal. For example,np.einsum('ii', a)
is equivalent tonp.trace(a)
.
Also available in all NN frameworks, e.g. in pytorch as torch.einsum.
Here are some introductions to the tool:
- Tim Rocktäschel
- Chris Lemke, einsum - an underestimated function
- Understanding einsum for Deep learning: implement a transformer with multi-head self-attention from scratch
Higher performance, opt_einsum makes sure that the result is easy for the computer and not only the human:
Optimized einsum can significantly reduce the overall execution time of einsum-like expressions by optimising the expression’s contraction order and dispatching many operations to canonical BLAS, cuBLAS, or other specialized routines. Optimized einsum is agnostic to the backend and can handle NumPy, Dask, PyTorch, Tensorflow, CuPy, Sparse, Theano, JAX, and Autograd arrays as well as potentially any library which conforms to a standard API.
Alternate URL?
- dgasmith/opt_einsum: Optimising einsum functions in NumPy, Tensorflow, Dask, and more with contraction order optimization. (Smith and Gray 2018)
8.1 Einops
Einops (Rogozhnikov 2022) makes life better; it reshapes and operates on arrays in an intuitive way, generalising the traditional einsum implementation and exposing it in a neural-network-like format.
For graphical examples of how einops works see the basic tutorial, which explains stuff more eloquently than I could.