Python, compilation and acceleration of

October 14, 2019 — January 16, 2024

compsci
computers are awful
premature optimization
python

Various tricks to make Python go fast. Fast is popular. Python is not fast.

1 Mojo

fast.ai - Mojo may be the biggest programming language advance in decades. TBC.

1.1 Jax, TensorFlow etc

TBC

2 Compiling

There are too many options for interfacing with external libraries and/or compiling Python code.

FFI, ctypes, Cython, Boost-Python, numba, SWIG…

2.1 Cython

Lowish-friction, well-tested, well-documented, and works everywhere that CPython extensions can be compiled. Recent versions compile nearly all Python code, including the generators and inner functions that early releases could not handle. It optimises Python code using type declarations and extended syntax. For a starting point, read Max Burstein’s intro.

Highlights: It works seamlessly with NumPy. It makes calling C-code easy.

Problems: Generic dispatch is limited to fused types. Debugging is nasty, like debugging C with extra crap in your way.

2.2 Pythran

Pythran is an ahead-of-time compiler that translates a subset of Python (including NumPy and some SciPy) to C++. The documentation could be better, but a researcher I follow made the use-case clear:

As a researcher/engineer (really, an algorithm developer) in the general area of audio and speech processing, I’ve always run into the same difficulty at all the companies I’ve worked at: How to quickly prototype and develop algorithms, and subsequently turn them into efficient code that can be deployed to customers.[…]

But when came time to deploy the prototype algorithm, we typically had to bite the bullet and rewrite it in C or C++: For one thing, it was not possible or practical to ship Matlab or NumPy to our customers so our prototype code could be run in their environment. It was also crucial to squeeze every drop of efficiency out of the processor, and predictability and reproducibility of performance was also very important![…]

Then I discovered Pythran. Unlike Cython and Numba, Pythran not only accelerates your Python code (by compiling modules into fast .so files) but Pythran also generates self-contained C++ code that implements your Python/NumPy algorithm. The C++ code is fully portable, does not require any Python or NumPy libraries, does not rely at all on Python, and can easily be incorporated into a C++ project. Once compiled, your prototype code becomes extremely efficient, optimized for the target architecture, and its performance is predictable and reproducible.
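A minimal sketch of what a Pythran-ready module looks like, assuming a hypothetical file called `dprod.py`. The only Pythran-specific part is the `#pythran export` comment, which declares the argument types of the exported function; the rest is ordinary Python and runs unchanged under the normal interpreter.

```python
# dprod.py (hypothetical file name, chosen for illustration)
#
# The comment below is the Pythran annotation: it names the function to
# export and the argument types to compile it for. Plain Python ignores it.

#pythran export dprod(int list, int list)
def dprod(xs, ys):
    """Dot product of two equal-length lists of integers."""
    return sum(x * y for x, y in zip(xs, ys))


# Without Pythran this is just a Python function:
result = dprod([1, 2, 3], [4, 5, 6])
```

Running `pythran dprod.py` should build a native extension module that can be imported in place of the Python one, and `pythran -e dprod.py -o dprod.hpp` should emit the generated C++ instead of compiling it. Check both against the Pythran version you have installed.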

2.3 Numba

More specialised than Cython; it generates machine code through LLVM rather than through a generic C compiler. Numba makes optimising inner numeric loops easy.

Highlights: It JIT-compiles plain Python, so it’s easy to debug with normal debuggers and then switch on the compiler for performance using the @jit decorator. Generic dispatch is available via the @generated_jit decorator (deprecated in recent Numba releases in favour of @overload). It compiles to multi-core vectorised code as well as CUDA. In principle, this means you can do your calculations on the GPU.
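A small sketch of the debug-first, compile-later workflow, not a benchmark. The function name `sum_of_squares` is made up for illustration, and the `ImportError` fallback is my own addition so that the snippet still runs as plain Python when Numba is not installed.

```python
try:
    from numba import njit  # shorthand for @jit(nopython=True)
except ImportError:
    # Fallback (an assumption for this sketch): a do-nothing decorator,
    # so the code below still runs, just uncompiled.
    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]  # used as bare @njit

        def wrap(func):  # used as @njit(...)
            return func

        return wrap


@njit
def sum_of_squares(n):
    """The kind of tight numeric loop Numba is designed to compile."""
    total = 0.0
    for i in range(n):
        total += i * i
    return total


# The first call triggers compilation for the argument types seen;
# later calls with the same types reuse the compiled machine code.
value = sum_of_squares(10)
```

When Numba is installed, the undecorated original should remain reachable as `sum_of_squares.py_func`, which is handy for stepping through the logic in an ordinary debugger before trusting the compiled version.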

Problems: LLVM is a shifty beast and sensitive version dependencies are annoying. Documentation is a bit crap, or at least unfriendly to outsiders. Practically, getting performance out of a GPU is trickier than working out you can optimise away one tedious matrix op, and doing it at this level is hard. There is too much fiddling with the details of which processors to allocate to what.

You might find it easier to use Julia if a well-maintained and documented LLVM infrastructure is a real selling point for you.

3 Which Foreign Function Interface am I supposed to be using now?

Want to call a function written in C, C++, Fortran, etc. from Python?

If you are just talking to C, ctypes is a Python library to translate Python objects to C with minimal fuss, and no compiler requirement. See the ctypes tutorial.
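As a rough illustration of the ctypes route, here is a sketch that calls `strlen` from the system C library. It assumes a POSIX-like system where `ctypes.util.find_library("c")` can locate libc (or where loading `None` exposes the symbols already linked into the process); the wrapper name `c_strlen` is made up for this example.

```python
import ctypes
import ctypes.util

# Locate and load the C standard library. On POSIX systems, CDLL(None)
# falls back to the symbols already loaded into the current process.
libc_path = ctypes.util.find_library("c")
libc = ctypes.CDLL(libc_path)

# Declare the C signature: size_t strlen(const char *s);
# Without this, ctypes assumes int arguments and an int return value.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Return the length of a NUL-terminated byte string via C's strlen."""
    return libc.strlen(data)


length = c_strlen(b"hello")
```

Declaring `argtypes` and `restype` up front is the main discipline ctypes asks for; getting them wrong tends to produce silent garbage or a crash, not a Python exception.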

And of course, if you have your compiler lying about, Python was made to talk to other languages and has a normal C API.

If you want something closer to Python for your development process, Cython offers Python compilation using a special syntax and easy calling of foreign functions in one package. SWIG wraps function interfaces between various languages, but looks like a PITA (see a comparison on Stack Overflow).

There is also Boost.Python if you want to talk to C++. Boost comes with lots of other fancy bits, like numerical libraries.

There are many other options, but in practice I’ve never needed to go further than Cython (listed above as a compilation option), so I can’t talk knowledgeably about all the options listed here.

4 Multiprocessing

See cluster Python for a more detailed discussion.