Machine learning and statistics in Julia

November 27, 2019 — May 26, 2022

computers are awful
julia
neural nets
number crunching
optimization
premature optimization
python
statistics

Stats/ML and also DSP in Julia.

Figure 1: Handy map of the Julia ML ecosystem

1 Machine learning

Let’s put the automatic differentiation, the optimizers, and the samplers together to do differentiable learning!

The deep learning toolkits have shorter feature lists than the lengthy ones of those fancy Python/C++ libraries (e.g. mobile app building, cuDNN-backed optimisations are all less present in Julia libraries). But maybe the elegance/performance of Julia makes some of those features irrelevant? I, for one, don’t care about most of those because I’m a researcher, not a deployer.

Having said that, Tensorflow.jl gets all the features because it invokes C++ tensorflow. Surely one misses the benefit of Julia this way, since there are two different array-processing infrastructures to pass data between, and a different approach to JIT versus pre-compiled execution. Or no?

Flux.jl sounds like a reimplementation of Tensorflow-style differentiable programming inside Julia, which strikes me as the right way to do this to benefit from the end-to-end-optimised design philosophy of Julia.

Flux is a library for machine learning. It comes “batteries-included” with many useful tools built in, but also lets you use the full power of the Julia language where you need it. The whole stack is implemented in clean Julia code (right down to the GPU kernels) and any part can be tweaked to your liking.

It’s missing some features of e.g. Tensorflow, but includes compensatory surprising/unique feature combinations. GPU support seems to suggest it will support common CUDA optimizations (even some CuDNN ops), although I have suspicions that not every CUDA op is supported and also CUDA itself can be scanty (does CUDA do a GPU Discrete Cosine Transform yet?).

Its end-to-end Julia philosophy supports neat tricks. Favourite: DiffEqFlux — see below — which makes Neural ODEs sort-of simple to create.

I have not used it enough to know yet, but I suspect that the generic nature of Flux works against it in one sense, which is that I imagine there are not convenient distributed multi-GPU trainers at the moment. I should confirm this.
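For a flavour of what Flux code looks like, here is a minimal sketch of fitting a tiny network to toy data. It assumes a recent Flux release with the explicit-gradient API (`Flux.setup`, `Flux.update!`); the data, layer sizes and learning rate are made up for illustration, and older Flux versions spell some of this differently.

```julia
using Flux

# Toy regression data (features × samples): learn y = 2x₁ - 3x₂.
x = rand(Float32, 2, 64)
y = 2 .* x[1:1, :] .- 3 .* x[2:2, :]

# A small multilayer perceptron; sizes are arbitrary.
model = Chain(Dense(2 => 8, relu), Dense(8 => 1))
loss(m, x, y) = Flux.mse(m(x), y)

# Optimiser state is kept separately from the model.
opt_state = Flux.setup(Adam(0.01), model)

initial_loss = loss(model, x, y)
for epoch in 1:500
    # Gradient of the loss with respect to the model parameters.
    grads = Flux.gradient(m -> loss(m, x, y), model)
    Flux.update!(opt_state, model, grads[1])
end
final_loss = loss(model, x, y)
```

Everything here, including the layers and the optimiser, is ordinary Julia, which is what makes the "any part can be tweaked" claim plausible.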

Knet.jl is another deep learning library that claims to show the ease of implementing deep learning frameworks in Julia.

Alternatively, Mocha.jl is a belt-and-braces deep learning thing, with a library of pre-defined layers. It is deprecated and unmaintained.

If one were aiming to do that, why not do something left-field like use the dynamical systems approach to deep learning? This neat trick was popularised by Haber and Ruthotto et al, who have released some of their models as Meganet.jl. I’m curious to see how they work. (Development seems to have paused.)

There are various Gaussian Process options.

MLJ is a scikit-learn-like pipeline for data analysis in Julia which standardises model composition and automates some of the training etc. It has various adaptors for other ML systems via MLJModels.

See also

* FluxTraining.jl
* FluxML/FastAI.jl: Port of FastAI V2 API to Julia

2 Statistics, probability and data analysis

Hayden Klok and Yoni Nazarathy are writing a free Julia Statistics textbook (preprint) (Nazarathy and Klok 2021) which seems a thorough introduction to statistics as well as Julia, albeit statistics in a classical frame that won’t be fashionable with either your learning theory or Bayesian types.

A good starting point for doing stuff is JuliaStats, an organisation which produces many statistics megapackages, for kernel density estimates, generalised linear models, loess etc. Install them all using StatsKit:

using StatsKit

Less well known but handy is F. Bagge Carlson’s TotalLeastSquares, which does neat errors-in-variables models; see Bagge Carlson, F., Machine Learning and System Identification for Estimation in Physical Systems (PhD Thesis 2018).

2.1 Data frames

The workhorse data structure of statistics.

This was complicated for a while but now I think it has settled down to be simple: Data frames are provided by DataFrames.jl. AFAICT this is the only one we need to care about now.[1] Legacy compatibility is provided by IterableTables.jl to translate where needed between various DataFrame-like sources.

You can load a lot of the R standard datasets using RDatasets.

using RDatasets
iris = dataset("datasets", "iris")
neuro = dataset("boot", "neuro")

As far as sophisticated processing:

DataFramesMeta has been recommended as a tidyverse analogue for Julia. One can access DataFrames (and DataTables and SQL databases and streaming data sources) using […]. It seems to be very active.

Query.jl looks similar, and is integrated with IterableTables.jl.

Query is a package for querying Julia data sources. It can filter, project, join and group data from any iterable data source, including all the sources supported in IterableTables.jl. One can for example query any of the following data sources: any array, DataFrames, DataStreams (including CSV, Feather, SQLite, ODBC), DataTables, IndexedTables, TimeSeries, Temporal, TypedTables and DifferentialEquations (any DESolution).

It seems less active ATM than DataFramesMeta though.

Another alternative: tidyverse-like behaviour via the Pipe or Chain packages.

DataFrames taste better with InvertedIndices, which allow searching by negation. I think this is redundant for recent DataFrames though.
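To make the tidyverse comparison concrete, here is a small sketch combining DataFramesMeta's macros, the `@chain` macro (from Chain, which I believe DataFramesMeta re-exports), and selection by negation with `Not`. The toy table and column names are invented for illustration.

```julia
using DataFrames, DataFramesMeta, Statistics

# Toy data; column names are made up for illustration.
df = DataFrame(species = ["a", "a", "b", "b"], len = [1.0, 2.0, 3.0, 5.0])

by_species = @chain df begin
    @rsubset :len > 1.0                 # keep rows where len > 1 (row-wise test)
    @transform :len_mm = :len .* 10     # add a derived column
    groupby(:species)
    @combine :mean_len = mean(:len)     # one summary row per species
end

# Select by negation: every column except len.
no_len = select(df, Not(:len))
```

`Not` is usable here without loading InvertedIndices explicitly, which is consistent with the suspicion above that recent DataFrames makes the separate package redundant.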

2.2 Frequentist statistics

Lasso and other sparse regressions are available in Lasso.jl, which reimplements the lasso algorithm in pure Julia, and in GLMNET.jl, which wraps the classic Friedman FORTRAN code for the same. There is also (functionality unattested) an orthogonal matching pursuit package called OMP.jl, but that algorithm is simple enough to bang out oneself in an afternoon, so no stress if it doesn’t work.

Incremental/online versions of (presumably exponential family) statistics are in OnlineStats.

MixedModels.jl

is a Julia package providing capabilities for fitting and examining linear and generalised linear mixed-effect models. It is similar in scope to the lme4 package for R.
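Returning to the claim that orthogonal matching pursuit is simple enough to write oneself: the following is a bare-bones sketch using only the standard library. It is not the OMP.jl implementation; the function name `omp`, its arguments and the stopping tolerance are my own choices, and it assumes the dictionary `A` has no all-zero columns.

```julia
using LinearAlgebra

# Greedy orthogonal matching pursuit: approximate y ≈ A * x with at most
# k nonzero entries in x. Assumes every column of A is nonzero.
function omp(A::AbstractMatrix, y::AbstractVector, k::Integer; tol = 1e-10)
    n = size(A, 2)
    x = zeros(float(eltype(A)), n)
    support = Int[]
    residual = float.(y)
    colnorms = [norm(view(A, :, j)) for j in 1:n]
    for _ in 1:k
        norm(residual) <= tol && break
        # Pick the unused column most correlated with the current residual.
        scores = abs.(A' * residual) ./ colnorms
        scores[support] .= -Inf
        push!(support, argmax(scores))
        # Refit all selected coefficients by least squares, then update the residual.
        coef = A[:, support] \ y
        x .= 0
        x[support] = coef
        residual = y - A[:, support] * coef
    end
    return x
end
```

Calling `omp(A, y, k)` with a tall random `A` and a `y` generated from a k-sparse vector is a quick way to check it recovers the support.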

2.3 Probabilistic programming

Probabilistic programming! Bayesian inference considered broadly! Several options on the probabilistic programming page are based on Julia, specifically, Turing.jl (source), Mamba.jl, Gen (source), DynamicHMC, Klara.jl, and probably others. Of these, Gen and Turing seem the most active.
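As a taste of what these look like, here is a minimal Turing.jl sketch: a coin-flip model with a uniform prior on the bias, sampled with NUTS. The model name, the data and the number of samples are invented for illustration, and the syntax assumes a reasonably recent Turing release.

```julia
using Turing, Statistics

# Bernoulli observations with an unknown success probability p.
@model function coinflip(y)
    p ~ Beta(1, 1)              # uniform prior on the bias
    for i in eachindex(y)
        y[i] ~ Bernoulli(p)     # likelihood of each observation
    end
end

data = [1, 1, 0, 1, 1, 0, 1, 1]                   # made-up flips: 6 heads, 2 tails
chain = sample(coinflip(data), NUTS(), 1_000)     # posterior samples of p
posterior_mean = mean(chain[:p])
```

With this conjugate model the exact posterior is Beta(7, 3), so the sampled mean should land near 0.7, which makes it a handy sanity check.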

3 Differentiating, optimisation

3.1 Optimising

JuMP supports many types of optimisation, including over non-continuous domains, and is part of the JuliaOpt family of confusingly diverse optimizers, which invoke various sub-families of optimizers. The famous NLOpt solvers comprise one such class, and they can additionally be invoked separately.
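A small sketch of JuMP on a non-continuous problem: one integer and one continuous variable, with a linear objective. The choice of HiGHS as the backend solver is my assumption (any mixed-integer solver that JuMP supports should do), and the numbers are arbitrary.

```julia
using JuMP, HiGHS

model = Model(HiGHS.Optimizer)   # HiGHS is an assumed solver choice
set_silent(model)

@variable(model, x >= 0, Int)    # integer decision variable
@variable(model, 0 <= y <= 3)    # continuous decision variable
@constraint(model, 2x + y <= 5.5)
@objective(model, Max, x + y)

optimize!(model)
solution = (value(x), value(y), objective_value(model))
```

The model description is solver-independent; swapping the backend is a one-line change in the `Model(...)` call.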

Unlike NLOpt and the JuMP family, Optim.jl (part of JuliaNLSolvers, a different family entirely) solves optimisation problems purely inside Julia. It has nieces and nephews such as LsqFit for Levenberg-Marquardt non-linear least squares fits. Optim.jl can invoke ForwardDiff automatically to get gradients when asked (otherwise it falls back to finite differences). It mostly assumes unconstrained problems.
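A minimal sketch of Optim.jl on the usual Rosenbrock test function, asking for forward-mode automatic differentiation. The `autodiff = :forward` keyword is what the Optim 1.x releases I have used accept; newer releases may prefer a different spelling, so treat it as an assumption.

```julia
using Optim

# Classic banana-shaped test objective with its minimum at (1, 1).
rosenbrock(x) = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2

# L-BFGS with gradients computed by ForwardDiff rather than finite differences.
result = optimize(rosenbrock, [-1.2, 1.0], LBFGS(); autodiff = :forward)
xmin = Optim.minimizer(result)
fmin = Optim.minimum(result)
```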

Krylov.jl is a collection of Krylov-type iterative methods for large linear systems and least-squares problems.
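For instance, a conjugate gradient solve might look like the following sketch. The tiny dense matrix is only for illustration; the point of these methods is large sparse or matrix-free operators.

```julia
using Krylov

# A small symmetric positive definite system, as conjugate gradient requires.
A = [4.0 1.0; 1.0 3.0]
b = [1.0, 2.0]

x, stats = cg(A, b)   # stats records convergence information
```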

3.2 Autodiff

Julia is a hotbed of autodiff for technical and community reasons. Such a hotbed that it’s worth discussing in the autodiff notebook.

Closely related, projects like ModelingToolkit.jl blur the lines between equations and coding, and allow easy definition of differentiable or probabilistic programming.

4 ODEs, PDEs, SDEs

Chris Rackauckas is a veritable wizard with this stuff; read his blog.

Here is a tour of fun tricks with stochastic PDEs. There is a lot of tooling for this; DiffEqOperators … does something. DiffEqFlux (EZ neural ODEs) works with Flux and claims to make Neural ODEs simple. The implementation of these things in Python, for the award-winning NeurIPS paper that made them famous, was a nightmare. +1 for Julia here. The neural SDE section is mostly Julia; go check that out.

There are many PDE options. Gridap seems fresh.

5 Configuration

See experiment tracking in ML for what I mean here.

6 Matrix Factorisation and completion

NMF.jl contains reference implementations of non-negative matrix factorisation.

Laplacians.jl by Dan Spielman et al. is a matrix factorisation toolkit especially for Laplacian (graph adjacency) matrices.

Once again, F. Bagge Carlson’s TotalLeastSquares solves certain matrix factorisation and completion problems.

7 Signal processing

DSP.jl has been split off from core and now needs to be installed separately. Also, DirectConvolutions has sensible convolution code.

FFTs are provided by AbstractFFTs, which in principle wraps many FFT implementations. I don’t know if there is a GPU implementation yet, but there for sure is the classic CPU implementation provided by FFTW.jl which uses FFTW internally.
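A quick sketch of the CPU path through FFTW.jl: compute the one-sided spectrum of a sinusoid and read off its dominant frequency. The sample rate and the 5 Hz tone are arbitrary choices for illustration; `rfftfreq` comes from AbstractFFTs, which FFTW.jl re-exports as far as I know.

```julia
using FFTW

fs = 64                            # sample rate in Hz (arbitrary)
t = (0:fs-1) ./ fs                 # one second of sample times
x = sin.(2π * 5 .* t)              # 5 Hz sinusoid

X = rfft(x)                        # one-sided spectrum of a real signal
freqs = rfftfreq(length(x), fs)    # frequency of each bin in Hz
peak = freqs[argmax(abs.(X))]      # frequency with the largest magnitude
```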

As for how to use these things, Numerical tours of data sciences has a Julia edition with lots of signal processing content.

JuliaAudio processes audio. They recommend PortAudio.jl as a real-time soundcard interface, which looks sorta simple. See rkat’s example of how this works. There are useful abstractions like SampledSignals to load audio and keep the data and sample rate bundled together. Although, as SampledSignals maintainer Spencer Russell points out, AxisArrays might be the right data structure for sampled signals, and you could use SampledSignals purely for IO, and ignore its data structures thereafter.

Images.jl processes images.

8 QMC

Low discrepancy and other QMC stuff. Mostly I want low discrepancy sequences. There are two options with near identical interfaces; I’m not sure of the differences.

Sobol.jl claims to have been performance profiled:

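using Pkg; Pkg.add("Sobol")   # or `] add Sobol` at the REPL
using Sobol
s = SobolSeq(2)               # 2-dimensional Sobol sequence
# Then draw successive points with
x = next!(s)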

QMC.jl:

using Pkg; Pkg.add(url="https://github.com/PieterjanRobbe/QMC.jl")   # or `] add <url>` at the REPL
using QMC
lat = LatSeq(2)
# Then
next(lat)

9 References

Akbayrak, Bocharov, and de Vries. 2021. “Extended Variational Message Passing for Automated Approximate Bayesian Inference.” Entropy.
Cox, van de Laar, and de Vries. 2019. “A Factor Graph Approach to Automated Design of Bayesian Signal Processing Algorithms.” International Journal of Approximate Reasoning.
Cusumano-Towner, and Mansinghka. 2018. “A Design Proposal for Gen: Probabilistic Programming with Fast Custom Inference via Code Generation.” In Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages. MAPL 2018.
Fischer, and Saba. 2018. “Automatic Full Compilation of Julia Programs and ML Models to Cloud TPUs.” arXiv:1810.09868 [Cs, Stat].
McNicholas, and Tait. 2019. Data Science With Julia.
Nazarathy, and Klok. 2021. Statistics with Julia: Fundamentals for Data Science, Machine Learning and Artificial Intelligence. Springer Series in the Data Sciences.
Rackauckas. 2019a. “Neural Jump SDEs (Jump Diffusions) and Neural PDEs.” The Winnower.
Rackauckas. 2019b. “The Essential Tools of Scientific Machine Learning (Scientific ML).”
Reid. 2015. Advanced Analytic 18.305 Methods in Science and Engineering.
van de Laar, Cox, Senoz, et al. 2018. “ForneyLab: A Toolbox for Biologically Plausible Free Energy Minimization in Dynamic Neural Models.” In Conference on Complex Systems.
Xu, Kailai, and Darve. 2020. “ADCME: Learning Spatially-Varying Physical Fields Using Deep Neural Networks.” In arXiv:2011.11955 [Cs, Math].
Xu, Kai, Ge, Tebbutt, et al. 2019. “AdvancedHMC.jl: A Robust, Modular and Efficient Implementation of Advanced HMC Algorithms.”

Footnotes

  1. There are some older ones you might encounter, such as DataTables.jl, which are subtly incompatible in tedious ways, and which these days we can ignore.