Hydra for tracking machine learning experiments
October 20, 2021 — February 12, 2025
1 What?
When I configure my ML experiments, i.e. try various combinations of parameters to find the good ones, the one-stop shop (at least for Python) is Hydra. Related: hyperparameter search (which I also sometimes do using Hydra).
The problems that Hydra solves are:
- Configuring experiments, by which we usually mean “trying out neural net parameters to find good/bad ones”. It generates a command-line interface that allows me to express various parameters and hyperparameters to test different candidate NN configurations, set sensible defaults, and use text files for storing complex information in a standardised and readable format.
- Logging outputs in a special directory per-run: whenever I set up an experiment using the configuration interface, it gets its own automatic output folder which keeps all the experiment parameters for later reference and analysis.
It has a few other bonus features as well (e.g. automatically running jobs in parallel or performing parameter sweeps or integrating with hyperparameter tuners).
This isn’t necessarily obvious, because the documentation targets experienced users rather than new ones; that’s why this notebook exists. Hydra is nominally a generic configuration system, but the authors have made sure it has good support for ML in particular. That isn’t obvious (IMO) from the docs, but Hydra is good enough that it is worth pushing through the difficulty. Another issue is that it is, IMO, a little over-engineered and chunky.
If all the dense documentation is off-putting, you might want to explore Pyrallis or gin as an alternative.
1.1 Features
- Explicit, flexible config syntax (via YAML) which is semi-easy to read, write and process
- They have thought about configuring ML experiments in particular
- Optional typing system to detect misconfigurations (although in practice I will never have time to actually set this up)
- A command-line system with sophisticated further configuration and overrides
- Hierarchical configurations can be overridden in various ways
1.2 Misfeatures
- Most of the documentation and examples are about PyTorch, presumably because of the Facebook connection, which means that the examples are not generic.
- The manual is flat, affectless and unopinionated even where an opinion would be useful and where some affect would be more engaging and memorable.
- Hierarchical configurations may be overridden in too many ways and TBH are a bit confusing. I would prefer less flexibility for the sake of simplicity.
- Duplication of effort. Configs are specified both in YAML and in function invocations, which is confusing and error-prone.
This all comes to a head when starting a new project: because Hydra is so generic, powerful and clean, it is not clear where to even begin.
Recommended workaround: use 3rd-party how-tos and boilerplate examples to copy-paste. These are IMO worth the overhead since setting it up this way saves much time later on.
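To make this concrete, here is roughly the minimal Hydra app that most of those boilerplates grow from (a sketch, assuming hydra-core is installed and a config file lives at `conf/config.yaml`):

```python
# app.py: minimal Hydra application
import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(version_base=None, config_path="conf", config_name="config")
def main(cfg: DictConfig) -> None:
    # cfg is the merged config: conf/config.yaml plus any
    # command-line overrides, e.g. `python app.py lr=0.01`
    print(OmegaConf.to_yaml(cfg))

if __name__ == "__main__":
    main()
```

Run it as `python app.py`, overriding any value defined in the YAML on the command line; each run also gets its own timestamped output directory.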
2 Tutorials and examples
- Best tutorial: Kushajveer Singh, Complete tutorial on how to use Hydra in Machine Learning projects
- Julien Beaulieu, Building A Flexible Configuration System For Deep Learning Models
- Simone Scardapane, Learning Hydra for configuring ML experiments
- Omry Yadan (Hydra author) writes Hydra — A fresh look at configuration for Machine Learning projects
Here are some examples of Hydra in action which might be useful as templates:
- Worked examples from the project manual
- lucmos/nn-template: Generic template to bootstrap your PyTorch project with PyTorch Lightning, Hydra, W&B, DVC, and Streamlit.
- ashleve/lightning-hydra-template: PyTorch Lightning + Hydra. A very general, feature-rich template for rapid and scalable ML experimentation with best practices. ⚡🔥⚡
In practice, I found those templates way too fancy and feature-stuffed. It was easiest to start from scratch, without all the overblown dependencies and fancy chaos.
More community documentation:
- The fb-hydra tag on Stack Overflow
- Apparently there is action on the Hydra Zulip chat?
3 Hydra-zen
The most important benefit of using hydra-zen is that it automatically and dynamically generates structured configs for you.
Creating a structured config without hydra-zen:
```python
from dataclasses import dataclass, field

def foo(bar: int, baz: list[str], qux: float = 1.23):
    ...

@dataclass
class FooConf:
    _target_: str = "__main__.foo"
    bar: int = 2
    baz: list[str] = field(default_factory=lambda: ["abc"])
    qux: float = 1.23
```
Creating an equivalent structured config with hydra-zen:
```python
from hydra_zen import builds

def foo(bar: int, baz: list[str], qux: float = 1.23):
    ...

ZenFooConf = builds(foo, bar=2, baz=["abc"], populate_full_signature=True)
```
This means that it is much easier and safer to write and maintain the configs for your Hydra applications:
- Write all of your configs in Python. No more YAML files!
- Write less, stop repeating yourself, and get more out of your configs.
- Get automatic type-safety via `builds()`’s signature inspection.
- Validate your configs before launching your application.
- Leverage auto-config support for additional types, like `functools.partial`, that are not natively supported by Hydra.
4 Instantiating arbitrary Python classes
Instantiating objects with `hydra.utils.instantiate` allows us to specify which Python class (or function) to instantiate using a special `_target_` field in the config.
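To see why this is powerful (and dangerous), here is a rough toy sketch of the mechanism; `naive_instantiate` is my own hypothetical stand-in, not Hydra’s actual implementation:

```python
import importlib

def naive_instantiate(cfg: dict):
    """Toy sketch of what hydra.utils.instantiate does: look up the
    dotted path in _target_, import the object it names, and call it
    with the remaining keys as keyword arguments."""
    module_path, _, name = cfg["_target_"].rpartition(".")
    target = getattr(importlib.import_module(module_path), name)
    kwargs = {k: v for k, v in cfg.items() if k != "_target_"}
    return target(**kwargs)

# A config like this would normally live in YAML:
cfg = {"_target_": "collections.Counter", "red": 2, "blue": 1}
counter = naive_instantiate(cfg)
print(counter["red"])  # → 2
```

Since the config chooses which importable object gets called, a config file is effectively code.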
Corollary: config files are a security risk. Do not use a config file from anyone you do not trust.
5 Environment variable interpolation
Variable interpolation is supported but finicky because it supports both internal variables (which we just saw) and system environment variables, and it takes a while to work out what is what. Hydra supports system environment variables but confusingly does not expand any paths it finds in them, as it would if they were internal Hydra variables. 🤷 I gather this is because the system environment variables arrive via lower-level Omegaconf resolvers upon which Hydra is built.
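For example, OmegaConf’s `oc.env` resolver (which Hydra inherits) reads a system environment variable, with an optional default, while plain `${...}` references are internal interpolation. A sketch; `DATA_ROOT` is a hypothetical variable:

```yaml
# ${oc.env:VAR,default} reads a system environment variable;
# ${data_root} is ordinary internal interpolation.
data_root: ${oc.env:DATA_ROOT,/tmp/data}
train_file: ${data_root}/train.csv
```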
Pro tip: this is handy with an environment variable config system like dotenv.
Hydra also supports additional environment variable management, e.g. setting variables for launched jobs via `hydra.job.env_set`.
6 Paths
Paths are complicated. It’s useful to be aware of the following patterns:
Under some configurations Hydra hijacks the working directory by changing it to the current log directory. It’s useful to have the original working dir path as a special variable.
```yaml
work_dir: ${hydra:runtime.cwd}
# Example of using the above to define some other folder
data_dir: ${work_dir}/data/
```
We can alternatively get at the original and configured paths using `hydra.utils.get_original_cwd`.
Configuring that directory is possible. The default looks like this:
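A sketch from memory (recent Hydra versions):

```yaml
hydra:
  run:
    dir: outputs/${now:%Y-%m-%d}/${now:%H-%M-%S}
  sweep:
    dir: multirun/${now:%Y-%m-%d}/${now:%H-%M-%S}
    subdir: ${hydra.job.num}
```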
Here is one that is IMO more useful, allowing us to optionally use a custom output directory set by local environment variables and which uses the job name.
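A sketch of such a config; `OUTPUT_ROOT` is a hypothetical environment variable, falling back to the usual `outputs` folder when unset:

```yaml
hydra:
  run:
    dir: ${oc.env:OUTPUT_ROOT,outputs}/${hydra.job.name}/${now:%Y-%m-%d_%H-%M-%S}
```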
NB: Hijacking is now optional: set `hydra.job.chdir=False` to disable it.
Paths are still slightly unintuitive. Hydra’s facility for creating nice paths is great, but they do not necessarily interpolate in the obvious way; tl;dr, as long as we stay inside the path that Hydra defines for us, with all its luxurious variable substitution, everything is fine.
If we wish to interrogate Hydra at arbitrary locations in the config file to do arbitrary things (e.g. using the Hydra job name or job id as a parameter for something else), it does not work as well; the `hydra` object is not available there.
The path of least resistance is to do everything in the folder Hydra provides, but if we have some artifact that needs to go somewhere else, we can import the hydra object explicitly and interrogate it for the variables of interest. This is slightly ugly and leaky.
7 Logs
Logs are by default configured through Hydra to persist to a disk file as well as stdout, which is usually what I want. See Customising logging.
8 Python packages with Hydra configs
Feasible. See the example.
9 Go parallel with Hydra
Hydra will sweep parameters in parallel via plugins:
- Joblib Launcher plugin (for local jobs)
- Submitit Launcher plugin (for SLURM clusters)
The docs are a little confusing, but check out this example.
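For instance, a local parallel sweep over a couple of hypothetical parameters might look like this (assuming the hydra-joblib-launcher plugin is installed, and `train.py` is our app):

```shell
# --multirun sweeps over the comma-separated values;
# the joblib launcher runs the jobs in parallel local processes
python train.py --multirun hydra/launcher=joblib model.lr=0.01,0.001 seed=1,2,3
```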
Looking at joblib, for example, we might want to enable a parameter sweep from the command line. AFAICT this should launch
TBC