Hydra for tracking machine learning experiments
October 20, 2021 — February 12, 2025
1 What?
When I configure my ML experiments, i.e. try various combinations of parameters to find the good ones, the one-stop shop (at least for Python) is Hydra. Related: hyperparameter search (which I also sometimes do using Hydra).
The problems that Hydra solves are:
- Configuring experiments, by which we usually mean “trying out neural net parameters to find good/bad ones”. It generates a command-line interface that allows me to express various parameters and hyperparameters to test different candidate NN configurations, set sensible defaults, and use text files for storing complex information in a standardised and readable format.
- Logging outputs in a special directory per-run: whenever I set up an experiment using the configuration interface, it gets its own automatic output folder which keeps all the experiment parameters for later reference and analysis.
It has a few other bonus features as well (e.g. automatically running jobs in parallel or performing parameter sweeps or integrating with hyperparameter tuners).
This isn’t necessarily obvious, because the documentation targets experienced users rather than new ones; that’s why this notebook exists. Hydra is nominally a generic configuration system, but the authors have made sure it has good support for ML in particular. That isn’t obvious (IMO) from the docs, but Hydra is good enough that it is worth pushing through the difficulty. Another issue is that it is, IMO, a little over-engineered and chunky.
If all the dense documentation is off-putting, you might want to explore Pyrallis or gin as an alternative.
1.1 Features
- Explicit, flexible config syntax (via YAML) which is semi-easy to read, write and process
- They have thought about configuring ML experiments in particular
- Optional typing system to detect misconfigurations (although in practice I will never have time to actually set this up)
- A command-line system with sophisticated further configuration and overrides
- Hierarchical configurations can be overridden in various ways
1.2 Misfeatures
- Most of the documentation and examples are about PyTorch, presumably because of the Facebook connection, which means that the examples are not generic.
- The manual is flat, affectless and unopinionated even where an opinion would be useful and where some affect would be more engaging and memorable.
- Hierarchical configurations may be overridden in too many ways and TBH are a bit confusing. I would prefer less flexibility for the sake of simplicity.
- Duplication of effort. Configs are specified both in YAML and in function invocations, which is confusing and error-prone.
This all comes to a head when starting a new project: because Hydra is so generic, powerful and clean, it is not clear where to even begin.
Recommended workaround: use 3rd-party how-tos and boilerplate examples to copy-paste. These are IMO worth the overhead since setting it up this way saves much time later on.
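To make this concrete, here is roughly the minimal Hydra app that most of those boilerplates grow from (a sketch, assuming hydra-core is installed and a config file lives at `conf/config.yaml`):

```python
# app.py: minimal Hydra application
import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(version_base=None, config_path="conf", config_name="config")
def main(cfg: DictConfig) -> None:
    # cfg is the merged config: conf/config.yaml plus any
    # command-line overrides, e.g. `python app.py lr=0.01`
    print(OmegaConf.to_yaml(cfg))

if __name__ == "__main__":
    main()
```

Run it as `python app.py`, overriding any value defined in the YAML on the command line; each run also gets its own timestamped output directory.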
2 Tutorials and examples
- Best tutorial: Kushajveer Singh, Complete tutorial on how to use Hydra in Machine Learning projects
- Julien Beaulieu, Building A Flexible Configuration System For Deep Learning Models
- Simone Scardapane, Learning Hydra for configuring ML experiments
- Omry Yadan (Hydra author) writes Hydra — A fresh look at configuration for Machine Learning projects
Here are some examples of Hydra in action which might be useful as templates:
- Worked examples from the project manual
- lucmos/nn-template: Generic template to bootstrap your PyTorch project with PyTorch Lightning, Hydra, W&B, DVC, and Streamlit.
- ashleve/lightning-hydra-template: PyTorch Lightning + Hydra. A very general, feature-rich template for rapid and scalable ML experimentation with best practices. ⚡🔥⚡
In practice, I found those templates way too fancy and feature-stuffed. It was easiest to start from scratch, without all the overblown dependencies and fancy chaos.
More community documentation:
- The fb-hydra tag on Stack Overflow
- Apparently there is action on the Hydra Zulip chat?
3 Hydra-zen
The most important benefit of using hydra-zen is that it automatically and dynamically generates structured configs for you.
Creating a structured config without hydra-zen:
```python
from dataclasses import dataclass, field

def foo(bar: int, baz: list[str], qux: float = 1.23):
    ...

@dataclass
class FooConf:
    _target_: str = "__main__.foo"
    bar: int = 2
    baz: list[str] = field(default_factory=lambda: ["abc"])
    qux: float = 1.23
```
Creating an equivalent structured config with hydra-zen:
```python
from hydra_zen import builds

def foo(bar: int, baz: list[str], qux: float = 1.23):
    ...

ZenFooConf = builds(foo, bar=2, baz=["abc"], populate_full_signature=True)
```
This means that it is much easier and safer to write and maintain the configs for your Hydra applications:
- Write all of your configs in Python. No more YAML files!
- Write less, stop repeating yourself, and get more out of your configs.
- Get automatic type-safety via `builds()`’s signature inspection.
- Validate your configs before launching your application.
- Leverage auto-config support for additional types, like `functools.partial`, that are not natively supported by Hydra.
4 Instantiating arbitrary Python classes
Instantiating objects with `hydra.utils.instantiate` allows us to specify which Python class (or function) to instantiate using a special `_target_` field in the config.
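To see why this is powerful (and dangerous), here is a rough toy sketch of the mechanism; `naive_instantiate` is my own hypothetical stand-in, not Hydra’s actual implementation:

```python
import importlib

def naive_instantiate(cfg: dict):
    """Toy sketch of what hydra.utils.instantiate does: look up the
    dotted path in _target_, import the object it names, and call it
    with the remaining keys as keyword arguments."""
    module_path, _, name = cfg["_target_"].rpartition(".")
    target = getattr(importlib.import_module(module_path), name)
    kwargs = {k: v for k, v in cfg.items() if k != "_target_"}
    return target(**kwargs)

# A config like this would normally live in YAML:
cfg = {"_target_": "collections.Counter", "red": 2, "blue": 1}
counter = naive_instantiate(cfg)
print(counter["red"])  # → 2
```

Since the config chooses which importable object gets called, a config file is effectively code.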
Corollary: config files are a security risk. Do not use a config file from anyone you do not trust.
5 Environment variable interpolation
Variable interpolation is supported but finicky because it supports both internal variables (which we just saw) and system environment variables, and it takes a while to work out what is what. Hydra supports system environment variables but confusingly does not expand any paths it finds in them, as it would if they were internal Hydra variables. 🤷 I gather this is because the system environment variables arrive via lower-level Omegaconf resolvers upon which Hydra is built.
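For example, OmegaConf’s `oc.env` resolver (which Hydra inherits) reads a system environment variable, with an optional default, while plain `${...}` references are internal interpolation. A sketch; `DATA_ROOT` is a hypothetical variable:

```yaml
# ${oc.env:VAR,default} reads a system environment variable;
# ${data_root} is ordinary internal interpolation.
data_root: ${oc.env:DATA_ROOT,/tmp/data}
train_file: ${data_root}/train.csv
```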
Pro tip: this is handy with an environment variable config system like dotenv.
Hydra also supports additional environment variable management, e.g. setting variables for launched jobs via `hydra.job.env_set`.
6 Paths
Paths are complicated. It’s useful to be aware of the following patterns:
Under some configurations Hydra hijacks the working directory by changing it to the current log directory. It’s useful to have the original working dir path as a special variable.
```yaml
work_dir: ${hydra:runtime.cwd}
# Example of using the above to define some other folder
data_dir: ${work_dir}/data/
```
We can alternatively get at the original and configured paths using `hydra.utils.get_original_cwd`.
Configuring that directory is possible. The default looks like this:
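A sketch from memory (recent Hydra versions):

```yaml
hydra:
  run:
    dir: outputs/${now:%Y-%m-%d}/${now:%H-%M-%S}
  sweep:
    dir: multirun/${now:%Y-%m-%d}/${now:%H-%M-%S}
    subdir: ${hydra.job.num}
```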
Here is one that is IMO more useful, allowing us to optionally use a custom output directory set by local environment variables and which uses the job name.
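A sketch of such a config; `OUTPUT_ROOT` is a hypothetical environment variable, falling back to the usual `outputs` folder when unset:

```yaml
hydra:
  run:
    dir: ${oc.env:OUTPUT_ROOT,outputs}/${hydra.job.name}/${now:%Y-%m-%d_%H-%M-%S}
```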
NB: Hijacking is now optional: set `hydra.job.chdir=False` to disable it.
Paths are still slightly unintuitive. Hydra’s facility for creating nice paths is great, but they do not necessarily interpolate in the obvious way; tl;dr, as long as we stay inside the path that Hydra defines for us, with all its luxurious variable substitution, everything is fine.
If we wish to interrogate Hydra at arbitrary locations in the config file to do arbitrary things (e.g. using the Hydra job name or job id as a parameter for something else), it does not work as well; the `hydra` object is not available there.
The path of least resistance is to do everything in the folder Hydra provides, but if we have some artifact that needs to go somewhere else, we can import the hydra object explicitly and interrogate it for the variables of interest. This is slightly ugly and leaky.
7 Logs
Logs are by default configured through Hydra to persist to a disk file as well as stdout, which is usually what I want. See Customising logging.
8 Python packages with Hydra configs
Feasible. See the example.
9 Go parallel with Hydra
Hydra will sweep parameters in parallel via plugins:
- Joblib Launcher plugin (for local jobs)
- Submitit Launcher plugin (for SLURM clusters)
The docs are a little confusing, but check out this example.
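For instance, a local parallel sweep over a couple of hypothetical parameters might look like this (assuming the hydra-joblib-launcher plugin is installed, and `train.py` is our app):

```shell
# --multirun sweeps over the comma-separated values;
# the joblib launcher runs the jobs in parallel local processes
python train.py --multirun hydra/launcher=joblib model.lr=0.01,0.001 seed=1,2,3
```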
Looking at joblib, for example, we might want to enable a parameter sweep from the command line. AFAICT this should launch
TBC