Reproducibility in Machine Learning research
May 5, 2024
How does reproducible science happen for ML models? How can we responsibly communicate how the latest sexy paper is likely to work in practice?
1 Difficulties of foundation models in particular
What can we do when the model is both too large and too secretive to be interrogated? TBD
2 Connection to domain adaptation
How do we know that our models generalize to the wild? See Domain adaptation.
3 Benchmarks
See ML benchmarks.
4 Incoming
REFORMS: Reporting standards for ML-based science, in the authors' words:
The REFORMS checklist consists of 32 items across 8 sections. It is based on an extensive review of the pitfalls and best practices in adopting ML methods. We created an accompanying set of guidelines for each item in the checklist. We include expectations about what it means to address the item sufficiently. To aid researchers new to ML-based science, we identify resources and relevant past literature.
The REFORMS checklist differs from the large body of past work on checklists in two crucial ways. First, we aimed to make our reporting standards field-agnostic, so that they can be used by researchers across fields. To that end, the items in our checklist broadly apply across fields that use ML methods. Second, past checklists for ML methods research focus on reproducibility issues that arise commonly when developing ML methods. But these issues differ from the ones that arise in scientific research. Still, past work on checklists in both scientific research and ML methods research has helped inform our checklist.
Various syntheses arise from time to time: Albertoni et al. (2023); Pineau et al. (2020).