RutgersCSSystems/mlperf-hpc

MLPerf HPC

This is a landing page for the MLPerf HPC working group and benchmark suite. This material will eventually be moved into https://github.com/mlcommons.

Overview

The MLPerf HPC working group focuses on ML benchmarking on supercomputers. We are currently working on publishing a v0.5 HPC training benchmark suite. For an overview, please see these slides from the MLBench workshop at ISPASS 2020.

Group chairs:

  • Steven Farrell, Lawrence Berkeley National Laboratory
  • Jacob Balma, Hewlett Packard Enterprise
  • Abid Malik, Brookhaven National Laboratory

Deputy chairs:

  • Murali Emani, Argonne National Laboratory
  • Aristeidis Tsaris, Oak Ridge National Laboratory

To join the group, first join the general MLPerf group and then the HPC working group as described here: https://mlperf.org/get-involved/

Benchmarks

Reference implementations are at benchmarks.

Datasets

Instructions to acquire the datasets are given in the reference implementation READMEs.

Rules

The MLPerf HPC v0.5 rules are based on the MLPerf Training v0.7 rules with some adjustments.

The MLPerf Training rules are available at training_rules.

The MLPerf HPC specific rules are at hpc_training_rules.

Compliance

The MLPerf logging package implements logging and compliance-checking utilities for MLPerf benchmarks. We have a temporary fork and hpc-0.5.0 branch in which we are adding support for MLPerf-HPC v0.5 at https://github.com/mlperf-hpc/logging/tree/hpc-0.5.0

To install and test compliance of your runs/submissions:

# Install the package into your python environment.
# A development install (-e) is recommended for now so you can pull new updates.
git clone -b hpc-0.5.0 https://github.com/mlperf-hpc/logging.git mlperf-logging
# Add --user to install into your user site-packages instead of the active environment.
pip install -e mlperf-logging

# Test compliance of a specific mlperf hpc log file
python -m mlperf_logging.compliance_checker --ruleset hpc_0.5.0 $logFile

# Test a system description file (we just use the Training v0.7 rules)
python -m mlperf_logging.system_desc_checker $jsonFile training 0.7.0

# Test a full submission folder
python -m mlperf_logging.package_checker $submissionDir hpc 0.5.0
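
When a run produces many log files, the single-file compliance check above can be wrapped in a loop. The following is a minimal sketch, not part of the logging package: the function name `check_logs`, the `.log` file suffix, and the `CHECKER` override variable are all hypothetical, and the sketch assumes the checker exits with a non-zero status when a log is not compliant.

```shell
# Sketch: run the compliance checker on every *.log file in a directory.
# Assumptions (not from the logging package): logs end in .log, and the
# checker signals a non-compliant log through a non-zero exit status.
# CHECKER is a hypothetical override, useful for dry runs.
check_logs() {
    log_dir="$1"
    failed=0
    for log_file in "$log_dir"/*.log; do
        # Skip the unexpanded pattern when the directory has no .log files.
        [ -e "$log_file" ] || continue
        # Left unquoted on purpose so the command splits into words.
        if ${CHECKER:-python -m mlperf_logging.compliance_checker --ruleset hpc_0.5.0} "$log_file"; then
            echo "PASS $log_file"
        else
            echo "FAIL $log_file"
            failed=$((failed + 1))
        fi
    done
    echo "$failed log file(s) failed compliance"
    # Succeed only if every log passed.
    [ "$failed" -eq 0 ]
}
```

After sourcing the function, `check_logs path/to/logs` prints one PASS or FAIL line per file followed by a summary, and its return status can gate a submission script.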