
Decide what metrics to capture to ASV for merlin-models notebooks #235

Closed · Tracked by #344

@EvenOldridge (Member) opened this issue Apr 25, 2022 · 6 comments
No description provided.

@EvenOldridge (Member, Author)

@bschifferer can you please review the notebooks and share what the standard outputs for these notebooks are? Can we standardize all of the notebooks to a single set of metrics? If not, what are the differences?

@bschifferer (Contributor) commented May 3, 2022

I took a look, and Merlin Models has only a small set of metrics. Merlin Systems and Merlin/Merlin are based on the Merlin Models examples and use the same metrics. Can we discuss this based on the use cases/models?

My proposal is:

All Notebooks:

  • Runtime
  • GPU utilization % (average/peak?)

Training:

  • Throughput (ex/sec)
  • Ranking Metrics (both train and valid):
    -- If binary:
    --- Recall
    --- Precision
    --- AUC
    -- We don't have a multi-class example yet.
  • Retrieval Metrics (both train and valid):
    -- Loss
    -- Recall@10
    -- NDCG@10

Inference:

  • We need to add more tests for it
  • p50, p95, p99 latency
  • Same metrics as for training, so we can check whether inference is deployed correctly (see the sketch below)
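A minimal sketch of how a notebook could collect these numbers, assuming a compiled Keras ranking model and a list of per-request latencies; `model`, `train_ds`, `valid_ds`, `num_train_examples`, and `request_latencies_ms` are placeholder names, not the actual notebook variables:

```python
import time
import numpy as np
import tensorflow as tf

# Ranking metrics for the binary case, attached at compile time.
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=[
        tf.keras.metrics.Recall(name="recall"),
        tf.keras.metrics.Precision(name="precision"),
        tf.keras.metrics.AUC(name="auc"),
    ],
)

# Runtime and throughput (examples/sec) around training.
start = time.perf_counter()
history = model.fit(train_ds, validation_data=valid_ds, epochs=1)
runtime_s = time.perf_counter() - start
throughput = num_train_examples / runtime_s  # num_train_examples is a placeholder

# Inference latency percentiles from per-request timings in milliseconds.
p50, p95, p99 = np.percentile(request_latencies_ms, [50, 95, 99])
```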

@sararb @gabrielspmoreira do you want to add additional metrics for training? Are there other metrics that would help us understand whether everything is running correctly?

@bschifferer (Contributor) commented May 9, 2022

After our last CI meeting, we want to track only a few metrics for some specific notebooks.

@jperez999 is our CI a single- or multi-GPU environment?

Currently, we use the following datasets in our repositories:

  • MovieLens - too small; not sure whether validation loss/AUC is meaningful
  • AliCCP - good dataset size, but the dataset is quite unbalanced and we do not have an example with high-accuracy results
  • Outbrain - only used in one example and hasn't been updated for a while
  • Rossmann - not a RecSys example

The proposal is to use Criteo: it is a large dataset and it is used in the research community and in perf benchmarks, so there are reference examples with good AUC scores.

Proposal:

  • 7 metrics:
    -- NVTabular runtime
    -- TensorFlow runtime
    -- TensorFlow AUC
    -- PyTorch runtime
    -- PyTorch AUC
    -- Triton p95 latency TensorFlow
    -- Triton p95 latency PyTorch

  • Runtime for the NVTabular workflow - Criteo / 02-ETL-with-NVTabular.ipynb
    -- tracks whether the NVTabular workflow runs efficiently on a large dataset

  • Runtime and validation AUC for the Criteo example with Merlin Models (needs to be built) for TensorFlow
    -- tracks whether the TensorFlow dataloaders are efficient
    -- tracks whether Merlin Models is efficient
    -- tracks whether the training logic is still correct

  • Runtime and validation AUC for the Criteo example for PyTorch
    -- tracks whether the PyTorch dataloaders are efficient
    -- tracks whether the training logic is still correct
    -- needs to be upgraded to Merlin Models when the PyTorch version is available

  **Triton examples need to be defined** (a sketch of how these metrics could be reported to ASV follows below)
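A minimal sketch of how these values might be reported to ASV, assuming the notebook (or a script extracted from it) exposes the measured numbers. ASV discovers `time_*` and `track_*` methods on benchmark classes and uses their return values; the module path and the `run_criteo_tf_training` helper below are hypothetical placeholders:

```python
# benchmarks/criteo_tf.py -- hypothetical ASV benchmark module
import numpy as np


def run_criteo_tf_training():
    # Placeholder: in the real benchmark this would run the notebook
    # (e.g. via papermill) or an extracted script and collect its outputs.
    return {"runtime_s": 0.0, "val_auc": 0.0, "latencies_ms": [0.0]}


class CriteoTensorFlow:
    """Sketch of ASV benchmarks for the Criteo TensorFlow example."""

    timeout = 3600  # full-dataset runs can take a while

    def setup(self):
        self.results = run_criteo_tf_training()

    def track_runtime(self):
        # Total training wall-clock time in seconds.
        return self.results["runtime_s"]
    track_runtime.unit = "seconds"

    def track_auc(self):
        # Validation AUC reported at the end of training.
        return self.results["val_auc"]
    track_auc.unit = "AUC"

    def track_triton_p95_latency(self):
        # p95 latency over recorded per-request timings (milliseconds).
        return float(np.percentile(self.results["latencies_ms"], 95))
    track_triton_p95_latency.unit = "ms"
```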

@bschifferer (Contributor)

How about HugeCTR?

@karlhigley (Contributor)

Not quite sure I understand the Triton part: How do we intend to measure p95 latency? Do we have a way to generate realistic requests that would make p95 meaningful? Or are we just intending to allow for some variance in the latency of serving the same request over and over?
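One way to make p95 meaningful would be to sample distinct requests from held-out data rather than replaying the same request; a rough sketch, where `sample_request_batches` and `send_to_triton` are placeholders for whatever request generation and client code (e.g. tritonclient) the example ends up using:

```python
import time
import numpy as np

latencies_ms = []
# sample_request_batches is a placeholder that yields varied request payloads
# drawn from the validation set, so the latency distribution reflects more
# than the same cached request served over and over.
for batch in sample_request_batches(valid_df, n_requests=1000, batch_size=64):
    start = time.perf_counter()
    send_to_triton(batch)  # placeholder for the actual client call
    latencies_ms.append((time.perf_counter() - start) * 1000)

p50, p95, p99 = np.percentile(latencies_ms, [50, 95, 99])
print(f"p50={p50:.1f} ms  p95={p95:.1f} ms  p99={p99:.1f} ms")
```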

@bschifferer (Contributor)

I think we decided to collect the metrics here: https://nvidia.slack.com/archives/CVBDJUPEZ/p1679617586380369

We will continue this work in NVIDIA-Merlin/models#1047.
