Adding links to performance benchmark page
nv-kkudrynski committed Jul 21, 2021
1 parent 3d8d878 commit 49e23b4
Showing 52 changed files with 104 additions and 0 deletions.
2 changes: 2 additions & 0 deletions CUDA-Optimized/FastSpeech/README.md
@@ -315,6 +315,8 @@ Sample result waveforms are [FP32](fastspeech/trt/samples) and [FP16](fastspeech
## Performance
The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).
### Benchmarking
The following section shows how to run benchmarks measuring the model performance in training and inference modes.
2 changes: 2 additions & 0 deletions Kaldi/SpeechRecognition/README.md
@@ -192,6 +192,8 @@ you can set `count` to `1` in the [`instance_group` section](https://docs.nvidia

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).


### Metrics

2 changes: 2 additions & 0 deletions MxNet/Classification/RN50v1.5/README.md
@@ -552,6 +552,8 @@ By default:

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking

To benchmark training and inference, run:
2 changes: 2 additions & 0 deletions PyTorch/Classification/ConvNets/efficientnet/README.md
@@ -492,6 +492,8 @@ Quantized models could also be used to classify new images using the `classify.p

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking

The following section shows how to run benchmarks measuring the model performance in training and inference modes.
2 changes: 2 additions & 0 deletions PyTorch/Classification/ConvNets/resnet50v1.5/README.md
@@ -498,6 +498,8 @@ To run inference on a JPEG image using pretrained weights:

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking

The following section shows how to run benchmarks measuring the model performance in training and inference modes.
2 changes: 2 additions & 0 deletions PyTorch/Classification/ConvNets/resnext101-32x4d/README.md
@@ -481,6 +481,8 @@ To run inference on a JPEG image using pretrained weights:

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking

The following section shows how to run benchmarks measuring the model performance in training and inference modes.
2 changes: 2 additions & 0 deletions PyTorch/Classification/ConvNets/se-resnext101-32x4d/README.md
@@ -483,6 +483,8 @@ To run inference on a JPEG image using pretrained weights:

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking

The following section shows how to run benchmarks measuring the model performance in training and inference modes.
2 changes: 2 additions & 0 deletions PyTorch/Classification/ConvNets/triton/resnet50/README.md
@@ -325,6 +325,8 @@ we can consider that all clients are local.

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).


### Offline scenario
This table lists the common variable parameters for all performance measurements:
@@ -194,6 +194,8 @@ To process static configuration logs, the `triton/scripts/process_output.sh` script

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Dynamic batching performance
The Triton Inference Server has a built-in dynamic batching mechanism that can be enabled. When enabled, the server assembles inference batches from multiple incoming requests, which achieves better performance than running inference on each request individually. Here, a single request is assumed to carry a single image to be inferred. With dynamic batching enabled, the server concatenates single-image requests into one inference batch, whose size is capped at 64. All of these parameters are configurable.
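As a minimal sketch of where these parameters live (the model name and values below are illustrative assumptions, not this repository's actual configuration), dynamic batching is enabled in the model's `config.pbtxt`:

```
# config.pbtxt -- illustrative values; name and sizes are assumptions
name: "resnet50"
max_batch_size: 64   # upper bound on the assembled inference batch

dynamic_batching {
  # batch sizes the scheduler prefers when grouping single-image requests
  preferred_batch_size: [ 32, 64 ]
  # how long the scheduler may hold requests to fill a larger batch
  max_queue_delay_microseconds: 100
}
```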

@@ -195,6 +195,8 @@ To process static configuration logs, the `triton/scripts/process_output.sh` script

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Dynamic batching performance
The Triton Inference Server has a built-in dynamic batching mechanism that can be enabled. When enabled, the server assembles inference batches from multiple incoming requests, which achieves better performance than running inference on each request individually. Here, a single request is assumed to carry a single image to be inferred. With dynamic batching enabled, the server concatenates single-image requests into one inference batch, whose size is capped at 64. All of these parameters are configurable.

2 changes: 2 additions & 0 deletions PyTorch/Detection/SSD/README.md
@@ -565,6 +565,8 @@ To use the inference example script in your own code, you can call the `main` fu

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking

The following section shows how to run benchmarks measuring the model performance in training and inference modes.
2 changes: 2 additions & 0 deletions PyTorch/LanguageModeling/BERT/README.md
@@ -692,6 +692,8 @@ For SQuAD, to run inference interactively on question-context pairs, use the scr
The [NVIDIA Triton Inference Server](https://github.com/NVIDIA/triton-inference-server) provides a cloud inferencing solution optimized for NVIDIA GPUs. It exposes an inference service via an HTTP or gRPC endpoint, allowing remote clients to request inference for any model the server manages. More information on how to perform inference using NVIDIA Triton Inference Server can be found in [triton/README.md](./triton/README.md).
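For orientation, a minimal client sketch against such an endpoint could look like the following; the model name, tensor names, and shapes are assumptions for illustration, and in practice they are defined by the deployed model's configuration:

```python
# Minimal sketch of a remote Triton HTTP client.
# Model name, tensor names, and shapes are illustrative assumptions.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Dummy tokenized input for a hypothetical [batch, seq_len] INT32 tensor.
input_ids = np.zeros((1, 384), dtype=np.int32)
infer_input = httpclient.InferInput("input_ids", list(input_ids.shape), "INT32")
infer_input.set_data_from_numpy(input_ids)

result = client.infer(model_name="bert", inputs=[infer_input])
print(result.as_numpy("output"))  # raw model output returned by the server
```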

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking

2 changes: 2 additions & 0 deletions PyTorch/LanguageModeling/BERT/triton/README.md
@@ -102,6 +102,8 @@ To make the machine wait until the server is initialized and the model is ready

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

The numbers below are averages, measured with Triton on a V100 32GB GPU, with [static batching](https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html#scheduling-and-batching).

| Format | GPUs | Batch size | Sequence length | Throughput - FP32(sequences/sec) | Throughput - mixed precision(sequences/sec) | Throughput speedup (mixed precision/FP32) |
2 changes: 2 additions & 0 deletions PyTorch/LanguageModeling/Transformer-XL/README.md
@@ -1113,6 +1113,8 @@ perplexity on the test dataset.

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking

The following section shows how to run benchmarks measuring the model
2 changes: 2 additions & 0 deletions PyTorch/Recommendation/DLRM/README.md
@@ -574,6 +574,8 @@ The NVIDIA Triton Inference Server provides a cloud inferencing solution optimiz

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking

The following section shows how to run benchmarks measuring the model performance in training and inference modes.
2 changes: 2 additions & 0 deletions PyTorch/Recommendation/DLRM/triton/README.md
@@ -192,6 +192,8 @@ For more information about `perf_client`, please refer to [official documentation](

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Throughput/Latency results

Throughput is measured in recommendations/second, and latency in milliseconds.
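Numbers like these are typically gathered with Triton's `perf_client`. A hedged example invocation, in which the model name, batch size, and endpoint are assumptions for illustration:

```bash
# Illustrative only -- model name, batch size, and endpoint are assumptions
perf_client -m dlrm -b 4096 --concurrency-range 1:4 -u localhost:8000
```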
2 changes: 2 additions & 0 deletions PyTorch/Recommendation/NCF/README.md
@@ -379,6 +379,8 @@ The script will then:

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking

#### Training performance benchmark
2 changes: 2 additions & 0 deletions PyTorch/Segmentation/MaskRCNN/README.md
@@ -484,6 +484,8 @@ __Note__: The score is always the Average Precision (AP) at
- maxDets = 100

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking
Benchmarking can be performed for both training and inference. Both scripts run the Mask R-CNN model using the parameters defined in `configs/e2e_mask_rcnn_R_50_FPN_1x.yaml`. You can choose whether benchmarking runs in FP16, TF32, or FP32 by passing the precision as an argument to the benchmarking scripts.
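A hypothetical invocation is sketched below; the script names and argument order are assumptions for illustration, so consult the repository's `scripts/` directory for the actual entry points:

```bash
# Illustrative only -- script names and arguments are assumptions
bash scripts/train_benchmark.sh fp16      # training benchmark in FP16
bash scripts/inference_benchmark.sh tf32  # inference benchmark in TF32
```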
2 changes: 2 additions & 0 deletions PyTorch/Segmentation/nnUNet/README.md
@@ -454,6 +454,8 @@ The script will then:

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking

The following section shows how to run benchmarks to measure the model performance in training and inference modes.
2 changes: 2 additions & 0 deletions PyTorch/Segmentation/nnUNet/triton/README.md
@@ -344,6 +344,8 @@ we can consider that all clients are local.

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).


### Offline scenario
This table lists the common variable parameters for all performance measurements:
2 changes: 2 additions & 0 deletions PyTorch/SpeechRecognition/Jasper/README.md
@@ -567,6 +567,8 @@ More information on how to perform inference using Triton Inference Server with
## Performance
The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).
### Benchmarking
The following section shows how to run benchmarks measuring the model performance in training and inference modes.
2 changes: 2 additions & 0 deletions PyTorch/SpeechRecognition/Jasper/triton/README.md
@@ -274,6 +274,8 @@ For more information about `perf_client`, refer to the [official documentation](

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Inference Benchmarking in Triton Inference Server

To benchmark the inference performance on Volta, Turing, or Ampere GPUs, run `bash triton/scripts/execute_all_perf_runs.sh` according to [Quick-Start-Guide](#quick-start-guide) Step 7.
2 changes: 2 additions & 0 deletions PyTorch/SpeechSynthesis/FastPitch/README.md
@@ -532,6 +532,8 @@ More examples are presented on the website with [samples](https://fastpitch.gith
## Performance
The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).
### Benchmarking
The following section shows how to run benchmarks measuring the model
2 changes: 2 additions & 0 deletions PyTorch/SpeechSynthesis/FastPitch/triton/README.md
@@ -342,6 +342,8 @@ we can consider that all clients are local.

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).



### Offline scenario
2 changes: 2 additions & 0 deletions PyTorch/SpeechSynthesis/Tacotron2/README.md
@@ -524,6 +524,8 @@ python inference.py --tacotron2 <Tacotron2_checkpoint> --waveglow <WaveGlow_chec

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking

The following section shows how to run benchmarks measuring the model
2 changes: 2 additions & 0 deletions PyTorch/SpeechSynthesis/Tacotron2/trtis_cpp/README.md
@@ -160,6 +160,8 @@ By default, the `./build_trtis.sh` script builds the TensorRT engines with FP16 m

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

The following tables show inference statistics for the Tacotron2 and WaveGlow
text-to-speech system.
The tables include average latency, latency standard deviation,
2 changes: 2 additions & 0 deletions PyTorch/Translation/GNMT/README.md
@@ -932,6 +932,8 @@ To view all available options for inference, run `python3 translate.py --help`.

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking
The following section shows how to run benchmarks measuring the model
performance in training and inference modes.
2 changes: 2 additions & 0 deletions PyTorch/Translation/Transformer/README.md
@@ -364,6 +364,8 @@ sacrebleu -t wmt14/full -l en-de --echo src | python inference.py --buffer-size

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking

The following section shows how to run benchmarks measuring the model performance in training and inference modes.
2 changes: 2 additions & 0 deletions TensorFlow/Classification/ConvNets/resnet50v1.5/README.md
@@ -451,6 +451,8 @@ The optional `--xla` and `--amp` flags control XLA and AMP during inference.

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking

The following section shows how to run benchmarks measuring the model performance in training and inference modes.
2 changes: 2 additions & 0 deletions TensorFlow/Classification/ConvNets/resnext101-32x4d/README.md
@@ -420,6 +420,8 @@ The optional `--xla` and `--amp` flags control XLA and AMP during inference.

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking

The following section shows how to run benchmarks measuring the model performance in training and inference modes.
@@ -415,6 +415,8 @@ The optional `--xla` and `--amp` flags control XLA and AMP during inference.

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking

The following section shows how to run benchmarks measuring the model performance in training and inference modes.