TRAINING
DLRM Training
Up to 3X Higher Throughput
for AI Training on Largest Models
DLRM on HugeCTR framework, precision = FP16 | 1x DGX A100 640GB batch size = 48 | 2x DGX A100 320GB batch size = 32 | 1x DGX-2 (16x V100 32GB) batch size = 32. Speedups Normalized to Number of GPUs.
INFERENCE
RNN-T Inference: Single Stream
Up to 1.25X Higher Throughput for AI Inference
MLPerf 0.7 RNN-T measured with (1/7) MIG slices. Framework: TensorRT 7.2, dataset = LibriSpeech, precision = FP16.
Big Data Analytics Benchmark
Up to 83X Higher Throughput than CPU,
2X Higher Throughput than DGX A100 320GB
Big data analytics benchmark | 30 analytical retail queries, ETL, ML, NLP on 10TB dataset | CPU: 19x Intel Xeon Gold 6252 2.10 GHz, Hadoop | 16x DGX-1 (8x V100 32GB each), RAPIDS/Dask | 12x DGX A100 320GB and 6x DGX A100 640GB, RAPIDS/Dask/BlazingSQL. Speedups Normalized to Number of GPUs.
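The footnotes state that speedups are normalized to the number of GPUs, because the compared configurations use different GPU counts. The sketch below illustrates one plausible reading of that normalization: divide each system's throughput by its GPU count, then take the ratio. The function name `per_gpu_speedup` and the throughput figures are hypothetical placeholders, not measured results, and the 8 GPUs per DGX A100 system is an assumption not stated in the footnote.

```python
def per_gpu_speedup(throughput, gpus, base_throughput, base_gpus):
    """Ratio of per-GPU throughput between a system and a baseline."""
    if gpus <= 0 or base_gpus <= 0:
        raise ValueError("GPU counts must be positive")
    return (throughput / gpus) / (base_throughput / base_gpus)


# GPU counts derived from the configuration footnote:
# 16x DGX-1 with 8x V100 each; 6x DGX A100 640GB, assuming 8 GPUs per system.
dgx1_gpus = 16 * 8      # 128 GPUs
a100_gpus = 6 * 8       # 48 GPUs

# Placeholder throughputs in arbitrary units (NOT benchmark data).
example = per_gpu_speedup(
    throughput=72.0, gpus=a100_gpus,
    base_throughput=128.0, base_gpus=dgx1_gpus,
)
print(example)
```

With these placeholder inputs, the smaller cluster delivers less total throughput but more per GPU, which is the effect the normalization is meant to expose.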