UNPRECEDENTED ACCELERATION AT EVERY SCALE

NVIDIA A100 TENSOR CORE GPU
The Most Powerful Compute Platform for Every Workload
The NVIDIA® A100 Tensor Core GPU delivers unprecedented
acceleration—at every scale—to power the world’s highest-
performing elastic data centers for AI, data analytics, and
high-performance computing (HPC) applications. As the
engine of the NVIDIA data center platform, A100 provides
up to 20X higher performance over the prior NVIDIA Volta™
generation. A100 can efficiently scale up or be partitioned
into seven isolated GPU instances, with Multi-Instance GPU
(MIG) providing a unified platform that enables elastic data
centers to dynamically adjust to shifting workload demands.
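
In practice, MIG partitions are managed with standard NVIDIA tooling. The sketch below is a minimal illustration using the nvidia-smi CLI invoked from Python; it assumes an A100 at device index 0, a MIG-capable driver, administrator privileges, and the 1g.5gb profile of the 40GB card (all assumptions, not datasheet specifics):

```python
import subprocess

def run(cmd: str) -> None:
    """Run an nvidia-smi command and print its output."""
    result = subprocess.run(cmd, shell=True, check=True,
                            capture_output=True, text=True)
    print(result.stdout)

# Enable MIG mode on GPU 0 (requires root; the GPU may need a reset to apply).
run("nvidia-smi -i 0 -mig 1")

# List the GPU instance profiles this GPU supports (sizes, counts, profile IDs).
run("nvidia-smi mig -lgip")

# Create two 1g.5gb GPU instances, plus their default compute instances (-C).
run("nvidia-smi mig -cgi 1g.5gb,1g.5gb -C")

# Each instance now appears with its own MIG UUID; a job can be pinned to one
# by exporting CUDA_VISIBLE_DEVICES=MIG-<uuid>.
run("nvidia-smi -L")
```

Because each MIG instance gets its own slice of compute, memory, and cache, jobs pinned this way run isolated from one another.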
NVIDIA A100 Tensor Core technology supports a broad range
of math precisions, providing a single accelerator for every
workload. The latest generation A100 80GB doubles GPU
memory and debuts the world’s fastest memory bandwidth
at 2 terabytes per second (TB/s), speeding time to solution
for the largest models and most massive data sets.
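
To make the precision range concrete, here is a minimal PyTorch sketch (the framework choice is an assumption for illustration, not something the datasheet prescribes) showing how FP32 work can be routed through TF32 Tensor Cores and how BF16 runs under autocast on an A100:

```python
import torch

# Allow FP32 matmuls/convolutions to use TF32 Tensor Core math on Ampere GPUs.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

x = torch.randn(4096, 4096, device="cuda")  # ordinary FP32 tensors
w = torch.randn(4096, 4096, device="cuda")

# FP32 matmul, internally executed with TF32 Tensor Cores.
y_tf32 = x @ w

# The same operation in BF16 mixed precision via autocast.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    y_bf16 = x @ w
```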
A100 is part of the complete NVIDIA data center solution that
incorporates building blocks across hardware, networking,
software, libraries, and optimized AI models and applications
from NGC™. Representing the most powerful end-to-end AI
and HPC platform for data centers, it allows researchers
to deliver real-world results and deploy solutions into
production at scale.
SYSTEM SPECIFICATIONS

| Specification | NVIDIA A100 for NVLink | NVIDIA A100 for PCIe |
|---|---|---|
| Peak FP64 | 9.7 TF | 9.7 TF |
| Peak FP64 Tensor Core | 19.5 TF | 19.5 TF |
| Peak FP32 | 19.5 TF | 19.5 TF |
| Peak Tensor Float 32 (TF32) | 156 TF / 312 TF* | 156 TF / 312 TF* |
| Peak BFLOAT16 Tensor Core | 312 TF / 624 TF* | 312 TF / 624 TF* |
| Peak FP16 Tensor Core | 312 TF / 624 TF* | 312 TF / 624 TF* |
| Peak INT8 Tensor Core | 624 TOPS / 1,248 TOPS* | 624 TOPS / 1,248 TOPS* |
| Peak INT4 Tensor Core | 1,248 TOPS / 2,496 TOPS* | 1,248 TOPS / 2,496 TOPS* |
| GPU Memory | 40GB or 80GB | 40GB |
| GPU Memory Bandwidth | 1,555 GB/s (40GB); 2,039 GB/s (80GB) | 1,555 GB/s |
| Interconnect | NVIDIA NVLink 600 GB/s**; PCIe Gen4 64 GB/s | NVIDIA NVLink 600 GB/s**; PCIe Gen4 64 GB/s |
| Multi-Instance GPU | Various instance sizes with up to 7 MIGs @ 5 GB (40GB) or 7 MIGs @ 10 GB (80GB) | Various instance sizes with up to 7 MIGs @ 5 GB |
| Form Factor | 4/8 SXM on NVIDIA HGX™ A100 | PCIe |
| Max TDP Power | 400 W | 250 W |

* With sparsity (second figure in each pair)
** SXM GPUs via HGX A100 server boards; PCIe GPUs via NVLink Bridge for up to 2 GPUs
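
As a rough sanity check against the bandwidth row above, a large device-to-device copy can be timed with CUDA events. The sketch below (PyTorch, with an arbitrary tensor size and iteration count chosen for illustration) reports effective copy bandwidth, which will land below the 1,555–2,039 GB/s peak figures:

```python
import torch

N = 1 << 28  # 2^28 float32 elements, about 1 GiB per tensor
src = torch.empty(N, dtype=torch.float32, device="cuda")
dst = torch.empty_like(src)

dst.copy_(src)  # warm-up copy
torch.cuda.synchronize()

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

iters = 20
start.record()
for _ in range(iters):
    dst.copy_(src)
end.record()
torch.cuda.synchronize()

seconds = start.elapsed_time(end) / 1e3          # elapsed_time() returns ms
tensor_bytes = src.numel() * src.element_size()  # bytes per tensor
moved = 2 * tensor_bytes * iters                 # each copy reads src, writes dst
print(f"~{moved / seconds / 1e9:.0f} GB/s effective copy bandwidth")
```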
Incredible Performance Across Workloads

[Chart] Up to 3X higher AI training on the largest models (DLRM training)
[Chart] Up to 249X higher AI inference performance over CPUs (BERT-Large inference)
[Chart] Up to 1.25X higher AI inference performance over A100 40GB (RNN-T inference, single stream)
[Chart] Up to 1.8X higher performance for HPC applications (Quantum Espresso)