NVIDIA Datacentre GPU Buyer's Guide
The GPU (Graphics Processing Unit), or graphics card, is the most important component in a GPU-accelerated server, as it does the bulk of the work when rendering graphics and video, running simulations and training or serving AI models. This guide will teach you everything you need to know to pick the right model for your servers.
What makes NVIDIA datacentre GPUs special
NVIDIA datacentre GPUs offer a whole host of extra features and capabilities that their consumer counterparts lack.
Certified Drivers
ISVs such as Autodesk, Dassault and Siemens certify their applications to run on NVIDIA's professional drivers, ensuring optimal stability backed by enterprise-class customer support.
Enterprise Class
Enterprise-class components ensure better reliability and resiliency, reducing failure rates especially when used at full load for longer periods of time.
ECC Memory
Error correcting code (ECC) memory protects data from corruption by detecting and correcting memory errors before they can affect the workload being processed.
Extended Memory
Larger onboard frame buffers than consumer GPUs enable larger and more complex renders and compute simulations to be processed.
Extended Warranty
The standard warranty provides cover for 3 years in professional environments and can be extended to a total of 5 years upon request.
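Picking up on the ECC Memory point above: ECC status can be verified from software, which is useful when commissioning a new server. Below is a minimal Python sketch, assuming the standard nvidia-smi utility is installed; the exact query fields supported by your driver can be listed with `nvidia-smi --help-query-gpu`.

```python
# Minimal sketch: query ECC mode on each NVIDIA GPU via nvidia-smi.
# Assumes the NVIDIA driver and the nvidia-smi utility are installed.
import subprocess

def query_ecc_status():
    """Return a list of (gpu_name, total_memory, ecc_mode) tuples."""
    fields = "name,memory.total,ecc.mode.current"
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={fields}", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    rows = []
    for line in result.stdout.strip().splitlines():
        name, mem_total, ecc_mode = [col.strip() for col in line.split(",")]
        rows.append((name, mem_total, ecc_mode))
    return rows

if __name__ == "__main__":
    for name, mem_total, ecc_mode in query_ecc_status():
        print(f"{name}: {mem_total}, ECC {ecc_mode}")
```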
The NVIDIA datacentre GPU range
The following table gives an overview of which GPUs are most suitable for different workloads, starting with machine learning (ML), deep learning (DL) and artificial intelligence (AI) - split into training and inferencing, as these require quite different attributes. We also grade them for scientific compute workloads, often referred to as HPC, for rendering, and for cloud-native NVIDIA vGPU platforms such as virtual PCs (vPC), virtual workstations (vWS) and Omniverse Enterprise.
| | H200 | H100 | A100 | A30 | L40S | L40 | A40 | A10 | A16 | L4 | A2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ML / DL / AI - TRAINING | Yes | Yes | Yes | Yes | Yes | No | No | No | No | No | No |
| ML / DL / AI - INFERENCING | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes |
| HPC | Yes | Yes | Yes | Yes | No | No | No | No | No | No | No |
| RENDERING | No | No | No | No | Yes | Yes | Yes | Yes | No | No | No |
| vPC | No | No | No | No | No | No | No | No | Yes | Yes | Yes |
| vWS | No | No | No | No | Yes | Yes | Yes | Yes | No | Yes | No |
| OMNIVERSE | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No | No | No |
H200
The H200 is the flagship datacentre GPU based on the Hopper architecture and is designed for the most demanding deep learning, AI and HPC workloads, such as LLMs and generative AI. It is only available in the SXM form factor, and is equipped with 16,896 CUDA cores and 528 4th gen Tensor cores plus a huge 141GB of ultra-reliable HBM3e ECC memory.
CUDA
CUDA cores are the workhorse in Hopper GPUs, as the architecture supports many cores and accelerates workloads by up to 1.5x (FP32) compared with the previous Ampere generation.
DPX INSTRUCTIONS
DPX instructions accelerate dynamic programming algorithms by up to 7x on a Hopper-based GPU, compared with the previous Ampere architecture.
DATA SCIENCE & AI
Fourth generation Tensor cores boost scientific computing and AI development, delivering up to 2x faster performance than the previous generation thanks to hardware support for structural sparsity.
MIG
Multi-Instance GPU (MIG) provides full isolation at the hardware level, allowing memory, cache and cores to be partitioned into as many as seven independent instances and giving multiple users access to GPU acceleration.
VISUALISATION (X/10): N/A
Real Time Ray Tracing: No
VR Ready: No
NVLink: Yes
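To put the H200's 141GB frame buffer into context, a rough back-of-the-envelope calculation shows why memory capacity dominates LLM serving decisions. The sketch below is a simplification that counts model weights only and ignores KV cache and activation overheads; the 70B parameter count is purely an illustrative assumption.

```python
# Rough sizing sketch: do the weights of a given LLM fit in one GPU's memory?
# Illustrative only - real deployments must also budget for KV cache,
# activations and framework overhead.

BYTES_PER_PARAM = {"fp16": 2, "int8": 1, "int4": 0.5}

def weights_gb(n_params: float, precision: str) -> float:
    """Approximate memory needed for the model weights alone, in GB."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9

if __name__ == "__main__":
    h200_memory_gb = 141          # H200 frame buffer from this guide
    n_params = 70e9               # assumed model size for illustration
    for precision in ("fp16", "int8", "int4"):
        need = weights_gb(n_params, precision)
        fits = "fits" if need < h200_memory_gb else "does not fit"
        print(f"70B @ {precision}: ~{need:.0f} GB of weights -> {fits} in {h200_memory_gb} GB")
```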
H100
The H100 is an extremely high performance datacentre GPU based on the Hopper architecture and is designed for the most demanding deep learning, AI and HPC workloads. It is available as an SXM version equipped with 16,896 CUDA cores, 528 4th gen Tensor cores and 80GB of ultra-reliable ECC memory, and as the H100 NVL PCIe version, which has the same number of cores as the SXM version but 94GB of ultra-reliable ECC memory. There is also an older H100 PCIe version with 14,592 CUDA cores, 456 4th gen Tensor cores and 80GB of ultra-reliable ECC memory.
CUDA
CUDA cores are the workhorse in Hopper GPUs, as the architecture supports many cores and accelerates workloads by up to 1.5x (FP32) compared with the previous Ampere generation.
DPX INSTRUCTIONS
DPX instructions accelerate dynamic programming algorithms by up to 7x on a Hopper-based GPU, compared with the previous Ampere architecture.
DATA SCIENCE & AI
Fourth generation Tensor cores boost scientific computing and AI development, delivering up to 2x faster performance than the previous generation thanks to hardware support for structural sparsity.
MIG
Multi-Instance GPU (MIG) provides full isolation at the hardware level, allowing memory, cache and cores to be partitioned into as many as seven independent instances and giving multiple users access to GPU acceleration.
VISUALISATION (X/10): N/A
Real Time Ray Tracing: No
VR Ready: No
NVLink: Yes
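Because MIG features in the H200, H100, A100 and A30 descriptions, here is a hedged sketch of how a card is typically partitioned. It simply wraps the standard nvidia-smi MIG commands from Python; the `1g.10gb` profile name is an example only - list the profiles your card actually offers with `nvidia-smi mig -lgip` - and note that enabling MIG mode usually requires root and may need a GPU reset.

```python
# Sketch: enable MIG on GPU 0 and carve it into seven small instances.
# Wraps standard nvidia-smi commands; run as root. Profile names vary by GPU,
# so treat "1g.10gb" as a placeholder and check `nvidia-smi mig -lgip` first.
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def partition_gpu(gpu_index: int = 0, profile: str = "1g.10gb", count: int = 7):
    # 1. Enable MIG mode on the target GPU (may require a reset to take effect).
    run(["nvidia-smi", "-i", str(gpu_index), "-mig", "1"])
    # 2. Create the GPU instances and matching compute instances (-C).
    profiles = ",".join([profile] * count)
    run(["nvidia-smi", "mig", "-i", str(gpu_index), "-cgi", profiles, "-C"])
    # 3. Show what was created.
    run(["nvidia-smi", "-i", str(gpu_index), "-L"])

if __name__ == "__main__":
    partition_gpu()
```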
A100
*Long lead time, consider DGX or L40S instead
The A100 is the flagship datacentre GPU based on the older Ampere architecture and is designed for the most demanding deep learning, AI and HPC workloads. It is available in both PCIe and SXM form factors, equipped with 6,912 CUDA cores and 432 3rd gen Tensor cores plus either 40GB or 80GB of ultra-reliable HBM2 ECC memory.
CUDA
CUDA cores are the workhorse in Ampere GPUs, as the architecture supports many cores and accelerates workloads by up to 2x (FP32) compared with the previous generation.
SPARSITY
Ampere GPUs provide up to double the performance for sparse models. This feature benefits AI inference and model training, as compressing sparse matrices also reduces the memory and bandwidth use.
DATA SCIENCE & AI
Third generation Tensor cores boost scientific computing and AI development, delivering up to 2x faster performance than the previous generation thanks to hardware support for structural sparsity.
MIG
Multi-Instance GPU (MIG) provides full isolation at the hardware level, allowing memory, cache and cores to be partitioned into as many as seven independent instances and giving multiple users access to GPU acceleration.
VISUALISATION (X/10): N/A
Real Time Ray Tracing: No
VR Ready: No
NVLink: Yes
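The structural sparsity feature mentioned in the SPARSITY blurb above relies on a fixed 2:4 pattern: in every group of four weights, at most two are non-zero, which is what lets the Tensor cores skip the zeros. A small NumPy sketch of that pruning pattern is shown below; it only illustrates the data layout, not the actual hardware path or NVIDIA's pruning tooling.

```python
# Illustration of 2:4 structured sparsity: keep the 2 largest-magnitude values
# in every group of 4 weights and zero the rest. This mirrors the layout the
# Tensor cores exploit, but is not NVIDIA's actual pruning implementation.
import numpy as np

def prune_2_4(weights: np.ndarray) -> np.ndarray:
    """Apply a 2:4 sparsity pattern along the last axis (length must be a multiple of 4)."""
    w = weights.reshape(-1, 4).copy()
    # Indices of the two smallest-magnitude entries in each group of four.
    drop = np.argsort(np.abs(w), axis=1)[:, :2]
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(weights.shape)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dense = rng.normal(size=(4, 8)).astype(np.float32)
    sparse = prune_2_4(dense)
    print("non-zeros per row of 8:", (sparse != 0).sum(axis=1))  # always 4
```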
A30
The A30 is a cut-down version of the A100 designed to hit a lower price point. It is based on the same Ampere GA100 architecture and is designed for deep learning, AI and HPC workloads. It is equipped with 3,584 CUDA cores and 224 3rd gen Tensor cores plus 24GB of ultra-reliable HBM2 ECC memory.
CUDA
CUDA cores are the workhorse in Ampere GPUs, as the architecture supports many cores and accelerates workloads by up to 2x (FP32) compared with the previous generation.
SPARSITY
Ampere GPUs provide up to double the performance for sparse models. This feature benefits AI inference and model training, as compressing sparse matrices also reduces the memory and bandwidth use.
DATA SCIENCE & AI
Third generation Tensor cores boost scientific computing and AI development, delivering up to 2x faster performance than the previous generation thanks to hardware support for structural sparsity.
MIG
Multi-Instance GPU (MIG) provides full isolation at the hardware level, allowing memory, cache and cores to be partitioned into as many as seven independent instances and giving multiple users access to GPU acceleration.
VISUALISATION (X/10): N/A
Real Time Ray Tracing: No
VR Ready: No
NVLink: Yes
L40S
The L40S is the flagship datacentre GPU based on the Ada Lovelace architecture and is designed primarily for high-end graphics and AI workloads. It has the same overall configuration as the L40, with 18,176 CUDA cores, 568 4th gen Tensor cores, 142 3rd gen RT cores plus 48GB of ultra-reliable GDDR6 ECC memory. However, the L40S features improved Tensor cores which deliver double the performance of the L40 at TF32 and FP16, making it a far superior card for training and inferencing AI models.
CUDA
CUDA cores are the workhorse in Ada Lovelace GPUs, as the architecture supports many cores and accelerates workloads by up to 1.5x (FP32) compared with the previous Ampere generation.
RAY TRACING
Ada Lovelace GPUs feature third generation RT cores, delivering up to double the real-time photorealistic ray-tracing performance of the previous generation GPUs.
DATA SCIENCE & AI
Fourth generation Tensor cores boost scientific computing and AI development, delivering up to 3x faster performance than Ampere GPUs and adding support for mixed floating-point precision.
VISUALISATION (X/10): 10
Real Time Ray Tracing: Yes
VR Ready: Yes
NVLink: No
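As a concrete illustration of how those Tensor cores get used in practice, the snippet below shows the common PyTorch pattern of enabling TF32 for FP32 matmuls and running a forward pass under FP16 autocast. It is a generic sketch rather than anything specific to the L40S, and assumes a CUDA build of PyTorch is installed.

```python
# Generic sketch: use the Tensor cores via TF32 and FP16 autocast in PyTorch.
# Assumes a CUDA build of PyTorch; falls back to plain FP32 on CPU otherwise.
import torch

# Allow FP32 matmuls and convolutions to use TF32 Tensor core maths.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 1024),
).to(device)
x = torch.randn(64, 1024, device=device)

if device == "cuda":
    # Mixed precision: matmuls run in FP16 on the Tensor cores,
    # while numerically sensitive ops stay in FP32.
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        y = model(x)
else:
    y = model(x)

print(y.shape, y.dtype)
```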
L40
The L40 is a high performance datacentre GPU based on the Ada Lovelace architecture and is designed primarily for visualisation applications. It is equipped with 18,176 CUDA cores, 568 4th gen Tensor cores, 142 3rd gen RT cores plus 48GB of ultra-reliable GDDR6 ECC memory. The L40 should not be confused with the L40S, which has an improved Tensor core design that is twice as fast at TF32 and FP16, making the L40S a far better choice for deep learning and AI workloads.
CUDA
CUDA cores are the workhorse in Ada Lovelace GPUs, as the architecture supports many cores and accelerates workloads by up to 1.5x (FP32) compared with the previous Ampere generation.
RAY TRACING
Ada Lovelace GPUs feature third generation RT cores, delivering up to double the real-time photorealistic ray-tracing performance of the previous generation GPUs.
DATA SCIENCE & AI
Fourth generation Tensor cores boost scientific computing and AI development, delivering up to 3x faster performance than Ampere GPUs and adding support for mixed floating-point precision.
VISUALISATION (X/10): 10
Real Time Ray Tracing: Yes
VR Ready: Yes
NVLink: No
A40
The A40 is the flagship datacentre GPU based on the Ampere GA102 architecture and is designed primarily for visualisation and demanding virtualised graphics. It is equipped with 10,752 CUDA cores, 336 3rd gen Tensor cores, 84 2nd gen RT cores plus 48GB of ultra-reliable GDDR6 ECC memory.
CUDA
CUDA cores are the workhorse in Ampere GPUs, as the architecture supports many cores and accelerates workloads by up to 2x (FP32) compared with the previous generation.
SPARSITY
Ampere GPUs provide up to double the performance for sparse models. This feature benefits AI inference and model training, as compressing sparse matrices also reduces the memory and bandwidth use.
DATA SCIENCE & AI
Third generation Tensor cores boost scientific computing and AI development, delivering up to 2x faster performance than the previous generation thanks to hardware support for structural sparsity.
VISUALISATION (X/10): 7
Real Time Ray Tracing: Yes
VR Ready: Yes
NVLink: Yes
A10
The A10 is a cut-down version of the A40 designed to hit a lower price point. It is based on the same Ampere GA102 architecture and is designed primarily for visualisation applications and deep learning inferencing. It is equipped with 9,216 CUDA cores, 288 3rd gen Tensor cores, 72 2nd gen RT cores plus 24GB of ultra-reliable GDDR6 ECC memory.
CUDA
CUDA cores are the workhorse in Ampere GPUs, as the architecture supports many cores and accelerates workloads by up to 2x (FP32) compared with the previous generation.
SPARSITY
Ampere GPUs provide up to double the performance for sparse models. This feature benefits AI inference and model training, as compressing sparse matrices also reduces the memory and bandwidth use.
DATA SCIENCE & AI
Third generation Tensor cores boost scientific computing and AI development, delivering up to 2x faster performance than the previous generation thanks to hardware support for structural sparsity.
VISUALISATION (X/10): 3
Real Time Ray Tracing: Yes
VR Ready: Yes
NVLink: No
A16
The A16 is a specialist GPU accelerator for providing VDI experiences to client devices using NVIDIA vGPU software. Unlike GPUs such as the A40, which are optimised to drive relatively graphically demanding vWS sessions, the A16 is optimised to drive everyday Windows desktop applications in vPC sessions. Featuring four Ampere GPUs, each with 1,280 CUDA cores and 16GB of server-grade error correcting code (ECC) memory, the A16 is ideal for sessions running everyday office applications, streaming video and teleconferencing tools.
CUDA
CUDA cores are the workhorse in Ampere GPUs, as the architecture supports many cores and accelerates workloads by up to 2x (FP32) compared with the previous generation.
VISUALISATION (X/10): 3
Real Time Ray Tracing: Yes
VR Ready: No
NVLink: No
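A quick way to reason about the A16 is in terms of how many vPC sessions its 4x 16GB of memory can host. The sketch below just does that arithmetic; the 1GB and 2GB profile sizes are typical vPC frame buffer profiles used here as assumptions, and real deployments are also bounded by vGPU licensing and by CPU, RAM and encoder capacity.

```python
# Back-of-the-envelope vPC density for an A16 board (4 GPUs x 16GB each).
# Assumed profile sizes are illustrative; check the vGPU documentation for
# the profiles and maximums that apply to your deployment.

GPUS_PER_BOARD = 4
MEMORY_PER_GPU_GB = 16

def sessions_per_board(profile_gb: int) -> int:
    """How many vGPU sessions of a given frame buffer size fit on one A16."""
    per_gpu = MEMORY_PER_GPU_GB // profile_gb
    return per_gpu * GPUS_PER_BOARD

if __name__ == "__main__":
    for profile_gb in (1, 2):
        print(f"{profile_gb}GB profile: up to {sessions_per_board(profile_gb)} sessions per A16")
```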
L4
The L4 is a half-height, low-power GPU based on the Ada Lovelace architecture and is designed primarily for deep learning inferencing plus less demanding graphics and video workloads. It is equipped with 7,680 CUDA cores, 240 4th gen Tensor cores, 60 3rd gen RT cores plus 24GB of server-grade error correcting code (ECC) GDDR6 memory.
COMPUTE
CUDA cores are the workhorse in Ada Lovelace GPUs, as the architecture supports many cores and accelerates workloads by up to 1.5x (FP32) compared with the previous Ampere generation.
DATA SCIENCE & AI
Fourth generation Tensor cores boost scientific computing and AI development, delivering up to 3x faster performance than Ampere GPUs and adding support for mixed floating-point precision.
VISUALISATION (X/10): 4
Real Time Ray Tracing: Yes
VR Ready: Yes
NVLink: No
A2
The A2 is a compact, half-height GPU based on the Ampere GA107 architecture and is designed primarily for deep learning inferencing. It is equipped with 1,280 CUDA cores, 40 3rd gen Tensor cores, 10 2nd gen RT cores plus 16GB of server-grade error correcting code (ECC) GDDR6 memory.
COMPUTE
CUDA cores are the workhorse in Ampere GPUs, as the architecture supports many cores and accelerates workloads by up to 2x (FP32) compared with the previous generation.
SPARSITY
Ampere GPUs provide up to double the performance for sparse models. This feature benefits AI inference and model training, as compressing sparse matrices also reduces the memory and bandwidth use.
DATA SCIENCE & AI
Third generation Tensor cores boost scientific computing and AI development, delivering up to 2x faster performance than the previous generation thanks to hardware support for structural sparsity.
VISUALISATION (X/10): 3
Real Time Ray Tracing: Yes
VR Ready: No
NVLink: No
NVIDIA Professional datacentre GPU Summary
The table below summarises each GPU's performance along with its technical specifications.
| | H200 | H100 | A100 | A30 | L40S | L40 | A40 | A10 | A16 | L4 | A2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PERFORMANCE & CAPABILITIES | | | | | | | | | | | |
| VISUALISATION (X/10) | N/A | N/A | N/A | N/A | 10 | 10 | 7 | 3 | 3 | 4 | 3 |
| DOUBLE PRECISION / FP64 (TFLOPS) | 67 | 34 / 26 | 9.7 | 5.2 | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| TF32 TENSOR (TFLOPS, WITH SPARSITY) | 989 | 989 / 756 | 312 | 165 | 366 | 181 | 149.6 | 125 | 4x 18 | 120 | 18 |
| FP16 TENSOR (TFLOPS, WITH SPARSITY) | 1,979 | 1,979 / 1,513 | 624 | 330 | 733 | 362 | 299.4 | 250 | 4x 35.9 | 242 | 36 |
| RAY TRACING | No | No | No | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| VR READY | No | No | No | No | Yes | Yes | Yes | Yes | No | Yes | No |
| NVLINK | Yes | Yes | Yes | Yes | No | No | Yes | No | No | No | No |
| SPECS | | | | | | | | | | | |
| ARCHITECTURE | Hopper | Hopper | Ampere | Ampere | Ada Lovelace | Ada Lovelace | Ampere | Ampere | Ampere | Ada Lovelace | Ampere |
| FORM FACTOR | SXM5 | SXM5 / PCIe 5 | SXM4 / PCIe 4 | PCIe 4 | PCIe 4 | PCIe 4 | PCIe 4 | PCIe 4 | PCIe 4 | PCIe 4 | PCIe 4 |
| GPU | GH100 | GH100 | GA100 | GA100 | AD102 | AD102 | GA102 | GA102 | 4x GA107 | AD104 | GA107 |
| CUDA CORES | 16,896 | 16,896 or 14,592 | 6,912 | 3,584 | 18,176 | 18,176 | 10,752 | 9,216 | 4x 1,280 | 7,680 | 1,280 |
| TENSOR CORES | 528 4th gen | 528 or 456 4th gen | 432 3rd gen | 224 3rd gen | 568 4th gen | 568 4th gen | 336 3rd gen | 288 3rd gen | 4x 40 3rd gen | 240 4th gen | 40 3rd gen |
| RT CORES | 0 | 0 | 0 | 0 | 142 3rd gen | 142 3rd gen | 84 2nd gen | 72 2nd gen | 4x 10 2nd gen | 60 3rd gen | 10 2nd gen |
| MEMORY | 141GB HBM3e | 80 or 94GB HBM3 | 40 or 80GB HBM2 | 24GB HBM2 | 48GB GDDR6 | 48GB GDDR6 | 48GB GDDR6 | 24GB GDDR6 | 4x 16GB GDDR6 | 24GB GDDR6 | 16GB GDDR6 |
| ECC MEMORY | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| MEMORY CONTROLLER | 6,144-bit | 5,120-bit | 5,120-bit | 3,072-bit | 384-bit | 384-bit | 384-bit | 384-bit | 4x 128-bit | 192-bit | 128-bit |
| NVLINK SPEED | 900GB/sec | 900GB/sec | 600GB/sec | 200GB/sec | No | No | 112GB/sec | No | No | No | No |
| TDP | 300W-700W | 300W-700W | 250W | 165W | 350W | 300W | 300W | 150W | 250W | 72W | 60W |
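When shortlisting cards for large-scale inference, it can also be useful to divide the throughput figures above by TDP. The short sketch below does this for a handful of cards using the FP16 Tensor and TDP rows from the summary table; treat it as a rough comparison only, as real-world efficiency depends on the workload, clocks and utilisation.

```python
# Rough TFLOPS-per-watt comparison using the FP16 Tensor and TDP figures
# from the summary table above. Real-world efficiency is workload dependent.

cards = {
    # name: (FP16 Tensor TFLOPS with sparsity, TDP in watts)
    "L40S": (733, 350),
    "L40":  (362, 300),
    "A40":  (299.4, 300),
    "L4":   (242, 72),
    "A2":   (36, 60),
}

for name, (tflops, tdp) in sorted(cards.items(), key=lambda kv: kv[1][0] / kv[1][1], reverse=True):
    print(f"{name}: {tflops / tdp:.2f} TFLOPS per watt")
```

On this crude measure the L4 stands out for efficiency, which is why it suits dense, power-constrained inference deployments, while the L40S leads on absolute throughput.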
Ready to buy?
All NVIDIA datacentre GPUs must be purchased as part of a 3XS Systems server build; unlike their workstation counterparts, they cannot be bought standalone. For organisations in the higher or further education sectors, supported pricing can be obtained and is applied to the entire server build.
We hope you've found this NVIDIA datacentre GPU buyer's guide helpful. However, if you would like further advice on choosing the correct GPU for your use case or project, don't hesitate to get in touch on 01204 474747 or at [email protected].