NVIDIA Datacentre GPU Buyers Guide

NVIDIA RTX Pro 6000 Server Edition

The GPU (Graphics Processing Unit), or graphics card, is the most important component in a GPU-accelerated server: it does most of the work when rendering graphics and video, running simulations and training or serving AI models. This guide explains what you need to know to pick the right model for your servers.

What makes NVIDIA Datacentre GPUs Special

NVIDIA datacentre GPUs feature a whole host of extra features and capabilities that their consumer counterparts lack.


Certified Drivers

ISVs such as Autodesk, Dassault and Siemens certify their applications, ensuring optimal stability backed by enterprise-class customer support.


Enterprise Class

Enterprise-class components ensure better reliability and resiliency, reducing failure rates, especially when GPUs run at full load for extended periods.


ECC Memory

Error-correcting code (ECC) memory protects data from corruption, detecting and correcting errors before they can affect the workload being processed.
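The principle behind ECC can be sketched with the classic Hamming(7,4) code: parity bits computed over overlapping subsets of the data bits spell out the position of a single flipped bit, which can then be flipped back. This is a teaching sketch only; GPU ECC uses stronger SECDED codes over much wider words in hardware.

```python
# Minimal Hamming(7,4) sketch: 4 data bits gain 3 parity bits, letting us
# locate and repair any single corrupted bit. Illustrative only -- real
# GPU ECC uses wider SECDED codes implemented in silicon.

def encode(d):
    # d = [d1, d2, d3, d4]; each parity bit covers an overlapping subset
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]  # codeword positions 1..7

def correct(c):
    # recompute the three parity checks; together they give the 1-based
    # position of a single-bit error (0 means the word is clean)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3
    if pos:
        c[pos - 1] ^= 1  # repair the flipped bit
    return [c[2], c[4], c[5], c[6]]  # recovered data bits

word = [1, 0, 1, 1]
damaged = encode(word)
damaged[5] ^= 1                  # simulate a cosmic-ray bit flip
print(correct(damaged) == word)  # True
```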


Extended Memory

Larger onboard frame buffers than consumer GPUs enable larger and more complex renders and compute simulations to be processed.
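A quick back-of-the-envelope check shows why frame buffer size matters. The sketch below estimates the memory needed just for a model's weights; the 70B-parameter example and the FP16 assumption are illustrative, and activations, KV cache and framework overheads add more on top.

```python
# Back-of-the-envelope: GiB needed just to hold a model's weights.
# Inputs are illustrative assumptions, not vendor sizing guidance.
def weights_gib(params_billion, bytes_per_param):
    return params_billion * 1e9 * bytes_per_param / 2**30

# A 70B-parameter model in FP16 (2 bytes per weight) needs ~130 GiB
# before activations or KV cache -- more than any single card's buffer.
print(f"{weights_gib(70, 2):.0f} GiB")  # 130 GiB
```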


Security

USB-C ports can be disabled, improving data integrity when the GPU is installed in secure environments or used with sensitive information.


Extended Warranty

The standard warranty provides cover for 3 years in professional environments and can be extended to a total of 5 years upon request.

The NVIDIA Datacentre GPU Range

The following table gives an overview of which GPUs are most suitable for different workloads. Machine learning (ML), deep learning (DL) and artificial intelligence (AI) are graded separately for training and inferencing, as these require quite different attributes. We also grade the GPUs for scientific compute, often referred to as HPC, for rendering, and for cloud-native NVIDIA vGPU platforms such as virtual PCs (vPC), virtual workstations (vWS) and Omniverse Enterprise.

GPUs compared: RTX PRO 6000 Blackwell Server, H200, H100, A100, A30, L40S, L40, A40, A10, A16, L4 and A2
Workloads graded: ML / DL / AI training; ML / DL / AI inferencing; HPC; rendering; vPC; vWS; Omniverse

RTX PRO 6000 Blackwell Server

RTX Pro 6000 Blackwell Server

The RTX PRO 6000 Blackwell Server is a powerful datacentre PCIe GPU based on the Blackwell architecture, designed for the most demanding deep learning, AI and HPC workloads, such as LLMs and generative AI. It is equipped with 24,064 CUDA cores, 752 5th gen Tensor cores and 188 4th gen RT cores, plus a huge 96GB of ultra-reliable GDDR7 ECC memory.

CUDA

CUDA cores are the workhorse of Blackwell GPUs; the architecture supports many cores, accelerating workloads up to 28% faster (FP32) than the previous Ada Lovelace generation.

RAY TRACING

Blackwell GPUs feature fourth generation RT cores delivering up to double the real-time photorealistic ray-tracing performance of the previous generation GPUs.

DATA SCIENCE & AI

Fifth generation Tensor cores boost scientific computing and AI development with up to 3x faster performance compared to Ada Lovelace GPUs; they also support FP4 precision.

MIG

Multi-Instance GPU (MIG) provides full isolation at the hardware level, allowing memory, cache and cores to be partitioned into as many as four independent instances, giving multiple users access to GPU acceleration.

VISUALISATION: 10/10

H200

H200 Graphics Card

The H200 is the flagship datacentre GPU based on the Hopper architecture, designed for the most demanding deep learning, AI and HPC workloads, such as LLMs and generative AI. It is available in SXM and PCIe versions, both equipped with 16,896 CUDA cores and 528 4th gen Tensor cores, plus a huge 141GB of ultra-reliable HBM3e ECC memory. The PCIe version has lower performance than the SXM version.

CUDA

CUDA cores are the workhorse of Hopper GPUs; the architecture supports many cores, accelerating workloads by up to 1.5x (FP32) over the previous Ampere generation.

DPX INSTRUCTIONS

DPX instructions accelerate dynamic programming algorithms by up to 7x on a Hopper-based GPU, compared with the previous Ampere architecture.
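Dynamic programming workloads of the kind DPX targets, such as genomic sequence alignment and route optimisation, share the min/add inner loop shown in this plain-Python Levenshtein edit distance sketch. It is illustrative only; on Hopper, such inner steps are accelerated in hardware via CUDA libraries rather than written by hand.

```python
# Dynamic programming of the kind DPX instructions accelerate: a plain
# Levenshtein edit distance. Each cell is a min over three neighbours
# plus a cost -- exactly the min/add pattern DPX speeds up in hardware.
def edit_distance(a, b):
    prev = list(range(len(b) + 1))          # distances for empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

print(edit_distance("kitten", "sitting"))  # 3
```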

DATA SCIENCE & AI

Fourth generation Tensor cores boost scientific computing and AI development with up to 2x faster performance compared to the previous generation, with hardware support for structural sparsity.

MIG

Multi-Instance GPU (MIG) provides full isolation at the hardware level, allowing memory, cache and cores to be partitioned into as many as seven independent instances, giving multiple users access to GPU acceleration.

VISUALISATION: N/A

H100

H100 Graphics Card

The H100 is an extremely high performance datacentre GPU based on the Hopper architecture, designed for the most demanding deep learning, AI and HPC workloads. It is available in an SXM version with 16,896 CUDA cores, 528 4th gen Tensor cores and 80GB of ultra-reliable ECC memory, and as the H100 NVL PCIe version, which has the same number of cores as the SXM version but 94GB of ultra-reliable ECC memory. There is also an older H100 PCIe version with 14,592 CUDA cores, 456 4th gen Tensor cores and 80GB of ultra-reliable ECC memory.

CUDA

CUDA cores are the workhorse of Hopper GPUs; the architecture supports many cores, accelerating workloads by up to 1.5x (FP32) over the previous Ampere generation.

DPX INSTRUCTIONS

DPX instructions accelerate dynamic programming algorithms by up to 7x on a Hopper-based GPU, compared with the previous Ampere architecture.

DATA SCIENCE & AI

Fourth generation Tensor cores boost scientific computing and AI development with up to 2x faster performance compared to the previous generation, with hardware support for structural sparsity.

MIG

Multi-Instance GPU (MIG) provides full isolation at the hardware level, allowing memory, cache and cores to be partitioned into as many as seven independent instances, giving multiple users access to GPU acceleration.

VISUALISATION: N/A

A100

A100 Graphics Card

The A100 is the flagship datacentre GPU based on the older Ampere architecture, designed for the most demanding deep learning, AI and HPC workloads. It is available in both PCIe and SXM form factors, equipped with 6,912 CUDA cores and 432 3rd gen Tensor cores, plus either 40 or 80GB of ultra-reliable HBM2 ECC memory.

CUDA

CUDA cores are the workhorse of Ampere GPUs; the architecture supports many cores, accelerating workloads by up to 2x (FP32) over the previous generation.

SPARSITY

Ampere GPUs provide up to double the performance for sparse models. This feature benefits AI inference and model training, as compressing sparse matrices also reduces the memory and bandwidth use.
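The sparsity feature relies on a fixed 2:4 pattern: at most two non-zero weights in every group of four, which the Tensor cores can then skip. The Python sketch below shows the pruning pattern only; it is an illustration of the idea, not NVIDIA's pruning tooling.

```python
# 2:4 structured sparsity sketch: keep the 2 largest-magnitude weights
# in each group of 4 and zero the rest. Ampere Tensor cores skip the
# zeros, roughly doubling throughput for matrices pruned this way.
def prune_2_of_4(weights):
    out = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # indices of the two largest-magnitude entries in this group
        keep = sorted(range(len(group)), key=lambda k: abs(group[k]))[-2:]
        out.extend(w if k in keep else 0.0 for k, w in enumerate(group))
    return out

w = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.8, 0.01]
print(prune_2_of_4(w))  # [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.8, 0.0]
```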

DATA SCIENCE & AI

Third generation Tensor cores boost scientific computing and AI development with up to 2x faster performance compared to the previous generation, with hardware support for structural sparsity.

MIG

Multi-Instance GPU (MIG) provides full isolation at the hardware level, allowing memory, cache and cores to be partitioned into as many as seven independent instances, giving multiple users access to GPU acceleration.

VISUALISATION: N/A

A30

A30 Graphics Card

The A30 is a cut-down version of the A100, designed to hit a lower price point. It is based on the same Ampere GA100 architecture and targets deep learning, AI and HPC workloads. It is equipped with 3,804 CUDA cores and 224 3rd gen Tensor cores, plus 24GB of ultra-reliable HBM2 ECC memory.

CUDA

CUDA cores are the workhorse of Ampere GPUs; the architecture supports many cores, accelerating workloads by up to 2x (FP32) over the previous generation.

SPARSITY

Ampere GPUs provide up to double the performance for sparse models. This feature benefits AI inference and model training, as compressing sparse matrices also reduces the memory and bandwidth use.

DATA SCIENCE & AI

Third generation Tensor cores boost scientific computing and AI development with up to 2x faster performance compared to the previous generation, with hardware support for structural sparsity.

MIG

Multi-Instance GPU (MIG) provides full isolation at the hardware level, allowing memory, cache and cores to be partitioned into as many as seven independent instances, giving multiple users access to GPU acceleration.

VISUALISATION: N/A

L40S

L40S Graphics Card

The L40S is the flagship datacentre GPU based on the Ada Lovelace architecture, designed primarily for high-end graphics and AI workloads. It has the same overall configuration as the L40, with 18,176 CUDA cores, 568 4th gen Tensor cores and 142 3rd gen RT cores, plus 48GB of ultra-reliable GDDR6 ECC memory. However, the L40S features improved Tensor cores that deliver double the performance of the L40 at TF32 and FP16, making it a far superior card for training and inferencing AI models.

CUDA

CUDA cores are the workhorse of Ada Lovelace GPUs; the architecture supports many cores, accelerating workloads by up to 1.5x (FP32) over the previous Ampere generation.

RAY TRACING

Ada Lovelace GPUs feature third generation RT cores delivering up to double the real-time photorealistic ray-tracing performance of the previous generation GPUs.

DATA SCIENCE & AI

Fourth generation Tensor cores boost scientific computing and AI development with up to 3x faster performance compared to Ampere GPUs and support mixed-precision acceleration.

VISUALISATION: 10/10

L40

L40 Graphics Card

The L40 is a high performance datacentre GPU based on the Ada Lovelace architecture, designed primarily for visualisation applications. It is equipped with 18,176 CUDA cores, 568 4th gen Tensor cores and 142 3rd gen RT cores, plus 48GB of ultra-reliable GDDR6 memory. The L40 should not be confused with the L40S, which has an improved Tensor core design that is twice as fast at TF32 and FP16, making the L40S a far better choice for deep learning and AI workloads.

CUDA

CUDA cores are the workhorse of Ada Lovelace GPUs; the architecture supports many cores, accelerating workloads by up to 1.5x (FP32) over the previous Ampere generation.

RAY TRACING

Ada Lovelace GPUs feature third generation RT cores delivering up to double the real-time photorealistic ray-tracing performance of the previous generation GPUs.

DATA SCIENCE & AI

Fourth generation Tensor cores boost scientific computing and AI development with up to 3x faster performance compared to Ampere GPUs and support mixed-precision acceleration.

VISUALISATION: 10/10

A40

A40 Graphics Card

The A40 is the flagship datacentre GPU based on the Ampere GA102 architecture and is designed primarily for visualisation and demanding virtualised graphics. It is equipped with 10,752 CUDA cores, 336 3rd gen Tensor cores, 84 2nd gen RT cores plus 48GB of ultra-reliable GDDR6 ECC memory.

CUDA

CUDA cores are the workhorse of Ampere GPUs; the architecture supports many cores, accelerating workloads by up to 2x (FP32) over the previous generation.

SPARSITY

Ampere GPUs provide up to double the performance for sparse models. This feature benefits AI inference and model training, as compressing sparse matrices also reduces the memory and bandwidth use.

DATA SCIENCE & AI

Third generation Tensor cores boost scientific computing and AI development with up to 2x faster performance compared to the previous generation, with hardware support for structural sparsity.

VISUALISATION: 7/10

A10

A10 Graphics Card

The A10 is a cut-down version of the A40, designed to hit a lower price point. It is based on the same Ampere GA102 architecture and is designed primarily for visualisation applications and deep learning inferencing. It is equipped with 9,216 CUDA cores, 288 3rd gen Tensor cores and 72 2nd gen RT cores, plus 24GB of ultra-reliable GDDR6 ECC memory.

CUDA

CUDA cores are the workhorse of Ampere GPUs; the architecture supports many cores, accelerating workloads by up to 2x (FP32) over the previous generation.

SPARSITY

Ampere GPUs provide up to double the performance for sparse models. This feature benefits AI inference and model training, as compressing sparse matrices also reduces the memory and bandwidth use.

DATA SCIENCE & AI

Third generation Tensor cores boost scientific computing and AI development with up to 2x faster performance compared to the previous generation, with hardware support for structural sparsity.

VISUALISATION: 3/10

A16

A16 Graphics Card

The A16 is a specialist GPU accelerator for delivering VDI experiences to client devices using NVIDIA vGPU software. Unlike GPUs such as the A40, which are optimised to drive relatively graphically demanding vWS sessions, the A16 is optimised to drive everyday Windows desktop applications in vPC sessions. Featuring four Ampere GPUs, each with 1,280 CUDA cores and 16GB of server-grade error-correcting code (ECC) memory, the A16 is ideal for sessions running everyday office applications, streaming video and teleconferencing tools.

CUDA

CUDA cores are the workhorse of Ampere GPUs; the architecture supports many cores, accelerating workloads by up to 2x (FP32) over the previous generation.

VISUALISATION: 3/10

L4

L4 Graphics Card

The L4 is a half-height, low-power GPU based on the Ada Lovelace architecture, designed primarily for deep learning inferencing plus less demanding graphics and video workloads. It is equipped with 7,680 CUDA cores, 240 4th gen Tensor cores and 60 3rd gen RT cores, plus 24GB of server-grade error-correcting code (ECC) GDDR6 memory.

COMPUTE

CUDA cores are the workhorse of Ada Lovelace GPUs; the architecture supports many cores, accelerating workloads by up to 1.5x (FP32) over the previous Ampere generation.

DATA SCIENCE & AI

Fourth generation Tensor cores boost scientific computing and AI development with up to 3x faster performance compared to Ampere GPUs and support mixed-precision acceleration.

VISUALISATION: 4/10

A2

A2 Graphics Card

The A2 is a compact, half-height GPU based on the Ampere architecture, designed primarily for deep learning inferencing. It is equipped with 1,280 CUDA cores, 40 3rd gen Tensor cores and 10 2nd gen RT cores, plus 16GB of server-grade error-correcting code (ECC) GDDR6 memory.

COMPUTE

CUDA cores are the workhorse of Ampere GPUs; the architecture supports many cores, accelerating workloads by up to 2x (FP32) over the previous generation.

SPARSITY

Ampere GPUs provide up to double the performance for sparse models. This feature benefits AI inference and model training, as compressing sparse matrices also reduces the memory and bandwidth use.

DATA SCIENCE & AI

Third generation Tensor cores boost scientific computing and AI development with up to 2x faster performance compared to the previous generation, with hardware support for structural sparsity.

VISUALISATION: 3/10

NVIDIA Professional Datacentre GPU Summary

The table below summarises each GPU's performance along with its technical specifications.

RTX PRO 6000 Blackwell Server H200 H100 A100 A30 L40S L40 A40 A10 A16 L4 A2
VISUALISATION (X/10) 10 N/A N/A N/A N/A 10 10 7 3 3 4 3
DOUBLE PRECISION / FP64 (TFLOPS) TBC 67 / 60 34 / 26 9.7 5.2 N/A N/A N/A N/A N/A N/A N/A
SINGLE PRECISION / FP32 (TFLOPS) TBC 898 / 835 989 / 756 312 165 366 181 149.6 125 4 x 18 120 18
HALF PRECISION / FP16 (TFLOPS) TBC 1,979 / 1,671 1,979 / 1,513 624 330 733 362 299.4 250 4 x 35.9 242 36
RAY TRACING
VR READY
NVLINK
ARCHITECTURE Blackwell Hopper Hopper Ampere Ampere Ada Lovelace Ada Lovelace Ampere Ampere Ampere Ada Lovelace Ampere
FORM FACTOR PCIe 5 SXM5 / PCIe 5 SXM5/ PCIe 5 SXM4/ PCIe 4 PCIe 4 PCIe 4 PCIe 4 PCIe 4 PCIe 4 PCIe 4 PCIe 4 PCIe 4
GPU RTX PRO 6000 H200 H100 GA100 GA100 AD102 AD102 GA102 GA102 GA102 AD104 GA102
CUDA CORES 24,064 16,896 16,896 or 14,592 6,912 3,804 18,176 18,176 10,752 9,216 4x 1,280 7,680 1,280
TENSOR CORES 752 5th gen 528 4th gen 528 or 456 4th gen 432 3rd gen 224 3rd gen 568 4th gen 568 4th gen 336 3rd gen 288 3rd gen 4x40 3rd gen 240 4th gen 40 3rd gen
RT CORES 188 4th gen 0 0 0 0 142 3rd gen 142 3rd gen 84 2nd gen 72 2nd gen 4x10 2nd gen 60 3rd gen 10 2nd gen
MEMORY 96GB GDDR7 141GB HBM3e 80 or 94GB HBM3 40 or 80GB HBM2 24GB HBM2 48GB GDDR6 48GB GDDR6 48GB GDDR6 24GB GDDR6 4x 16GB GDDR6 24GB GDDR6 16GB GDDR6
ECC MEMORY
MEMORY CONTROLLER 512-bit 5,120-bit 5,120-bit 5,120-bit 3,072-bit 384-bit 384-bit 384-bit 384-bit 384-bit 192-bit 128-bit
NVLINK SPEED 900GB/sec 900GB/sec 600GB/sec 200GB/sec 112GB/sec
TDP 600W 300W-700W 300W-700W 250W 165W 350W 300W 300W 150W 250W 72W 60W
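As a sanity check on figures like these, theoretical peak vector FP32 throughput is simply CUDA cores x 2 (one fused multiply-add per clock) x clock speed; Tensor core TF32/FP16 throughput is far higher than this vector figure. The boost clock in the sketch below is an assumed, illustrative value, not a quoted specification.

```python
# Theoretical peak FP32 = CUDA cores x 2 ops (fused multiply-add) x clock.
# The boost clock here is an assumption for illustration, not a spec.
def peak_fp32_tflops(cuda_cores, boost_ghz):
    return cuda_cores * 2 * boost_ghz / 1000

# e.g. an Ada-class card with 18,176 cores at an assumed ~2.5 GHz boost
print(round(peak_fp32_tflops(18_176, 2.5), 1), "TFLOPS")  # 90.9 TFLOPS
```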

Ready to Buy?

All NVIDIA datacentre GPUs must be purchased as part of a 3XS Systems server build; unlike their workstation counterparts, they cannot be bought standalone. Organisations in the higher education or further education sectors can obtain supported pricing, which is applied to the entire server build.

GPU-ACCELERATED SERVERS FOR GRAPHICS


GPU-ACCELERATED SERVERS FOR VIRTUALISATION


GPU-ACCELERATED SERVERS FOR DEEP LEARNING & AI


We hope you've found this NVIDIA datacentre GPU buyer's guide helpful. If you would like further advice on choosing the correct GPU for your use case or project, don't hesitate to get in touch on 01204 474747 or at [email protected].