NVIDIA Tesla T4
Introducing Hyperscale Inference
Understand Inference Performance
With inference, speed is just the beginning of performance. To get a complete picture of inference performance, there are seven factors to consider, ranging from programmability to rate of learning. The NVIDIA TensorRT Hyperscale Inference Platform delivers on all fronts: the best inference performance at scale, with the versatility to handle the growing diversity of today's networks.

- Programmability
- Low Latency
- Accuracy
- Size of Network
- Throughput
- Efficiency
- Rate of Learning
NVIDIA T4 – Powered by Turing Tensor Cores
The NVIDIA Tesla T4 GPU is the world’s most advanced inference accelerator. Powered by NVIDIA Turing Tensor Cores, T4 brings revolutionary multi-precision inference performance to accelerate the diverse applications of modern AI. Packaged in an energy-efficient 70-watt, small PCIe form factor, T4 is optimized for scale-out servers and is purpose-built to deliver state-of-the-art inference in real time.
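T4's multi-precision support means a network trained in FP32 can be served in INT8 for much higher throughput. The core idea behind INT8 inference is linear quantization: floating-point values are mapped to 8-bit integers via a scale factor, then mapped back after compute. The sketch below illustrates the principle only; it is a minimal NumPy illustration, not TensorRT's actual calibration or kernel implementation.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric linear quantization of a float tensor to INT8.

    Maps the observed float range onto [-127, 127] using a single
    per-tensor scale factor (a simplification of what real INT8
    inference pipelines do with calibrated per-layer scales).
    """
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map INT8 values back to approximate float values."""
    return q.astype(np.float32) * scale

# Example: quantize a small activation tensor and measure the error.
x = np.array([0.5, -1.2, 3.3, -0.01], dtype=np.float32)
q, s = quantize_int8(x)
x_hat = dequantize(q, s)
# Rounding error is bounded by the scale factor, which is why
# well-calibrated INT8 inference loses very little accuracy.
```

The trade-off this illustrates is the one T4 exploits in hardware: INT8 math quadruples arithmetic throughput versus FP32 (130 TOPS vs. 8.1 TFLOPS on T4) at the cost of a small, bounded quantization error.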
NVIDIA Tensor Cores

The Power of NVIDIA TensorRT
NVIDIA TensorRT is a high-performance inference platform that includes an optimiser, runtime engines, and an inference server for deploying applications in production. TensorRT accelerates applications by up to 40X over CPU-only systems for video streaming, recommendation, and natural language processing.
Production Ready Datacentre Inference
The NVIDIA TensorRT inference server is a containerised microservice that enables applications to use AI models in datacentre production. It maximises GPU utilisation, supports all popular AI frameworks, and integrates with Kubernetes and Docker.
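The inference server loads each model from a model repository described by a small configuration file. The fragment below is a hedged sketch of such a configuration; the model name, tensor names, and dimensions are illustrative assumptions, not values from this document.

```protobuf
# config.pbtxt -- illustrative model configuration for the inference server.
# "resnet50" and the tensor names/shapes here are example values.
name: "resnet50"
platform: "tensorrt_plan"   # serve a TensorRT-optimised engine
max_batch_size: 8           # server may batch up to 8 requests together
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
```

Dynamic batching like the `max_batch_size` setting above is one of the main levers for maximising GPU utilisation on a scale-out server.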

NVIDIA Tesla T4 Specifications
| Specification | Value |
|---|---|
| **Performance** | |
| Turing Tensor Cores | 320 |
| NVIDIA CUDA Cores | 2,560 |
| Single-Precision Performance (FP32) | 8.1 TFLOPS |
| Mixed-Precision Performance (FP16/FP32) | 65 TFLOPS |
| INT8 Precision | 130 TOPS |
| INT4 Precision | 260 TOPS |
| **Interconnect** | |
| PCIe | Gen3 x16 |
| **Memory** | |
| Capacity | 16 GB GDDR6 |
| Bandwidth | 320+ GB/s |
| **Power** | |
| Usage | 70 W |