NVIDIA Vera CPU
NVIDIA Vera CPU is a next-generation data-center central processing unit developed by NVIDIA, specifically engineered to support large-scale AI, high-performance computing (HPC), analytics, and cloud workloads with very high bandwidth, energy efficiency, and tight integration with GPUs. It forms one of the core components of NVIDIA’s Vera Rubin platform, which is designed as a rack-scale AI supercomputing architecture.
Overview and Positioning
Purpose:
- Designed to serve as the CPU foundation in modern AI datacenters and AI "factories," where traditional CPUs often become bottlenecks when paired with large fleets of AI accelerators.
- Optimized for data movement, memory throughput, and orchestration to keep GPUs fully utilized during AI training and inference workloads.
- Can also operate as a standalone high-performance platform for cloud services, analytics, HPC, and enterprise workloads independent of GPUs.
Integration:
- A key part of NVIDIA's Vera Rubin NVL72 and NVL144 system families, where Vera CPUs work tightly with Rubin GPUs, BlueField DPUs, ConnectX SuperNICs, and NVLink interconnects to deliver scalable AI compute at rack scale.
Architecture and Key Features
Core Design:
- 88 NVIDIA-designed Olympus CPU cores with 176 simultaneous threads, enabled by Spatial Multithreading (a new multithreading approach in which hardware resources are partitioned per thread rather than time-sliced).
- Based on the Arm® architecture (Armv9.2-compatible), with NVIDIA's custom core microarchitecture optimized for throughput and efficiency.
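The difference between spatial and time-sliced multithreading can be sketched with a toy model. The 8-slot core and 2-thread setup below are illustrative assumptions, not Vera's actual microarchitecture:

```python
# Toy model contrasting spatial multithreading (a static partition of
# execution resources) with time-sliced multithreading (threads alternate).
# Slot counts and the round-robin schedule are hypothetical.
SLOTS_PER_CYCLE = 8
THREADS = 2

def spatial_slots(cycle: int, thread: int) -> int:
    """Spatial: each thread owns a fixed share of slots every cycle."""
    return SLOTS_PER_CYCLE // THREADS

def time_sliced_slots(cycle: int, thread: int) -> int:
    """Time-sliced: the active thread gets all slots; the others get none."""
    return SLOTS_PER_CYCLE if cycle % THREADS == thread else 0

# Over many cycles both schemes deliver the same total work per thread,
# but the spatial scheme gives each thread a steady per-cycle allocation.
total_spatial = sum(spatial_slots(c, 0) for c in range(100))
total_sliced = sum(time_sliced_slots(c, 0) for c in range(100))
print(total_spatial, total_sliced)  # 400 400
```

The point of the model: partitioning resources per thread trades peak single-thread burst capacity for uniform, predictable per-thread throughput, which is the property the text attributes to Spatial Multithreading.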
Memory and Bandwidth:
- Supports up to 1.5 TB of LPDDR5X system memory at up to 1.2 TB/s of bandwidth, critical for memory-intensive AI pipelines, data preparation, and cache operations.
- Second-generation NVIDIA Scalable Coherency Fabric (SCF) ensures uniform, high-bandwidth access across all CPU cores and the memory subsystem for predictable performance.
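As a back-of-envelope illustration of what those figures imply, using only the capacity and bandwidth numbers quoted above (peak figures; real workloads see less):

```python
# How long a single full sequential pass over the system memory would
# take at the quoted peak bandwidth (illustrative, ideal-case arithmetic).
MEM_CAPACITY_TB = 1.5     # up to 1.5 TB of LPDDR5X
MEM_BANDWIDTH_TBS = 1.2   # up to 1.2 TB/s

scan_time_s = MEM_CAPACITY_TB / MEM_BANDWIDTH_TBS
print(f"One full-memory scan at peak bandwidth: {scan_time_s:.2f} s")  # 1.25 s
```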
High-Speed Interconnects:
- NVLink-C2C (chip-to-chip) interface with about 1.8 TB/s of coherent bandwidth connects Vera CPUs to Rubin GPUs, enabling unified memory semantics and efficient data sharing between CPU and GPU.
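To put the coherent-link figure in perspective, here is a rough comparison against a conventional PCIe Gen5 x16 link; the ~64 GB/s per-direction PCIe number is an approximate assumption on my part, not from the text:

```python
# Time to stage a 100 GB working set from CPU memory to the GPU.
# The ~1.8 TB/s NVLink-C2C figure is from the text; the ~64 GB/s
# PCIe Gen5 x16 per-direction figure is an approximate assumption.
PAYLOAD_GB = 100.0
NVLINK_C2C_GBS = 1800.0
PCIE_GEN5_X16_GBS = 64.0

t_nvlink_ms = PAYLOAD_GB / NVLINK_C2C_GBS * 1000
t_pcie_ms = PAYLOAD_GB / PCIE_GEN5_X16_GBS * 1000
print(f"NVLink-C2C: {t_nvlink_ms:.1f} ms vs PCIe Gen5 x16: {t_pcie_ms:.1f} ms")
```

Under these assumptions the coherent link moves the same payload roughly 28x faster, which is why the text emphasizes keeping GPUs fed rather than waiting on transfers.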
Advanced Capabilities:
- Confidential computing support provides hardware-anchored security features protecting sensitive data and code across CPU and GPU domains.
- Full support for AI-relevant precisions (including FP8) and optimizations tailored to modern AI workflows.
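As a sketch of what FP8 support means numerically, the following approximates rounding to the OCP FP8 E4M3 format (normal values only; subnormals and NaN encodings are ignored, so this is an approximation rather than a bit-exact implementation):

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value in the OCP FP8 E4M3 format

def round_to_e4m3(x: float) -> float:
    """Round x to a nearby FP8 E4M3 value (normals only; a sketch)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    m = abs(x)
    e = math.floor(math.log2(m))
    step = 2.0 ** (e - 3)          # 3 mantissa bits -> 8 steps per binade
    q = round(m / step) * step
    return sign * min(q, FP8_E4M3_MAX)

print(round_to_e4m3(3.3))     # 3.25 -- steps are 0.25 wide near 3.3
print(round_to_e4m3(1000.0))  # 448.0 -- clamped to the format's maximum
```

The coarse steps and narrow range are the trade-off: FP8 values take a quarter of the memory and bandwidth of FP32, which is why low-precision formats matter so much for the AI pipelines this CPU targets.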
Role in AI and Datacenter Workloads
AI Factories:
In large AI deployments, the Vera CPU is designed to orchestrate GPU workloads by managing data movement, instruction dispatch, cache and KV-store operations, and memory hierarchy, ensuring that GPUs spend more time doing useful work rather than waiting on the CPU.
Standalone CPU Tasks:
Even independent of GPUs, Vera’s architecture targets demanding HPC, analytics, and cloud workloads where predictable throughput, large memory capacity, and high bandwidth are essential.
Comparison to Prior NVIDIA CPUs:
Compared to earlier-generation NVIDIA CPUs (such as Grace), Vera delivers significantly higher memory bandwidth and capacity, improved coherency fabric performance, and tighter GPU coupling, all tailored for next-gen AI and data-heavy workloads.
Industry Context
NVIDIA unveiled the Vera CPU as part of the Vera Rubin next-generation AI computing platform at CES 2026. The complete platform targets commercial deployment in the second half of 2026, claiming significant performance and cost-efficiency gains over the prior Blackwell- and Grace-based systems.