NVIDIA Unveils Vera Rubin Superchip: A Leap Forward in AI and HPC Performance
At its recent GTC conference in Washington, D.C., NVIDIA introduced the Vera Rubin Superchip, its next-generation platform for high-performance computing and artificial intelligence infrastructure. The platform combines two "Rubin" GPUs with one "Vera" CPU on a single board. The Vera CPU features 88 custom NVIDIA cores and supports 176 threads, or two threads per core.
Unprecedented Compute Power and Memory Bandwidth
NVIDIA has set ambitious performance targets for the Vera Rubin Superchip. Each Rubin GPU is designed to deliver approximately 50 PetaFLOPS of FP4 compute, resulting in a combined 100 PetaFLOPS FP4 for the dual-GPU configuration. Engineering samples are already in testing, with mass production scheduled for 2026 and broader deployments expected in 2027.
Each Rubin GPU integrates two large compute chiplets, each estimated at around 830 mm², paired with eight HBM4 memory stacks. This configuration provides about 288 GB of HBM4 per GPU, or roughly 576 GB of high-bandwidth memory across the Superchip. For system memory, NVIDIA has equipped the board with SOCAMM2 LPDDR5X modules; earlier briefings put the capacity at up to 1.5 TB of LPDDR5X per Vera CPU. The HBM4 supplies high bandwidth next to the GPUs while the LPDDR5X adds capacity, a combination suited to demanding AI and HPC workloads.
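The per-Superchip totals above follow from simple arithmetic on the quoted per-GPU figures. The short Python sketch below reproduces them; the function and constant names are illustrative, the inputs are the approximate numbers from this article, and the decimal conversion (1 TB = 1000 GB) is an assumption.

```python
# Approximate figures quoted in the article.
FP4_PFLOPS_PER_GPU = 50      # FP4 compute per Rubin GPU
HBM4_GB_PER_GPU = 288        # HBM4 capacity per Rubin GPU
HBM4_STACKS_PER_GPU = 8      # HBM4 stacks per Rubin GPU
GPUS_PER_SUPERCHIP = 2       # two Rubin GPUs per Superchip
LPDDR5X_TB_PER_CPU = 1.5     # upper bound quoted per Vera CPU


def superchip_totals():
    """Derive per-Superchip totals from the per-GPU figures."""
    hbm4_gb = HBM4_GB_PER_GPU * GPUS_PER_SUPERCHIP
    return {
        "fp4_pflops": FP4_PFLOPS_PER_GPU * GPUS_PER_SUPERCHIP,
        "hbm4_gb": hbm4_gb,
        # Implied capacity of each HBM4 stack.
        "hbm4_gb_per_stack": HBM4_GB_PER_GPU / HBM4_STACKS_PER_GPU,
        # Assumes decimal units: 1 TB = 1000 GB.
        "total_memory_tb": hbm4_gb / 1000 + LPDDR5X_TB_PER_CPU,
    }


if __name__ == "__main__":
    for name, value in superchip_totals().items():
        print(f"{name}: {value}")
```

Under these inputs, each HBM4 stack works out to 36 GB, and a fully populated Superchip carries a little over 2 TB of combined HBM4 and LPDDR5X.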
Custom CPU Design and Advanced Interconnects
The Vera CPU marks a departure from NVIDIA’s previous reliance on Arm’s Neoverse designs, as the company has developed its own 88-core, 176-thread Arm-based architecture. The CPU appears to utilize a multi-chiplet layout, including a dedicated I/O chiplet to optimize data flow. NVLink bandwidth has also been significantly increased, reaching approximately 1.8 TB/s to support intensive CPU-to-GPU communication, which is essential for large-scale AI inference and training tasks.
Scalability for Exascale Systems
NVIDIA is positioning the Vera Rubin Superchip as the foundational building block for its next-generation NVL-class systems, designed to scale up to exascale performance levels. The NVL144 configuration, for example, is projected to deliver around 3.6 ExaFLOPS of FP4 inference and 1.2 ExaFLOPS of FP8 training. This setup offers approximately 13 TB/s of aggregate HBM4 bandwidth and about 75 TB of fast system memory per rack, making it ideal for large-scale AI deployments and scientific research.
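The rack-level figures can be cross-checked against the per-GPU numbers quoted earlier. The Python sketch below is a back-of-the-envelope consistency check, not an official breakdown: the names are illustrative, it assumes the rack's FP4 figure is simply the per-GPU figure multiplied by the GPU count, that "fast system memory" means HBM4 plus LPDDR5X, and that 1 TB = 1000 GB.

```python
# Approximate figures quoted in the article.
RACK_FP4_EXAFLOPS = 3.6      # NVL144 FP4 inference
FP4_PFLOPS_PER_GPU = 50      # per Rubin GPU
CHIPLETS_PER_GPU = 2         # compute chiplets per Rubin GPU
GPUS_PER_SUPERCHIP = 2       # Rubin GPUs per Vera CPU
HBM4_GB_PER_GPU = 288
LPDDR5X_TB_PER_CPU = 1.5     # upper bound quoted per Vera CPU


def nvl144_cross_check():
    """Infer rack composition and memory from the quoted figures."""
    # Assumption: rack FP4 = per-GPU FP4 x number of GPUs.
    gpus = round(RACK_FP4_EXAFLOPS * 1000 / FP4_PFLOPS_PER_GPU)
    superchips = gpus // GPUS_PER_SUPERCHIP
    # Assumption: decimal units, 1 TB = 1000 GB.
    hbm4_tb = gpus * HBM4_GB_PER_GPU / 1000
    lpddr5x_tb = superchips * LPDDR5X_TB_PER_CPU
    return {
        "gpus": gpus,
        "compute_chiplets": gpus * CHIPLETS_PER_GPU,
        "superchips": superchips,
        "hbm4_tb": hbm4_tb,
        "lpddr5x_tb": lpddr5x_tb,
        "fast_memory_tb": hbm4_tb + lpddr5x_tb,
    }


if __name__ == "__main__":
    for name, value in nvl144_cross_check().items():
        print(f"{name}: {value}")
```

Under these assumptions the numbers line up: 3.6 ExaFLOPS implies 72 Rubin GPUs on 36 Superchips, which is 144 compute chiplets, and the combined HBM4 and LPDDR5X capacity lands just under the quoted 75 TB.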
For even greater performance, NVIDIA previewed the Rubin Ultra NVL576 family, which scales up the GPU count to approach 15 ExaFLOPS of FP4 compute and expands fast memory pools into the hundreds of terabytes. These configurations are aimed at hyperscale data centers and research institutions with the most demanding computational needs.
In addition to the Vera Rubin Superchip, NVIDIA showcased other compute tray variants, including the CPX platform, which are optimized for workloads requiring larger model context windows and increased memory capacity.
With the introduction of the Vera Rubin Superchip, NVIDIA continues to push the boundaries of AI and high-performance computing, setting new standards for compute density, memory bandwidth, and scalability in next-generation data center architectures.