
NVIDIA GTC 2026: What Actually Matters for AI Teams Building on AWS

Paulo Frugis, CTO at Elevata · April 8, 2026 · 6 min read

Every year, NVIDIA's GTC conference sets the trajectory for AI infrastructure. GTC 2026, held in late March, was one of the denser editions in recent memory. Not because of volume, but because several announcements will have direct, near-term implications for teams running AI workloads in production. This is a recap of what caught our attention and what we think warrants a closer look.

The shift from training to inference, and why it is more important than it sounds

Jensen Huang opened GTC with a projection that should recalibrate how you think about AI infrastructure spend: a $1 trillion market for AI infrastructure by 2027. The headline is dramatic, but the underlying logic is what matters.

The training era, the period from roughly 2023 to 2025 when the industry's primary effort was building and scaling large models, is largely behind us. The next phase is inference: running those models at scale, cost-effectively, in production. NVIDIA built GTC 2026 around this transition, and the architectural choices reflect it.

Vera Rubin: the architecture built for inference

The centrepiece announcement was the Vera Rubin architecture, set for availability in the second half of 2026 on major cloud providers including AWS. The headline figure is 10x more inference efficiency per watt versus the current Blackwell generation.

To put that in context: Blackwell is already the performance benchmark for GPU-based inference today, so if you're running workloads on P-class instances, you're in good shape. Vera Rubin builds on that baseline: 10x better energy efficiency, 4x greater compute density, and optimisation aimed specifically at Mixture-of-Experts models and long-context inference.

The practical implication is that as Vera Rubin instances become available on AWS, teams will be able to run more inference per dollar than they can today, or run the same workloads at meaningfully lower cost. Exactly how this translates into instance pricing will take time to settle, but the directional impact is clear.
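To make the direction concrete, here is a back-of-envelope sketch of how an efficiency-per-watt gain shifts the energy component of per-token cost. Only the 10x factor comes from the announcement; every other number below (tokens per joule, electricity price) is an illustrative placeholder, not real AWS or NVIDIA data.

```python
# Back-of-envelope sketch: how a 10x inference-efficiency-per-watt gain
# shifts the energy component of per-token cost. All inputs except the
# 10x factor are illustrative placeholders, not real AWS or NVIDIA numbers.

def energy_cost_per_million_tokens(tokens_per_joule: float,
                                   price_per_kwh: float) -> float:
    """Energy cost (USD) to generate one million tokens."""
    joules = 1_000_000 / tokens_per_joule   # total energy in joules
    kwh = joules / 3_600_000                # 1 kWh = 3.6e6 J
    return kwh * price_per_kwh

# Hypothetical baseline: a Blackwell-class GPU at 50 tokens per joule.
blackwell = energy_cost_per_million_tokens(tokens_per_joule=50, price_per_kwh=0.10)

# Vera Rubin claims 10x inference efficiency per watt, i.e. 10x tokens/joule.
vera_rubin = energy_cost_per_million_tokens(tokens_per_joule=500, price_per_kwh=0.10)

print(f"baseline energy cost / 1M tokens:   ${blackwell:.5f}")
print(f"Vera Rubin energy cost / 1M tokens: ${vera_rubin:.5f}")
print(f"ratio: {blackwell / vera_rubin:.0f}x")
```

Energy is only one line item in instance pricing, which is why the realised savings will depend on how AWS prices the new instances rather than on the raw efficiency figure alone.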

A few other details worth noting on the hardware side:

NVLink 5 and the Vera Rubin NVL72 rack. The reference configuration is a full rack of 72 interconnected GPUs with fifth-generation NVLink and liquid cooling. This is the substrate for the largest-scale inference deployments.

Groq 3 LPU integration. After NVIDIA's acquisition of Groq in 2025, the Groq 3 LPU chip is being integrated into the Vera Rubin ecosystem. Groq built its reputation on ultra-low latency inference. It was used in Formula 1 telemetry for real-time decision-making. That capability now comes into the NVIDIA stack, with direct implications for latency-sensitive applications in finance, healthcare, and any context where inference needs to be measured in milliseconds.

Feynman on the horizon. NVIDIA also previewed the successor architecture, Feynman, targeting 2028. It introduces 3D die stacking and a 1.6nm process node. Details are limited (they are still launching Vera Rubin), but the roadmap signal is clear.

Agentic AI gets a software stack

The hardware story gets most of the attention, but the software announcements at GTC 2026 may be equally consequential for teams building AI systems today.

NemoClaw is NVIDIA's enterprise platform for deploying autonomous agents. Think of it as an orchestration layer, built on NVIDIA NIM microservices, that handles multi-step reasoning and self-critique, validating each subtask before the agent proceeds. The practical effect is more reliable agentic behaviour without having to build that validation logic yourself.
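The validate-before-proceed pattern is worth pausing on, since it is what you would otherwise build yourself. The sketch below illustrates the idea generically; none of it is NemoClaw's actual API (which is not public), and every name in it is hypothetical.

```python
# Generic sketch of the validate-before-proceed pattern described above.
# This is NOT NemoClaw's API; all names and structure here are hypothetical,
# purely to illustrate per-subtask self-critique in an agent loop.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Subtask:
    name: str
    run: Callable[[], str]            # produces a result
    validate: Callable[[str], bool]   # self-critique: is the result acceptable?

def run_plan(subtasks: list[Subtask], max_retries: int = 2) -> dict[str, str]:
    """Execute subtasks in order, re-running any whose output fails validation."""
    results: dict[str, str] = {}
    for task in subtasks:
        for _attempt in range(max_retries + 1):
            output = task.run()
            if task.validate(output):   # only proceed once the critic approves
                results[task.name] = output
                break
        else:
            raise RuntimeError(f"subtask {task.name!r} failed validation")
    return results

# Toy usage: a 'summarise' step whose validator insists on a non-empty result.
plan = [Subtask("summarise", lambda: "three key points...", lambda out: len(out) > 0)]
print(run_plan(plan))
```

The value of getting this from the platform is less the loop itself than the retry budgets, validator prompts, and failure handling around it, which are tedious to harden in-house.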

NemoClaw is also based on OpenClaw, an open-source project for standardising communication between agent systems and external resources. That includes direct integration with AWS Bedrock Agent Core, meaning agents built in this framework can call out to Bedrock services natively.

Nemotron 3 Super is a separate announcement aimed at edge and local deployment. It runs on RTX-class hardware and supports a context window of up to 1 million tokens. For teams dealing with high-volume document processing (legal, financial, or otherwise), that context window changes what is tractable without chunking.
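To see why a 1-million-token window changes the chunking calculus, a rough sizing check is enough. The ~4-characters-per-token heuristic below is a crude English-text approximation (real counts require the model's tokenizer), and the headroom figure is an assumption.

```python
# Rough sketch: deciding whether a document still needs chunking given a
# 1M-token context window. The ~4 chars/token heuristic is a crude
# approximation for English text; real counts need the model's tokenizer.
CONTEXT_WINDOW_TOKENS = 1_000_000
CHARS_PER_TOKEN = 4      # rough heuristic; varies by language and tokenizer
PROMPT_BUDGET = 0.9      # leave headroom for instructions and the response

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // CHARS_PER_TOKEN)

def fits_without_chunking(text: str) -> bool:
    return estimate_tokens(text) <= CONTEXT_WINDOW_TOKENS * PROMPT_BUDGET

# A ~300-page contract (~600k characters, ~150k tokens) fits comfortably;
# against an older 128k-token window the same document would need chunking.
contract = "x" * 600_000
print(fits_without_chunking(contract))
```

Skipping chunking removes a whole class of retrieval and stitching errors, which is why the window size matters more for document-heavy workloads than raw throughput does.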

Physical AI: robots, space, and gaming

GTC 2026 also pushed firmly into physical AI, a category that felt speculative a few years ago and now has concrete industry traction.

Project Groot 2 is NVIDIA's foundation model for humanoid robots, focused on spatial reasoning and motor coordination. The Newton physics simulator, co-developed with Disney and DeepMind and accelerated on GPU, provides the training environment. Several automotive and robotics manufacturers have already adopted the RoboTaxi reference platform built on this stack.

NVIDIA Space-1 was the announcement that generated the most discussion in our session: a data centre module designed to operate in orbit, enabling real-time geospatial data processing without the round-trip to a ground-based facility. The applications for climate monitoring and time-sensitive geospatial intelligence are significant.

On the gaming side, DLSS 5 moves from upscaling to full frame generation using neural networks, and GeForce NOW VR brings 90 FPS VR streaming directly from the cloud, a latency barrier the industry has been trying to break for years.

What this means for AWS users

AWS was named NVIDIA's primary scale partner at GTC 2026, which has a few concrete implications:

AWS will be among the first cloud providers to receive Vera Rubin instances. Teams building LLM-based applications on Bedrock, EKS, or EC2 will have early access to the 10x inference efficiency gains when those instances become available in the second half of 2026.

Project Ceiba, the AWS/NVIDIA supercomputer collaboration, currently runs 414 exaflops across more than 20,000 Blackwell GPUs. It is slated to be upgraded to Vera Rubin, which will push that figure considerably higher. This is the infrastructure that underpins the most demanding AI workloads running in AWS today.

AWS has also committed to deploying more than 1 million NVIDIA GPUs (across both Blackwell and Vera Rubin) by the end of 2027. That is a significant infrastructure investment that signals the depth of the partnership.

The NemoClaw and NIM integrations with Amazon Bedrock are particularly relevant for teams building agentic systems. The ability to deploy autonomous agents inside your own VPC, with Bedrock as the model layer and NemoClaw handling orchestration, means more capable agents with the data sovereignty guarantees AWS customers expect.
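For the model-layer half of that architecture, the request shape is already familiar to Bedrock users. The sketch below builds a payload for the real Amazon Bedrock Converse API; the orchestration side (NemoClaw) is omitted since its API is not public, and the model ID is a placeholder.

```python
# Minimal sketch of the model-layer side: constructing a request for the
# Amazon Bedrock Converse API. The orchestration layer is not shown (its
# API isn't public); only the payload shape below follows the real Converse
# API. The model ID is a placeholder, not a real Bedrock model identifier.
def build_converse_request(model_id: str, user_text: str,
                           max_tokens: int = 512) -> dict:
    return {
        "modelId": model_id,
        "messages": [
            {"role": "user", "content": [{"text": user_text}]},
        ],
        "inferenceConfig": {"maxTokens": max_tokens},
    }

request = build_converse_request("example.placeholder-model-v1",
                                 "Summarise the attached filing.")
# In a real deployment this runs inside your VPC, e.g.:
#   import boto3
#   client = boto3.client("bedrock-runtime")
#   response = client.converse(**request)
print(request["modelId"])
```

Because the call stays inside your VPC (via a VPC endpoint for Bedrock), the data-sovereignty posture is the same as any other Bedrock workload, regardless of which orchestration layer sits on top.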

Finally, NVLink Fusion enables NVIDIA GPUs to work directly alongside AWS's own silicon (Trainium and Graviton) in the same workload. For teams already using Graviton instances (which carry a meaningful cost and performance advantage over comparable x86 instances), this composability is worth tracking.

Takeaway

GTC 2026 confirmed that the infrastructure layer for AI is entering a new phase. Training at scale is largely solved; the competition now is on inference efficiency, agentic capability, and the economics of running AI in production. Vera Rubin is purpose-built for that moment, and the AWS partnership means those gains will be accessible to cloud-based teams relatively quickly.

The specific numbers (10x efficiency, 4x density) will translate differently into actual workload economics depending on how AWS prices the new instances and how the integration matures. But the direction is clear, and the teams best positioned to benefit are those who have already moved AI workloads into production rather than leaving them at the proof-of-concept stage.

If you want to dig into any of these announcements in the context of your own AI workloads, whether you are evaluating inference infrastructure, building agentic systems on AWS, or looking to optimise what you are already running, we are happy to talk. Reach out at sales@elevata.io or via elevata.io.
