From GPU Gold Rush To Geopolitical Risk: Why You Must Secure AI Infrastructure Now


Yuriy Bulygin is CEO and co-founder of Eclypsium.

In today’s race for AI leadership, the battleground isn’t just about hiring the best talent or building the most powerful models. It’s just as much about securing the infrastructure that undergirds the AI economy. Whether companies build their own AI compute environments or outsource to providers, their competitive advantage will depend on how well that infrastructure is protected. The enterprises that demand and verify strong security across every layer of their AI stack will be best positioned to lead their markets.

To support this new phase of growth, a massive infusion of capital is flooding into AI infrastructure. From the $500 billion Stargate Project and CoreWeave’s IPO to the HUMAIN initiative in the Middle East with multibillion-dollar commitments from AMD, Nvidia, AWS and others, the scale of investment is unprecedented.

However, while headlines focus on performance benchmarks and GPU availability, a less visible threat looms—security vulnerabilities in the hardware, firmware and complex supply chains that underpin all of this AI infrastructure.

AI data centers are used for model training and inference tasks, and the infrastructure primarily consists of custom bare metal servers and networking equipment. If this infrastructure is breached, AI data centers don’t just risk operational downtime; they become conduits for stolen IP, compromised models and long-term reputational damage.

AI Infrastructure: The Next Critical Target

The security assumptions that held true for traditional cloud environments are breaking down in the face of AI’s Byzantine complexity. Shared GPUs, highly sensitive training data and globally distributed supply chains introduce a new breed of risks—one that even government agencies are now sounding the alarm on.

In May 2025, the U.S. Cybersecurity and Infrastructure Security Agency (CISA) joined international partners in issuing new guidance urging secure infrastructure and trusted computing environments for AI workloads. Nvidia’s Jensen Huang put it more bluntly, saying that his company was “not a technology company only anymore” but “an essential infrastructure company.”

It’s a sentiment that’s increasingly shaping how policymakers define critical infrastructure, with momentum building on Capitol Hill to formally designate AI infrastructure as part of the nation’s critical infrastructure, recognizing its growing role in economic stability, national security and public trust.

AI’s growing complexity creates a perfect storm of operational risk. A single rack in a hyperscale AI deployment can contain hundreds of thousands of components, often sourced from dozens of vendors across multiple countries. Ensuring the integrity of that hardware across delivery, deployment and operational use is a monumental challenge.

Recent events have underscored this fragility. Actively exploited vulnerabilities in AMI’s MegaRAC firmware, the discovery of critical flaws in Nvidia GPUs and Nvidia’s own research into BMC vulnerabilities have shown how even widely deployed AI infrastructure can become a stealthy attack vector. Once embedded, such threats are difficult to detect and nearly impossible to remediate at scale.

Meanwhile, industries racing to deploy GenAI at speed risk falling behind in cybersecurity. This trade-off is unsustainable for AI infrastructure. Without secure compute, network and storage hardware infrastructure, model parameters, inference data and intellectual property are at risk of being exposed or poisoned.

For business and security executives, the question is no longer whether AI infrastructure needs to be secured but, rather, how much exposure exists today and what role leadership must play in addressing it.

Vetting AI Partners: Three AI Security Questions That Demand Answers

The AI data centers of tomorrow must offer not just speed and scale but also provable guarantees of confidentiality, integrity and supply chain trust. That means requiring cryptographic verification and attestation of firmware and hardware assets at every stage to detect tampering or counterfeit components before they compromise critical workloads.

CISOs and CIOs must transform these principles into procurement criteria by asking:

1. How do you ensure your GPUs and GPU servers don’t contain critical vulnerabilities before deployment and during operation?

As we have seen with Nvidia DGX vulnerabilities and BMC firmware attacks, hardware components in AI infrastructure can introduce critical vulnerabilities that put entire data centers at risk. Scanning critical hardware for vulnerabilities both before deployment and continuously while in production is a must.

2. How do you verify trust across a fragmented supply chain?

With components sourced from dozens of countries, AI hardware and firmware are prime targets for tampering. For example, the actively exploited AMI MegaRAC firmware vulnerability and the BlackLotus UEFI bootkit showed how even widely deployed firmware can become an entry point for attackers.

Leading providers must implement cryptographic verification (e.g., secure boot, TPMs, DICE) and continuous firmware monitoring. Trust cannot be assumed. Rather, demand transparency from vendors and partners about how they verify hardware and firmware security across global supply chains.
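As a rough illustration of the simplest form of firmware verification, the sketch below compares the SHA-256 digest of each firmware image against a known-good manifest. It is a minimal sketch under stated assumptions, not a description of any vendor’s process: the function names, component names and manifest format are hypothetical, and a production system would also authenticate the manifest itself and rely on vendor signatures and hardware-rooted attestation (TPM, DICE) rather than hash comparison alone.

```python
import hashlib
import hmac
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a firmware image as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def verify_firmware(image: bytes, expected_digest: str) -> bool:
    """Check an image against its expected digest using a constant-time compare."""
    return hmac.compare_digest(sha256_hex(image), expected_digest.lower())


def audit_fleet(images: Dict[str, bytes], manifest: Dict[str, str]) -> List[str]:
    """Return component names that are unknown to the manifest or do not match it.

    `images` maps a hypothetical component name (e.g. "bmc") to its firmware
    bytes; `manifest` maps the same names to known-good SHA-256 hex digests.
    """
    failures = []
    for name, image in sorted(images.items()):
        expected = manifest.get(name)
        if expected is None or not verify_firmware(image, expected):
            failures.append(name)
    return failures
```

Run on a schedule rather than once at delivery, a check of this kind is one way to approximate continuous firmware monitoring: any component reported by `audit_fleet` is either unrecognized or has changed since the manifest was recorded.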

3. Can you verify that your hardware vendors offer and enable built-in hardware security capabilities?

Hardware is the foundation of AI infrastructure, so securing it should be a top priority. Anyone building out AI data centers must be able to validate that their hardware vendors offer and enable built-in security capabilities such as secure boot, a hardware root of trust, TPM attestation, runtime memory encryption, DMA protection, runtime memory exploit prevention and confidential computing.

Compromised hardware undermines every security control. Your defenses must reflect that reality, whether you’re a hyperscaler or an enterprise building on top of one.
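One way to turn the third question into a repeatable procurement check is to record, per server, which built-in hardware security capabilities are actually enabled and to flag the gaps. The sketch below assumes a hypothetical `HardwareSecurityReport` record filled in from vendor documentation or platform tooling; the field names are illustrative and do not correspond to any real API.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class HardwareSecurityReport:
    """Hypothetical per-server record of enabled hardware security features."""
    secure_boot: bool
    hardware_root_of_trust: bool
    tpm_attestation: bool
    memory_encryption: bool
    dma_protection: bool
    confidential_computing: bool


def missing_capabilities(report: HardwareSecurityReport) -> List[str]:
    """Return the names of capabilities that are not enabled, in field order."""
    return [f.name for f in fields(report) if not getattr(report, f.name)]


# Example: a server with DMA protection and confidential computing disabled.
example = HardwareSecurityReport(
    secure_boot=True,
    hardware_root_of_trust=True,
    tpm_attestation=True,
    memory_encryption=True,
    dma_protection=False,
    confidential_computing=False,
)
gaps = missing_capabilities(example)
```

An empty list means every capability in the record is enabled. Any names in the list are items to raise with the vendor or provider before workloads are placed on that hardware.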

The global race for AI dominance shows little sign of slowing down. In that race, however, resilience will matter as much as speed. Protecting this critical infrastructure is a prerequisite for national AI dominance. For those leading the charge, the ability to secure the silicon, the code and the supply chains behind it will determine who builds the future, not just who fuels it. The organizations that get this right will not only avoid costly breaches but also gain a strategic edge in the increasingly AI-driven global economy.

Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.