Infrastructure Explained: Compute Power

Imagine you’re visiting a new city and want to find a restaurant that impresses your vegan in-laws. Using voice conversations on the Meta AI app, you ask, “Hey Meta, what are the best vegan options around?” 

Within seconds, Meta AI — powered by Muse Spark — responds with a list of local vegan restaurants, a short description of each restaurant’s vibe, and a map showing you exactly where the restaurants are. It’s a quick and seamless interaction that feels effortless, but behind that brief exchange are layers of calculations enabled by compute power.

What Is Compute Power?

Simply put, compute power is the measure of how much work a computer chip can do and how fast it can do it, like horsepower in a car engine. Compute power is measured in FLOPS: floating-point operations per second, or the number of calculations that a chip can perform in one second. FLOPS measure the speed of compute, while gigawatts, a unit of electrical power, measure its scale: how many chips you can keep running at once.
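As a rough illustration of how FLOPS translate into time, the sketch below divides a workload's operation count by the combined throughput of the chips running it. Everything in it is hypothetical and chosen only to keep the arithmetic easy to follow: the function name `seconds_to_run`, the 5 billion operations, the 1 teraFLOPS chip, and the utilization factor are assumptions, not measurements of any real hardware.

```python
def seconds_to_run(total_operations: float,
                   flops_per_chip: float,
                   num_chips: int = 1,
                   utilization: float = 1.0) -> float:
    """Estimate how many seconds a workload takes.

    Simplifying assumption: the work splits evenly across chips,
    which real workloads only approximate.
    """
    if flops_per_chip <= 0 or num_chips <= 0:
        raise ValueError("flops_per_chip and num_chips must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    # Effective throughput: operations per second actually achieved.
    effective_flops = flops_per_chip * num_chips * utilization
    return total_operations / effective_flops


# Hypothetical workload: 5 billion operations on one chip
# that sustains 1 teraFLOPS (1e12 operations per second).
one_chip = seconds_to_run(5e9, 1e12)

# The same workload on 10 such chips, each kept busy half the time.
ten_chips = seconds_to_run(5e9, 1e12, num_chips=10, utilization=0.5)

print(f"one chip:  {one_chip:.3f} s")   # 0.005 s
print(f"ten chips: {ten_chips:.3f} s")  # 0.001 s
```

The utilization factor is there because chips rarely reach their peak rating in practice; time is also spent moving data, which this sketch ignores.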

When you ask Meta AI to find a vegan restaurant, it runs billions of calculations in just a few seconds. Your voice is captured, converted from sound waves into text, and routed to computers, or servers, inside a data center. From there, a large language model (LLM) processes your request and generates a response, and the result is delivered right to your ear.

Even simple actions, like searching for a local barbershop on Instagram, require layers of computation: understanding language, processing your query, scanning an index, generating results, and delivering them back to you, all before your thumb leaves the screen. All of this processing is made possible by the chips in the servers inside our data centers.

So why does the future of AI depend on compute? Here’s a closer look.

The Building Blocks of Compute

Compute is an abstract concept, but it’s delivered by physical chips. Different chips are designed to handle different types of calculations and workloads. 

  • Central Processing Units (CPUs) are the processors in computers that make AI training and inference possible. Traditional CPUs were designed to handle tasks one at a time, and are great at managing network traffic, running application logic, and coordinating workflows across systems.

 

  • Graphics Processing Units (GPUs) are processors that were initially designed for rendering graphics, but are also great at doing thousands of calculations simultaneously, which is exactly the kind of processing we need to power AI. Training a model to understand languages, recognize images, or even engage in conversation requires large-scale calculations running simultaneously, repeatedly, and for weeks or even months on end. Both CPUs and GPUs exist in consumer products like laptops and smartphones, but the ones in data centers are built to be much more powerful.

Helios AI rack for GPUs designed to run AI workloads

  • Custom chips are processors built for specific workloads, designed to maximize efficiency for tasks like ranking, recommendations, and generative AI. Meta has developed the Meta Training and Inference Accelerator (MTIA), a family of custom silicon chips designed specifically for our AI workloads. Mainstream GPUs are typically built for large-scale AI training and then applied, less cost-effectively, to other AI workloads like inference. MTIA takes a different approach: to prepare for the growth in AI inference demand, we build chips that are optimized for our inference workloads but are also able to support all workloads, including training. This offers flexibility and efficiency that’s unmatched by any combination of general-purpose chips and enables us to innovate for the future of AI.
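To see why doing many calculations at once suits AI, consider a matrix-vector product, a core operation in neural networks. Each output value depends on only one row of the matrix, so the rows can be computed in any order, or all at the same time. The Python sketch below is illustrative only: the function names are hypothetical, and Python threads merely stand in for the parallel lanes of a GPU without reproducing its speed.

```python
from concurrent.futures import ThreadPoolExecutor


def dot(row, vector):
    """Multiply matching entries and add them up (one output value)."""
    return sum(a * b for a, b in zip(row, vector))


def matvec_sequential(matrix, vector):
    """One row after another, the way a single CPU core would work."""
    return [dot(row, vector) for row in matrix]


def matvec_parallel(matrix, vector, workers=4):
    """Hand each row to a worker; no row needs another row's result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input rows.
        return list(pool.map(lambda row: dot(row, vector), matrix))


matrix = [[1, 2],
          [3, 4],
          [5, 6]]
vector = [10, 1]

sequential = matvec_sequential(matrix, vector)
parallel = matvec_parallel(matrix, vector)

print(sequential)              # [12, 34, 56]
print(parallel == sequential)  # True
```

Both versions produce the same answer; the difference is that the second one never has to wait for one row to finish before starting the next. A real model multiplies matrices with thousands of rows and columns, which is where hardware built for this kind of independence pays off.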

How Does Compute Power Meta’s AI?

At Meta, we’re building a global network of AI-optimized data centers, each designed with the flexibility to support both our AI workloads and the other workloads that are central to our apps and services. We believe that building at this scale requires a diversified approach to infrastructure. That’s why we’re sourcing silicon from a range of partners to ensure the right chips are matched with the right workload, allowing us to build and deliver new AI experiences at a faster pace.

Our custom MTIA silicon is an essential part of our efforts. We’re developing and deploying four new generations of chips within the next two years to support ranking, recommendations, and generative AI workloads. In April, we announced an expanded partnership with Broadcom to co-develop multiple generations of MTIA chips.

And earlier this year, we announced a partnership with Arm to co-develop the Arm AGI CPU, the first data center processor specifically designed to handle the massive amount of data movement demanded by AI workloads. We’ve also announced partnerships with industry leaders AWS, AMD, and NVIDIA to supply chips for our compute portfolio.

These partnerships will enable us to continue innovating and building AI tools for the future. We recently announced Muse Spark, our most advanced AI model to date and the first LLM built by Meta Superintelligence Labs. Muse Spark is natively multimodal, processing voice, text, and images together. What makes it possible is compute at every level — from training models across thousands of GPUs to supporting billions of inferences each day on custom MTIA chips — and all of it running through efficient networks of servers at data centers around the world.  

The demand for more powerful and efficient compute will only accelerate. As AI continues to become more capable, personal, and integrated into people’s lives, we’ll keep building the infrastructure needed to power it.

