Introducing Our Next Generation Infrastructure for AI

2 years ago

Takeaways

We’re introducing the next generation of Meta’s custom-made chips designed for our AI workloads.
This latest version significantly improves performance compared to the last generation and helps power our ads ranking and recommendation models on Facebook and Instagram.
These chips are part of our growing investment in our AI infrastructure and will enable us to deliver new and better experiences across our apps and technologies.

The next generation of Meta’s large-scale infrastructure is being built with AI in mind, including supporting new generative AI products, recommendation systems and advanced AI research. It’s an investment we expect will grow in the years ahead, as the compute requirements to support AI models increase alongside the models’ sophistication.

Last year, we unveiled our Meta Training and Inference Accelerator (MTIA) v1, our first-generation AI inference accelerator that we designed in-house with Meta’s AI workloads in mind. It was designed specifically for our deep learning recommendation models that are improving a variety of experiences across our apps and technologies.

MTIA is a long-term bet to provide the most efficient architecture for Meta’s unique workloads. As AI workloads become increasingly important to our products and services, this efficiency will be central to our ability to provide the best experiences for our users around the world. MTIA v1 was an important step in improving the compute efficiency of our infrastructure and better supporting our software developers as they build AI models that will facilitate new and better user experiences.

The next generation of MTIA is part of our broader full-stack development program for custom, domain-specific silicon that addresses our unique workloads and systems. This new version of MTIA more than doubles the compute and memory bandwidth of our previous solution while remaining closely tailored to our workloads. It is designed to efficiently serve the ranking and recommendation models that provide high-quality recommendations to users.

This chip’s architecture is fundamentally focused on providing the right balance of compute, memory bandwidth and memory capacity for serving ranking and recommendation models.
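
To make the idea of balance concrete, here is a minimal sketch using the standard roofline model, in which achievable throughput is capped either by peak compute or by memory bandwidth times a model's arithmetic intensity. Every number below is a hypothetical placeholder chosen for illustration, not an MTIA specification, and the function and variable names are our own assumptions.

```python
def attainable_tflops(peak_tflops, mem_bw_tb_per_s, flops_per_byte):
    """Roofline bound: the lower of the compute roof and the memory roof."""
    # TB/s * FLOP/byte = TFLOP/s, so both terms share the same unit.
    return min(peak_tflops, mem_bw_tb_per_s * flops_per_byte)


# Hypothetical chips (illustrative numbers only, not real specifications).
previous = {"peak_tflops": 100.0, "mem_bw_tb_per_s": 1.0}
doubled = {"peak_tflops": 200.0, "mem_bw_tb_per_s": 2.0}       # both doubled
compute_only = {"peak_tflops": 200.0, "mem_bw_tb_per_s": 1.0}  # compute doubled only

# Hypothetical workloads: a memory-bound model (few FLOPs per byte moved)
# and a compute-bound model (many FLOPs per byte moved).
memory_bound = 20.0
compute_bound = 500.0

for name, chip in [("previous", previous),
                   ("doubled", doubled),
                   ("compute_only", compute_only)]:
    low = attainable_tflops(flops_per_byte=memory_bound, **chip)
    high = attainable_tflops(flops_per_byte=compute_bound, **chip)
    print(f"{name}: memory-bound {low} TFLOP/s, compute-bound {high} TFLOP/s")
```

Under these assumed numbers, doubling compute and memory bandwidth together doubles the bound for both kinds of model, while doubling compute alone leaves the memory-bound model unchanged. That is the sense in which a serving chip needs its resources scaled in proportion rather than one at a time.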

MTIA has been deployed in our data centers and is now serving models in production. We are already seeing positive results from this program, as it allows us to invest in and dedicate more compute power to our more intensive AI workloads.

The results so far show that this MTIA chip can handle both low-complexity and high-complexity ranking and recommendation models, which are key components of Meta’s products. Because we control the whole stack, we can achieve greater efficiency than with commercially available GPUs (graphics processing units).

Meta’s Ongoing Investment in Custom Silicon

MTIA will be an important piece of our long-term roadmap to build and scale the most powerful and efficient infrastructure possible for Meta’s unique AI workloads.

We’re designing our custom silicon to work in cooperation with our existing infrastructure as well as with new, more advanced hardware (including next-generation GPUs) that we may leverage in the future. Meeting our ambitions for our custom silicon means investing not only in compute silicon but also in memory bandwidth, networking and capacity, as well as other next-generation hardware systems.

We currently have several programs underway aimed at expanding the scope of MTIA, including support for GenAI workloads. And we’re only at the beginning of this journey.