A Decade of Advancing the State-of-the-Art in AI Through Open Research

5 months ago

Takeaways

We’re celebrating 10 years of Meta’s Fundamental AI Research (FAIR) team.
We’re also introducing new AI models and datasets, including Ego-Exo4D, Audiobox and Seamless Communication, which bring advances in combining first-person and external views, audio generation and language translation.

Today we’re celebrating the 10-year anniversary of Meta’s Fundamental AI Research (FAIR) team. For the last decade, FAIR has been the source of many AI breakthroughs and a beacon for doing research in an open and responsible way. We are committed to open science and sharing our work, whether it be papers, code, models, demos or responsible use guides.

We’ve made impressive strides in the past 10 years in object detection with Segment Anything, which recognizes objects in images. Additionally, we were among the first to pioneer techniques for unsupervised machine translation, allowing us to build a model that can translate across 100 languages without relying on English. This led to our No Language Left Behind breakthrough, which most recently expanded text-to-speech and speech-to-text technology to more than 1,000 languages.

Earlier this year we released Llama, an open, pre-trained large language model, followed by Llama 2, which is free for research and commercial use. And at Connect, we unveiled new AI products and experiences that are now in the hands of millions of people — the culmination of early research work that Meta’s Generative AI and product teams built upon.

Today, we’re sharing our latest advancements in Ego-Exo4D, Audiobox and Seamless Communication.

Giving AI Models Both Egocentric and Exocentric Views

In our efforts to teach AI to perceive the world through our eyes, we’ve made updates to Ego-Exo. The latest Ego-Exo4D simultaneously captures first-person (egocentric) views from a wearable camera, as well as external (exocentric) views from cameras surrounding the person. Together, these perspectives give AI models a window into what people see and hear combined with more context about the environment.

In the future, these advances in AI will allow a person wearing smart glasses to quickly pick up new skills with a virtual AI coach guiding them through a how-to video. For example, imagine watching an expert repair a bike tire, juggle a soccer ball or fold an origami swan, and then being able to map their steps to your own actions.

Generating Voices and Sound Effects With Audiobox

Earlier this year, we introduced Voicebox, a generative AI model that can help with audio editing, sampling and styling. Now Audiobox, its successor, advances generative AI for audio even further. With Audiobox, you can use voice prompts or text descriptions to describe sounds or types of speech you’d like to generate. For example, you could create a soundtrack with a prompt like, “a running river and birds chirping.” You can even generate a voice by saying, “a young woman speaks with a high pitch and fast pace.” Audiobox makes it easy to create custom audio for all of your projects.

https://about.fb.com/wp-content/uploads/2023/11/03_Audiobox_Text-to-Audio.mp4?_=1

Unlocking Seamless Language Translation

Building on our work with SeamlessM4T, we’re now introducing Seamless Communication: a suite of AI translation models that better preserve expression across languages and translate while the speaker is still talking to improve speed.

Earlier versions of language translation services often struggle to capture tone of voice, pauses and emphasis, missing important signals that help us share emotions and intent. SeamlessExpressive is the first publicly available system that unlocks expressive cross-lingual communication. It uses a model that preserves the speaker’s emotion and style, and addresses the rate and rhythm of speech. The model currently works for English, Spanish, German, French, Italian and Chinese.

SeamlessStreaming unlocks real-time conversations with someone who speaks a different language. Conventional systems wait until the speaker has finished a sentence before translating. SeamlessStreaming translates while the speaker is still talking, so the listener hears the translation sooner.
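
To make the contrast concrete, here is a minimal Python sketch of one well-known approach to simultaneous translation, the wait-k policy: the system waits until it has heard k source words, then produces one target word for each further word it hears. This is an illustration only and is not a description of how SeamlessStreaming works. The names `wait_k_stream` and `toy_translate_prefix` are hypothetical, and the toy "model" simply uppercases words and assumes one target word per source word.

```python
from typing import Callable, Iterable, Iterator, List


def toy_translate_prefix(source_prefix: List[str], n_emitted: int) -> str:
    """Hypothetical stand-in for a translation model.

    It "translates" by uppercasing the next untranslated source word,
    which assumes a one-to-one word alignment (a toy simplification).
    """
    return source_prefix[n_emitted].upper()


def wait_k_stream(
    source_words: Iterable[str],
    k: int,
    translate_prefix: Callable[[List[str], int], str] = toy_translate_prefix,
) -> Iterator[str]:
    """Yield target words while source words are still arriving (wait-k)."""
    heard: List[str] = []
    emitted = 0
    for word in source_words:
        heard.append(word)
        # Once k words have been heard, emit one target word per new source word.
        if len(heard) >= k:
            yield translate_prefix(heard, emitted)
            emitted += 1
    # The speaker has finished: flush the target words that are still pending.
    while emitted < len(heard):
        yield translate_prefix(heard, emitted)
        emitted += 1


if __name__ == "__main__":
    for target_word in wait_k_stream("where is the station".split(), k=2):
        print(target_word)
```

With k=2, the first target word is produced as soon as the second source word arrives, not after the whole sentence. A larger k gives the model more context before it commits to each word, at the cost of a longer delay.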

https://about.fb.com/wp-content/uploads/2023/11/04_Seamless-Overview-1.mp4?_=2

Meta is uniquely poised to solve AI’s biggest challenges. Our investments in software, hardware and infrastructure allow us to weave learnings from our research into products that can benefit billions of people.

FAIR is critical to Meta’s success, and one of the few groups in the world with everything required to deliver true breakthroughs: some of the brightest minds in the industry, a culture of openness and, most importantly, the freedom to conduct exploratory research. This freedom has helped us stay agile and contribute to building the future of social connection.

Responsible AI Research

We value responsible AI research and openness because sharing thoughtful work through the scrutiny of peers pushes us towards excellence and builds trust in our advances. It also allows us to collaborate with the wider community, which brings faster progress and a more diverse set of contributors. Learn more about how we’re conducting AI research responsibly.