Today, we’re excited to announce SAM 3 and SAM 3D, the newest additions to our Segment Anything Collection. SAM 3 enables detection and tracking of objects in images and video using text and visual prompts, and SAM 3D enables 3D reconstruction of objects and people based on a single image. You can experiment with both models now on our new platform, Segment Anything Playground.
These advancements will help us build the next generation of creative media tools, simplifying video editing and giving us new ways to interact with and understand the visual world.
Say What You Mean
SAM 3 makes it easy to detect, segment, and track objects in images and video, capabilities that can be used to edit and transform videos and images. SAM 1 and SAM 2 supported segmentation based on visual prompts such as points and boxes; SAM 3 adds support for detailed text prompts that describe the objects you want to segment.
Traditionally, AI models have struggled to link language to specific visual elements in images or videos. Existing models typically have a fixed set of text labels and are able to segment simple concepts like “bus” or “car,” but struggle with more detailed concepts like “yellow school bus.”
SAM 3 overcomes this limitation, accepting a much larger range of text prompts. Type in “red baseball cap” and SAM 3 will segment all matching objects in the image or video. SAM 3 can also be used with multimodal large language models to understand longer, more complex text prompts, like “people sitting down, but not wearing a red baseball cap.”
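For developers, text-prompted segmentation could look something like the minimal sketch below. This is an illustration only, not the released API: the `sam3` package, the `build_sam3` loader, and the `segment` method are hypothetical names, so check the official SAM 3 repository for actual usage.

```python
# Minimal sketch of text-prompted segmentation with SAM 3.
# NOTE: `sam3`, `build_sam3`, and `segment` are hypothetical names
# used for illustration; the released repository defines the real API.
from PIL import Image

import sam3  # hypothetical package

model = sam3.build_sam3(checkpoint="sam3.pt")  # hypothetical loader
image = Image.open("street.jpg")

# A short noun phrase selects every matching instance at once.
results = model.segment(image, text_prompt="red baseball cap")

for obj in results:
    # One binary mask and confidence score per detected cap.
    print(obj.score, obj.mask.shape)
```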
We’re using SAM 3 to build a new generation of creative media tools. In Edits, our video creation app, we will soon introduce effects that creators can apply to specific people or objects in their videos. New SAM 3-enabled creation experiences will also be coming to Vibes on the Meta AI app and meta.ai.
Bring a Picture to Life
SAM 3D consists of two open source models that enable you to reconstruct a 3D object from a single image, setting a new standard for AI-guided 3D reconstruction of the physical world. SAM 3D Objects enables object and scene reconstruction, while SAM 3D Body enables human body and shape estimation. Both models deliver robust, state-of-the-art performance, and SAM 3D Objects significantly outperforms existing methods. We also collaborated with artists to build SAM 3D Artist Objects, a first-of-its-kind evaluation dataset that features diverse images and objects, representing a new, more rigorous way to measure research progress in 3D.
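As a rough sketch of the single-image workflow this enables (the `sam3d` package and every function name below are hypothetical placeholders; the released checkpoints and inference code define the actual interface):

```python
# Hypothetical sketch: reconstructing a 3D asset from one photo with
# SAM 3D Objects. `sam3d`, `load_sam3d_objects`, `reconstruct`, and
# `export` are illustrative names, not the released inference code.
from PIL import Image

import sam3d  # hypothetical package

model = sam3d.load_sam3d_objects(checkpoint="sam3d_objects.pt")
image = Image.open("armchair.jpg")

# One RGB image in, a textured 3D mesh out.
mesh = model.reconstruct(image)
mesh.export("armchair.glb")  # e.g., drop into an AR viewer or game engine
```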
The SAM 3D release marks a significant step in leveraging large-scale data to address the complexity of the physical world. It has the potential to significantly advance critical fields like robotics, science, and sports medicine, and it unlocks a range of creative use cases as well. Whether you’re a researcher exploring new frontiers in AR/VR, a creator looking to generate assets for a game, or simply curious about the possibilities of AI-enabled 3D modeling, SAM 3D opens up new ways to interact with and understand the visual world.
We’re using SAM 3D to enable the new View in Room feature on Facebook Marketplace, helping people visualize the style and fit of home decor items, like a lamp or a table, in their spaces before purchasing.
Explore Our Cutting-Edge Models
You can try SAM 3 and SAM 3D on the Segment Anything Playground, our new platform that gives everyone access to our cutting-edge models, no technical expertise needed. Start from scratch by uploading an image or video, then prompt SAM 3 with a short text phrase to cut out all matching objects, or use SAM 3D to view a scene from a new perspective, virtually rearrange it, or add cool 3D effects. You can also jump in with one of our templates, which range from practical options like pixelating faces, license plates, and screens to fun video edits like spotlight effects, motion trails, and magnifying specific objects.
As part of this release, we’re sharing the SAM 3 model weights, a new evaluation benchmark dataset for open vocabulary segmentation, and a research paper that details how we built SAM 3. We’re also partnering with the Roboflow annotation platform so you can annotate data and fine-tune SAM 3 for your particular needs.
For SAM 3D, we’re sharing model checkpoints and inference code, and introducing SAM 3D Artist Objects, our new benchmark for 3D reconstruction. The dataset features a diverse array of images and objects, offering a level of realism and challenge that surpasses existing 3D benchmarks. It sets a new standard for measuring research progress in 3D and pushes the field toward a deeper understanding of the physical world.
We’re excited to share these innovative new models with you, and hope they empower everyone to explore their creativity, build, and push the boundaries of what’s possible. We can’t wait to see what you create.