Open-Sourcing Photo- and Video-Matching Technology to Make the Internet Safer

2018 Child Safety Hackathon at Facebook headquarters in Menlo Park, California

By Antigone Davis, Global Head of Safety, and Guy Rosen, VP of Integrity

At Facebook, we rely on a combination of technology and people to help keep our platforms safe. When we identify a harmful piece of content, such as child exploitation, terrorist propaganda, or graphic violence, technology can help us find duplicates and prevent them from being shared.

Today, we are open-sourcing two technologies that detect identical and nearly identical photos and videos — sharing some of the tech we use to fight abuse on our platform with others who are working to keep the internet safe. The algorithms are available on GitHub so our industry partners, smaller developers, and non-profits can use them to more easily identify abusive content and share hashes — or digital fingerprints — of different types of harmful content. For those who already use their own or other content-matching technology, these technologies add another layer of defense and allow hash-sharing systems to talk to each other, making those systems that much more powerful.

“In just one year, we witnessed a 541% increase in the number of child sexual abuse videos reported by the tech industry to the CyberTipline. We’re confident that Facebook’s generous contribution of this open-source technology will ultimately lead to the identification and rescue of more child sexual abuse victims,” said John Clark, President and CEO of the National Center for Missing and Exploited Children (NCMEC).

Over the years, Facebook has contributed hundreds of open-source projects to share our technology with the wider community, but this is the first time we’ve shared any photo- or video-matching technology. Building on Microsoft’s generous contribution of PhotoDNA to fight child exploitation 10 years ago and the more recent launch of the Google Content Safety API, today’s announcement is also part of an industry-wide commitment to building a safer internet.

Known as PDQ and TMK+PDQF, these technologies are part of a suite of tools we use at Facebook to detect harmful content; other algorithms and implementations, such as pHash, Microsoft’s PhotoDNA, aHash, and dHash, are also available to industry. Our photo-matching algorithm, PDQ, draws inspiration from pHash but was built from the ground up as a distinct algorithm with an independent software implementation. The video-matching technology, TMK+PDQF, was developed jointly by Facebook’s Artificial Intelligence Research team and academics from the University of Modena and Reggio Emilia in Italy.

These technologies create an efficient way to store files as short digital hashes that can determine whether two files are the same or similar, even without the original image or video. Hashes can also be more easily shared with other companies and non-profits. For example, when we identify terrorist propaganda on our platforms, we remove it and hash it using a variety of techniques, including the algorithms we’re sharing today. Then we share the hashes with industry partners, including smaller companies, through GIFCT so they can also take down the same content if it appears on their services.
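To illustrate the idea of matching content by hash alone, here is a minimal sketch in Python. It is not the actual PDQ implementation; it only assumes, as the PDQ design does, that each image is reduced to a 256-bit hash and that two images are considered similar when the Hamming distance between their hashes falls below a tunable threshold (the threshold value of 31 below is illustrative):

```python
# Illustrative sketch, not the real PDQ implementation: compare two
# 256-bit perceptual hashes (given as 64-character hex strings) by
# counting the bits in which they differ (the Hamming distance).

def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count differing bits between two same-length hex hash strings."""
    a = int(hash_a, 16)
    b = int(hash_b, 16)
    return bin(a ^ b).count("1")

def is_match(hash_a: str, hash_b: str, threshold: int = 31) -> bool:
    # A small Hamming distance means the images are near-duplicates;
    # real deployments tune this threshold for their own trade-offs.
    return hamming_distance(hash_a, hash_b) <= threshold

# Example: two hashes that differ only in the final hex digit.
h1 = "f" * 64
h2 = "f" * 63 + "e"
print(hamming_distance(h1, h2))  # → 1
print(is_match(h1, h2))          # → True
```

Because only these short fingerprints need to be exchanged, partners can check incoming content against a shared hash list without ever possessing the original harmful image or video.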

PDQ and TMK+PDQF were designed to operate at high scale, supporting video-frame hashing and real-time applications. We designed these technologies based on our experience detecting abuse across billions of posts on Facebook. We hope that by contributing back to the community we’ll enable more companies to keep their services safe and empower non-profits that work in the space. This work is in addition to our ongoing research in these areas, including our partnership with the University of Maryland, Cornell University, the Massachusetts Institute of Technology, and the University of California, Berkeley to research new techniques for detecting intentional adversarial manipulations of videos and photos designed to circumvent our systems.

We’re announcing these technologies today to support our fourth annual cross-industry Child Safety Hackathon at Facebook’s headquarters in Menlo Park, California. The two-day event brings together nearly 80 engineers and data scientists from Technology Coalition partner companies and others to develop new technologies that help safeguard children.

This year’s event is focused on developing new tools to help our partners, NCMEC and Thorn. For example, some teams will develop a prototype feature that will allow NCMEC’s CyberTipline case management tool to query and compare data points within other nonprofit organizations’ databases for known hashes and other key information. This will help identify children at risk and highlight high-value reports. The open-source code released today will also be made available to teams at the hackathon.

Hackathons are an exciting way to bring people together from different organizations with a wide range of expertise to build tools that tackle problems such as the online sexual exploitation of children. All non-open-source code and prototypes developed at the event will be donated back to the Technology Coalition and our partners to be used in their child-safety efforts.

We will continue to expand and improve our own products and features to find harmful content. Read more about how Facebook is using technology to combat child exploitation here.

Update on August 9, 2019 at 10:45AM PT: Here is a video from this year’s Child Safety Hackathon.