Meta

Scraping by the Numbers

Over the last month, we’ve been providing information about an internet-wide issue known as scraping. Scraping is the automated collection of data from a website or app. It can be done through authorized means, such as web crawling by a search engine, or through unauthorized means, which involves using automation to collect information in violation of our terms of service. Those who do it through unauthorized means often try to disguise their activity so that it blends in with ordinary usage. 

We’ve previously posted about how scraping works and how we are combating it. In this post, we will provide more details about our efforts to fight unauthorized scraping, and offer a deeper dive into the topic of “phone number enumeration” — a scraping technique that was at the center of recent reports about scraping on our platform.

We believe it’s important to be more transparent about our work to combat different forms of abuse on our platform. That’s why today we also launched our new Transparency Center, which provides a single destination for information about our integrity efforts. We also just published our latest Transparency Report for the second half of 2020, as well as our Community Standards Enforcement Report for the first quarter of this year.

How We Protect Against Data Misuse

Scraping affects a wide variety of companies and industries. Beyond social media platforms like Facebook, LinkedIn and Clubhouse, data scrapers have also collected personal information from home fitness equipment companies like Echelon and health apps like Strava, as well as from industries like banking, e-commerce and hospitality. Any website or app through which data can be publicly accessed is a potential scraping target.

Facebook is well aware of this risk, and while we can never eliminate it entirely, we have several measures in place to mitigate the risk of scraping on our platform.

Phone Number Enumeration

One particular scraping technique that we have worked hard to combat is known as “phone number enumeration.” This involves using automated tools at scale to retrieve information about people based on their phone numbers. 

Before a set of improvements we made in September 2019, scrapers found ways to abuse several of our contact discovery features, which were designed to allow people to find and connect with their contacts on Facebook. These features included the contact importer, which people could use to upload contacts from their mobile devices to Facebook and find matching people based on their phone numbers. We believe the scrapers used phone number enumeration to abuse this feature and scrape information. Here’s how phone number enumeration generally works using contact importer functionality. You can also check out this visual depiction of the process to see how we work to combat this technique.
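From a defender's point of view, one signal of enumeration is that an uploaded "contact list" looks machine-generated, for example a long run of consecutive phone numbers rather than the scattered numbers found in a real address book. The Python sketch below illustrates that idea only. It is a hypothetical heuristic, not the detection logic used on the platform: the function names, the `min_size` of 50 and the `threshold` of 0.8 are all assumptions chosen for illustration.

```python
from typing import Iterable, Optional


def to_digits(number: str) -> Optional[int]:
    """Strip formatting characters and return the number as an int (None if empty)."""
    digits = "".join(ch for ch in number if ch.isdigit())
    return int(digits) if digits else None


def sequential_fraction(numbers: Iterable[str]) -> float:
    """Fraction of distinct uploaded numbers that sit directly next to (+/-1)
    another uploaded number. Real address books tend to score near 0."""
    values = sorted({v for v in map(to_digits, numbers) if v is not None})
    if len(values) < 2:
        return 0.0
    adjacent = set()
    for a, b in zip(values, values[1:]):
        if b - a == 1:
            adjacent.update((a, b))
    return len(adjacent) / len(values)


def looks_like_enumeration(numbers: Iterable[str],
                           min_size: int = 50,       # assumed: ignore small uploads
                           threshold: float = 0.8    # assumed cut-off
                           ) -> bool:
    """Flag large uploads that consist mostly of consecutive numbers."""
    numbers = list(numbers)
    return len(numbers) >= min_size and sequential_fraction(numbers) >= threshold
```

A single heuristic like this is easy to evade, for instance by shuffling in unrelated numbers, which is one reason it would only ever be one signal among many.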

The changes to the contact importer feature that we described above were focused on combating this technique. Because scrapers are always changing their methods, we regularly review and update our defenses to try to stay ahead of them. We detailed some of our methods, including rate limits, data limits, behavioral detection and other protections, in a previous post.
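To make the terms concrete, a rate limit caps how often an account can call a feature, while a data limit caps how much it can send or receive per call. The Python sketch below combines the two for a contact-lookup style endpoint. It is a generic illustration under stated assumptions, not a description of the production defenses: the class name `ContactLookupLimiter`, its parameters and the default values are all hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Sequence


class ContactLookupLimiter:
    """Sliding-window rate limit plus a per-request data limit (illustrative only)."""

    def __init__(self,
                 max_requests: int = 5,                 # assumed: requests per window
                 window_seconds: float = 3600.0,        # assumed: window length
                 max_numbers_per_request: int = 1000,   # assumed: data limit
                 clock: Callable[[], float] = time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.max_numbers_per_request = max_numbers_per_request
        self._clock = clock
        # Timestamps of recently accepted requests, per account.
        self._history: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, account_id: str, numbers: Sequence[str]) -> bool:
        """Return True if this lookup request should be served."""
        now = self._clock()
        history = self._history[account_id]
        # Forget requests that have fallen out of the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        # Data limit: reject oversized uploads outright.
        if len(numbers) > self.max_numbers_per_request:
            return False
        # Rate limit: reject if the account already used its quota.
        if len(history) >= self.max_requests:
            return False
        history.append(now)
        return True
```

In use, a request handler would call `limiter.allow(account_id, uploaded_numbers)` before doing any matching and return an error when it yields `False`. Limits like these slow scrapers down rather than stop them, since a determined scraper can spread requests across many accounts, which is why they are typically paired with behavioral detection.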

To be clear, our first line of defense against unauthorized scraping is to make it as hard as we can for people’s data to be collected at scale. We want people to feel comfortable using our services, with confidence that we protect their information, so we work to limit access to our features by scrapers while enabling people to continue using those features in order to connect and share with others.