Global DPAs Urge Social Media Platforms to Prevent Unlawful Data Scraping


Nov 11

As artificial intelligence (AI) continues to transform industries worldwide, data privacy remains a crucial concern. Recently, data protection authorities (DPAs) from 16 global jurisdictions, including the UK, Canada, and Australia, issued a follow-up statement addressing the legal and ethical complexities around data scraping.

Specifically, these DPAs have highlighted how companies using personal data to develop AI Large Language Models (LLMs) must align with privacy and data protection laws. Here’s what you need to know about this critical update, its implications, and the future expectations for companies.

Why This Matters: The Rise of Data Scraping in AI

Data scraping, the automated extraction of data from web sources such as social media platforms, has become a staple practice for businesses in AI development. However, the mass collection of personal information raises concerns about user privacy and data security. With AI-powered tools and Large Language Models relying on vast datasets for training, there’s an urgent need to balance innovation with responsible data handling.

The DPAs’ latest joint statement not only emphasises the need for compliance but also sets out concrete expectations for companies to enhance their data protection measures. As the ICO and other global regulators noted, these expectations are especially relevant for social media giants whose platforms are major data sources. Their collaborative statement calls for a combination of safeguarding measures and contract terms that ensure lawful data use.

Key Takeaways: What Companies Are Expected to Do

The latest statement serves as a guide for companies looking to responsibly harness data for AI without compromising user privacy. Here are the core expectations laid out by the DPAs:

1. Compliance with Privacy Laws

Companies must adhere to privacy and data protection laws when using personal information for AI development, even when collecting data from their own platforms. This includes protecting users’ information from unauthorised scraping and ensuring that any data used for LLMs is handled with transparency and consent.

2. Evolving Safeguarding Techniques

Recognising the evolving sophistication of scraping technology, DPAs urge companies to adopt layered safeguarding measures and regularly review their practices. This approach allows organisations to stay ahead of new scraping techniques, combining artificial intelligence, platform design adjustments, and other technical barriers to prevent unauthorised data extraction.

3. Legal and Contractual Boundaries for Scraping

When data scraping is permissible, such as for certain commercial or socially beneficial uses, companies must ensure that all activities comply with strict legal frameworks and are supported by enforceable contract terms. This protects both the data subjects and the integrity of the data use case.

This statement follows a year-long engagement between DPAs and some of the largest social media platforms, including Meta (Instagram, Facebook), ByteDance (TikTok), Microsoft (LinkedIn), and X Corp. (formerly Twitter). Through constructive dialogue, the DPAs gained insights into the challenges social media companies face in combating unauthorised scraping, such as differentiating between legitimate users and automated scrapers, and keeping pace with rapid advancements in scraping technology.

Practical Safeguarding Measures

In response to these expectations, companies have started implementing a range of measures, including:

Platform Design Tweaks: Adjusting platform features to make automated scraping more challenging.
AI-Driven Safeguards: Leveraging machine learning tools to detect and block scraping attempts.
Budget-Friendly Solutions for SMEs: Introducing cost-effective safeguards that help smaller companies protect data with limited resources.

Key Takeaways for Social Media Companies

The main takeaway is that even publicly accessible data is protected under privacy laws in most jurisdictions, requiring platforms to secure user information.

Multi-Layered Safeguards for Scraping Protection

Social media companies face mounting challenges as scrapers employ advanced AI to mimic user behaviour. In response, companies are implementing multi-layered safeguards such as CAPTCHA, rate-limiting, and random URLs, along with monitoring for unusual account activity. While effective, these defences must evolve continually to address increasingly sophisticated scraping.
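To make one of these layers concrete, here is a minimal sketch of rate-limiting, assuming a sliding-window approach. The class name, parameters and client identifier are hypothetical illustrations, not taken from the DPAs' statement or from any platform's actual implementation.

```python
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most `max_requests` per client within `window_seconds`.

    Illustrative only: names and thresholds are assumptions.
    """

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self._hits = defaultdict(deque)  # client_id -> recent request times

    def allow(self, client_id):
        now = self.clock()
        hits = self._hits[client_id]
        # Discard requests that have fallen outside the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: block, delay or challenge
        hits.append(now)
        return True
```

A platform might call `allow()` with an account ID or IP address before serving each profile request, and respond to a `False` result with a block or a CAPTCHA challenge. On its own this is easy for a distributed scraper to evade, which is why the DPAs stress combining several layers.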

Small and Medium Enterprises (SMEs)

Despite limited resources, SMEs are also expected to prevent unlawful scraping. Tools like bot detection and rate-limiting offer affordable options, and third-party providers can assist in compliance, although SMEs retain ultimate responsibility for protecting user data.
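As a rough illustration of how inexpensive a first bot-detection check can be, the sketch below flags clients whose requests arrive at suspiciously regular intervals. The function name and thresholds are assumptions made for this example. Real bot-detection tools combine many more signals, and a check like this would not be sufficient on its own.

```python
from statistics import pstdev


def looks_automated(timestamps, min_requests=5, max_jitter_seconds=0.05):
    """Return True if request times (in seconds) are almost perfectly evenly spaced.

    Hypothetical heuristic: human browsing tends to be irregular, while a
    simple scraper on a fixed timer is not.
    """
    if len(timestamps) < min_requests:
        return False  # too little data to judge
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    # Very low variation between gaps suggests a scripted client.
    return pstdev(gaps) <= max_jitter_seconds
```

A client flagged by this check could be rate-limited more tightly or shown a CAPTCHA rather than blocked outright, since legitimate automated tools can also produce regular traffic.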

Permissible and Contracted Data Use

Some companies authorise scraping for commercial purposes through strict contractual terms, but contracts alone don’t ensure legality. Organisations must monitor third-party compliance and restrict usage to lawful purposes.

Data Access for Research and AI Development

The DPAs support responsible data use for research, recommending APIs as secure access points. They also emphasise that social media companies using data for AI model training must follow strict privacy standards, balancing innovation with data protection laws.

This DPA guidance serves as a reminder: companies must be vigilant, ensuring that data privacy is central to their policies, especially as AI and data usage practices evolve.

Looking Ahead: Building a Responsible Future for AI

As regulators continue to keep data scraping on their radar, the joint statement signals an era of more collaborative and proactive engagement with the tech industry. By addressing privacy risks early on, companies and data protection authorities alike are working to establish a foundation where AI development respects user rights without stifling innovation.

With the DPAs’ ongoing oversight, businesses must prioritise transparent and ethical data practices. For AI developers and social media companies, adapting to these guidelines isn’t just about compliance; it’s about building trust and setting a responsible precedent for the future of AI.

How Can Gerrish Legal Help?

Gerrish Legal is a dynamic digital law firm. We pride ourselves on giving high-quality and expert legal advice to our valued clients. We specialise in many aspects of digital law such as GDPR, data privacy, digital and technology law, commercial law, and intellectual property.

We give companies the support they need to run their businesses successfully and confidently whilst complying with legal regulations, without the burden of keeping up with ever-changing digital requirements.

We are here to help you. Get in contact with us today for more information.


Nathalie Pouderoux