ai scraping content stealthily

When a website puts up a “no AI bots allowed” sign, Perplexity apparently sees it as more of a suggestion.

Cloudflare’s research team caught the AI company red-handed, scraping sites that explicitly blocked AI crawlers. The scale? Tens of thousands of domains. Millions of requests daily. That’s not an accident.

Perplexity’s tactics read like a spy thriller. They’re changing user-agent strings to pretend they’re regular browsers. Switching between different networks to dodge IP blocks. Even using third-party scraping APIs like Crawlbase that handle the dirty work – rotating IPs, bypassing CAPTCHAs, rendering JavaScript. Some scrapers turn to solutions like Bright Data’s Web Unlocker API to break through HTTP request barriers and 403 Forbidden errors.

They’re masquerading as regular browsers, switching networks, using third-party scraping APIs to bypass blocks.

Clever? Sure. Ethical? That’s another story. The company’s CEO couldn’t even define plagiarism when asked during an interview, raising questions about their content ethics.

Cloudflare had to break out machine learning and network analysis.

References

You May Also Like

Algorithmic Prejudice: How AI Systems Weaponize Bias Against Muslims and Asians

AI systems silently weaponize bias, denying Asians facial recognition and flagging Muslim terminology while affecting healthcare, housing, and finance. Regulations aren’t keeping pace with this discrimination.

The Startling Reality of AI’s Invisible Takeover in Your Daily Life

AI quietly decides who you date, hire, and trust—while chatbots guiding your life are wrong up to 94% of the time. You’re already affected.

Democracy Under Fire: AI Weaponizes Political Lies in Election Campaigns

AI creates fake politicians that fool millions. Democracy faces its darkest hour as $423 million fuels digital deception campaigns worldwide.

OpenAI’s AGI Blueprint: Superhuman Intelligence for Everyone, Not Just Elites

OpenAI wants AGI to serve billions—not billionaires. Their radical blueprint rewrites who profits from superhuman intelligence.