AI scraping content stealthily

When a website puts up a “no AI bots allowed” sign, Perplexity apparently sees it as more of a suggestion.

Cloudflare’s research team caught the AI company red-handed, scraping sites that explicitly blocked AI crawlers. The scale? Tens of thousands of domains. Millions of requests daily. That’s not an accident.
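For readers who have not seen what a “no AI bots allowed” sign looks like in practice, here is a minimal sketch of the check a well-behaved crawler runs before fetching a page. It uses Python's standard `urllib.robotparser`. The robots.txt content, the `example.com` URL, the helper name `is_allowed`, and the crawler names are illustrative assumptions, not details from Cloudflare's findings.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site. The crawler name below is
# used for illustration only; real sites list whichever bots they object to.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"
    print(is_allowed(ROBOTS_TXT, "PerplexityBot", page))  # False
    print(is_allowed(ROBOTS_TXT, "SomeOtherBot", page))   # True
```

The catch is that robots.txt is a request, not a lock. Nothing forces a crawler to run this check or to obey the answer.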

Perplexity’s tactics read like a spy thriller. They’re changing user-agent strings to pretend they’re regular browsers. Switching between different networks to dodge IP blocks. Even using third-party scraping APIs like Crawlbase that handle the dirty work: rotating IPs, bypassing CAPTCHAs, rendering JavaScript. Some scrapers turn to tools like Bright Data’s Web Unlocker API to get past request blocking and 403 Forbidden responses.
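To see why a changed user-agent string is enough to slip past a basic block, consider a rough sketch of the kind of server-side filter many sites rely on. The blocklist tokens, the function name `blocked_by_user_agent`, and both user-agent strings are hypothetical examples chosen for illustration. This is not Cloudflare's detection logic or any real product's rule set.

```python
# Hypothetical blocklist of crawler name fragments (illustrative only).
BLOCKED_AGENT_TOKENS = ("perplexitybot", "gptbot", "ccbot")


def blocked_by_user_agent(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a blocked crawler token."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


# A crawler that announces itself honestly (example string, assumed format).
declared = "Mozilla/5.0 (compatible; PerplexityBot/1.0)"

# The same crawler presenting a generic desktop-browser string instead.
disguised = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

if __name__ == "__main__":
    print(blocked_by_user_agent(declared))   # True: the filter catches it
    print(blocked_by_user_agent(disguised))  # False: the filter sees a browser
```

The filter only ever sees what the client chooses to send, so a disguised crawler looks the same as an ordinary visitor. IP blocks have a similar weakness once requests rotate across networks.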

Clever? Sure. Ethical? That’s another story. The company’s CEO couldn’t even define plagiarism when asked during an interview, raising questions about its content ethics.

To pick the disguised crawler out of ordinary traffic, Cloudflare had to break out machine learning and network analysis.
