
When a website puts up a “no AI bots allowed” sign, Perplexity apparently sees it as more of a suggestion.

Cloudflare’s research team caught the AI company red-handed, scraping sites that explicitly blocked AI crawlers. The scale? Tens of thousands of domains. Millions of requests daily. That’s not an accident.

Perplexity’s tactics read like a spy thriller. They’re changing user-agent strings to pass as regular browsers. Switching between different networks to dodge IP blocks. Even using third-party scraping APIs like Crawlbase that handle the dirty work: rotating IPs, bypassing CAPTCHAs, rendering JavaScript. Some scrapers turn to solutions like Bright Data’s Web Unlocker API to get past request blocking and 403 Forbidden responses.
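Why does swapping a user-agent string work so well? Because robots.txt rules are matched against whatever name the crawler announces. Here is a minimal sketch using Python’s standard `urllib.robotparser`. The robots.txt contents, the `example.com` URL, the `is_allowed` helper, and the browser user-agent string are all illustrative assumptions, not taken from Cloudflare’s report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots.txt permits this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


URL = "https://example.com/article"  # placeholder URL

# A crawler that announces itself honestly matches the Disallow rule.
declared = is_allowed("PerplexityBot", URL)

# The same request behind a generic browser string (illustrative value)
# only matches the wildcard rule, so the block never applies.
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)
disguised = is_allowed(BROWSER_UA, URL)

print(f"declared crawler allowed: {declared}")
print(f"disguised crawler allowed: {disguised}")
```

The takeaway: robots.txt is an honor system. It only stops crawlers that identify themselves truthfully, which is why catching a disguised one takes signals beyond the user-agent header.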


Clever? Sure. Ethical? That’s another story. The company’s CEO couldn’t even define plagiarism when asked during an interview, raising questions about the company’s content ethics.

To spot the disguised traffic, Cloudflare had to break out machine learning and network analysis.
