Common Crawl’s scraper never executes that code, so it gets the full articles. Thus, by my estimate, the foundation’s archives contain millions of articles from news organizations around the world, including The Economist, the Los Angeles Times, The Wall Street Journal, The New York Times, The New Yorker, Harper’s, and The Atlantic.
The Atlantic on Common Crawl, the nonprofit funneling paywalled articles to AI companies: a brutally efficient exposé. Alex Reisner caught them in several lies simply by looking at their crawl data.
A recent article in The Atlantic makes several false and misleading claims about the Common Crawl Foundation, including the accusation that our organization has “lied to publishers” about our activities.
For more than a decade, the nonprofit Common Crawl has been scraping billions of webpages to build a massive archive of the internet, notes The Atlantic, making it freely available for research.
In recent years, however, this archive has been put to a controversial purpose.