{"id":40673,"date":"2025-07-04T06:06:47","date_gmt":"2025-07-04T06:06:47","guid":{"rendered":"http:\/\/flwnet.com\/index.php\/2025\/07\/04\/here-are-the-biggest-misconceptions-about-ai-content-scraping\/"},"modified":"2025-07-04T06:06:47","modified_gmt":"2025-07-04T06:06:47","slug":"here-are-the-biggest-misconceptions-about-ai-content-scraping","status":"publish","type":"post","link":"http:\/\/flwnet.com\/index.php\/2025\/07\/04\/here-are-the-biggest-misconceptions-about-ai-content-scraping\/","title":{"rendered":"Here are the biggest misconceptions about AI content scraping"},"content":{"rendered":"<p>AI bots that scrape publishers\u2019 sites for real-time information now scrape those sites more than the bots used to train large language models do. And they\u2019re harder to detect.<\/p>\n<p>That\u2019s according to the <a href=\"https:\/\/tollbit.com\/bots\/25q1\/\">latest report from TollBit<\/a>, a data marketplace for publishers and AI companies. From Q4 2024 to Q1 2025, bot scrapes used for <a href=\"https:\/\/digiday.com\/marketing\/wtf-is-retrieval-augmented-generation-for-ai-chatbots-and-large-language-models\/\">Retrieval Augmented Generation, or RAG,<\/a> grew 49% per site. That is more than 2.5 times the growth rate of training bot scrapes (18%) over the same period.<\/p>\n<p>An increase in bots scraping content from publishers\u2019 sites represents a threat to their businesses. But scraping for AI training and scraping for real-time outputs present different challenges, and some opportunities, for publishers. And not all of them are fully understood.<\/p>\n<p><em>Continue reading this article on <a href=\"https:\/\/digiday.com\/media\/here-are-the-biggest-misconceptions-about-ai-content-scraping\/?utm_campaign=digidaydis&amp;utm_medium=rss&amp;utm_source=general-rss\">digiday.com<\/a>. 
Sign up for <a href=\"https:\/\/digiday.com\/newsletters\/?utm_campaign=digidaydis&amp;utm_medium=rss&amp;utm_source=general-rss\">Digiday newsletters<\/a> to get the latest on media, marketing and the future of TV.<\/em><\/p>","protected":false},"excerpt":{"rendered":"<p>AI bots that scrape publishers\u2019 sites for real-time information now scrape those sites more than the bots used to train large language models do. And they\u2019re harder to detect. That\u2019s according to the latest report from TollBit, a data marketplace for publishers and AI companies. From Q4 2024 to Q1 2025, bot scrapes used for Retrieval [&hellip;]<\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11],"tags":[],"class_list":["post-40673","post","type-post","status-publish","format-standard","hentry","category-media-entertainment"],"_links":{"self":[{"href":"http:\/\/flwnet.com\/index.php\/wp-json\/wp\/v2\/posts\/40673","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/flwnet.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/flwnet.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"http:\/\/flwnet.com\/index.php\/wp-json\/wp\/v2\/comments?post=40673"}],"version-history":[{"count":0,"href":"http:\/\/flwnet.com\/index.php\/wp-json\/wp\/v2\/posts\/40673\/revisions"}],"wp:attachment":[{"href":"http:\/\/flwnet.com\/index.php\/wp-json\/wp\/v2\/media?parent=40673"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/flwnet.com\/index.php\/wp-json\/wp\/v2\/categories?post=40673"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/flwnet.com\/index.php\/wp-json\/wp\/v2\/tags?post=40673"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}