OpenAI's ChatGPT New Web Crawler - GPTBot

Aug 7, 2023 - 7:21 am


OpenAI, the company behind ChatGPT, has published information on its web crawler, named GPTBot. You can now see whether OpenAI is crawling your site and how heavily, and you can disallow access to all or part of your site with the robots.txt protocol.

OpenAI's documentation for GPTBot lists the following:

  • User agent token: GPTBot
  • Full user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)
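To see whether GPTBot is crawling your site and how much, you can search your server logs for the user agent token. Here is a minimal Python sketch, assuming your server writes the common "combined" access log format; the function names and the sample log line are made up for illustration and are not from OpenAI.

```python
from collections import Counter

# Token from the GPTBot documentation quoted above.
GPTBOT_TOKEN = "GPTBot"


def is_gptbot(user_agent: str) -> bool:
    """Return True if the user-agent string contains the GPTBot token."""
    # Case-insensitive match is a precaution, not something OpenAI specifies.
    return GPTBOT_TOKEN.lower() in user_agent.lower()


def count_gptbot_hits(log_lines):
    """Count GPTBot requests per path.

    Assumes the combined log format, where the request line and the
    user agent are the 1st and 3rd double-quoted fields on each line.
    """
    hits = Counter()
    for line in log_lines:
        parts = line.split('"')
        if len(parts) < 6:
            continue  # not a combined-format line, skip it
        request, user_agent = parts[1], parts[5]
        if is_gptbot(user_agent):
            fields = request.split()
            path = fields[1] if len(fields) >= 2 else request
            hits[path] += 1
    return hits


if __name__ == "__main__":
    # Hypothetical log line, for illustration only.
    sample = [
        '203.0.113.5 - - [07/Aug/2023:07:21:00 +0000] "GET /page HTTP/1.1" '
        '403 153 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; '
        'compatible; GPTBot/1.0; +https://openai.com/gptbot)"',
    ]
    for path, count in count_gptbot_hits(sample).items():
        print(path, count)
```

Keep in mind that a user-agent string can be spoofed, so a match here only tells you that the request claimed to be GPTBot.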

You can then disallow using the user-agent GPTBot like you would any other crawler.
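To make that concrete, here is a small Python sketch that embeds an example robots.txt and checks it with the standard library's urllib.robotparser. The /private/ path and the example.com URLs are placeholders I made up; to block GPTBot from your whole site, the rule would be "Disallow: /" under "User-agent: GPTBot".

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: keeps GPTBot out of one (hypothetical) directory
# and leaves every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Disallow:
"""


def can_gptbot_fetch(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt text lets GPTBot fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("GPTBot", url)


if __name__ == "__main__":
    for url in ("https://example.com/private/page", "https://example.com/blog/post"):
        print(url, can_gptbot_fetch(ROBOTS_TXT, url))
```

This only tests how a standards-following parser reads your rules; it is a way to sanity check a robots.txt file before you publish it, not a guarantee of how any particular crawler behaves.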

Currently, the only IP range listed for GPTBot is 40.83.2.64/28, but that can change, so check OpenAI's published IP range file for updates.
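Because user-agent strings can be faked, you may also want to confirm that a request came from the listed range. Here is a short Python sketch using the standard ipaddress module; the hard-coded range is the one quoted above and may well be out of date by the time you read this, and the function name is my own.

```python
import ipaddress

# The single range listed at the time of writing; treat it as an
# assumption and refresh it from OpenAI's published list.
GPTBOT_RANGES = [ipaddress.ip_network("40.83.2.64/28")]


def is_gptbot_ip(ip: str, ranges=GPTBOT_RANGES) -> bool:
    """Return True if the IP address falls inside any listed GPTBot range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in ranges)


if __name__ == "__main__":
    for ip in ("40.83.2.70", "40.83.2.80"):
        print(ip, is_gptbot_ip(ip))
```

A /28 covers 16 addresses, so this range runs from 40.83.2.64 through 40.83.2.79.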

OpenAI lists GPTBot's usage as, "Web pages crawled with the GPTBot user agent may potentially be used to improve future models and are filtered to remove sources that require paywall access, are known to gather personally identifiable information (PII), or have text that violates our policies. Allowing GPTBot to access your site can help AI models become more accurate and improve their general capabilities and safety. Below, we also share how to disallow GPTBot from accessing your site."

Yesterday, I spotted a new thread at WebmasterWorld with complaints about GPTBot activity. The webmaster said, "Just had over 1000 hits from this bot, hitting individual pages. As it happens my site automatically served a 403 for each hit because the bot is not in my whitelist, nor did it pass the 'human' test."

Previously, you were only able to block ChatGPT plugins. It also seems that Google and others are working on an alternative to robots.txt for AI search purposes.

Forum discussion at WebmasterWorld.

 
