OpenAI’s ChatGPT New Web Crawler – GPTBot

Vernon August 7, 2023


OpenAI, the folks behind ChatGPT, has published information on its web crawler, named GPTBot. You can now see whether OpenAI is crawling your site and how heavily, and you can disallow access to all or part of your site with the robots.txt protocol.

You can see the documentation for GPTBot over here.

User agent token: GPTBot

Full user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)
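One way to see how heavily the crawler is visiting is to count access-log lines that contain the GPTBot token. The following is a minimal sketch, assuming each log line includes the request's user-agent string somewhere in its text; the function name `count_gptbot_hits` and the sample log lines are hypothetical and only for illustration.

```python
def count_gptbot_hits(log_lines):
    """Count log lines whose text contains the GPTBot user-agent token."""
    token = "gptbot"
    # Case-insensitive match, since some logs normalize or alter casing.
    return sum(1 for line in log_lines if token in line.lower())


# Hypothetical sample lines; a real log would be read from a file instead.
sample = [
    '40.83.2.70 - - "GET /a HTTP/1.1" 200 "Mozilla/5.0 AppleWebKit/537.36 '
    '(KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)"',
    '203.0.113.9 - - "GET /b HTTP/1.1" 200 "Mozilla/5.0 (X11; Linux x86_64)"',
]
hits = count_gptbot_hits(sample)
print(hits)
```

Keep in mind that any client can send this user-agent string, so a match on the token alone does not prove the request came from OpenAI.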

You can then disallow using the user-agent GPTBot like you would any other crawler.
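As a rough illustration, the sketch below uses Python's standard `urllib.robotparser` to check how a robots.txt rule for GPTBot would be interpreted. The robots.txt content, the `/private/` path, and the example.com URLs are assumptions made up for this example; to block the whole site, the rule would be `Disallow: /` instead.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block GPTBot from one directory only.
# Other crawlers are not mentioned, so they remain unaffected.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# GPTBot is refused under /private/ but allowed elsewhere.
blocked = parser.can_fetch("GPTBot", "https://example.com/private/page.html")
allowed = parser.can_fetch("GPTBot", "https://example.com/public/page.html")
# A crawler with a different token is not covered by the GPTBot rule.
other = parser.can_fetch("SomeOtherBot", "https://example.com/private/page.html")

print(blocked, allowed, other)
```

This only shows how a standards-following parser reads the rule; whether a crawler honors it is up to the crawler.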

Currently, the IP range listed for GPTBot is just 40.83.2.64/28, but that can change, so check OpenAI's published IP range file for updates.
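To check whether a visitor's address falls inside that range, something like the following sketch with Python's standard `ipaddress` module could work. The range is the one quoted above and may be out of date; the helper name `in_gptbot_range` and the list `GPTBOT_RANGES` are hypothetical.

```python
import ipaddress

# Range quoted in the article; treat it as an example value that may have changed.
GPTBOT_RANGES = [ipaddress.ip_network("40.83.2.64/28")]


def in_gptbot_range(ip):
    """Return True if the given IP address falls inside any listed range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in GPTBOT_RANGES)


# A /28 covers 16 addresses, here 40.83.2.64 through 40.83.2.79.
print(in_gptbot_range("40.83.2.70"))
print(in_gptbot_range("40.83.2.80"))
```

Combining this check with the user-agent token gives a stronger signal than either one alone.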

OpenAI describes GPTBot’s usage as follows: “Web pages crawled with the GPTBot user agent may potentially be used to improve future models and are filtered to remove sources that require paywall access, are known to gather personally identifiable information (PII), or have text that violates our policies. Allowing GPTBot to access your site can help AI models become more accurate and improve their general capabilities and safety. Below, we also share how to disallow GPTBot from accessing your site.”

Yesterday, I spotted a new thread at WebmasterWorld with complaints about GPTBot activity. The webmaster said, “Just had over 1000 hits from this bot, hitting individual pages. As it happens my site automatically served a 403 for each hit because the bot is not in my whitelist, nor did it pass the ‘human’ test.”

Previously, you were only able to block ChatGPT plugins. And it seems like Google and others are working on an alternative to robots.txt for AI search purposes.

Forum discussion at WebmasterWorld.

Source link: Seroundtable.com
