Agent & crawler directory

Every AI agent and crawler touching your storefront.

A merchant-facing reference for the bots, indexers, and autonomous shopping agents that fetch your product pages, train on your copy, or attempt checkout on behalf of a human. Use it to tune robots.txt, recognize legitimate traffic in your logs, and decide who deserves a verified pass into your checkout flow.

User-agent strings change. Operators rotate IPs. Treat this as a starting map, not a firewall rule. Cartograph adds the cryptographic verification layer on top.

Tracked agents: 14+
Operators: 9
Honor robots.txt: 11 / 14

AI search index

3 agents
  • OAI-SearchBot

    OpenAI

    robots ✓

    Builds OpenAI's search index for ChatGPT search answers.

    OAI-SearchBot/1.x
    Operator docs →
  • PerplexityBot

    Perplexity

    robots ✓

    Builds Perplexity's answer index and citation graph.

    PerplexityBot/1.0
    Operator docs →
  • Amazonbot

    Amazon

    robots ✓

    Indexes for Alexa and Amazon's product knowledge graph.

    Amazonbot/0.1

Model training

7 agents
  • GPTBot

    OpenAI

    robots ✓

    Crawls public pages to train OpenAI's foundation models.

    GPTBot/1.x
    Operator docs →
  • ClaudeBot

    Anthropic

    robots ✓

    Crawls public pages to train Anthropic's Claude family of models.

    ClaudeBot/1.x
    Operator docs →
  • Google-Extended

    Google

    robots ✓

    Opt-out token Google honors when training Gemini and Vertex AI models on your content.

    Google-Extended (control token, not a UA string)
  • Applebot-Extended

    Apple

    robots ✓

    Opt-out token for Apple Intelligence training. Applebot itself still indexes for Siri/Spotlight.

    Applebot-Extended (control token)
  • Meta-ExternalAgent

    Meta

    robots ✓

    Crawls public pages to train Meta's Llama and related models.

    Meta-ExternalAgent/1.x
  • CCBot

    Common Crawl

    robots ✓

    Builds the Common Crawl corpus, an open web archive widely used as training data for open LLMs.

    CCBot/2.0
  • Bytespider

    ByteDance

    robots ✗

    Crawls public pages to train ByteDance's Doubao/Cici models. Known for aggressive crawl rates.

    Bytespider
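A minimal sketch of how the tokens in this directory could feed a robots.txt policy, checked with Python's standard urllib.robotparser. The policy itself (opt out of training, stay visible to AI search), the /admin/ path, and the shop.example URLs are illustrative assumptions, not recommendations from any operator.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: opt out of model training, stay visible to AI search.
# Agent tokens come from the directory; the paths are made up for illustration.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
User-agent: Applebot-Extended
User-agent: Meta-ExternalAgent
User-agent: CCBot
User-agent: Bytespider
Disallow: /

User-agent: OAI-SearchBot
User-agent: PerplexityBot
User-agent: Amazonbot
Allow: /

User-agent: *
Disallow: /admin/
"""


def can_fetch(agent: str, url: str) -> bool:
    """Return True if the policy above lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "OAI-SearchBot", "ChatGPT-User"):
        print(agent, can_fetch(agent, "https://shop.example/products/mug"))
```

robots.txt is advisory: an agent marked robots ✗ in this directory, such as Bytespider, may ignore the Disallow rule, so the file states a preference rather than enforcing one.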

Shopping / task agent

4 agents
  • ChatGPT-User

    OpenAI

    robots ✓

    On-demand fetch when a ChatGPT user (or agent) follows a link, including shopping flows.

    ChatGPT-User/1.x
    Operator docs →
  • Claude-Web / anthropic-ai

    Anthropic

    robots ✓

    On-demand fetches initiated by Claude users, including agentic checkout tasks.

    Claude-Web/1.0 · anthropic-ai/1.0
  • Perplexity-User

    Perplexity

    robots ✗

    On-demand fetch for a Perplexity user following a citation or product link.

    Perplexity-User/1.0
  • Operator (browser agent)

    OpenAI

    robots ✗

    Headful browser agent that fills carts and completes checkouts on behalf of a user.

    Mozilla/5.0 … (no distinct UA — drives a real browser)
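To recognize these agents in access logs, a simple token lookup is often enough as a first pass. The sketch below assumes you already have the raw User-Agent header as a string; the category labels and the function name classify_user_agent are hypothetical, and the sample strings in the usage lines are illustrative rather than captured traffic.

```python
from typing import Optional, Tuple

# Tokens copied from the directory. Control tokens (Google-Extended,
# Applebot-Extended) are omitted because they never appear as UA strings.
AGENT_CATEGORIES = {
    "OAI-SearchBot": "ai_search",
    "PerplexityBot": "ai_search",
    "Amazonbot": "ai_search",
    "GPTBot": "training",
    "ClaudeBot": "training",
    "Meta-ExternalAgent": "training",
    "CCBot": "training",
    "Bytespider": "training",
    "ChatGPT-User": "shopping",
    "Claude-Web": "shopping",
    "anthropic-ai": "shopping",
    "Perplexity-User": "shopping",
}


def classify_user_agent(user_agent: str) -> Optional[Tuple[str, str]]:
    """Return (token, category) for the first directory token found, else None.

    Matching is a case-insensitive substring test, so it only reports what the
    client claims to be, not who it actually is.
    """
    ua = user_agent.lower()
    for token, category in AGENT_CATEGORIES.items():
        if token.lower() in ua:
            return token, category
    return None


if __name__ == "__main__":
    print(classify_user_agent("Mozilla/5.0 (compatible; GPTBot/1.x)"))
    print(classify_user_agent("Mozilla/5.0 (X11; Linux x86_64)"))
```

A browser agent with no distinct UA, like Operator, falls through to None here, which is one reason a token match alone cannot gate checkout.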

User-agent strings aren't proof

Any client can claim to be GPTBot or ChatGPT-User. Operators publish IP ranges or reverse-DNS hostnames and, increasingly, signed request headers, but enforcing those checks is on you. Cartograph's evidence layer verifies agent identity at the edge and gives you an auditable log of who actually touched the storefront, so you can let verified agents through and quietly rate-limit the rest.
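One way to test a claimed identity against an operator's published address ranges is a CIDR membership check. This is a rough sketch using Python's standard ipaddress module; the ranges below are placeholders taken from the RFC 5737 documentation blocks, not real operator ranges, and the name ip_matches_claim is hypothetical. It does not describe how Cartograph performs verification.

```python
import ipaddress

# Placeholder CIDRs from the RFC 5737 documentation blocks, NOT real
# operator ranges. In practice you would load each operator's published list.
PUBLISHED_RANGES = {
    "GPTBot": ["192.0.2.0/24"],
    "PerplexityBot": ["198.51.100.0/25"],
}


def ip_matches_claim(claimed_agent: str, remote_ip: str) -> bool:
    """Return True if remote_ip falls inside a range listed for claimed_agent."""
    try:
        addr = ipaddress.ip_address(remote_ip)
    except ValueError:
        return False  # malformed address: treat as unverified
    networks = (
        ipaddress.ip_network(cidr)
        for cidr in PUBLISHED_RANGES.get(claimed_agent, [])
    )
    return any(addr in net for net in networks)


if __name__ == "__main__":
    print(ip_matches_claim("GPTBot", "192.0.2.17"))
    print(ip_matches_claim("GPTBot", "203.0.113.5"))
```

An agent with no entry in the table is treated as unverified rather than trusted, which keeps the default on the cautious side when an operator publishes nothing.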