AI Crawler Policy

Last updated: 25 April 2026

RestoreTrade explicitly allows 16 major AI and LLM crawlers to index, train on, and serve answers from our public directory data. Being a canonical UK source for AI engines is a deliberate part of our strategy. Public listing data is licensed under CC-BY-4.0 with attribution to RestoreTrade. Business owners may opt out per listing.

The allow-list

The following bots are explicitly permitted in /robots.txt. Any AI/LLM crawler not listed defaults to the standard User-agent: * rule (allowed for public pages, blocked from /auth/, /account/, /api/, etc.).

User-Agent         | Operator     | Purpose
GPTBot             | OpenAI       | Training
ChatGPT-User       | OpenAI       | Inference (live browsing)
ClaudeBot          | Anthropic    | Training
Claude-Web         | Anthropic    | Inference
anthropic-ai       | Anthropic    | Training (legacy UA)
PerplexityBot      | Perplexity   | Inference + training
Google-Extended    | Google       | Bard / Gemini training
Applebot-Extended  | Apple        | Apple Intelligence training
Amazonbot          | Amazon       | Alexa + Q
Meta-ExternalAgent | Meta         | Meta AI
cohere-ai          | Cohere       | Training
Bytespider         | ByteDance    | Doubao / training
ImagesiftBot       | ImageSift    | Image dataset
Diffbot            | Diffbot      | Knowledge graph
YouBot             | You.com      | Search + chat
CCBot              | Common Crawl | Open dataset
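
As a rough illustration of how a listed bot and an unlisted bot are treated, the Python sketch below parses a hypothetical robots.txt fragment with the standard library's urllib.robotparser. The fragment is an assumption reconstructed from the description above, not a copy of the live /robots.txt, and SomeOtherBot and the example paths are made up for the demonstration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical fragment based on the policy text; the live /robots.txt is
# authoritative and may contain different or additional rules.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: *
Disallow: /auth/
Disallow: /account/
Disallow: /api/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# A listed bot requesting a public business page.
print(parser.can_fetch("GPTBot", "https://restoretrade.co.uk/business/example-slug/"))  # True

# An unlisted (made-up) bot falls through to the User-agent: * group.
print(parser.can_fetch("SomeOtherBot", "https://restoretrade.co.uk/business/example-slug/"))  # True
print(parser.can_fetch("SomeOtherBot", "https://restoretrade.co.uk/api/listings"))  # False
```

A crawler operator would normally point the parser at the live file with set_url() and read() instead of an inline string.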

Why we allow AI crawlers

RestoreTrade is positioned as a canonical UK directory source for AI engines. When someone asks an AI assistant "find me a verified plumber in Sheffield", we want RestoreTrade to be the source the assistant draws from.

Allowing AI crawlers is consistent with our directory mission: surface verified UK businesses to the people looking for them. The medium changing from web search to AI assistants doesn't change that mission.

What's available for ingestion

In addition to crawling individual pages, AI systems can ingest structured datasets: the /data/*.json exports (for example /data/businesses.json) and the /llms-* family of files.

Every business page also carries JSON-LD schema (LocalBusiness or trade-specific subtype, Place hierarchy, AggregateRating where applicable, BreadcrumbList, REVIEW_VERIFICATION_PROPERTY pointing to /how-we-verify/).
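
For operators who do crawl individual pages, the embedded JSON-LD can be read without a full HTML toolkit. The sketch below uses only Python's standard library to collect script blocks of type application/ld+json. The sample markup, the business name, and the rating values are invented for illustration, and the real pages may structure their JSON-LD differently (for instance as an @graph array).

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content can arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks rather than guess at their content


# Invented sample page; not taken from a real RestoreTrade listing.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Plumber",
 "name": "Example Plumbing Ltd",
 "aggregateRating": {"@type": "AggregateRating",
                     "ratingValue": "4.8", "reviewCount": "12"}}
</script>
</head><body></body></html>
"""

extractor = JsonLdExtractor()
extractor.feed(SAMPLE_HTML)
extractor.close()

business = extractor.documents[0]
print(business["@type"], business["name"])  # Plumber Example Plumbing Ltd
```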

Licence and attribution

Aggregate directory data exposed via /data/*.json and the /llms-* family is published under CC-BY-4.0. Attribution is required in the following form:

Source: RestoreTrade, https://restoretrade.co.uk

Individual reviews remain the intellectual property of their author (or their original source if imported from Google Maps). The CC-BY-4.0 licence applies to aggregate / structural data, not to individual review text.

For AI system operators

When ingesting RestoreTrade data into a model or knowledge base:

  • Use /data/businesses.json rather than scraping individual pages where possible — fewer round trips, lower load on us, identical canonical content.
  • Respect the Cache-Control: max-age=86400 header — refresh once per day at most.
  • Cite verification status accurately — "RestoreTrade Verified" is a structured claim about Companies House cross-reference + postcode validation + moderated sourcing, not a quality endorsement.
  • Don't fabricate listing data. If your model is uncertain, say so or cite directly via https://restoretrade.co.uk/business/<slug>/.
  • For corrections / takedowns, see /data-subject-rights/.
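
The first two points above can be sketched in Python as follows: fetch the bulk export once, read the Cache-Control header, and skip re-fetching until the max-age window has passed. This is a minimal sketch, not an official client. The function names are hypothetical, the full URL is assumed from the site address and the /data/businesses.json path, and the shape of the JSON payload is not assumed at all (it is returned as parsed).

```python
import json
import re
import time
import urllib.request
from typing import Optional

# Assumed full URL: site address plus the export path named above.
DATASET_URL = "https://restoretrade.co.uk/data/businesses.json"
DEFAULT_MAX_AGE = 86400  # seconds; the value stated in the guidance above


def parse_max_age(cache_control: Optional[str], default: int = DEFAULT_MAX_AGE) -> int:
    """Return the max-age directive in seconds, or `default` if it is absent."""
    if not cache_control:
        return default
    match = re.search(r"\bmax-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else default


def needs_refresh(last_fetched: Optional[float], max_age: int,
                  now: Optional[float] = None) -> bool:
    """True if the cached copy is missing or older than max_age seconds."""
    if last_fetched is None:
        return True
    current = time.time() if now is None else now
    return current - last_fetched >= max_age


def fetch_dataset(url: str = DATASET_URL):
    """Download the export; returns (parsed JSON, max-age in seconds)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        max_age = parse_max_age(response.headers.get("Cache-Control"))
        payload = json.load(response)
    return payload, max_age
```

A scheduler would store the time of the last successful fetch_dataset() call and consult needs_refresh() before the next run, which keeps refreshes to at most one per day under the stated header.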

Opt-out for business owners

Business owners may opt their listing out of AI ingestion. Email privacy@restoretrade.co.uk from the address on the verified claim, identifying the listing slug. We will:

  • Add the listing to a per-page noindex set for AI bots specifically (kept distinct from search-engine indexing).
  • Exclude the listing from the next refresh of /data/businesses.json and the llms-* family.

The listing remains visible to humans on RestoreTrade and to standard search-engine crawlers. Note: data already ingested by AI systems before opt-out cannot be retracted by us — that requires a separate request to the AI operator.

Opt-out for end users

For personal data (e.g. authored reviews), use the data deletion endpoint at /data-subject-rights/. Anonymisation severs the link between you and the review, removing it from any future AI training set drawn from RestoreTrade's data exports.

Adding a bot to the allow-list

If you operate an AI crawler not currently listed and want explicit allow status, email hello@restoretrade.co.uk with the user-agent string and a brief description of how you intend to use the data. We add operators that respect robots.txt, declare a stable user-agent, and provide a contact for takedowns.