AI Crawler Policy

Last updated: 25 April 2026

RestoreTrade explicitly allows 16 major AI and LLM crawlers to index, train on, and serve answers from our public directory data. We treat being a canonical UK source for AI engines as a strategic position. Public listing data is licensed under CC-BY-4.0 with attribution to RestoreTrade. Business owners may opt out per listing.

The allow-list

The following bots are explicitly permitted in /robots.txt. Any AI/LLM crawler not listed defaults to the standard User-agent: * rule (allowed for public pages, blocked from /auth/, /account/, /api/, etc.).

User-Agent	Operator	Purpose
GPTBot	OpenAI	Training
ChatGPT-User	OpenAI	Inference (live browsing)
ClaudeBot	Anthropic	Training
Claude-Web	Anthropic	Inference
anthropic-ai	Anthropic	Training (legacy UA)
PerplexityBot	Perplexity	Inference + training
Google-Extended	Google	Gemini (formerly Bard) training
Applebot-Extended	Apple	Apple Intelligence training
Amazonbot	Amazon	Alexa + Q
Meta-ExternalAgent	Meta	Meta AI
cohere-ai	Cohere	Training
Bytespider	ByteDance	Doubao / training
ImagesiftBot	ImageSift	Image dataset
Diffbot	Diffbot	Knowledge graph
YouBot	You.com	Search + chat
CCBot	Common Crawl	Open dataset
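For crawler operators who want to check their user-agent before fetching, the following is a minimal sketch using Python's standard urllib.robotparser. The robots.txt excerpt inside it is hypothetical and written only for illustration; the live file at /robots.txt is authoritative and may group or order its rules differently. The helper names build_parser and is_allowed are also assumptions of this sketch.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt excerpt, for illustration only.
# The real file served at /robots.txt may differ.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /auth/
Disallow: /account/
Disallow: /api/
Allow: /

User-agent: *
Disallow: /auth/
Disallow: /account/
Disallow: /api/
Allow: /
"""

BASE_URL = "https://restoretrade.co.uk"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, path: str) -> bool:
    """Return True if the given user-agent may fetch the given path."""
    return parser.can_fetch(user_agent, BASE_URL + path)


parser = build_parser(SAMPLE_ROBOTS_TXT)
print(is_allowed(parser, "GPTBot", "/business/example/"))
print(is_allowed(parser, "GPTBot", "/api/listings"))
print(is_allowed(parser, "ExampleUnlistedBot", "/business/example/"))
```

In practice a crawler would download the live file (for example with RobotFileParser.set_url and read) rather than embed a copy.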

Why we allow AI crawlers

RestoreTrade is positioned as a canonical UK directory source for AI engines. When someone asks an AI assistant "find me a verified plumber in Sheffield", we want RestoreTrade to be the source the assistant draws from.

Allowing AI crawlers is consistent with our directory mission: surface verified UK businesses to the people looking for them. The shift from web search to AI assistants changes the medium, not the mission.

What's available for ingestion

In addition to crawling individual pages, AI systems can ingest structured datasets:

/llms.txt — token-efficient site summary with key statistics
/llms-full.txt — complete directory in markdown
/llms-businesses.jsonl — newline-delimited JSON, one record per business
/llms-categories.txt — taxonomy with counts and ratings
/llms-counties.txt — county summaries
/llms-postcodes.txt — active UK postcode districts
/data/businesses.json — single-document JSON, CC-BY-4.0
/data/categories.json — categorical taxonomy in JSON
/data/counties.json — county summaries in JSON
/sitemap.xml — full URL index
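As a rough illustration of consuming the newline-delimited file, the sketch below reads one JSON object per line. The record schema is not documented on this page, so the field names in the sample ("slug", "name") are hypothetical, and iter_jsonl_records is a name made up for this example.

```python
import json
from typing import Iterable, Iterator


def iter_jsonl_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line of newline-delimited JSON."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError(f"line {line_number}: expected a JSON object")
        yield record


# Made-up sample lines; the real field names may differ.
SAMPLE_LINES = [
    '{"slug": "example-plumbing", "name": "Example Plumbing Ltd"}',
    "",
    '{"slug": "example-roofing", "name": "Example Roofing Ltd"}',
]

records = list(iter_jsonl_records(SAMPLE_LINES))
print(len(records), records[0]["slug"])
```

Because the function accepts any iterable of strings, an open file handle or a streamed HTTP response body split into lines can be passed in directly, which avoids loading the whole directory into memory.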

Every business page also carries JSON-LD schema (LocalBusiness or trade-specific subtype, Place hierarchy, AggregateRating where applicable, BreadcrumbList, REVIEW_VERIFICATION_PROPERTY pointing to /how-we-verify/).
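For systems that read the embedded schema rather than the bulk files, here is a minimal sketch of extracting JSON-LD blocks from a page with Python's standard html.parser. The sample HTML and its values are invented for illustration; real business pages will carry more types and properties than shown, and JsonLdExtractor is a hypothetical helper name.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.documents: list = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


# Invented sample page, for illustration only.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "LocalBusiness", "name": "Example Plumbing Ltd"}
</script>
</head><body></body></html>
"""

extractor = JsonLdExtractor()
extractor.feed(SAMPLE_HTML)
print(extractor.documents[0]["@type"])
```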

Licence and attribution

Aggregate directory data exposed via /data/*.json and the /llms-* family is published under CC-BY-4.0. Attribution is required, in this form:

Source: RestoreTrade, https://restoretrade.co.uk

Individual reviews remain the intellectual property of their author (or their original source if imported from Google Maps). The CC-BY-4.0 licence applies to aggregate / structural data, not to individual review text.

For AI system operators

When ingesting RestoreTrade data into a model or knowledge base:

Use /data/businesses.json rather than scraping individual pages where possible — fewer round trips, lower load on us, identical canonical content.
Respect the Cache-Control: max-age=86400 header — refresh once per day at most.
Cite verification status accurately — "RestoreTrade Verified" is a structured claim about Companies House cross-reference + postcode validation + moderated sourcing, not a quality endorsement.
Don't fabricate listing data. If your model is uncertain, say so or cite directly via https://restoretrade.co.uk/business/<slug>/.
For corrections / takedowns, see /data-subject-rights/.
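The once-per-day refresh guidance above can be sketched as a small freshness check. This is an illustration under assumptions, not a full HTTP cache: it only reads the max-age directive, the function names are made up for this example, and the fallback of 86400 seconds is taken from the header value quoted in this policy.

```python
import re
from typing import Optional

# Fallback taken from the max-age value quoted in this policy.
DEFAULT_MAX_AGE_SECONDS = 86400


def parse_max_age(cache_control: str) -> Optional[int]:
    """Return the max-age value in seconds, or None if the directive is absent."""
    match = re.search(r"max-age\s*=\s*(\d+)", cache_control, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None


def needs_refresh(fetched_at: float, now: float, cache_control: str) -> bool:
    """Return True once the cached copy is at least max-age seconds old."""
    max_age = parse_max_age(cache_control)
    if max_age is None:
        max_age = DEFAULT_MAX_AGE_SECONDS
    return (now - fetched_at) >= max_age


# A copy fetched one hour ago is still fresh; one fetched a day ago is not.
print(needs_refresh(fetched_at=0, now=3600, cache_control="max-age=86400"))
print(needs_refresh(fetched_at=0, now=86400, cache_control="public, max-age=86400"))
```

Passing timestamps in explicitly (for example from time.time()) keeps the check easy to test and leaves the choice of clock to the caller.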

Opt-out for business owners

Business owners may opt their listing out of AI ingestion. Email privacy@restoretrade.co.uk from the address on the verified claim, identifying the listing slug. We will:

Add the listing to a per-page noindex set for AI bots specifically (kept distinct from search-engine indexing).
Exclude the listing from the next refresh of /data/businesses.json and the /llms-* family.

The listing remains visible to humans on RestoreTrade and to standard search-engine crawlers. Note: data already ingested by AI systems before opt-out cannot be retracted by us — that requires a separate request to the AI operator.
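One way an AI operator might keep a local copy in step with opt-outs is to drop any record that no longer appears in the latest export. The sketch below assumes each export record carries a "slug" key, which is a hypothetical field name, and prune_removed_listings is a name invented for this example. It does not replace a direct takedown request for data already baked into a trained model.

```python
def prune_removed_listings(local_records: dict, latest_export: list) -> dict:
    """Keep only locally stored records whose slug is still in the latest export.

    local_records maps slug -> stored record; latest_export is a list of
    export records, each assumed (hypothetically) to have a "slug" key.
    """
    current_slugs = {record["slug"] for record in latest_export}
    return {
        slug: record
        for slug, record in local_records.items()
        if slug in current_slugs
    }


# Made-up data: "example-roofing" is absent from the latest export.
local = {
    "example-plumbing": {"name": "Example Plumbing Ltd"},
    "example-roofing": {"name": "Example Roofing Ltd"},
}
export = [{"slug": "example-plumbing", "name": "Example Plumbing Ltd"}]

pruned = prune_removed_listings(local, export)
print(sorted(pruned))
```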

Opt-out for end users

For personal data (e.g. authored reviews), use the data deletion endpoint at /data-subject-rights/. Anonymisation severs the link between you and the review, removing it from any future AI training set drawn from RestoreTrade's data exports.

Adding a bot to the allow-list

If you operate an AI crawler not currently listed and want explicit allow status, email hello@restoretrade.co.uk with the user-agent string and a brief description of how you intend to use the data. We add operators that respect robots.txt, declare a stable user-agent, and provide a contact for takedowns.