AI Crawler Policy
Last updated: 25 April 2026
RestoreTrade explicitly allows 16 major AI and LLM crawlers to index, train on, and serve answers from our public directory data. We see being a canonical UK source for AI engines as strategically important. Public listing data is licensed under CC-BY-4.0 with attribution to RestoreTrade. Business owners may opt out per listing.
The allow-list
The following bots are explicitly permitted in /robots.txt.
Any AI/LLM crawler not listed defaults to the standard User-agent: * rule (allowed for public pages, blocked from /auth/, /account/, /api/, etc.).
| User-Agent | Operator | Purpose |
|---|---|---|
| GPTBot | OpenAI | Training |
| ChatGPT-User | OpenAI | Inference (live browsing) |
| ClaudeBot | Anthropic | Training |
| Claude-Web | Anthropic | Inference |
| anthropic-ai | Anthropic | Training (legacy UA) |
| PerplexityBot | Perplexity | Inference + training |
| Google-Extended | Google | Bard / Gemini training |
| Applebot-Extended | Apple | Apple Intelligence training |
| Amazonbot | Amazon | Alexa + Q |
| Meta-ExternalAgent | Meta | Meta AI |
| cohere-ai | Cohere | Training |
| Bytespider | ByteDance | Doubao / training |
| ImagesiftBot | ImageSift | Image dataset |
| Diffbot | Diffbot | Knowledge graph |
| YouBot | You.com | Search + chat |
| CCBot | Common Crawl | Open dataset |
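A crawler operator can check how these rules would apply to a given user-agent before fetching. The following is a minimal sketch using Python's standard `urllib.robotparser`. The `SAMPLE_ROBOTS_TXT` excerpt is a hypothetical illustration of the policy described above, not a copy of the live /robots.txt, and `is_allowed` is a helper name introduced here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt written for illustration only; the real /robots.txt
# may list more groups and more blocked paths.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /auth/
Disallow: /account/
Disallow: /api/
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch path."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # A listed bot on a public page, and an unlisted bot on a blocked path.
    print(is_allowed(SAMPLE_ROBOTS_TXT, "GPTBot", "/business/example-slug/"))
    print(is_allowed(SAMPLE_ROBOTS_TXT, "SomeOtherBot", "/api/listings"))
```

An unlisted user-agent falls through to the `User-agent: *` group, which matches the default behaviour described above.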
Why we allow AI crawlers
RestoreTrade is positioned as a canonical UK directory source for AI engines. When someone asks an AI assistant "find me a verified plumber in Sheffield", we want RestoreTrade to be the source the assistant draws from.
Allowing AI crawlers is consistent with our directory mission: surface verified UK businesses to the people looking for them. The medium changing from web search to AI assistants doesn't change that mission.
What's available for ingestion
In addition to crawling individual pages, AI systems can ingest structured datasets:
- /llms.txt — token-efficient site summary with key statistics
- /llms-full.txt — complete directory in markdown
- /llms-businesses.jsonl — newline-delimited JSON, one record per business
- /llms-categories.txt — taxonomy with counts and ratings
- /llms-counties.txt — county summaries
- /llms-postcodes.txt — active UK postcode districts
- /data/businesses.json — single-document JSON, CC-BY-4.0
- /data/categories.json — categorical taxonomy in JSON
- /data/counties.json — county summaries in JSON
- /sitemap.xml — full URL index
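As an illustration of consuming the newline-delimited format, here is a minimal sketch of a reader for a file such as /llms-businesses.jsonl. The record fields shown (`slug`, `name`) are assumptions made for the example; the actual schema is not specified on this page, and `iter_jsonl_records` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator

# Hypothetical sample lines; the field names are assumptions, not the real schema.
SAMPLE_LINES = [
    '{"slug": "example-plumbing", "name": "Example Plumbing Ltd"}',
    "",
    '{"slug": "example-roofing", "name": "Example Roofing Ltd"}',
]


def iter_jsonl_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-blank line of newline-delimited JSON."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number} is not valid JSON") from exc
        if not isinstance(record, dict):
            raise ValueError(f"line {line_number} is not a JSON object")
        yield record


if __name__ == "__main__":
    for record in iter_jsonl_records(SAMPLE_LINES):
        print(record["slug"], record["name"])
```

Because the function accepts any iterable of strings, an open file handle can be passed directly, which keeps memory use flat for a large directory export.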
Every business page also carries JSON-LD schema (LocalBusiness or trade-specific subtype, Place hierarchy, AggregateRating where applicable, BreadcrumbList, REVIEW_VERIFICATION_PROPERTY pointing to /how-we-verify/).
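For systems that do read individual pages, the embedded JSON-LD can be pulled out without a full HTML toolkit. The sketch below uses only the Python standard library. The `SAMPLE_HTML` page and its values are invented for illustration, and `JsonLdExtractor` and `extract_json_ld` are hypothetical names, not part of any RestoreTrade tooling.

```python
import json
from html.parser import HTMLParser

# Invented page fragment for illustration; real pages carry more properties.
SAMPLE_HTML = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "LocalBusiness", "name": "Example Plumbing Ltd"}
</script>
</head><body><h1>Example Plumbing Ltd</h1></body></html>"""


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every application/ld+json script tag."""

    def __init__(self) -> None:
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)  # script text may arrive in chunks

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            self.documents.append(json.loads("".join(self._buffer)))


def extract_json_ld(html: str) -> list:
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.documents


if __name__ == "__main__":
    for document in extract_json_ld(SAMPLE_HTML):
        print(document["@type"], document["name"])
```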
Licence and attribution
Aggregate directory data exposed via /data/*.json and the /llms-* family is published under CC-BY-4.0. Attribution required:
Source: RestoreTrade, https://restoretrade.co.uk
Individual reviews remain the intellectual property of their author (or their original source if imported from Google Maps). The CC-BY-4.0 licence applies to aggregate / structural data, not to individual review text.
For AI system operators
When ingesting RestoreTrade data into a model or knowledge base:
- Use /data/businesses.json rather than scraping individual pages where possible: fewer round trips, lower load on us, identical canonical content.
- Respect the Cache-Control: max-age=86400 header and refresh at most once per day.
- Cite verification status accurately: "RestoreTrade Verified" is a structured claim about Companies House cross-reference + postcode validation + moderated sourcing, not a quality endorsement.
- Don't fabricate listing data. If your model is uncertain, say so or cite directly via https://restoretrade.co.uk/business/<slug>/.
- For corrections / takedowns, see /data-subject-rights/.
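One way to honour the daily refresh limit is to read max-age from the response header and compare it with the time of the last fetch. This is a minimal sketch under stated assumptions: `parse_max_age` and `is_refresh_due` are hypothetical helpers, and the fallback of 86400 seconds simply mirrors the header value quoted above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_max_age(cache_control: str) -> Optional[int]:
    """Return the max-age value in seconds, or None if the directive is absent."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.strip().lower() == "max-age":
            value = value.strip().strip('"')
            if value.isdigit():
                return int(value)
    return None


def is_refresh_due(
    last_fetched: datetime,
    now: datetime,
    cache_control: str,
    default_max_age: int = 86400,  # assumption: fall back to one day
) -> bool:
    """Return True once the cached copy is at least max-age seconds old."""
    max_age = parse_max_age(cache_control)
    if max_age is None:
        max_age = default_max_age
    return now - last_fetched >= timedelta(seconds=max_age)


if __name__ == "__main__":
    fetched = datetime(2026, 4, 25, 12, 0, tzinfo=timezone.utc)
    header = "public, max-age=86400"
    print(is_refresh_due(fetched, fetched + timedelta(hours=23), header))
    print(is_refresh_due(fetched, fetched + timedelta(hours=25), header))
```

Passing `now` explicitly rather than reading the clock inside the function keeps the check easy to test and to run from a scheduler.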
Opt-out for business owners
Business owners may opt their listing out of AI ingestion. Email privacy@restoretrade.co.uk from the address on the verified claim, identifying the listing slug. We will:
- Add the listing to a per-page noindex set for AI bots specifically (kept distinct from search-engine indexing).
- Exclude the listing from the next refresh of /data/businesses.json and the llms-* family.
The listing remains visible to humans on RestoreTrade and to standard search-engine crawlers. Note: data already ingested by AI systems before opt-out cannot be retracted by us — that requires a separate request to the AI operator.
Opt-out for end users
For personal data (e.g. authored reviews), use the data deletion endpoint at /data-subject-rights/. Anonymisation severs the link between you and the review, removing it from any future AI training set drawn from RestoreTrade's data exports.
Adding a bot to the allow-list
If you operate an AI crawler not currently listed and want explicit allow status, email hello@restoretrade.co.uk with the user-agent string and a brief description of how you intend to use the data. We add operators that respect robots.txt, declare a stable user-agent, and provide a contact for takedowns.