updated robots.txt#884

Merged
TimLFletcher merged 4 commits into master from DOC-14190
Apr 10, 2026

@TimLFletcher
Contributor

As per the ticket: https://jira.issues.couchbase.com/browse/DOC-14190

_This is mostly political and boilerplate. We already allow blanket access for agents, but many tools that critique AI discoverability either do not recognise this or penalise us for not having more granular rules.

Since it doesn’t actually harm anything to implement such rules… I’m going to suggest we just do it._

Let me know if you think this is a bad idea.

@TimLFletcher TimLFletcher requested a review from osfameron March 30, 2026 12:13
Comment thread antora-playbook.yml
User-agent: GPTBot
User-agent: PerplexityBot
User-agent: Google-Extended
Allow: /
Collaborator

Does this override the all-agent setting below?
We still want to disallow /sdk-api/ (to encourage crawlers to crawl only the most recent versions).

Contributor Author

It's a bit of a black box and likely depends on how each specific crawler behaves.

The AI discoverability tools are yelling at me about disallowing .md now because there's no explicit allowance for crawlers to see it. When I looked up how these rules work, all I found was a hand-wavy "more explicit rules overrule more general rules".

So I've made some changes. It now gives a set of explicit rules for the bots and then a set of explicit rules for general user agents. I think this should work, but I'm not certain. Take a look.
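
The group-selection question in this thread can be checked against at least one parser. Below is a minimal sketch using Python's standard `urllib.robotparser`. The `User-agent: *` group with `Disallow: /sdk-api/` is an assumption reconstructed from the review comment (the full file is not shown here), and `ExampleBot` and the `/sdk-api/example/` path are hypothetical. Under RFC 9309, a crawler that finds a group naming it uses that group and falls back to `*` only when no named group matches, so a named group containing only `Allow: /` would not inherit the `/sdk-api/` restriction. Real crawlers may deviate, so this shows one implementation's behaviour rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Named-bot group as quoted in the review thread, followed by an
# ASSUMED catch-all group (reconstructed from the comment, not the real file).
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: PerplexityBot
User-agent: Google-Extended
Allow: /

User-agent: *
Disallow: /sdk-api/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URL under the path the catch-all group disallows.
url = "https://docs.couchbase.com/sdk-api/example/"

# GPTBot matches its own group, so the catch-all Disallow is not consulted.
named_bot_allowed = parser.can_fetch("GPTBot", url)

# A bot with no named group (hypothetical "ExampleBot") falls back to "*".
other_bot_allowed = parser.can_fetch("ExampleBot", url)

print(named_bot_allowed, other_bot_allowed)
```

If the named bots should also stay out of `/sdk-api/`, the disallow rule most likely needs repeating inside their group, which is in line with giving each group its own explicit rule set.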

Comment thread local-antora-playbook-k8s.yml Outdated
@@ -0,0 +1,81 @@
# npx antora --clean --fetch antora-playbook.yml
Collaborator

did you mean to check in this local playbook?

Contributor Author

I did not. Unsure why this one change persisted but I've cleaned it up.

Comment thread antora-playbook.yml
Allow: /sdk-api/couchbase-core-io/
Allow: /sdk-api/couchbase-transactions-dotnet/
# Sitemap and LLM index
Sitemap: https://docs.couchbase.com/sitemap.xml
Collaborator

Is the Sitemap directive "top-level" or also scoped under the User-agent bracket? 🤔
If the latter, it might be worth repeating for *

Contributor Author

I don't think it matters but the redundancy is unlikely to hurt so we'll pop it in for now.
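
On the scoping question: as far as the specifications go, `Sitemap` is not tied to any `User-agent` group. RFC 9309 treats it as an "other record" that must not terminate a group, and sitemaps.org describes it as independent of the user-agent line, so its position should not matter. Here is a minimal sketch with Python's `urllib.robotparser` (Python 3.8+ for `site_maps()`) that deliberately places the directive in the middle of a named group. The group contents are illustrative, not the merged file, and the `/sdk-api/example/` path is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Illustrative file: the Sitemap line sits inside the GPTBot group on purpose.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /sdk-api/couchbase-core-io/
Sitemap: https://docs.couchbase.com/sitemap.xml
Disallow: /sdk-api/

User-agent: *
Disallow: /sdk-api/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The sitemap is collected file-wide, regardless of which group surrounds it.
sitemaps = parser.site_maps()

# The Disallow after the Sitemap line still belongs to the GPTBot group,
# so the hypothetical path below remains blocked for GPTBot.
still_blocked = not parser.can_fetch(
    "GPTBot", "https://docs.couchbase.com/sdk-api/example/"
)

print(sitemaps, still_blocked)
```

In this parser a single `Sitemap:` line is enough. Repeating it is redundant but harmless, since duplicates would simply appear twice in the returned list.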

@TimLFletcher
Contributor Author

Made some further changes to try and make the robots.txt permissions as explicit and granular as possible.

@TimLFletcher TimLFletcher requested a review from osfameron April 9, 2026 12:58
Collaborator

@osfameron osfameron left a comment

Looks OK. I still don't fully understand why this is needed, but it doesn't look like it'll do any harm, and the explicit Sitemap: directive is a good call 👍

@TimLFletcher TimLFletcher merged commit 7954d61 into master Apr 10, 2026
4 checks passed