User-agent: GPTBot
User-agent: PerplexityBot
User-agent: Google-Extended
Allow: /
Does this override the all-agent setting below?
We still want to disallow /sdk-api/ (to encourage crawlers to crawl only the most recent versions).
It's a bit of a black box and likely depends on how each specific crawler behaves.
The AI discoverability tools are now yelling at me about disallowing .md, because there's no explicit allowance for crawlers to see it. When I looked up how these rules work, all I found was a hand-wavy "more explicit rules overrule more general rules".
So I've made some changes. The file now gives a set of explicit rules for the bots and then a set of explicit rules for general user agents. I think this should work, but I'm not certain. Take a look.
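For what it's worth, RFC 9309 says a crawler that matches a named group follows that group and only falls back to `*` when no named group matches, so a `Disallow: /sdk-api/` under `*` would not reach the named bots unless it is repeated in their group. Here is a rough sketch of checking that locally with Python's standard-library `urllib.robotparser`. The robots.txt content is a trimmed-down illustration rather than the real file, and `OLD_API_PATH` and `SomeOtherBot` are made-up names. Real crawlers may still deviate from what this parser reports.

```python
from urllib.robotparser import RobotFileParser

# Trimmed-down, illustrative robots.txt: one group for the named AI bots,
# one catch-all group. This is NOT the actual file from the PR.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: PerplexityBot
User-agent: Google-Extended
Allow: /

User-agent: *
Disallow: /sdk-api/
"""

# Hypothetical path under /sdk-api/, used only for illustration.
OLD_API_PATH = "/sdk-api/example-sdk-1.0/index.html"


def can_fetch(agent: str, path: str) -> bool:
    """Return whether `agent` may fetch `path` according to ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, "https://docs.couchbase.com" + path)


if __name__ == "__main__":
    # GPTBot matches the named group; SomeOtherBot falls through to "*".
    for agent in ("GPTBot", "SomeOtherBot"):
        print(agent, can_fetch(agent, OLD_API_PATH))
```

If the named bots should also stay out of /sdk-api/, the sketch suggests the disallow needs to appear inside their group as well.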
@@ -0,0 +1,81 @@
# npx antora --clean --fetch antora-playbook.yml
Did you mean to check in this local playbook?
I did not. Unsure why this one change persisted but I've cleaned it up.
Allow: /sdk-api/couchbase-core-io/
Allow: /sdk-api/couchbase-transactions-dotnet/
# Sitemap and LLM index
Sitemap: https://docs.couchbase.com/sitemap.xml
Is the Sitemap directive "top-level", or is it also scoped under the User-agent group? 🤔
If the latter, it might be worth repeating for *
I don't think it matters but the redundancy is unlikely to hurt so we'll pop it in for now.
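As far as I know, Google's robots.txt documentation treats `Sitemap:` as independent of any user-agent group, and Python's standard-library parser behaves the same way. A quick sketch, assuming Python 3.8+ for `site_maps()`, with an illustrative robots.txt where the directive sits inside a named group:

```python
from urllib.robotparser import RobotFileParser

SITEMAP_URL = "https://docs.couchbase.com/sitemap.xml"

# Illustrative robots.txt (not the real file): the Sitemap line is placed
# inside the GPTBot group on purpose, to see whether it is still collected.
ROBOTS_TXT = f"""\
User-agent: GPTBot
Allow: /
Sitemap: {SITEMAP_URL}

User-agent: *
Disallow: /sdk-api/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns every Sitemap URL found in the file, or None if
# there are none; it takes no user-agent argument.
sitemaps = parser.site_maps()
print(sitemaps)
```

This only shows how one parser reads the file. Other crawlers may behave differently, which is why the redundancy seems harmless.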
Made some further changes to try and make the robots.txt permissions as explicit and granular as possible.
osfameron left a comment
Looks OK. I still don't fully understand why this is needed, but it doesn't look like it'll do any harm, and the explicit Sitemap: directive is a good call 👍
As per the ticket: https://jira.issues.couchbase.com/browse/DOC-14190
_This is mostly political and boilerplate. We already allow blanket access for agents, but many tools that critique AI discoverability either will not recognise this or will penalise us for not having more granular rules.
Since it doesn't actually harm anything to implement such rules, I'm going to suggest we just do it._
Let me know if you think this is a bad idea.