
docs: document AI contribution policy and agent guidelines#27

Open
SamBarker wants to merge 9 commits into kroxylicious:main from SamBarker:ai-contribution-policy

Conversation

@SamBarker
Member

Summary

Addresses #26 — documenting how AI may be used when crafting contributions to the project.

  • Adds a "Use of AI Assistance" section to CONTRIBUTING.md establishing the project's position: AI tools are permitted, but the contributor owns what they submit, must understand it, and must disclose significant AI usage.
  • Adds an "About the Project" section to CONTRIBUTING.md noting the Java/Maven foundation.
  • Clarifies the PR review section: AI-assisted reviews supplement but do not substitute for Committer review; merge decisions follow the project's decision making framework.
  • Adds an org-level AGENTS.md providing AI coding tools with process expectations (DCO, commit discipline, PR standards, naming conventions). Individual repositories can add their own AGENTS.md with repo-specific technical details.

Why Assisted-by rather than Apache's Generated-by

Apache's Generated-by trailer is primarily about provenance tracking — an audit trail so the foundation can later query "which artifacts did model X generate?" if licensing concerns emerge around a model's training data. The focus is on the output's origin.

Kroxylicious's Assisted-by trailer is primarily about contributor responsibility. The policy's core message is "you are the contributor" — the trailer reinforces that the human is in the driving seat and the tool assisted them, rather than implying the tool produced the output and the human accepted it. The DCO sign-off already establishes legal accountability; Assisted-by extends that spirit to tooling disclosure.

In practice, both provide the same audit trail if needed. The difference is philosophical: Generated-by frames the tool as the actor, Assisted-by frames the contributor as the actor. The latter is more consistent with this project's emphasis on contributor ownership and understanding.

References consulted

Test plan

  • Review CONTRIBUTING.md for tone, completeness, and consistency with governance model
  • Review AGENTS.md for clarity as instructions to AI tools
  • Verify links to GOVERNANCE.md#decision-making, DCO.txt, and LICENSE resolve correctly
  • Discuss whether the Assisted-by trailer format meets the project's needs
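
The link-verification step in the test plan can be partly scripted; a rough sketch (the grep pattern assumes standard inline markdown links, and only relative targets are checked for existence):

```shell
# List relative markdown link targets in CONTRIBUTING.md and check each exists
grep -oE '\]\([^)]+\)' CONTRIBUTING.md \
  | sed -e 's/^](//' -e 's/)$//' -e 's/#.*//' \
  | grep -vE '^https?:' \
  | grep -v '^$' \
  | sort -u \
  | while read -r target; do
      [ -e "$target" ] && echo "ok: $target" || echo "MISSING: $target"
    done
```

Anchors such as `GOVERNANCE.md#decision-making` are reduced to the file path before checking; in-page anchors and external URLs are skipped.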

🤖 Generated with Claude Code

Sets out the project's position on AI-assisted contributions:
contributors may use AI tools, but they own what they submit,
must understand it, and must disclose significant AI usage.
Also introduces the concept of AGENTS.md files in repositories.

Closes kroxylicious#26

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
Adds guidance on using an Assisted-by trailer in commit messages
to identify the AI tool and model used. The trailer is intended
to be populated by the tooling itself, with AGENTS.md providing
tool-specific configuration details.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
Adds a brief 'About the Project' section noting Kroxylicious
is a Java project built with Apache Maven.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
Provides AI coding tools with process expectations including
DCO sign-off, Assisted-by trailers, commit discipline, and
pull request review requirements. Clarifies that human committer
review and merge decisions are not substituted by AI reviews.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
Adds language to both CONTRIBUTING.md and AGENTS.md clarifying
that AI-assisted reviews supplement but do not substitute for
Committer review, and that merge decisions follow the project's
decision making framework.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
Adds guidance on commit messages (why not what), cohesive PRs,
PR descriptions focused on problems and trade-offs, and naming
conventions that prefer intent over encoded logic.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
@SamBarker SamBarker requested a review from a team as a code owner March 9, 2026 01:06
@tombentley
Member

@k-wall's issue gave the following reasons for wanting an AI contribution policy:

guide contributors as to acceptable use of AI, attribution, and the avoidance of licensing conflicts.

I think there are broadly two ways we can look at those things:

  1. What's legally necessary to protect the project.

    • I think it could be a serious problem if someone contributed changes that we later found to have transcluded copyrighted material. In CONTRIBUTING we should at least be requesting that people turn on the controls in their AI's configuration which try to address this. And in AGENTS.md we should say something to instruct the agent to turn that up to the max, or, if it lacks such controls, to not generate anything at all and tell the developer. Perhaps we should also look into tools which can detect snippets of copyrighted material.
    • My understanding of US copyright law is that AI-generated content is thought not to be copyrightable. In principle that could mean that over time there is less copyrightable code in the project upon which we can assert our license conditions. In other words, if contributors end up rewriting more and more of the code base using these tools, we're watering down our ability to enforce our own licensing. Using Assisted-by will make it easier to figure out that this is the case. But it makes it easier for both us and someone hostile to the goals of the project, so it's a double-edged sword.
    • Likewise, using Assisted-by makes it easier for someone to identify projects which have accepted lots of contributions from a particular model. Perhaps that becomes a risk further down the line if it became apparent the model was trained on copyrighted data, or if there was a change in copyright law in some jurisdiction.
  2. What's socially necessary to protect the project. We have not yet been deluged by AI-authored slop, though it seems likely to happen sooner or later.

    • Are we acting too soon, before we've seen how our contributors are actually using AI?

    • If now is the time to act, let's be clear that the rules should be about preserving community bandwidth. AIs can generate words a lot faster than we can consume them, and contributors need to be respectful of that. So I think CONTRIBUTING should have this bit from the Sarama example you linked to:

      If you open a pull request you must be able to clearly explain what your changes do and how they alter the behaviour of Sarama without relying upon AI tools or prompting to roundtrip the reviewer's questions. If you cannot confidently explain and defend your contribution during review, do not submit it until you can.

      And I would add:

      We will close PRs where we suspect the contributor does not understand the code they're contributing without recourse to an AI.

    • I also liked this bit:

      AI assistance may be used when drafting issues, proposals, or discussion posts, but a human must remain fully in the loop and all AI-generated content must be reviewed, fact-checked, and edited before submission. Ensure your prompts steer it to remove unnecessary fluff, verbosity, filler and irrelevant content.

Finally, it's not clear to me that AGENTS.md is really an adopted thing that AIs are likely to respect. Certainly my own conversation with an AI suggested it was an emerging rather than a recognised standard. It suggested symlinking to .cursorrules or .clinerules for the time being.
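
The symlink workaround suggested above is a one-step operation; a sketch, assuming `.cursorrules` and `.clinerules` are still the filenames those tools read:

```shell
# From the repository root: point the tool-specific rule files at the
# shared AGENTS.md. ln -sf succeeds even before AGENTS.md exists,
# leaving a dangling link until the file is added.
ln -sf AGENTS.md .cursorrules   # Cursor
ln -sf AGENTS.md .clinerules    # Cline
readlink .cursorrules .clinerules
```

Git stores symlinks as their target path, so the links commit and clone cleanly on Unix-like systems; Windows checkouts may need `core.symlinks` enabled.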

@k-wall
Member

k-wall commented Mar 9, 2026

I like the suggestions that @tombentley is making under the 2) bullet.

Explicitly requires AI-generated content must not reproduce
copyrighted material and that contributors enable available
controls to reduce that risk. Adds that PRs may be closed where
the contributor does not appear to understand their submission.
Adds matching copyright instruction to AGENTS.md.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
Adds conciseness requirement to both CONTRIBUTING.md and
AGENTS.md. Adds PR review guidance that unfocused or oversized
PRs may be closed and the contributor asked to break them down.
These apply to all contributions regardless of how they were
produced.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
Routine IDE-like AI features (code completion, spelling) do not
require disclosure. Disclosure is expected when AI generates
substantial content such as functions, tests, or documentation.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
Commits should include an `Assisted-by` trailer identifying the tool and model used (e.g. `Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>`).
Most AI coding tools can be configured to add this automatically — see the repository's `AGENTS.md` for details.
Use of AI features in the same way you would use an IDE — code completion, spelling, and the like — does not require disclosure.
Disclosure is expected when AI tools are used to generate substantial content such as functions, tests, documentation, or design approaches.
Member Author


Open question: AI-assisted thinking vs AI-assisted production

One scenario worth considering: a contributor discusses design options with an AI tool but then writes the code and PR themselves, without the AI being directly involved in producing the contribution.

Under this policy, we don't think this requires disclosure. The contributor understood the problem, evaluated the options, and wrote the code — the AI influenced their thinking in much the same way that reading a blog post, discussing ideas with a colleague, or whiteboarding a design would. The policy is concerned with AI tools producing the content of a contribution, not with how a contributor arrived at their ideas.

This also helps clarify the intent behind "played a significant role in producing a contribution" — it's about the production of the submitted content, not about the contributor's broader learning or decision-making process.

Does this reading match others' expectations, or should the policy say something explicit about this distinction?

Member


Does this reading match others' expectations, or should the policy say something explicit about this distinction?

It matches my expectations. No need to say anything explicit.


@rgodfrey rgodfrey left a comment


LGTM

