docs: draft agent-oriented linting paper#67

Draft
danielchen0 wants to merge 25 commits into main from paper/agent-oriented-linting

Collaborator

@danielchen0 danielchen0 commented May 17, 2026

Summary

Drafts an arXiv-style paper on laint and agent-oriented linting for generated JSX/TSX applications. The current draft frames laint as both an expert-curated benchmark and a feedback-loop tool for surfacing framework-specific generated-app failures before slower build, preview, device, or runtime checks.

The PR now includes checked-in raw prompt-grid artifacts, generated result tables, and a repair-loop pilot. The repair results are framed as diagnostic-feedback compliance signals: 476 -> 101 reported findings, 375 net reduction, 445 rule-level findings resolved, and 70 introduced findings across the repair loop. The paper still treats these as raw benchmark signals until human precision/recall labeling and downstream build/runtime/user-acceptance checks are added.
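
The repair-loop figures above are mutually consistent, and a short check makes the relationship explicit. This is only an illustrative sketch: the `RepairSummary` type and both helper functions are hypothetical names, not part of laint, and the accounting identity is an assumption inferred from the reported numbers.

```typescript
// Hypothetical shape for the repair-loop summary; field names are assumptions.
interface RepairSummary {
  before: number; // findings reported before the repair loop
  resolved: number; // rule-level findings resolved during repair
  introduced: number; // new findings introduced during repair
}

// Findings remaining after repair: before - resolved + introduced.
function findingsAfter(s: RepairSummary): number {
  return s.before - s.resolved + s.introduced;
}

// Net reduction in findings: resolved - introduced.
function netReduction(s: RepairSummary): number {
  return s.resolved - s.introduced;
}

const pilot: RepairSummary = { before: 476, resolved: 445, introduced: 70 };
console.log(findingsAfter(pilot)); // 101
console.log(netReduction(pilot)); // 375
```

Reporting resolved and introduced findings separately, rather than only the net figure, keeps regressions visible even when the total goes down.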

Verification

  • npm run lint
  • npm run build
  • npm run knip
  • npm test
  • npm run paper:tables
  • make -C paper
  • Checked paper/main.log for undefined refs/citations and overfull/warning/error lines
  • Verified PR review thread is resolved and CI is green

Remaining Before Submission

  • Add final author affiliations/contact metadata
  • Label findings for precision and recall, or explicitly publish as an unlabeled pilot
  • Pair laint outcomes with typecheck/build/preview/runtime/user-acceptance labels
  • Decide whether to keep this as arXiv-only or target a workshop format too

Comment thread paper/main.tex
\item an agent hook, a small integration point that runs after file edits and feeds findings back to the coding agent.
\end{itemize}

The hook interface is important because it shifts linting from a terminal command a human remembers to run into an automatic part of the agent's edit loop. A finding is not merely a report; it becomes a prompt for the next repair action.
Contributor

"human remembers" is probably not a great example because we have agents who can run linters now. could probably hinge this on more immediate feedback/faster iteration
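
For readers of this thread, the hook idea under discussion can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the `Finding` shape, the `Linter` type, and the `onFileEdited` and `formatFeedback` functions are hypothetical and do not describe laint's actual hook API.

```typescript
// Hypothetical finding shape; laint's real output format may differ.
interface Finding {
  ruleId: string;
  file: string;
  line: number;
  message: string;
  severity: "error" | "warning";
}

// Any function that maps one file's source to findings can act as the linter.
type Linter = (file: string, source: string) => Finding[];

// Turn findings into a compact message the agent can act on in its next turn.
function formatFeedback(findings: Finding[]): string {
  if (findings.length === 0) return "";
  const lines = findings.map(
    (f) => `${f.file}:${f.line} [${f.severity}] ${f.ruleId}: ${f.message}`
  );
  return `Fix the following findings before continuing:\n${lines.join("\n")}`;
}

// Post-edit hook: lint only the file that was just edited and return feedback.
function onFileEdited(file: string, source: string, lint: Linter): string {
  return formatFeedback(lint(file, source));
}

// Usage with a stub linter standing in for a real rule engine.
const stubLinter: Linter = (file) => [
  { ruleId: "example-rule", file, line: 3, message: "example finding", severity: "warning" },
];
const feedback = onFileEdited("app/index.tsx", "", stubLinter);
console.log(feedback);
```

Scoping the hook to the single edited file is what keeps the feedback immediate: the agent sees findings for the change it just made rather than a whole-project report at the end of a task.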

Comment thread paper/main.tex

\section{Motivation}

Generated applications fail in ways that reflect both the target framework and the generator's learned habits. In internal use, many defects were not exotic compiler problems. They were small but consequential choices: using a browser API in a server-rendered module, importing React Native primitives into a web project, omitting a \texttt{response.ok} check, using unsupported animation patterns, or forgetting an Expo-specific layout guard. These problems are easy to fix once identified, but expensive when discovered only after preview, deployment, or user interaction.
Contributor

feels like this fails to make a distinction of why the laint model is better for our target problem versus a normal linter

like in theory you could define custom lint rules with eslint to catch these things too. but there's a reason we want to hook on file edit
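
To make the distinction concrete: the check itself could live in either tool, so the difference has to come from when and how it runs. Below is a toy version of a missing `response.ok` check, written as a plain function so it is independent of any linter. It is a sketch only; the rule ID `fetch-response-ok` and the `checkFetchResponseOk` function are hypothetical, and a real rule would walk the AST rather than scan text.

```typescript
// Hypothetical finding shape for this toy rule.
interface ToyFinding {
  ruleId: string;
  line: number;
  message: string;
}

// Toy rule: flag fetch() calls in a file that never reads `.ok`.
// Assumption: a plain text scan is good enough for illustration;
// it will miss or over-report cases that an AST-based rule would handle.
function checkFetchResponseOk(source: string): ToyFinding[] {
  // If the file reads `.ok` anywhere, treat the response as checked.
  if (/\.ok\b/.test(source)) return [];
  const findings: ToyFinding[] = [];
  source.split("\n").forEach((text, index) => {
    if (/\bfetch\s*\(/.test(text)) {
      findings.push({
        ruleId: "fetch-response-ok",
        line: index + 1,
        message: "fetch() result is used without checking response.ok",
      });
    }
  });
  return findings;
}

const unchecked = "const r = await fetch(url);\nconst data = await r.json();";
const checked = "const r = await fetch(url);\nif (!r.ok) throw new Error('failed');";
console.log(checkFetchResponseOk(unchecked).length); // 1
console.log(checkFetchResponseOk(checked).length); // 0
```

The same function could be wrapped as a custom rule in a conventional linter or called from an edit hook; the argument for the hook would rest on the trigger and feedback path, not on the rule logic.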

Comment thread paper/main.tex
The current laint implementation contains 55 rules and 59 test files. Table~\ref{tab:categories} summarizes the rule corpus by category. These categories are taken from the \texttt{category} field in each rule's metadata rather than assigned after the fact for the paper. The corpus contains 15 error-level rules and 40 warning-level rules. Seventeen rules are universal, while the remaining rules target Expo, web, backend, or a combination of platforms.

\paragraph{Version pinning.}
All rule counts and reported benchmark artifacts in this paper are tied to a fixed repository state: \texttt{main} commit \texttt{6a60a0295955ee6cc1d639c88955ea50722e3516}, dated 2026-05-14. The counts are reproducible from checked-in repository artifacts using the \texttt{paper:stats} script documented with the paper source, and the archived run artifacts include metadata for the runner, prompt IDs, model aliases, model IDs, and token or repair-turn limits. Future benchmark reports should cite either an immutable commit hash or a purpose-named git tag so that later rule additions, rule rewrites, or prompt-suite changes do not change the meaning of previously reported results.
Contributor

what's this future benchmark reports line about? for citations or something? followup research?

Comment thread paper/main.tex
\begin{table}[ht]
Contributor

@arnavsurve arnavsurve May 27, 2026

this gets pushed to the next page when compiled btw, might want to try an [h!] or similar to force "here"

Contributor

might be [ht] maybe just try [h] or [h!]

Comment thread paper/main.tex
Mobile and platform-compatibility rules prevent generated code from mixing incompatible APIs or violating layout constraints. Examples include checks for web/native import boundaries, Expo image imports, safe-area handling around notches and home indicators, keyboard avoidance around text inputs, and bottom padding for native tab screens. These are common in agent-written code because examples for web and native React are semantically similar but operationally distinct.

\paragraph{Framework conventions.}
Expo~\cite{expo}, Next.js~\cite{nextjs}, Tailwind, and screen-transition rules encode conventions that are not always enforced by the compiler. Examples include absolute route paths, tab header configuration, animation worklet directives, transition progress ranges, shared-transition tag matching, and animation class restrictions. These are not arbitrary style preferences; they are small framework contracts that generated code often violates while still remaining valid TypeScript.
Contributor

@arnavsurve arnavsurve May 27, 2026

this prompts me to think that these patterns must be documented - perhaps make a case how laint provides token efficiency by encoding this stuff instead of relying on non-deterministic documentation grepping
