docs: draft agent-oriented linting paper by danielchen0 · Pull Request #67 · Create-Inc/laint

danielchen0 · 2026-05-17T01:00:36Z

Summary

Drafts an arXiv-style paper for laint around agent-oriented linting for generated JSX/TSX applications. The current draft frames laint as both an expert-curated benchmark and a feedback-loop tool for surfacing framework-specific generated-app failures before slower build, preview, device, or runtime checks.

The PR now includes checked-in raw prompt-grid artifacts, generated result tables, and a repair-loop pilot. The repair results are framed as diagnostic-feedback compliance signals: 476 -> 101 reported findings, 375 net reduction, 445 rule-level findings resolved, and 70 introduced findings across the repair loop. The paper still treats these as raw benchmark signals until human precision/recall labeling and downstream build/runtime/user-acceptance checks are added.
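The four repair-loop numbers above reconcile with one another, and a short check makes the relationship explicit. This is a minimal sketch that uses only the counts quoted in this description; the helper name `remainingFindings` is illustrative and is not part of laint.

```typescript
// Hypothetical helper (not part of laint): reconciles the repair-loop counts
// quoted in the PR summary.
function remainingFindings(initial: number, resolved: number, introduced: number): number {
  // Findings left after repair = starting findings - resolved + newly introduced.
  return initial - resolved + introduced;
}

const initial = 476;    // reported findings before the repair loop
const resolved = 445;   // rule-level findings resolved
const introduced = 70;  // findings introduced during repair

const remaining = remainingFindings(initial, resolved, introduced);
const netReduction = initial - remaining;

console.log({ remaining, netReduction });
```

The net reduction (375) is smaller than the resolved count (445) because the 70 introduced findings offset part of the improvement.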

Verification

npm run lint
npm run build
npm run knip
npm test
npm run paper:tables
make -C paper
Checked paper/main.log for undefined refs/citations and overfull/warning/error lines
Verified PR review thread is resolved and CI is green

Remaining Before Submission

Add final author affiliations/contact metadata
Label findings for precision and recall, or explicitly publish as an unlabeled pilot
Pair laint outcomes with typecheck/build/preview/runtime/user-acceptance labels
Decide whether to keep this as arXiv-only or target a workshop format too

arnavsurve · 2026-05-27T23:48:18Z

+  \item an agent hook, a small integration point that runs after file edits and feeds findings back to the coding agent.
+\end{itemize}
+
+The hook interface is important because it shifts linting from a terminal command a human remembers to run into an automatic part of the agent's edit loop. A finding is not merely a report; it becomes a prompt for the next repair action.


"human remembers" is probably not a great example because we have agents who can run linters now. could probably hinge this on more immediate feedback/faster iteration
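To make the edit-loop framing in the quoted paragraph concrete, here is a minimal sketch of a hook that runs after a file edit and turns findings into feedback for the agent. Everything in it is an assumption for illustration: the `Finding` shape, the `afterEditHook` function, and the toy linter are hypothetical and do not describe laint's actual interface.

```typescript
// Hypothetical types: laint's real finding and hook shapes may differ.
interface Finding {
  rule: string;
  line: number;
  message: string;
  severity: "error" | "warning";
}

type Linter = (filePath: string, source: string) => Finding[];

// Runs right after an agent edits a file and turns findings into repair feedback.
// Returns null when there is nothing to report, so the agent's loop can continue.
function afterEditHook(filePath: string, source: string, lint: Linter): string | null {
  const findings = lint(filePath, source);
  if (findings.length === 0) return null;
  const lines = findings.map(
    (f) => `${filePath}:${f.line} [${f.severity}] ${f.rule}: ${f.message}`,
  );
  return `Fix the following findings before continuing:\n${lines.join("\n")}`;
}

// Toy linter for illustration only: flags fetch() lines in files that never read `.ok`.
const toyLinter: Linter = (_filePath, source) => {
  const findings: Finding[] = [];
  if (source.includes(".ok")) return findings;
  const lines = source.split("\n");
  for (let i = 0; i < lines.length; i++) {
    if (lines[i].includes("fetch(")) {
      findings.push({
        rule: "toy/check-response-ok",
        line: i + 1,
        message: "fetch result is used without checking response.ok",
        severity: "warning",
      });
    }
  }
  return findings;
};

const edited = 'const res = await fetch("/api/items");\nconst items = await res.json();';
const feedback = afterEditHook("app/items.tsx", edited, toyLinter);
console.log(feedback);
```

The point of the sketch is the timing rather than the rule: the finding reaches the agent immediately after the edit that caused it, which is the faster-iteration argument suggested in the comment above.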

arnavsurve · 2026-05-27T23:50:56Z

+
+\section{Motivation}
+
+Generated applications fail in ways that reflect both the target framework and the generator's learned habits. In internal use, many defects were not exotic compiler problems. They were small but consequential choices: using a browser API in a server-rendered module, importing React Native primitives into a web project, omitting a \texttt{response.ok} check, using unsupported animation patterns, or forgetting an Expo-specific layout guard. These problems are easy to fix once identified, but expensive when discovered only after preview, deployment, or user interaction.


feels like this fails to explain why the laint model is better for our target problem than a normal linter

like in theory you could define custom lint rules with eslint to catch these things too. but there's a reason we want to hook on file edit

arnavsurve · 2026-05-27T23:53:02Z

+The current laint implementation contains 55 rules and 59 test files. Table~\ref{tab:categories} summarizes the rule corpus by category. These categories are taken from the \texttt{category} field in each rule's metadata rather than assigned after the fact for the paper. The corpus contains 15 error-level rules and 40 warning-level rules. Seventeen rules are universal, while the remaining rules target Expo, web, backend, or a combination of platforms.
+
+\paragraph{Version pinning.}
+All rule counts and reported benchmark artifacts in this paper are tied to a fixed repository state: \texttt{main} commit \texttt{6a60a0295955ee6cc1d639c88955ea50722e3516}, dated 2026-05-14. The counts are reproducible from checked-in repository artifacts using the \texttt{paper:stats} script documented with the paper source, and the archived run artifacts include metadata for the runner, prompt IDs, model aliases, model IDs, and token or repair-turn limits. Future benchmark reports should cite either an immutable commit hash or a purpose-named git tag so that later rule additions, rule rewrites, or prompt-suite changes do not change the meaning of previously reported results.


what's this future benchmark reports line about? for citations or something? followup research?

arnavsurve · 2026-05-27T23:56:06Z

+\paragraph{Version pinning.}
+All rule counts and reported benchmark artifacts in this paper are tied to a fixed repository state: \texttt{main} commit \texttt{6a60a0295955ee6cc1d639c88955ea50722e3516}, dated 2026-05-14. The counts are reproducible from checked-in repository artifacts using the \texttt{paper:stats} script documented with the paper source, and the archived run artifacts include metadata for the runner, prompt IDs, model aliases, model IDs, and token or repair-turn limits. Future benchmark reports should cite either an immutable commit hash or a purpose-named git tag so that later rule additions, rule rewrites, or prompt-suite changes do not change the meaning of previously reported results.
+
+\begin{table}[ht]


this gets pushed to the next page when compiled btw, might want to try an [h!] or similar to force "here"

might be the [ht] specifier causing it, maybe just try [h] or [h!]

arnavsurve · 2026-05-27T23:58:23Z

+Mobile and platform-compatibility rules prevent generated code from mixing incompatible APIs or violating layout constraints. Examples include checks for web/native import boundaries, Expo image imports, safe-area handling around notches and home indicators, keyboard avoidance around text inputs, and bottom padding for native tab screens. These are common in agent-written code because examples for web and native React are semantically similar but operationally distinct.
+
+\paragraph{Framework conventions.}
+Expo~\cite{expo}, Next.js~\cite{nextjs}, Tailwind, and screen-transition rules encode conventions that are not always enforced by the compiler. Examples include absolute route paths, tab header configuration, animation worklet directives, transition progress ranges, shared-transition tag matching, and animation class restrictions. These are not arbitrary style preferences; they are small framework contracts that generated code often violates while still remaining valid TypeScript.


this prompts me to think that these patterns must already be documented somewhere. perhaps make a case for how laint provides token efficiency by encoding this stuff as rules instead of relying on non-deterministic documentation grepping
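As an illustration of what "encoding" one of these conventions could look like, here is a minimal sketch of a deterministic check for the absolute-route-path convention mentioned in the quoted paragraph. It is not laint's implementation: the function name, the finding shape, and the assumption that navigation goes through Expo Router-style `router.push`, `router.replace`, or `router.navigate` calls are all hypothetical, and a real rule would more likely work on the AST than on a regular expression.

```typescript
// Hypothetical check, not laint's implementation: flags relative paths
// passed to router.push / router.replace / router.navigate.
interface RouteFinding {
  line: number;
  path: string;
}

function findRelativeRoutePaths(source: string): RouteFinding[] {
  // Matches a string literal starting with "./" or "../" as the first argument.
  // Limitation of this sketch: at most one match is reported per line.
  const pattern = /router\.(?:push|replace|navigate)\(\s*["'`](\.{1,2}\/[^"'`]*)["'`]/;
  const findings: RouteFinding[] = [];
  source.split("\n").forEach((text, index) => {
    const match = pattern.exec(text);
    if (match) findings.push({ line: index + 1, path: match[1] ?? "" });
  });
  return findings;
}

const sample = [
  'router.push("/settings/profile");',
  'router.push("./profile");',
  'router.replace("../home");',
].join("\n");

const routeFindings = findRelativeRoutePaths(sample);
console.log(routeFindings);
```

A check like this returns the same answer on every run and costs no model tokens, whereas having the agent search documentation for the same convention does not guarantee either property.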

danielchen0 added 18 commits May 16, 2026 18:00

docs: draft agent-oriented linting paper

b8b6e69

docs: format paper readme

3111272

docs: include mobile app framing in paper

da9695f

docs: describe prompt-to-code eval

4624ac9

docs: add prompt grid eval harness

286679d

docs: add preliminary eval numbers

70afa7b

docs: frame laint as llm benchmark

1be8c14

docs: clarify laint benchmark framing

fba83fd

docs: describe benchmark behavioral signals

d3146e5

docs: add Arnav Surve as paper author

759ddfd

docs: rename paper validity section to limitations

5be1551

docs: clarify local heuristic tradeoff

c72b770

docs: clarify paper terminology

2504637

docs: add recall to detector metrics

22fed69

docs: add f-score detector metric

e162974

docs: pin paper benchmark version

c74d8f0

docs: clarify rule category source

f9a403f

docs: make paper numbers reproducible

fb7af44

arnavsurve reviewed May 17, 2026

Comment thread paper/main.tex

danielchen0 added 7 commits May 16, 2026 22:34

docs: archive full prompt grid artifact

182c08f

docs: highlight edit-time repair loop

0c5d616

docs: add expanded grid data to paper

afa6a41

docs: add generated result tables to paper

77b4992

docs: add repair loop pilot to paper

d0ca77f

docs: frame repair loop as diagnostic compliance

e7da143

docs: tighten benchmark pilot framing

5e7eae4

arnavsurve reviewed May 27, 2026

danielchen0 wants to merge 25 commits into main from paper/agent-oriented-linting
