The most efficient way to implement agentic coding into your stack.
Burning through context and constantly hitting your token limit? Is Claude hallucinating, making poor decisions, and forgetting? Devs today don't know how method to efficiently use of AI for coding.
AI generated code has become an asset as well as a liability. Engineers are no longer spending their time writing code from scratch. They use coding agents to churn out slop, then spend most of their time 1) debugging the slop or 2) studying the anatomy. In order to truly own what you ship, you must keep a human in the loop at every stage. You can do this by manipulating the speed at which AI writes code.
Let's say you have a general idea of what you want to build.
Invoke /grill-me, a skill that tells Claude or Codex to aggressively interrogate you with short questions to align on what the product will look like. For example:
# Claude
USER: Let's create a PRD together. I am going to pass the PRD to a coding agent with no context.
Here is a description of what I want to build: [general product description] /grill-me
CLAUDE: Question 1: Who is the user and what information should be passed into a unique user profile?
USER: [Your Answer]
CLAUDE: Question 2: Which frontend framework is best for this product? My recommendation is React JS for its component-based architecture, which mirrors your need for a rich user experience.
USER: ...
... And Claude keeps prompting until it has enough information to build a product. That means the product will only contain assumptions, calculations, features, and stack that YOU have verified. That keeps a human-in-the-loop.
grill-me skill: https://github.com/mattpocock/skills/tree/main - Matt Pocock
Now you have a PRD.md file, generated by Claude. You now prompt Codec to review and critique the file.
"Bouncing" from one LLM to another ensures ruthless criticism. Whether brainstorming or generating code, bouncing is a HIGHLY effective method for filtering out poor judgement and unreasonable assumptions.
# Codex
USER: I want to build [general product description]. Claude generated this PRD for me. Conduct QA. Critique its production-readiness, the backend logic, the chosen frameworks, and assumptions (can they be cited from literature or credible sources?).
[PRD.md]
CODEX: [review]
Note: it is important to tell Codex that Claude generated the PRD (or vice-versa if you reverse the stack) as it eliminates positive bias.
Now you can pass the review back to Claude.
# Claude
USER: Here is a critique of your PRD generated from Codex: [review]. Walk through each of the points and try to DEFEND your original assumption. Only accept the changes that you cannot defend.
CLAUDE: [PRD_revised.md]
- CODE + BOUNCE
Now you pass on the file to a coding agent-- whichever one you want. Now you have the hang of the pattern: align with agent, bounce to another for review, align, bounce.
# Codex:
USER: Read PRD_revised.md. Before writing any code, list every approved function you will call and confirm you understand which components are retired and why.
CODEX: [verifies alignment, potentially asks clarifying questions]
USER: Now execute.
CODEX: [Codebase]
Too many engineers prompt the agent to code immediately-- that's what produces slop. Testing the agent's understanding in this way ensures alignment once again.
# Claude
USER: Review this codebase and conduct QA. [Codebase] Test a few Validation Cases if available. Flag redundant chunks of code.
Note: AI often generates redundant code, and ruthless criticism is a good way to flag it.
Although this seems like running a lot of inference, this method is by far the most efficient way to ensure that you aren't burning through a million chats to debug.
Fixing low-quality AI-generated code is more wasteful than preemptively aligning and checking logic.
-
Know your token usage with /cost
-
Clear your context before 75k tokens -- that's when LLMs start to forget information at the beginning of your context window = hallucination.
-
Semantic context compression: summarize your context and start over to avoid polluting reasoning with unnecessary information.
In code: "/compact" OR
In chat: tell your LLM to summarize the chat prior, then paste the summary into another chat
Written by me, unassisted with AI.