Expenses are hard to categorize with code alone: keyword rules quickly pile up exceptions. An LLM helps sort those odd charges and handles the exceptions gracefully.
- upload a CSV file
- the LLM categorizes each expense against static categories defined in the code (a holdover from when this was a personal app and code changes were easy)
- display spending by category in a graphic
- annotate expenses by editing the category, adding a note, or flagging them as reimbursable (identifying reimbursable expenses is especially helpful for me)
- each subsequent run injects the saved annotations into the categorization prompt, so the LLM's output gets sharper with every manual correction
- AI Gateway is Vercel's proxy for LLM providers
- BYOK (bring your own key) for Anthropic: configured through the SDK, with the key provided in the Vercel console
- AI Gateway was throttling free-tier usage, so I could not use the standard proxy setup; under normal circumstances, AI Gateway handles fallback to another LLM provider and tracks spend across providers
- use the AI SDK to interact with Vercel's AI proxy
- created a tool called runAnalysis that runs LLM-generated JavaScript to query the database for spending insights
- our server spins up a Vercel Sandbox on demand as an isolated compute instance to run this untrusted code
- the LLM may call the tool repeatedly and get back structured data from each tool call
- insights are tailored to the user's profile and goals, which are entered in the app and saved in the DB
- sandboxes are isolated compute instances specifically designed to execute untrusted code, with all the bells and whistles to support it
- fully serverless, spin them up on demand
- secret injection at egress: LLM-generated code never has access to API keys, because the secrets are injected as the request leaves the instance
- an audit trail, visible in the app, shows what the agent did during its inference
- layer 7 egress policies: our sandbox can only connect to the Postgres instance powering our app, nowhere else
- sandboxes are exactly the kind of compute that will allow agents to securely deliver on their promise for productivity
- LLM-driven categorization automates tasks that cannot be fully handled by a deterministic system. Code only gets us so far, but AI gets us over the finish line for quality outcomes
- at scale or in business, security is either a blocker or, when done appropriately, an enabler: least-privilege access with secret injection and network policies helps IT/Security leadership sleep at night and makes their decision easy when choosing an AI partner
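
The annotation-injection step from the list above (saved corrections feeding the next categorization prompt) can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the `Annotation` shape, the `CATEGORIES` values, and the `buildCategorizationPrompt` helper are all hypothetical names introduced here.

```typescript
// Hypothetical shape of a saved annotation; the app's real schema is not shown here.
interface Annotation {
  description: string;   // raw description from the CSV row
  category: string;      // category the user corrected it to
  note?: string;         // optional free-text note
  reimbursable: boolean; // flagged by the user
}

// Static category list kept in code. The values are assumed examples.
const CATEGORIES = ["Groceries", "Dining", "Travel", "Utilities", "Other"] as const;

// Build the categorization prompt, injecting prior manual annotations
// as examples so that later runs can follow the user's corrections.
function buildCategorizationPrompt(
  descriptions: string[],
  annotations: Annotation[],
): string {
  const lines: string[] = [
    `Categorize each expense into exactly one of: ${CATEGORIES.join(", ")}.`,
  ];

  // Only add the corrections section when there is something to inject.
  if (annotations.length > 0) {
    lines.push("", "Previous manual corrections from the user:");
    for (const a of annotations) {
      const flags = a.reimbursable ? " (reimbursable)" : "";
      const note = a.note ? ` Note: ${a.note}` : "";
      lines.push(`- "${a.description}" -> ${a.category}${flags}.${note}`);
    }
  }

  lines.push("", "Expenses to categorize:");
  descriptions.forEach((d, i) => lines.push(`${i + 1}. ${d}`));
  return lines.join("\n");
}
```

The returned string would then be sent to the model through whatever client the app uses. Keeping the corrections in a separate section of the prompt makes it easy to cap how many are included as the annotation history grows.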
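
To make the egress-policy and secret-injection bullets above more concrete, here is a conceptual sketch of what an egress layer does with each outbound request. It is not the Vercel Sandbox API or its configuration format: `EgressPolicy`, `OutboundRequest`, `applyEgressPolicy`, and the example host names are assumptions made purely for illustration.

```typescript
// Conceptual model only: NOT the Vercel Sandbox API or its config format.
interface EgressPolicy {
  allowedHosts: string[];                // the only hosts the sandbox may reach
  secretsByHost: Record<string, string>; // secret injected per destination host
}

interface OutboundRequest {
  url: string;
  headers: Record<string, string>;
}

type EgressDecision =
  | { allowed: true; request: OutboundRequest }
  | { allowed: false; reason: string };

// Runs outside the sandbox, at the network boundary. The untrusted code
// builds the request without any secret; the boundary either blocks it
// or attaches the credential just before the request leaves.
function applyEgressPolicy(
  req: OutboundRequest,
  policy: EgressPolicy,
): EgressDecision {
  const host = new URL(req.url).hostname;

  // Deny by default: anything not on the allow-list is blocked.
  if (!policy.allowedHosts.includes(host)) {
    return { allowed: false, reason: `egress to ${host} is not allowed` };
  }

  // Inject the secret only for hosts that have one configured.
  const secret = policy.secretsByHost[host];
  const headers: Record<string, string> =
    secret !== undefined
      ? { ...req.headers, Authorization: `Bearer ${secret}` }
      : { ...req.headers };

  return { allowed: true, request: { url: req.url, headers } };
}
```

The point of this design is that the credential lives only in the policy, outside the sandbox, so generated code has nothing to leak even if it misbehaves, and a request to any host other than the allow-listed one is rejected before it leaves.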
This app was largely developed with coding assistants, but the product choices, system design, and (most importantly) the writeup and storytelling were entirely done by me.