165 changes: 86 additions & 79 deletions AGENTS.md
@@ -1,94 +1,101 @@
# Agent Development Guide

This document guides AI agents working on the Agora Conversational AI demo project.
This guide is for coding agents making changes in `agent-quickstart-python`.

## Project Overview
## Start Here

A real-time voice conversation application with AI agents, built with:
- **Frontend**: Next.js 16 + React 19 + TypeScript + Agora Web SDK
- **Backend**: Python FastAPI + Agora Conversational AI Agent SDK
- Read [README.md](./README.md) for setup, supported run modes, and verification.
- Use [ARCHITECTURE.md](./ARCHITECTURE.md) for system-level request flow.
- Use module guides only when working inside that module:
  - [web-client/AGENTS.md](./web-client/AGENTS.md)
  - [server-python/AGENTS.md](./server-python/AGENTS.md)

## Project Structure
## Current System Shape

```
.
├── web-client/ # Frontend application (Next.js + React)
└── server-python/ # Backend service (FastAPI + Agora Agent SDK)
```
- Frontend: Next.js 16, React 19, TypeScript, `agora-rtc-react`, `agora-rtm`, `agora-agent-client-toolkit`, `agora-agent-uikit`
- Local backend: Python FastAPI in `server-python`
- Deployed web backend: Next route handlers in `web-client/app/api`
- Auth: Token007 generated from `AGORA_APP_ID` and `AGORA_APP_CERTIFICATE`
- Default agent config: managed Deepgram STT, OpenAI LLM, and MiniMax TTS

## Supported Modes

### Local Python-Backed Development

- Run from the repo root with `bun run dev`
- Root scripts start:
  - FastAPI on `http://localhost:8000`
  - Next.js on `http://localhost:3000`
- In this mode, the web app still calls `/api/*`, but the Next route handlers proxy to the Python service through `AGENT_BACKEND_URL=http://localhost:8000`
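The route-local proxy decision described above can be sketched as a small pure function. This is only an illustration of the logic, written in Python for brevity; the real handlers live in `web-client/app/api` and are TypeScript. The function name `resolve_api_target` is hypothetical, and the assumption that the FastAPI service exposes the same paths without the `/api` prefix is inferred from the endpoint table in `ARCHITECTURE.md`.

```python
import os
from typing import Mapping, Optional


def resolve_api_target(path: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the upstream URL to proxy to, or None to handle the request in-process."""
    backend = env.get("AGENT_BACKEND_URL", "").strip()
    if not backend:
        # Single-target web deployment: no separate Python service is assumed.
        return None
    # Assumption: the Python service serves the same path without the /api prefix.
    upstream_path = path[len("/api"):] if path.startswith("/api/") else path
    return backend.rstrip("/") + upstream_path


local_env = {"AGENT_BACKEND_URL": "http://localhost:8000"}
print(resolve_api_target("/api/get_config", local_env))  # http://localhost:8000/get_config
print(resolve_api_target("/api/get_config", {}))         # None
```

Keeping the decision in one place like this makes it easier to keep local mode and deployed mode aligned: the caller always hits `/api/*`, and only the environment decides who answers.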

### Single-Target Web Deployment

- Deploy `web-client` as a Next.js app
- `/api/get_config`, `/api/v2/startAgent`, and `/api/v2/stopAgent` run inside the Next app
- Do not assume a separate Python service exists in this mode

## Routing Ownership

- UI and RTC/RTM client lifecycle live in `web-client`
- `/api/*` entrypoints for the web app live in `web-client/app/api`
- Python agent lifecycle logic lives in `server-python/src`
- For deployability changes, update both the README and architecture docs if the owner of `/api/*` changes

## Key Files

- `README.md`: setup, local vs deploy modes, troubleshooting, verification
- `ARCHITECTURE.md`: top-level environment model
- `web-client/src/components/app.tsx`: conversation UI shell
- `web-client/src/hooks/useAgoraConnection.ts`: RTC, RTM, transcript, and token renewal lifecycle
- `web-client/src/lib/server/agora.ts`: shared server-side token and agent helpers for Next route handlers
- `server-python/src/server.py`: FastAPI entrypoints
- `server-python/src/agent.py`: async Agora agent lifecycle wrapper

## Working Rules

## Quick Start
- Prefer the smallest change that keeps local mode and deployed mode aligned.
- Do not reintroduce `web-client/proxy.ts`; the current proxy fallback is route-local through `AGENT_BACKEND_URL`.
- Do not assume Zustand or a separate client-side store exists.
- Do not require third-party vendor API keys unless the code actually introduces a non-managed path.
- Keep token expiry and renewal behavior aligned across the Python backend and Next route handlers.

## Standard Commands

From the repo root:

```bash
# Install dependencies
bun install

# Start both frontend and backend
bun run doctor
bun run doctor:local
bun run dev
bun run verify
bun run verify:local
```

Useful narrower checks:

```bash
bun run verify:web
bun run verify:local:fastapi
bun run verify:web:proxy
bun run verify:backend
```

# Frontend only (port 3000)
bun run frontend
Inside `web-client/`, use:

# Backend only (port 8000)
bun run backend
```bash
bun run doctor
bun run verify
```

## Module-Specific Guides

### Frontend (web-client/)
- [web-client/AGENTS.md](./web-client/AGENTS.md) — AI assistant guide for frontend development
- [web-client/ARCHITECTURE.md](./web-client/ARCHITECTURE.md) — Detailed frontend architecture

### Backend (server-python/)
- [server-python/AGENTS.md](./server-python/AGENTS.md) — AI assistant guide for backend development
- [server-python/ARCHITECTURE.md](./server-python/ARCHITECTURE.md) — Backend architecture details
- [server-python/README.md](./server-python/README.md) — Backend API documentation

### System Architecture
- [ARCHITECTURE.md](./ARCHITECTURE.md) — Overall system architecture and data flow

## Key Technologies

| Layer | Technologies |
|-------|-------------|
| Frontend | Next.js 16, React 19, TypeScript, Agora Web SDK (RTC + RTM), agora-agent-client-toolkit, Zustand, Tailwind CSS |
| Backend | Python 3.8+, FastAPI, agora-agent-server-sdk, uvicorn |
| Auth | Token007 (AccessToken2) — auto-generated from APP_ID + APP_CERTIFICATE |
| Real-time | Agora RTC (audio) + RTM (messaging/transcription) |
| AI Providers | Deepgram (ASR), OpenAI (LLM), ElevenLabs (TTS) |

## Common Development Tasks

### Working on Frontend
See [web-client/AGENTS.md](./web-client/AGENTS.md) for:
- UI component development
- State management patterns (Zustand)
- Agora SDK integration (RTC/RTM)
- API client usage

### Working on Backend
See [server-python/AGENTS.md](./server-python/AGENTS.md) for:
- API endpoint development
- Agent lifecycle management (start/stop via AgentSession)
- Token generation (`generate_convo_ai_token`)
- ASR/LLM/TTS provider configuration

### Cross-Module Changes
1. Review [ARCHITECTURE.md](./ARCHITECTURE.md) for system overview and data flow
2. Check both module-specific AGENTS.md files
3. Verify API contracts — frontend calls `/api/*`, proxied to backend on port 8000
4. Test token flow: backend generates Token007, frontend uses it for RTC/RTM

## Important Notes

- Never commit `.env.local` or credentials
- Frontend proxies `/api/*` requests to backend via `web-client/proxy.ts`
- Agent lifecycle is managed by backend (AgentSession), not frontend
- All Agora SDK calls go through `useAgoraConnection.ts` hook on the frontend
- Authentication uses Token007 (AccessToken2) — only `APP_ID` and `APP_CERTIFICATE` are needed
- Backend uses `Agora(area=Area.US, ...)` client with auto Token007 auth

## Reference Documentation

- [Agora Conversational AI Docs](https://docs.agora.io/en/conversational-ai/overview)
- [Next.js Docs](https://nextjs.org/docs)
- [FastAPI Docs](https://fastapi.tiangolo.com/)
## Done Criteria

Before finishing a change:

1. Run the narrowest relevant verification command.
2. If the change affects the deployable web app, ensure `bun run verify:web` passes.
3. If the change affects local Python-backed development, ensure `bun run verify:local` or the narrower `bun run verify:local:fastapi` / `bun run verify:web:proxy` / `bun run verify:backend` commands pass as appropriate.
4. Treat `server-python/.env.local` as CLI-managed by default. If you change required env vars or setup steps, update both the root README and the module README.
5. Update `README.md` or architecture docs when the developer workflow or request flow changes.

`bun run verify:local:fastapi` exercises the real FastAPI route layer through Next, but with a fake agent implementation so the check stays deterministic and does not depend on a live managed-agent start.
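A fake agent of the kind mentioned above might look roughly like the following. This is a minimal sketch, not the repository's actual test double: the class name `FakeAgent`, its method signatures, and the shape of the returned `agent_id` are all assumptions made for illustration.

```python
import asyncio
from typing import Dict


class FakeAgent:
    """Hypothetical stand-in for the real agent lifecycle wrapper."""

    def __init__(self) -> None:
        self.active: Dict[str, str] = {}  # agent_id -> channel name

    async def start(self, channel_name: str, user_uid: str) -> str:
        # Deterministic ID: no network call, no managed-agent start.
        agent_id = f"fake-agent-{channel_name}-{user_uid}"
        self.active[agent_id] = channel_name
        return agent_id

    async def stop(self, agent_id: str) -> bool:
        # True if the agent was running, False if it was already gone.
        return self.active.pop(agent_id, None) is not None


async def demo() -> None:
    agent = FakeAgent()
    agent_id = await agent.start("demo-channel", "1001")
    print(agent_id)                    # fake-agent-demo-channel-1001
    print(await agent.stop(agent_id))  # True
    print(await agent.stop(agent_id))  # False


asyncio.run(demo())
```

Because the fake never leaves the process, a check built on it can exercise routing and request validation without credentials or a live agent.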
88 changes: 48 additions & 40 deletions ARCHITECTURE.md
@@ -1,81 +1,89 @@
# Agora Conversational AI Demo — Architecture

## System Architecture
This quickstart supports two runtime environments. The UI is the same in both modes, but the owner of `/api/*` changes by environment.

## Local Python-Backed Development

```
┌─────────────────────────────────────────────────────────────┐
│ Frontend │
│ Next.js 16 + React 19 + TypeScript + Agora Web SDK │
│ (Port 3000) │
└──────────────────┬──────────────────────────────────────────┘
│ /api/* proxy (proxy.ts)
┌─────────────────────────────────────────────────────────────┐
│ Backend │
│ Python FastAPI + Agora Agent SDK │
│ (Port 8000) │
└──────────────────┬──────────────────────────────────────────┘
│ REST API (Token007 auth)
┌─────────────────────────────────────────────────────────────┐
│ Agora Cloud Services │
│ • RTC (Real-Time Communication — audio) │
│ • RTM (Real-Time Messaging — subtitles/transcription) │
│ • Conversational AI Engine (ASR + LLM + TTS) │
└─────────────────────────────────────────────────────────────┘
Browser
Next.js app on :3000
/api/* route handlers proxy through AGENT_BACKEND_URL
FastAPI service on :8000
Agora Cloud Services
```

## Data Flow
- `web-client` owns the browser UI and the `/api/*` entrypoints
- `server-python` owns the actual token generation and agent start/stop logic
- This is the mode used by `bun run dev`

## Single-Target Web Deployment

```
Browser
Next.js app
/api/* route handlers run in-process
Agora Cloud Services
```

- `web-client` owns both the UI and the deployed `/api/*` implementation
- `server-python` is not required for this deployment path

## Shared Conversation Flow

### 1. Connection

```
User clicks "Start"
→ Frontend: GET /api/get_config
→ Backend: generate_convo_ai_token(app_id, app_certificate, channel, account)
→ Frontend: Join RTC channel + Login RTM with token
Frontend: GET /api/get_config
→ Generate Token007 config for a user UID, agent UID, and channel
→ Frontend joins RTC and logs into RTM
```

### 2. Agent Start

```
Frontend: POST /api/v2/startAgent { channelName, rtcUid, userUid }
Backend: Build AgoraAgent (Deepgram ASR + OpenAI LLM + ElevenLabs TTS)
Backend: session.start() → agent_id
Agent joins RTC channel → Frontend receives audio + RTM subtitles
→ Build agent session
Scope remote_uids to the requesting user
Start session and return agent_id
```
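The "scope `remote_uids` to the requesting user" step can be illustrated with a small helper. This is a sketch under assumptions: the request field names come from the `startAgent` payload shown above, while the function name `scoped_remote_uids` and the choice to treat UIDs as strings are hypothetical.

```python
from typing import Any, Dict, List


def scoped_remote_uids(start_request: Dict[str, Any]) -> List[str]:
    """Limit the agent's subscription to the user who requested it."""
    user_uid = str(start_request.get("userUid", "")).strip()
    if not user_uid:
        raise ValueError("userUid is required to scope the agent session")
    return [user_uid]


request_body = {"channelName": "demo-channel", "rtcUid": "2001", "userUid": "1001"}
print(scoped_remote_uids(request_body))  # ['1001']
```

Rejecting a missing `userUid` up front, rather than falling back to a wider subscription, keeps the agent from listening to other participants by accident.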

### 3. Conversation

```
User speaks → RTC audio → Agora Cloud
→ Deepgram (ASR): audio → text
→ OpenAI (LLM): text → response
→ ElevenLabs (TTS): response → audio
→ RTC audio + RTM subtitles → Frontend
User audio → RTC
→ Managed ASR, LLM, and TTS pipeline
→ Agent audio + RTM transcript events
→ UIKit transcript and visualizer in the web app
```

### 4. Agent Stop

```
Frontend: POST /api/v2/stopAgent { agentId }
Backend: session.stop()
Agent leaves channel → Frontend cleanup
Stop session directly or through stateless fallback
Client cleans up RTC and RTM state
```
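One possible reading of "stop session directly or through stateless fallback" is sketched below: if the process still holds a session for the agent, stop it through that handle; otherwise stop by `agent_id` alone. The names `LiveSession`, `stop_agent`, and `stop_by_id` are hypothetical, and the actual fallback mechanism in the repository may differ.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


class LiveSession:
    """Hypothetical in-memory handle for a started agent."""

    def __init__(self) -> None:
        self.stopped = False

    async def stop(self) -> None:
        self.stopped = True


async def stop_agent(
    agent_id: str,
    sessions: Dict[str, LiveSession],
    stop_by_id: Callable[[str], Awaitable[None]],
) -> str:
    """Stop via the live session if we have one, else fall back to the ID alone."""
    session = sessions.pop(agent_id, None)
    if session is not None:
        await session.stop()
        return "direct"
    # Stateless path: this process never saw (or no longer holds) the session.
    await stop_by_id(agent_id)
    return "stateless"


async def demo() -> None:
    stopped_remotely: List[str] = []

    async def fake_stop_by_id(agent_id: str) -> None:
        stopped_remotely.append(agent_id)

    sessions = {"agent-1": LiveSession()}
    print(await stop_agent("agent-1", sessions, fake_stop_by_id))  # direct
    print(await stop_agent("agent-1", sessions, fake_stop_by_id))  # stateless
    print(stopped_remotely)                                        # ['agent-1']


asyncio.run(demo())
```

A fallback of this shape matters most where in-memory state is not guaranteed to survive between the start and stop requests, such as after a restart.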

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/get_config` | GET | Generate connection config (Token007, channel, UIDs) |
| `/v2/startAgent` | POST | Start AI agent |
| `/v2/stopAgent` | POST | Stop agent by agent_id |
| `/v2/startAgent` | POST | Start the agent session |
| `/v2/stopAgent` | POST | Stop the agent by `agent_id` |

Frontend calls these as `/api/*`, proxied to backend via `web-client/proxy.ts`.
Frontend calls these as `/api/*`. In local Python mode, the Next handlers proxy to `AGENT_BACKEND_URL`; in the single-target web deployment (for example on Vercel) they run in-process inside the Next app.
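From a client's point of view the three calls look the same in both modes. The sketch below only builds the requests with the standard library and does not send them. The payload field names are taken from the flow above; the helper names and the `http://localhost:3000` base URL are assumptions for illustration.

```python
import json
from urllib.request import Request

JSON_HEADERS = {"Content-Type": "application/json"}


def get_config_request(base_url: str) -> Request:
    return Request(f"{base_url}/api/get_config", method="GET")


def start_agent_request(base_url: str, channel_name: str, rtc_uid: str, user_uid: str) -> Request:
    body = json.dumps(
        {"channelName": channel_name, "rtcUid": rtc_uid, "userUid": user_uid}
    ).encode("utf-8")
    return Request(f"{base_url}/api/v2/startAgent", data=body, headers=JSON_HEADERS, method="POST")


def stop_agent_request(base_url: str, agent_id: str) -> Request:
    body = json.dumps({"agentId": agent_id}).encode("utf-8")
    return Request(f"{base_url}/api/v2/stopAgent", data=body, headers=JSON_HEADERS, method="POST")


base = "http://localhost:3000"  # assumed local Next.js origin
for req in (
    get_config_request(base),
    start_agent_request(base, "demo-channel", "2001", "1001"),
    stop_agent_request(base, "agent-1"),
):
    print(req.get_method(), req.full_url)
```

Passing any of these to `urllib.request.urlopen` would send it; the caller never needs to know whether a FastAPI service or an in-process handler answers.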

## Authentication

Token007 (AccessToken2) — generated from `APP_ID` + `APP_CERTIFICATE` only. No API_KEY/API_SECRET needed. The SDK handles token generation and API auth internally.
Token007 (AccessToken2) — generated from `AGORA_APP_ID` + `AGORA_APP_CERTIFICATE` only. No API_KEY/API_SECRET needed. The SDK handles token generation and API auth internally.

## Detailed Documentation
