add Llama Stack quickstart guide and notebook demo#107
Conversation
**Walkthrough**

Adds three new documentation artifacts: a user guide for creating an AI agent with LlamaStack, a sample YAML stack configuration, and a Jupyter quickstart notebook demonstrating server startup, agent/tool setup, streaming, session handling, and a FastAPI chat endpoint.
**Sequence Diagram(s)**

```mermaid
sequenceDiagram
    participant User as User
    participant Notebook as FastAPI\n(Notebook/API)
    participant Agent as Agent
    participant Model as Model\n(LLM)
    participant Tool as Tool\n(get_weather)
    User->>Notebook: POST /chat {message}
    Notebook->>Agent: Start session / enqueue message
    Agent->>Model: Request response (streaming)
    Agent->>Tool: Invoke get_weather(...) if tool required
    Tool-->>Agent: Return tool result
    Model-->>Agent: Stream tokens/results
    Agent-->>Notebook: Stream aggregated response
    Notebook-->>User: Stream partial/final responses
```
**Estimated code review effort:** 🎯 3 (Moderate) | ⏱️ ~20 minutes
**🚥 Pre-merge checks:** ✅ Passed checks (3 passed)
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@docs/public/llama-stack/llama_stack_config.yaml`:
- Around line 1-60: The metadata_store block omits an explicit db_path; add a
db_path entry to metadata_store mirroring the pattern used for vector_io and
files so it reads metadata_store: type: sqlite and db_path:
${env.SQLITE_STORE_DIR:~/.llama/distributions/llama-stack-demo}/registry.db
(update the metadata_store section in the YAML to include this db_path key).
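Concretely, the requested block would look like this (a sketch assembled from the comment; the `db_path` value is the one the comment gives, mirroring the `vector_io` and `files` stores):

```yaml
metadata_store:
  type: sqlite
  # Same env-var default pattern as the vector_io and files stores
  db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/llama-stack-demo}/registry.db
```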
In `@docs/public/llama-stack/llama_stack_quickstart.ipynb`:
- Around line 462-467: Update the notebook metadata kernelspec so the kernel
name and display_name reflect the Llama Stack quickstart (e.g., change
kernelspec.name from "langchain-demo" and kernelspec.display_name from "Python
(langchain-demo)" to a clearer identifier like "llama-stack" and "Python (Llama
Stack)" respectively) by editing the kernelspec block in the notebook metadata.
- Around line 122-148: The docstring for get_weather promises wind speed but the
returned dict only contains city, temperature, and humidity; update the function
to include wind speed by extracting it from the parsed API response (e.g.,
current['windspeedKmph'] or current['windspeedMiles'] depending on desired
units) and add a 'wind_speed' key to the returned dictionary, or alternatively
remove the "wind speed" mention from the docstring to make it match the existing
return value.
- Around line 194-208: Agent creation uses model_id which may be undefined if
the model listing try block failed; move the Agent(...) creation (the Agent
instantiation that references model_id, client, get_weather and instructions)
inside the try block that sets model_id or add an early exit/conditional guard
after the except (e.g., return or raise) so Agent(...) is only called when
model_id is successfully set; ensure you reference the same Agent(...) call and
the model_id assignment to relocate or gate the creation.
🧹 Nitpick comments (2)
docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md (1)
41-44: Consider varying the link descriptions. All four resource links begin with "Llama Stack", which creates repetition. You could vary the wording:
💡 Suggested rewording
```diff
-- [Llama Stack Documentation](https://llamastack.github.io/docs) - The official Llama Stack documentation covering all usage-related topics, API providers, and core concepts.
-- [Llama Stack Core Concepts](https://llamastack.github.io/docs/concepts) - Deep dive into Llama Stack architecture, API stability, and resource management.
-- [Llama Stack GitHub Repository](https://github.com/llamastack/llama-stack) - Source code, example applications, distribution configurations, and how to add new API providers.
-- [Llama Stack Example Apps](https://github.com/llamastack/llama-stack-apps/) - Official examples demonstrating how to use Llama Stack in various scenarios.
+- [Official Documentation](https://llamastack.github.io/docs) - Covers all usage-related topics, API providers, and core concepts.
+- [Core Concepts Guide](https://llamastack.github.io/docs/concepts) - Deep dive into architecture, API stability, and resource management.
+- [GitHub Repository](https://github.com/llamastack/llama-stack) - Source code, example applications, and distribution configurations.
+- [Example Applications](https://github.com/llamastack/llama-stack-apps/) - Official examples demonstrating various use cases.
```

docs/public/llama-stack/llama_stack_quickstart.ipynb (1)
325-343: Consider session management for the chat endpoint. The `/chat` endpoint creates a new session for every request (line 328). For a demo this works, but in production:
- Sessions accumulate without cleanup
- Conversation context is lost between requests
For a production-ready version, consider reusing sessions or implementing session cleanup:
```python
# Option 1: Single shared session (simple approach)
_session_id = None

@api_app.post("/chat")
async def chat(request: ChatRequest):
    global _session_id
    if _session_id is None:
        _session_id = agent.create_session('fastapi-weather-session')
    # ... rest of the code using _session_id
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md
- docs/public/llama-stack/llama_stack_config.yaml
- docs/public/llama-stack/llama_stack_quickstart.ipynb
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-01-13T11:25:34.596Z
Learnt from: jing2uo
Repo: alauda/knowledge PR: 104
File: docs/en/solutions/How_to_Migrate_VirtualMachine_From_VMware.md:131-172
Timestamp: 2026-01-13T11:25:34.596Z
Learning: In VMware migration documentation (docs/en/solutions), when describing the Forklift Operator workflow for VMware, specify that the VMware provider secret should set insecureSkipVerify=true to accommodate self-signed certificates commonly used in enterprise vCenter/ESXi environments. Include a note on the security trade-offs and ensure readers understand this is for internal, controlled environments.
Applied to files:
docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md
📚 Learning: 2025-12-30T08:43:22.275Z
Learnt from: davidwtf
Repo: alauda/knowledge PR: 95
File: docs/public/langchain/langchain_quickstart.ipynb:163-171
Timestamp: 2025-12-30T08:43:22.275Z
Learning: When reviewing LangChain notebooks, validate the agent creation pattern: ensure code uses create_agent(model=..., tools=[...], system_prompt="...") with model being a chat model instance (e.g., ChatOpenAI) and system_prompt provided as a string describing system instructions. This reflects a standard agent construction and can be applied across LangChain notebook files.
Applied to files:
docs/public/llama-stack/llama_stack_quickstart.ipynb
🪛 LanguageTool
docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md
[style] ~43-~43: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... stability, and resource management. - [Llama Stack GitHub Repository](https://github...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~44-~44: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...s, and how to add new API providers. - [Llama Stack Example Apps](https://github.com/...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🪛 Ruff (0.14.11)
docs/public/llama-stack/llama_stack_quickstart.ipynb
42-42: Do not catch blind exception: Exception
(BLE001)
43-43: Use explicit conversion flag
Replace with conversion flag
(RUF010)
59-59: Abstract raise to an inner function
(TRY301)
59-59: Create your own exception
(TRY002)
59-59: Avoid specifying long messages outside the exception class
(TRY003)
73-73: Abstract raise to an inner function
(TRY301)
73-73: Create your own exception
(TRY002)
73-73: Avoid specifying long messages outside the exception class
(TRY003)
78-78: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (3)
docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md (1)
1-44: Documentation is clear and comprehensive. The guide covers all essential aspects: overview, prerequisites, quickstart instructions, and additional resources. The structure is logical and user-friendly.
docs/public/llama-stack/llama_stack_quickstart.ipynb (2)
85-106: LGTM! Import setup is appropriate for notebook environments. The custom path manipulation to include `~/packages` aligns with the `--target ~/packages` flag used in the pip install cell, which is a valid pattern for restricted notebook environments.
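The pattern being endorsed is roughly the following (a sketch, not the notebook's exact cell):

```python
import os
import sys

# Make packages installed via `pip install --target ~/packages` importable
sys.path.insert(0, os.path.expanduser('~/packages'))
```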
358-384: LGTM! Server startup approach is appropriate for a notebook demo. The daemon thread approach with clear comments about production alternatives is well-documented. Using `daemon=True` ensures cleanup when the kernel restarts.
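For context, the daemon-thread startup pattern looks roughly like this (a sketch assuming uvicorn and the notebook's `api_app`; host and port here are illustrative):

```python
import threading

import uvicorn

def run_server():
    # Serve the FastAPI app defined earlier in the notebook
    uvicorn.run(api_app, host='127.0.0.1', port=8000)

# daemon=True means the thread won't block kernel shutdown or restart
threading.Thread(target=run_server, daemon=True).start()
```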
| "except Exception as e:\n", | ||
| " print(f'Failed to get model list: {e}')\n", | ||
| " print('Make sure the server is running')\n", | ||
| "\n", | ||
| "\n", | ||
| "# Create Agent\n", | ||
| "print('Creating Agent...')\n", | ||
| "agent = Agent(\n", | ||
| " client,\n", | ||
| " model=model_id,\n", | ||
| " instructions='You are a helpful weather assistant. When users ask about weather, use the get_weather tool to query weather information, then answer based on the query results.',\n", | ||
| " tools=[get_weather],\n", | ||
| ")\n", | ||
| "\n", | ||
| "print('Agent created successfully')" |
Agent creation may fail if model listing failed.
The agent creation at lines 199-208 uses `model_id`, which is only defined inside the try block (line 191). If the model listing fails, `model_id` will be undefined and agent creation will raise a `NameError`.
🔧 Suggested fix: Move agent creation inside the try block or add early exit
```diff
 except Exception as e:
     print(f'Failed to get model list: {e}')
     print('Make sure the server is running')
+    raise  # Re-raise to prevent subsequent cells from failing

 # Create Agent
```

Or wrap agent creation in a conditional:
```diff
+if 'model_id' in dir():
     # Create Agent
     print('Creating Agent...')
     agent = Agent(
         client,
         model=model_id,
         instructions='You are a helpful weather assistant...',
         tools=[get_weather],
     )
     print('Agent created successfully')
+else:
+    print('Skipping agent creation - no model available')
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "except Exception as e:\n", | |
| " print(f'Failed to get model list: {e}')\n", | |
| " print('Make sure the server is running')\n", | |
| "\n", | |
| "\n", | |
| "# Create Agent\n", | |
| "print('Creating Agent...')\n", | |
| "agent = Agent(\n", | |
| " client,\n", | |
| " model=model_id,\n", | |
| " instructions='You are a helpful weather assistant. When users ask about weather, use the get_weather tool to query weather information, then answer based on the query results.',\n", | |
| " tools=[get_weather],\n", | |
| ")\n", | |
| "\n", | |
| "print('Agent created successfully')" | |
| except Exception as e: | |
| print(f'Failed to get model list: {e}') | |
| print('Make sure the server is running') | |
| raise # Re-raise to prevent subsequent cells from failing | |
| # Create Agent | |
| print('Creating Agent...') | |
| agent = Agent( | |
| client, | |
| model=model_id, | |
| instructions='You are a helpful weather assistant. When users ask about weather, use the get_weather tool to query weather information, then answer based on the query results.', | |
| tools=[get_weather], | |
| ) | |
| print('Agent created successfully') |
🤖 Prompt for AI Agents
In `@docs/public/llama-stack/llama_stack_quickstart.ipynb` around lines 194 - 208,
Agent creation uses model_id which may be undefined if the model listing try
block failed; move the Agent(...) creation (the Agent instantiation that
references model_id, client, get_weather and instructions) inside the try block
that sets model_id or add an early exit/conditional guard after the except
(e.g., return or raise) so Agent(...) is only called when model_id is
successfully set; ensure you reference the same Agent(...) call and the model_id
assignment to relocate or gate the creation.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md`:
- Around line 68-70: Remove the leading space in the environment variable value
for VLLM_URL so the provider URL is correct; locate the VLLM_URL entry (the
name: VLLM_URL and its value field) and change the value from "
https://api.deepseek.com/v1" to "https://api.deepseek.com/v1" (no leading
whitespace).
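The corrected entry (URL taken verbatim from the comment):

```yaml
- name: VLLM_URL
  value: "https://api.deepseek.com/v1"  # no leading whitespace
```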
🧹 Nitpick comments (2)
docs/public/llama-stack/llama_stack_quickstart.ipynb (2)
129-148: Docstring promises "description" but the return payload omits it. Either remove "description" from the docstring or include it in the returned dict to avoid confusing tool consumers.
♻️ Proposed fix (add description)
```diff
-    return {
-        'city': city,
-        'temperature': f"{current['temp_C']}°C",
-        'humidity': f"{current['humidity']}%",
-    }
+    description = current.get('weatherDesc', [{}])[0].get('value', '')
+    return {
+        'city': city,
+        'temperature': f"{current['temp_C']}°C",
+        'description': description,
+        'humidity': f"{current['humidity']}%",
+    }
```
199-202: Prefer bare `raise` to preserve traceback. `raise e` resets the traceback and makes debugging harder.
♻️ Proposed fix
```diff
-    raise e
+    raise
```
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md`:
- Around line 68-73: Replace the inline VLLM_API_TOKEN value with a Kubernetes
Secret reference instead of plaintext: update the env var entry for
VLLM_API_TOKEN in the LlamaStackDistribution CRD/spec so it uses
valueFrom.secretKeyRef (referencing your Secret name and key) rather than
setting value: XXX; ensure the Secret contains the token and that the container
spec (where VLLM_URL, VLLM_MAX_TOKENS, and VLLM_API_TOKEN are defined)
references that secret via valueFrom.secretKeyRef to securely inject the token
at runtime.
In `@docs/public/llama-stack/llama_stack_quickstart.ipynb`:
- Around line 128-150: The get_weather function builds a wttr.in URL using the
raw city string which breaks for spaces and non-ASCII characters; update
get_weather to percent-encode the city before interpolating into url (e.g., use
urllib.parse.quote or quote_plus or pass city as a query/path-encoded parameter)
so city names like "New York" and Unicode names are valid; ensure the encoding
is applied to the city variable used in the url =
f'https://wttr.in/{city}?format=j1' construction and keep the existing
timeout/response handling.
- Around line 331-349: The endpoint is defined as async but calls blocking
functions (agent.create_turn and AgentEventLogger.log); change the FastAPI route
handler from "async def chat(request: ChatRequest)" to a synchronous "def
chat(request: ChatRequest)" so FastAPI runs it in a threadpool, keep the body
logic the same (call agent.create_turn(...) and iterate
logger.log(response_stream) directly) and remove any awaits or async-only
constructs; ensure the decorator remains `@api_app.post`("/chat") and the function
name chat, and keep returning the {"response": full_response} dict.
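Putting the sync-endpoint item above together, the handler would look roughly like this (a sketch; `api_app`, `agent`, `ChatRequest`, and `AgentEventLogger` come from earlier notebook cells, and the logged event objects are assumed to expose `.content`):

```python
@api_app.post("/chat")
def chat(request: ChatRequest):
    """Chat endpoint using the Llama Stack Agent (sync, so FastAPI runs it in a threadpool)"""
    session_id = agent.create_session('fastapi-weather-session')
    response_stream = agent.create_turn(
        messages=[{'role': 'user', 'content': request.message}],
        session_id=session_id,
        stream=True,
    )
    full_response = ''
    for log in AgentEventLogger().log(response_stream):
        if log.content:  # assumed attribute on the logged events
            full_response += log.content
    return {"response": full_response}
```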
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md`:
- Around line 116-120: The extraction step uses tar with --strip-components=1
into ~/python312 but doesn't ensure the target directory exists; update the
documentation step that currently shows "tar -xzf /tmp/python312.tar.gz -C
~/python312 --strip-components=1" to create the directory first (use mkdir -p
~/python312) before running the tar command so extraction won't fail.
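That is, the documented step becomes:

```bash
mkdir -p ~/python312
tar -xzf /tmp/python312.tar.gz -C ~/python312 --strip-components=1
```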
♻️ Duplicate comments (2)
docs/public/llama-stack/llama_stack_quickstart.ipynb (2)
137-139: URL-encode the city parameter to handle spaces and unicode characters. City names like "New York" or non-ASCII names will produce invalid URLs. Use `urllib.parse.quote` to encode the city before interpolating into the URL.
🔧 Proposed fix
```diff
+    from urllib.parse import quote
-    url = f'https://wttr.in/{city}?format=j1'
+    url = f'https://wttr.in/{quote(city)}?format=j1'
```
331-349: Use a sync endpoint instead of `async` for blocking I/O. `agent.create_session()`, `agent.create_turn()`, and `AgentEventLogger.log()` are synchronous blocking calls. Using `async def` here blocks the event loop and prevents concurrent request handling. Change to a sync `def` endpoint; FastAPI will automatically run it in a threadpool.
🔧 Proposed fix
```diff
 @api_app.post("/chat")
-async def chat(request: ChatRequest):
+def chat(request: ChatRequest):
     """Chat endpoint that uses the Llama Stack Agent"""
```