Conversation
📝 WalkthroughWalkthroughA new evaluation summarization feature is introduced that leverages OpenAI to generate concise summaries of course evaluation narratives. This includes a GitHub Actions workflow for automated summarization, configuration files for development and production modes, extended argument parsing to accept API credentials, a new asynchronous summarization module with concurrent request handling, integration into the main flow, and a new OpenAI dependency. Changes
Sequence DiagramsequenceDiagram
participant GHA as GitHub Actions
participant Main as main.py
participant Crawler as start_crawl()
participant Summarizer as summarize_evals()
participant FS as File System
participant OpenAI as OpenAI API
participant DataRepo as data repository
GHA->>GHA: Checkout code & data repos
GHA->>GHA: Setup Python 3.12 & uv
GHA->>Main: Run with seasons input & OPENAI_API_KEY
Main->>Crawler: Call start_crawl(args)
Crawler->>FS: Load course data
Crawler->>FS: Crawl season data
Crawler->>Main: Return seasons: list[str]
Main->>Main: Assert openai_api_key provided
Main->>Summarizer: Call summarize_evals(seasons, data_dir, api_key)
loop For each season
Summarizer->>FS: Load parsed_evaluations/{season}.json
Summarizer->>FS: Load existing evaluation_summaries/{season}.json
loop For each course with narratives
Summarizer->>OpenAI: Concurrent requests (max 10) to summarize comments
OpenAI-->>Summarizer: Summary responses
end
Summarizer->>FS: Merge & write evaluation_summaries/{season}.json
end
Summarizer-->>Main: Completion
GHA->>FS: Collect evaluation_summaries/*.json
GHA->>DataRepo: Commit generated summaries via git-auto-commit-action
Estimated Code Review Effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Tip Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord. Comment |
| runs-on: ubuntu-latest | ||
| timeout-minutes: 120 | ||
| steps: | ||
| - uses: actions/checkout@v4 | ||
|
|
||
| - uses: actions/checkout@v4 | ||
| with: | ||
| repository: coursetable/ferry-data | ||
| path: data | ||
| ssh-key: ${{ secrets.REPO_SSH_KEY }} | ||
| fetch-depth: 1 | ||
|
|
||
| - name: Install Packages | ||
| run: | | ||
| sudo apt-get update | ||
| sudo apt-get -y install gcc g++ musl-dev libffi-dev libpq-dev libhdf5-dev python3-tables graphviz libgraphviz-dev | ||
| - name: Set up Python 3.12 | ||
| uses: actions/setup-python@v4 | ||
| with: | ||
| python-version: "3.12" | ||
|
|
||
| - name: Install uv and dependencies | ||
| run: | | ||
| curl -LsSf https://astral.sh/uv/install.sh | sh | ||
| uv venv .venv | ||
| echo "VIRTUAL_ENV=.venv" >> $GITHUB_ENV | ||
| echo "$PWD/.venv/bin" >> $GITHUB_PATH | ||
| uv pip install -e . | ||
| - name: Summarize Evaluations | ||
| env: | ||
| OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} | ||
| run: | | ||
| python main.py \ | ||
| -f config/release_summarize_evals.yml \ | ||
| --seasons ${{ inputs.seasons }} \ | ||
| --sentry-url "${{ secrets.SENTRY_URL }}" | ||
| - name: Commit Evaluation Summaries | ||
| uses: stefanzweifel/git-auto-commit-action@v5 | ||
| with: | ||
| commit_message: "feat: summarize evals for ${{ inputs.seasons }}" | ||
| commit_user_name: "Ferry Bot" | ||
| commit_user_email: "coursetable.at.yale@gmail.com" | ||
| commit_author: course-table <course-table@users.noreply.github.com> | ||
| repository: data | ||
| file_pattern: "evaluation_summaries/*.json" | ||
| skip_dirty_check: false |
Check warning
Code scanning / CodeQL
Workflow does not contain permissions Medium
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 5 days ago
In general, the fix is to explicitly declare GITHUB_TOKEN permissions in the workflow, granting only what the job requires. This workflow needs to read and write repository contents (for actions/checkout and git-auto-commit-action), and it doesn’t appear to require any other scopes like pull-requests, issues, or packages. The simplest least-privilege configuration at the job level is therefore permissions: contents: write.
The best minimally invasive fix without changing functionality is to add a permissions block under the summarize_evals job (the job flagged by CodeQL). This keeps the scope limited to this job and does not affect other workflows. Concretely, in .github/workflows/summarize-evals.yml, add:
jobs:
summarize_evals:
runs-on: ubuntu-latest
permissions:
contents: write
timeout-minutes: 120
...No new imports, methods, or other definitions are required because this is a YAML workflow configuration change only.
| @@ -10,6 +10,8 @@ | ||
| jobs: | ||
| summarize_evals: | ||
| runs-on: ubuntu-latest | ||
| permissions: | ||
| contents: write | ||
| timeout-minutes: 120 | ||
| steps: | ||
| - uses: actions/checkout@v4 |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In @.github/workflows/summarize-evals.yml:
- Around line 29-32: Update the GitHub Actions step that currently uses
actions/setup-python@v4 to the recommended v5; locate the workflow step with the
name "Set up Python 3.12" and change the uses value from actions/setup-python@v4
to actions/setup-python@v5 while keeping the existing python-version: "3.12"
configuration intact.
- Around line 42-49: The "Summarize Evaluations" workflow step currently
interpolates `${{ inputs.seasons }}` directly into the shell which risks command
injection; change the step to set an environment variable (e.g., SEASONS) from
`${{ inputs.seasons }}` and then call python main.py with --seasons "$SEASONS"
(preserving other envs like OPENAI_API_KEY and SENTRY_URL), ensuring the value
is passed as a single quoted/escaped argument instead of being expanded into the
shell command.
- Around line 10-13: Add an explicit least-privilege permissions block to the
summarize_evals job to avoid default broad permissions; inside the job
definition (jobs -> summarize_evals) add a permissions mapping (for example at
minimum permissions: contents: read) or other exact scopes the job requires so
the workflow no longer relies on default permissions.
In `@ferry/summarize/summarize_evals.py`:
- Around line 83-94: The code assumes response.choices[0].message.content is
always a string and calls .strip(), which can raise AttributeError if content is
None; update the return logic in summarize_evals.py (the block using
openai_client.chat.completions.create and returning
response.choices[0].message.content.strip()) to safely handle None by checking
the value (e.g., assign content = response.choices[0].message.content or "" /
fallback text) and then call .strip() on the confirmed string, optionally
logging or raising a clear error if content is unexpectedly missing.
🧹 Nitpick comments (3)
ferry/summarize/summarize_evals.py (1)
14-14: Unused import:ujson.The
ujsonimport appears unused in this module. JSON operations are handled by the importedload_cache_jsonandsave_cache_jsonfunctions fromferry.crawler.cache.♻️ Proposed fix
-import ujsonmain.py (1)
136-142: Consider usingValueErrorinstead ofassertfor runtime validation.Using
assertfor input validation can be bypassed if Python runs with-O(optimizations enabled). AValueErrorprovides a cleaner user experience and is more robust.♻️ Proposed fix
if args.summarize_evals: - assert args.openai_api_key, "OpenAI API key is required for --summarize-evals" + if not args.openai_api_key: + raise ValueError("OpenAI API key is required for --summarize-evals") await summarize_evals( seasons=seasons, data_dir=args.data_dir, openai_api_key=args.openai_api_key, )pyproject.toml (1)
30-30: Consider pinning theopenaiversion for consistency and reproducibility.All other dependencies use exact version pinning (e.g.,
==4.13.3), butopenai>=1.0.0uses a minimum version constraint. This inconsistency could lead to non-reproducible builds across environments—if a future OpenAI SDK release introduces behavioral changes, different versions might be installed in different builds. Either pin to a specific version matching your requirements or document the intentional flexibility if loose pinning is necessary for compatibility.
| jobs: | ||
| summarize_evals: | ||
| runs-on: ubuntu-latest | ||
| timeout-minutes: 120 |
There was a problem hiding this comment.
Add explicit permissions block for security.
The workflow lacks a permissions block, which means it uses default (potentially broad) permissions. CodeQL correctly flags this as a security concern.
🛡️ Proposed fix
jobs:
summarize_evals:
runs-on: ubuntu-latest
timeout-minutes: 120
+ permissions:
+ contents: read
steps:📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| jobs: | |
| summarize_evals: | |
| runs-on: ubuntu-latest | |
| timeout-minutes: 120 | |
| jobs: | |
| summarize_evals: | |
| runs-on: ubuntu-latest | |
| timeout-minutes: 120 | |
| permissions: | |
| contents: read |
🤖 Prompt for AI Agents
In @.github/workflows/summarize-evals.yml around lines 10 - 13, Add an explicit
least-privilege permissions block to the summarize_evals job to avoid default
broad permissions; inside the job definition (jobs -> summarize_evals) add a
permissions mapping (for example at minimum permissions: contents: read) or
other exact scopes the job requires so the workflow no longer relies on default
permissions.
| - name: Set up Python 3.12 | ||
| uses: actions/setup-python@v4 | ||
| with: | ||
| python-version: "3.12" |
There was a problem hiding this comment.
Update actions/setup-python to v5.
Static analysis indicates that actions/setup-python@v4 is outdated. Version 5 is the current recommended version.
🔧 Proposed fix
- name: Set up Python 3.12
- uses: actions/setup-python@v4
+ uses: actions/setup-python@v5
with:
python-version: "3.12"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - name: Set up Python 3.12 | |
| uses: actions/setup-python@v4 | |
| with: | |
| python-version: "3.12" | |
| - name: Set up Python 3.12 | |
| uses: actions/setup-python@v5 | |
| with: | |
| python-version: "3.12" |
🧰 Tools
🪛 actionlint (1.7.10)
[error] 30-30: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🤖 Prompt for AI Agents
In @.github/workflows/summarize-evals.yml around lines 29 - 32, Update the
GitHub Actions step that currently uses actions/setup-python@v4 to the
recommended v5; locate the workflow step with the name "Set up Python 3.12" and
change the uses value from actions/setup-python@v4 to actions/setup-python@v5
while keeping the existing python-version: "3.12" configuration intact.
| - name: Summarize Evaluations | ||
| env: | ||
| OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} | ||
| run: | | ||
| python main.py \ | ||
| -f config/release_summarize_evals.yml \ | ||
| --seasons ${{ inputs.seasons }} \ | ||
| --sentry-url "${{ secrets.SENTRY_URL }}" |
There was a problem hiding this comment.
Potential command injection via inputs.seasons.
The ${{ inputs.seasons }} is directly interpolated into the shell command without sanitization. A malicious input like ; rm -rf / could be injected. Use an environment variable to safely pass the input.
🛡️ Proposed fix
- name: Summarize Evaluations
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+ SEASONS: ${{ inputs.seasons }}
run: |
python main.py \
-f config/release_summarize_evals.yml \
- --seasons ${{ inputs.seasons }} \
+ --seasons $SEASONS \
--sentry-url "${{ secrets.SENTRY_URL }}"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - name: Summarize Evaluations | |
| env: | |
| OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} | |
| run: | | |
| python main.py \ | |
| -f config/release_summarize_evals.yml \ | |
| --seasons ${{ inputs.seasons }} \ | |
| --sentry-url "${{ secrets.SENTRY_URL }}" | |
| - name: Summarize Evaluations | |
| env: | |
| OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} | |
| SEASONS: ${{ inputs.seasons }} | |
| run: | | |
| python main.py \ | |
| -f config/release_summarize_evals.yml \ | |
| --seasons $SEASONS \ | |
| --sentry-url "${{ secrets.SENTRY_URL }}" |
🤖 Prompt for AI Agents
In @.github/workflows/summarize-evals.yml around lines 42 - 49, The "Summarize
Evaluations" workflow step currently interpolates `${{ inputs.seasons }}`
directly into the shell which risks command injection; change the step to set an
environment variable (e.g., SEASONS) from `${{ inputs.seasons }}` and then call
python main.py with --seasons "$SEASONS" (preserving other envs like
OPENAI_API_KEY and SENTRY_URL), ensuring the value is passed as a single
quoted/escaped argument instead of being expanded into the shell command.
| async with semaphore: | ||
| response = await openai_client.chat.completions.create( | ||
| model=OPENAI_MODEL, | ||
| messages=[ | ||
| {"role": "system", "content": SYSTEM_PROMPT}, | ||
| {"role": "user", "content": user_content}, | ||
| ], | ||
| temperature=0.3, | ||
| max_tokens=300, | ||
| ) | ||
|
|
||
| return response.choices[0].message.content.strip() |
There was a problem hiding this comment.
Handle potential None content from API response.
response.choices[0].message.content can be None in some edge cases (e.g., if the model declines to respond). Calling .strip() on None would raise an AttributeError.
🛡️ Proposed fix
- return response.choices[0].message.content.strip()
+ content = response.choices[0].message.content
+ return content.strip() if content else ""📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| async with semaphore: | |
| response = await openai_client.chat.completions.create( | |
| model=OPENAI_MODEL, | |
| messages=[ | |
| {"role": "system", "content": SYSTEM_PROMPT}, | |
| {"role": "user", "content": user_content}, | |
| ], | |
| temperature=0.3, | |
| max_tokens=300, | |
| ) | |
| return response.choices[0].message.content.strip() | |
| async with semaphore: | |
| response = await openai_client.chat.completions.create( | |
| model=OPENAI_MODEL, | |
| messages=[ | |
| {"role": "system", "content": SYSTEM_PROMPT}, | |
| {"role": "user", "content": user_content}, | |
| ], | |
| temperature=0.3, | |
| max_tokens=300, | |
| ) | |
| content = response.choices[0].message.content | |
| return content.strip() if content else "" |
🤖 Prompt for AI Agents
In `@ferry/summarize/summarize_evals.py` around lines 83 - 94, The code assumes
response.choices[0].message.content is always a string and calls .strip(), which
can raise AttributeError if content is None; update the return logic in
summarize_evals.py (the block using openai_client.chat.completions.create and
returning response.choices[0].message.content.strip()) to safely handle None by
checking the value (e.g., assign content = response.choices[0].message.content
or "" / fallback text) and then call .strip() on the confirmed string,
optionally logging or raising a clear error if content is unexpectedly missing.
Summary by CodeRabbit
Release Notes
--summarize-evalsand--openai-api-key) to enable and configure summarization.