
draft: eval summaries #387

Open

reybahl wants to merge 1 commit into master from eval-summaries

Conversation

@reybahl (Member) commented Feb 12, 2026

Summary by CodeRabbit

Release Notes

  • New Features
    • Added automated evaluation summarization to generate concise summaries of course evaluation comments.
    • New command-line options (--summarize-evals and --openai-api-key) to enable and configure summarization.
    • New GitHub Actions workflow to automatically summarize evaluations for specified seasons and commit results.
    • Development and production configuration files for evaluation summarization workflows.

@coderabbitai bot commented Feb 12, 2026

📝 Walkthrough

This PR introduces an evaluation summarization feature that uses OpenAI to generate concise summaries of course evaluation narratives. It adds a GitHub Actions workflow for automated summarization, configuration files for development and production modes, argument parsing extended to accept API credentials, an asynchronous summarization module with concurrent request handling, integration into the main flow, and a new OpenAI dependency.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| GitHub Actions Workflow: `.github/workflows/summarize-evals.yml` | New workflow that accepts a seasons input, checks out the code and data repositories, sets up a Python/uv environment, runs the summarization script with OpenAI API integration, and commits the generated evaluation summaries back to the data repository. |
| Configuration Files: `config/dev_summarize_evals.yml`, `config/release_summarize_evals.yml` | New configuration files for development and production; both enable evaluation summarization, with dev targeting the LATEST_1 season and enabling crawl, and release setting release mode to true. |
| Argument Parser Extensions: `ferry/args_parser.py` | Extends the RawArgs and Args classes with two new fields, `openai_api_key` (`str` or `None`) and `summarize_evals` (`bool`); updates the argument parser to accept `--openai-api-key` and `--summarize-evals` flags; environment-based initialization reads `OPENAI_API_KEY` and prompts for input if it is missing when summarization is enabled. |
| Summarization Module: `ferry/summarize/__init__.py`, `ferry/summarize/summarize_evals.py` | New module implementing AI-assisted evaluation summarization. The core function `summarize_evals()` reads parsed evaluations per season, uses OpenAI (default `gpt-4-mini`) to generate 2-4 sentence summaries per narrative question, enforces a minimum comment threshold (3 comments), performs concurrent summarization with semaphore-based throttling (max 10 concurrent requests), handles errors gracefully, and persists results to `evaluation_summaries/{season}.json`. |
| Main Flow Integration: `main.py` | Imports and calls `summarize_evals`; `start_crawl()` now has an explicit `list[str]` return type and returns the crawled seasons; the main flow captures the returned seasons and invokes summarization when the `--summarize-evals` flag is set, with an assertion ensuring an OpenAI API key is provided. |
| Dependencies: `pyproject.toml` | Adds the runtime dependency `openai>=1.0.0` for OpenAI API integration. |
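
The semaphore-based throttling described for the summarization module is a standard asyncio pattern. The sketch below illustrates that pattern only, under stated assumptions: the function and variable names are hypothetical (the PR's actual code is not reproduced here), and the model string is a placeholder.

```python
import asyncio
from openai import AsyncOpenAI

MAX_CONCURRENT_REQUESTS = 10  # the walkthrough's stated concurrency cap
MIN_COMMENTS = 3              # the walkthrough's minimum comment threshold


async def summarize_comments(
    client: AsyncOpenAI, semaphore: asyncio.Semaphore, comments: list[str]
) -> str | None:
    """Summarize one narrative question's comments, skipping sparse ones."""
    if len(comments) < MIN_COMMENTS:
        return None
    async with semaphore:  # at most MAX_CONCURRENT_REQUESTS calls in flight
        response = await client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[
                {"role": "system", "content": "Summarize these course evaluation comments in 2-4 sentences."},
                {"role": "user", "content": "\n".join(comments)},
            ],
            temperature=0.3,
            max_tokens=300,
        )
    content = response.choices[0].message.content
    return content.strip() if content else None


async def summarize_all(api_key: str, per_question: list[list[str]]) -> list[str | None]:
    """Fan out one task per question, throttled by a shared semaphore."""
    client = AsyncOpenAI(api_key=api_key)
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_REQUESTS)
    tasks = [summarize_comments(client, semaphore, c) for c in per_question]
    # return_exceptions=True lets one failed request degrade gracefully
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return [None if isinstance(r, BaseException) else r for r in results]
```

The `async with semaphore:` block acquires a slot before each API call and releases it when the call returns, so no more than ten requests are ever pending at once regardless of how many questions are queued.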

Sequence Diagram

sequenceDiagram
    participant GHA as GitHub Actions
    participant Main as main.py
    participant Crawler as start_crawl()
    participant Summarizer as summarize_evals()
    participant FS as File System
    participant OpenAI as OpenAI API
    participant DataRepo as data repository

    GHA->>GHA: Checkout code & data repos
    GHA->>GHA: Setup Python 3.12 & uv
    GHA->>Main: Run with seasons input & OPENAI_API_KEY
    Main->>Crawler: Call start_crawl(args)
    Crawler->>FS: Load course data
    Crawler->>FS: Crawl season data
    Crawler->>Main: Return seasons: list[str]
    Main->>Main: Assert openai_api_key provided
    Main->>Summarizer: Call summarize_evals(seasons, data_dir, api_key)
    loop For each season
        Summarizer->>FS: Load parsed_evaluations/{season}.json
        Summarizer->>FS: Load existing evaluation_summaries/{season}.json
        loop For each course with narratives
            Summarizer->>OpenAI: Concurrent requests (max 10) to summarize comments
            OpenAI-->>Summarizer: Summary responses
        end
        Summarizer->>FS: Merge & write evaluation_summaries/{season}.json
    end
    Summarizer-->>Main: Completion
    GHA->>FS: Collect evaluation_summaries/*.json
    GHA->>DataRepo: Commit generated summaries via git-auto-commit-action

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Evaluations Summarized, Swift and Bright

A rabbit hops through feedback vast,
With OpenAI to hold it fast,
Ten concurrent tasks in flight,
Course summaries—polished, clean, and tight!
No more endless comments to parse,
Just wisdom, distilled and alas... surpassed! ✨

🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 57.14%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title "draft: eval summaries" is vague and generic, using the word "draft" and the abbreviation "eval summaries" without clearly conveying the specific functionality being added (an OpenAI-based evaluation summarization pipeline). | Consider a more descriptive title such as "Add OpenAI-based evaluation summarization pipeline" or "Implement automated course evaluation summaries with GPT-4" that clearly indicates the main feature being introduced. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |




Comment on lines +12 to +60
    runs-on: ubuntu-latest
    timeout-minutes: 120
    steps:
      - uses: actions/checkout@v4

      - uses: actions/checkout@v4
        with:
          repository: coursetable/ferry-data
          path: data
          ssh-key: ${{ secrets.REPO_SSH_KEY }}
          fetch-depth: 1

      - name: Install Packages
        run: |
          sudo apt-get update
          sudo apt-get -y install gcc g++ musl-dev libffi-dev libpq-dev libhdf5-dev python3-tables graphviz libgraphviz-dev

      - name: Set up Python 3.12
        uses: actions/setup-python@v4
        with:
          python-version: "3.12"

      - name: Install uv and dependencies
        run: |
          curl -LsSf https://astral.sh/uv/install.sh | sh
          uv venv .venv
          echo "VIRTUAL_ENV=.venv" >> $GITHUB_ENV
          echo "$PWD/.venv/bin" >> $GITHUB_PATH
          uv pip install -e .

      - name: Summarize Evaluations
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          python main.py \
            -f config/release_summarize_evals.yml \
            --seasons ${{ inputs.seasons }} \
            --sentry-url "${{ secrets.SENTRY_URL }}"

      - name: Commit Evaluation Summaries
        uses: stefanzweifel/git-auto-commit-action@v5
        with:
          commit_message: "feat: summarize evals for ${{ inputs.seasons }}"
          commit_user_name: "Ferry Bot"
          commit_user_email: "coursetable.at.yale@gmail.com"
          commit_author: course-table <course-table@users.noreply.github.com>
          repository: data
          file_pattern: "evaluation_summaries/*.json"
          skip_dirty_check: false

Check warning (Code scanning / CodeQL)

Workflow does not contain permissions (Medium)

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {contents: read}

Copilot Autofix (AI, 5 days ago)

In general, the fix is to explicitly declare GITHUB_TOKEN permissions in the workflow, granting only what the job requires. This workflow needs to read and write repository contents (for actions/checkout and git-auto-commit-action), and it doesn’t appear to require any other scopes like pull-requests, issues, or packages. The simplest least-privilege configuration at the job level is therefore permissions: contents: write.

The best minimally invasive fix without changing functionality is to add a permissions block under the summarize_evals job (the job flagged by CodeQL). This keeps the scope limited to this job and does not affect other workflows. Concretely, in .github/workflows/summarize-evals.yml, add:

jobs:
  summarize_evals:
    runs-on: ubuntu-latest
    permissions:
      contents: write
    timeout-minutes: 120
    ...

No new imports, methods, or other definitions are required because this is a YAML workflow configuration change only.

Suggested changeset 1: .github/workflows/summarize-evals.yml

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/.github/workflows/summarize-evals.yml b/.github/workflows/summarize-evals.yml
--- a/.github/workflows/summarize-evals.yml
+++ b/.github/workflows/summarize-evals.yml
@@ -10,6 +10,8 @@
 jobs:
   summarize_evals:
     runs-on: ubuntu-latest
+    permissions:
+      contents: write
     timeout-minutes: 120
     steps:
       - uses: actions/checkout@v4
EOF

@coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In @.github/workflows/summarize-evals.yml:
- Around line 29-32: Update the GitHub Actions step that currently uses
actions/setup-python@v4 to the recommended v5; locate the workflow step with the
name "Set up Python 3.12" and change the uses value from actions/setup-python@v4
to actions/setup-python@v5 while keeping the existing python-version: "3.12"
configuration intact.
- Around line 42-49: The "Summarize Evaluations" workflow step currently
interpolates `${{ inputs.seasons }}` directly into the shell which risks command
injection; change the step to set an environment variable (e.g., SEASONS) from
`${{ inputs.seasons }}` and then call python main.py with --seasons "$SEASONS"
(preserving other envs like OPENAI_API_KEY and SENTRY_URL), ensuring the value
is passed as a single quoted/escaped argument instead of being expanded into the
shell command.
- Around line 10-13: Add an explicit least-privilege permissions block to the
summarize_evals job to avoid default broad permissions; inside the job
definition (jobs -> summarize_evals) add a permissions mapping (for example at
minimum permissions: contents: read) or other exact scopes the job requires so
the workflow no longer relies on default permissions.

In `@ferry/summarize/summarize_evals.py`:
- Around line 83-94: The code assumes response.choices[0].message.content is
always a string and calls .strip(), which can raise AttributeError if content is
None; update the return logic in summarize_evals.py (the block using
openai_client.chat.completions.create and returning
response.choices[0].message.content.strip()) to safely handle None by checking
the value (e.g., assign content = response.choices[0].message.content or "" /
fallback text) and then call .strip() on the confirmed string, optionally
logging or raising a clear error if content is unexpectedly missing.
🧹 Nitpick comments (3)
ferry/summarize/summarize_evals.py (1)

14-14: Unused import: ujson.

The ujson import appears unused in this module. JSON operations are handled by the imported load_cache_json and save_cache_json functions from ferry.crawler.cache.

♻️ Proposed fix
-import ujson
main.py (1)

136-142: Consider using ValueError instead of assert for runtime validation.

Using assert for input validation can be bypassed if Python runs with -O (optimizations enabled). A ValueError provides a cleaner user experience and is more robust.

♻️ Proposed fix
     if args.summarize_evals:
-        assert args.openai_api_key, "OpenAI API key is required for --summarize-evals"
+        if not args.openai_api_key:
+            raise ValueError("OpenAI API key is required for --summarize-evals")
         await summarize_evals(
             seasons=seasons,
             data_dir=args.data_dir,
             openai_api_key=args.openai_api_key,
         )
pyproject.toml (1)

30-30: Consider pinning the openai version for consistency and reproducibility.

All other dependencies use exact version pinning (e.g., ==4.13.3), but openai>=1.0.0 uses a minimum version constraint. This inconsistency could lead to non-reproducible builds across environments—if a future OpenAI SDK release introduces behavioral changes, different versions might be installed in different builds. Either pin to a specific version matching your requirements or document the intentional flexibility if loose pinning is necessary for compatibility.
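
For reference, a pinned entry would look like the sketch below; the version number is illustrative only and should match whatever version the project actually tests against.

```toml
# pyproject.toml (sketch): pin openai like the other dependencies.
dependencies = [
    "openai==1.54.0",  # illustrative pin; was "openai>=1.0.0"
]
```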

Comment on lines +10 to +13
jobs:
  summarize_evals:
    runs-on: ubuntu-latest
    timeout-minutes: 120

⚠️ Potential issue | 🟡 Minor

Add explicit permissions block for security.

The workflow lacks a permissions block, which means it uses default (potentially broad) permissions. CodeQL correctly flags this as a security concern.

🛡️ Proposed fix
 jobs:
   summarize_evals:
     runs-on: ubuntu-latest
     timeout-minutes: 120
+    permissions:
+      contents: read
     steps:

Comment on lines +29 to +32
      - name: Set up Python 3.12
        uses: actions/setup-python@v4
        with:
          python-version: "3.12"

⚠️ Potential issue | 🟡 Minor

Update actions/setup-python to v5.

Static analysis indicates that actions/setup-python@v4 is outdated. Version 5 is the current recommended version.

🔧 Proposed fix
       - name: Set up Python 3.12
-        uses: actions/setup-python@v4
+        uses: actions/setup-python@v5
         with:
           python-version: "3.12"
🧰 Tools
🪛 actionlint (1.7.10)

[error] 30-30: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)


Comment on lines +42 to +49
      - name: Summarize Evaluations
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          python main.py \
            -f config/release_summarize_evals.yml \
            --seasons ${{ inputs.seasons }} \
            --sentry-url "${{ secrets.SENTRY_URL }}"

⚠️ Potential issue | 🟠 Major

Potential command injection via inputs.seasons.

The ${{ inputs.seasons }} is directly interpolated into the shell command without sanitization. A malicious input like ; rm -rf / could be injected. Use an environment variable to safely pass the input.

🛡️ Proposed fix
       - name: Summarize Evaluations
         env:
           OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+          SEASONS: ${{ inputs.seasons }}
         run: |
           python main.py \
             -f config/release_summarize_evals.yml \
-            --seasons ${{ inputs.seasons }} \
+            --seasons $SEASONS \
             --sentry-url "${{ secrets.SENTRY_URL }}"

Comment on lines +83 to +94
async with semaphore:
    response = await openai_client.chat.completions.create(
        model=OPENAI_MODEL,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_content},
        ],
        temperature=0.3,
        max_tokens=300,
    )

return response.choices[0].message.content.strip()

⚠️ Potential issue | 🟡 Minor

Handle potential None content from API response.

response.choices[0].message.content can be None in some edge cases (e.g., if the model declines to respond). Calling .strip() on None would raise an AttributeError.

🛡️ Proposed fix
-    return response.choices[0].message.content.strip()
+    content = response.choices[0].message.content
+    return content.strip() if content else ""
