
draft: eval summaries #387

Open

reybahl wants to merge 1 commit into master from eval-summaries

Conversation

@reybahl (Member) commented Feb 12, 2026

Summary by CodeRabbit

Release Notes

  • New Features
    • Added automated evaluation summarization to generate concise summaries of course evaluation comments.
    • New command-line options (--summarize-evals and --openai-api-key) to enable and configure summarization.
    • New GitHub Actions workflow to automatically summarize evaluations for specified seasons and commit results.
    • Development and production configuration files for evaluation summarization workflows.

@coderabbitai bot commented Feb 12, 2026

📝 Walkthrough

This PR introduces an evaluation summarization feature that uses OpenAI to generate concise summaries of course evaluation narratives. It adds a GitHub Actions workflow for automated summarization, configuration files for development and production modes, argument parsing extended to accept API credentials, an asynchronous summarization module with concurrent request handling, integration into the main flow, and a new OpenAI dependency.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| GitHub Actions Workflow: `.github/workflows/summarize-evals.yml` | New workflow that accepts a seasons input, checks out the code and data repositories, sets up a Python/uv environment, runs the summarization script with OpenAI API integration, and commits the generated evaluation summaries back to the data repository. |
| Configuration Files: `config/dev_summarize_evals.yml`, `config/release_summarize_evals.yml` | New configuration files for development and production; both enable evaluation summarization, with dev targeting the LATEST_1 season and enabling crawl, and release setting release mode to true. |
| Argument Parser Extensions: `ferry/args_parser.py` | Extends the RawArgs and Args classes with two new fields, `openai_api_key` (`str` or `None`) and `summarize_evals` (`bool`); updates the argument parser to accept `--openai-api-key` and `--summarize-evals` flags; environment-based initialization reads `OPENAI_API_KEY` and prompts for input if it is missing when summarization is enabled. |
| Summarization Module: `ferry/summarize/__init__.py`, `ferry/summarize/summarize_evals.py` | New module implementing AI-assisted evaluation summarization. The core function `summarize_evals()` reads parsed evaluations per season, uses OpenAI (default `gpt-4-mini`) to generate 2-4 sentence summaries per narrative question, enforces a minimum comment threshold (3 comments), performs concurrent summarization with semaphore-based throttling (max 10 concurrent requests), handles errors gracefully, and persists results to `evaluation_summaries/{season}.json`. |
| Main Flow Integration: `main.py` | Imports and calls `summarize_evals`; `start_crawl()` now has an explicit `list[str]` return type and returns the crawled seasons; the main flow captures the returned seasons and invokes summarization when the `--summarize-evals` flag is set, with an assertion ensuring an OpenAI API key is provided. |
| Dependencies: `pyproject.toml` | Adds the runtime dependency `openai>=1.0.0` for OpenAI API integration. |
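
The semaphore-based throttling described for the summarization module is a standard asyncio pattern. The sketch below illustrates that pattern only, under stated assumptions: the function and variable names are hypothetical (the PR's actual code is not reproduced here), and the model string is a placeholder.

```python
import asyncio
from openai import AsyncOpenAI

MAX_CONCURRENT_REQUESTS = 10  # the walkthrough's stated concurrency cap
MIN_COMMENTS = 3              # the walkthrough's minimum comment threshold


async def summarize_comments(
    client: AsyncOpenAI, semaphore: asyncio.Semaphore, comments: list[str]
) -> str | None:
    """Summarize one narrative question's comments, skipping sparse ones."""
    if len(comments) < MIN_COMMENTS:
        return None
    async with semaphore:  # at most MAX_CONCURRENT_REQUESTS calls in flight
        response = await client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[
                {"role": "system", "content": "Summarize these course evaluation comments in 2-4 sentences."},
                {"role": "user", "content": "\n".join(comments)},
            ],
            temperature=0.3,
            max_tokens=300,
        )
    content = response.choices[0].message.content
    return content.strip() if content else None


async def summarize_all(api_key: str, per_question: list[list[str]]) -> list[str | None]:
    """Fan out one task per question, throttled by a shared semaphore."""
    client = AsyncOpenAI(api_key=api_key)
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_REQUESTS)
    tasks = [summarize_comments(client, semaphore, c) for c in per_question]
    # return_exceptions=True lets one failed request degrade gracefully
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return [None if isinstance(r, BaseException) else r for r in results]
```

The `async with semaphore:` block acquires a slot before each API call and releases it when the call returns, so no more than ten requests are ever pending at once regardless of how many questions are queued.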

Sequence Diagram

sequenceDiagram
    participant GHA as GitHub Actions
    participant Main as main.py
    participant Crawler as start_crawl()
    participant Summarizer as summarize_evals()
    participant FS as File System
    participant OpenAI as OpenAI API
    participant DataRepo as data repository

    GHA->>GHA: Checkout code & data repos
    GHA->>GHA: Setup Python 3.12 & uv
    GHA->>Main: Run with seasons input & OPENAI_API_KEY
    Main->>Crawler: Call start_crawl(args)
    Crawler->>FS: Load course data
    Crawler->>FS: Crawl season data
    Crawler->>Main: Return seasons: list[str]
    Main->>Main: Assert openai_api_key provided
    Main->>Summarizer: Call summarize_evals(seasons, data_dir, api_key)
    loop For each season
        Summarizer->>FS: Load parsed_evaluations/{season}.json
        Summarizer->>FS: Load existing evaluation_summaries/{season}.json
        loop For each course with narratives
            Summarizer->>OpenAI: Concurrent requests (max 10) to summarize comments
            OpenAI-->>Summarizer: Summary responses
        end
        Summarizer->>FS: Merge & write evaluation_summaries/{season}.json
    end
    Summarizer-->>Main: Completion
    GHA->>FS: Collect evaluation_summaries/*.json
    GHA->>DataRepo: Commit generated summaries via git-auto-commit-action

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Evaluations Summarized, Swift and Bright

A rabbit hops through feedback vast,
With OpenAI to hold it fast,
Ten concurrent tasks in flight,
Course summaries—polished, clean, and tight!
No more endless comments to parse,
Just wisdom, distilled and alas... surpassed! ✨

🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 57.14%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title "draft: eval summaries" is vague and generic, using the word "draft" and the abbreviation "eval summaries" without clearly conveying the specific functionality being added (an OpenAI-based evaluation summarization pipeline). | Consider a more descriptive title such as "Add OpenAI-based evaluation summarization pipeline" or "Implement automated course evaluation summaries with GPT-4" that clearly indicates the main feature being introduced. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |




Comment on lines +12 to +60
    runs-on: ubuntu-latest
    timeout-minutes: 120
    steps:
      - uses: actions/checkout@v4

      - uses: actions/checkout@v4
        with:
          repository: coursetable/ferry-data
          path: data
          ssh-key: ${{ secrets.REPO_SSH_KEY }}
          fetch-depth: 1

      - name: Install Packages
        run: |
          sudo apt-get update
          sudo apt-get -y install gcc g++ musl-dev libffi-dev libpq-dev libhdf5-dev python3-tables graphviz libgraphviz-dev

      - name: Set up Python 3.12
        uses: actions/setup-python@v4
        with:
          python-version: "3.12"

      - name: Install uv and dependencies
        run: |
          curl -LsSf https://astral.sh/uv/install.sh | sh
          uv venv .venv
          echo "VIRTUAL_ENV=.venv" >> $GITHUB_ENV
          echo "$PWD/.venv/bin" >> $GITHUB_PATH
          uv pip install -e .

      - name: Summarize Evaluations
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          python main.py \
            -f config/release_summarize_evals.yml \
            --seasons ${{ inputs.seasons }} \
            --sentry-url "${{ secrets.SENTRY_URL }}"

      - name: Commit Evaluation Summaries
        uses: stefanzweifel/git-auto-commit-action@v5
        with:
          commit_message: "feat: summarize evals for ${{ inputs.seasons }}"
          commit_user_name: "Ferry Bot"
          commit_user_email: "coursetable.at.yale@gmail.com"
          commit_author: course-table <course-table@users.noreply.github.com>
          repository: data
          file_pattern: "evaluation_summaries/*.json"
          skip_dirty_check: false

Check warning (Code scanning / CodeQL)

Workflow does not contain permissions (Medium)

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {contents: read}

Copilot Autofix (AI, 5 days ago)

In general, the fix is to explicitly declare GITHUB_TOKEN permissions in the workflow, granting only what the job requires. This workflow needs to read and write repository contents (for actions/checkout and git-auto-commit-action), and it doesn’t appear to require any other scopes like pull-requests, issues, or packages. The simplest least-privilege configuration at the job level is therefore permissions: contents: write.

The best minimally invasive fix without changing functionality is to add a permissions block under the summarize_evals job (the job flagged by CodeQL). This keeps the scope limited to this job and does not affect other workflows. Concretely, in .github/workflows/summarize-evals.yml, add:

jobs:
  summarize_evals:
    runs-on: ubuntu-latest
    permissions:
      contents: write
    timeout-minutes: 120
    ...

No new imports, methods, or other definitions are required because this is a YAML workflow configuration change only.

Suggested changeset 1: .github/workflows/summarize-evals.yml

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/.github/workflows/summarize-evals.yml b/.github/workflows/summarize-evals.yml
--- a/.github/workflows/summarize-evals.yml
+++ b/.github/workflows/summarize-evals.yml
@@ -10,6 +10,8 @@
 jobs:
   summarize_evals:
     runs-on: ubuntu-latest
+    permissions:
+      contents: write
     timeout-minutes: 120
     steps:
       - uses: actions/checkout@v4
EOF

@coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In @.github/workflows/summarize-evals.yml:
- Around line 29-32: Update the GitHub Actions step that currently uses
actions/setup-python@v4 to the recommended v5; locate the workflow step with the
name "Set up Python 3.12" and change the uses value from actions/setup-python@v4
to actions/setup-python@v5 while keeping the existing python-version: "3.12"
configuration intact.
- Around line 42-49: The "Summarize Evaluations" workflow step currently
interpolates `${{ inputs.seasons }}` directly into the shell which risks command
injection; change the step to set an environment variable (e.g., SEASONS) from
`${{ inputs.seasons }}` and then call python main.py with --seasons "$SEASONS"
(preserving other envs like OPENAI_API_KEY and SENTRY_URL), ensuring the value
is passed as a single quoted/escaped argument instead of being expanded into the
shell command.
- Around line 10-13: Add an explicit least-privilege permissions block to the
summarize_evals job to avoid default broad permissions; inside the job
definition (jobs -> summarize_evals) add a permissions mapping (for example at
minimum permissions: contents: read) or other exact scopes the job requires so
the workflow no longer relies on default permissions.

In `@ferry/summarize/summarize_evals.py`:
- Around line 83-94: The code assumes response.choices[0].message.content is
always a string and calls .strip(), which can raise AttributeError if content is
None; update the return logic in summarize_evals.py (the block using
openai_client.chat.completions.create and returning
response.choices[0].message.content.strip()) to safely handle None by checking
the value (e.g., assign content = response.choices[0].message.content or "" /
fallback text) and then call .strip() on the confirmed string, optionally
logging or raising a clear error if content is unexpectedly missing.
🧹 Nitpick comments (3)
ferry/summarize/summarize_evals.py (1)

14-14: Unused import: ujson.

The ujson import appears unused in this module. JSON operations are handled by the imported load_cache_json and save_cache_json functions from ferry.crawler.cache.

♻️ Proposed fix
-import ujson
main.py (1)

136-142: Consider using ValueError instead of assert for runtime validation.

Using assert for input validation can be bypassed if Python runs with -O (optimizations enabled). A ValueError provides a cleaner user experience and is more robust.

♻️ Proposed fix
     if args.summarize_evals:
-        assert args.openai_api_key, "OpenAI API key is required for --summarize-evals"
+        if not args.openai_api_key:
+            raise ValueError("OpenAI API key is required for --summarize-evals")
         await summarize_evals(
             seasons=seasons,
             data_dir=args.data_dir,
             openai_api_key=args.openai_api_key,
         )
pyproject.toml (1)

30-30: Consider pinning the openai version for consistency and reproducibility.

All other dependencies use exact version pinning (e.g., ==4.13.3), but openai>=1.0.0 uses a minimum version constraint. This inconsistency could lead to non-reproducible builds across environments—if a future OpenAI SDK release introduces behavioral changes, different versions might be installed in different builds. Either pin to a specific version matching your requirements or document the intentional flexibility if loose pinning is necessary for compatibility.
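
For reference, a pinned entry would look like the sketch below; the version number is illustrative only and should match whatever version the project actually tests against.

```toml
# pyproject.toml (sketch): pin openai like the other dependencies.
dependencies = [
    "openai==1.54.0",  # illustrative pin; was "openai>=1.0.0"
]
```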

Comment on lines +10 to +13
jobs:
  summarize_evals:
    runs-on: ubuntu-latest
    timeout-minutes: 120

⚠️ Potential issue | 🟡 Minor

Add explicit permissions block for security.

The workflow lacks a permissions block, which means it uses default (potentially broad) permissions. CodeQL correctly flags this as a security concern.

🛡️ Proposed fix
 jobs:
   summarize_evals:
     runs-on: ubuntu-latest
     timeout-minutes: 120
+    permissions:
+      contents: read
     steps:

Comment on lines +29 to +32
      - name: Set up Python 3.12
        uses: actions/setup-python@v4
        with:
          python-version: "3.12"

⚠️ Potential issue | 🟡 Minor

Update actions/setup-python to v5.

Static analysis indicates that actions/setup-python@v4 is outdated. Version 5 is the current recommended version.

🔧 Proposed fix
       - name: Set up Python 3.12
-        uses: actions/setup-python@v4
+        uses: actions/setup-python@v5
         with:
           python-version: "3.12"
🧰 Tools
🪛 actionlint (1.7.10)

[error] 30-30: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)


Comment on lines +42 to +49
      - name: Summarize Evaluations
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          python main.py \
            -f config/release_summarize_evals.yml \
            --seasons ${{ inputs.seasons }} \
            --sentry-url "${{ secrets.SENTRY_URL }}"

⚠️ Potential issue | 🟠 Major

Potential command injection via inputs.seasons.

The ${{ inputs.seasons }} is directly interpolated into the shell command without sanitization. A malicious input like ; rm -rf / could be injected. Use an environment variable to safely pass the input.

🛡️ Proposed fix
       - name: Summarize Evaluations
         env:
           OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+          SEASONS: ${{ inputs.seasons }}
         run: |
           python main.py \
             -f config/release_summarize_evals.yml \
-            --seasons ${{ inputs.seasons }} \
+            --seasons $SEASONS \
             --sentry-url "${{ secrets.SENTRY_URL }}"

Comment on lines +83 to +94
async with semaphore:
    response = await openai_client.chat.completions.create(
        model=OPENAI_MODEL,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_content},
        ],
        temperature=0.3,
        max_tokens=300,
    )

return response.choices[0].message.content.strip()

⚠️ Potential issue | 🟡 Minor

Handle potential None content from API response.

response.choices[0].message.content can be None in some edge cases (e.g., if the model declines to respond). Calling .strip() on None would raise an AttributeError.

🛡️ Proposed fix
-    return response.choices[0].message.content.strip()
+    content = response.choices[0].message.content
+    return content.strip() if content else ""
