
Support new Open AI model configuration #87

Merged
Ayush8923 merged 9 commits into main from feat/support-new-models-configuration
Apr 1, 2026

Conversation


Ayush8923 (Collaborator) commented Mar 24, 2026

Issue: #88

Summary:

  • Add GPT-5 family models (GPT-5, GPT-5 mini/nano, GPT-5.4, GPT-5.4 pro/mini/nano) and
    GPT-4.1 to the model selection dropdown
  • Hide temperature and max results fields when a GPT-5 model is selected
  • Exclude temperature and max_num_results from the API payload for GPT-5 models
  • Centralize MODEL_OPTIONS into a single shared file (was duplicated in 3 files)
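For reference, the centralized module might look like the following sketch; the option lists and the exact predicate are illustrative, not copied from app/lib/models.ts:

```typescript
// Illustrative shape of app/lib/models.ts; the option lists below are
// examples, not the full set added in this PR.
export interface ModelOption {
  value: string;
  label: string;
}

export const MODEL_OPTIONS: Record<string, ModelOption[]> = {
  openai: [
    { value: "gpt-4o-mini", label: "GPT-4o mini" },
    { value: "gpt-4.1", label: "GPT-4.1" },
    { value: "gpt-5", label: "GPT-5" },
    { value: "gpt-5-mini", label: "GPT-5 mini" },
  ],
};

// GPT-5 family models don't accept temperature/max_num_results, so the UI
// and payload builders gate those fields on a single predicate like this.
export function isGpt5Model(model: string): boolean {
  return model.toLowerCase().startsWith("gpt-5");
}
```

Keeping the predicate next to the options list means the dropdown, the conditional fields, and the save path all agree on what counts as a GPT-5 model.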

Summary by CodeRabbit

  • New Features

    • Added GPT-5 model support with automatic UI adaptation for model-specific parameters.
    • Implemented formula injection prevention for CSV exports.
  • Bug Fixes

    • Fixed temperature control visibility for GPT-5 models.
    • Fixed Max Results input display for GPT-5 models.
  • Improvements

    • Simplified API key authentication flow for evaluations and datasets.
    • Enhanced model selection with dynamically populated options.
    • Improved evaluation status display and warnings.


coderabbitai bot commented Mar 24, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

This PR introduces GPT-5 model support with conditional UI logic, refactors API key handling in evaluations to use authentication state, adds CSV sanitization utilities, and centralizes model configuration. Changes span model management, multiple component pages, and API integration across the application.

Changes

Cohort / File(s) / Summary
Model Infrastructure
app/lib/models.ts, app/components/ConfigDrawer.tsx, app/components/prompt-editor/ConfigEditorPane.tsx, app/components/prompt-editor/CurrentConfigTab.tsx
Created new models.ts module exporting ModelOption interface, MODEL_OPTIONS constant with GPT-5/GPT-4 variants, and isGpt5Model() utility. Updated config components to import and use these, conditionally hiding Temperature slider and Max Results fields when isGpt5 is true.
Configuration & Defaults
app/lib/constants.ts, app/(main)/configurations/prompt-editor/page.tsx, app/components/ConfigSelector.tsx
Added DEFAULT_CONFIG constant with OpenAI provider and gpt-4o-mini as default model. Updated prompt-editor page to omit temperature and max_num_results from saved configs when GPT-5 is selected. Modified ConfigSelector to conditionally render temperature only when defined.
Evaluations API Refactoring
app/components/evaluations/DatasetsTab.tsx, app/components/evaluations/EvaluationsTab.tsx, app/(main)/evaluations/page.tsx
Replaced API-keys array + selectedKeyId prop pattern with single activeKey sourced from useAuth() hook. Refactored backend requests to use centralized apiFetch helper instead of manual fetch calls, removing explicit error parsing and API-key header handling.
CSV Sanitization & Evaluation Details
app/lib/utils.ts, app/(main)/evaluations/[id]/page.tsx
Added escapeCSVValue() and sanitizeCSVCell() utilities to handle CSV special characters and optional formula-injection prevention. Updated evaluation detail page to import and remove duplicate sanitization logic; simplified pairs display UI and condensed in-progress scoring warning logic.
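The two utilities described above could look roughly like this sketch; the signatures and the formula-prefix set are assumptions, not lifted from the app/lib/utils.ts diff:

```typescript
// Hedged sketch of the CSV helpers named above; the real
// escapeCSVValue/sanitizeCSVCell implementations may differ.
export function escapeCSVValue(value: string): string {
  // Quote the cell and double any embedded quotes, per RFC 4180.
  return `"${value.replace(/"/g, '""')}"`;
}

export function sanitizeCSVCell(
  value: string,
  preventFormulaInjection = false,
): string {
  let v = value;
  // Cells starting with =, +, -, or @ can execute as formulas when the
  // CSV is opened in a spreadsheet; a leading apostrophe forces text.
  if (preventFormulaInjection && /^[=+\-@]/.test(v)) {
    v = `'${v}`;
  }
  return escapeCSVValue(v);
}
```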

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

  • Model Selection: Update GPT-5 controls #88 — Adds centralized MODEL_OPTIONS and isGpt5Model utility for gating GPT-5-specific controls (Temperature, Max Results) in config UI and saves, addressing GPT-5 model feature parity requirements.

Possibly related PRs

Suggested reviewers

  • AkhileshNegi
  • vprashrex
  • Prajna1999

🐰 A model gatekeeper hops through the code,
GPT-5 now hides its temperature load,
Keys no longer scatter like carrots so wide,
Auth brings them home with a centralized ride—
CSV escapes with a formula-proof stride!

🚥 Pre-merge checks (✅ 1 | ❌ 2)

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning. Docstring coverage is 5.26%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check — ❓ Inconclusive. The title 'Support new Open AI model configuration' is vague and overly broad, referring generally to model configuration without conveying the main change: adding GPT-5 support and gating the temperature/max results fields. Resolution: consider a more specific title like 'Add GPT-5 model support with conditional field gating' or 'Support GPT-5 models and hide temperature for new model family'.

✅ Passed checks (1 passed)

  • Description Check — ✅ Passed. Check skipped: CodeRabbit’s high-level summary is enabled.



coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (4)
app/lib/utils.ts (1)

4-10: Consider centralizing timestamp normalization to avoid logic drift.

timeAgo now duplicates normalization logic that also exists in app/lib/useConfigs.ts:456-471 (formatRelativeTime). This duplication already diverged (negative offset handling). A shared helper would prevent future regressions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/lib/utils.ts` around lines 4 - 10, The time normalization logic in
timeAgo duplicates code from formatRelativeTime causing drift; extract a single
helper (e.g., normalizeTimestamp or parseTimestampWithUTC) and replace the
normalization in both timeAgo and formatRelativeTime to call that helper; ensure
the helper accepts a date string, applies the same "append Z when no timezone"
rule and any negative-offset handling currently used in formatRelativeTime, and
return a Date object so existing calls in timeAgo and formatRelativeTime (and
any other callers) can use it without duplicating logic.
app/components/prompt-editor/CurrentConfigTab.tsx (1)

228-230: Consider using dynamic provider key for future extensibility.

The model options are currently hardcoded to MODEL_OPTIONS.openai. When additional providers are enabled, this will need to be updated to use the provider prop dynamically.

♻️ Proposed fix for future provider support
-          {MODEL_OPTIONS.openai.map((m) => (
+          {MODEL_OPTIONS[provider as keyof typeof MODEL_OPTIONS]?.map((m) => (
             <option key={m.value} value={m.value}>{m.label}</option>
           ))}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/components/prompt-editor/CurrentConfigTab.tsx` around lines 228 - 230,
The options list is hardcoded to MODEL_OPTIONS.openai which prevents switching
providers; update the mapping to use the component's provider prop dynamically
(e.g., use MODEL_OPTIONS[provider] or a safe getter like
getModelOptions(provider)) when rendering options in CurrentConfigTab (where
MODEL_OPTIONS.openai.map(...) is used) and guard against missing providers by
falling back to a default provider or empty array so the <option> mapping works
for future providers.
app/configurations/prompt-editor/types.ts (1)

26-30: Consider making max_num_results optional in the Tool interface for consistency.

The Tool interface defines max_num_results as required, but GPT-5 models don't support this parameter. When loading a saved GPT-5 config that omits max_num_results, the tools array reconstruction could fail type validation or produce unexpected behavior since the field is still expected.

♻️ Proposed fix
 export interface Tool {
   type: 'file_search';
   knowledge_base_ids: string[];
-  max_num_results: number;
+  max_num_results?: number;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/configurations/prompt-editor/types.ts` around lines 26 - 30, The Tool
interface currently requires max_num_results which breaks loading GPT-5 configs
that omit it; update the Tool type so max_num_results is optional (e.g. declare
max_num_results?: number) and ensure any code constructing or validating Tool
instances (places referencing Tool, the tools array reconstruction logic, and
any validators/serializers) handles the undefined case safely by falling back to
a sensible default or skipping use when undefined; update usages to treat
max_num_results as optional rather than required.
app/components/ConfigDrawer.tsx (1)

809-810: Consider handling undefined temperature in the Saved tab display.

Line 810 displays T:{version.temperature} for all saved configs. For GPT-5 configs where temperature is omitted, this could display T:undefined. Consider conditionally showing this only when temperature is defined.

♻️ Proposed fix
                              <div className="text-xs mb-2" style={{ color: colors.text.secondary }}>
-                                {formatTimestamp(version.timestamp)} • {version.provider}/{version.modelName} • T:{version.temperature}
+                                {formatTimestamp(version.timestamp)} • {version.provider}/{version.modelName}{version.temperature !== undefined ? ` • T:${version.temperature}` : ''}
                              </div>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/components/ConfigDrawer.tsx` around lines 809 - 810, The Saved tab in
ConfigDrawer renders "T:{version.temperature}" which can show "T:undefined" when
version.temperature is absent; update the JSX that renders the line containing
formatTimestamp(version.timestamp) • {version.provider}/{version.modelName} •
T:{version.temperature} to conditionally include the temperature segment only
when version.temperature is not null/undefined (e.g., render the "• T:…" piece
with a conditional/ternary or &&), ensuring separators (dots) are adjusted so
there are no dangling separators when temperature is omitted; reference
ConfigDrawer, formatTimestamp, and version.temperature to locate the markup to
change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/lib/utils.ts`:
- Around line 5-8: The timezone detection around the date creation in timeAgo
incorrectly only checks for "Z" or "+" on dateStr; update the condition that
builds date (the dateStr check used before new Date(dateStr) in timeAgo) to also
detect negative offsets by matching a timezone offset pattern (e.g. Z or +HH:MM
or -HH:MM) using a regex like /Z|[+\-]\d{2}(:\d{2})?$/. Replace the simple
includes(...) check with this regex test so valid strings like
"2026-03-24T09:00:49-05:00" are treated as timezone-aware and not suffixed with
"Z", preventing the Invalid time value error when constructing the Date.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ea765e1a-3fe8-47d2-82a0-46a69693c567

📥 Commits

Reviewing files that changed from the base of the PR and between fe9a695 and 2095077.

📒 Files selected for processing (7)
  • app/components/ConfigDrawer.tsx
  • app/components/prompt-editor/ConfigEditorPane.tsx
  • app/components/prompt-editor/CurrentConfigTab.tsx
  • app/configurations/prompt-editor/page.tsx
  • app/configurations/prompt-editor/types.ts
  • app/lib/models.ts
  • app/lib/utils.ts


coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
app/components/evaluations/EvaluationsTab.tsx (1)

66-125: ⚠️ Potential issue | 🟠 Major

Scope evaluation state to the current API key.

evalJobs and assistantConfigs survive API-key changes, so this component can render runs/configs fetched under the previous credential until the next requests finish. Because the config cache is keyed only by assistant_id, assistantConfigs.has(job.assistant_id) can also suppress a refetch and keep showing a config loaded under the wrong key indefinitely. Clear both states when activeKey.key changes and ignore or abort older requests before they commit state.

Possible direction
+ useEffect(() => {
+   setEvalJobs([]);
+   setAssistantConfigs(new Map());
+   setError(null);
+ }, [activeKey?.key]);

Then make both fetch helpers ignore aborted/stale responses so an older key cannot repopulate the cleared state.
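A framework-free sketch of that stale-response guard, using a request token instead of an AbortController; the names (`refreshJobs`, `fetchJobs`) are hypothetical stand-ins for the component's fetch helpers:

```typescript
// Each refresh tags itself with an incrementing id; only the newest
// request is allowed to commit state, so a slow response started under
// an old API key can never overwrite results for the current key.
type Job = { id: string };

let latestRequestId = 0;
let evalJobs: Job[] = [];

async function refreshJobs(
  fetchJobs: (key: string) => Promise<Job[]>,
  key: string,
): Promise<void> {
  const requestId = ++latestRequestId; // tag this request
  const jobs = await fetchJobs(key);
  if (requestId !== latestRequestId) return; // a newer request superseded us
  evalJobs = jobs;
}
```

In the component, the same token (or an AbortController aborted in the effect cleanup) would be checked before each `setEvalJobs` / `setAssistantConfigs` call.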

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/components/evaluations/EvaluationsTab.tsx` around lines 66 - 125, When
the API key changes, clear evalJobs and assistantConfigs and ensure in-flight
fetches cannot repopulate cleared state: update the activeKey effect to reset
setEvalJobs([]) and setAssistantConfigs(new Map()) whenever activeKey?.key
changes, and modify fetchEvaluations and fetchAssistantConfig to accept/use an
AbortController or capture the current activeKey value so that responses are
ignored if the request is aborted or the key has changed; specifically, add
abort handling in fetchEvaluations and fetchAssistantConfig (or check that
activeKey.key matches the key used to start the fetch) and bail out before
calling setEvalJobs or setAssistantConfigs to avoid stale-data commits.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/(routes)/configurations/prompt-editor/page.tsx`:
- Around line 297-302: The params payload may include temperature: undefined
when switching a GPT-5-loaded config back to a non-GPT-5 model; update the
branch that builds params (the object containing model, instructions:
currentContent, ...(!gpt5 && { temperature, })) to normalize temperature by
using currentConfigBlob.completion.params.temperature ?? 0.7 so it always sends
a concrete default for non-GPT-5 models; locate the params construction in
page.tsx (referencing variables model, currentContent, gpt5, currentConfigBlob)
and replace the temperature expression with the nullish-coalescing default.
- Around line 89-95: The UI is using stableVersionItemsMap for versionItemsMap
which hides live cache misses; change the binding so the live
hookVersionItemsMap is used (or takes precedence) for cache-miss detection while
still falling back to stableVersionItemsMap for stability. Replace the current
const versionItemsMap = stableVersionItemsMap with logic that uses
hookVersionItemsMap when it has entries (e.g. use hookVersionItemsMap if
Object.keys(hookVersionItemsMap).length > 0, or merge like {
...stableVersionItemsMap, ...hookVersionItemsMap } with hookVersionItemsMap
taking precedence) so the refill effect that reads configState.versionItemsCache
sees live misses and refetches correctly.

In `@app/(routes)/evaluations/page.tsx`:
- Around line 75-96: When activeKey changes you should immediately clear dataset
state and protect loadStoredDatasets against stale responses: in the effect that
watches activeKey (and before calling loadStoredDatasets) call
setStoredDatasets([]) and setSelectedDatasetId(undefined) to remove previous key
data, and inside loadStoredDatasets implement a request-id or AbortController
guard (e.g., create a local token/requestId captured by the async call and
verify it before calling setStoredDatasets) so an older in-flight response
cannot overwrite the fresh results for the current activeKey; keep references to
activeKey.key, loadStoredDatasets, storedDatasets, and selectedDatasetId to
locate where to add these changes.

In `@app/components/evaluations/EvaluationsTab.tsx`:
- Around line 123-125: The current useEffect only calls fetchEvaluations once
per key; change it to poll every 10 seconds while a key is active: inside the
useEffect that references activeKey and fetchEvaluations, call fetchEvaluations
immediately if activeKey?.key exists, then start a setInterval that calls
fetchEvaluations every 10000ms; return a cleanup function that clears the
interval to avoid leaks. Ensure the effect depends on activeKey.key (or
activeKey) and fetchEvaluations so polling restarts when the key changes.

---

Outside diff comments:
In `@app/components/evaluations/EvaluationsTab.tsx`:
- Around line 66-125: When the API key changes, clear evalJobs and
assistantConfigs and ensure in-flight fetches cannot repopulate cleared state:
update the activeKey effect to reset setEvalJobs([]) and setAssistantConfigs(new
Map()) whenever activeKey?.key changes, and modify fetchEvaluations and
fetchAssistantConfig to accept/use an AbortController or capture the current
activeKey value so that responses are ignored if the request is aborted or the
key has changed; specifically, add abort handling in fetchEvaluations and
fetchAssistantConfig (or check that activeKey.key matches the key used to start
the fetch) and bail out before calling setEvalJobs or setAssistantConfigs to
avoid stale-data commits.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9930a1dd-bec6-496d-949b-b225ee17d108

📥 Commits

Reviewing files that changed from the base of the PR and between c9b737a and 2f44f15.

📒 Files selected for processing (13)
  • app/(routes)/configurations/prompt-editor/page.tsx
  • app/(routes)/configurations/prompt-editor/types.ts
  • app/(routes)/evaluations/page.tsx
  • app/components/ConfigDrawer.tsx
  • app/components/ConfigSelector.tsx
  • app/components/evaluations/DatasetsTab.tsx
  • app/components/evaluations/EvaluationsTab.tsx
  • app/components/prompt-editor/ConfigEditorPane.tsx
  • app/components/prompt-editor/CurrentConfigTab.tsx
  • app/hooks/useConfigs.ts
  • app/lib/constants.ts
  • app/lib/models.ts
  • app/lib/utils.ts
✅ Files skipped from review due to trivial changes (4)
  • app/(routes)/configurations/prompt-editor/types.ts
  • app/hooks/useConfigs.ts
  • app/lib/models.ts
  • app/components/prompt-editor/CurrentConfigTab.tsx
🚧 Files skipped from review as they are similar to previous changes (4)
  • app/lib/utils.ts
  • app/components/prompt-editor/ConfigEditorPane.tsx
  • app/components/ConfigDrawer.tsx
  • app/components/ConfigSelector.tsx


coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
app/components/evaluations/EvaluationsTab.tsx (1)

66-88: ⚠️ Potential issue | 🟠 Major

Clear and guard key-scoped evaluation state on API-key changes.

Both fetch paths still update component state unconditionally, so switching activeKey can briefly show the previous key’s runs/configs, and an older in-flight response can overwrite the new key’s state. Reset evalJobs / assistantConfigs when the key changes and ignore stale responses before calling setEvalJobs or setAssistantConfigs.

Also applies to: 91-113

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/components/evaluations/EvaluationsTab.tsx` around lines 66 - 88, Reset
the key-scoped state and guard against stale in-flight responses: when activeKey
changes, immediately clear evalJobs and assistantConfigs (call setEvalJobs([])
and setAssistantConfigs([])) and use a per-request token (e.g., incrementing
requestId local to fetchEvaluations and the similar fetch in the 91-113 block or
an AbortController) captured in the async closure; before calling setEvalJobs or
setAssistantConfigs inside fetchEvaluations (and the other fetch function),
verify the token matches the latest activeKey token (or that the request wasn’t
aborted) to ignore stale responses and prevent older results from overwriting
state.
app/(main)/evaluations/page.tsx (1)

205-221: ⚠️ Potential issue | 🟠 Major

Return false from the validation branches to match the prop contract.

EvaluationsTab expects handleRunEvaluation: () => Promise<boolean>, but the bare return; statements at lines 208, 212, 216, and 220 widen this to Promise<boolean | undefined>. This violates the contract and leaves the caller with an undeclared third state.

Fix
  const handleRunEvaluation = async () => {
    if (!activeKey?.key) {
      toast.error("Please select an API key first");
-     return;
+     return false;
    }
    if (!selectedDatasetId) {
      toast.error("Please select a dataset first");
-     return;
+     return false;
    }
    if (!experimentName.trim()) {
      toast.error("Please enter an evaluation name");
-     return;
+     return false;
    }
    if (!selectedConfigId || !selectedConfigVersion) {
      toast.error("Please select a configuration before running evaluation");
-     return;
+     return false;
    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/(main)/evaluations/page.tsx` around lines 205 - 221, The validation
branches inside the async function handleRunEvaluation currently use bare
returns which produce Promise<void|boolean>; update each early-exit validation
(checks for activeKey, selectedDatasetId, experimentName,
selectedConfigId/selectedConfigVersion) to explicitly return false so the
signature matches EvaluationsTab's handleRunEvaluation: () => Promise<boolean>;
also ensure the successful path returns true (or resolves to true) at the end of
handleRunEvaluation so callers always receive a boolean.
♻️ Duplicate comments (3)
app/(main)/configurations/prompt-editor/page.tsx (1)

288-290: ⚠️ Potential issue | 🟠 Major

Default non-GPT-5 temperature before save.

On Line 289, the non-GPT-5 branch can still emit temperature: undefined if a GPT-5-loaded config is switched to a non-GPT-5 model and saved without touching the slider. Please normalize with ?? 0.7.

Suggested fix
             ...(!gpt5 && {
-              temperature: currentConfigBlob.completion.params.temperature,
+              temperature: currentConfigBlob.completion.params.temperature ?? 0.7,
             }),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/(main)/configurations/prompt-editor/page.tsx` around lines 288 - 290, The
non-GPT-5 branch can emit temperature: undefined when switching from a GPT-5
config; update the spread that adds temperature for non-GPT-5 models so it uses
a normalized default (e.g., currentConfigBlob.completion.params.temperature ??
0.7) instead of the raw value. Modify the expression where gpt5 is checked and
temperature is added (the ...(!gpt5 && { temperature:
currentConfigBlob.completion.params.temperature, }) spot) to apply the
nullish-coalescing default so saved configs always have a concrete temperature.
app/components/evaluations/EvaluationsTab.tsx (1)

123-125: ⚠️ Potential issue | 🟠 Major

Restore the 10-second status polling.

This effect only fetches once per key, so runs that move from pending / processing to completed stay stale until the user manually refreshes.

Possible direction
  useEffect(() => {
-   if (activeKey?.key) fetchEvaluations();
+   if (!activeKey?.key) return;
+
+   fetchEvaluations();
+   const intervalId = window.setInterval(fetchEvaluations, 10_000);
+
+   return () => {
+     window.clearInterval(intervalId);
+   };
  }, [activeKey, fetchEvaluations]);
Based on learnings: Implement polling every 10 seconds for job status updates in evaluation workflows instead of using WebSockets or server-sent events.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/components/evaluations/EvaluationsTab.tsx` around lines 123 - 125, The
current useEffect with dependency [activeKey, fetchEvaluations] only calls
fetchEvaluations once per key; change it to start a 10-second polling loop when
an activeKey.key exists by creating an interval (setInterval) that calls
fetchEvaluations every 10_000 ms and ensure you clear the interval on cleanup
(clearInterval) and when activeKey changes; implement this inside the same
useEffect that references activeKey and fetchEvaluations so the interval starts
for the active key and is disposed via the effect’s cleanup to avoid duplicate
timers.
app/(main)/evaluations/page.tsx (1)

76-97: ⚠️ Potential issue | 🟠 Major

Reset dataset state before refetching for a new API key.

This still leaves the previous key’s datasets and selection visible until the new request resolves, and an older response can overwrite the current list entirely. Clear storedDatasets / selectedDatasetId on key changes and guard loadStoredDatasets against stale responses before calling setStoredDatasets.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/(main)/evaluations/page.tsx` around lines 76 - 97, When the API key
changes, clear the previous dataset UI state and prevent stale responses from
overwriting it: in the useEffect that watches activeKey, call
setStoredDatasets([]) and setSelectedDatasetId(undefined) (or null) immediately
before invoking loadStoredDatasets, and modify loadStoredDatasets to track
request identity (e.g., a local requestId or AbortController) tied to the
current activeKey so that only the latest response calls setStoredDatasets; also
ensure loadStoredDatasets checks activeKey still matches before updating state
and still toggles setIsDatasetsLoading correctly.
🧹 Nitpick comments (1)
app/(main)/evaluations/[id]/page.tsx (1)

169-171: Consider using sanitizeCSVCell consistently for all user-controlled content.

While sanitizeCSVCell is now imported and used for some fields, several user-controlled values still use inline escaping (LLM answers at line 170, and in exportRowCSV at lines 229-231). For consistency and to benefit from formula injection prevention on untrusted content, consider migrating these as well.

Example for line 170
-          row.push(
-            `"${(group.llm_answers[i] || "").replace(/"/g, '""').replace(/\n/g, " ")}"`,
-          );
+          row.push(sanitizeCSVCell(group.llm_answers[i] || "", true));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/(main)/evaluations/[id]/page.tsx` around lines 169 - 171, Replace inline
CSV escaping of user-controlled strings with the shared sanitizeCSVCell utility:
where group.llm_answers[i] is pushed (currently using .replace to escape
quotes/newlines) and in the exportRowCSV logic (the spots mentioned around
exportRowCSV and the lines that manually replace quotes/newlines), call
sanitizeCSVCell(group.llm_answers[i] || "") and sanitizeCSVCell for the other
user-controlled fields instead of manual .replace chains; ensure you pass the
raw value to sanitizeCSVCell so it can handle quote/newline escaping and
formula-injection protection consistently across the codebase.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/(main)/evaluations/[id]/page.tsx`:
- Around line 384-387: Define a shared constant TERMINAL_JOB_STATUSES =
["completed","success","failed","error"] in app/lib/constants.ts, then replace
the ad-hoc progress checks (the isJobInProgress boolean expressions) in
page.tsx, EvalRunCard.tsx and the other pages with: const isJobInProgress =
!TERMINAL_JOB_STATUSES.includes(job.status.toLowerCase()); also update any other
places relying on getStatusColor() to use the same TERMINAL_JOB_STATUSES to keep
terminal-state logic consistent across getStatusColor, isJobInProgress, and
warning displays.

In `@app/(main)/evaluations/page.tsx`:
- Around line 41-42: The page currently reads activeKey from useAuth() and
renders the "API key required" empty state before the auth context hydrates;
update the component to also read isHydrated from useAuth() and short-circuit to
a loader/placeholder (or keep existing mounted/loading state) while isHydrated
is false so the empty-key UI only appears after hydration completes; apply this
change wherever activeKey is checked (e.g., the checks around lines referencing
activeKey at the top and the other branches you noted) so that checks use both
isHydrated and activeKey (render loader if !isHydrated, render missing-key UI
only if isHydrated && !activeKey).
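
The guard order can be captured as a small pure helper (a hypothetical sketch — the component renders JSX directly, but the decision logic is the same):

```typescript
// Which UI branch to render, given the auth context's hydration state.
// isHydrated/activeKey are the names used in the review comment.
type RenderState = "loading" | "missing-key" | "ready";

function resolveRenderState(isHydrated: boolean, activeKey: string | null): RenderState {
  if (!isHydrated) return "loading";    // auth context not hydrated yet
  if (!activeKey) return "missing-key"; // hydrated, but no API key stored
  return "ready";                       // hydrated with a key: show the page
}
```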

---

Outside diff comments:
In `app/(main)/evaluations/page.tsx`:
- Around lines 205-221: The validation branches inside the async
handleRunEvaluation use bare returns, which produce Promise<void | boolean>.
Update each early-exit validation (the checks for activeKey, selectedDatasetId,
experimentName, and selectedConfigId/selectedConfigVersion) to explicitly return
false so the signature matches EvaluationsTab's handleRunEvaluation: () =>
Promise<boolean>, and make the successful path return true at the end of
handleRunEvaluation so callers always receive a boolean.
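
A minimal sketch of the fix, with the component state passed in as stand-in parameters (the real function reads these from React state):

```typescript
// Every early exit returns false explicitly; the success path returns true,
// so the resolved type is always boolean, never void.
async function handleRunEvaluation(opts: {
  activeKey?: string;
  selectedDatasetId?: string;
  experimentName?: string;
}): Promise<boolean> {
  if (!opts.activeKey) return false;              // was a bare `return`
  if (!opts.selectedDatasetId) return false;      // was a bare `return`
  if (!opts.experimentName?.trim()) return false; // was a bare `return`
  // ... kick off the evaluation job here ...
  return true; // explicit success value for callers
}
```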

In `app/components/evaluations/EvaluationsTab.tsx`:
- Around lines 66-88: Reset the key-scoped state and guard against stale
in-flight responses. When activeKey changes, immediately clear evalJobs and
assistantConfigs (call setEvalJobs([]) and setAssistantConfigs([])), and use a
per-request token (e.g., an incrementing requestId local to fetchEvaluations and
to the similar fetch in the 91-113 block, or an AbortController) captured in the
async closure. Before calling setEvalJobs or setAssistantConfigs inside
fetchEvaluations (and the other fetch function), verify the token still matches
the latest activeKey token (or that the request wasn't aborted) so that older
responses cannot overwrite newer state.
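
One way to implement the "latest request wins" token, shown framework-free (fetchJobs/setEvalJobs are stand-ins for the real fetch and state setter):

```typescript
// Monotonically increasing token; only the newest in-flight request may commit.
let latestRequestId = 0;

async function fetchEvaluations(
  fetchJobs: () => Promise<string[]>,
  setEvalJobs: (jobs: string[]) => void,
): Promise<void> {
  const requestId = ++latestRequestId; // capture this request's token
  const jobs = await fetchJobs();
  if (requestId !== latestRequestId) return; // a newer request superseded us
  setEvalJobs(jobs);
}
```

An AbortController achieves the same effect and additionally cancels the network request, at the cost of threading the signal through the fetch call.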

---

Duplicate comments:
In `app/(main)/configurations/prompt-editor/page.tsx`:
- Around lines 288-290: The non-GPT-5 branch can emit temperature: undefined
when switching from a GPT-5 config. Update the spread that adds temperature for
non-GPT-5 models so it uses a normalized default (e.g.,
currentConfigBlob.completion.params.temperature ?? 0.7) instead of the raw
value. Modify the expression where gpt5 is checked and temperature is added (the
...(!gpt5 && { temperature: currentConfigBlob.completion.params.temperature })
spot) to apply the nullish-coalescing default so saved configs always carry a
concrete temperature.
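
The conditional-spread pattern in question, sketched with a hypothetical payload builder (the 0.7 default is the one named in the comment; the real code reads from currentConfigBlob):

```typescript
// GPT-5 models omit temperature entirely; all others fall back to 0.7 when the
// stored value is null/undefined, so the saved config never carries `undefined`.
function buildCompletionParams(
  model: string,
  temperature?: number,
): { model: string; temperature?: number } {
  const isGpt5 = model.startsWith("gpt-5");
  return {
    model,
    ...(!isGpt5 && { temperature: temperature ?? 0.7 }),
  };
}
```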

In `app/(main)/evaluations/page.tsx`:
- Around lines 76-97: When the API key changes, clear the previous dataset UI
state and prevent stale responses from overwriting it. In the useEffect that
watches activeKey, call setStoredDatasets([]) and setSelectedDatasetId(undefined)
(or null) immediately before invoking loadStoredDatasets, and modify
loadStoredDatasets to track request identity (e.g., a local requestId or
AbortController) tied to the current activeKey so that only the latest response
calls setStoredDatasets. Also ensure loadStoredDatasets checks that activeKey
still matches before updating state and still toggles setIsDatasetsLoading
correctly.

In `app/components/evaluations/EvaluationsTab.tsx`:
- Around lines 123-125: The current useEffect with dependency [activeKey,
fetchEvaluations] only calls fetchEvaluations once per key. Change it to start a
10-second polling loop whenever activeKey.key exists: create an interval
(setInterval) that calls fetchEvaluations every 10_000 ms, and clear it
(clearInterval) in the effect's cleanup and when activeKey changes. Implement
this inside the same useEffect that references activeKey and fetchEvaluations so
the interval starts for the active key and is disposed via the effect's cleanup,
avoiding duplicate timers.
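
The interval lifecycle can be sketched framework-free as a start function that returns its own disposer — the same shape a useEffect cleanup takes (names are stand-ins; the real code would use 10_000 ms):

```typescript
// Fetch immediately, then repeat every `intervalMs`; the returned function
// stops the loop and is what the useEffect cleanup would call.
function startPolling(fetchEvaluations: () => void, intervalMs: number): () => void {
  fetchEvaluations();                                   // initial fetch
  const id = setInterval(fetchEvaluations, intervalMs); // repeat
  return () => clearInterval(id);                       // disposer
}
```

Inside the component this would look like `useEffect(() => { if (activeKey?.key) return startPolling(fetchEvaluations, 10_000); }, [activeKey, fetchEvaluations])`, letting React invoke the disposer whenever activeKey changes.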

---

Nitpick comments:
In `app/(main)/evaluations/[id]/page.tsx`:
- Around lines 169-171: Replace the inline CSV escaping of user-controlled
strings with the shared sanitizeCSVCell utility. Where group.llm_answers[i] is
pushed (currently using .replace chains to escape quotes/newlines) and in the
exportRowCSV logic, call sanitizeCSVCell(group.llm_answers[i] || "") and use
sanitizeCSVCell for the other user-controlled fields instead of the manual
.replace chains. Pass the raw value to sanitizeCSVCell so it handles
quote/newline escaping and formula-injection protection consistently across the
codebase.
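
A plausible shape for the shared utility, assuming the behavior the review describes (the actual app/lib/utils.ts implementation may differ in details such as the second parameter):

```typescript
// Escape quotes, optionally flatten newlines, and neutralize spreadsheet
// formula injection by prefixing cells that start with =, +, -, or @ with a
// single quote before CSV-quoting the whole cell.
function sanitizeCSVCell(value: string, flattenNewlines = true): string {
  let cell = flattenNewlines ? value.replace(/\r?\n/g, " ") : value;
  if (/^[=+\-@]/.test(cell)) {
    cell = `'${cell}`; // Excel/Sheets would otherwise evaluate these as formulas
  }
  return `"${cell.replace(/"/g, '""')}"`; // standard CSV double-quote escaping
}
```

Routing every user-controlled field through one helper keeps the escaping and the injection defense from diverging between the two export paths.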
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9b40d878-b450-4db7-b213-5466538d34f9

📥 Commits

Reviewing files that changed from the base of the PR and between 653964a and 38c9c8d.

📒 Files selected for processing (10)
  • app/(main)/configurations/prompt-editor/page.tsx
  • app/(main)/evaluations/[id]/page.tsx
  • app/(main)/evaluations/page.tsx
  • app/components/ConfigDrawer.tsx
  • app/components/evaluations/DatasetsTab.tsx
  • app/components/evaluations/EvaluationsTab.tsx
  • app/components/prompt-editor/ConfigEditorPane.tsx
  • app/components/prompt-editor/CurrentConfigTab.tsx
  • app/lib/constants.ts
  • app/lib/utils.ts
✅ Files skipped from review due to trivial changes (1)
  • app/components/prompt-editor/CurrentConfigTab.tsx
🚧 Files skipped from review as they are similar to previous changes (4)
  • app/lib/constants.ts
  • app/components/ConfigDrawer.tsx
  • app/lib/utils.ts
  • app/components/evaluations/DatasetsTab.tsx

@Ayush8923 Ayush8923 merged commit d3f11ec into main Apr 1, 2026
2 checks passed
@Ayush8923 Ayush8923 deleted the feat/support-new-models-configuration branch April 1, 2026 04:46