
Code_Generation: Prompt JSON source mismatch (research_plan.json vs task_analysis_*.json) + missing post-generation run/debug verification step #7

@AA-ke

Description


While reviewing CellForge’s code generation scripts under scAgents/cellforge/Code_Generation/, I found:

  1. The prompt builder in auto_start_openhands.py hardcodes results/research_plan.json and expects keys like task_description, dataset, and perturbations, but in this project the authoritative metadata appears in Task_Analysis/results/task_analysis_*.json with different key names (e.g., dataset_info). This causes missing/incorrect prompt content and path mismatches.
  2. The current code generation pipeline appears only to start OpenHands, request code, and write result.py. It does not implement the "auto compile/run/debug verification" mentioned in the original description (no py_compile, smoke test, unit test, etc.), so generated code is never automatically validated.

Evidence / Code Pointers

A) Prompt reads results/research_plan.json (hardcoded)
In auto_start_openhands.py, prerequisites and prompt construction rely on results/research_plan.json:
auto_start_openhands.py:
    research_plan_path = Path("results/research_plan.json")
    if research_plan_path.exists():
        logger.info(f"Found research plan: {research_plan_path}")
    else:
        logger.warning(f"Research plan not found: {research_plan_path}")
        logger.info("OpenHands will start without research plan")
Prompt construction pulls keys from that JSON:
auto_start_openhands.py
    research_plan_path = Path("results/research_plan.json")
    with open(research_plan_path, 'r', encoding='utf-8') as f:
        research_plan = json.load(f)
    task_description = research_plan.get("task_description", "Single-cell perturbation prediction")
    dataset_info = research_plan.get("dataset", {})
    perturbations = research_plan.get("perturbations", [])
B) Actual metadata exists in Task_Analysis outputs (different schema)
Example task analysis output contains task_description and dataset_info:
task_analysis_2026_.json
    {
      "timestamp": "20260119_105842",
      "task_description": "...",
      "dataset_info": {
        "dataset_path": "cellforge/data/datasets/",
        "dataset_name": "norman_2019_k562",
        "data_type": "scRNA-seq",
        "cell_line": "K562",
        "perturbation_type": "CRISPRi"
      },
      ...
    }
This means the prompt builder is likely reading the wrong file and/or wrong keys (dataset vs dataset_info).
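One way to accept both schemas is a small normalization step that runs before the prompt is built. This is a sketch under assumptions, not existing CellForge code: normalize_task_metadata is a hypothetical name, and it only reconciles the keys shown above (dataset_info vs dataset, task_description, perturbations), logging a warning for every field it has to default.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)

# Same fallback string the current prompt builder uses for task_description.
DEFAULT_TASK = "Single-cell perturbation prediction"


def normalize_task_metadata(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Map research_plan.json-style or task_analysis_*.json-style dicts to one shape."""
    # Prefer the task-analysis key, then fall back to the research-plan key.
    dataset_info = raw.get("dataset_info")
    if dataset_info is None:
        dataset_info = raw.get("dataset")
    if dataset_info is None:
        logger.warning("Neither 'dataset_info' nor 'dataset' found; dataset section will be empty")
        dataset_info = {}

    task_description = raw.get("task_description")
    if not task_description:
        logger.warning("'task_description' missing; falling back to %r", DEFAULT_TASK)
        task_description = DEFAULT_TASK

    perturbations = raw.get("perturbations")
    if perturbations is None:
        # The task-analysis example above does not show a top-level 'perturbations' key.
        logger.warning("'perturbations' missing; prompt will not list perturbations")
        perturbations = []

    return {
        "task_description": task_description,
        "dataset_info": dataset_info,
        "perturbations": perturbations,
    }
```

The prompt builder could then call normalize_task_metadata(json.load(f)) regardless of which file it loaded, and read dataset_name, cell_line, etc. from the returned dataset_info.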
C) Code generation writes result.py but does not run/validate it
OpenHandsCodeGenerator.generate_code() starts OpenHands, calls the chat API, extracts code, and writes result.py. No post-generation compile/run/test step is implemented:
__init__.py
    def generate_code(...):
        ...
        if not self.start_openhands_docker():
            return None
        if not self.wait_for_openhands_ready():
            return None
        full_prompt = f"{self.code_generation_prompt}\n\nRESEARCH PLAN:\n{research_plan_json}"
        code_file_path = self._send_to_openhands(full_prompt, output_dir)
        if code_file_path:
            logger.info(f"Code generated successfully: {code_file_path}")
            return code_file_path
        ...

Steps to Reproduce

  1. Ensure scAgents/cellforge/Task_Analysis/results/task_analysis_*.json exists (generated by Task Analysis).
  2. Run:
         python scAgents/cellforge/Code_Generation/auto_start_openhands.py
  3. Inspect the generated prompt (e.g., ~/.openhands-workspace/initial_prompt.md) and/or the logs.
  4. Observe that dataset/task fields may be missing or "Unknown" due to the schema/path mismatch.
  5. Run the code generation flow and note that it writes result.py but does not attempt to compile, run, or smoke-test it.
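To make step 4 quicker to check, a small script can scan the generated prompt for placeholder values. This is a hypothetical helper, not part of CellForge: find_suspect_lines is an illustrative name, and it assumes missing fields show up as "Unknown" or as an empty {} / [] at the end of a line, which may not match the actual prompt template.

```python
import re
from pathlib import Path

# Path taken from the reproduction steps; adjust if your workspace differs.
PROMPT_PATH = Path.home() / ".openhands-workspace" / "initial_prompt.md"

# Assumption: unfilled fields render as "Unknown" or as a trailing {} / [].
SUSPECT = re.compile(r"\bUnknown\b|:\s*(\{\}|\[\])\s*$")


def find_suspect_lines(text: str) -> list:
    """Return (line_number, line) pairs that look like unfilled metadata."""
    return [
        (lineno, line)
        for lineno, line in enumerate(text.splitlines(), start=1)
        if SUSPECT.search(line)
    ]


if __name__ == "__main__":
    if PROMPT_PATH.exists():
        for lineno, line in find_suspect_lines(PROMPT_PATH.read_text(encoding="utf-8")):
            print(f"{PROMPT_PATH}:{lineno}: {line}")
    else:
        print(f"Prompt not found: {PROMPT_PATH}")
```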

Expected Behavior

Prompt creation should use the authoritative task analysis output by default (e.g., latest Task_Analysis/results/task_analysis_*.json), or allow explicit configuration of the source file via CLI flag/env var.
Schema/key compatibility should be handled (dataset_info vs dataset, etc.), with clear warnings when fields are missing.
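A sketch of the source-selection order described above: explicit CLI flag first, then an environment variable, then the newest task_analysis_*.json. The --task-analysis flag, the CELLFORGE_TASK_ANALYSIS variable, and both function names are hypothetical (nothing like them exists in the current scripts), and DEFAULT_DIR assumes the script is launched from the repository root.

```python
import argparse
import os
from pathlib import Path
from typing import Optional

# Hypothetical names: neither the env var nor the flag exists in CellForge today.
ENV_VAR = "CELLFORGE_TASK_ANALYSIS"
DEFAULT_DIR = Path("scAgents/cellforge/Task_Analysis/results")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Start OpenHands with task-analysis metadata")
    parser.add_argument("--task-analysis", help="Path to a task_analysis_*.json file")
    return parser


def resolve_metadata_source(cli_path: Optional[str], results_dir: Path = DEFAULT_DIR) -> Optional[Path]:
    """Pick the metadata file: CLI flag > env var > newest task_analysis_*.json."""
    explicit = cli_path or os.environ.get(ENV_VAR)
    if explicit:
        path = Path(explicit)
        # An explicitly configured file that is missing is an error, not a silent fallback.
        if not path.is_file():
            raise FileNotFoundError(f"Configured metadata file does not exist: {path}")
        return path
    candidates = sorted(results_dir.glob("task_analysis_*.json"), key=lambda p: p.stat().st_mtime)
    # None lets the caller log a clear warning instead of guessing a path.
    return candidates[-1] if candidates else None
```

Typical wiring would be resolve_metadata_source(build_parser().parse_args().task_analysis). The newest file is chosen by modification time rather than by the timestamp in the file name; sorting by name would work equally well if every file follows the same timestamp format.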
After generation, the pipeline should provide an optional, safe-by-default validation loop, for example:
  - a compile check: python -m py_compile result.py
  - an optional smoke test (e.g., python result.py --help or a minimal run)
  - optional tests/linting where applicable (pytest -q, etc.)
  - persisting logs to output_dir and optionally re-prompting OpenHands to fix failures automatically
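A minimal sketch of the first two checks, assuming the generated code is a standalone Python script. validate_generated_code and the validation.log file name are hypothetical, and the --help smoke test assumes result.py parses command-line arguments (replace it with a minimal run otherwise). Each check runs under the current interpreter via subprocess with a timeout, the loop stops at the first failure, and all output is written to output_dir.

```python
import subprocess
import sys
from pathlib import Path


def validate_generated_code(code_path: Path, output_dir: Path, timeout: int = 60) -> bool:
    """Compile and smoke-test a generated script, writing a log to output_dir."""
    output_dir.mkdir(parents=True, exist_ok=True)
    log_path = output_dir / "validation.log"  # hypothetical log file name
    checks = [
        ("compile", [sys.executable, "-m", "py_compile", str(code_path)]),
        # Assumption: the generated script accepts --help; swap in a minimal run if not.
        ("smoke", [sys.executable, str(code_path), "--help"]),
    ]
    ok = True
    with log_path.open("w", encoding="utf-8") as log:
        for name, cmd in checks:
            try:
                proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
                passed = proc.returncode == 0
                output = proc.stdout + proc.stderr
            except subprocess.TimeoutExpired:
                passed, output = False, f"timed out after {timeout}s"
            log.write(f"[{name}] {'PASS' if passed else 'FAIL'}: {' '.join(cmd)}\n{output}\n")
            if not passed:
                # Stop at the first failure, e.g. skip the smoke test if compilation fails.
                ok = False
                break
    return ok
```

The boolean result and validation.log could then decide whether to re-prompt OpenHands with the captured error output; a pytest -q step could be appended to checks in the same way.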

Actual Behavior

Prompt creation is hardcoded to results/research_plan.json and expects keys that don’t match the project’s task analysis schema, resulting in incomplete/incorrect prompt info.
Generated code is saved but not automatically validated by compilation/execution/testing.

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
