Code_Generation: Prompt JSON source mismatch (research_plan.json vs task_analysis_*.json) + missing post-generation run/debug verification step

While reviewing CellForge’s code generation scripts under `scAgents/cellforge/Code_Generation/`, I found:
1) The prompt builder in `auto_start_openhands.py` hardcodes `results/research_plan.json` and expects keys like `task_description`, `dataset`, and `perturbations`, but in this project the authoritative metadata appears to live in `Task_Analysis/results/task_analysis_*.json`, which uses different key names (e.g., `dataset_info`). This causes missing/incorrect prompt content and path mismatches.
2) The current code generation pipeline appears to only start OpenHands, request code, and write `result.py`. It does not implement the “auto compile/run/debug verification” mentioned in the original description (no `py_compile`, smoke test, unit test, etc.), so generated code is not automatically validated.

### Evidence / Code Pointers
_A) Prompt reads `results/research_plan.json` (hardcoded)_
In auto_start_openhands.py, prerequisites and prompt construction rely on `results/research_plan.json`:
`auto_start_openhands.py`:

    research_plan_path = Path("results/research_plan.json")
    if research_plan_path.exists():
        logger.info(f"Found research plan: {research_plan_path}")
    else:
        logger.warning(f"Research plan not found: {research_plan_path}")
        logger.info("OpenHands will start without research plan")

Prompt construction pulls keys from that JSON:
`auto_start_openhands.py`:

    research_plan_path = Path("results/research_plan.json")
    with open(research_plan_path, 'r', encoding='utf-8') as f:
        research_plan = json.load(f)
    task_description = research_plan.get("task_description", "Single-cell perturbation prediction")
    dataset_info = research_plan.get("dataset", {})
    perturbations = research_plan.get("perturbations", [])

_B) Actual metadata exists in Task_Analysis outputs (different schema)_
Example task analysis output contains task_description and dataset_info:
`task_analysis_2026_.json`:

    {
      "timestamp": "20260119_105842",
      "task_description": "...",
      "dataset_info": {
        "dataset_path": "cellforge/data/datasets/",
        "dataset_name": "norman_2019_k562",
        "data_type": "scRNA-seq",
        "cell_line": "K562",
        "perturbation_type": "CRISPRi"
      },
      ...
    }

This means the prompt builder is likely reading the wrong file and/or wrong keys (dataset vs dataset_info).
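
One way to tolerate both schemas is a small normalization step in front of the prompt builder. The sketch below is illustrative only: `KEY_ALIASES` and `normalize_plan` are hypothetical names that do not exist in CellForge, and the alias table covers only the keys quoted in this issue.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical alias table: canonical key -> accepted source keys, in priority order.
# Only the keys quoted in this issue are covered.
KEY_ALIASES = {
    "task_description": ("task_description",),
    "dataset": ("dataset", "dataset_info"),
    "perturbations": ("perturbations",),
}


def normalize_plan(raw: dict) -> dict:
    """Map either JSON schema onto the keys the prompt builder reads."""
    normalized = {}
    for canonical, aliases in KEY_ALIASES.items():
        for alias in aliases:
            if alias in raw:
                normalized[canonical] = raw[alias]
                break
        else:
            # No alias matched: warn instead of silently using a default.
            logger.warning("Missing field %r (tried: %s)", canonical, ", ".join(aliases))
    return normalized
```

With this in place, a task analysis file that only has `dataset_info` would still populate `dataset`, and any field that is absent under every alias produces a warning rather than an empty default.
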
_C) Code generation writes result.py but does not run/validate it_
`OpenHandsCodeGenerator.generate_code()` starts OpenHands, calls the chat API, extracts code, and writes `result.py`. No post-generation compile/run/test step is implemented:
`__init__.py`:

    def generate_code(...):
        ...
        if not self.start_openhands_docker():
            return None
        if not self.wait_for_openhands_ready():
            return None
        full_prompt = f"{self.code_generation_prompt}\n\nRESEARCH PLAN:\n{research_plan_json}"
        code_file_path = self._send_to_openhands(full_prompt, output_dir)
        if code_file_path:
            logger.info(f"Code generated successfully: {code_file_path}")
            return code_file_path
        ...

### Steps to Reproduce

1. Ensure `scAgents/cellforge/Task_Analysis/results/task_analysis_*.json` exists (generated by Task Analysis).
2. Run: `python scAgents/cellforge/Code_Generation/auto_start_openhands.py`
3. Inspect the generated prompt (e.g., `~/.openhands-workspace/initial_prompt.md`) and/or logs.
4. Observe that dataset/task fields may be missing or “Unknown” due to schema/path mismatch.
5. Run the code generation flow and note that it outputs `result.py` but does not attempt to compile/run/smoke-test it.
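
To confirm the mismatch without reading the prompt by hand, a small diagnostic along these lines could help. It is a sketch, not part of the repository: `latest_task_analysis` and `missing_keys` are hypothetical helpers, and "latest" is assumed to mean the most recently modified file.

```python
import json
from pathlib import Path

# Keys the prompt builder reads, per the auto_start_openhands.py snippet quoted in this issue.
EXPECTED_KEYS = ("task_description", "dataset", "perturbations")


def latest_task_analysis(results_dir):
    """Return the most recently modified task_analysis_*.json, or None if there is none."""
    candidates = Path(results_dir).glob("task_analysis_*.json")
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


def missing_keys(json_path):
    """List the expected top-level keys that are absent from the given JSON file."""
    with open(json_path, "r", encoding="utf-8") as f:
        data = json.load(f)
    return [key for key in EXPECTED_KEYS if key not in data]
```

Calling `missing_keys(latest_task_analysis("scAgents/cellforge/Task_Analysis/results"))` on an output shaped like the example above would be expected to report `dataset` as missing, since that file uses `dataset_info` instead.
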

### Expected Behavior

- Prompt creation should use the authoritative task analysis output by default (e.g., the latest `Task_Analysis/results/task_analysis_*.json`), or allow explicit configuration of the source file via a CLI flag or environment variable.
- Schema/key compatibility should be handled (`dataset_info` vs `dataset`, etc.), with clear warnings when fields are missing.
- After generation, the pipeline should provide an optional but default-safe validation loop such as:
  - compile `result.py` (e.g., with `py_compile`)
  - optional smoke test (e.g., `python result.py --help` or a minimal run)
  - optional tests/linting if applicable (`pytest -q`, etc.)
  - persist logs to `output_dir` and optionally re-prompt OpenHands to fix failures automatically
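
The compile and smoke-test steps could be sketched roughly as follows. This is an assumption about how such a hook might look, not existing CellForge code: `validate_generated_code`, its parameters, and the `validation.log` file name are all hypothetical, and the re-prompt step is left out.

```python
import py_compile
import subprocess
import sys
from pathlib import Path


def validate_generated_code(code_path, output_dir, smoke_args=None, timeout=60):
    """Compile the file, optionally run it, and write a log. Returns True on success."""
    code_path, output_dir = Path(code_path), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    log_lines = []
    ok = True

    # Step 1: syntax check only; the module is not executed.
    try:
        py_compile.compile(str(code_path), doraise=True)
        log_lines.append("py_compile: OK")
    except py_compile.PyCompileError as exc:
        log_lines.append(f"py_compile: FAILED\n{exc.msg}")
        ok = False

    # Step 2: optional smoke test, skipped unless smoke_args is given (e.g. ["--help"]).
    if ok and smoke_args is not None:
        try:
            proc = subprocess.run(
                [sys.executable, str(code_path), *smoke_args],
                capture_output=True, text=True, timeout=timeout,
            )
            log_lines.append(f"smoke test exit code: {proc.returncode}")
            log_lines.append(proc.stdout + proc.stderr)
            ok = proc.returncode == 0
        except subprocess.TimeoutExpired:
            log_lines.append(f"smoke test: timed out after {timeout}s")
            ok = False

    # Step 3: persist the log next to the other outputs (file name is an assumption).
    (output_dir / "validation.log").write_text("\n".join(log_lines), encoding="utf-8")
    return ok
```

A call such as `validate_generated_code(code_file_path, output_dir, smoke_args=["--help"])` could run right after `result.py` is written. Since the smoke test executes generated code, it would probably be safest to run it inside the same sandbox or container that OpenHands uses.
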

### Actual Behavior
- Prompt creation is hardcoded to `results/research_plan.json` and expects keys that don’t match the project’s task analysis schema, resulting in incomplete/incorrect prompt info.
- Generated code is saved but not automatically validated by compilation/execution/testing.

