Question about the sft datasets

Hi, I have some question about the dataset in your training.

First, In your README, you use the sft dataset is 6000+, but in your paper, you sample 600.

Second,  `train.json` and `test.json` in https://github.com/mnluzimu/WebGen-Bench/tree/main/data ,just have `instruction` ,no `prompt_column: str = "question"` and `answer_column: str = "response_content"` in your `openr1/sft.py`.

I just notice https://github.com/mnluzimu/WebGen-Bench/tree/main/data have `messages_generate_xxx.jsonl` which have `system` `user` `assistant` content, your use this for supervising in sft? 

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question about the sft datasets #5

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Question about the sft datasets #5

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions