Data
This document describes the data preparation process for the different training stages: SFT, Reward, RLHF, DPO, OnlineDPO, and GRPO.
The following environment variables are used throughout the scripts in this tutorial:
| ENV | Explanation |
|---|---|
| `CHATLEARN` | The location where the ChatLearn code (https://github.com/alibaba/ChatLearn.git) is cloned. |
| `DATASET_ROOT` | The root directory for storing the SFT/Reward/RLHF/DPO/OnlineDPO/GRPO training datasets. |
1 Prepare SFT Training Data
Organize the SFT question-response pairs into a jsonl file, where each line represents one SFT data sample in the following Python dictionary format:
{'query': question, 'response': reply}
Taking Anthropic's helpful & harmless data as an example, use the following code to store it in $DATASET_ROOT/sft/train.jsonl.
cd ${CHATLEARN}/examples/megatron/
DATASET_ROOT=path-to-dataset-root
python data/prepare_data_sft.py $DATASET_ROOT
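If you are building train.jsonl from your own data instead, the following minimal Python sketch (not part of the ChatLearn repository; the sample content is illustrative) writes records in the expected format:

```python
# Minimal sketch: write SFT samples in the {'query': ..., 'response': ...} format.
import json
import os

dataset_root = os.environ["DATASET_ROOT"]  # same DATASET_ROOT as above
samples = [
    {"query": "What is the capital of France?",
     "response": "The capital of France is Paris."},
]

os.makedirs(os.path.join(dataset_root, "sft"), exist_ok=True)
with open(os.path.join(dataset_root, "sft", "train.jsonl"), "w") as f:
    for sample in samples:
        f.write(json.dumps(sample) + "\n")
```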
2 Prepare Reward Training Data
First, prepare samples that pair each question with multiple candidate responses, and organize them into a jsonl file. Each line in the jsonl file represents one Reward model training sample in the following Python dictionary format:
{'query': question, 'response': [reply 1, reply 2, ...], 'score': [score1, score2, ...]}
Each score indicates the quality of the corresponding response: higher scores mean higher quality and closer alignment with human preference.
Taking Anthropic's helpful & harmless data as an example, use the following code to store it in $DATASET_ROOT/rm/train.jsonl and $DATASET_ROOT/rm/dev.jsonl.
cd ${CHATLEARN}/examples/megatron/
DATASET_ROOT=path-to-dataset-root
python data/prepare_data_reward.py $DATASET_ROOT
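As with SFT, you can also produce this file from your own preference data. A minimal Python sketch (the sample content and scores are illustrative) of writing reward samples:

```python
# Minimal sketch: write reward-model samples, pairing one question with
# several candidate responses and one score per response.
import json
import os

dataset_root = os.environ["DATASET_ROOT"]
samples = [
    {
        "query": "How do I stay focused while studying?",
        "response": [
            "Break work into short sessions and remove distractions.",
            "Just try harder.",
        ],
        "score": [1.0, 0.0],  # higher score = closer to human preference
    },
]

os.makedirs(os.path.join(dataset_root, "rm"), exist_ok=True)
with open(os.path.join(dataset_root, "rm", "train.jsonl"), "w") as f:
    for sample in samples:
        f.write(json.dumps(sample) + "\n")
```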
3 Prepare Alignment Training Data
ChatLearn supports multiple alignment training algorithms: RLHF, DPO, OnlineDPO, and GRPO.
First, prepare a dataset of instructions to be explored and organize it into a jsonl file. Each line in the jsonl file represents one prompt in the following format:
{"prompt": prompt}
Taking Anthropic's helpful & harmless data as an example, use the following code to store the dataset in $DATASET_ROOT/alignment/train.jsonl and $DATASET_ROOT/alignment/dev.jsonl:
cd ${CHATLEARN}/examples/megatron/
DATASET_ROOT=path-to-dataset-root
python data/prepare_data_alignment.py $DATASET_ROOT
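For a custom prompt set, a minimal Python sketch (the prompts are illustrative) of writing the alignment jsonl:

```python
# Minimal sketch: write alignment prompts in the {"prompt": ...} format.
import json
import os

dataset_root = os.environ["DATASET_ROOT"]
prompts = [
    "Explain why the sky is blue in one paragraph.",
    "Summarize the plot of Hamlet for a ten-year-old.",
]

os.makedirs(os.path.join(dataset_root, "alignment"), exist_ok=True)
with open(os.path.join(dataset_root, "alignment", "train.jsonl"), "w") as f:
    for prompt in prompts:
        f.write(json.dumps({"prompt": prompt}) + "\n")
```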
4 Prepare Math Training Data
First, prepare a dataset of math problems to be explored and organize it into a jsonl file. Each line in the jsonl file represents one sample in the following format:
{"eval_func": "math_rule", "prompt": prompt, 'answer': answer}
Taking openai/gsm8k data as an example, use the following code to store the dataset in $DATASET_ROOT/math/train.jsonl:
cd ${CHATLEARN}/examples/megatron/
DATASET_ROOT=path-to-dataset-root
python data/prepare_data_math.py $DATASET_ROOT
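If you are converting your own GSM8K-style records instead, here is a minimal Python sketch (the record is illustrative, and it assumes the GSM8K convention that the final answer follows the "####" marker in the answer field):

```python
# Minimal sketch: convert a GSM8K-style record into the
# {"eval_func": "math_rule", "prompt": ..., "answer": ...} format.
import json
import os

dataset_root = os.environ["DATASET_ROOT"]
records = [
    {
        "question": "Natalia sold 48 clips in April and half as many in May. "
                    "How many clips did she sell altogether?",
        "answer": "48 + 48 / 2 = 72\n#### 72",
    },
]

os.makedirs(os.path.join(dataset_root, "math"), exist_ok=True)
with open(os.path.join(dataset_root, "math", "train.jsonl"), "w") as f:
    for record in records:
        sample = {
            "eval_func": "math_rule",  # fixed field from the format above
            "prompt": record["question"],
            # keep only the final answer after the "####" marker
            "answer": record["answer"].split("####")[-1].strip(),
        }
        f.write(json.dumps(sample) + "\n")
```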