End-to-end training tutorial based on the Qwen model
This document describes DPO training with the ChatLearn and DeepSpeed frameworks and the Qwen model.
The scripts in this tutorial use the following environment variables:
| ENV | Meaning |
|---|---|
| CHATLEARN | Location where the ChatLearn code repository (https://github.com/alibaba/ChatLearn.git) is cloned |
| DATASET_ROOT | Root directory where the training datasets are stored |
Setup: Image, Code, and Data Preparation
Image / Code
Please refer to Environment and Code Setup.
Data
Qwen2 requires training data in chatml format, one JSON record per line:
{"type": "chatml", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Tell me something about large language models."}, {"role": "assistant", "content": "Large language models are a type of language model that is trained on a large corpus of text data. They are capable of generating human-like text and are used in a variety of natural language processing tasks..."}], "source": "unknown"}
The following script converts the Dahoas/full-hh-rlhf dataset to chatml format and stores the result in the file $DATASET_ROOT/alignment/train.jsonl:
cd ${CHATLEARN}/examples/huggingface/
DATASET_ROOT=path-to-dataset-root
python data/preprocess_data_chatml.py $DATASET_ROOT
DPO
Here is an example of DPO training for Qwen2-7B.
In this example, the user needs to set policy_model_path to the path of the initialization model checkpoint; both the Policy model and the Reference model are initialized from this checkpoint.
export CHATLEARN=path-to-chatlearn
export DATASET_PATH=$DATASET_ROOT/alignment/train.jsonl
export policy_model_path=path-to-qwen2-ckpt
cd ${CHATLEARN}/examples/huggingface/
bash scripts/train_dpo_qwen.sh
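Because the Policy and Reference models start from the same checkpoint, their log-probabilities agree at step 0. The sketch below illustrates what that implies under the standard DPO objective: the initial loss is log 2 (about 0.693) regardless of the data. It is an illustration of the published DPO loss in plain Python, not ChatLearn's implementation; the function name dpo_loss, the beta value, and the log-probability numbers are assumptions chosen for the example.

```python
import math


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair (illustrative sketch).

    Inputs are summed sequence log-probabilities. beta is a hypothetical
    default, not a value taken from the training script.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# At initialization the policy equals the reference, so both log-ratios are 0.
initial = dpo_loss(-12.0, -15.0, -12.0, -15.0)
print(initial, math.log(2.0))

# Once the policy prefers the chosen response more than the reference does,
# the loss drops below log 2.
improved = dpo_loss(-10.0, -18.0, -12.0, -15.0)
print(improved)
```

A loss that starts far from 0.693 under this objective may indicate that the two models were not loaded from the same checkpoint.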