Configuration Files
Training Configuration File
Users need a main YAML configuration file to set up the runtime environment, the models, and the RLHF training process. In addition, each model may have its own separate configuration file.
The RLHF training configuration consists of three parts:
runtime_env
: Configuration for the runtime environment.

models
: Model configurations. Each model can have its own parameter configuration. Different models are distinguished by model_name, which corresponds to the model_name passed in when the model is defined in the main file.

runtime
: RLHF training configuration.
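Put together, the main file therefore has three top-level sections. The following is a minimal skeleton; a full example follows below:

runtime_env:
  # runtime environment settings
models:
  # one entry per model
runtime:
  # RLHF training settings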
Below is an example of a training configuration. For detailed explanations of the configuration options, please refer to the Config API Documentation.
To facilitate the configuration of different hyperparameters, we also support reading parameters from environment variables. The format is as follows:
param: ${env_name:default_value}
Here, param is the parameter name, env_name is the environment variable name, and default_value is the default value (optional).
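As a concrete sketch of the syntax (the parameter names and environment variables here are illustrative, not ones defined by the framework):

# Read from $MY_LR if it is set; otherwise fall back to 0.0001.
learning_rate: ${MY_LR:0.0001}
# No default given, so $MY_DATA_PATH is presumably expected to be set
# (compare data_path: ${data_path} in the example below).
data_path: ${MY_DATA_PATH}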
In the following example, if the environment variable ref_generation_batch_size is set, its value is read and assigned to the generation_batch_size of reference. If ref_generation_batch_size is not set, the default value of 4 is used.
runtime_env:
  platform: DLC
  excludes:
    - "*pt"
    - "logs"
    - "tensorboards"
    - ".nfs*"

models:
  policy:
    model_config_file: policy_inference.yaml
    num_gpu: 8
    trainable: False
  reference:
    model_config_file: reference.yaml
    num_gpu: 8
    trainable: False
    generation_batch_size: ${ref_generation_batch_size:4}
  reward:
    model_config_file: reward_inference.yaml
    num_gpu: 8
    trainable: False
  value:
    model_config_file: old_value_inference.yaml
    num_gpu: 8
    trainable: False
  ppo_policy:
    model_config_file: ppo_policy.yaml
    num_gpu: 8
    trainable: True
  ppo_value:
    model_config_file: ppo_value.yaml
    num_gpu: ${num_gpu:16}
    trainable: True

runtime:
  colocation:
    - policy,ppo_policy,reward,reference,value,ppo_value
  generation_batch_size: ${generation_batch_size:4}
  train_micro_batch_size: 2
  train_global_batch_size: ${train_global_batch_size:512}
  num_episode: 200
  sample_per_episode: ${sample_per_episode:1024}
  num_training_epoch: 1
  save_episode_interval: ${save_episode_interval:50}
  data_path: ${data_path}
  eval_episode_interval: ${eval_episode_interval:100}
Model Configuration YAML
This framework supports a separate configuration file for each model, which can be used to configure hyperparameters, parallelization strategies, checkpoint initialization, and more. The model configuration file is in YAML format. Here is a simple example of a model configuration:
num_layers: 6
hidden_size: 768
num_attention_heads: 12
bf16: True
seq_length: 2048
tensor_model_parallel_size: 8
pipeline_model_parallel_size: 2
load: path-to-ckpt
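For Megatron-style configurations, tensor_model_parallel_size × pipeline_model_parallel_size is the number of GPUs one model replica occupies (8 × 2 = 16 in this example), so the num_gpu assigned to that model in the main file is typically a multiple of this product.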
To simplify the sharing of configuration across different models, we have extended the YAML syntax with an include field that inherits configuration from a base configuration file. In the example below, policy_inference.yaml and ppo_policy.yaml share parameters such as num_layers and hidden_size, while each model has its own specific pipeline_model_parallel_size configuration.
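As a sketch of what that example could look like (the file name base.yaml and the pipeline_model_parallel_size values are illustrative assumptions; the shared fields are taken from the model configuration above):

# base.yaml: hypothetical shared base configuration
num_layers: 6
hidden_size: 768
num_attention_heads: 12
bf16: True
seq_length: 2048

# policy_inference.yaml: inherits base.yaml, sets its own parallelism
include: base.yaml
pipeline_model_parallel_size: 1

# ppo_policy.yaml: same base, different pipeline_model_parallel_size
include: base.yaml
pipeline_model_parallel_size: 2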