Config¶
- class chatlearn.utils.arguments.RuntimeEnvConfig[source]¶
Runtime env config. For more information, refer to https://docs.ray.io/en/latest/ray-core/handling-dependencies.html.
- pip: List[str] = []¶
packages to install via pip
- py_modules: List[str] = []¶
python modules
- working_dir: str = '/home/docs/checkouts/readthedocs.org/user_builds/chatlearn/checkouts/v1.0.2/docs/en'¶
working directory
- platform: str = ''¶
platform, e.g., DLC
- excludes: List[str] = []¶
files to exclude from packaging
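For example, a minimal sketch of setting these fields programmatically. The attribute names follow the documentation above; direct instantiation with defaults and the concrete values (platform, package names, patterns) are assumptions for illustration, since in practice these values are usually supplied through a config file rather than set in code.
from chatlearn.utils.arguments import RuntimeEnvConfig

# Sketch: populate the Ray runtime-env options documented above.
env = RuntimeEnvConfig()              # assumes default construction works
env.platform = "DLC"                  # target platform (example value)
env.pip = ["sentencepiece"]           # packages to install via pip on workers (placeholder)
env.py_modules = ["my_module"]        # local Python modules to ship (hypothetical name)
env.working_dir = "./"                # working directory uploaded to the cluster
env.excludes = ["*.ckpt", "logs"]     # file patterns excluded from packaging (placeholders)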
- class chatlearn.utils.arguments.RuntimeConfig[source]¶
Training-related configs.
- num_episode: int = 5000¶
[required] number of episodes. One episode includes an inference and training loop.
- sample_per_episode: int = 1000¶
[required] number of samples per episode.
- num_training_epoch: int = 1¶
[optional] number of training epochs per episode. Defaults to 1.
- generation_batch_size: int = 2¶
[required] generation (inference) batch size.
- train_micro_batch_size: int = 2¶
[required] training micro batch size.
- train_global_batch_size: int = None¶
[required] training global batch size.
- save_episode_interval: int = None¶
[required] save a checkpoint every save_episode_interval episodes.
- log_interval: int = 1¶
[optional] log time and memory every log_interval iterations.
- data_path: str = None¶
[required] data_path for the dataset
- colocation: List[str] = []¶
[optional] colocate models onto the same devices
- eval_episode_interval: int = 0¶
[optional] evaluate every eval_episode_interval episodes; if 0, evaluation is disabled
- enable_resume_training: bool = True¶
[optional] enable resume training when a data checkpoint is set
- data_checkpoint_path: str = None¶
[optional] checkpoint path for the dataloader
- max_data_ckpt_nums: int = None¶
[optional] max number of data checkpoints
- load_data_checkpoint_iteration: int = None¶
[optional] load the data checkpoint from the given iteration
- stream_data_loader_type: str = 'fixed'¶
[optional] stream_data_loader type, one of ["fixed", "dynamic"]
- debug: bool = False¶
private
- nsys: bool = False¶
enable nsys nvtx
- profiler_dir: str = None¶
profiler dir
- coalesce_param: bool = True¶
coalesce parameters in model sync
- coalesced_buffer_mb: int = 100¶
coalesced buffer size in MB
- concurrent_comm: bool = True¶
concurrent parameter sync
- param_sync_comm_type: str = 'broadcast'¶
parameter sync communication type, broadcast/p2p
- param_sync_max_workers: int = None¶
parameter sync max workers
- max_relay_episode: int = 0¶
max number of relay episodes; if max_relay_episode is set to -1, all episodes are relayed; if set to 0, relay is disabled
- relay_episode_offset: int = 0¶
relay after n episodes
- consumed_samples: int = 0¶
consumed samples
- concurrent_setup: bool = False¶
concurrent model setup
- bucket_size_mb_in_memory_manager: int = 1024¶
bucket size in the memory manager to reduce peak memory
- free_sync_collective_group: bool = False¶
free collective group after parameter synchronization and rebuild before next synchronization
- cpu_schedule_strategy: str = 'SPREAD'¶
[optional] CPU-only model schedule policy, PACK or SPREAD. PACK: all provided bundles are packed onto a single node on a best-effort basis. SPREAD: each bundle is spread onto separate nodes on a best-effort basis.
- exp_name: str = 'CHATLEARN'¶
experiment name for each run
- output_dir: str = './'¶
output dir
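As a worked illustration of the [required] fields, the sketch below fills in placeholder values. The attribute names come from the documentation above; the direct instantiation and the concrete numbers are assumptions, since these values are normally provided via the training config.
from chatlearn.utils.arguments import RuntimeConfig

runtime = RuntimeConfig()                 # assumes default construction works
# [required] fields
runtime.num_episode = 100                 # 100 inference + training loops (placeholder)
runtime.sample_per_episode = 1024         # samples generated per episode (placeholder)
runtime.generation_batch_size = 4         # inference batch size (placeholder)
runtime.train_micro_batch_size = 2        # micro batch size per training step (placeholder)
runtime.train_global_batch_size = 128     # global training batch size (placeholder)
runtime.save_episode_interval = 10        # checkpoint every 10 episodes (placeholder)
runtime.data_path = "/path/to/dataset"    # dataset path (placeholder)
# a couple of [optional] fields
runtime.eval_episode_interval = 5         # evaluate every 5 episodes; 0 disables evaluation
runtime.exp_name = "ppo_exp"              # experiment name (placeholder)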
- class chatlearn.utils.arguments.ModelConfig[source]¶
Config for model.
- num_device: int = 0¶
[legacy] number of GPUs used for one model, default 0.
- num_gpu: int = 0¶
[required] number of GPUs used for one model, default 0; same as num_device
- num_cpu: int = 0¶
[required] number of CPUs used for one model, default 0
- gpu_per_process: int = None¶
[optional] gpu per process, e.g., for PyTorch DDP, Megatron, DeepSpeed, gpu_per_process is set to 1
- cpu_per_process: int = None¶
[optional] cpu per process
- num_replica: int = 1¶
[optional] number of module replicas; for a GPU model, num_replica = num_gpu // (TP * PP * DP); for a CPU model, num_replica = num_cpu // cpu_per_process
- trainable: bool = False¶
[required] whether model is trainable
- tensor_model_parallel_size: int = None¶
[optional] tensor model parallel size
- pipeline_model_parallel_size: int = None¶
[optional] pipeline model parallel size
- zero_size: int = None¶
[optional] zero size
- model_config_file: str = ''¶
[optional] config file for model
- config_dir: str = ''¶
- model_type: str = ''¶
[optional] model type, e.g., Torch/TensorFlow, etc.
- generation_batch_size: int = -1¶
[optional] generation batch size, will overwrite generation batch size in RuntimeConfig
- offload_optimizer_states = False¶
offload optimizer states
- sync_frequency = 1¶
parameter sync frequency
- offload_weights = False¶
offload weights
- free_grad_buffers = False¶
free grad buffers
- free_memory = False¶
overall switch for offloading optimizer states/weights and freeing grad buffers
- args_dict: dict = None¶
[optional] placeholder for other args
- lora: LoraConfig = None¶
lora config
- batch_generation: BatchGenerationConfig = None¶
batch generation config
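A hedged sketch of a ModelConfig for an 8-GPU inference model follows. The attribute names are taken from the documentation above; the values, the file name, and the direct instantiation are illustrative assumptions.
from chatlearn.utils.arguments import ModelConfig

policy = ModelConfig()                    # assumes default construction works
policy.num_gpu = 8                        # GPUs used by this model (placeholder)
policy.gpu_per_process = 1                # one GPU per process, e.g., Megatron/DDP style
policy.trainable = False                  # inference-only model
policy.tensor_model_parallel_size = 4     # TP = 4 (placeholder)
policy.pipeline_model_parallel_size = 2   # PP = 2 (placeholder)
policy.generation_batch_size = 8          # overrides RuntimeConfig.generation_batch_size
policy.model_config_file = "policy.yaml"  # hypothetical file name
With TP = 4, PP = 2, and DP = 1, the num_replica formula above gives 8 // (4 * 2 * 1) = 1 replica.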
- class chatlearn.utils.arguments.BatchGenerationConfig[source]¶
Config for batch generation ranking and memory-efficiency.
- ranking: bool = False¶
[optional] sort prompts by length each episode.
- min_prompt_length: int = 0¶
[optional] min prompt length in the first stage of batch generation.
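For instance, a short sketch that enables length-based ranking and attaches it to a model config via the batch_generation field documented above; the instantiation and values are assumptions.
from chatlearn.utils.arguments import BatchGenerationConfig, ModelConfig

bg = BatchGenerationConfig()              # assumes default construction works
bg.ranking = True                         # sort prompts by length each episode
bg.min_prompt_length = 32                 # min prompt length in the first stage (placeholder)

model = ModelConfig()
model.batch_generation = bg               # attach to the model config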
- class chatlearn.utils.arguments.LoraConfig[source]¶
Config for LoRA.
- enable_lora: bool = False¶
enable LoRA, default False.
- part_module_name: str = None¶
Specifies the name scope of the modules to be converted to LoRA. By default, it is set to None, which means there is no restriction and any module matching lora_layer can be converted. If it is set to a specific value (e.g., "encoder"), only modules whose name scope contains "encoder" are converted to LoRA.
- lora_dim: int = 8¶
The rank value of the LoRA, which is the r dimension of the A/B matrix.
- lora_dropout: float = 0.0¶
The dropout ratio applied in the forward pass of the LoRA layers. By default, it is set to 0.0.
- lora_scaling: float = 1.0¶
When adding the product of the LoRA A and B matrices to the original weight matrix, the scaling is applied as W = W + A * B * lora_scaling. By default, the scaling value is set to 1.0.
- lora_layer: str = 'ColumnParallelLinear,Embedding,LinearLayer,RowParallelLinear,VocabParallelEmbedding'¶
The layer class names involved in LoRA training in the model, separated by commas.
- column_only_qkv: bool = False¶
If True, LoRA training is enabled only in the ColumnParallelLinear layers of the MHA QKV module.
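Finally, a hedged sketch of a LoRA configuration restricted to encoder modules. The field names follow the documentation above; the specific values and the direct instantiation are assumptions.
from chatlearn.utils.arguments import LoraConfig, ModelConfig

lora = LoraConfig()                       # assumes default construction works
lora.enable_lora = True                   # turn on LoRA
lora.part_module_name = "encoder"         # only modules whose name scope contains "encoder"
lora.lora_dim = 16                        # rank r of the A/B matrices (placeholder)
lora.lora_dropout = 0.05                  # dropout in the LoRA forward pass (placeholder)
lora.lora_scaling = 1.0                   # applied as W = W + A * B * lora_scaling

model = ModelConfig()
model.lora = lora                         # attach via the ModelConfig.lora field documented above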