# Environment and Code Preparation

## 1. Image Preparation

ChatLearn supports vLLM and SGLang as backend frameworks for Rollout generation. Select the Docker image that matches the Rollout backend you plan to use.

### vLLM

You can prepare the image by referring to [Dockerfile.torch2.6.0.vllm085](https://github.com/alibaba/ChatLearn/blob/main/docker/torch/Dockerfile.torch2.6.0.vllm085). Alternatively, you can directly pull and use the following image:

```bash
dsw-registry.cn-shanghai.cr.aliyuncs.com/pai-training-algorithm/chatlearn:torch2.6.0-vllm0.8.5-ubuntu24.04-cuda12.6-py312
```

### SGLang

You can prepare the image by referring to [Dockerfile.torch2.8.0.sglang053](https://github.com/alibaba/ChatLearn/blob/main/docker/torch/Dockerfile.torch2.8.0.sglang053). Alternatively, you can directly pull and use the following image:

```bash
dsw-registry.cn-shanghai.cr.aliyuncs.com/pai-training-algorithm/chatlearn:torch2.8.0-sglang0.5.3-ubuntu24.04-cuda12.6-py312
```
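
Because the image depends only on the Rollout backend, the choice can be captured in a small shell function. The sketch below is illustrative: `chatlearn_image` and `REGISTRY` are hypothetical names, not part of ChatLearn, and the tags are the two listed above.

```bash
# Hypothetical helper: map a Rollout backend name to the image listed above.
REGISTRY="dsw-registry.cn-shanghai.cr.aliyuncs.com/pai-training-algorithm/chatlearn"

chatlearn_image() {
    case "$1" in
        vllm)   echo "${REGISTRY}:torch2.6.0-vllm0.8.5-ubuntu24.04-cuda12.6-py312" ;;
        sglang) echo "${REGISTRY}:torch2.8.0-sglang0.5.3-ubuntu24.04-cuda12.6-py312" ;;
        *)      echo "unknown backend: $1 (expected vllm or sglang)" >&2
                return 1 ;;
    esac
}
```

With the function defined, `docker pull "$(chatlearn_image sglang)"` pulls the SGLang image. Starting a container from it (for example with `docker run --gpus all -it`) additionally assumes a host with the NVIDIA Container Toolkit installed, which this page does not cover.
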

### Image History

| Image URL                                                    | Package Versions                          | Supported Models                           |
| ------------------------------------------------------------ | ----------------------------------------- | ------------------------------------------ |
| dsw-registry.cn-shanghai.cr.aliyuncs.com/pai-training-algorithm/chatlearn:torch2.8.0-sglang0.5.3-ubuntu24.04-cuda12.6-py312 | sglang 0.5.3.post1<br>transformers 4.57.0 | Qwen3-VL<br>Qwen2.5-VL<br>Qwen3<br>Qwen2.5 |
| dsw-registry.cn-shanghai.cr.aliyuncs.com/pai-training-algorithm/chatlearn:torch2.8.0-sglang0.5.3-ubuntu24.04-cuda12.6-py312 | sglang 0.5.2<br>transformers 4.56.1       | Qwen2.5-VL<br>Qwen3<br>Qwen2.5             |
| dsw-registry.cn-shanghai.cr.aliyuncs.com/pai-training-algorithm/chatlearn:torch2.6.0-vllm0.8.5-te2.7-ubuntu24.04-cuda12.6-py312 | vllm 0.8.5<br>transformer_engine 2.7      | Moonlight<br>DeepSeek-R1                   |
| dsw-registry.cn-shanghai.cr.aliyuncs.com/pai-training-algorithm/chatlearn:torch2.6.0-vllm0.8.5-ubuntu24.04-cuda12.6-py312 | vllm 0.8.5<br>transformers 4.51.3         | Qwen2.5-VL<br>Qwen3<br>Qwen2.5             |

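To check that a running container matches a row of the table above, the installed package versions can be read from Python's package metadata. This is a minimal sketch that assumes `python3` is on the container's `PATH` and that the distribution names match the names used in the table; `pkg_version` is a hypothetical helper, not part of ChatLearn.

```bash
# Hypothetical helper: print the installed version of a Python package,
# or "not installed" when the package is absent from the environment.
pkg_version() {
    python3 - "$1" <<'EOF'
import sys
from importlib.metadata import PackageNotFoundError, version

try:
    print(version(sys.argv[1]))
except PackageNotFoundError:
    print("not installed")
EOF
}

# Report the packages that appear in the Image History table.
for pkg in vllm sglang transformers transformer_engine; do
    echo "${pkg}: $(pkg_version "${pkg}")"
done
```

A backend that is not part of the chosen image is reported as `not installed`, which is expected: each image ships only one Rollout backend.
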
## 2. Code Preparation

```bash
# Download ChatLearn code
git clone https://github.com/alibaba/ChatLearn.git
```

If you choose Megatron as the training framework, you also need to download [Pai-Megatron-Patch](https://github.com/alibaba/Pai-Megatron-Patch).

```bash
# Download Pai-Megatron-Patch together with its submodules
git clone --recurse-submodules https://github.com/alibaba/Pai-Megatron-Patch.git
```
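
The clone above and the archive fallback described in the note below can be combined into one step that tries GitHub first. This is a sketch under stated assumptions: `fetch_megatron_patch` is a hypothetical helper name, and `git`, `wget`, and `tar` are assumed to be installed.

```bash
# Hypothetical helper: fetch Pai-Megatron-Patch into the current directory,
# trying GitHub first and the pre-packaged archive second.
fetch_megatron_patch() {
    local repo_dir="Pai-Megatron-Patch"
    local archive_url="https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/csrc/Pai-Megatron-Patch.tar"

    # Nothing to do if a previous run already fetched the code.
    if [ -d "${repo_dir}" ]; then
        echo "${repo_dir} already present, skipping download"
        return 0
    fi

    if git clone --recurse-submodules https://github.com/alibaba/Pai-Megatron-Patch.git; then
        return 0
    fi

    echo "git clone failed, falling back to the archive" >&2
    rm -rf "${repo_dir}"   # drop a partial clone before extracting the archive
    wget "${archive_url}" && tar -xvf "${repo_dir}.tar"
}
```

Run `fetch_megatron_patch` from the directory where the code should live. The function returns a non-zero status if both the clone and the archive download fail.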

> If you have trouble reaching GitHub, you can instead download our pre-packaged `Pai-Megatron-Patch` archive with the following command:  
> `wget https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/csrc/Pai-Megatron-Patch.tar && tar -xvf Pai-Megatron-Patch.tar`

## 3. Running Reinforcement Learning Experiments

ChatLearn supports FSDP and Megatron as training backends. Please refer to the following tutorials for detailed instructions:

- [End-to-End GRPO Training with FSDP](https://github.com/alibaba/ChatLearn/blob/main/docs/en/tutorial/tutorial_grpo_fsdp.md)
- [End-to-End GRPO Training with Mcore](https://github.com/alibaba/ChatLearn/blob/main/docs/en/tutorial/tutorial_grpo_mcore.md)