SFT Targets Pass@k, RL Strengthens Pass@1?
Agentic RL
Building an Intelligent Customer-Service Agent on a Private Knowledge Base with Flowise (Illustrated Tutorial)
UloRL: An Ultra-Long Output Reinforcement Learning Approach for Advancing Large Language Models’ Reasoning Abilities
The Camel Framework
A Minimal Introduction to Model-Parallel Partitioning in Megatron-LM
Multi-Agent Systems
Ray, Accelerate, Trainer, Lightning, PyTorch
xpu_timer
An Analysis of the Qwen3 Technical Report