Training Data, Training Methods, and Case Analysis of Existing GUI Agents
UFO: A UI-Focused Agent for Windows OS Interaction
Supervised Fine-Tuning of Qwen3-8B with Megatron & Swift
One Line of Code to Unlock SFT Generalization: An In-Depth Look at How DFT Outperforms Traditional Fine-Tuning
Does SFT Specialize in Pass@k While RL Strengthens Pass@1?
Agentic RL
Building an Intelligent Customer Service Agent on a Private Knowledge Base with Flowise (Illustrated Tutorial)
UloRL: An Ultra-Long Output Reinforcement Learning Approach for Advancing Large Language Models’ Reasoning Abilities
The Camel Framework
A Minimal Introduction to Model-Parallel Partitioning in Megatron-LM