Embedding Labs
Back to Research

Training Language Models to Follow Instructions

Ouyang et al., 2022

RLHF · Alignment · Instruction-following

Summary

The InstructGPT paper introduced the three-step alignment process (supervised fine-tuning, reward modeling, and reinforcement learning from human feedback) used to turn base language models into helpful assistants. This methodology made language models suitable for general-purpose use.
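The reward-modeling step trains a scalar scorer from human preference pairs using a pairwise log-sigmoid loss: the loss is small when the human-preferred response receives the higher score. A minimal sketch of that loss (function names are illustrative, not from the paper's codebase):

```python
import math

def reward_model_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise preference loss, -log(sigmoid(r_chosen - r_rejected)).
    r_chosen / r_rejected are the reward model's scalar scores for the
    human-preferred and the rejected response, respectively."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# Scoring the preferred response higher yields a small loss;
# scoring it lower yields a large one.
print(round(reward_model_loss(2.0, 0.0), 4))  # → 0.1269
print(round(reward_model_loss(0.0, 2.0), 4))  # → 2.1269
```

In the full pipeline this loss is averaged over batches of preference pairs; the trained reward model then supplies the reward signal for the RL (PPO) stage.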

Why It Matters

  • Standardized the RLHF pipeline
  • Addressed alignment toward helpful and harmless behavior
  • Foundational work for instruction-following models
