
Training Language Models to Follow Instructions

Ouyang et al., 2022

RLHF · Alignment · Instruction-following

Abstract

The InstructGPT paper introduced the three-step alignment process (supervised fine-tuning, reward modeling, and reinforcement learning from human feedback) used to turn base language models into helpful assistants. This methodology made language models suitable for general-purpose use.
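
As a rough orientation to the three stages, here is a minimal Python sketch of the pipeline's control flow. Everything in it is an illustrative placeholder, not the paper's actual training code: each stub stands in for a full training procedure.

```python
# Hypothetical sketch of the InstructGPT three-stage pipeline.
# All names below are illustrative stubs, not real APIs.

def supervised_finetune(base_model, demonstrations):
    """Stage 1 (SFT): fine-tune on human-written demonstrations."""
    ...  # next-token-prediction training over the demonstrations
    return base_model

def train_reward_model(sft_model, comparisons):
    """Stage 2 (RM): fit a scalar reward to human preference pairs."""
    ...  # pairwise ranking loss over preferred vs. rejected outputs
    return sft_model

def ppo_optimize(sft_model, reward_model, prompts):
    """Stage 3 (RLHF): PPO against the reward model, with a KL
    penalty keeping the policy close to the SFT model."""
    ...  # sample completions, score with the RM, update the policy
    return sft_model

def align(base_model, demonstrations, comparisons, prompts):
    sft = supervised_finetune(base_model, demonstrations)
    rm = train_reward_model(sft, comparisons)
    return ppo_optimize(sft, rm, prompts)
```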

Why It Matters

  • Standardized the RLHF pipeline (see the reward-model loss sketch after this list)
  • Addressed alignment toward helpful and harmless behavior
  • Foundational work for instruction-following models
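
For the reward-modeling step of that pipeline, the paper fits a scalar reward to human preference comparisons with a pairwise log-sigmoid ranking loss, -log σ(r(x, y_w) - r(x, y_l)). Below is a minimal PyTorch sketch, where `chosen_rewards` and `rejected_rewards` are assumed stand-ins for the reward model's scalar outputs on the preferred and dispreferred completions.

```python
import torch
import torch.nn.functional as F

def pairwise_reward_loss(chosen_rewards: torch.Tensor,
                         rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Pairwise ranking loss for reward-model training:
    -log sigmoid(r(x, y_w) - r(x, y_l)), averaged over the batch."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Usage with dummy scores (real inputs would come from a reward-model head):
chosen = torch.tensor([1.2, 0.3, 2.0])
rejected = torch.tensor([0.4, 0.5, 1.1])
print(pairwise_reward_loss(chosen, rejected))
```

The loss only widens the reward gap between preferred and dispreferred completions, which is all the PPO stage needs: the reward scale is relative, not absolute.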
