Embedding Labs

Training Language Models to Follow Instructions

Ouyang et al., 2022

RLHF, Alignment, Instruction-following

Abstract

The InstructGPT paper introduced the three-step alignment process (supervised fine-tuning, reward modeling, and RLHF) used to turn base models into helpful assistants. This methodology made language models suitable for general-purpose use.
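The middle step of that pipeline trains a reward model on human preference pairs. A minimal sketch of the pairwise loss used there, assuming scalar reward scores for a preferred and a rejected response (the function name is illustrative, not from the paper's code):

```python
import math

def reward_model_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise preference loss from the reward-modeling step:
    -log(sigmoid(r_chosen - r_rejected)). Minimizing it pushes the
    reward model to score the human-preferred response higher."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# The loss shrinks as the preferred response's reward pulls ahead
# of the rejected one, and grows when the ordering is wrong.
print(round(reward_model_loss(2.0, 0.0), 4))  # small loss: correct ordering
print(round(reward_model_loss(0.0, 2.0), 4))  # large loss: wrong ordering
```

In the RLHF step, the trained reward model then scores policy samples, and the policy is optimized (e.g. with PPO) to maximize that reward.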

Why It Matters

  • Standardized the RLHF pipeline
  • Addressed helpful and harmless alignment
  • Foundational work for instruction-following models
