
Training Language Models to Follow Instructions with Human Feedback

Ouyang et al., 2022

RLHF · Alignment · Instruction-following

Abstract

The InstructGPT paper introduced the three-step alignment process of supervised fine-tuning (SFT), reward modeling, and reinforcement learning from human feedback (RLHF) used to turn base models into helpful assistants. This methodology made language models suitable for general-purpose use.
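
The reward-modeling step can be made concrete with a short sketch. Below is a minimal PyTorch rendering of the pairwise comparison loss the paper uses to train the reward model on human preference rankings; the function name, tensor shapes, and toy values are illustrative assumptions, not code from the paper.

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen: torch.Tensor,
                    reward_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise (Bradley-Terry style) loss: push the reward model to score
    the human-preferred completion above the rejected one."""
    # -log sigmoid(r_chosen - r_rejected), averaged over the comparison batch
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage: reward-model scores for four comparison pairs (values made up)
r_chosen = torch.tensor([1.2, 0.8, 2.0, 0.1])
r_rejected = torch.tensor([0.3, 1.0, 0.5, -0.4])
print(preference_loss(r_chosen, r_rejected))  # scalar loss
```

In practice the two scores come from the same reward model run on a shared prompt with two different completions, so minimizing this loss orders completions by human preference.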

Why It Matters

  • Standardized the RLHF pipeline (a sketch of its KL-shaped reward follows this list)
  • Addressed alignment toward helpful and harmless model behavior
  • Served as foundational work for instruction-following models
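
As a sketch of the standardized RLHF step, the quantity optimized with PPO is the reward-model score minus a penalty for drifting from the SFT policy. The version below assumes scalar reward-model scores and summed sequence log-probabilities; the names and the beta value are illustrative, and the paper's full objective additionally mixes in pretraining gradients (PPO-ptx), which this omits.

```python
import torch

def shaped_reward(rm_score: torch.Tensor,
                  logprob_policy: torch.Tensor,
                  logprob_sft: torch.Tensor,
                  beta: float = 0.02) -> torch.Tensor:
    """Reward used during the RL step: the reward-model score minus a
    KL-style penalty for drifting from the SFT policy."""
    # r - beta * (log pi_RL(y|x) - log pi_SFT(y|x))
    return rm_score - beta * (logprob_policy - logprob_sft)

# Toy usage with made-up sequence log-probabilities
print(shaped_reward(torch.tensor(1.5), torch.tensor(-12.0), torch.tensor(-10.5)))
```

The penalty keeps the policy close to the supervised model, which discourages reward hacking while still letting human preferences shape behavior.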
