Embedding Labs

Training Language Models to Follow Instructions

Ouyang et al., 2022

RLHF · Alignment · Instruction-following

Summary

The InstructGPT paper introduced the three-step alignment process — supervised fine-tuning (SFT), reward modeling, and reinforcement learning from human feedback (RLHF) — used to turn base models into helpful assistants. This methodology made language models suitable for general use.
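The reward-modeling step trains a scalar reward model on human preference pairs using a pairwise ranking loss: the reward of the preferred completion is pushed above that of the rejected one via -log sigmoid(r(x, y_w) - r(x, y_l)). A minimal sketch of that loss (function names here are illustrative, not from the paper's code):

```python
import math

def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise ranking loss for reward modeling:
    -log sigmoid(r(x, y_w) - r(x, y_l)).
    The loss shrinks as the reward margin between the human-preferred
    completion (y_w) and the rejected one (y_l) grows."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# With no margin the loss is log 2; a positive margin reduces it.
no_margin = pairwise_reward_loss(0.0, 0.0)
with_margin = pairwise_reward_loss(2.0, 0.0)
```

In the full pipeline this loss is averaged over a batch of preference pairs, and the resulting reward model then scores policy samples during the RLHF (PPO) step.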

Why It Matters

  • Standardized the RLHF pipeline
  • Addressed helpful and harmless alignment
  • Foundational work for instruction-following models
