Embedding Labs

Constitutional AI: Harmlessness from AI Feedback

Bai et al. (Anthropic), 2022

Alignment · RLHF · Safety

Summary

The foundational paper behind Claude. Constitutional AI extends RLHF by introducing a set of written principles (a 'constitution') that enables models to critique and revise their own outputs. This approach reduces reliance on human feedback for harmlessness while maintaining helpfulness, providing a scalable path to safer AI systems.
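The self-critique step described above can be sketched as a simple loop: for each principle in the constitution, the model is asked to critique its draft and then revise it. This is a minimal illustrative sketch, not the paper's implementation; a real system would send the critique and revision prompts to an LLM, whereas `critique` and `revise` here are hypothetical toy stand-ins so the loop is self-contained and runnable.

```python
from typing import List

# Illustrative constitution: a real one contains many written principles.
CONSTITUTION: List[str] = [
    "Avoid content that could help someone cause harm.",
    "Be helpful, honest, and clear.",
]

def critique(draft: str, principle: str) -> str:
    """Toy critic: flags the draft if it contains a marked unsafe span.
    A real implementation would prompt the model to critique its own draft."""
    if "<unsafe>" in draft:
        return f"Violates: {principle}"
    return ""

def revise(draft: str, critique_text: str) -> str:
    """Toy reviser: removes the flagged span when a critique was raised.
    A real implementation would prompt the model to rewrite the draft."""
    if critique_text:
        start = draft.index("<unsafe>")
        end = draft.index("</unsafe>") + len("</unsafe>")
        return draft[:start] + "[revised]" + draft[end:]
    return draft

def constitutional_revision(draft: str) -> str:
    """One critique-revise pass per principle over the model's draft."""
    for principle in CONSTITUTION:
        draft = revise(draft, critique(draft, principle))
    return draft

print(constitutional_revision("Sure. <unsafe>dangerous steps</unsafe> Anything else?"))
# -> Sure. [revised] Anything else?
```

In the paper, the revised outputs from this supervised phase become fine-tuning data, and a later RL phase uses AI-generated preference labels in place of human harmlessness labels.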

Why It Matters

  • Foundational to Anthropic's approach to safe AI
  • Reduces dependence on human feedback for safety alignment
  • Scalable framework for encoding values into AI systems
