Embedding Labs

Constitutional AI: Harmlessness from AI Feedback

Bai et al. (Anthropic), 2022

Alignment · RLHF · Safety

Summary

The foundational paper behind Claude. Constitutional AI extends RLHF by introducing a set of written principles (a "constitution") that the model uses to critique and revise its own outputs. This approach reduces reliance on human feedback for harmlessness while maintaining helpfulness, providing a scalable path to safer AI systems.
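The critique-and-revise loop at the heart of the supervised phase can be sketched in a few lines. This is an illustrative skeleton, not the paper's implementation: `query_model` is a hypothetical stand-in for a real LLM call, and the two principles and prompt templates are invented for the example.

```python
# Minimal sketch of Constitutional AI's critique-and-revise loop.
# All names and prompts here are illustrative assumptions, not the
# paper's actual constitution or prompting format.

CONSTITUTION = [
    "Choose the response that is least harmful.",
    "Choose the response that is most helpful and honest.",
]

def query_model(prompt: str) -> str:
    # Placeholder for a real language-model call. Canned replies keep
    # the loop runnable end to end for demonstration purposes.
    if prompt.startswith("Critique"):
        return "The draft could be more cautious."
    if prompt.startswith("Revise"):
        return "Revised answer incorporating the critique."
    return "Initial draft answer."

def constitutional_revision(user_prompt: str, rounds: int = 2) -> str:
    """Draft a response, then repeatedly critique and revise it
    against principles drawn from the constitution."""
    response = query_model(user_prompt)
    for i in range(rounds):
        principle = CONSTITUTION[i % len(CONSTITUTION)]
        critique = query_model(
            f"Critique the response below using this principle: "
            f"{principle}\nResponse: {response}"
        )
        response = query_model(
            f"Revise the response to address the critique.\n"
            f"Critique: {critique}\nResponse: {response}"
        )
    return response
```

The revised transcripts produced by a loop like this are then used as supervised fine-tuning data, so the model learns the revised behavior directly rather than relying on per-example human labels.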

Why It Matters

  • Foundational to Anthropic's approach to safe AI
  • Reduces dependence on human feedback for safety alignment
  • Scalable framework for encoding values into AI systems
