Constitutional AI: Harmlessness from AI Feedback
Bai et al. (Anthropic) • 2022
Alignment • RLHF • Safety
Summary
The foundational paper behind Claude. Constitutional AI extends RLHF by introducing a set of written principles (a 'constitution') that enables models to critique and revise their own outputs. This approach reduces reliance on human feedback for harmlessness while maintaining helpfulness, providing a scalable path to safer AI systems.
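The critique-and-revision loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual pipeline: `generate` is a hypothetical stand-in for a language-model call, and the principle texts are paraphrases rather than quotes from Anthropic's constitution.

```python
# Sketch of the Constitutional AI self-critique loop (supervised phase).
# Assumptions: `generate` is a placeholder for an LLM call; the
# principles below paraphrase the idea of a constitution.

CONSTITUTION = [
    "Choose the response that is least harmful or toxic.",
    "Choose the response that is most helpful and honest.",
]

def generate(prompt: str) -> str:
    # Placeholder for a real language-model query.
    return f"[model output for: {prompt[:40]}...]"

def critique_and_revise(user_prompt: str) -> str:
    response = generate(user_prompt)
    for principle in CONSTITUTION:
        # The model critiques its own response against one principle...
        critique = generate(
            f"Critique this response against the principle "
            f"'{principle}':\n{response}"
        )
        # ...then rewrites the response to address that critique.
        response = generate(
            f"Revise the response given this critique:\n"
            f"Critique: {critique}\nOriginal: {response}"
        )
    # Revised responses become training data for fine-tuning,
    # reducing the need for human harmlessness labels.
    return response
```

In the paper, revisions produced this way are used to fine-tune the model (SL-CAI), and a second phase trains a preference model from AI-generated comparisons (RLAIF) instead of human feedback.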
Why It Matters
- Foundational to Anthropic's approach to safe AI
- Reduces dependence on human feedback for safety alignment
- Scalable framework for encoding values into AI systems
