Constitutional AI: Harmlessness from AI Feedback
Bai et al. (Anthropic) • 2022
Tags: Alignment, RLHF, Safety
Abstract
The foundational paper behind Claude. Constitutional AI extends RLHF by introducing a set of written principles (a 'constitution') that enables models to critique and revise their own outputs. This approach reduces reliance on human feedback for harmlessness while maintaining helpfulness, providing a scalable path to safer AI systems.
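The critique-and-revise loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `generate` is a hypothetical stand-in for a language model call, and the two principles merely paraphrase the kind of rules a constitution might contain.

```python
# Sketch of the Constitutional AI critique-and-revise loop.
# `generate` is a placeholder for an LLM call (hypothetical); here it
# returns canned strings so the control flow can be run end to end.

CONSTITUTION = [
    "Choose the response that is least harmful.",
    "Choose the response that is most helpful and honest.",
]

def generate(prompt: str) -> str:
    """Placeholder LLM: dispatches on the kind of prompt it receives."""
    if prompt.startswith("Revise"):
        return "Revised answer incorporating the critique."
    if prompt.startswith("Critique"):
        return "The draft could be more cautious."
    return "Initial draft answer."

def critique_and_revise(user_prompt: str) -> str:
    """Draft a response, then critique and revise it once per principle."""
    draft = generate(user_prompt)
    for principle in CONSTITUTION:
        critique = generate(
            f"Critique the response below under this principle: {principle}\n"
            f"Response: {draft}"
        )
        draft = generate(
            f"Revise the response to address the critique.\n"
            f"Critique: {critique}\nResponse: {draft}"
        )
    # In the paper, these final revisions become supervised fine-tuning data.
    return draft
```

In the full method, pairs of responses ranked by an AI judge under the same constitution then train a preference model for the RL stage (RLAIF), replacing most human harmlessness labels.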
Why It Matters
- Foundational to Anthropic's approach to safe AI
- Reduces dependence on human feedback for safety alignment
- Scalable framework for encoding values into AI systems
