Constitutional AI: Harmlessness from AI Feedback
Bai et al. (Anthropic) • 2022
Alignment · RLHF · Safety
Summary
The foundational paper behind Claude. Constitutional AI extends RLHF by introducing a set of written principles (a 'constitution') that enables models to critique and revise their own outputs. This approach reduces reliance on human feedback for harmlessness while maintaining helpfulness, providing a scalable path to safer AI systems.
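The critique-and-revise loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `generate` is a hypothetical stand-in for a real language-model call, and the two principles are simplified examples of constitutional principles.

```python
# Minimal sketch of the Constitutional AI critique-and-revise loop.
# `generate` is a hypothetical placeholder for an actual LLM call.

CONSTITUTION = [
    "Choose the response that is least harmful.",
    "Choose the response that is most helpful and honest.",
]

def generate(prompt: str) -> str:
    # Placeholder: a real system would query a language model here.
    return f"[model output for: {prompt[:40]}...]"

def critique_and_revise(initial_response: str, principles=CONSTITUTION) -> str:
    """Run one critique/revision pass per constitutional principle."""
    response = initial_response
    for principle in principles:
        # The model critiques its own output against the principle...
        critique = generate(
            f"Critique this response per the principle '{principle}':\n{response}"
        )
        # ...then revises the output to address that critique.
        response = generate(
            f"Revise to address this critique:\nCritique: {critique}\nOriginal: {response}"
        )
    return response
```

The revised outputs from loops like this one are then used as training data, which is how the approach substitutes AI feedback for human harmlessness labels.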
Why It Matters
- Foundational to Anthropic's approach to safe AI
- Reduces dependence on human feedback for safety alignment
- Scalable framework for encoding values into AI systems