Embedding Labs

Constitutional AI: Harmlessness from AI Feedback

Bai et al. (Anthropic)2022

AlignmentRLHFSafety

Abstract

The foundational paper behind Claude. Constitutional AI extends RLHF by introducing a set of written principles (a 'constitution') that enables models to critique and revise their own outputs. This approach reduces reliance on human feedback for harmlessness while maintaining helpfulness, providing a scalable path to safer AI systems.
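The critique-and-revise loop described above can be sketched in a few lines. This is an illustrative outline only, not Anthropic's implementation: the `generate` function is a placeholder for a real language-model call, and the two principles are abbreviated stand-ins for the paper's full constitution.

```python
# Sketch of Constitutional AI's supervised-phase critique-and-revision loop.
# All names here are illustrative; `generate` stands in for an LLM call.

CONSTITUTION = [
    "Choose the response that is least harmful.",
    "Choose the response that is most helpful and honest.",
]

def generate(prompt: str) -> str:
    """Placeholder for a language-model call; echoes for demonstration."""
    return f"[model output for: {prompt[:40]}]"

def critique_and_revise(prompt: str, principles=CONSTITUTION) -> str:
    """Draft a response, then let the model critique and revise it
    once per constitutional principle."""
    draft = generate(prompt)
    for principle in principles:
        critique = generate(
            f"Critique this response against the principle "
            f"'{principle}':\n{draft}"
        )
        draft = generate(
            f"Revise the response to address the critique.\n"
            f"Critique: {critique}\nResponse: {draft}"
        )
    return draft
```

In the paper, the revisions produced by this loop are used as fine-tuning data, and a later phase (RLAIF) trains a preference model from AI-generated comparisons instead of human labels.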

Why It Matters

  • Foundational to Anthropic's approach to safe AI
  • Reduces dependence on human feedback for safety alignment
  • Scalable framework for encoding values into AI systems
