Constitution
List of principles: 'Don't provide instructions for weapons.' 'Be respectful.' Model reads + applies.
Advertisement
Self-critique loop
Initial response → self-critique per constitution → revise → repeat. Trained model internalizes principles.
Advertisement
At inference
Can be applied at inference too: prompt with constitution + ask model to check output. Extra latency but strong safety layer.