Randomized smoothing
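Wrap a base classifier with Gaussian noise: classify many noisy copies of the input and take the majority vote. Gives certifiable robustness within an L2 ball around the input. Trades clean accuracy for guarantees.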
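A minimal sketch of the idea, assuming a Gaussian-noise smoothed classifier with certified radius R = sigma * Phi^{-1}(p_lower), where p_lower is a lower confidence bound on the top-class probability under noise. The function name `smoothed_predict`, its parameters, and the toy classifier are hypothetical. Two simplifications are made for brevity: a Hoeffding bound stands in for an exact binomial interval, and the same samples are used both to pick the top class and to bound its probability, which a rigorous certificate would keep separate.

```python
import math
import random
from statistics import NormalDist


def smoothed_predict(base_classifier, x, sigma, n=1000, alpha=0.001, rng=None):
    """Majority vote under Gaussian noise, with a certified L2 radius.

    Returns (label, radius), or (None, 0.0) when the vote is too close to certify.
    """
    rng = rng or random.Random(0)

    # Classify n noisy copies of x and count the votes per label.
    counts = {}
    for _ in range(n):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        label = base_classifier(noisy)
        counts[label] = counts.get(label, 0) + 1

    top_label, top_count = max(counts.items(), key=lambda kv: kv[1])

    # One-sided Hoeffding lower bound on the top-class probability
    # (assumption: used here instead of an exact binomial interval).
    p_lower = top_count / n - math.sqrt(math.log(1.0 / alpha) / (2.0 * n))
    if p_lower <= 0.5:
        return None, 0.0  # abstain: no certificate at this confidence level

    # Certified L2 radius: sigma times the standard normal quantile of p_lower.
    radius = sigma * NormalDist().inv_cdf(p_lower)
    return top_label, radius


# Toy usage: a hypothetical classifier that thresholds the first coordinate.
toy = lambda v: 1 if v[0] > 0 else 0
label, radius = smoothed_predict(toy, [2.0, 0.0], sigma=0.5)
print(label, radius)
```

A larger sigma allows a larger certified radius but makes the base classifier see noisier inputs, which is where the accuracy cost comes from.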
For LLMs
Very hard: the token space is discrete, so adding continuous noise to the input does not carry over directly. Some progress on classifier heads (safety, moderation classifiers).
Applications
Safety-critical classification: harm detection, medical questions, autonomous vehicles. Anywhere formal guarantees are demanded.