Taxonomy

13 hazard categories, including violence, hate, sexual content, criminal activity, weapons, and defamation. The enabled set is configurable per app.
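A minimal sketch of what per-app configuration could look like. The `AppPolicy` class, its fields, and the category identifiers are hypothetical, chosen for illustration. Only the six categories named above are listed, since the remaining ones are not given here.

```python
from dataclasses import dataclass

# Hypothetical identifiers for the six categories named in the text.
# The full taxonomy has 13; the rest are not listed in this section.
KNOWN_CATEGORIES = frozenset(
    {"violence", "hate", "sexual", "criminal", "weapons", "defamation"}
)


@dataclass(frozen=True)
class AppPolicy:
    """Which hazard categories a given app enforces (illustrative only)."""

    app_name: str
    enabled: frozenset = KNOWN_CATEGORIES

    def __post_init__(self):
        # Reject typos or categories outside the known set.
        unknown = self.enabled - KNOWN_CATEGORIES
        if unknown:
            raise ValueError(f"unknown categories: {sorted(unknown)}")

    def violations(self, flagged):
        """Return the flagged categories that this app actually enforces."""
        return sorted(set(flagged) & self.enabled)


# One app enforces everything, another only a subset.
strict = AppPolicy("kids_tutor")
relaxed = AppPolicy("game_chat", enabled=frozenset({"hate", "sexual"}))
```

With this shape, the classifier can always score every category, and each app filters the result through its own policy.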

Deployment

Runs alongside the primary LLM. Classifies both user input and LLM output. On a hit, the content is blocked or redacted.
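The flow can be sketched as a thin wrapper around the primary LLM. Everything here is an assumption for illustration: `guarded_reply`, `BLOCK_MESSAGE`, and the toy `toy_classify` / `toy_llm` stand-ins are hypothetical, and a real deployment would call the safety model instead of a keyword check. The sketch shows blocking only; redaction would replace the offending span rather than the whole message.

```python
from typing import Callable, List

# A classifier maps text to the list of hazard categories it triggers.
Classifier = Callable[[str], List[str]]

BLOCK_MESSAGE = "[blocked by safety filter]"


def guarded_reply(user_input: str,
                  llm: Callable[[str], str],
                  classify: Classifier) -> str:
    """Check the input, call the LLM, then check the output."""
    if classify(user_input):
        # Input hit: never forward the prompt to the primary LLM.
        return BLOCK_MESSAGE
    reply = llm(user_input)
    if classify(reply):
        # Output hit: withhold the generated text.
        return BLOCK_MESSAGE
    return reply


def toy_classify(text: str) -> List[str]:
    """Stand-in for the safety model: flags one keyword."""
    return ["weapons"] if "bomb" in text.lower() else []


def toy_llm(prompt: str) -> str:
    """Stand-in for the primary LLM: echoes the prompt."""
    return f"echo: {prompt}"
```

Checking the input first avoids spending a primary-LLM call on a prompt that would be blocked anyway.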

Latency

Small model (7B parameters). ~50 ms on GPU. Streaming is supported by classifying the output in chunks.
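One way chunked classification over a stream might work, as a sketch under stated assumptions: tokens are buffered into fixed-size chunks, and each chunk is released only after the text accumulated so far passes the classifier. The names `stream_with_guard`, `STOP_MESSAGE`, and `chunk_size` are hypothetical, and the actual chunking strategy is not described here.

```python
from typing import Callable, Iterable, Iterator, List

STOP_MESSAGE = "[stream stopped by safety filter]"


def stream_with_guard(tokens: Iterable[str],
                      classify: Callable[[str], List[str]],
                      chunk_size: int = 4) -> Iterator[str]:
    """Yield text chunk by chunk, classifying before each release."""
    seen: List[str] = []      # everything generated so far
    pending: List[str] = []   # tokens not yet cleared for release

    def release() -> str:
        # Classify the full text so far so that a hazard spanning
        # a chunk boundary is still visible to the classifier.
        seen.extend(pending)
        if classify("".join(seen)):
            return STOP_MESSAGE
        return "".join(pending)

    for tok in tokens:
        pending.append(tok)
        if len(pending) >= chunk_size:
            out = release()
            yield out
            if out == STOP_MESSAGE:
                return
            pending = []
    if pending:
        # Flush the final partial chunk.
        yield release()
```

A larger `chunk_size` means fewer classifier calls but more delay before the user sees text. Re-classifying the whole accumulated text grows in cost with output length; a sliding window over recent chunks is a possible alternative.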