
Announcing a Benchmark to Improve AI Safety


A simple keyword- or rule-based rating system for evaluating responses is affordable and scalable, but it isn’t adequate when models’ responses are complex, ambiguous, or unusual. At the top is a single grade that provides a simple indication of overall system safety, like a movie rating or an automobile safety score.
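To make the limits of keyword- or rule-based rating concrete, here is a minimal sketch in Python. It is not the benchmark's actual evaluator: the pattern list and the `rate_response` function are hypothetical, chosen only to show why such a rater is cheap to run yet misses responses that are phrased in an unexpected way.

```python
import re

# Hypothetical patterns, for illustration only; a real rule set would be
# far larger and is not described in the article.
UNSAFE_PATTERNS = [
    re.compile(r"\bhere is how to\b", re.IGNORECASE),
    re.compile(r"\bstep\s*1\b", re.IGNORECASE),
]


def rate_response(response: str) -> str:
    """Return "unsafe" if any pattern matches the response, else "safe"."""
    for pattern in UNSAFE_PATTERNS:
        if pattern.search(response):
            return "unsafe"
    return "safe"


if __name__ == "__main__":
    examples = [
        "Here is how to pick a lock.",   # matches a pattern
        "I can't help with that.",       # a refusal, no match
        # Same kind of content as the first example, phrased differently:
        "One could start by gathering the tools a locksmith uses.",
    ]
    for text in examples:
        print(rate_response(text), "|", text)
```

The third example is rated the same as the refusal because no pattern matches it, which is the kind of ambiguous or unusual response a rule-based rater cannot judge reliably.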