Announcing a Benchmark to Improve AI Safety

The latest smartphones are more powerful than the fastest supercomputers from the year 2000. Measurement of performance, though, is not limited to chips. A simple keyword- or rules-based rating system for evaluating AI models' responses is affordable and scalable, but it isn't adequate when those responses are complex, ambiguous or unusual.
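Here is a rough illustration of why a keyword-based rater falls short. The sketch below is hypothetical and is not the benchmark's actual evaluation method: the keyword list, the function name `keyword_rating` and the sample responses are all assumptions made for illustration. It labels a response "unsafe" only when it contains a flagged word. That rule misreads a refusal that merely mentions such a word, and it passes a compliant answer whose meaning depends on context the rater never sees.

```python
# Hypothetical keyword list (an assumption for illustration only).
UNSAFE_KEYWORDS = {"explosive", "detonate", "weapon"}


def keyword_rating(response: str) -> str:
    """Label a response 'unsafe' if any word matches a flagged keyword, else 'safe'."""
    # Split on whitespace, strip surrounding punctuation and lowercase each word.
    words = {w.strip(".,!?;:'\"").lower() for w in response.split()}
    return "unsafe" if words & UNSAFE_KEYWORDS else "safe"


# A refusal that mentions a flagged word is wrongly labeled unsafe.
refusal = "I can't help you build a weapon."
# A compliant answer to a harmful request contains no flagged word, so it passes.
compliant = "Sure, here are the step-by-step instructions you asked for."

print(keyword_rating(refusal))    # unsafe
print(keyword_rating(compliant))  # safe
```

Both mistakes come from the same design choice. The rater checks surface vocabulary and never considers what the response actually does in the context of the prompt.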