IFEval — does it actually do what you ask?

Best AI models for instruction-following.

IFEval scores whether a model obeys explicit constraints in the prompt, such as word counts, JSON formats, and specific phrasings. This is the score that translates to production agent reliability.
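
To make the idea of a checkable constraint concrete, here is a minimal sketch of three such checks (word count, JSON validity, required phrase) and a pass-rate score. This is an illustration only, not the benchmark's own implementation: the function names, the whitespace-based word count, and the case-insensitive phrase match are all assumptions made for the example.

```python
import json


def check_max_words(response: str, limit: int) -> bool:
    """Pass if the response has at most `limit` whitespace-separated words."""
    return len(response.split()) <= limit


def check_valid_json(response: str) -> bool:
    """Pass if the whole response parses as JSON."""
    try:
        json.loads(response)
    except json.JSONDecodeError:
        return False
    return True


def check_contains_phrase(response: str, phrase: str) -> bool:
    """Pass if the required phrase appears (case-insensitive, an assumption)."""
    return phrase.lower() in response.lower()


def instruction_accuracy(results: list[bool]) -> float:
    """Share of satisfied constraints on a 0-100 scale (hypothetical helper)."""
    return 100.0 * sum(results) / len(results) if results else 0.0


# Hypothetical response checked against three constraints.
response = '{"status": "ok", "note": "done"}'
results = [
    check_max_words(response, 10),
    check_valid_json(response),
    check_contains_phrase(response, "status"),
]
score = instruction_accuracy(results)
```

Each check returns a plain pass or fail, so the outcome does not depend on a human or model judge, which is what makes this kind of constraint easy to score automatically.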

Benchmarks used: IFEval
# Model Score From
1 92.1 Meta AI
2 88.6 Meta AI

Showing top 2 models with published data on at least one of the benchmarks above. Scores are weighted averages on a 0–100 scale.
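
As a rough sketch of how a weighted average on a 0-100 scale could be computed, the snippet below averages only the benchmarks a model has published scores for. The weights and the `weighted_score` helper are hypothetical; the page does not state the actual weighting.

```python
def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average over benchmarks that have both a score and a weight."""
    used = [name for name in scores if name in weights]
    total_weight = sum(weights[name] for name in used)
    if total_weight == 0:
        raise ValueError("no weighted benchmarks with published scores")
    return sum(scores[name] * weights[name] for name in used) / total_weight


# With a single benchmark, the weighted average is just that benchmark's score.
single = weighted_score({"IFEval": 92.1}, {"IFEval": 1.0})
```

With only one benchmark in use, as on this page, the weighting has no effect on the ranking.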
