I tested Opus 4.8 against 4.7 using coding, medical, finance, and legal traps, then cross-checked the results with multiple AI models.
Last week, Anthropic released its latest frontier large language model, Claude Opus 4.8. One of the signature claims for this release is that it is more honest than previous versions and "has noticeably better judgment."
Before I take you through the whole testing process and some detailed results, let me bottom-line it for you. In some ways, Opus 4.8 is better than its predecessor, Opus 4.7, which is itself quite capable.
