"These kind of hallucinations are the last line of defense for so much white collar work"

(From AI Explained at 16:53)

Who else feels this way? If current products, especially Deep Research, simply worked reliably at their present level of capability, a significant share of the white-collar economy would be affected.

Even without agents, the reasoning, intelligence, and information synthesis we already have are more than sufficient to perform a lot of this work. The problem is that models hallucinate so often that a competent human is still required to check everything.

Alongside the usual benchmarks, I'm excited by progress on benchmarks that specifically test hallucination rates and a model's ability to detect its own hallucinations.