“CriticGPT’s suggestions are not always correct, but we find that they can help trainers to catch many more problems with model-written answers than they would without AI help,” the company wrote. “Additionally, when people use CriticGPT, the AI augments their skills, resulting in more comprehensive critiques than when people work alone, and fewer hallucinated bugs than when the model works alone.”
And therein lies the logic problem. One of the criticisms of generative AI is that it is terrific at mimicking humans but fails to actually understand humans. I’m reminded of a column I wrote more than a decade ago, about engineers creating a product that tests for true love. (It was an actual product: a Bluetooth bra that would unhook only when it detected true love. Really. To be clear, I am not officially suggesting that engineers are as bad at understanding human emotions as genAI. Not disputing it, but also not officially saying it.)
Getting back to genAI logic: the flawed assumption OpenAI is making is that humans will keep checking its systems for lies. Humans are lazy, and human IT employees are overworked and under-resourced. The far more likely outcome is that humans will trust the AI-watching-AI more and more. That is where the real danger lies.