OpenAI ended its “12 Days of OpenAI” product-launch spree by unveiling the successor to its first “reasoning” model.
The new frontier model family includes o3 and o3-mini, the artificial intelligence startup said Friday. Neither model is being publicly launched yet, but they are now available for public safety testing.
“We view this as sort of the beginning of the next phase of AI, where you can use these models to do increasingly complex tasks that require a lot of reasoning,” OpenAI chief executive Sam Altman said during a livestreamed announcement.
The AI startup is skipping the o2 name, Altman said, “out of respect to our friends at Telefónica, and in the grand tradition of OpenAI being really, truly bad at names.” O2, a brand of Spain’s Telefónica, is a mobile network operator in the U.K.
For the first time, OpenAI is opening the models for external safety testing. Safety and security researchers can sign up to preview and test the models, Altman said, adding that the startup plans to launch o3-mini around the end of January, followed by the full o3 model shortly after.
OpenAI said o3 outperformed o1, which launched alongside o1-mini in September, by almost 23 percentage points on its own SWE-Bench Verified evaluation and reached a Codeforces rating of 2727. By comparison, OpenAI’s chief scientist scored 2665, according to the startup. The new model also set a record on Epoch AI’s FrontierMath evaluation, OpenAI said, and apparently more than tripled o1’s score on the ARC-AGI test.
OpenAI launched the full version of its o1 model out of preview during the first day of its “12 Days of OpenAI” promotional scheme. The startup also announced a new, $200-a-month subscription tier for ChatGPT called ChatGPT Pro, which includes a more advanced version of o1 called o1 pro mode.