Mysterious “gpt2-chatbot” AI model appears suddenly, confuses experts

Last updated: 2024/05/01 at 3:53 PM

Robot fortune teller hand and crystal ball

On Sunday, word began to spread on social media about a new mystery chatbot named “gpt2-chatbot” that appeared in the LMSYS Chatbot Arena. Some people speculate that it may be a secret test version of OpenAI’s upcoming GPT-4.5 or GPT-5 large language model (LLM). The paid version of ChatGPT is currently powered by GPT-4 Turbo.

Currently, the new model is only available through the Chatbot Arena website, and only in a limited way. In the site’s “side-by-side” arena mode, where users can deliberately select the model, gpt2-chatbot has a rate limit of eight queries per day, which dramatically limits people’s ability to test it in detail.

So far, gpt2-chatbot has inspired plenty of rumors online, including that it could be the stealth launch of a test version of GPT-4.5 or even GPT-5, or perhaps a new version of 2019’s GPT-2 that has been trained using new techniques. We reached out to OpenAI for comment but did not receive a response by press time. On Monday evening, OpenAI CEO Sam Altman seemingly dropped a hint by tweeting, “i do have a soft spot for gpt2.”

A screenshot of the LMSYS Chatbot Arena “side-by-side” page showing “gpt2-chatbot” listed among the models available for testing. (Red highlight added by Ars Technica.)

Benj Edwards

Early reports of the model first appeared on 4chan, then spread to social media platforms like X, with hype following not far behind. “Not only does it seem to show incredible reasoning, but it also gets notoriously difficult AI questions right with a much more impressive tone,” wrote AI developer Pietro Schirano on X. Soon, threads popped up on Reddit claiming that the new model had amazing abilities that beat every other LLM on the Arena.

Intrigued by the rumors, we decided to try out the new model for ourselves but did not come away impressed. When asked about “Benj Edwards,” the model produced several mistakes and some awkward language compared to GPT-4 Turbo’s output. A request for five original dad jokes fell short. And gpt2-chatbot did not decisively pass our “magenta” test. (“Would the color be called ‘magenta’ if the town of Magenta didn’t exist?”)

A gpt2-chatbot result for “Who is Benj Edwards?” on LMSYS Chatbot Arena. Mistakes and oddities highlighted in red.

Benj Edwards

A gpt2-chatbot result for “Write 5 original dad jokes” on LMSYS Chatbot Arena.

Benj Edwards
A gpt2-chatbot result for “Would the color be called ‘magenta’ if the town of Magenta didn’t exist?” on LMSYS Chatbot Arena.

Benj Edwards

So, whatever it is, it’s probably not GPT-5. We’ve seen other people reach the same conclusion after further testing, saying that the new mystery chatbot doesn’t seem to represent a large capability leap beyond GPT-4. “Gpt2-chatbot is good. really good,” wrote HyperWrite CEO Matt Shumer on X. “But if this is gpt-4.5, I’m disappointed.”

Still, OpenAI’s fingerprints seem to be all over the new bot. “I think it may be an OpenAI stealth preview of something,” AI researcher Simon Willison told Ars Technica. But what “gpt2” is exactly, he doesn’t know. After surveying online speculation, it seems that no one apart from its creator knows precisely what the model is, either.

Willison has uncovered the system prompt for the AI model, which claims it is based on GPT-4 and made by OpenAI. But as Willison noted in a tweet, that is no guarantee of provenance because “the aim of a system prompt is to influence the model to behave in certain ways, not to give it truthful information about itself.”
