Only 2.4% in math: Is ChatGPT turning dumb?

Why is ChatGPT in the news?

Recently, researchers Lingjiao Chen and James Zou of Stanford University and Matei Zaharia of UC Berkeley tested GPT-3.5 and GPT-4 on solving math problems, answering sensitive and dangerous questions, generating code and visual reasoning. Their conclusion: the “performance and behaviour” of both large language models (LLMs) “can vary greatly over time”. The March version of GPT-4 identified prime numbers with 97.6% accuracy; in the June version, accuracy collapsed to 2.4%. Both models made “more formatting mistakes in code generation in June than in March”.

How did other experts react?

When the findings were published, AI expert Gary Marcus tweeted that “this instability will be LLMs’ undoing”. Jim Fan, senior scientist at Nvidia, opined that in trying to make GPT-4 “safer”, OpenAI could have made it less useful, “leading to a possible degradation in cognitive skills”. He added that OpenAI could also have reduced the model’s parameter count to cut costs. Princeton professor of computer science Arvind Narayanan and a PhD student at the same university co-authored a response in which they argue, among other things, that variance in behaviour does not suggest a degradation in capability.

How is OpenAI reacting to this controversy?

Reacting to user criticism, Peter Welinder, vice-president of OpenAI, which owns ChatGPT, said GPT-4 was getting smarter with each new version. “When you use it more heavily, you start noticing issues you didn’t see before.” Logan Kilpatrick, lead of developer relations at OpenAI, tweeted: “We are actively looking into the reports people shared.”

What does this mean for users and companies?

Human resources tasks like onboarding, training, performance management, and employee queries and complaints can be automated using ChatGPT. But to integrate OpenAI’s application programming interfaces (APIs) with their business workflows, companies have to continuously monitor, retrain and fine-tune the models to ensure that they keep producing accurate output and stay up to date. Variance in AI model behaviour only makes this a bigger challenge.
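
As a rough illustration of what such monitoring could look like, the Python sketch below scores two snapshots of a model on a fixed set of prime-number questions, the kind of task used in the study, and flags a drop in accuracy. The names `snapshot_a`, `snapshot_b` and `BENCHMARK`, and the 5-percentage-point tolerance, are assumptions made for this example; they do not come from the paper or from OpenAI’s API, and a real check would call the model where the stand-in functions are.

```python
from typing import Callable, List


def is_prime(n: int) -> bool:
    """Ground-truth primality check by trial division."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def accuracy(model: Callable[[int], bool], numbers: List[int]) -> float:
    """Fraction of numbers on which the model's answer matches ground truth."""
    correct = sum(1 for n in numbers if model(n) == is_prime(n))
    return correct / len(numbers)


def drift_alert(baseline: float, current: float, tolerance: float = 0.05) -> bool:
    """True when accuracy has fallen more than `tolerance` below the baseline."""
    return baseline - current > tolerance


# Hypothetical stand-ins for two snapshots of a model (assumption: a real
# check would wrap an API call that asks "Is n prime?" and parses the reply).
def snapshot_a(n: int) -> bool:
    return is_prime(n)  # always answers correctly


def snapshot_b(n: int) -> bool:
    return False  # always answers "not prime"


# Small fixed benchmark (assumption); keeping it unchanged between runs is
# what makes the two scores comparable.
BENCHMARK = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

if __name__ == "__main__":
    before = accuracy(snapshot_a, BENCHMARK)
    after = accuracy(snapshot_b, BENCHMARK)
    print(f"before={before:.1%} after={after:.1%} alert={drift_alert(before, after)}")
```

Running the same fixed question set against every new model version, and alerting on a drop, is one simple way a company could notice a behaviour change before it reaches users.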

Is it a boost for open-source LLMs?

The day the paper was released, Meta too released Llama 2, the second version of its free open-source LLM, for research and commercial use, providing an alternative to pricey proprietary offerings such as OpenAI’s ChatGPT Plus and Google’s Bard. Interestingly, Databricks Inc., whose CTO is Zaharia (one of the paper’s authors), has open-sourced its LLM called Dolly 2.0. Hugging Face’s BigScience Large Open-science Open-access Multilingual Language Model (BLOOM) is also open for researchers to run.
