Coding AIs Tend to Suffer From the Dunning-Kruger Effect

fooferdoggie

New research shows that coding AIs such as ChatGPT suffer from the Dunning-Kruger Effect, often acting most confident when they are least competent. When tackling unfamiliar or obscure programming languages, they claim high certainty even as their answers fall apart. The study links model overconfidence to both poor performance and lack of training data, raising new concerns about how much these systems really know about what they don’t know.



Anyone who has spent even a moderate amount of time interacting with Large Language Models about factual matters will already know that LLMs are frequently disposed to give a confidently wrong response to a user query.

As with more overt forms of hallucination, the reason for this empty boastfulness is not entirely clear. Research published over the summer suggests, for instance, that models give confident answers even when they know they are wrong, while other theories ascribe the overconfidence to architectural choices, among other possibilities.

What the end user can be certain about is that the experience is incredibly frustrating, since we are hard-wired to put faith in people's estimations of their own abilities (not least because there are consequences, legal and otherwise, for a person who over-promises and under-delivers). A kind of anthropomorphic transference means we tend to extend the same faith to conversational AI systems.

But an LLM is an unaccountable entity that can and will effectively return a ‘Whoops! Butterfingers…’ after it has helped the user inadvertently destroy something important, or at least waste an afternoon of their time, assuming it will admit liability at all.
 
Completely guessing here, but I could imagine that the "confidence" is an artifact of data scarcity.

We know (I think ;-) that AIs perform best when they are in the center of the Gaussian distribution curve on any matter - i.e., where they have the largest data samples to work with.
With a lot of data, they are more likely to be correct, but they also know there is noise, etc., and may take that into account when formulating an answer.

At the two extremes of the data distribution curve, with few data points, they often get it wrong and hallucinate, but since there is little to contradict whatever they have, their answer is unequivocal - and wrong.
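
For what it's worth, one minimal way to test that hypothesis would be to bucket a model's answers by how common the topic is in its training data (or some proxy, like programming-language popularity) and compare its stated confidence with its actual accuracy in each bucket. The sketch below is purely illustrative: the buckets, the numbers, and the overconfidence_by_bucket helper are made up for this post, not taken from the study.

```python
# Toy sketch of the "confidence is an artifact of data scarcity" idea:
# group answers by how common the topic is, then compare the model's
# self-reported confidence with how often it was actually right.
# All data here is invented for illustration.

from collections import defaultdict

# Each record: (frequency_bucket, self_reported_confidence 0-1, answer_was_correct)
results = [
    ("common",  0.90, True),
    ("common",  0.85, True),
    ("common",  0.80, False),
    ("obscure", 0.95, False),
    ("obscure", 0.90, False),
    ("obscure", 0.92, True),
]

def overconfidence_by_bucket(records):
    """Return mean confidence, accuracy, and the gap between them per bucket."""
    buckets = defaultdict(lambda: {"conf": 0.0, "correct": 0, "n": 0})
    for bucket, conf, correct in records:
        b = buckets[bucket]
        b["conf"] += conf
        b["correct"] += int(correct)
        b["n"] += 1
    summary = {}
    for bucket, b in buckets.items():
        mean_conf = b["conf"] / b["n"]
        accuracy = b["correct"] / b["n"]
        summary[bucket] = {
            "mean_confidence": round(mean_conf, 2),
            "accuracy": round(accuracy, 2),
            "overconfidence_gap": round(mean_conf - accuracy, 2),
        }
    return summary

if __name__ == "__main__":
    for bucket, stats in overconfidence_by_bucket(results).items():
        print(bucket, stats)
```

If the obscure bucket shows a large positive gap between mean confidence and accuracy while the common bucket doesn't, that would be consistent with the data-scarcity story.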
 
Completely guessing here, but I could imagine that the "confidence" is an artifact of data scarcity.

We know (I think ;-) that AIs perform best when they are in the center of the Gaussian distribution curve on any matter - i.e., where they have the largest data samples to work with.
It's true that they perform better when there are lots of data points in the training set, but the reason for overconfidence is something else.

The first is that LLMs have no real understanding of anything, including whether the answer they're giving contradicts well-known facts. This is true even when those facts are abundantly represented in the training data set.

The second is that high-confidence bullshit is the native language of the people OpenAI and friends most want to impress - management and money types. Therefore, these companies train their LLMs to use very confident and assertive language. If they trained them to emit the weasel wording appropriate to how inherently unreliable LLMs are, it wouldn't make for a very impressive tech demo, and they wouldn't be able to attract gigadollar investments.
 