What It Means When an AI Says It Is Not Sure

An AI says, “I’m not completely sure.” That sounds honest, but it doesn’t tell you whether the answer is right, wrong, or drawn from a document that is missing a page.

Cautious wording can be useful. It can also be generated as smoothly as confident wording. So what does AI uncertainty actually reveal?

Reasoning Models Explained Part 5 of 5

This five-day series explains what reasoning models do, how step-by-step prompting changes results, and why better-looking reasoning isn't always better thinking.

You ask an AI assistant for the deadline in a contract.

It replies:

I’m not completely sure, but the cancellation deadline appears to be June 30.

That sounds careful.

It may even make the answer feel more trustworthy than a confident reply.

But what does “I’m not sure” actually mean when an AI says it?

It doesn’t necessarily mean the model has measured its uncertainty in the same way a person would. The phrase may reflect unclear context, conflicting information, a cautious response style, or wording the model learned to use when an answer seems uncertain.

Sometimes that warning is useful.

Sometimes it’s missing when it should be there.

And sometimes the model says it’s unsure even when the answer is correct.

Uncertainty language is part of the generated answer

A language model produces phrases such as “I’m not sure,” “probably,” and “this may be” as part of its response.

Those phrases are generated from the prompt, the available context, the system’s instructions, and patterns learned during training.

The model may have reasons to use cautious language:

the question is ambiguous
important information is missing
two sources disagree
the task asks for a prediction
the system encourages careful wording
the answer depends on an assumption

That can make uncertainty language helpful.

But the sentence itself isn’t a direct reading from a perfect internal confidence meter.

Key idea:
When an AI says it isn’t sure, the phrase is evidence that the response is cautious. It isn’t proof that the system has measured the real chance of being wrong.

Confidence and truth are different

An AI answer can sound confident and be wrong.

It can also sound uncertain and be correct.

These are separate questions:

How confident does it sound?

This is about tone, wording, and how strongly the answer is presented.

Is the answer correct?

This depends on facts, sources, calculations, and whether the conclusion follows.

People often mix these two together.

A firm answer feels more reliable. A hesitant answer feels less reliable.

That shortcut works sometimes with people because tone may reflect memory, experience, or doubt.

With AI, the relationship can be weaker.

The model can give a smooth, confident answer because that style fits the prompt. It can also use cautious wording because the application has been designed to avoid overclaiming.

Neither tone proves the answer is right.

A real-life example: the missing contract page

Say you upload a contract and ask when it can be canceled.

The assistant finds this sentence:

Either party may cancel the agreement by providing written notice before the renewal date.

The renewal date appears to be June 30, so the assistant answers:

I’m not completely sure, but written notice may be required before June 30.

That caution is reasonable.

But imagine the uploaded file is missing an amendment that changed the renewal date to August 31.

The assistant’s uncertainty doesn’t solve the real problem.

It still lacks the right source.

What went wrong?
The assistant correctly signaled uncertainty, but the answer still depended on an incomplete document. Cautious wording couldn’t replace the missing evidence.

This is why uncertainty language should lead to a check, not simply to greater trust.

“Not sure” can be a useful warning

Even though it isn’t proof, uncertainty language can still be valuable.

It can tell the reader that the answer may depend on missing details or a weak inference.

That is better than pretending every answer is certain.

Useful uncertainty might look like this:

“The document doesn’t state the reason directly.”
“This conclusion depends on the renewal date being June 30.”
“I can’t confirm which policy version applies.”
“Two sections appear to conflict.”
“The available information supports more than one interpretation.”

These statements do more than say “maybe.”

They identify what is missing, uncertain, or disputed.

That makes the answer easier to review.

Vague uncertainty is less useful

Not all cautious language helps equally.

Compare these two answers:

Vague uncertainty

“I’m not sure, but the deadline is probably June 30.”

Useful uncertainty

“Page 4 lists June 30 as the renewal date, but the file may not include later amendments. Check the current agreement before acting.”

The first answer sounds careful but gives you little help.

The second explains:

where the date came from
what might make it wrong
what should be checked next

Good uncertainty is specific.

It tells you where the weakness is.
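
For readers who like to see the idea as a structure, here is a minimal sketch of the three things a specific answer provides. The class name UncertaintyNote and its fields are hypothetical, chosen only for illustration; no AI system is claimed to represent uncertainty this way.

```python
from dataclasses import dataclass


@dataclass
class UncertaintyNote:
    """Hypothetical record of what an uncertain answer tells the reader."""
    claim: str
    source: str = ""      # where the claim came from
    risk: str = ""        # what might make it wrong
    next_check: str = ""  # what should be checked next

    def missing_parts(self):
        """Return the names of the fields the answer left empty."""
        parts = {
            "source": self.source,
            "risk": self.risk,
            "next_check": self.next_check,
        }
        return [name for name, value in parts.items() if not value.strip()]


# The vague answer states a date and nothing else.
vague = UncertaintyNote(claim="The deadline is probably June 30.")

# The useful answer fills in all three parts.
specific = UncertaintyNote(
    claim="The renewal date is June 30.",
    source="Page 4 of the uploaded file",
    risk="The file may not include later amendments.",
    next_check="Check the current agreement before acting.",
)

print(vague.missing_parts())
print(specific.missing_parts())
```

The vague answer leaves all three parts empty, while the specific answer leaves none. A filled-in note still isn't proof the date is right. It only shows the reader where to look.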

AI may fail to show uncertainty

One of the biggest problems is that an AI system may give a confident answer when the evidence is weak.

Say you ask why sales fell in one region.

The report says only:

Sales in the western region fell by 8% during the quarter.

The assistant replies:

Sales fell because a competitor launched a cheaper product in the region.

That may be possible.

But the report never states the cause.

The answer should have shown uncertainty or clearly labeled the explanation as a hypothesis.

Instead, it presented a plausible story as fact.

This is closely related to why AI can sound confident even when it is wrong.

AI may also sound unsure when it is right

The opposite can happen too.

An assistant may hedge a correct answer because the prompt is unclear, the system has been told to be cautious, or the topic is usually treated carefully.

For example:

I believe the total is $194.40, though you may want to verify the calculation.

If the calculation is correct, the cautious tone doesn’t make it less correct.

This matters because readers may reject a good answer simply because it sounds uncertain.

The better approach is to check the evidence or calculation rather than judging the tone.

What calibration means

Calibration is the relationship between confidence and actual correctness.

A well-calibrated system's confidence tracks how often it is right: it is highly confident mostly when its answers turn out correct, and less confident when the evidence is weak and its answers are more often wrong.

For example, imagine a system gives 100 answers that it rates as 80% confident.

If it is well calibrated, roughly 80 of those 100 answers should be correct.

That does not guarantee that any single answer is right.

Calibration is a pattern measured across many answers.

Calibration in plain English:
When the system says it is more confident, does that usually match a higher rate of correct answers?
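
The arithmetic behind that question can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a standard evaluation tool: the function name calibration_table is hypothetical, and the records are made-up data in which each answer already carries a stated confidence and a known right-or-wrong result.

```python
from collections import defaultdict


def calibration_table(records):
    """Group (stated_confidence, was_correct) pairs by stated confidence.

    Returns {confidence: (number_of_answers, observed_accuracy)}.
    """
    buckets = defaultdict(list)
    for confidence, was_correct in records:
        buckets[confidence].append(was_correct)
    return {
        confidence: (len(outcomes), sum(outcomes) / len(outcomes))
        for confidence, outcomes in sorted(buckets.items())
    }


# Made-up data: 100 answers rated 80% confident, 80 of them correct,
# and 50 answers rated 60% confident, only 20 of them correct.
records = (
    [(0.8, True)] * 80 + [(0.8, False)] * 20
    + [(0.6, True)] * 20 + [(0.6, False)] * 30
)

table = calibration_table(records)
for confidence, (count, accuracy) in table.items():
    print(f"stated {confidence:.0%}, {count} answers, observed {accuracy:.0%} correct")
```

In this made-up data, the 80% group is right 80% of the time, so stated confidence matches the results. The 60% group is right only 40% of the time, which means the system is overconfident there. Neither figure says whether one particular answer is correct.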

Good calibration is difficult.

Models may be more reliable on familiar tasks and less reliable when questions are unusual, ambiguous, or outside the kinds of problems they handle well.

A confidence score from one type of task may not transfer cleanly to another.

Uncertainty language is not the same as calibration

A sentence such as “I’m 80% confident” can look precise.

But unless the system has a tested method for producing that number, it may be generated text rather than a trustworthy probability.

The number can still sound scientific.

That does not make it calibrated.

Some AI systems include separate methods for estimating confidence, comparing multiple outputs, checking sources, or measuring whether answers are likely to be correct.

Those methods can be useful.

But a model simply writing a percentage in a sentence is not enough.

Watch for this:
An exact confidence percentage can create false precision when the system cannot explain how that number was produced or tested.
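
One of the separate methods mentioned above, comparing multiple outputs, can be sketched as follows. This is a rough illustration, assuming the same question has already been asked several times and the answers collected in a list. The function name agreement and the sample answers are hypothetical.

```python
from collections import Counter


def agreement(answers):
    """Return the most common answer and the share of samples that gave it."""
    if not answers:
        raise ValueError("need at least one answer")
    # Normalize spacing and capitalization so "June 30" and "june 30 " match.
    normalized = [a.strip().lower() for a in answers]
    top_answer, top_count = Counter(normalized).most_common(1)[0]
    return top_answer, top_count / len(normalized)


# Hypothetical: the same contract question asked five times.
samples = ["June 30", "June 30", "August 31", "june 30 ", "June 30"]

answer, share = agreement(samples)
print(f"most common answer: {answer} ({share:.0%} of samples)")
```

Four of the five samples agree on June 30. That share measures agreement, not correctness. If every sample was read from the same file with the missing amendment, all five could agree on the wrong date.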

Reasoning models can still misjudge uncertainty

A reasoning model may spend more time checking a problem before answering.

That can help it notice contradictions, missing facts, or weak conclusions.

But more reasoning does not guarantee better uncertainty estimates.

The model can work through several steps, reach the wrong conclusion, and still sound confident.

It can also notice one possible problem and become overly cautious about an otherwise solid answer.

Reasoning quality and confidence quality are connected, but they are not the same.

A model may reason correctly and express uncertainty poorly.

It may also reason incorrectly and express strong confidence.

Ask what the uncertainty is about

When an AI says it is not sure, the most useful next question is:

What specific part of the answer is uncertain, and what information would resolve it?

This pushes the assistant to identify the source of uncertainty.

It might say:

the document contains two different dates
the relevant table is unreadable
the answer depends on an unstated assumption
the source does not explain the cause
a newer policy may exist

Those details are much more useful than a general warning.

You can also ask:

Which claim is directly supported?
Which part is an inference?
What source should I check?
What fact would change the answer?
Are there other reasonable interpretations?

What to check before trusting an uncertain answer

Uncertainty should change what you do next.

It should not automatically make you trust or reject the answer.

What to check:
Identify which claim is uncertain, trace it to the source, check any missing facts, and confirm whether the answer is a fact, an inference, or only one possible explanation.

A simple review can follow four questions:

What is the model unsure about?
Why is that part uncertain?
What evidence supports the answer?
What information would confirm or change it?

For high-risk tasks, the original source still matters more than the model’s tone.

This is why AI cannot always verify facts on its own. Verification requires dependable evidence and a reliable way to compare the answer against it.

The main idea

When an AI says it is not sure, the warning may be useful.

It may signal missing context, conflicting evidence, an assumption, or a task where several answers are possible.

But the phrase itself is not proof that the model has accurately measured its chance of being wrong.

Confidence is not truth.

Uncertainty is not falsehood.

A confident answer can be wrong.

An uncertain answer can be right.

The better questions are:

What evidence supports the answer?
Which part is uncertain?
What assumption is being made?
What information is missing?
How can the conclusion be checked?

“I’m not sure” is most useful when it points to a specific weakness.

It should start the verification process.

It should not replace it.

Reasoning Models Explained

What Reasoning Models Actually Do That Regular AI Does Not
Why Showing Its Work Does Not Mean AI Is Thinking Like a Human
How Chain-of-Thought Prompting Changes an AI Answer
Why AI Solves Some Logic Puzzles but Fails at Obvious Ones
What It Means When an AI Says It Is Not Sure (this article)
