How AI Handles Files You Upload

The answer says the number is right there in the file. Yet when you open page 12, the table tells a different story.

Uploaded documents pass through extraction, chunking, retrieval, and image reading before the model answers. Which parts reached it—and which parts quietly disappeared along the way?

AI Assistants at Work, Part 3 of 5

This five-day series explains how AI assistants use context, files, instructions, tools, and review to handle real work.

Upload a PDF, spreadsheet, presentation, or image to an AI assistant, and the experience can feel surprisingly direct.

You attach the file. You ask a question. The assistant answers.

It can seem as though the model opened the document, looked through every page, understood the layout, and found exactly what you needed.

Sometimes the result really is that good.

But what's happening behind the screen is usually more complicated. The model may receive extracted text, selected sections, descriptions of images, or structured data produced by other parts of the system. What reaches the model may not be identical to what you see when you open the file yourself.

That difference explains why AI can summarize a long report well and still miss a number sitting in a table on page 12.

Uploading a file isn't the same as the model reading it the way a person would

When you read a document, you take in more than words.

You notice headings, page order, columns, footnotes, color, spacing, charts, labels, and where one section ends and another begins. You may also recognize that a sentence belongs to a warning box rather than the main argument.

An AI system may need to convert all of those elements into a form the model can process.

For a text-based PDF or Word document, the application may extract the written content and send it to the model as text. For a scanned page, it may first use optical character recognition to turn the image of the page into readable characters.

A spreadsheet may be converted into rows, columns, cell values, or a simplified text version. A presentation may be split into slide text, speaker notes, and image descriptions.

The model then works from that processed version.

It isn't necessarily seeing the file the way you see it.

A useful distinction:
The system opens and processes the file. The model works with whatever information that process successfully provides.

Text extraction can lose part of the document's meaning

Plain paragraphs are usually easier to handle than complicated layouts.

Picture a two-column research report. The left column contains the main explanation, while the right column contains notes and examples. If the text is extracted in the wrong order, a sentence from the right column may be dropped halfway into a paragraph from the left.

The words are all there. The structure isn't.
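To see how that happens, here is a minimal sketch in Python. The page content and both function names are invented for illustration, and real extraction tools are far more involved. It simply compares reading straight across a two-column page with reading one column at a time.

```python
# Hypothetical page: each visual row holds one line from the left column
# and one line from the right column. The text is made up for illustration.
page_rows = [
    ("Revenue grew in the", "Note: figures are"),
    ("third quarter, driven", "unaudited and may"),
    ("by new subscriptions.", "change after review."),
]

def extract_row_by_row(rows):
    """Naive extraction: read straight across the page, ignoring columns."""
    return " ".join(f"{left} {right}" for left, right in rows)

def extract_column_by_column(rows):
    """Layout-aware extraction: finish the left column, then the right."""
    left_text = " ".join(left for left, _ in rows)
    right_text = " ".join(right for _, right in rows)
    return f"{left_text} {right_text}"

naive = extract_row_by_row(page_rows)
aware = extract_column_by_column(page_rows)

print(naive)  # the note is spliced into the middle of the main sentence
print(aware)  # main text first, then the note
```

Both versions contain exactly the same words. Only the second keeps the sentence and the note apart.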

The same problem can happen with headers, footers, page numbers, sidebars, captions, and footnotes. A footer repeated on every page may appear dozens of times in the extracted text. A note at the bottom of a page may end up separated from the claim it was meant to qualify.
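One common cleanup idea is to drop lines that repeat on almost every page. The sketch below assumes the file has already been split into pages and lines. The sample text, the function name, and the 80 percent threshold are all illustrative choices, not how any particular product works.

```python
from collections import Counter

# Hypothetical extracted pages, one list of lines per page.
pages = [
    ["Vacation policy overview", "Employees accrue leave monthly.", "Acme Corp - Confidential"],
    ["Carryover rules", "Unused days expire in March.", "Acme Corp - Confidential"],
    ["Regional exceptions", "Some regions allow extensions.", "Acme Corp - Confidential"],
]

def strip_repeated_lines(pages, min_fraction=0.8):
    """Remove lines that appear on at least min_fraction of the pages."""
    if len(pages) < 2:
        return pages  # nothing can count as "repeated" in a one-page file
    # Count each line once per page, so a line repeated within one page
    # is not mistaken for a header or footer.
    counts = Counter(line for page in pages for line in set(page))
    threshold = min_fraction * len(pages)
    repeated = {line for line, n in counts.items() if n >= threshold}
    return [[line for line in page if line not in repeated] for page in pages]

cleaned = strip_repeated_lines(pages)
print(cleaned[0])
```

A filter like this can also remove something that matters, such as a warning printed on every page, which is one more reason to check the original.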

Scanned documents add another layer of trouble. A slightly blurred number may be read incorrectly. A zero may become the letter O. A decimal point may disappear. A handwritten note may not be captured at all.
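Some of these errors can be flagged automatically, at least roughly. The sketch below uses a hypothetical pattern that marks any token mixing digits with letters that scanners often confuse with digits. It is an assumption-heavy heuristic: it will raise false alarms on things like product codes, and it can't detect a decimal point that vanished.

```python
import re

# Matches a token that contains at least one digit and at least one letter
# often confused with a digit in scanned text (O, o, l, I, S).
SUSPICIOUS_TOKEN = re.compile(r"\b(?=\w*\d)(?=\w*[OolIS])\w+\b")

def flag_suspicious_numbers(text):
    """Return tokens that look like numbers with a misread character."""
    return SUSPICIOUS_TOKEN.findall(text)

# Hypothetical scanned line where a zero was read as the letter O.
sample = "Total: 42O,000 units, up from 390,000 last year."
flagged = flag_suspicious_numbers(sample)
print(flagged)
```

In this sample only the token with the letter O is flagged. The correctly read number passes through untouched.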

And sometimes the system focuses on something nearly useless, such as a page number or repeated footer, while the important sentence sits right beside it.

Once the text reaches the model, the assistant may not know which parts were extracted correctly and which weren't.

It may treat a bad extraction as trustworthy source material.

Long files are often divided into smaller parts

A model can take in only a limited amount of text at once, so it can't always process an entire large document in one pass. Even when a system supports long files, it may still split the material into smaller sections so it can search and work with them more efficiently.

This process is often called chunking.
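A minimal sketch of the idea, assuming the simplest possible approach: split the text into fixed-size groups of words, with a small overlap so a sentence cut at a boundary still appears whole in one chunk. The function name and the sizes are illustrative. Real systems may split on headings, paragraphs, or tokens instead.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into chunks of chunk_size words, sharing overlap words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks

# Hypothetical 120-word document: w0 w1 ... w119
document = " ".join(f"w{i}" for i in range(120))
chunks = chunk_words(document, chunk_size=50, overlap=10)
print(len(chunks))
```

Notice that no single chunk contains the whole document. Whatever sits in a chunk that is never selected stays out of the model's view.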

Say you upload a 200-page employee handbook and ask, “Can contractors carry unused vacation days into the next year?”

The system may search the document for sections related to contractors, vacation, leave, or carryover rules. It then sends the sections that look most relevant to the model.

That can work very well.

But here's the catch: the complete answer may be spread across several places. One section may describe the general vacation policy. Another may define who counts as a contractor. A third may list exceptions for certain regions.

If the system retrieves only the first section, the assistant may answer confidently from an incomplete rule.

This is why retrieval matters so much in file-based tasks. The model can only use the sections the system selects and places into its context.

A relevant-looking section isn't always the whole answer.
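The handbook example can be sketched with a deliberately simple stand-in for retrieval: score each section by how many keywords it shares with the question, then keep the top results. The section titles, their text, the stopword list, and the scoring rule are all hypothetical. Real systems often use more sophisticated matching, but the failure mode is similar.

```python
import re

STOPWORDS = {"a", "an", "and", "can", "for", "in", "into", "is",
             "may", "not", "the", "to", "up"}

# Hypothetical handbook sections (titles and text invented for illustration).
sections = {
    "4.2 Vacation carryover": "Employees may carry up to five unused vacation days into the next year.",
    "1.3 Definitions": "A contractor is a worker engaged under a services agreement and is not an employee.",
    "9.1 Regional exceptions": "In some regions, carryover limits differ for contractors and employees.",
}

def keywords(text):
    """Lowercase words in the text, minus a few common stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def retrieve(question, sections, top_k=1):
    """Return the titles of the top_k sections sharing the most keywords."""
    q = keywords(question)
    ranked = sorted(
        sections,
        key=lambda title: len(q & keywords(sections[title])),
        reverse=True,
    )
    return ranked[:top_k]

question = "Can contractors carry unused vacation days into the next year?"
print(retrieve(question, sections, top_k=1))
print(retrieve(question, sections, top_k=3))
```

With top_k=1, only the general carryover rule is selected. The definition of a contractor shares no keywords with the question in this toy scoring, so it ranks last even though the answer depends on it.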

Summaries can be useful without being complete

Summarization is one of the best uses for uploaded files.

An assistant can take a long report and turn it into a short overview, a list of decisions, or a set of action items. That's genuinely useful.

Still, every summary involves choices.

The model has to decide which points deserve space and which can be left out. Your prompt helps shape that decision.

Ask for a “short summary,” and the assistant may focus on the document's central argument. Ask for “the financial risks and unresolved questions,” and it may pull out a very different set of details.

The file hasn't changed. The purpose has.

So there isn't always one perfect summary sitting inside the document, waiting to be found. Different readers need different summaries.

The trouble starts when the assistant leaves out something you consider essential.

Picture a project report that says the launch is on schedule, but only if a security review is completed by Friday. A broad summary might say, “The project remains on track.” That's technically close to the main message. But the condition attached to it may matter far more than the headline.

For important documents, don't ask only for a summary. Ask the assistant to identify conditions, exceptions, deadlines, risks, or disagreements that shouldn't be compressed away.

Tables, charts, and images are notoriously tricky to process

Documents often hide their most important information outside the main paragraphs.

A financial report may place the key numbers in a table. A medical paper may show the main result in a chart. A presentation may depend on a diagram that isn't fully explained in the slide text.

AI systems tend to handle these elements less accurately than plain paragraphs.

A table isn't just a pile of numbers. Its meaning depends on row labels, column headings, units, merged cells, notes, and the relationship between one value and another.

Take a simple sales table:

North region: $420,000
South region: $390,000
Returns: shown in a separate negative column

If the negative column is extracted without its heading, the assistant may add the returns instead of subtracting them. All the numbers are present, yet the answer is still wrong.

It's the kind of mistake that feels almost absurd when you look at the original table.
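The arithmetic makes the point concrete. The sales figures below come from the example above, but the returns amount is a made-up value, since the example doesn't give one. The sketch assumes the table stores returns as a plain number and relies on the column heading to mark it as a deduction.

```python
north_sales = 420_000
south_sales = 390_000
returns_cell = 30_000  # hypothetical value; the heading marks it as negative

# Heading preserved: returns are subtracted from sales.
net_with_heading = north_sales + south_sales - returns_cell

# Heading lost: the number looks like one more amount to add.
net_without_heading = north_sales + south_sales + returns_cell

print(net_with_heading)     # 780000
print(net_without_heading)  # 840000
```

Under these assumed numbers, the two results differ by twice the returns amount, and nothing in the wrong total looks suspicious on its own.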

Charts bring a different problem. The assistant may need to identify axes, legends, colors, trends, and labels. A line that rises sharply may represent revenue, error rate, temperature, or something else entirely. Without the legend, the shape alone tells you very little.

Images can also contain text that isn't captured clearly. Small labels, rotated text, low contrast, or overlapping elements may be missed.

Multimodal models can analyze images directly, but they can still misread fine details, count objects incorrectly, mix up labels, or confidently describe a pattern that isn't really there.

Seeing an image isn't the same as understanding every detail inside it.

How to get more reliable answers from uploaded files

A few small changes can make file-based work much safer.

Start by telling the assistant exactly what you need. “Review this file” is vague. “Find the cancellation terms, identify the notice period, and list any exceptions” gives the system a much clearer target.

It's also worth naming the section, page, sheet, or table when you already know where the answer should be.

For example:

Review the table on page 14. Compare the 2025 and 2026 totals, keep the original units, and tell me if any row is unclear or unreadable.

That request does three helpful things. It narrows the source, defines the comparison, and gives the assistant permission to admit uncertainty.

For important tasks, ask the assistant to back up its answer:

Which page or section supports the answer?
Was the answer taken from text, a table, or an image?
Were any pages unreadable or incomplete?
Did the document contain conflicting information?
Is the assistant quoting the file or making an inference?
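If you have the file's text available, the "quoting or inferring" question can be partly checked by hand or with a few lines of code. The sketch below is one hypothetical way to do it: normalize whitespace and case, then look for the quoted passage on each page. The page text and function names are invented for illustration, and an exact-match check like this will miss quotes that the assistant paraphrased slightly.

```python
import re

def normalize(text):
    """Collapse whitespace and lowercase so line breaks don't block a match."""
    return re.sub(r"\s+", " ", text).strip().lower()

def find_quote(quote, pages):
    """Return the 1-based page numbers whose text contains the quote."""
    target = normalize(quote)
    if not target:
        return []  # an empty quote would otherwise match every page
    return [n for n, page in enumerate(pages, start=1)
            if target in normalize(page)]

# Hypothetical extracted pages from a project report.
pages = [
    "The launch is on schedule,\nprovided the security review\nis completed by Friday.",
    "Budget details follow.",
]

print(find_quote("provided the security review is completed", pages))
print(find_quote("The project remains on track", pages))
```

A passage that is found nowhere isn't necessarily wrong. It just means the assistant was summarizing or inferring rather than quoting, which tells you where to look more closely.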

You can also break a large task into stages. First ask the assistant to identify the relevant sections. Then ask it to analyze those sections. Finally, compare the answer with the original file.

That takes a little longer.

It also makes those frustrating, obvious-in-hindsight mistakes much easier to catch.

The main idea

When you upload a file, an AI assistant doesn't simply absorb the document the way a person would.

The surrounding system may extract text, read images, divide the file into sections, search for relevant material, and place selected information into the model's context.

The model then generates an answer from what it received.

If the extraction is accurate and the right sections are selected, the result can be excellent. If a table loses its headings, a scanned number is misread, or an important section isn't retrieved, the answer may be wrong even though it sounds polished.

So before trusting a file-based answer, ask:

What part of the file did the assistant actually use?
Was the important information in text, a table, or an image?
Could the layout or extraction have changed the meaning?
What should I verify in the original file?

Uploading a document gives an AI assistant access to more information.

It doesn't guarantee that every part of that information was read correctly.

AI Assistants at Work

What AI Assistants Actually Do When They Help With a Task
Why AI Assistants Need Context Before They Can Help Well
How AI Handles Files You Upload (current article)
Why AI Can Follow Instructions in One Step and Forget Them Later
What Makes an AI Workflow Reliable Instead of Just Impressive
