Understanding AI: Context Windows — AI's Working Memory
AI's working memory went from 6 pages to 1,500
"The context window is the new RAM."
— Andrej Karpathy
If you’ve used ChatGPT or Claude, you’ve probably noticed that the AI seems brilliant for the first few exchanges, then starts forgetting what you told it.
This isn’t the AI being dumb. It’s running out of working memory.
What Is a Context Window?
The context window is AI's short-term memory—the limit on how much information it can hold in mind during a single conversation. Everything the AI "knows" about your exchange—the conversation so far, the documents you've uploaded, the instructions you've given—has to fit in that memory.
Until recently, that memory was small. When ChatGPT launched in late 2022, it ran on GPT-3.5 with a context window of roughly 4,096 tokens—about 3,000 words, or 6 pages of text.
This meant the AI could hold the equivalent of a short patient encounter note before things started falling off the edge. Long conversations? Forget about them (literally—the model did). Complex documents? You had to chop them into pieces and hope the AI could stitch the meaning back together.
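The page counts in this article come from simple back-of-envelope arithmetic. A tiny sketch, using the common rough assumptions of about 0.75 words per token and about 500 words per printed page (both are approximations, not exact figures for any specific model):

```python
# Rough conversions assumed throughout this article:
# ~0.75 English words per token, ~500 words per printed page.
WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE = 500

def tokens_to_pages(tokens: int) -> float:
    """Convert a token budget to an approximate page count."""
    return tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE

print(round(tokens_to_pages(4_096)))      # GPT-3.5's original window -> 6
print(round(tokens_to_pages(1_000_000)))  # a 1M-token window -> 1500
```

Real tokenizers vary (code and non-English text use more tokens per word), but the ratios are close enough for the comparisons here.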
The Explosion in Scale
Here’s what’s happened since:
Google’s Gemini 2.5 Pro now supports a 1 million token context window—with 2 million coming soon. That’s roughly 1,500 pages of text, or an entire novel plus your company’s codebase in one conversation.
OpenAI’s GPT-4.1 jumped to 1 million tokens, and GPT-5.2 offers a 400,000-token window optimized for coding and enterprise work. Their latest coding model can coherently work across millions of tokens through a process called “compaction.”
Anthropic’s Claude offers 200,000 tokens by default, with a 1 million token window now available for enterprise API users. Enterprise Claude.ai users get 500,000 tokens.
To put this in perspective: we’ve gone from 6 pages to 1,500 pages.
What Held Context Windows Back
Why were context windows small for so long?
Math.
These models calculate how every word relates to every other word—and that calculation scales quadratically as context grows. Double the input length, and you roughly quadruple the compute and memory.
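The quadratic blow-up is easy to see by counting the pairwise comparisons standard self-attention performs (a toy count, ignoring constant factors and real implementation details):

```python
def attention_pairs(n_tokens: int) -> int:
    """Standard self-attention compares every token with every
    other token (including itself): n * n score computations."""
    return n_tokens * n_tokens

base = attention_pairs(4_096)     # GPT-3.5's original window
doubled = attention_pairs(8_192)  # twice the context
print(doubled / base)  # -> 4.0: double the input, quadruple the work
```

At 4,096 tokens that is about 17 million comparisons; at 1 million tokens it would be a trillion—which is why naive scaling was a dead end.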
Recent innovations changed that: new methods for encoding word positions (rotary position embeddings), parallel processing techniques (ring attention), and architectural changes that let models attend to nearby words more than distant ones (sparse attention).
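To illustrate one of these ideas, here is a toy sketch of a causal sliding-window attention mask—the kind of local-attention pattern used in some sparse-attention designs. This is an illustration of the concept, not any production model's actual implementation:

```python
def sliding_window_mask(n_tokens: int, window: int) -> list[list[bool]]:
    """Causal sliding-window mask: token i may attend only to tokens j
    with i - window < j <= i, instead of all earlier tokens.
    Per-token cost is capped at `window` comparisons, so total work
    grows linearly with context length rather than quadratically."""
    return [
        [max(0, i - window + 1) <= j <= i for j in range(n_tokens)]
        for i in range(n_tokens)
    ]

mask = sliding_window_mask(n_tokens=6, window=3)
# The last token sees itself plus at most the 2 previous tokens:
print(sum(mask[5]))  # -> 3
```

With a fixed window, doubling the context only doubles the work—which is part of how million-token windows became affordable.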
The result: context windows have grown from thousands of tokens to millions, while costs have remained manageable.
Why This Actually Matters
Bigger numbers are meaningless without practical implications. Here’s what larger context windows enable:
Accuracy and coherence. When the AI can “see” your entire conversation at once—or your entire document, or your entire codebase—it doesn’t have to guess about context. It doesn’t forget your patient’s allergy list because you mentioned it 47 messages ago. It doesn’t hallucinate details because it lost track of what you uploaded.
Real-world professional use. Consider what becomes possible in medicine:
Upload a decade of clinic notes for a patient with an undiagnosed condition and ask the AI to identify patterns you might have missed.
Feed in a complex prior authorization denial along with the patient's full chart and ask the AI to draft an appeal using the relevant clinical evidence.
Upload a complex patient's entire chart—progress notes, labs, imaging reports, specialist consultations, medication lists—and ask for a synthesis. Not a summary of the last visit. The whole picture.
Upload an entire clinical trial protocol and patient's history to assess eligibility and potential concerns.
The Next Phase: Persistent Memory
Context windows are the AI’s working memory within a single conversation. But what happens between conversations?
This is where things get interesting. The major AI labs are now combining large context windows with persistent memory—systems that maintain information across days, weeks, or months. ChatGPT and Claude already build a profile of your preferences, your projects, your working style.
The implication: AI shifts from a tool you prompt fresh each time to something more like a colleague who knows your work. One that remembers your patients’ histories, your research interests, your communication preferences. That remembers the literature review you started last month and can pick it up without you re-explaining everything.
The Bottom Line
Context windows are infrastructure—the kind of thing you don’t notice until it limits you. If you tried AI tools in 2023 and found them forgetful, disjointed, or incapable of handling real-world documents, the technology has genuinely changed. Now the bottleneck isn't the AI's memory. It's whether we learn to use it.


