LLM · Models & Architecture · Updated 2026.04.28

Context Window

Also known as: Context Length

In one line

The context window is the maximum number of tokens an LLM can take in at once — it defines how much content the model can consider in a single prompt.

Going deeper

The context window is the maximum number of tokens — input plus output — an LLM can process in a single call. Early GPT-3.5 sat around 4K tokens, roughly 3,000 English words. Today's GPT-5, Claude 4 and Gemini 2 lineages routinely advertise a million tokens or more, which is enough to drop an entire book or product manual into a single prompt and ask questions over the whole thing.
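The back-of-envelope arithmetic behind those figures can be sketched as follows. The 0.75 words-per-token ratio is a common rule of thumb for English prose, not an exact tokenizer count:

```python
WORDS_PER_TOKEN = 0.75  # assumption: rough rule of thumb for English text


def approx_words(context_tokens: int) -> int:
    """Approximate how many English words fit in a given context window."""
    return int(context_tokens * WORDS_PER_TOKEN)


print(approx_words(4_000))      # early GPT-3.5-sized window → 3000
print(approx_words(1_000_000))  # million-token window → 750000
```

Real counts vary by tokenizer and by language, so treat this as a sizing heuristic rather than a measurement.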

Technically the context window is not 'how much the model remembers' — it is how many tokens can sit inside one attention computation. Compute and memory grow super-linearly with context length, and pricing scales directly with input tokens. When model providers brag about long context, they are flagging a real engineering achievement: efficient attention at million-token scale is hard, not free.
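Because pricing scales directly with input tokens, filling a long window is a linear cost decision. A minimal sketch, using an illustrative placeholder rate rather than any vendor's real price list:

```python
PRICE_PER_1K_INPUT = 0.003  # assumption: illustrative $ per 1,000 input tokens


def prompt_cost(input_tokens: int, price_per_1k: float = PRICE_PER_1K_INPUT) -> float:
    """Input-side cost of a single call: linear in the number of prompt tokens."""
    return input_tokens / 1_000 * price_per_1k


print(round(prompt_cost(4_000), 4))      # FAQ-sized prompt → 0.012
print(round(prompt_cost(1_000_000), 2))  # whole-manual prompt → 3.0, i.e. 250x
```

The ratio between the two calls is the point: a million-token prompt costs 250 times a 4K one at any flat per-token rate, which is why 'just dump everything in' has a real bill attached.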

For marketers, larger context windows quietly change content strategy. When an AI can read your full manual, blog or whitepaper in one shot, the playbook shifts from 'trim everything down to a tight FAQ' to 'keep deep, comprehensive content available in clean, citable form'. Within RAG pipelines this is amplified, because more retrieved chunks fit into the same prompt, favouring brands that own a lot of substantive content.

A common misread is that bigger context means better use of every token in it. The well-documented 'Lost in the Middle' effect shows that models tend to under-use information sitting in the middle of long contexts. Practically that means definitions, conclusions and key claims still belong near the top and bottom of a page. A one-line summary up front and a clear conclusion at the end is good UX and good GEO at the same time.
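The same 'edges matter most' logic applies when assembling long prompts from retrieved chunks. One common mitigation is to place the highest-ranked chunks at the start and end of the prompt and bury the weakest in the middle; a minimal sketch, assuming chunks arrive already ranked best-first:

```python
def edge_order(chunks_by_rank: list[str]) -> list[str]:
    """Reorder best-first chunks so the strongest sit at the prompt's edges.

    Alternates chunks between the front and the (reversed) back of the
    prompt, which leaves the lowest-ranked material in the middle.
    """
    front: list[str] = []
    back: list[str] = []
    for i, chunk in enumerate(chunks_by_rank):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]


# Ranks 1..5 ("A" best): best chunk first, second-best last, worst mid-prompt.
print(edge_order(["A", "B", "C", "D", "E"]))  # → ['A', 'C', 'E', 'D', 'B']
```

The ranking itself (BM25, embedding similarity, a reranker) is assumed to happen upstream; this sketch only covers the placement step.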

On the operational side, large context is double-edged. Korean text takes roughly 1.5 to 2 times more tokens than English, so a million-token window holds less effective content and costs more in Korean than the headline number suggests. 'Just dump everything in' is almost never the right answer in the Korean market — better retrieval and tighter chunking beat raw context length. The window opens the door; structuring what walks through it is still on the content team.
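The effect of that tokenizer overhead on effective capacity is simple division. A sketch using the 1.5 to 2x range cited above as an assumed overhead factor, not a measured one:

```python
def effective_tokens(window: int, overhead: float) -> int:
    """English-equivalent content that fits once tokenizer overhead is paid."""
    return int(window / overhead)


WINDOW = 1_000_000  # headline context window size

print(effective_tokens(WINDOW, 1.5))  # optimistic Korean overhead → 666666
print(effective_tokens(WINDOW, 2.0))  # pessimistic overhead → 500000
```

At the pessimistic end, a 'million-token' window holds roughly half the headline amount of Korean content, and every one of those extra tokens is also billed, which is the argument for tighter retrieval over raw window size.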

Related terms

How does your brand show up in AI answers?

Villion measures how your brand appears across ChatGPT, Perplexity and AI Overviews, then automates the work that lifts citation rate and share of voice.

Get a free audit