Token
In one line
A token is the basic unit an LLM reads and writes — usually a word or piece of a word. LLM pricing and context limits are both measured in tokens.
Going deeper
A token is the smallest unit an LLM operates on. It is not the same as a character or a word, and every model uses its own tokenizer. In English, one word is typically 1 to 2 tokens; in Korean, a single character can take 1 to 3 tokens. Frequent words compress into fewer tokens, while rare strings get split into many. Because LLM pricing, context limits and throughput are all measured in tokens, this is the first unit you have to internalise to run anything in production.
Technically, most tokenizers rely on Byte Pair Encoding (BPE) or on SentencePiece's unigram model. These approaches merge frequent character sequences into single tokens and split rare ones into smaller pieces. 'ChatGPT' might be a single token; an obscure neologism could be four or five. Korean costs more tokens than English mostly because it appears far less often in tokenizer training data, so the algorithm has fewer chances to compress it.
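The merge loop at the heart of BPE fits in a few lines. This is a toy sketch of the training step only, in the classic Sennrich-style formulation — the corpus and the number of merges are invented for illustration, and real tokenizers add byte-level handling, special tokens and far larger vocabularies:

```python
import re
from collections import Counter

def most_frequent_pair(corpus):
    """Count adjacent symbol pairs across the corpus; return the most frequent."""
    pairs = Counter()
    for word, freq in corpus.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(pair, corpus):
    """Fuse every occurrence of the pair into one symbol (whole symbols only)."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in corpus.items()}

# Toy corpus: words pre-split into characters, with their frequencies.
corpus = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6, "w i d e s t": 3}

merges = []
for _ in range(3):  # learn three merges
    pair = most_frequent_pair(corpus)
    merges.append(pair)
    corpus = merge_pair(pair, corpus)

print(merges)
```

Frequent sequences ('es', then 'est'-building pieces) get fused first, which is exactly why common words end up as one token while rare strings stay split into many.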
Two things matter for marketers. First, AI sees your copy as tokens, not as words. Compound words may stay glued together as one token or get split into several, and that subtly affects how the model picks up meaning. Second, since tokens map directly to cost, 'expressing the same idea in fewer tokens' is a real operational lever — concise system prompts and tighter content reduce bills meaningfully without changing what the user sees.
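Because tokens map linearly to cost, the savings from a shorter prompt are easy to quantify. A minimal sketch — the per-million-token prices and traffic numbers below are placeholders, not any provider's actual rates:

```python
def call_cost_usd(input_tokens: int, output_tokens: int,
                  usd_per_1m_in: float = 3.0, usd_per_1m_out: float = 15.0) -> float:
    """API cost of one call. Prices are PLACEHOLDERS -- check your provider's sheet."""
    return (input_tokens * usd_per_1m_in + output_tokens * usd_per_1m_out) / 1_000_000

# Trimming a 1,200-token system prompt to 400 tokens, at 100,000 calls a day:
saving = (call_cost_usd(1_200, 0) - call_cost_usd(400, 0)) * 100_000
print(f"${saving:,.2f} saved per day")  # -> $240.00 saved per day
```

The user-visible answer is identical; only the invisible prompt overhead shrank.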
A common misread is to treat tokens and words as interchangeable. Punctuation, whitespace, emoji, numbers and even line breaks all consume tokens. Heavy formatting in an LLM response inflates token count, which inflates cost. Another misread is to assume token limits translate cleanly into character limits — a 'one million token context' holds roughly half as much effective Korean as effective English, so back-of-envelope math by characters consistently underestimates real usage.
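The character-vs-token gap can be made concrete. A back-of-envelope sketch, assuming the same article runs about 1,000 tokens in English and about 2,000 in Korean — the 2x multiplier is illustrative (real ratios vary by tokenizer and text), so measure with your actual tokenizer before budgeting:

```python
def docs_per_window(window_tokens: int, tokens_per_doc: int) -> int:
    """How many documents of a given token length fit in one context window."""
    return window_tokens // tokens_per_doc

WINDOW = 1_000_000  # a 'one million token context'

# ASSUMED token counts for the same article in each language:
english_copies = docs_per_window(WINDOW, 1_000)
korean_copies = docs_per_window(WINDOW, 2_000)
print(english_copies, korean_copies)  # -> 1000 500: half as much Korean content
```

Counting characters instead of tokens would miss this halving entirely, which is why character-based estimates consistently undershoot.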
Tokens are the single biggest cost variable for the Korean market. The same content costs 1.5 to 3 times more tokens in Korean than in English, which means a Korean RAG, chatbot or summariser can produce a bill two to three times higher than its US equivalent at the same traffic level. Korean LLM teams routinely shorten system prompts, tighten chunks, summarise upstream and cache aggressively to keep costs sane — and the same discipline applies to any production GEO system serving Korean users.
Related terms
LLM
A large language model (LLM) is a neural network trained on massive text corpora to understand and generate human language — the engine behind ChatGPT, Claude, Gemini and similar products.
Transformer
The Transformer is the neural network architecture behind almost every modern LLM, using self-attention to weigh relationships between all tokens in a sequence in parallel.
Context Window
The context window is the maximum number of tokens an LLM can take in at once — it defines how much content the model can consider in a single prompt.
Temperature
Temperature is a parameter that controls how much randomness the LLM allows when picking the next token — lower values give consistent answers, higher values give more creative ones.
Embedding
An embedding is a numeric vector representation of text or other data that preserves semantic meaning — the foundation of semantic search, vector databases and RAG.
How does your brand show up in AI answers?
Villion measures how your brand appears across ChatGPT, Perplexity and AI Overviews, then automates the work that lifts citation rate and share of voice.
Get a free audit