LLM Models & Architecture · Updated 2026.04.28

Tokenization

Also known as: 토큰화 (Korean for "tokenization"), Tokenizer

In one line

Tokenization is the preprocessing step that breaks text into the small pieces a model actually consumes — and it directly drives cost, context length and multilingual performance.

Going deeper

Tokenization breaks text into the small chunks a model actually consumes. GPT-style models lean on variants of byte-pair encoding (BPE); others use SentencePiece. One token corresponds to roughly four English characters on average, but Korean and Japanese text typically needs far more tokens for the same number of characters.
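To see the character-to-token ratio in practice, here is a minimal sketch using the open-source tiktoken package with the cl100k_base encoding; the specific encoding choice and the example strings are illustrative assumptions, not something stated in this article.

```python
# Minimal sketch: how token counts diverge across languages.
# Assumes the `tiktoken` package is installed (pip install tiktoken);
# cl100k_base is one encoding used by GPT-style models.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

english = "Tokenization breaks text into the small chunks a model consumes."
korean = "토큰화는 텍스트를 모델이 실제로 소비하는 작은 조각으로 나눕니다."

for label, text in [("English", english), ("Korean", korean)]:
    tokens = enc.encode(text)
    # tokens-per-character is the ratio that drives cost and context use
    print(f"{label}: {len(text)} chars -> {len(tokens)} tokens "
          f"({len(tokens) / len(text):.2f} tokens/char)")
```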

This matters in practice for very concrete reasons: API pricing is per token, context windows are measured in tokens, and the same Korean message routinely consumes 1.5x to 3x as many tokens as its English version. The same content is therefore not the same cost or the same context length in every language.
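The cost arithmetic is simple once you have token counts. A rough illustration follows; the per-1K-token price and the token counts are hypothetical placeholders, not any provider's actual rates, and the 2x multiplier just picks a point inside the 1.5x to 3x range above.

```python
# Rough illustration of why the same message costs more in Korean.
# Price and token counts below are hypothetical placeholders.
PRICE_PER_1K_INPUT_TOKENS = 0.0005  # USD per 1K tokens, hypothetical

def request_cost(prompt_tokens: int,
                 price_per_1k: float = PRICE_PER_1K_INPUT_TOKENS) -> float:
    """Cost of a single request's prompt, given a per-1K-token price."""
    return prompt_tokens / 1000 * price_per_1k

english_tokens = 120              # hypothetical English prompt
korean_tokens = english_tokens * 2  # same content at ~2x tokens

print(f"English prompt: ${request_cost(english_tokens):.6f}")
print(f"Korean prompt:  ${request_cost(korean_tokens):.6f}")
```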

For GEO and LLMO, it is worth considering how your content tokenizes: whether it splits into clean, quotable units. Sprawling sentences and tables that resist segmentation can quietly become harder for a model to cite.
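One way to sanity-check this is to split content into sentences and flag any that balloon past a token budget. This is only a sketch under stated assumptions: the regex sentence splitter and the 60-token budget are arbitrary illustrative choices, not an established GEO guideline.

```python
# Rough sketch: flag passages that may be too sprawling to quote cleanly.
# The sentence splitter and the 60-token budget are illustrative assumptions.
import re
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
TOKEN_BUDGET = 60  # hypothetical per-passage budget

def flag_long_passages(text: str, budget: int = TOKEN_BUDGET) -> list[tuple[int, str]]:
    """Return (token_count, sentence) pairs that exceed the budget."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [(len(enc.encode(s)), s) for s in sentences
            if len(enc.encode(s)) > budget]

sample = "Your page copy goes here. Long, sprawling sentences show up in this list."
for count, sentence in flag_long_passages(sample):
    print(f"{count} tokens: {sentence[:80]}...")
```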

