
Streaming Response

Also known as: token streaming, SSE streaming

In one line

Streaming response is the mode where an LLM emits tokens to the client as they are generated, instead of waiting for the full answer to finish.

Going deeper

Streaming response sends each token to the client the moment the model produces it, rather than buffering the complete answer server-side. The typewriter effect in ChatGPT is exactly this; the transport is usually Server-Sent Events (SSE) or WebSockets.
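To make this concrete, here is a minimal sketch of consuming such a stream, assuming the OpenAI Python SDK; any streaming SDK follows the same loop-over-deltas pattern. The model name and prompt are placeholders:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

stream = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Explain streaming in one paragraph."}],
    stream=True,  # ask the server to send tokens as they are generated
)

# Each chunk carries a small delta, typically one or a few tokens.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```

The client renders each delta immediately, which is all the typewriter effect is.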

The headline benefit is perceived latency, not total latency. The full request takes the same amount of time, but a fast first token tells the user the model is working, and abandonment rates drop noticeably. On mobile and in voice interfaces, streaming is close to mandatory.
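The distinction is easy to measure: time-to-first-token (TTFT) is what the user feels, while total latency is unchanged by streaming. A rough sketch, reusing the same assumed SDK and placeholder model as above:

```python
import time
from openai import OpenAI

client = OpenAI()
start = time.perf_counter()
first_token_at = None

stream = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Explain SSE in two sentences."}],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # perceived latency ends here

total = time.perf_counter() - start
print(f"TTFT:  {first_token_at - start:.2f}s")  # what the user actually waits for
print(f"Total: {total:.2f}s")                   # same with or without streaming
```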

In agent systems, streaming graduates from a UX detail into a structural feature. Patterns like intercepting tool-call decisions mid-stream and acting on them before the response completes are common, which is why most modern LLM SDKs treat streaming as a first-class citizen rather than an option.
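As a sketch of that pattern, the snippet below watches the stream for tool-call fragments and reacts as soon as a tool name arrives, before its arguments finish streaming. The SDK, model name, and get_weather tool are all illustrative assumptions:

```python
from openai import OpenAI

client = OpenAI()
stream = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool for illustration
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    stream=True,
)

pending = {}  # tool-call fragments, accumulated by index
for chunk in stream:
    if not chunk.choices:
        continue
    for tc in chunk.choices[0].delta.tool_calls or []:
        slot = pending.setdefault(tc.index, {"name": None, "args": ""})
        if tc.function and tc.function.name:
            slot["name"] = tc.function.name
            # The tool name arrives before its arguments finish streaming,
            # so an agent can start warming or prefetching the tool here.
            print(f"model is calling: {slot['name']}")
        if tc.function and tc.function.arguments:
            slot["args"] += tc.function.arguments  # JSON arrives in fragments
```

Without streaming, the agent could only act after the entire response, arguments included, had been generated.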

