May 2026

Speculative Decoding · Machine Learning & Deep Learning

What Is Speculative Decoding? A Complete Guide to the Lossless 2-3x LLM Inference Acceleration Technique, EAGLE / Medusa / Multi-Token Prediction Variants, and How It Differs from Continuous Batching

Speculative decoding accelerates LLM inference by 2-3x without changing the output distribution. A small draft model proposes several tokens, the large target model verifies them all in a single forward pass, and only the tokens consistent with the target model's distribution are accepted. It is supported in vLLM, TGI, and TensorRT-LLM.
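The propose-then-verify loop above can be sketched in a few lines. This is a toy illustration, not a real implementation: `draft_probs` and `target_probs` are hypothetical stand-ins for the two models (fixed next-token distributions over a three-token vocabulary), and the "single forward pass" is simulated by scoring each position in a loop. What it does show faithfully is the acceptance rule that makes the method lossless: each drafted token is kept with probability min(1, p_target/p_draft), and on rejection a replacement is sampled from the normalized residual max(0, p_target - p_draft).

```python
import random

random.seed(0)
VOCAB = ["a", "b", "c"]

def draft_probs(ctx):
    # Hypothetical small draft model: cheap, slightly off-target distribution.
    return {"a": 0.6, "b": 0.3, "c": 0.1}

def target_probs(ctx):
    # Hypothetical large target model: the distribution we must match exactly.
    return {"a": 0.5, "b": 0.4, "c": 0.1}

def sample(dist):
    r, acc = random.random(), 0.0
    for tok, p in dist.items():
        acc += p
        if r < acc:
            return tok
    return tok  # guard against floating-point shortfall

def speculative_step(ctx, k=4):
    # 1) Draft model autoregressively proposes k tokens.
    proposed, d_ctx = [], list(ctx)
    for _ in range(k):
        t = sample(draft_probs(d_ctx))
        proposed.append(t)
        d_ctx.append(t)
    # 2) Target model scores all k positions (one forward pass in practice);
    #    each proposal is accepted with prob min(1, p_target / p_draft).
    out = list(ctx)
    for t in proposed:
        p, q = target_probs(out)[t], draft_probs(out)[t]
        if random.random() < min(1.0, p / q):
            out.append(t)  # accepted: marginally distributed as the target
        else:
            # Rejected: resample from the normalized residual
            # max(0, p_target - p_draft), then stop this round.
            resid = {v: max(0.0, target_probs(out)[v] - draft_probs(out)[v])
                     for v in VOCAB}
            z = sum(resid.values())
            out.append(sample({v: r / z for v, r in resid.items()}))
            break
    return out

print(speculative_step(["<s>"]))
```

In the best case all k drafted tokens are accepted and the target model effectively emits k tokens for the cost of one verification pass, which is where the 2-3x speedup comes from; a rejection ends the round early but still yields one correctly-distributed token.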