In practice
It is how every GPT-style LLM works: each new token depends on all the previous ones. It explains why generation is inherently sequential and hard to parallelize, and is the reason behind tricks like speculative decoding to speed it up.
It is how every GPT-style LLM works: each new token depends on all the previous ones. It explains why generation is inherently sequential and hard to parallelize, and is the reason behind tricks like speculative decoding to speed it up.