Every AI chatbot shares the same annoyance: you wait while the text "streams" across the screen, word by word. Even GPT-4 and Claude do it.
Groq, a hardware startup founded by ex-Google engineer Jonathan Ross (often described as the father of Google's first TPU), built a chip of its own: the LPU (Language Processing Unit). Running Llama 2 70B, its public demo answers at 500 tokens per second. In practice the whole reply appears almost instantly, faster than you can read it.
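To put "faster than you can read it" in numbers, here is a rough back-of-the-envelope comparison. It assumes (these figures are not from the source) that a reader manages about 250 words per minute and that one English word averages about 1.3 tokens. Only the 500 tokens per second comes from the demo.

```python
# Rough comparison of generation speed vs. human reading speed.
# Assumptions (not from the source): ~250 words/minute reading speed
# and ~1.3 tokens per English word on average.
READING_WORDS_PER_MINUTE = 250
TOKENS_PER_WORD = 1.3
GENERATION_TOKENS_PER_SECOND = 500  # demo figure quoted above


def reading_tokens_per_second(words_per_minute: float, tokens_per_word: float) -> float:
    """Convert a reading speed in words/minute into tokens/second."""
    return words_per_minute * tokens_per_word / 60


reader_tps = reading_tokens_per_second(READING_WORDS_PER_MINUTE, TOKENS_PER_WORD)
speedup = GENERATION_TOKENS_PER_SECOND / reader_tps
print(f"reader: {reader_tps:.1f} tokens/s, generation is {speedup:.0f}x faster")
```

Under these assumptions a reader takes in roughly 5 tokens per second, so 500 tokens per second is close to two orders of magnitude ahead.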
It is not just a demo trick: it changes what you can build. AI agents making 10 chained calls? Suddenly usable. Real-time voice? Possible. Inference speed, until now a hard bottleneck, becomes a variable you can design around.
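Why chained calls benefit so much: each call must finish before the next one starts, so generation time adds up. The sketch below uses hypothetical numbers (10 calls, 300 output tokens per call, and a 30 tokens per second baseline picked only for contrast). It also ignores network latency and prompt processing, so it shows the trend rather than real-world timings.

```python
# Back-of-the-envelope latency for an agent that makes LLM calls in sequence,
# where each call must finish before the next can start.
# Hypothetical numbers: 10 calls, 300 output tokens each, and an assumed
# 30 tokens/s baseline; 500 tokens/s is the demo figure quoted above.


def chain_seconds(calls: int, tokens_per_call: int, tokens_per_second: float) -> float:
    """Total generation time for sequential calls (ignores network and prompt processing)."""
    return calls * tokens_per_call / tokens_per_second


CALLS = 10
TOKENS_PER_CALL = 300

for label, tps in [("baseline (assumed)", 30), ("Groq demo figure", 500)]:
    print(f"{label}: {chain_seconds(CALLS, TOKENS_PER_CALL, tps):.1f} s")
```

With these inputs the chain drops from well over a minute and a half to a few seconds, which is the difference between an agent that feels broken and one that feels interactive.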
Companies
Groq
Tools
Groq LPU, GroqCloud
Tags
Groq, LPU, Inference, Hardware, Llama
Sources