AImpact
Models · Beginner
Also known as: Large Language Model · Modello linguistico di grandi dimensioni

LLM

/el-el-em/

An AI model trained on very large amounts of text to predict the next token (roughly, the next word or word fragment), which lets it generate natural-language responses.
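To make "predict the next word" concrete, here is a toy sketch. It counts which word follows which in a tiny made-up corpus and returns the most frequent follower. This is a simple counting stand-in, not how a real LLM is built (real models use neural networks trained on far more text); the function names and the corpus are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for every word, which words follow it in the text."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Tiny illustrative corpus (an assumption, not real training data).
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)

print(predict_next(model, "the"))  # prints "cat" ("cat" follows "the" twice, "mat" once)
```

An LLM does the same kind of job at a vastly larger scale: given the text so far, it scores possible continuations and picks one, then repeats.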


In practice

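It is the engine behind ChatGPT, Claude, and Gemini. When you embed an LLM into your product, typically through a hosted API, you pay per token, usually for both the text you send and the text the model generates, and you get a service that reads and writes text. Quality depends heavily on the model you choose and the prompt you give it.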
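Because billing is per token, a rough cost estimate is simple arithmetic. The sketch below assumes separate prices for input and output tokens, quoted per million tokens; the function name, the prices, and the request volume are hypothetical placeholders, not any provider's actual rates.

```python
def estimate_cost(input_tokens, output_tokens, price_in_per_million, price_out_per_million):
    """Estimate the cost of one request, in the same currency as the prices."""
    input_cost = input_tokens * price_in_per_million / 1_000_000
    output_cost = output_tokens * price_out_per_million / 1_000_000
    return input_cost + output_cost


# Hypothetical prices per million tokens, for illustration only.
PRICE_IN = 2.0
PRICE_OUT = 6.0

# One request: a 1,000-token prompt and a 500-token answer.
per_request = estimate_cost(1_000, 500, PRICE_IN, PRICE_OUT)

# Assumed volume of 10,000 such requests.
total = per_request * 10_000

print(f"per request: {per_request:.4f}, per 10,000 requests: {total:.2f}")
```

Swap in the published prices of the model you are evaluating and your own traffic estimate; longer prompts and longer answers both raise the bill.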

Seen in the wild

59 entries mention this term (first 20 shown)
  1. Local AI 2025: Ollama, MLX LM, Apple Foundation Models triple the speed
    Medium
  2. Private LLM: models up to 7B directly on iPhone and Mac, fully offline
    Medium
  3. vLLM v0.7: chunked prefill by default and a redesigned V1 engine
    Medium
  4. NVIDIA NIM 1.0: Containerized LLM Inference with OpenAI-Compatible API
    High
  5. WebLLM and LLM in WASM: browser-based LLM inference via WebGPU, no server needed
    Medium
  6. Continuous Batching for LLM Serving: survey and state of the art of Orca, vLLM, SGLang, TGI
    Medium
  7. DeepMind: 60+ cases of Specification Gaming in LLMs documented
    High
  8. FlashInfer 0.2: attention library for LLM serving with paged KV cache and RoPE fusion
    Medium
  9. Prefill/decode disaggregation: separate GPUs for low TTFT and high throughput
    High
  10. KV Cache Quantization FP8/INT8: Double User Density per GPU
    High
  11. AnythingLLM 1.0: the complete local RAG stack for enterprise use
    High
  12. LLM Compressor: unified toolkit for quantization and sparsity with native vLLM integration
    Medium
  13. CyberSecEval 2: Meta's LLM cybersecurity benchmark
    Medium
  14. Dify 0.7: visual agentic workflows with integrated RAG and 10+ LLMs
    Medium
  15. DrEureka: LLM automates simulation-to-real transfer without manual tuning
    Medium
  16. NeMo Guardrails 0.8: NVIDIA's framework for adding safety rails to any LLM
    Medium
  17. Microsoft RoboGen: generating robot tasks, skills and environments from text
    Medium
  18. SGLang: 6.4x LLM throughput with RadixAttention and shared prefix caching
    Medium
  19. Continue.dev: open source IDE extension to connect any LLM to your editor
    Medium
  20. Codestral: Mistral's code model, 22B parameters and 80+ languages
    High