Models · Beginner · Also known as: Transformer architecture

Transformer

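A neural network architecture introduced by Google researchers in 2017 (in the paper "Attention Is All You Need") that uses the attention mechanism to process all the tokens of a text in parallel, rather than one word at a time as earlier recurrent networks did.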
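To make "attention" concrete, here is a minimal sketch of scaled dot-product attention, softmax(Q·K^T / sqrt(d_k))·V, written in plain Python. It assumes toy 2-dimensional embeddings and deliberately skips the learned projection matrices, multiple heads, and masking that a real Transformer layer adds. The helper names (`matmul`, `transpose`, `softmax`, `attention`) are our own illustration, not a library API.

```python
import math

Matrix = list[list[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Naive matrix product; frameworks hand this to optimized GPU kernels.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]

def softmax(row: list[float]) -> list[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))            # every query vs every key
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    weights = [softmax(row) for row in scaled]  # each row sums to 1
    return matmul(weights, v)                   # weighted average of values

# Three tokens with 2-dimensional embeddings (toy numbers).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = attention(x, x, x)  # self-attention with Q = K = V = x
```

Each row of the score matrix compares one token with all the others, and no row depends on another row's result. That independence is what allows the whole sequence to be processed at once instead of word by word.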


In practice

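It is the foundation of virtually every modern LLM. If you build products, you do not need to implement it from scratch: you use frameworks such as PyTorch or call a model provider's API. Knowing that it is parallelizable explains why training needs powerful GPUs: the computation for all tokens in a sequence reduces to large matrix multiplications that can run at the same time, which is exactly the workload GPUs are designed for.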
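The parallelism point can be illustrated with a deliberately tiny sketch, assuming scalar "embeddings" and hand-picked weights; `rnn_states`, `attend`, `w_h` and `w_x` are hypothetical names for illustration only. A recurrent network has to compute its hidden states strictly one after another, while each attention output reads only the inputs, so every position can be computed independently and in any order.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def rnn_states(xs: list[float], w_h: float = 0.5, w_x: float = 1.0) -> list[float]:
    """Toy recurrent network: h_t = tanh(w_h * h_{t-1} + w_x * x_t)."""
    h, states = 0.0, []
    for x in xs:  # step t cannot start until step t-1 has finished
        h = math.tanh(w_h * h + w_x * x)
        states.append(h)
    return states

def attend(i: int, xs: list[float]) -> float:
    """Toy scalar self-attention output for position i (query = key = value = x)."""
    scores = [xs[i] * xj for xj in xs]  # reads only the inputs, never other outputs
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    return sum(w * xj for w, xj in zip(weights, xs)) / sum(weights)

xs = [0.1, -0.4, 0.7, 0.3]

# Positions are independent, so they can be handed to separate workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(attend, range(len(xs)), [xs] * len(xs)))
sequential = [attend(i, xs) for i in range(len(xs))]
```

Python threads are limited by the GIL, so this shows independence rather than a speedup. On a GPU the same independence is exploited by computing all positions as a single matrix multiplication.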

Related terms

Attention · LLM · Foundation model

Seen in the wild

19 entries mentioning it

April 15, 2025
CrossFormer: a single transformer for 20+ robot embodiments with rigorous scaling analysis
High

August 20, 2024
bitsandbytes 0.43: QLoRA and NF4/FP4 quantization for 4-bit fine-tuning
Medium

August 1, 2024
FLUX.1: the new open standard for photorealistic image generation
Landmark

June 5, 2024
FP8 Training with NVIDIA Transformer Engine: Half the Memory, Same Quality
High

March 5, 2024
Stable Diffusion 3: Diffusion Transformer architecture and improved text
High

February 15, 2024
Sora: OpenAI shows cinema-quality AI video
Landmark

July 28, 2023
RT-2: the robot that reasons with a language model
High

July 28, 2023
FlashAttention-2: rewrite with 2x speedup, MQA/GQA support, and head-dim 256
High

December 16, 2022
DeepMind RT-1: the first Transformer trained on real robotics data
High

June 21, 2022
FlashAttention: IO-aware attention that revolutionizes transformer training
Landmark

May 12, 2022
Gato: DeepMind tries a single agent for 600+ tasks
High

March 22, 2022
NVIDIA H100 and Hopper architecture: the foundation-model GPU
Landmark

January 12, 2021
Switch Transformer: Google scales to 1.6T parameters with Mixture of Experts
High

October 22, 2020
Vision Transformer (ViT): "An Image is Worth 16x16 Words"
Landmark

July 22, 2020
Longformer: sliding-window attention for long documents
Medium

July 9, 2020
HuggingFace Transformers 3.0: Rust tokenizers and the Model Hub
High

June 17, 2020
Image GPT: generative pretraining for images
Medium

May 28, 2020
GPT-3: the paper that opens the scaling-laws era
Landmark

January 13, 2020
Reformer: the transformer that handles very long sequences
Medium
