Instruction Tuning
Instruction tuning is a training phase in which an already-pretrained LLM is further optimized on (instruction, expected-response) pairs, where each instruction is a natural-language description of a task. Unlike generic supervised fine-tuning, which typically targets one task, it trains on many tasks phrased as instructions so that the model learns to follow commands it has not seen before. Google's FLAN work (2021) showed that fine-tuning on more than 60 diverse NLP datasets phrased as instructions substantially improves zero-shot performance on unseen tasks. It is the technical foundation of models such as ChatGPT, Vicuna, and Flan-T5.
In practice
You prepare a dataset of thousands of examples in the format 'Instruction: … Response: …', often derived from existing NLP benchmarks reformatted as prompts. The base model is then fine-tuned on this data using a standard next-token cross-entropy objective. A developer adapting an open-weights model (e.g., LLaMA) to a specific domain builds a domain-specific (vertical) instruction dataset and uses frameworks like LLaMA-Factory, Axolotl, or Hugging Face TRL to run instruction tuning, which can take as little as a few hours on a single GPU depending on model and dataset size.
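The data preparation and objective described above can be sketched in plain Python. This is a minimal illustration, not the implementation of any particular framework: the template string, the helper names (`format_example`, `build_labels`, `cross_entropy`), the token ids, and the probabilities are all hypothetical. Masking the prompt positions with `-100`, so that only response tokens contribute to the loss, is a common convention (it matches the default ignore index in PyTorch's cross-entropy loss) and is an assumption here, not something the entry specifies.

```python
import math

# Hypothetical template following the 'Instruction: ... Response: ...' format.
PROMPT_TEMPLATE = "Instruction: {instruction}\nResponse: "


def format_example(instruction, response):
    """Return (prompt, full_text) for one (instruction, response) pair."""
    prompt = PROMPT_TEMPLATE.format(instruction=instruction)
    return prompt, prompt + response


def build_labels(prompt_ids, response_ids, ignore_index=-100):
    """Mask prompt positions so only response tokens are scored (assumed convention)."""
    return [ignore_index] * len(prompt_ids) + list(response_ids)


def cross_entropy(correct_token_probs, labels, ignore_index=-100):
    """Mean negative log-likelihood over unmasked positions.

    correct_token_probs[i] is the probability the model assigned to the
    correct token at position i (made-up numbers in the example below).
    """
    losses = [
        -math.log(p)
        for p, label in zip(correct_token_probs, labels)
        if label != ignore_index
    ]
    return sum(losses) / len(losses)


# A benchmark item (here, a translation pair) rewritten as an instruction.
prompt, text = format_example("Translate 'bonjour' to English.", "hello")

# Made-up token ids standing in for a real tokenizer's output.
prompt_ids = [11, 12, 13]
response_ids = [21, 22]
labels = build_labels(prompt_ids, response_ids)

# Only the last two probabilities (the response positions) affect the loss.
loss = cross_entropy([0.9, 0.8, 0.7, 0.5, 0.25], labels)
```

In a real run, a tokenizer produces the ids and the model produces the probabilities; the training frameworks named above handle templating, masking, and batching according to their own configuration.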
Seen in the wild
5 entries mentioning it
- [Medium] WizardCoder: evolutionary instructions for GPT-4 level code generation
- [High] InstructBLIP: visual instruction tuning on 26 datasets outperforms GPT-4V
- [High] LLaVA: Visual Instruction Tuning opens the multimodal open-source era
- [High] Flan-T5 and Flan-PaLM: instruction tuning scales to 1,800 tasks
- [High] FLAN: instruction tuning that teaches models to follow directions