AImpact
Training · Intermediate · Also known as: Quantized LoRA

QLoRA

/kew-lor-ah/

A variant of LoRA that keeps the frozen base model in 4-bit quantized form during fine-tuning and trains only small low-rank adapter matrices, drastically cutting the GPU memory needed.
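To make the definition concrete, here is a minimal toy sketch in plain Python. It assumes a simplified uniform 4-bit absmax quantizer (the actual QLoRA method uses a dedicated 4-bit data type, NF4), and every function name here is hypothetical, not part of any library. The base weights are stored only as 4-bit codes plus one scale per block, while the adapter matrices A and B stay in full precision and are the only parts that would be trained.

```python
import random

def quantize_4bit(weights, block_size=4):
    """Toy blockwise absmax quantization to integer codes in [-7, 7].

    Assumption: uniform levels for simplicity; real QLoRA uses NF4.
    """
    codes, scales = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        # One scale per block; fall back to 1.0 for an all-zero block.
        scale = (max(abs(w) for w in block) / 7) or 1.0
        scales.append(scale)
        codes.extend(round(w / scale) for w in block)
    return codes, scales

def dequantize_4bit(codes, scales, block_size=4):
    """Recover approximate weights from 4-bit codes and per-block scales."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]

def lora_forward(x, codes, scales, A, B, n_out, n_in, block_size=4):
    """y = dequant(W) x + B (A x). W is frozen; only A and B would be trained."""
    W = dequantize_4bit(codes, scales, block_size)
    base = [sum(W[r * n_in + c] * x[c] for c in range(n_in)) for r in range(n_out)]
    Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]    # project down to rank r
    delta = [sum(b * h for b, h in zip(row, Ax)) for row in B]  # project back up
    return [b + d for b, d in zip(base, delta)]

random.seed(0)
n_out, n_in, rank = 2, 4, 1
W_full = [random.uniform(-1, 1) for _ in range(n_out * n_in)]   # row-major 2x4 matrix
codes, scales = quantize_4bit(W_full)

A = [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(rank)]
B = [[0.0] * rank for _ in range(n_out)]  # LoRA convention: B starts at zero

x = [1.0, 2.0, 3.0, 4.0]
y = lora_forward(x, codes, scales, A, B, n_out, n_in)
print(y)
```

Because B starts at zero, the adapter initially contributes nothing and the output equals that of the quantized base model alone; fine-tuning then moves only A and B.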


In practice

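It lets you fine-tune models of roughly 13B to 33B parameters on a single 24 GB consumer GPU (e.g. RTX 4090), and 65B-70B models on a single 48-80 GB data-center GPU (e.g. an 80 GB A100). It is a popular choice for hobbyist and low-budget enterprise fine-tuning. The quality loss compared with 16-bit fine-tuning is usually small: the original QLoRA paper reports matching 16-bit fine-tuning on its benchmarks.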
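A rough back-of-the-envelope estimate shows why 4-bit storage matters at these model sizes. The sketch below counts only the memory for the base weights; it deliberately ignores quantization constants, adapter weights, optimizer state, and activations, so real usage is higher. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

for n_params in (13e9, 33e9, 70e9):
    fp16 = weight_memory_gb(n_params, 16)
    int4 = weight_memory_gb(n_params, 4)
    print(f"{n_params / 1e9:.0f}B params: 16-bit ~ {fp16:.1f} GB, 4-bit ~ {int4:.1f} GB")
```

Under these assumptions, 4-bit weights take a quarter of the 16-bit footprint, which is what brings a 13B model's weights well within a 24 GB card.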
