AImpact
Medium AI Infrastructure · 1 min read

bitsandbytes 0.43: QLoRA and NF4/FP4 quantization for 4-bit fine-tuning

In one sentence: bitsandbytes 0.43 updates QLoRA support with the NF4 and FP4 data types, optimized dequantization at inference time on A100/H100 GPUs, and improved PEFT integration for efficient 4-bit fine-tuning of LLMs.


Fine-tuning a large language model on your own data normally requires several expensive GPUs. QLoRA changed this: it makes it possible to fine-tune very large models on far more accessible hardware, such as a single 24 GB consumer GPU.
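A back-of-the-envelope calculation shows why a single 24 GB card can be enough once the weights are stored in 4 bits. The sketch below counts the weights only; the 33B parameter count is a hypothetical example, and the storage layout (4 bits per weight plus one 32-bit scale per block of 64 weights, without any further compression of the scales) is an assumption for illustration.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Memory taken by the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 33e9  # hypothetical 33B-parameter model, for illustration only

fp16_gb = weight_memory_gb(N_PARAMS, 16)
# Assumed 4-bit layout: 4 bits per weight plus one 32-bit scale per block of 64 weights.
nf4_gb = weight_memory_gb(N_PARAMS, 4 + 32 / 64)

print(f"16-bit weights: {fp16_gb:.1f} GB")  # 66.0 GB, far above 24 GB
print(f"4-bit weights:  {nf4_gb:.1f} GB")   # 18.6 GB, below 24 GB
```

Activations, the adapters' optimizer state and framework overhead come on top of these figures, so the 4-bit number is a floor rather than a full memory budget.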

The trick combines two ideas: loading the frozen base model in a compressed 4-bit format, which cuts the memory taken by its weights to roughly a quarter of a 16-bit copy, and training only a small set of additional low-rank adapter parameters (LoRA) in higher precision. Memory use drops sharply while most of the quality is preserved.
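The LoRA half of the trick can be illustrated with a minimal, dependency-free sketch: the frozen weight W keeps its original path, and a trainable low-rank product B A is added on top, scaled by alpha / r. The class and function names, the rank and alpha values, and the toy 8 x 8 matrix are hypothetical choices for illustration; this is not the PEFT implementation, which wraps PyTorch modules and, in QLoRA, a 4-bit base layer.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Hypothetical minimal LoRA layer: y = W x + (alpha / r) * B (A x).

    W is frozen. In QLoRA it would be stored in 4 bits and dequantized on the fly;
    here it is a plain float matrix so the sketch stays self-contained.
    Only A and B would receive gradient updates during fine-tuning.
    """

    def __init__(self, weight, r=4, alpha=8, seed=0):
        rng = random.Random(seed)
        d, k = len(weight), len(weight[0])
        self.weight = weight  # frozen base weight, d x k
        self.scaling = alpha / r
        # A is small and random, B is zero: the adapter starts as a no-op.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(k)] for _ in range(r)]  # r x k
        self.B = [[0.0] * r for _ in range(d)]                                  # d x r

    def forward(self, x):
        base = matvec(self.weight, x)               # frozen path
        update = matvec(self.B, matvec(self.A, x))  # low-rank trainable path
        return [b + self.scaling * u for b, u in zip(base, update)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


def lora_param_fraction(d, k, r):
    """Trainable LoRA parameters as a share of a d x k layer's weights."""
    return r * (d + k) / (d * k)


# Toy usage: an 8 x 8 identity matrix stands in for a frozen pretrained weight.
W = [[float(i == j) for j in range(8)] for i in range(8)]
layer = LoRALinear(W, r=2, alpha=4)
x = [float(i) for i in range(8)]
print(layer.forward(x) == matvec(W, x))  # True: B is zero at initialization
print(layer.trainable_params())          # 32 (A: 2 x 8, B: 8 x 2)
print(f"{lora_param_fraction(4096, 4096, 16):.2%}")
```

Because B starts at zero, fine-tuning begins exactly from the pretrained model. For a 4096 x 4096 projection with r = 16, the adapter adds 16 x (4096 + 4096) = 131,072 parameters, or 1/128 (about 0.78%) of the layer's 16,777,216 weights, which is why the trainable part can afford a higher precision than the frozen base.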

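bitsandbytes 0.43 improves this technique with more precise numeric types, notably NF4 (4-bit NormalFloat), whose 16 levels are spaced according to a normal distribution and so fit the roughly normally distributed weights of pretrained models better than an evenly spaced 4-bit grid, and with faster GPU kernels for A100 and H100. Together these make QLoRA a mature and reliable solution for customizing open-source models in production.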
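How NF4 stores a weight can be sketched in plain Python: the 16 levels are quantiles of a standard normal distribution (asymmetric, so that zero is represented exactly), normalized to [-1, 1], and weights are quantized in blocks after dividing each block by its absolute maximum. The offset formula and the block size of 64 follow the QLoRA construction as we understand it, and the resulting levels should closely match the NF4 table in bitsandbytes; all function names are hypothetical, and this is not the bitsandbytes API, which performs these steps in optimized CUDA kernels on packed 4-bit data.

```python
import random
from statistics import NormalDist

# Assumed offset, following the QLoRA construction: 1 - 0.5 * (1/32 + 1/30), about 0.9677.
OFFSET = 1 - 0.5 * (1 / 32 + 1 / 30)


def linspace(start, stop, num):
    """num evenly spaced points from start to stop (inclusive)."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def nf4_levels(offset=OFFSET):
    """16 NormalFloat-style levels in [-1, 1]: normal quantiles, with an exact zero."""
    ppf = NormalDist().inv_cdf  # inverse CDF of the standard normal
    positive = [ppf(p) for p in linspace(offset, 0.5, 9)[:-1]]   # 8 values > 0
    negative = [-ppf(p) for p in linspace(offset, 0.5, 8)[:-1]]  # 7 values < 0
    levels = sorted(negative + [0.0] + positive)
    top = levels[-1]
    return [v / top for v in levels]  # normalize so the extremes are -1 and 1


def quantize_blockwise(weights, levels, block_size=64):
    """Scale each block by its absolute maximum, then keep the nearest level's index."""
    codes, scales = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        absmax = max(abs(w) for w in block) or 1.0  # guard against all-zero blocks
        scales.append(absmax)
        for w in block:
            x = w / absmax
            codes.append(min(range(len(levels)), key=lambda i: abs(levels[i] - x)))
    return codes, scales


def dequantize_blockwise(codes, scales, levels, block_size=64):
    """Map each 4-bit code back to its level and undo the per-block scaling."""
    return [levels[c] * scales[i // block_size] for i, c in enumerate(codes)]


levels = nf4_levels()
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(256)]  # toy, normally distributed weights
codes, scales = quantize_blockwise(weights, levels)
restored = dequantize_blockwise(codes, scales, levels)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print([round(v, 4) for v in levels])
print(f"{len(scales)} blocks, max abs error {max_err:.5f}")
```

Levels are dense near zero, where most weights fall, and sparse in the tails; each code fits in 4 bits, and only one scale per block of 64 weights is stored alongside them.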

Companies

Tim Dettmers, HuggingFace

Tools

bitsandbytes, QLoRA, PEFT, HuggingFace Transformers

Tags

bitsandbytes, QLoRA, Fine-tuning, Quantization, NF4, A100, H100

Sources