WizardCoder: evolutionary instructions for GPT-4 level code generation

In one sentence: The WizardLM team applies Evol-Instruct to code, iteratively rewriting programming problems to increase their complexity. WizardCoder-34B scores 73.2% on HumanEval, slightly above the 67.0% reported for GPT-4 in March 2023.



How do you teach a model to solve hard programming problems when you only have simple ones? The WizardLM team answered this question with a technique called "Evol-Instruct": take a simple problem and automatically rewrite it to make it progressively more complex, adding requirements, constraints, and edge cases.
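The rewriting loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the WizardLM team's implementation: the list `EVOLUTION_DIRECTIONS`, the prompt wording, and the helpers `build_evolution_prompt`, `evolve`, and `fake_rewrite` are all hypothetical names introduced here. In a real pipeline, `rewrite` would call a language model; here a stand-in function is used so the sketch runs on its own.

```python
import random
from typing import Callable, List

# Hypothetical evolution directions, paraphrasing the kinds of changes the
# text mentions (requirements, constraints, edge cases). These are NOT the
# prompts used by the WizardLM team.
EVOLUTION_DIRECTIONS = [
    "Add one new requirement to the problem.",
    "Add a constraint on time or space complexity.",
    "Require correct handling of an additional edge case.",
]


def build_evolution_prompt(instruction: str, direction: str) -> str:
    """Wrap an instruction in a request to make it slightly harder."""
    return (
        "Rewrite the programming problem below to make it a bit more difficult.\n"
        f"Method: {direction}\n\n"
        f"Problem:\n{instruction}\n\n"
        "Rewritten problem:"
    )


def evolve(
    seed: str,
    rewrite: Callable[[str], str],
    rounds: int,
    rng: random.Random,
) -> List[str]:
    """Return [seed, evolved_1, ..., evolved_rounds].

    Each round picks one direction at random and asks `rewrite`
    (assumed to be an LLM call in practice) for a harder version
    of the most recent instruction.
    """
    history = [seed]
    for _ in range(rounds):
        direction = rng.choice(EVOLUTION_DIRECTIONS)
        prompt = build_evolution_prompt(history[-1], direction)
        history.append(rewrite(prompt))
    return history


def fake_rewrite(prompt: str) -> str:
    """Stand-in for an LLM: appends the chosen method to the problem text."""
    method = prompt.split("Method: ", 1)[1].split("\n", 1)[0]
    problem = prompt.split("Problem:\n", 1)[1].rsplit("\n\nRewritten problem:", 1)[0]
    return f"{problem} {method}"


if __name__ == "__main__":
    seed = "Write a function that reverses a string."
    for step, text in enumerate(evolve(seed, fake_rewrite, 3, random.Random(0))):
        print(step, text)
```

Keeping every intermediate version in `history`, rather than only the last one, means the resulting dataset can mix easy and hard variants of the same problem.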

It is like training at the gym by increasing the weight each week instead of always repeating the same exercise. The first WizardCoder applied this idea to a relatively modest base model (StarCoder 15B); a later 34B version, built on Code Llama, solved programming problems at a level comparable to GPT-4's reported results at the time.

The headline result was WizardCoder-34B scoring 73.2% on HumanEval, a widely used benchmark for code generation. This suggested that proprietary data and enormous models are not strictly required: a well-designed strategy for constructing the training dataset can make the difference. The research influenced many subsequent projects that fine-tune models for coding.
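To make the benchmark figure concrete: HumanEval results are conventionally reported as pass@k, the probability that at least one of k sampled solutions passes a problem's unit tests. The sketch below assumes the 73.2% figure is a pass@1 score and uses the standard unbiased estimator introduced with HumanEval, 1 - C(n-c, k) / C(n, k), where n is the number of samples generated for a problem and c the number that pass. The function names `pass_at_k` and `benchmark_score` are illustrative, not taken from any evaluation harness.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed the tests, k: budget.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: any k-subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    return sum(scores) / len(scores)
```

With one sample per problem (n = 1, k = 1), the score reduces to the fraction of problems solved, so 73.2% on HumanEval's 164 problems corresponds to about 120 of them.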