Training Intermediate Also known as: FIM · Infilling · Code Infilling

Fill-In-the-Middle

Fill-In-the-Middle (FIM) is a training objective for code models in which the model must predict a central span of text given the surrounding context: both what precedes it (the prefix) and what follows it (the suffix). Unlike standard left-to-right autoregressive generation, which conditions only on preceding text, FIM lets the model complete partially written functions, docstrings, variable names, or logic blocks in the middle of existing code. The technique splits each training document into three parts and rearranges them into the order [PREFIX][SUFFIX][MIDDLE] (PSM) or [SUFFIX][PREFIX][MIDDLE] (SPM), so that the middle comes last and an ordinary left-to-right model learns to generate it after seeing both sides. StarCoder, DeepSeek-Coder, and Codestral are trained with FIM, and it underpins cursor-aware completion in many modern code completion tools.
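As a rough illustration of the two orderings, the sketch below lays out a document that has already been split into prefix, middle, and suffix. The function name `format_fim` is hypothetical, the sentinel strings follow the StarCoder-style naming, and the SPM placement of sentinels is a simplification; real models differ in both token names and exact layout.

```python
# Sentinel strings are assumptions (StarCoder-style names); other models use different ones.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def format_fim(prefix: str, middle: str, suffix: str, mode: str = "psm") -> str:
    """Rearrange the three parts so that the middle always comes last.

    Hypothetical helper for illustration; not taken from any library.
    """
    if mode == "psm":
        # Prefix, then suffix, then the middle to be predicted.
        return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"
    if mode == "spm":
        # Simplified SPM: suffix first, then prefix, then the middle.
        return f"{FIM_SUFFIX}{suffix}{FIM_PREFIX}{prefix}{FIM_MIDDLE}{middle}"
    raise ValueError(f"unknown mode: {mode!r}")


example = format_fim(
    prefix="def add(a, b):\n    ",
    middle="return a + b",
    suffix="\n\nprint(add(1, 2))\n",
)
```

In both orderings the model still trains left to right; only the arrangement of the text changes, which is why FIM needs no architectural modification.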


In practice

A developer using a completion tool such as GitHub Copilot or Cursor benefits from FIM whenever they write a partial function and ask the model to complete the body: the model sees both the code before the cursor and the code after it. For those training their own code model, the FIM pipeline randomly samples a span to mask in each document of the source code corpus and reformats the tokens with special sentinel tokens such as `<fim_prefix>`, `<fim_suffix>`, and `<fim_middle>` (the exact token names vary by model). A common ratio is 50% FIM and 50% left-to-right during pre-training, so that the model also keeps its standard generative capability.
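The training-side transformation can be sketched in a few lines. This is a minimal character-level sketch under stated assumptions: the function name `to_fim_sample` and the `fim_rate` parameter are hypothetical, the cut points are drawn uniformly at random, and only the PSM ordering is produced. Real pipelines typically operate on token IDs and insert the sentinels as dedicated vocabulary entries.

```python
import random


def to_fim_sample(document: str, rng: random.Random, fim_rate: float = 0.5) -> str:
    """Return the document either unchanged or rearranged into PSM order.

    Hypothetical helper for illustration; assumes the document does not
    already contain the sentinel strings.
    """
    # With probability (1 - fim_rate), keep the plain left-to-right sample.
    if not document or rng.random() >= fim_rate:
        return document

    # Pick two distinct cut points, so the masked middle span is never empty.
    start, end = sorted(rng.sample(range(len(document) + 1), 2))
    prefix, middle, suffix = document[:start], document[start:end], document[end:]

    # PSM layout: the model sees prefix and suffix, then learns to emit the middle.
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>{middle}"


rng = random.Random(0)  # fixed seed so the sketch is reproducible
corpus = ["def add(a, b):\n    return a + b\n"] * 4
samples = [to_fim_sample(doc, rng) for doc in corpus]
```

With `fim_rate=0.5`, roughly half of the samples are rearranged and the rest are left in their original order, matching the mixed ratio described above.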

Related terms

Autoregressive · Fine-tuning · SFT

