A large language model, or LLM, learns by predicting the next token in a sequence. That simple objective, repeated at massive scale, teaches the model grammar, facts, and even some reasoning ability. Training happens in three broad stages: pretraining, supervised fine-tuning, and reinforcement learning from human feedback.
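To make the next-token objective concrete, the sketch below turns one sentence into training pairs, where each prefix is the input and the token that follows is the target. Whitespace splitting stands in for a real tokenizer, and the helper name `next_token_pairs` is hypothetical; both are assumptions made for illustration.

```python
def next_token_pairs(tokens):
    """Return (context, target) pairs: each prefix predicts the token after it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


# Assumption: splitting on whitespace stands in for a real subword tokenizer.
tokens = "the cat sat on the mat".split()
pairs = next_token_pairs(tokens)

for context, target in pairs:
    print(context, "->", target)
```

A sequence of n tokens yields n - 1 such pairs, which is why a single document supplies many training signals at once.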
In pretraining, the model reads a huge corpus of text and code, often trillions of tokens. It adjusts its parameters, the internal weights that determine how input is transformed into output, to reduce the prediction loss. Each step uses backpropagation: gradients flow backward through the network, and an optimizer uses them to nudge every weight a little. The result is a base model that knows a lot about language but follows instructions poorly.
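The loop of "predict, measure the loss, nudge the weights" can be sketched on a toy scale. In the example below, a small table of logits (a bigram model) stands in for the network, so the gradient is written out by hand rather than backpropagated through many layers. The vocabulary, learning rate, and function names are assumptions chosen for illustration, not details from any real training setup.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_step(weights, context_id, target_id, lr):
    """Run one gradient step; return the cross-entropy loss before the update."""
    probs = softmax(weights[context_id])
    loss = -math.log(probs[target_id])
    # For softmax + cross-entropy, d(loss)/d(logit_j) = probs[j] - one_hot[j].
    for j in range(len(probs)):
        grad = probs[j] - (1.0 if j == target_id else 0.0)
        weights[context_id][j] -= lr * grad
    return loss


vocab = ["the", "cat", "sat"]
# weights[i][j] is the logit for "token j follows token i"; start at zero.
weights = [[0.0] * len(vocab) for _ in vocab]
data = [(0, 1), (1, 2)]  # "the" -> "cat", "cat" -> "sat"

losses = []
for epoch in range(50):
    epoch_loss = sum(train_step(weights, c, t, lr=0.5) for c, t in data)
    losses.append(epoch_loss / len(data))

print(f"first epoch loss: {losses[0]:.4f}, last epoch loss: {losses[-1]:.4f}")
```

The average loss starts at the value for a uniform guess over three tokens and falls as the table learns the two transitions. A real model replaces the table with a deep network and lets an automatic differentiation library compute the gradients.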
Supervised fine-tuning, or SFT, addresses that. Engineers collect high-quality question-and-answer pairs and continue training on them. The model learns the format of helpful responses and stops drifting off topic. A smaller, cleaner dataset at this stage often beats a larger, noisy one.
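As a rough sketch of how a question-and-answer pair might become a training example, the code below joins the prompt and response into one sequence and builds a mask so that only response tokens count toward the loss. Masking the prompt is a common choice but is an assumption here, not something the text above specifies; the function names and the toy per-token losses are hypothetical.

```python
def build_sft_example(prompt_tokens, response_tokens):
    """Join prompt and response; mark which positions contribute to the loss."""
    tokens = prompt_tokens + response_tokens
    # 0 = ignore this position (prompt), 1 = train on it (response).
    loss_mask = [0] * len(prompt_tokens) + [1] * len(response_tokens)
    return tokens, loss_mask


def masked_mean(per_token_losses, loss_mask):
    """Average the per-token losses over unmasked positions only."""
    kept = [loss for loss, keep in zip(per_token_losses, loss_mask) if keep]
    if not kept:
        raise ValueError("example has no response tokens to train on")
    return sum(kept) / len(kept)


tokens, loss_mask = build_sft_example(["Q:", "2+2?"], ["A:", "4"])
# Made-up per-token losses, only to show the effect of the mask.
example_loss = masked_mean([5.0, 5.0, 1.0, 3.0], loss_mask)
print(tokens, loss_mask, example_loss)
```

With this mask the model is not penalized for failing to predict the question, only for how well it reproduces the answer.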
Finally, techniques such as reinforcement learning from human feedback (RLHF) have people rank several candidate answers and reward the model for producing the ones they prefer. This shapes tone, safety, and reasoning style. After these three stages, the model is ready to ship. Understanding the pipeline helps explain why good data, careful evaluation, and clear goals matter more than just throwing more compute at the problem.
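One way the preference step is often turned into a training signal is a pairwise loss on a reward score: the loss is small when the preferred answer scores higher than the rejected one. The sketch below uses the logistic (Bradley-Terry style) form, which is an assumption on our part since the text does not name a specific loss; the reward values are made up.

```python
import math


def preference_loss(reward_chosen, reward_rejected):
    """Compute -log(sigmoid(r_chosen - r_rejected)) in a numerically stable way."""
    diff = reward_chosen - reward_rejected
    # -log(sigmoid(d)) equals log(1 + exp(-d)); branch to avoid overflow.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# Hypothetical reward scores for a human-preferred and a rejected answer.
agrees = preference_loss(2.0, 0.0)     # scores match the human ranking
disagrees = preference_loss(0.0, 2.0)  # scores contradict the ranking
print(f"agrees: {agrees:.4f}, disagrees: {disagrees:.4f}")
```

Minimizing this loss pushes the scorer to rate preferred answers above rejected ones; the resulting scores can then serve as the reward during the reinforcement learning step.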