All you need to know about code LLMs: Datasets, foundation models, fine-tuning, reasoning, and agents

Feb. 2025


In the following Google Doc, we categorize and summarize recent papers covering the whole pipeline of code LLMs. It starts with pre-training and fine-tuning datasets and foundation models, then moves on to newly emerging fine-tuning techniques and agents. The fine-tuning part includes the recent topic of test-phase reasoning. The agent part covers the top-ranked open-source coding and software engineering (SE) agents across topics such as patching and debugging.

Overview of code LLMs

The linked document is open for comments. We welcome new contributors: feel free to leave your name, and we will acknowledge your contribution.