All you need to know about Code LLM: Datasets, foundation models, fine-tuning, reasoning, and agents

Feb. 2025


In the following Google Doc, we categorize and summarize recent papers covering the whole pipeline of code LLMs: from pre-training and fine-tuning datasets and foundation models to newly emerging fine-tuning techniques and agents. The fine-tuning part includes the recent topic of test-phase reasoning, and the agent part covers the top-ranked open-source and classical code agents for patching, especially on SWE-bench.

Overview of code LLMs




@article{guo2024code, 
  title   = {All you need to know about Code LLM: Datasets, foundation models, fine-tuning, reasoning, and agents},
  author  = {Guo, Wenbo and Li, Hongwei and Tang, Yuheng},
  journal = {henrygwb.github.io},
  year    = {2024},
  url     = {https://henrygwb.github.io/posts/code_llm.htm}
}