Feb. 2025
In the following Google Doc, we categorize and summarize recent papers spanning the whole pipeline of code LLMs, from pre-training and fine-tuning datasets and foundation models to newly emerging fine-tuning techniques and agents. The fine-tuning part covers the latest topic of test-time reasoning, and the agent part covers the top-ranked open-source and classical code agents for program patching, especially on SWE-bench.
@article{guo2024code,
  title   = {All you need to know about Code LLM: Datasets, foundation models, fine-tuning, reasoning, and agents},
  author  = {Guo, Wenbo and Li, Hongwei and Tang, Yuheng},
  journal = {henrygwb.github.io},
  year    = {2024},
  url     = {https://henrygwb.github.io/posts/code_llm.htm}
}