📚 Selected Publications
* indicates equal contribution. (Updated in Sep 2025)

Diffusion for text

DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation (preprint)
Shansan Gong, Ruixiang Zhang, Huangjie Zheng, Jiatao Gu, Navdeep Jaitly, Lingpeng Kong, Yizhe Zhang
DiffuCoder | We introduce DiffuCoder (7B), show that a higher sampling temperature diversifies not only token choices but also their generation order, and propose coupled-GRPO, a diffusion-native RL method that avoids semi-AR decoding and improves performance.

Scaling Diffusion Language Models via Adaptation from Autoregressive Models (ICLR 2025)
Shansan Gong*, Shivam Agarwal*, Yizhe Zhang, Jiacheng Ye, Lin Zheng, Mukai Li, Chenxin An, Peilin Zhao, Wei Bi, Jiawei Han, Hao Peng, Lingpeng Kong
DiffuLLaMA | We convert AR models ranging from 127M to 7B parameters (GPT-2 and LLaMA) into the diffusion models DiffuGPT and DiffuLLaMA.

Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning (ICLR 2025)
Jiacheng Ye, Jiahui Gao, Shansan Gong, Lin Zheng, Xin Jiang, Zhenguo Li, Lingpeng Kong
Code | We demonstrate how discrete diffusion models effectively learn difficult subgoals that elude autoregressive models.

Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models (NeurIPS 2024)
Jiacheng Ye*, Shansan Gong*, Liheng Chen*, Lin Zheng, Jiahui Gao, Han Shi, Chuan Wu, Zhenguo Li, Wei Bi, Lingpeng Kong
DoT | DoT allows reasoning steps to diffuse over time through the diffusion process, in contrast to left-to-right autoregressive chain-of-thought.

DiffuSeq-v2: Bridging Discrete and Continuous Text Spaces for Accelerated Seq2Seq Diffusion Models (EMNLP 2023 Findings)
Shansan Gong, Mukai Li, Jiangtao Feng, Zhiyong Wu, Lingpeng Kong
Code | An accelerated version of DiffuSeq in which discrete noise bridges the training and sampling stages, reducing the time cost of both.

DiffuSeq: Sequence to Sequence Text Generation With Diffusion Models (ICLR 2023)
Shansan Gong, Mukai Li, Jiangtao Feng, Zhiyong Wu, Lingpeng Kong
DiffuSeq | Poster | DiffuSeq is a powerful model for text generation, matching or even surpassing competitive autoregressive (AR), iterative non-autoregressive (NAR), and pretrained language model (PLM) baselines in quality and diversity.

Long context language models

GIRAFFE: Design Choices for Extending the Context Length of Visual Language Models (ACL 2025)
Mukai Li, Lei Li, Shansan Gong, Qi Liu
GIRAFFE | We explore design choices for extending the context window of existing VLMs.

L-Eval: Instituting Standardized Evaluation for Long Context Language Models (ACL 2024 Outstanding Paper)
Chenxin An, Shansan Gong, Ming Zhong, Mukai Li, Jun Zhang, Lingpeng Kong, Xipeng Qiu
L-Eval | A manually checked benchmark for long-context language models, comprising 20 sub-tasks.

Training-Free Long-Context Scaling of Large Language Models (ICML 2024)
Chenxin An, Fei Huang, Jun Zhang, Shansan Gong, Xipeng Qiu, Chang Zhou, Lingpeng Kong
ChunkLlama | A training-free method that extends Llama 2/3 70B to a 100k context length.

In-Context Learning with Many Demonstration Examples
Mukai Li, Shansan Gong, Jiangtao Feng, Yiheng Xu, Jun Zhang, Zhiyong Wu, Lingpeng Kong
EVALM | A pre-trained language model with efficient attention and an 8k context length.

Before LLMs

Positive, Negative and Neutral: Modeling Implicit Feedback in Session-based News Recommendation (SIGIR 2022)
Shansan Gong, Kenny Q. Zhu
TCAR | Slides | By leveraging different kinds of implicit feedback, we alleviate the trade-off between precision and diversity.