📚 Selected Publications
* indicates equal contribution. (Updated in Sep 2025)

Diffusion for text

DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation (preprint)
Shansan Gong, Ruixiang Zhang, Huangjie Zheng, Jiatao Gu, Navdeep Jaitly, Lingpeng Kong, Yizhe Zhang
DiffuCoder | We introduce DiffuCoder (7B), show that a higher sampling temperature diversifies not only token choices but also their generation order, and propose coupled-GRPO, a diffusion-native RL method that avoids semi-AR decoding and improves performance.

Scaling Diffusion Language Models via Adaptation from Autoregressive Models (ICLR 2025)
Shansan Gong*, Shivam Agarwal*, Yizhe Zhang, Jiacheng Ye, Lin Zheng, Mukai Li, Chenxin An, Peilin Zhao, Wei Bi, Jiawei Han, Hao Peng, Lingpeng Kong
DiffuLLaMA | We convert AR models ranging from 127M to 7B parameters (GPT-2 and LLaMA) into the diffusion models DiffuGPT and DiffuLLaMA.

Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning (ICLR 2025)
Jiacheng Ye, Jiahui Gao, Shansan Gong, Lin Zheng, Xin Jiang, Zhenguo Li, Lingpeng Kong
Code | We demonstrate how discrete diffusion models effectively learn difficult subgoals that elude autoregressive models.

Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models (NeurIPS 2024)
Jiacheng Ye*, Shansan Gong*, Liheng Chen*, Lin Zheng, Jiahui Gao, Han Shi, Chuan Wu, Zhenguo Li, Wei Bi, Lingpeng Kong
DoT | DoT allows reasoning steps to diffuse over time through the diffusion process, in contrast to left-to-right autoregressive chain-of-thought.

DiffuSeq-v2: Bridging Discrete and Continuous Text Spaces for Accelerated Seq2Seq Diffusion Models (EMNLP 2023 Findings)
Shansan Gong, Mukai Li, Jiangtao Feng, Zhiyong Wu, Lingpeng Kong
Code | An accelerated version of DiffuSeq in which discrete noise bridges the training and sampling stages, reducing the time cost of both.

DiffuSeq: Sequence to Sequence Text Generation With Diffusion Models (ICLR 2023)
Shansan Gong, Mukai Li, Jiangtao Feng, Zhiyong Wu, Lingpeng Kong
DiffuSeq | Poster | DiffuSeq is a powerful model for text generation, matching or even surpassing competitive autoregressive (AR), iterative non-autoregressive (NAR), and pretrained language model (PLM) baselines in quality and diversity.

Long context language models

GIRAFFE: Design Choices for Extending the Context Length of Visual Language Models (ACL 2025)
Mukai Li, Lei Li, Shansan Gong, Qi Liu
GIRAFFE | We explore design choices for extending the context window of existing VLMs.

L-Eval: Instituting Standardized Evaluation for Long Context Language Models (ACL 2024 Outstanding Paper)
Chenxin An, Shansan Gong, Ming Zhong, Mukai Li, Jun Zhang, Lingpeng Kong, Xipeng Qiu
L-Eval | A manually checked benchmark for long-context language models, comprising 20 sub-tasks.

Training-Free Long-Context Scaling of Large Language Models (ICML 2024)
Chenxin An, Fei Huang, Jun Zhang, Shansan Gong, Xipeng Qiu, Chang Zhou, Lingpeng Kong
ChunkLlama | A training-free method that extends Llama 2/3 70B to a 100k context length.

In-Context Learning with Many Demonstration Examples
Mukai Li, Shansan Gong, Jiangtao Feng, Yiheng Xu, Jun Zhang, Zhiyong Wu, Lingpeng Kong
EVALM | A pre-trained language model with efficient attention and an 8k context length.

Before LLMs

Positive, Negative and Neutral: Modeling Implicit Feedback in Session-based News Recommendation (SIGIR 2022)
Shansan Gong, Kenny Q. Zhu
TCAR | Slides | By leveraging different kinds of implicit feedback, we alleviate the trade-off between precision and diversity.