I am a PhD candidate at HKU, supervised by Lingpeng Kong. My current research interests including diffusion language models and long context language models. I’m trying to explore different kinds of generation paradigms for better controllability and reasoning capacity.

Previouly, I work at Shark-NLP Shanghai AI Lab as a NLP researcher. I graduated from Shanghai Jiao Tong University (SJTU), supervised by Kenny Zhu. I used to work at pose estimation, face recognition, hierarchical text classification and recommendation systems.

➡️ Download my Resumé (update in Nov 2024)

“I can only show you the door, you’re the one that has to walk through it” – Morpheus (The Matrix)

📚 Publications

* indicates equal contribution. (Update in Feb 2025)

Diffusion for text

ICLR 2025

Scaling Diffusion Language Models via Adaptation from Autoregressive Models (ICLR 2025)

Shansan Gong*, Shivam Agarwal*, Yizhe Zhang, Jiacheng Ye, Lin Zheng, Mukai Li, Chenxin An, Peilin Zhao, Wei Bi, Jiawei Han, Hao Peng, Lingpeng Kong

DiffuLLaMA | We convert AR models ranging from 127M to 7B parameters (GPT2 and LLaMA) into diffusion models DiffuGPT and DiffuLLaMA.

Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning (ICLR 2025)

Jiacheng Ye, Jiahui Gao, Shansan Gong, Lin Zheng, Xin Jiang, Zhenguo Li, Lingpeng Kong

Code | We demonstrate how discrete diffusion models effectively learn difficult subgoals that elude autoregressive models.

Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models (NeurIPS 2024)

Jiacheng Ye*, Shansan Gong*, Liheng Chen*, Lin Zheng, Jiahui Gao, Han Shi, Chuan Wu, Zhenguo Li, Wei Bi, Lingpeng Kong

DoT | DoT allows the reasoning steps to diffuse over time through the diffusion process.

EMNLP 2023 Findings

DiffuSeq-v2: Bridging Discrete and Continuous Text Spaces for Accelerated Seq2Seq Diffusion Models

Shansan Gong, Mukai Li, Jiangtao Feng, Zhiyong Wu, Lingpeng Kong

Code| Accelerated version of DiffuSeq, where the discrete noise bridges the training and sampling stages, saving time consumption of these two stages.

ICLR 2023

DiffuSeq: Sequence to Sequence Text Generation With Diffusion Models

Shansan Gong, Mukai Li, Jiangtao Feng, Zhiyong Wu, Lingpeng Kong

DiffuSeq | Poster | DiffuSeq is a powerful model for text generation, matching or even surpassing competitive AR, iterative NAR, and PLMs on quality and diversity.

Long context language models

GIRAFFE: Design Choices for Extending the Context Length of Visual Language Models (preprint)

Mukai Li, Lei Li, Shansan Gong, Qi Liu

GIRAFFE | Explore design choices to extend the context window of existing VLMs.

Why Does the Effective Context Length of LLMs Fall Short? (ICLR 2025)

Chenxin An, Jun Zhang, Ming Zhong, Lei Li, Shansan Gong, Yao Luo, Jingjing Xu, Lingpeng Kong

STRING | A training-free method after analyzing the effective context length of LLMs.

L-Eval: Instituting Standardized Evaluation for Long Context Language Models (ACL 2024 Outstanding)

Chenxin An, Shansan Gong, Ming Zhong, Mukai Li, Jun Zhang, Lingpeng Kong, Xipeng Qiu

L-Eval | A manually checked benchmark for long context language models with 20 sub-tasks.

Training-Free Long-Context Scaling of Large Language Models (ICML 2024)

Chenxin An, Fei Huang, Jun Zhang, Shansan Gong, Xipeng Qiu, Chang Zhou, Lingpeng Kong

ChunkLlama | A training-free method to extend Llama 2/3-70B to 100k context length.

In-Context Learning with Many Demonstration Examples

Mukai Li, Shansan Gong, Jiangtao Feng, Yiheng Xu, Jun Zhang, Zhiyong Wu, Lingpeng Kong

EVALM | The pre-trained language model with efficient attention and 8k context length.

LLMs

BBA: Bi-Modal Behavioral Alignment for Reasoning with Large Vision-Language Models (ACL 2024 Findings)

Xueliang Zhao, Xinting Huang, Tingchen Fu, Qintong Li, Shansan Gong, Lemao Liu, Wei Bi, Lingpeng Kong

BBA is designed to maximize the potential of DSL in augmenting complex multi-modal reasoning tasks.

Before LLMs

Transferable and Efficient: Unifying Dynamic Multi-Domain Product Categorization (ACL 2023 Industry)

Shansan Gong*, Zelin Zhou*, Shuo Wang, Fengjiao Chen, Xiujie Song, Xuezhi Cao, Yunsen Xian, Kenny Zhu

Data | Poster | A new framework to unify the categorization process as well as leverage knowledge from different domains.

SIGIR 2022

Positive, Negative and Neutral: Modeling Implicit Feedback in Session-based News Recommendation

Shansan Gong, Kenny Q. Zhu

TCAR | Slides| By leveraging different kinds of implicit feedback, we alleviate the trade-off between the precision and diversity.

🎊 Honors and Awards

2024 Tencent Rhino-bird Research Elite Program
2022 SIGIR Student Travel Award
2022 Outstanding Graduate in Shanghai Municipality
2019 Outstanding Undergraduate in SJTU

💬 Invited Talks

2023.06, DiffuSeq, Youth PhD Talk-ICLR 2023 by AI Time. | [Slides]
2023.05, Incorporate Diffusion Models into Conditional Text Generation, Global Lunch Seminar at SJTU CS department. | [Slides]

📖 Educations

2019.06 - 2022.03, Master, Computer Science, SEIEE, Shanghai Jiao Tong University.
2015.09 - 2019.06, Undergraduate, Information Engineering, SEIEE, Shanghai Jiao Tong University.

💻 Internship

2023.11 - 2024.10, Research Intern, Diffusion Text Generation, Tencent AI Lab , Shenzhen.
2021.12 - 2022.03, RE Intern, Product Categorization, Meituan , Shanghai.
2021.06 - 2021.10, SDE Intern, Bing Search Optimization, Microsoft STCA , Beijing.
2019.12 - 2022.03, CTO, iWenBooks APP Development, Yousheng Tech Inc , Shanghai.