Xin Gu

Computer Science and Technology, Chinese Academy of Sciences
Beijing, China

Email: guxin21@mails.ucas.ac.cn

Biography

I am a fourth-year Ph.D. student at the Chinese Academy of Sciences, advised by Prof. Tiejian Luo. Before that, I received my Bachelor's degree from the University of Electronic Science and Technology of China in June 2021. My research focuses on multimodal video understanding, particularly the following topics: Video Captioning, Spatio-Temporal Video Grounding, Long-term Video Understanding, and Image Editing.

News

`01/2025`	Two papers are accepted by ICLR 2025!
`02/2024`	One paper is accepted by CVPR 2024!
`11/2023`	One paper is accepted by IJCV!
`06/2023`	One paper is accepted by ICCV 2023!
`02/2023`	One paper is accepted by CVPR 2023!
`06/2022`	Champion of the CVPR'22 LOVEU competition!
`11/2021`	Joined ByteDance as a research intern.

Education

	Chinese Academy of Sciences `[2021/09 - Present]` \| Ph.D. in Computer Science and Technology, GPA: 3.5 / 4.0.
	University of Electronic Science and Technology of China `[2017/09 - 2021/06]` \| B.Sc., School of Information and Software Engineering, GPA: 3.9 / 4.0, Rank: 6 / 128.

Research Experience

Publications and Preprints

First Author / Co-First Author

OmniSTVG: Toward Spatio-Temporal Omni-Object Video Grounding
Jiali Yao*, Xinran Deng*, Xin Gu*, Mengrui Dai, Bing Fan, Zhipeng Zhang, Yan Huang, Heng Fan, Libo Zhang
[Paper (preprint)] [Code]

Knowing Your Target: Target-Aware Transformer Makes Better Spatio-Temporal Video Grounding
Xin Gu, Yaojie Shen, Chenxi Luo, Tiejian Luo, Yan Huang, Yuewei Lin, Heng Fan, Libo Zhang
[Paper (ICLR 2025)] [Code] [Huggingface Checkpoint] (Oral)

Multi-Reward as Condition for Instruction-based Image Editing
Xin Gu, Ming Li, Libo Zhang, Fan Chen, Longyin Wen, Tiejian Luo, Sijie Zhu
[Paper (ICLR 2025)] [Code] [Huggingface Checkpoint]

Edit3k: Universal Representation Learning for Video Editing Components
Xin Gu, Libo Zhang, Fan Chen, Longyin Wen, Yufei Wang, Tiejian Luo, Sijie Zhu
[Paper (preprint)] [Code] [Huggingface Dataset]

Context-Guided Spatio-Temporal Video Grounding
Xin Gu, Heng Fan, Yan Huang, Tiejian Luo, Libo Zhang
[Paper (CVPR 2024)] [Code] [Huggingface Checkpoint]

Local Compressed Video Stream Learning for Generic Event Boundary Detection
Libo Zhang*, Xin Gu*, Congcong Li, Tiejian Luo, Heng Fan
[Paper (IJCV)] [Code]

Accurate and Fast Compressed Video Captioning
Yaojie Shen*, Xin Gu*, Kai Xu, Heng Fan, Longyin Wen, Libo Zhang
[Paper (ICCV 2023)] [Code]

Text with Knowledge Graph Augmented Transformer for Video Captioning
Xin Gu, Guang Chen, Yufei Wang, Libo Zhang, Tiejian Luo, Longyin Wen
[Paper (CVPR 2023)] [Code]

Dual-Stream Transformer for Generic Event Boundary Captioning
Xin Gu, Hanhua Ye, Guang Chen, Yufei Wang, Libo Zhang, Longyin Wen
1st on LOVEU@CVPR'22: Generic Event Boundary Captioning
[Technical Report (CVPR 2022 Workshops)]

Awards

`12/2024`	Chu Lee Yuet Wah Scholarship
`04/2024`	President Award, University of Chinese Academy of Sciences
`06/2021`	Outstanding Graduate, University of Electronic Science and Technology of China
`08/2020`	Third Prize, National WeChat Mini Program Competition
`05/2020`	Honorable Mention, Mathematical Contest in Modeling
`11/2019`	National Encouragement Scholarship
`10/2019`	Excellent Student Scholarship, University of Electronic Science and Technology of China