|
|
Xin Gu
Computer Science and Technology, Chinese Academy of Sciences
Beijing, China
Email: guxin21@mails.ucas.ac.cn
[Google Scholar]
|
Biography
I am a fourth-year Ph.D. student at the Chinese Academy of Sciences, advised by Prof. Tiejian Luo.
Before that, I received my Bachelor's degree from the University of Electronic Science and Technology of China in June 2021.
My research focuses on multimodal video understanding, particularly on the following topics: Video Captioning, Spatio-Temporal Video Grounding, Long-term Video Understanding and Image Editing.
News
01/2025
|
Two papers are accepted by ICLR 2025!
|
02/2024
|
One paper is accepted by CVPR 2024!
|
11/2023
|
One paper is accepted by IJCV!
|
06/2023
|
One paper is accepted by ICCV 2023!
|
02/2023
|
One paper is accepted by CVPR 2023!
|
06/2022
|
Champion of the CVPR'22 LOVEU competition!
|
11/2021
|
Joined ByteDance as a research intern.
|
Education
|
[2021/09 - Present] | Ph.D. in Computer Science and Technology
GPA: 3.5 / 4.0.
|
|
[2017/09 - 2021/06] | B.Sc. in School of Information and Software Engineering
GPA: 3.9 / 4.0, Rank: 6 / 128.
|
Research Experience
|
[2021/11 - Present] | Computer Vision Algorithm Intern
|
|
[2020/10 - 2024/12] | Cooperative Student
|
Publications and Preprints
First Author / First Co-Author
|
OmniSTVG: Toward Spatio-Temporal Omni-Object Video Grounding
Jiali Yao*,
Xinran Deng*,
Xin Gu*,
Mengrui Dai,
Bing Fan,
Zhipeng Zhang,
Yan Huang,
Heng Fan,
Libo Zhang
[Paper (preprint)]
[Code]
|
|
Knowing Your Target: Target-Aware Transformer Makes Better Spatio-Temporal Video Grounding
Xin Gu,
Yaojie Shen,
Chenxi Luo,
Tiejian Luo,
Yan Huang,
Yuewei Lin,
Heng Fan,
Libo Zhang
[Paper (ICLR 2025)]
[Code]
[Huggingface Checkpoint] (Oral)
|
|
Multi-Reward as Condition for Instruction-based Image Editing
Xin Gu,
Ming Li,
Libo Zhang,
Fan Chen,
Longyin Wen,
Tiejian Luo,
Sijie Zhu
[Paper (ICLR 2025)]
[Code]
[Huggingface Checkpoint]
|
|
Edit3K: Universal Representation Learning for Video Editing Components
Xin Gu,
Libo Zhang,
Fan Chen,
Longyin Wen,
Yufei Wang,
Tiejian Luo,
Sijie Zhu
[Paper (preprint)]
[Code]
[Huggingface Dataset]
|
|
Context-Guided Spatio-Temporal Video Grounding
Xin Gu,
Heng Fan,
Yan Huang,
Tiejian Luo,
Libo Zhang
[Paper (CVPR 2024)]
[Code]
[Huggingface Checkpoint]
|
|
Local Compressed Video Stream Learning for Generic Event Boundary Detection
Libo Zhang*,
Xin Gu*,
Congcong Li,
Tiejian Luo,
Heng Fan
[Paper (IJCV)]
[Code]
|
|
Accurate and Fast Compressed Video Captioning
Yaojie Shen*,
Xin Gu*,
Kai Xu,
Heng Fan,
Longyin Wen,
Libo Zhang
[Paper (ICCV 2023)]
[Code]
|
|
Text with Knowledge Graph Augmented Transformer for Video Captioning
Xin Gu,
Guang Chen,
Yufei Wang,
Libo Zhang,
Tiejian Luo,
Longyin Wen
[Paper (CVPR 2023)]
[Code]
|
|
Dual-Stream Transformer for Generic Event Boundary Captioning
Xin Gu,
Hanhua Ye,
Guang Chen,
Yufei Wang,
Libo Zhang,
Longyin Wen
1st on LOVEU@CVPR'22: Generic Event Boundary Captioning
[Technical Report (CVPR 2022 Workshops)]
|
Awards
12/2024
|
Chu Lee Yuet Wah Scholarship
|
04/2024
|
President Award, University of Chinese Academy of Sciences
|
06/2021
|
Outstanding Graduate, University of Electronic Science and Technology of China
|
08/2020
|
Third Prize, National WeChat Mini Program Competition
|
05/2020
|
Honorable Mention, Mathematical Contest in Modeling
|
11/2019
|
National Encouragement Scholarship
|
10/2019
|
Excellent Student Scholarship, University of Electronic Science and Technology of China
|
|