Hi, this is Yukang Chen (陈玉康)'s website!
I am a Research Scientist at NVIDIA, working with Prof. Song Han.
I received my Ph.D. from CUHK, supervised by Prof. Jiaya Jia.
During my Ph.D. studies, I worked closely with Prof. Xiaojuan Qi and Dr. Xiangyu Zhang.
I focus on Long AI: efficiently scaling AI to long horizons.
This direction covers, but is not limited to, the following topics:
- Long-context LLMs: Efficient long-context LLMs via sparse attention.
- Long-video VLMs: Scaling VLMs to long videos via sequence parallelism.
- Long-sequence Reasoning: Long-sequence RL for LLMs/VLMs via sequence parallelism.
- Long-video Generation: Short→Long autoregressive generation with efficient fine-tuning via sparse attention.
- Long-range Autonomous Driving: Long-range 3D perception in autonomous driving via sparse convolution.
If you are interested in Long AI and would like to collaborate, please feel free to contact me by email.
News
- 2025.09: Long-RL is accepted by NeurIPS'25!
- 2025.01: LongVILA is accepted by ICLR'25!
- 2024.09: RL-GPT is accepted by NeurIPS'24 as an Oral!
- 2024.02: LISA is accepted by CVPR'24 as an Oral!
- 2024.01: LongLoRA is accepted by ICLR'24 as an Oral!
- 2023.04: 3D-Box-Segment-Anything is released, combining VoxelNeXt and SAM.
- 2023.04: VoxelNeXt is accepted by CVPR'23!
- 2022.03: Focal Sparse Conv is accepted by CVPR'22 as an Oral!
- 2022.03: Scale-aware AutoAug is accepted by T-PAMI!
Invited Talks and Reports
- 2025.10: Invited Talk at the ICCV 2025 HiGen Workshop (see link).
- 2025.10: LongLive was reported by 新智元 (see link).
- 2025.07: Long-RL was reported by 机器之心 (see link).
- 2023.10: LongLoRA was reported by 新智元 (see link).
- 2023.08: LISA was reported by 量子位 (see link).
- 2023.06: Invited Talk at the CVPR 2023 ScanNet Workshop (see link).
- 2023.06: Invited Talk at the VALSE 2023 Perception Workshop for VoxelNeXt.
- 2023.04: Invited Talk at, and report by, 将门创投 for VoxelNeXt (see link).
- 2022.06: Invited Talk at 深蓝学院 for Focal Sparse Conv.
Representative Publications (Full List)

QeRL: Quantization-enhanced Reinforcement Learning for LLMs
Wei Huang, Yi Ge, Shuai Yang, Yicheng Xiao, Huizi Mao, Yujun Lin, Hanrong Ye, Sifei Liu, Ka Chun Cheung, Hongxu Yin, Yao Lu, Xiaojuan Qi, Song Han, Yukang Chen
- Memory Saving - RL training of 33B LLMs on a single H100 GPU.
- Training Speedup - 1.7x end-to-end training speedup.
- High Performance - Comparable accuracy to full training.

LongLive: Real-time Interactive Long Video Generation
Shuai Yang, Wei Huang, Ruihang Chu, Yicheng Xiao, Yuyang Zhao, Xianbang Wang, Muyang Li, Enze Xie, Yingcong Chen, Yao Lu, Song Han, Yukang Chen
- Real-time Inference - 20.7 FPS generation on a single H100 GPU.
- Long Video Gen - Up to 240-second generation with interactive prompts.
- Efficient Fine-tuning - Extends Wan to minute-long generation in 32 H100 GPU-days.

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Yukang Chen, Shengju Qian, Haotian Tang, Xin Lai, Zhijian Liu, Song Han, Jiaya Jia
- Efficient Fine-tuning - 100k context on a single 8x A100 node, 1.8x speedup.
- Easy Implementation - Shifted sparse attention, compatible with Flash-Attn.
- LongAlpaca - The first open-source long instruction-following dataset.

Long-RL: Scaling RL to Long Sequences
Yukang Chen, Wei Huang, Baifeng Shi, Qinghao Hu, Hanrong Ye, Ligeng Zhu, Zhijian Liu, Pavlo Molchanov, Jan Kautz, Xiaojuan Qi, Sifei Liu, Hongxu Yin, Yao Lu, Song Han
- MR-SP System - RL on hour-long videos (3,600 frames), up to 2.1x speedup.
- LongVILA-R1-7B - 8,192 frames/video and 71.1% on VideoMME with subtitles.
- LongVideo-Reason Dataset - 104K long-video QA-reasoning pairs.

LongVILA: Scaling Long-Context Visual Language Models for Long Videos
Yukang Chen, Fuzhao Xue, Dacheng Li, Qinghao Hu, Ligeng Zhu, Xiuyu Li, Yunhao Fang, Haotian Tang, Shang Yang, Zhijian Liu, Ethan He, Hongxu Yin, Pavlo Molchanov, Jan Kautz, Linxi Fan, Yuke Zhu, Yao Lu, Song Han
- MM-SP System - 2M-token context training on 256 GPUs, 1.4x faster than Megatron.
- LongVILA-7B - 99.8% on 6,000-frame (>1M tokens) needle-in-a-haystack.
- LongVILA-SFT Dataset - 54K high-quality long video QA pairs.

VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking
Yukang Chen, Jianhui Liu, Xiangyu Zhang, Xiaojuan Qi, Jiaya Jia
- Long-range Perception - 50m → 200m with minimal latency overhead.
- Compatible with Tracking - 1st on the nuScenes LiDAR Tracking leaderboard (2022).
- VoxelNeXt x Segment Anything - 3D-Box-Segment-Anything

Focal Sparse Convolutional Networks for 3D Object Detection
Yukang Chen, Yanwei Li, Xiangyu Zhang, Jian Sun, Jiaya Jia
- Learnable Conv Shape - Deformable kernels guided by cubic importance maps.
- Multi-modal Extension - Fuse important sparse features with RGB features.
Academic Services
- Area Chair for AAAI 2026.
- Journal Reviewer: T-PAMI and TIP.
- Conference Reviewer: NeurIPS, ICLR, ICML, CVPR, ICCV, ECCV, and AAAI.
Honors and Awards
- 2025 World's Top 2% Scientists.
- 2023 Winner of the ScanNet Indoor Scene Understanding Challenge (CVPR 2023 ScanNet Workshop).
- 2023 Finalist for the ByteDance Scholarship.
- 2022 1st place on the nuScenes LiDAR Multi-Object Tracking leaderboard.
- 2019 Winner of the COCO Detection Challenge (ICCV 2019 COCO Workshop).