Hi, this is Yukang Chen (陈玉康)'s website!
I am a Research Scientist at NVIDIA, working with Prof. Song Han.
I received my Ph.D. from CUHK, supervised by Prof. Jiaya Jia.
During my Ph.D. studies, I worked closely with Prof. Xiaojuan Qi and Dr. Xiangyu Zhang.
I focus on Long AI - Efficiently scaling AI to long horizons.
This direction covers, but is not limited to, the following topics:
- Long-context LLMs: Efficient long-context LLMs via sparse attention.
- Long-video VLMs: Scaling VLMs to long videos via sequence parallelism.
- Long-sequence Reasoning: Long-sequence RL for LLMs/VLMs via sequence parallelism.
- Long-video Generation: Short-to-long AR with efficient fine-tuning via sparse attention.
- Long-range Autonomous Driving: Long-range 3D perception in AD via sparse convolution.
If you are interested in Long AI and seeking collaboration, please feel free to contact me via Email.
News
- 2026.04: TriAttention is accepted by ICML'26!
- 2026.01: LongLive and QeRL are accepted by ICLR'26!
- 2025.09: Long-RL is accepted by NeurIPS'25!
- 2025.01: LongVILA is accepted by ICLR'25!
- 2024.09: RL-GPT is accepted by NeurIPS'24 as Oral!
- 2024.02: LISA is accepted by CVPR'24 as Oral!
- 2024.01: LongLoRA is accepted by ICLR'24 as Oral!
- 2023.04: 3D-Box-Segment-Anything is released, a combination of VoxelNeXt and SAM.
- 2023.04: VoxelNeXt is accepted by CVPR'23!
- 2022.03: Focal Sparse Conv is accepted by CVPR'22 as Oral!
- 2022.03: Scale-aware AutoAug is accepted by T-PAMI!
Invited Talks and Reports
- 2026.05: TriAttention was reported by 新智元 (see link).
- 2025.10: Invited talk at the ICCV 2025 HiGen Workshop (see link).
- 2025.10: LongLive was reported by 新智元 (see link).
- 2025.07: Long-RL was reported by 机器之心 (see link).
- 2023.10: LongLoRA was reported by 新智元 (see link).
- 2023.08: LISA was reported by 量子位 (see link).
- 2023.06: Invited talk at the CVPR 2023 ScanNet Workshop (see link).
- 2023.06: Invited talk at the VALSE 2023 Perception Workshop on VoxelNeXt.
- 2023.04: Invited talk at, and report by, 将门创投 on VoxelNeXt (see link).
- 2022.06: Invited talk at 深蓝学院 on Focal Sparse Conv.
Representative Publications (Full List)

LongLive 2.0: An NVFP4 Parallel Infrastructure for Long Video Generation
Yukang Chen, Luozhou Wang, Wei Huang, Shuai Yang, Bohan Zhang, Yicheng Xiao, Ruihang Chu, Weian Mao, Qixin Hu, Shaoteng Liu, Yuyang Zhao, Huizi Mao, Ying-Cong Chen, Enze Xie, Xiaojuan Qi, Song Han
- The first open-source FP4 Infra for Long Video Gen.
- Real-time Inference - 45.7 FPS on a 5B model.
- Supports real-video training, few-step distillation, multi-shot generation, sequence parallelism, NVFP4 KV cache, and async VAE decoding.

LongLive: Real-time Interactive Long Video Generation
Shuai Yang, Wei Huang, Ruihang Chu, Yicheng Xiao, Yuyang Zhao, Xianbang Wang, Muyang Li, Enze Xie, Yingcong Chen, Yao Lu, Song Han, Yukang Chen
- Real-time Inference - 20.7 FPS generation on a single H100 GPU.
- Long Video Gen - Up to 240-second generation with interactive prompts.
- Efficient Fine-tuning - Extends Wan to minute-long videos in 32 H100 GPU-days.

TriAttention: Efficient Long Reasoning with Trigonometric KV Compression
Weian Mao, Xi Lin, Wei Huang, Yuxin Xie, Tianfu Fu, Bohan Zhuang, Song Han, Yukang Chen
- High Efficiency - 2.5x higher FPS and 10.7x KV memory reduction in LLMs.
- OpenClaw - 32B LLM on a 24GB GPU.
- Long Video Gen - Reduces the KV cache by 50% in AR long video generation.

Long-RL: Scaling RL to Long Sequences
Yukang Chen, Wei Huang, Baifeng Shi, Qinghao Hu, Hanrong Ye, Ligeng Zhu, Zhijian Liu, Pavlo Molchanov, Jan Kautz, Xiaojuan Qi, Sifei Liu, Hongxu Yin, Yao Lu, Song Han
- MR-SP System - RL on hour-long videos (3,600 frames), up to 2.1x speedup.
- LongVILA-R1-7B - 8,192 frames/video and 71.1% on VideoMME (with subtitles).
- LongVideo-Reason Dataset - 104K long-video QA-reasoning pairs.

LongVILA: Scaling Long-Context Visual Language Models for Long Videos
Yukang Chen, Fuzhao Xue, Dacheng Li, Qinghao Hu, Ligeng Zhu, Xiuyu Li, Yunhao Fang, Haotian Tang, Shang Yang, Zhijian Liu, Ethan He, Hongxu Yin, Pavlo Molchanov, Jan Kautz, Linxi Fan, Yuke Zhu, Yao Lu, Song Han
- MM-SP System - 2M-token training on 256 GPUs, 1.4x faster than Megatron.
- LongVILA-7B - 99.8% on 6,000-frame (>1M tokens) needle-in-a-haystack.
- LongVILA-SFT Dataset - 54K high-quality long video QA pairs.

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Yukang Chen, Shengju Qian, Haotian Tang, Xin Lai, Zhijian Liu, Song Han, Jiaya Jia
- Efficient Fine-tuning - 100k context on a single 8x A100 machine, 1.8x speedup.
- Easy Implementation - Shifted sparse attention, compatible with Flash-Attn.
- LongAlpaca - The first open-source long instruction-following dataset.
Academic Services
- Area Chair for AAAI 2026.
- Journal Reviewer: T-PAMI and TIP.
- Conference Reviewer: NeurIPS, ICLR, ICML, CVPR, ICCV, ECCV, and AAAI.
Honors and Awards
- 2025 World's Top 2% Scientists.
- 2023 Finalist for the ByteDance Scholarship.
- 2022 1st place on the nuScenes LiDAR Multi-Object Tracking leaderboard.
- 2019 Winner of COCO Detection Challenge (ICCV 2019 COCO Workshop).
- 2023 Winner of ScanNet Indoor Scene Understanding (CVPR 2023 ScanNet Workshop).