Hi, this is Yukang Chen (ι™ˆηŽ‰εΊ·)’s website!
I am a Research Scientist at NVIDIA, working with Prof. Song Han.
I received my Ph.D. from CUHK, supervised by Prof. Jiaya Jia.
During my Ph.D. studies, I worked closely with Prof. Xiaojuan Qi and Dr. Xiangyu Zhang.

I focus on Long AI: efficiently scaling AI to long horizons.
This direction covers, but is not limited to, the following topics:

  • πŸ“š Long-context LLMs: Efficient long-context LLMs via sparse attention.
  • πŸŽ₯ Long-video VLMs: Scaling VLMs to long videos via sequence parallelism.
  • 🧠 Long-sequence Reasoning: Long-sequence RL for LLMs/VLMs via sequence parallelism.
  • 🎬 Long-video Generation: Short-to-long autoregressive generation with efficient fine-tuning via sparse attention.
  • πŸš— Long-range Autonomous Driving: Long-range 3D perception in autonomous driving via sparse convolution.

If you are interested in Long AI and would like to collaborate, please feel free to contact me via email.


πŸ”₯ News

  • 2025.09: Β πŸŽ‰πŸŽ‰ Long-RL is accepted by Neurips’25!
  • 2025.01: Β πŸŽ‰πŸŽ‰ LongVILA is accepted by ICLR’25!
  • 2024.09: Β πŸŽ‰πŸŽ‰ RL-GPT is accepted by Neurips’24 as Oral!
  • 2024.02: Β πŸŽ‰πŸŽ‰ LISA is accepted by CVPR’24 as Oral!
  • 2024.01: Β πŸŽ‰πŸŽ‰ LongLoRA is accepted by ICLR’24 as Oral!
  • 2023.04: Β πŸŽ‰πŸŽ‰ 3D-Box-Segment-Anything is released, a combination of VoxelNeXt and SAM.
  • 2023.04: Β πŸŽ‰πŸŽ‰ VoxelNeXt is accepted by CVPR’23!
  • 2022.03: Β πŸŽ‰πŸŽ‰ Focal Sparse Conv is accepted by CVPR’22 as Oral!
  • 2022.03: Β πŸŽ‰πŸŽ‰ Scale-aware AutoAug is accepted by T-PAMI!

πŸ’¬ Invited Talks and Reports

  • 2025.10: Invited Talk at the ICCV 2025 HiGen Workshop (see link).
  • 2025.10: LongLive was reported by ζ–°ζ™Ίε…ƒ (see link).
  • 2025.07: Long-RL was reported by ζœΊε™¨δΉ‹εΏƒ (see link).
  • 2023.10: LongLoRA was reported by ζ–°ζ™Ίε…ƒ (see link).
  • 2023.08: LISA was reported by 量子位 (see link).
  • 2023.06: Invited Talk at the CVPR 2023 ScanNet Workshop (see link).
  • 2023.06: Invited Talk at the VALSE 2023 Perception Workshop on VoxelNeXt.
  • 2023.04: Invited Talk at ε°†ι—¨εˆ›ζŠ•, which also reported on VoxelNeXt (see link).
  • 2022.06: Invited Talk at 深蓝学陒 on Focal Sparse Conv.

πŸ“ Representative Publications (Full List)

arXiv 2025

QeRL: Quantization-enhanced Reinforcement Learning for LLMs

[Paper] [Code] [Abstract]

Wei Huang, Yi Ge, Shuai Yang, Yicheng Xiao, Huizi Mao, Yujun Lin, Hanrong Ye, Sifei Liu, Ka Chun Cheung, Hongxu Yin, Yao Lu, Xiaojuan Qi, Song Han, Yukang Chen

  • Memory Saving - RL training for 33B LLMs on a single H100 GPU.
  • Training Speedup - 1.7x end-to-end training speedup.
  • High Performance - Accuracy comparable to full-precision training.
arXiv 2025

LongLive: Real-time Interactive Long Video Generation

[Paper] [Code] [Abstract]

Shuai Yang, Wei Huang, Ruihang Chu, Yicheng Xiao, Yuyang Zhao, Xianbang Wang, Muyang Li, Enze Xie, Yingcong Chen, Yao Lu, Song Han, Yukang Chen

  • Real-time Inference - 20.7 FPS generation on a single H100 GPU.
  • Long Video Gen - Up to 240-second videos with interactive prompts.
  • Efficient Fine-tuning - Extends Wan to minute-long generation in 32 H100 GPU-days.
ICLR 2024 Oral

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

[Paper] [Code] [Abstract]

Yukang Chen, Shengju Qian, Haotian Tang, Xin Lai, Zhijian Liu, Song Han, Jiaya Jia

  • Efficient Fine-tuning - 100k context on a single 8x A100 machine, with 1.8x speedup.
  • Easy Implementation - Shifted sparse attention, compatible with FlashAttention.
  • LongAlpaca - The first open-source long instruction-following dataset.
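For illustration, the grouped/shifted-attention idea can be sketched in a few lines. This is a toy single-head NumPy simulation under my own function names, not the paper's multi-head, causal, FlashAttention-based implementation: half of the computation attends within contiguous token groups, the other half within groups shifted by half a group, and the two are merged.

```python
import numpy as np

def softmax(x):
    # numerically stable row-wise softmax
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def group_attention(q, k, v, group):
    """Attention restricted to contiguous groups of `group` tokens."""
    s, d = q.shape
    out = np.empty_like(v)
    for start in range(0, s, group):
        sl = slice(start, start + group)
        out[sl] = softmax(q[sl] @ k[sl].T / np.sqrt(d)) @ v[sl]
    return out

def shifted_sparse_attention(q, k, v, group):
    """Toy shifted sparse attention: merge plain-group attention with
    attention over groups shifted by half a group (wrap via np.roll)."""
    shift = group // 2
    out_plain = group_attention(q, k, v, group)
    q2, k2, v2 = (np.roll(x, -shift, axis=0) for x in (q, k, v))
    out_shift = np.roll(group_attention(q2, k2, v2, group), shift, axis=0)
    return 0.5 * (out_plain + out_shift)
```

The shift lets information flow between neighboring groups, which is why the sparse pattern can approximate full attention during fine-tuning; with `group` equal to the sequence length, `group_attention` reduces to standard full attention.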
NeurIPS 2025

Long-RL: Scaling RL to Long Sequences

[Paper] [Code] [Abstract]

Yukang Chen, Wei Huang, Baifeng Shi, Qinghao Hu, Hanrong Ye, Ligeng Zhu, Zhijian Liu, Pavlo Molchanov, Jan Kautz, Xiaojuan Qi, Sifei Liu, Hongxu Yin, Yao Lu, Song Han

  • MR-SP System - RL on hour-long videos (3,600 frames) with up to 2.1x speedup.
  • LongVILA-R1-7B - Handles 8,192 frames per video; 71.1% on VideoMME with subtitles.
  • LongVideo-Reason Dataset - 104K long-video QA-reasoning pairs.
ICLR 2025

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

[Paper] [Code] [Abstract]

Yukang Chen, Fuzhao Xue, Dacheng Li, Qinghao Hu, Ligeng Zhu, Xiuyu Li, Yunhao Fang, Haotian Tang, Shang Yang, Zhijian Liu, Ethan He, Hongxu Yin, Pavlo Molchanov, Jan Kautz, Linxi Fan, Yuke Zhu, Yao Lu, Song Han

  • MM-SP System - 2M-token training on 256 GPUs, 1.4x faster than Megatron.
  • LongVILA-7B - 99.8% on 6,000-frame (>1M-token) needle-in-a-haystack retrieval.
  • LongVILA-SFT Dataset - 54K high-quality long-video QA pairs.
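The core partitioning idea behind sequence parallelism can be shown in a minimal single-process NumPy simulation (my own sketch, not the MM-SP system, which additionally handles multi-modal inputs, load balancing, and real distributed communication): each "rank" holds one shard of the query sequence and attends over the all-gathered keys and values, here simulated by concatenation.

```python
import numpy as np

def softmax(x):
    # numerically stable row-wise softmax
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def shard_sequence(x, world_size):
    """Split a (seq, dim) array into equal per-rank shards."""
    assert x.shape[0] % world_size == 0
    return np.split(x, world_size)

def sequence_parallel_attention(q_shards, k_shards, v_shards):
    """Each 'rank' keeps its Q shard and attends over the gathered K/V.
    Concatenation stands in for an all-gather across ranks."""
    k = np.concatenate(k_shards)
    v = np.concatenate(v_shards)
    d = k.shape[1]
    return np.concatenate(
        [softmax(q @ k.T / np.sqrt(d)) @ v for q in q_shards]
    )
```

Because softmax is applied row-wise, sharding the queries changes nothing mathematically: concatenating the per-shard outputs reproduces full attention exactly, while activation memory per rank shrinks with the number of shards.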
CVPR 2023

VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking

[Paper] [Code] [Abstract]

Yukang Chen, Jianhui Liu, Xiangyu Zhang, Xiaojuan Qi, Jiaya Jia

  • Long-range Perception - 50m β†’ 200m with minimal latency overhead.
  • Compatible with Tracking - 1st on the nuScenes LiDAR tracking leaderboard (2022).
  • VoxelNeXt x Segment Anything - 3D-Box-Segment-Anything.
CVPR 2022 Oral

Focal Sparse Convolutional Networks for 3D Object Detection

[Paper] [Code] [Abstract]

Yukang Chen, Yanwei Li, Xiangyu Zhang, Jian Sun, Jiaya Jia

  • Learnable Conv Shape - Deformable kernel shapes guided by cubic importance maps.
  • Multi-modal Extension - Fuses important sparse features with RGB features.

πŸ“‹ Academic Services

  • Area Chair for AAAI 2026.
  • Journal Reviewer: T-PAMI and TIP.
  • Conference Reviewer: NeurIPS, ICLR, ICML, CVPR, ICCV, ECCV, and AAAI.

πŸŽ– Honors and Awards

  • 2025 World's Top 2% Scientists.
  • 2023 Winner of the ScanNet Indoor Scene Understanding Challenge (CVPR 2023 ScanNet Workshop).
  • 2023 Finalist for the ByteDance Scholarship.
  • 2022 1st on the nuScenes LiDAR Multi-Object Tracking leaderboard.
  • 2019 Winner of the COCO Detection Challenge (ICCV 2019 COCO Workshop).