Hi, this is Yukang Chen (陈玉康)'s website!
I am a Research Scientist at NVIDIA, working with Prof. Song Han.
I received my Ph.D. from CUHK, supervised by Prof. Jiaya Jia.
During my Ph.D. studies, I worked closely with Prof. Xiaojuan Qi and Dr. Xiangyu Zhang.
I focus on Long AI: efficiently scaling AI to long horizons.
This direction covers, but is not limited to, the following topics:
- Long-context LLMs: Efficient long-context LLMs via sparse attention.
- Long-video VLMs: Scaling VLMs to long videos via sequence parallelism.
- Long-sequence Reasoning: Long-sequence RL for LLMs/VLMs via sequence parallelism.
- Long-video Generation: Short→Long autoregressive generation with efficient fine-tuning via sparse attention.
- Long-range Autonomous Driving: Long-range 3D perception in autonomous driving via sparse convolution.
If you are interested in Long AI and would like to collaborate, please feel free to contact me by email.
News
- 2025.09: Long-RL is accepted by NeurIPS'25!
- 2025.01: LongVILA is accepted by ICLR'25!
- 2024.09: RL-GPT is accepted by NeurIPS'24 as an Oral!
- 2024.02: LISA is accepted by CVPR'24 as an Oral!
- 2024.01: LongLoRA is accepted by ICLR'24 as an Oral!
- 2023.04: 3D-Box-Segment-Anything is released, combining VoxelNeXt and SAM.
- 2023.04: VoxelNeXt is accepted by CVPR'23!
- 2022.03: Focal Sparse Conv is accepted by CVPR'22 as an Oral!
- 2022.03: Scale-aware AutoAug is accepted by T-PAMI!
Invited Talks and Reports
- 2025.10: Invited Talk at the ICCV 2025 HiGen Workshop (see link).
- 2025.10: LongLive was reported by 新智元 (see link).
- 2025.07: Long-RL was reported by 机器之心 (see link).
- 2023.10: LongLoRA was reported by 新智元 (see link).
- 2023.08: LISA was reported by 量子位 (see link).
- 2023.06: Invited Talk at the CVPR 2023 ScanNet Workshop (see link).
- 2023.06: Invited Talk at the VALSE 2023 Perception Workshop for VoxelNeXt.
- 2023.04: Invited Talk at, and report by, 将门创投 for VoxelNeXt (see link).
- 2022.06: Invited Talk at 深蓝学院 for Focal Sparse Conv.
Representative Publications (Full List)

QeRL: Quantization-enhanced Reinforcement Learning for LLMs
Wei Huang, Yi Ge, Shuai Yang, Yicheng Xiao, Huizi Mao, Yujun Lin, Hanrong Ye, Sifei Liu, Ka Chun Cheung, Hongxu Yin, Yao Lu, Xiaojuan Qi, Song Han, Yukang Chen
- Memory Saving - RL training of 33B LLMs on a single H100 GPU.
- Training Speedup - 1.7x end-to-end training speedup.
- High Performance - Comparable accuracy to full training.

LongLive: Real-time Interactive Long Video Generation
Shuai Yang, Wei Huang, Ruihang Chu, Yicheng Xiao, Yuyang Zhao, Xianbang Wang, Muyang Li, Enze Xie, Yingcong Chen, Yao Lu, Song Han, Yukang Chen
- Real-time Inference - 20.7 FPS generation on a single H100 GPU.
- Long Video Gen - Up to 240-second generation with interactive prompts.
- Efficient Fine-tuning - Extends Wan to minute-long generation in 32 H100 GPU-days.

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Yukang Chen, Shengju Qian, Haotian Tang, Xin Lai, Zhijian Liu, Song Han, Jiaya Jia
- Efficient Fine-tuning - 100k context on a single 8x A100 node, 1.8x speedup.
- Easy Implementation - Shifted sparse attention, compatible with Flash-Attn.
- LongAlpaca - The first open-source long instruction-following dataset.

Long-RL: Scaling RL to Long Sequences
Yukang Chen, Wei Huang, Baifeng Shi, Qinghao Hu, Hanrong Ye, Ligeng Zhu, Zhijian Liu, Pavlo Molchanov, Jan Kautz, Xiaojuan Qi, Sifei Liu, Hongxu Yin, Yao Lu, Song Han
- MR-SP System - RL on hour-long videos (3,600 frames), up to 2.1x speedup.
- LongVILA-R1-7B - 8,192 frames/video and 71.1% on VideoMME with subtitles.
- LongVideo-Reason Dataset - 104K long-video QA-reasoning pairs.

LongVILA: Scaling Long-Context Visual Language Models for Long Videos
Yukang Chen, Fuzhao Xue, Dacheng Li, Qinghao Hu, Ligeng Zhu, Xiuyu Li, Yunhao Fang, Haotian Tang, Shang Yang, Zhijian Liu, Ethan He, Hongxu Yin, Pavlo Molchanov, Jan Kautz, Linxi Fan, Yuke Zhu, Yao Lu, Song Han
- MM-SP System - 2M-token context training on 256 GPUs, 1.4x faster than Megatron.
- LongVILA-7B - 99.8% on 6,000-frame (>1M tokens) needle-in-a-haystack.
- LongVILA-SFT Dataset - 54K high-quality long video QA pairs.

VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking
Yukang Chen, Jianhui Liu, Xiangyu Zhang, Xiaojuan Qi, Jiaya Jia
- Long-range Perception - 50m → 200m with minimal latency overhead.
- Compatible with Tracking - 1st on the nuScenes LiDAR Tracking leaderboard (2022).
- VoxelNeXt x Segment Anything - 3D-Box-Segment-Anything

Focal Sparse Convolutional Networks for 3D Object Detection
Yukang Chen, Yanwei Li, Xiangyu Zhang, Jian Sun, Jiaya Jia
- Learnable Conv Shape - Deformable kernels guided by cubic importance maps.
- Multi-modal Extension - Fuse important sparse features with RGB features.
Academic Services
- Area Chair for AAAI 2026.
- Journal Reviewer: T-PAMI and TIP.
- Conference Reviewer: NeurIPS, ICLR, ICML, CVPR, ICCV, ECCV, and AAAI.
Honors and Awards
- 2025 World's Top 2% Scientists.
- 2023 Winner of the ScanNet Indoor Scene Understanding Challenge (CVPR 2023 ScanNet Workshop).
- 2023 Finalist for the ByteDance Scholarship.
- 2022 1st place on the nuScenes LiDAR Multi-Object Tracking leaderboard.
- 2019 Winner of the COCO Detection Challenge (ICCV 2019 COCO Workshop).