NVIDIA Research

Yukang Chen 陈玉康

Research Scientist | Long AI Systems

Email · Google Scholar · GitHub · Homepage

I am a Research Scientist at NVIDIA Research, working with Prof. Song Han. I received my Ph.D. in Computer Science from CUHK.

🔬 Research Focus

My research focuses on Long AI Systems through algorithm-system co-design: co-designing model algorithms, data/training recipes, distributed training systems, memory-efficient inference, and low-precision deployment to scale AI to long horizons efficiently.

My work spans long-video generation systems, inference acceleration systems for long reasoning, long-video reinforcement learning systems, long-video understanding training systems, and long-context large language models.
Recent systems include LongLive-2.0 for FP4 long-video generation infrastructure, TriAttention for long-reasoning inference acceleration across vLLM/SGLang/TensorRT/OpenClaw, Long-RL/MR-SP for hour-level long-video RL, and LongVILA/MM-SP for 2M-token VLM training.
If you are interested in Long AI Systems or in collaborating, please feel free to contact me by email.

✍️ Blogs

Research Blog

Pushing Intelligence to 4-bit

How 4-bit quantization becomes practical across training and inference, in LLMs, KV cache, FP4 attention, and video generation.

KV Cache Compression and Its Infra Problems

An infra view of KV compression: why attention-score methods collide with FlashAttention and paged attention, and how TriAttention resolves both.

Scaling Video Training with Parallelism

A practical view of sequence parallelism for long-video training, covering MM-SP, Balanced SP, and objective-aware temporal sharding.

Why Video Gen Is an Infra Problem

A systems-oriented perspective on long video generation infrastructure, efficiency, and deployment.

🚀 Representative Systems & Algorithms

Long-video Generation System

LongLive-2.0 / LongLive

FP4/NVFP4 long-video generation infrastructure with Balanced SP, teacher-forcing layout co-design, W4A4 inference, KV cache compression, parallel dequantization, and asynchronous streaming VAE decoding.

Long-reasoning Inference Acceleration System

TriAttention

Training-free KV cache compression for long reasoning, integrated with vLLM, SGLang, a TensorRT deployment path, LongLive KV-compressed video generation, and OpenClaw custom-provider deployment.

Long-video Reinforcement Learning System

Long-RL / MR-SP

A full-stack long-video RL system combining LongVideo-Reason, CoT-SFT/RL, sequence parallelism, vLLM-based rollout/prefill, and cached video embeddings for hour-level video reasoning.

Long-video Understanding Training System

LongVILA / MM-SP

Algorithm-system co-design for long-video VLMs, enabling 2M-token context training on 256 GPUs without gradient checkpointing through Multi-Modal Sequence Parallelism.

Long-context Large Language Model

LongLoRA

Efficient long-context fine-tuning via shifted sparse attention and improved LoRA, extending Llama2-7B to 100k context and Llama2-70B to 32k context on a single 8x A100 machine.

Long-range Autonomous Driving Perception

VoxelNeXt

Fully sparse VoxelNet for 3D object detection and tracking; extends perception range by 4x without inference overhead and ranked 1st on nuScenes LiDAR 3D detection and tracking leaderboards.

🎓 Background

NVIDIA Research | Research Scientist, Efficient AI / Long AI Systems, Sep 2024 - Present

The Chinese University of Hong Kong | Ph.D., Computer Science, Aug 2020 - Jul 2024

🔥 News

2026.04 | Paper | TriAttention is accepted by ICML'26!
2026.01 | Paper | LongLive and QeRL are accepted by ICLR'26!
2025.09 | Paper | Long-RL is accepted by NeurIPS'25!
2025.01 | Paper | LongVILA is accepted by ICLR'25!
2024.09 | Oral | RL-GPT is accepted by NeurIPS'24 as Oral!
2024.02 | Oral | LISA is accepted by CVPR'24 as Oral!
2024.01 | Oral | LongLoRA is accepted by ICLR'24 as Oral!
2023.04 | Release | 3D-Box-Segment-Anything is released, combining VoxelNeXt and SAM.
2023.04 | Paper | VoxelNeXt is accepted by CVPR'23!
2022.03 | Oral | Focal Sparse Conv is accepted by CVPR'22 as Oral!
2022.03 | Journal | Scale-aware AutoAug is accepted by T-PAMI!

📝 Representative Publications

Full List

Arxiv 2026

LongLive 2.0: An NVFP4 Parallel Infrastructure for Long Video Generation

[Paper] [Code] [Demo] [Abstract]

Yukang Chen, Luozhou Wang, Wei Huang, Shuai Yang, Bohan Zhang, Yicheng Xiao, Ruihang Chu, Weian Mao, Qixin Hu, Shaoteng Liu, Yuyang Zhao, Huizi Mao, Ying-Cong Chen, Enze Xie, Xiaojuan Qi, Song Han

The first open-source FP4 infrastructure for long video generation.
Real-time Inference - 45.7 FPS on a 5B model.
Supports real-video training, few-step distillation, multi-shot generation, sequence parallelism, NVFP4 KV cache, and asynchronous VAE decoding.

ICLR 2026

LongLive: Real-time Interactive Long Video Generation

[Paper] [Code] [Demo] [Abstract]

Shuai Yang, Wei Huang, Ruihang Chu, Yicheng Xiao, Yuyang Zhao, Xianbang Wang, Muyang Li, Enze Xie, Yingcong Chen, Yao Lu, Song Han, Yukang Chen

Real-time Inference - 20.7 FPS generation on a single H100 GPU.
Long Video Gen - Up to 240-second generation with interactive prompts.
Efficient Fine-tuning - Extends Wan to minute-long generation in 32 H100 GPU-days.

ICML 2026

TriAttention: Efficient Long Reasoning with Trigonometric KV Compression

[Paper] [Code] [Demo] [Abstract]

Weian Mao, Xi Lin, Wei Huang, Yuxin Xie, Tianfu Fu, Bohan Zhuang, Song Han, Yukang Chen

High Efficiency - 2.5x higher FPS and 10.7x KV memory reduction in LLMs.
OpenClaw - 32B LLM on a 24GB GPU.
Long Video Gen - Reduces the KV cache by 50% in autoregressive long video generation.

NeurIPS 2025

Long-RL: Scaling RL to Long Sequences

[Paper] [Code] [Demo] [Abstract]

Yukang Chen, Wei Huang, Baifeng Shi, Qinghao Hu, Hanrong Ye, Ligeng Zhu, Zhijian Liu, Pavlo Molchanov, Jan Kautz, Xiaojuan Qi, Sifei Liu, Hongxu Yin, Yao Lu, Song Han

MR-SP System - RL on hour-long videos (3,600 frames), up to 2.1x speedup.
LongVILA-R1-7B - 8,192 frames per video and 71.1% on VideoMME with subtitles.
LongVideo-Reason Dataset - 104K long-video QA-reasoning pairs.

ICLR 2025

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

[Paper] [Code] [Abstract]

Yukang Chen, Fuzhao Xue, Dacheng Li, Qinghao Hu, Ligeng Zhu, Xiuyu Li, Yunhao Fang, Haotian Tang, Shang Yang, Zhijian Liu, Ethan He, Hongxu Yin, Pavlo Molchanov, Jan Kautz, Linxi Fan, Yuke Zhu, Yao Lu, Song Han

MM-SP System - 2M-token training on 256 GPUs, 1.4x faster than Megatron.
LongVILA-7B - 99.8% on 6,000-frame (>1M tokens) needle-in-a-haystack.
LongVILA-SFT Dataset - 54K high-quality long video QA pairs.

ICLR 2024 Oral

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

[Paper] [Code] [Abstract]

Yukang Chen, Shengju Qian, Haotian Tang, Xin Lai, Zhijian Liu, Song Han, Jiaya Jia

Efficient Fine-tuning - 100k context on a single 8x A100 machine, 1.8x speedup.
Easy Implementation - Shifted sparse attention, compatible with Flash-Attn.
LongAlpaca - The first open-source long instruction-following dataset.

💬 Invited Talks and Reports

2026.07 | Talk | Invited Talk at ICML 2026 F2S Workshop.
2026.07 | Talk | Invited Talk at ACL 2026 SELVA Workshop.
2026.05 | Report | TriAttention was reported by 新智元 (see link).
2025.10 | Talk | Invited Talk at ICCV 2025 HiGen Workshop.
2025.10 | Report | LongLive was reported by 新智元 (see link).
2025.07 | Report | Long-RL was reported by 机器之心 (see link).
2023.10 | Report | LongLoRA was reported by 新智元 (see link).
2023.08 | Report | LISA was reported by 量子位 (see link).
2023.06 | Talk | Invited Talk at CVPR 2023 ScanNet Workshop.
2023.06 | Talk | Invited Talk at VALSE 2023 Perception Workshop for VoxelNeXt.
2023.04 | Talk | Invited Talk and report by 将门创投 for VoxelNeXt (see link).
2022.06 | Talk | Invited Talk by 深蓝学院 for Focal Sparse Conv.

📋 Academic Services

Area Chair | AAAI 2026

Journal Reviewer | T-PAMI and TIP

Conference Reviewer | NeurIPS, ICLR, ICML, CVPR, ICCV, ECCV, and AAAI

🎖 Honors and Awards

2025 | World's Top 2% Scientists.

2023 | Final-list candidate of ByteDance Scholarship.

2022 | 1st on nuScenes LiDAR 3D Object Detection leaderboard.

2022 | 1st on nuScenes LiDAR Multi-Object Tracking leaderboard.

2023 | Winner of ScanNet Indoor Scene Understanding (CVPR 2023 ScanNet Workshop).

2019 | Winner of COCO Detection Challenge (ICCV 2019 COCO Workshop).