NVIDIA Research

Yukang Chen 陈玉康

Research Scientist | Long AI Systems

I am a Research Scientist at NVIDIA Research, working with Prof. Song Han. I received my Ph.D. in Computer Science from CUHK.

🔬 Research Focus

My research focuses on Long AI Systems through algorithm-system co-design: co-designing model algorithms, data/training recipes, distributed training systems, memory-efficient inference, and low-precision deployment to scale AI to long horizons efficiently.

  • My work spans long-video generation systems, inference acceleration systems for long reasoning, long-video reinforcement learning systems, long-video understanding training systems, and long-context large language models.
  • Recent systems include LongLive-2.0 for FP4 long-video generation infrastructure, TriAttention for long-reasoning inference acceleration across vLLM/SGLang/TensorRT/OpenClaw, Long-RL/MR-SP for hour-level long-video RL, and LongVILA/MM-SP for 2M-token VLM training.
  • If you are interested in Long AI Systems or in collaborating, please feel free to contact me by email.

🚀 Representative Systems & Algorithms

Long-video Generation System

LongLive-2.0 / LongLive

FP4/NVFP4 long-video generation infrastructure with Balanced SP, teacher-forcing layout co-design, W4A4 inference, KV cache compression, parallel dequantization, and asynchronous streaming VAE decoding.

Long Reasoning Acceleration Inference System

TriAttention

Training-free KV cache compression for long reasoning, integrated with vLLM, SGLang, a TensorRT deployment path, LongLive KV-compressed video generation, and OpenClaw custom-provider deployment.

Long-video Reinforcement Learning System

Long-RL / MR-SP

A full-stack long-video RL system combining LongVideo-Reason, CoT-SFT/RL, sequence parallelism, vLLM-based rollout/prefill, and cached video embeddings for hour-level video reasoning.

Long-video Understanding Training System

LongVILA / MM-SP

Algorithm-system co-design for long-video VLMs, enabling 2M-token context training on 256 GPUs without gradient checkpointing through Multi-Modal Sequence Parallelism.

Long-context Large Language Model

LongLoRA

Efficient long-context fine-tuning via shifted sparse attention and improved LoRA, extending Llama2-7B to 100k context and Llama2-70B to 32k context on a single 8x A100 machine.

Long-range Autonomous Driving Perception

VoxelNeXt

Fully sparse VoxelNet for 3D object detection and tracking; extends perception range by 4x without inference overhead and ranked 1st on nuScenes LiDAR 3D detection and tracking leaderboards.

🎓 Background

NVIDIA Research: Research Scientist, Efficient AI / Long AI Systems, Sep 2024 - Present
The Chinese University of Hong Kong: Ph.D., Computer Science, Aug 2020 - Jul 2024

📝 Representative Publications

arXiv 2026

LongLive 2.0: An NVFP4 Parallel Infrastructure for Long Video Generation

Yukang Chen, Luozhou Wang, Wei Huang, Shuai Yang, Bohan Zhang, Yicheng Xiao, Ruihang Chu, Weian Mao, Qixin Hu, Shaoteng Liu, Yuyang Zhao, Huizi Mao, Ying-Cong Chen, Enze Xie, Xiaojuan Qi, Song Han

  • The first open-source FP4 infrastructure for long-video generation.
  • Real-time Inference - 45.7 FPS on a 5B model.
  • Supports real-video training, few-step distillation, multi-shot generation, sequence parallelism, NVFP4 KV cache, and asynchronous VAE decoding.
ICLR 2026

LongLive: Real-time Interactive Long Video Generation

Shuai Yang, Wei Huang, Ruihang Chu, Yicheng Xiao, Yuyang Zhao, Xianbang Wang, Muyang Li, Enze Xie, Yingcong Chen, Yao Lu, Song Han, Yukang Chen

  • Real-time Inference - 20.7 FPS generation on a single H100 GPU.
  • Long Video Gen - Up to 240-second generation with interactive prompts.
  • Efficient Fine-tuning - Extends Wan to minute-long generation in 32 H100 GPU-days.
ICML 2026

TriAttention: Efficient Long Reasoning with Trigonometric KV Compression

Weian Mao, Xi Lin, Wei Huang, Yuxin Xie, Tianfu Fu, Bohan Zhuang, Song Han, Yukang Chen

  • High Efficiency - 2.5x higher FPS and 10.7x KV memory reduction in LLMs.
  • OpenClaw - 32B LLM on a 24GB GPU.
  • Long Video Gen - Reduces the KV cache by 50% in autoregressive (AR) long-video generation.
NeurIPS 2025

Long-RL: Scaling RL to Long Sequences

Yukang Chen, Wei Huang, Baifeng Shi, Qinghao Hu, Hanrong Ye, Ligeng Zhu, Zhijian Liu, Pavlo Molchanov, Jan Kautz, Xiaojuan Qi, Sifei Liu, Hongxu Yin, Yao Lu, Song Han

  • MR-SP System - RL on hour-long videos (3,600 frames), up to 2.1x speedup.
  • LongVILA-R1-7B - 8,192 frames per video and 71.1% on VideoMME with subtitles.
  • LongVideo-Reason Dataset - 104K long-video QA-reasoning pairs.
ICLR 2025

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Yukang Chen, Fuzhao Xue, Dacheng Li, Qinghao Hu, Ligeng Zhu, Xiuyu Li, Yunhao Fang, Haotian Tang, Shang Yang, Zhijian Liu, Ethan He, Hongxu Yin, Pavlo Molchanov, Jan Kautz, Linxi Fan, Yuke Zhu, Yao Lu, Song Han

  • MM-SP System - 2M-token training on 256 GPUs, 1.4x faster than Megatron.
  • LongVILA-7B - 99.8% on 6,000-frame (>1M tokens) needle-in-a-haystack.
  • LongVILA-SFT Dataset - 54K high-quality long video QA pairs.
ICLR 2024 Oral

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

Yukang Chen, Shengju Qian, Haotian Tang, Xin Lai, Zhijian Liu, Song Han, Jiaya Jia

  • Efficient Fine-tuning - 100k context on a single 8x A100 machine, 1.8x speedup.
  • Easy Implementation - Shifted sparse attention, compatible with Flash-Attn.
  • LongAlpaca - The first open-source long instruction-following dataset.

📋 Academic Services

Area Chair: AAAI 2026
Journal Reviewer: T-PAMI and T-TIP
Conference Reviewer: NeurIPS, ICLR, ICML, CVPR, ICCV, ECCV, and AAAI

🎖 Honors and Awards

2025: World's Top 2% Scientists.
2023: Final-list candidate of ByteDance Scholarship.
2022: 1st on nuScenes LiDAR 3D Object Detection leaderboard.
2022: 1st on nuScenes LiDAR Multi-Object Tracking leaderboard.
2023: Winner of ScanNet Indoor Scene Understanding (CVPR 2023 ScanNet Workshop).
2019: Winner of COCO Detection Challenge (ICCV 2019 COCO Workshop).