Hello! I am a Research Scientist at NVIDIA Research, working with Prof. Song Han. I received my Ph.D. from CUHK and my master's degree from the Institute of Automation, CAS. My research focuses on LongAI, that is, "Boost AI's Long ability while staying Efficient". My representative works include VoxelNeXt, LongLoRA, and LongVILA. My ongoing research directions include long video generation and long-chain reasoning. You can find out more about me on Google Scholar and GitHub. Any discussion is welcome via e-mail.
[2024/09] Two papers have been accepted by NeurIPS 2024, including one Oral, RL-GPT.
[2024/02] Four papers have been accepted by CVPR 2024, including one Oral, LISA.
[2024/01] LongLoRA has been accepted by ICLR 2024 as an Oral presentation.
[2023/08] We release LongLoRA, an efficient fine-tuning approach for long-context large language models. We also release LongAlpaca: the long instruction-following dataset, LongAlpaca-12k, and the corresponding models, LongAlpaca-70B.
[2023/08] We release LISA: Reasoning Segmentation via Large Language Model.
[2023/07] We release IST-Net, a prior-free pose estimator with SOTA performance and efficiency.
[2023/07] Three papers accepted by ICCV 2023: IST-Net, FocalFormer3D for 3D object detection and tracking, and Accelerated DETR for 3D instance segmentation.
[2023/04] We release 3D-Box-Segment-Anything. We extend Segment Anything to the 3D world by combining it with VoxelNeXt. Given a prompt (e.g., a point or a box), the result is not only a 2D segmentation mask but also 3D boxes.
[2023/03] VoxelNeXt, a fully sparse VoxelNet, is accepted by CVPR 2023. It achieves SOTA on Argoverse2 3D object detection and nuScenes LiDAR multi-object tracking (Code). VoxelNeXt has been merged into the official OpenPCDet codebase.
[2023/03] SphereFormer, a spherical window 3D transformer backbone, is accepted by CVPR 2023. It achieves SOTA on SemanticKITTI and nuScenes LiDAR semantic segmentation (Code).
[2023/03] We release SparseTransformer, a fast and memory-efficient library for sparse transformers on 3D point clouds. This project is led by Xin Lai.
[2023/03] 3 Papers accepted by CVPR 2023!
[2022/12] We release spconv-plus, a library based on spconv that integrates several new sparse convolution types and operators that might be useful.
[2022/09] Spatial Pruned Sparse Conv is accepted by NeurIPS 2022, with 50% FLOPs saving for efficient 3D object detection (Code).
[2022/06] LargeKernel3D, the first large-kernel 3D sparse CNN backbone, ranks 1st in NDS on the nuScenes leaderboard (Code).
[2022/03] Focal Sparse Conv, a dynamic sparse convolution for high-performance 3D object detection, is accepted by CVPR 2022 as an Oral (Code). Focal Sparse Conv has been merged into the official OpenPCDet codebase.
[2022/03] The journal extension of Scale-aware AutoAug, which adds a dynamic training strategy, is accepted by T-PAMI (Code).
[2021/03] Scale-aware AutoAug is accepted by CVPR 2021: scale-aware augmentations with automatic augmentation (Code).
[2019/12] DetNAS, the first work that introduces NAS into object detection, is accepted by NeurIPS 2019 (Code).
[2019/10] Winner of Microsoft COCO 2019 Instance Segmentation Track.
[2019/12] RENAS, the first work that combines reinforcement learning and evolutionary search in NAS, is accepted by CVPR 2019 (Code).
The Chinese University of Hong Kong
Ph.D., Computer Science and Engineering.
Aug 2020 - Jul 2024.
Supervisor: Prof. Jiaya Jia.
Institute of Automation, Chinese Academy of Sciences
M.Phil., Pattern Recognition and Intelligent Systems.
Sep 2017 - Jul 2020.
Supervisors: Prof. Gaofeng Meng and Prof. Shiming Xiang.