About me

I am a PhD student at the SCSE at Nanyang Technological University. I work in the MMLab@NTU, supervised by Dr. Ziwei Liu.

My ongoing projects (since mid 2023) focus on:

  • Visual generalist models: developing models that process diverse visual-centric data (e.g., images, videos, 3D, audio, and IMU) and solve a wide range of tasks such as perception, reasoning, generation, robotics, and video gaming. Representative works include Octopus, FunQA, and Otter.
  • AI safety for foundation models: addressing hallucination problems in large language models (LLMs) and large multimodal models (LMMs). We introduce UPD so that models withhold answers when faced with unsolvable problems.

My previous works mainly include:

  • The PSG series (2022-2023): I led the PSG series (PSG, PVSG, PSG4D), which highlights relation modeling for scene understanding. I also collaborated on PSG-related works such as Relate-Anything and PairNet. I am still working to improve the performance of the PSG series.
  • OOD Detection (2021-2022): I led an impactful survey and a popular codebase, OpenOOD, for the AI safety community.
  • Prompt Tuning (2022): I contributed to the foundational works (CoOp & CoCoOp) on prompt tuning for vision-language models.

Before coming to NTU, I completed my B.S. in Telecommunications in 2017 through a joint program between BUPT and QMUL. From 2018 to 2020, I worked with the CMU SAILING Lab, MIT IDSS, and SenseTime EIG Research as a research assistant/intern. I started my PhD at Rice ECE in 2020, but due to the pandemic and visa issues, I had to leave and gratefully joined MMLab@NTU.

News

(last updated: 2023-12)
2023-11: We introduce OtterHD, a visual chatbot that captures details.
2023-10: We introduce Octopus, an embodied vision language programmer that plays GTA-V.
2023-08: PSG4D is accepted as NeurIPS-23 Spotlight.
2023-08: COLA is accepted to NeurIPS-23.
2023-06: We introduce FunQA, a novel benchmark to test VLMs' reasoning ability.
2023-06: We introduce OpenOOD v1.5.
2023-05: We introduce Otter trained on MIMIC-IT. Play with the powerful visual chatbot!
2023-05: We introduce RAM to find any relations based on SAM and PSG.
2023-05: We are happy to introduce SAD: Segment Any RGBD.
2023-02: PVSG is accepted by CVPR-23.
2023-01: GMoE is accepted as an ICLR 2023 Oral Paper.
2022-10: I attended ECCV-22 in person with my shiny backpack.
2022-09: OpenOOD is accepted by NeurIPS-22 Datasets and Benchmarks Track.
2022-08: We organize The AI Talks. Subscribe to learn from the leading AI researchers!
2022-07: PSG is accepted by ECCV-22. The PSG challenge with $150K prize starts.
2022-07: CoOp and CoCoOp are accepted by IJCV and CVPR-22, respectively.
2021-10: Released a comprehensive survey on OOD detection.
2021-09: Selected as an ICCV-2021 Outstanding Reviewer.
2021-07: SCOOD is accepted to ICCV-2021.
2021-01: Joined MMLab@NTU!