Bo Zhao

bozhaonanjing [AT] gmail [DOT] com

ABOUT ME

Bo Zhao is an Associate Professor (Tenure Track) at the School of Artificial Intelligence, Shanghai Jiao Tong University. Previously, he was a Principal Investigator at the Beijing Academy of Artificial Intelligence (BAAI), where he led the Data-centric AI (DCAI) group. He received his Ph.D. from The University of Edinburgh and his M.Eng. from Peking University. His research interests include Embodied AI, Multimodal LLMs, and Data-centric AI. He received the ICML 2022 Outstanding Paper Award and was The University of Edinburgh's sole nominee for the Informatics Europe Best Dissertation Award 2023. He has received NSFC funding for research on MLLMs and Dataset Condensation. He has served as an Area Chair for NeurIPS 2024/2025 and BMVC 2024, and as an organizer of the Dataset Distillation (DD) workshops at CVPR 2024 and ECCV 2024. He was selected for a national-level young talent program.

I work on Embodied AI, MLLMs, and Data-centric AI. Collaborations are welcome; feel free to contact me.

News:

I am recruiting Ph.D. and Master's students, as well as research assistants and interns. If you are interested, please read this page.
Try our long-video understanding model and benchmark: Video-XL and MLVU.
Try our lightweight MLLMs, Bunny-3B/4B/8B: Demo. Code.

PUBLICATIONS

Full list on Google Scholar.

[ACL 2025] MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval. Junjie Zhou, Yongping Xiong, Zheng Liu, Ze Liu, Shitao Xiao, Yueze Wang, Bo Zhao, Chen Jason Zhang, Defu Lian. Coming Soon.
[ICML 2025] BOOD: Boundary-based Out-Of-Distribution Data Generation. Qilin Liao, Shuo Yang, Bo Zhao, Ping Luo, Hengshuang Zhao. Coming Soon.
[CVPR 2025 Oral (Top 7‰)] Video-XL: Towards Vision Language Models For Extra-Long Video Understanding. Yan Shu, Zheng Liu, Peitian Zhang, Minghao Qin, Junjie Zhou, Zhengyang Liang, Tiejun Huang, Bo Zhao. PDF. Code. Oral Ratio: 96/13008.
[CVPR 2025] MLVU: Benchmarking Multi-task Long Video Understanding. Junjie Zhou*, Yan Shu*, Bo Zhao*, Boya Wu, Zhengyang Liang, Shitao Xiao, Minghao Qin, Xi Yang, Yongping Xiong, Bo Zhang, Tiejun Huang, Zheng Liu. PDF. Code.
[CVPR 2025] Seeing Clearly, Answering Incorrectly: A Multimodal Robustness Benchmark For Evaluating MLLMs On Leading Questions. Yexin Liu, Zhengyang Liang, Yueze Wang, Xianfeng Wu, Feilong Tang, Muyang He, Jian Li, Zheng Liu, Harry Yang, Ser-Nam Lim, Bo Zhao. PDF. Code.
[CVPR 2025 Oral (Top 7‰)] Towards Universal Dataset Distillation via Task-Driven Diffusion. Ding Qi, Jian Li, Junyao Gao, Shuguang Dou, Ying Tai, Jianlong Hu, Bo Zhao, Yabiao Wang, Chengjie Wang, Cairong Zhao. Coming Soon. Oral Ratio: 96/13008.
[ICRA 2025] SpatialBot: Precise Spatial Understanding with Vision Language Models. Wenxiao Cai, Iaroslav Ponomarenko, Jianhao Yuan, Xiaoqi Li, Wankou Yang, Hao Dong, Bo Zhao. PDF. Code.
[ICRA 2025] NaVid-4D: Unleashing Spatial Intelligence in Egocentric RGB-D Videos for Vision-and-Language Navigation. Haoran Liu*, Weikang Wan*, Xiqian Yu*, Minghan Li*, Jiazhao Zhang, Bo Zhao, Zhibo Chen, Zhongyuan Wang, Zhizheng Zhang, He Wang. PDF. Code. Video.
[IJCV 2025] Image Captions are Natural Prompts for Training Data Synthesis. Shiye Lei*, Hao Chen*, Sen Zhang, Bo Zhao†, Dacheng Tao†. Coming Soon.
[TIP 2024] Normalizing Batch Normalization for Long-Tailed Recognition. Yuxiang Bao*, Guoliang Kang*, Linlin Yang, Xiaoyue Duan, Bo Zhao, Baochang Zhang. PDF. Code.
[NeurIPS 2024 Spotlight (Top 3%)] SegVol: Universal and Interactive Volumetric Medical Image Segmentation. Yuxin Du, Fan Bai, Tiejun Huang, Bo Zhao. PDF. Code.

The ninth-highest-rated paper (9/15671) at NeurIPS 2024. Ranking.

[NeurIPS 2024] Fetch and Forge: Efficient Dataset Condensation for Object Detection. Ding Qi, Jian Li, Jinlong Peng, Bo Zhao, Shuguang Dou, Jialin Li, Jiangning Zhang, Yabiao Wang, Chengjie Wang, Cairong Zhao. PDF.
[NeurIPS 2024 D&B Track] Touchstone Benchmark: Are We on the Right Way for Evaluating AI Algorithms for Medical Segmentation? Pedro R. A. S. Bassi*, Wenxuan Li*, et al. PDF. Code.
[ECCV 2024] Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking. Jiyao Zhang*, Weiyao Huang*, Bo Peng*, Mingdong Wu, Fei Hu, Zijian Chen, Bo Zhao, Hao Dong. Project Page. PDF.
[ACL 2024] VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval. Junjie Zhou, Shitao Xiao, Zheng Liu, Bo Zhao, Yongping Xiong. PDF. Code.
[RSS 2024] RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model. Jianhao Yuan, Shuyang Sun, Daniel Omeiza, Bo Zhao, Paul Newman, Lars Kunze, Matthew Gadd. Project Page. PDF. Code.
[ICLR 2024] Real-Fake: Effective Training Data Synthesis Through Distribution Matching. Jianhao Yuan, Jie Zhang, Shuyang Sun, Philip Torr, Bo Zhao#. PDF. Code.
[CVPR 2023 Highlight (Top 3%)] Accelerating Dataset Distillation via Model Augmentation. Lei Zhang*, Jie Zhang*, Bowen Lei, Subhabrata Mukherjee, Xiang Pan, Bo Zhao, Caiwen Ding, Yao Li, Dongkuan Xu. PDF.
[WACV 2023] Dataset Condensation with Distribution Matching. Bo Zhao, Hakan Bilen. PDF. Code.
[NeurIPS Workshops 2022] Synthesizing Informative Training Samples with GAN. Bo Zhao, Hakan Bilen. PDF. Code.
[ICML 2022 Outstanding Paper Award (Top 2‰)] Privacy for Free: How does Dataset Condensation Help Privacy? Tian Dong, Bo Zhao, Lingjuan Lyu. PDF.
[CVPR 2022] CAFE: Learning to Condense Dataset by Aligning Features. Kai Wang*, Bo Zhao*, Xiangyu Peng, Zheng Zhu, Shuo Yang, Shuo Wang, Guan Huang, Hakan Bilen, Xinchao Wang, Yang You. PDF.
[ICML 2021] Dataset Condensation with Differentiable Siamese Augmentation. Bo Zhao, Hakan Bilen. PDF. Code.
[ICLR 2021 Oral (Top 2%)] Dataset Condensation with Gradient Matching. Bo Zhao, Konda Reddy Mopuri, Hakan Bilen. PDF. Code.

The second-highest-rated paper (2/2997) at ICLR 2021. Ranking.

[WACV 2021] Continual Representation Learning for Biometric Identification. Bo Zhao*, Shixiang Tang*, Dapeng Chen, Hakan Bilen, Rui Zhao. PDF. Code.
[arXiv 2020] iDLG: Improved Deep Leakage from Gradients. Bo Zhao, Konda Reddy Mopuri, Hakan Bilen. arXiv. Code.
[ICML 2018] MSplit LBI: Realizing Feature Selection and Dense Estimation Simultaneously in Few-shot and Zero-shot Learning. Bo Zhao*, Xinwei Sun*, Yanwei Fu, Yuan Yao, Yizhou Wang. PDF. Code.
[ACM TOG 2018 & SIGGRAPH 2019] EasyFont: A Style Learning based System to Easily Build Your Large-scale Handwriting Fonts. Zhouhui Lian, Bo Zhao, Xudong Chen, Jianguo Xiao. PDF.
[SIGGRAPH Asia 2016] Automatic Generation of Large-scale Handwriting Fonts via Style Learning. Zhouhui Lian, Bo Zhao, Jianguo Xiao. PDF.

PROJECTS

Emu3: Next-Token Prediction is All You Need. Project Page. Technical Report.
Bunny: A family of lightweight multimodal models. Project Page. Technical Report.
SegVol: Universal and Interactive Volumetric Medical Image Segmentation. Project Page. PDF.
M3D: Advancing 3D Medical Image Analysis with Multi-Modal Large Language Models. Project Page. PDF.

COMPETITIONS

[CVPR 2024] 2nd place in MeViS: Motion expressions guided Video Segmentation. Bin Cao, Yisi Zhang, Xuanxu Lin, Xingjian He, Bo Zhao, Jing Liu. Certificate.
