WorldModelBench: The 1st Workshop on Benchmarking World Models

CVPR 2025 Workshop



Introduction

World models are predictive models of physical phenomena in the world around us. They are fundamental to Physical AI agents, enabling crucial capabilities such as decision-making, planning, and counterfactual analysis. Effective world models must integrate several key components, including perception, instruction following, controllability, physical plausibility, and future prediction. Over the past year, we have seen remarkable progress in building such world models, from video models trained with text-only conditioning to models that leverage richer conditioning sources (image, video, and control signals). Research teams from both academia and industry have released numerous open-source and proprietary models.

This proliferation of world models opens the door to their use in downstream applications ranging from content creation and autonomous driving to robotics. However, these models vary substantially in their training methodologies, data recipes, architectural designs, and input conditioning approaches. As a research community, we are compelled to critically examine their capabilities through comprehensive evaluation. This requires not only identifying relevant evaluation criteria (e.g., physical correctness, alignment with input prompts, generalizability) but also developing appropriate metrics and establishing standardized evaluation methodologies for fair assessment.

The goal of the WorldModelBench workshop is to provide a forum to facilitate in-depth discussions on evaluating world models. The workshop will cover a range of topics, including but not limited to:

  • Designing accessible benchmarks for evaluating world models
  • Designing methodologies, protocols, and metrics for quantitative evaluation
  • Downstream evaluation of models through different tasks
  • Considerations surrounding safety and bias in world models


Call For Papers

We welcome submissions on any aspect related to evaluating world models, including but not limited to:

  • Methods for developing world (and video) models, including novel architectures, training approaches, and scaling strategies
  • Applications of world foundation models and video generation models to downstream embodied tasks, such as robotics and autonomous driving
  • Novel metrics, benchmarks or datasets to evaluate world models
  • Analysis of safety considerations and potential biases in world foundation models and video generation models

Submission Guidelines:

  • Submission website: OpenReview submission page
  • Our workshop accepts both full paper submissions (4-8 pages excluding references) and extended abstract submissions (2-4 pages including references).
  • Full paper submissions must not have been previously published. Please refer to the CVPR 2025 author guidelines: https://cvpr.thecvf.com/Conferences/2025/AuthorGuidelines
  • Submission format: official CVPR template (double column; no more than 8 pages, excluding references).
  • The reviewing process is double-blind.


Important Dates (Anywhere on Earth)

Paper submission deadline: April 7th, 2025
Notification to accepted papers: April 28th, 2025
Camera-ready deadline: May 12th, 2025


Schedule (Tentative)

Introduction and Opening Remarks: TBD
Spotlight Presentation 1: TBD
Spotlight Presentation 2: TBD
Spotlight Presentation 3: TBD
Coffee Break: TBD
Poster Session: TBD
Invited Talk 1: TBD
Invited Talk 2: TBD
Invited Talk 3: TBD
Roundtable Discussion: TBD


Invited Speakers

Wenhu Chen is a Professor at the University of Waterloo and the Vector Institute, and a Research Scientist at Google DeepMind. His research interests lie in natural language processing, deep learning, and multimodal learning. He aims to design models that handle complex reasoning scenarios such as math problem solving and structured knowledge grounding. He received the Area Chair Award at AACL-IJCNLP 2023, the Best Paper Honorable Mention at WACV 2021, and the UCSB CS Outstanding Dissertation Award in 2021.


Deepti Ghadiyaram is a Professor at Boston University and a Member of Technical Staff at Runway. Her research focuses on improving the safety, interpretability, and robustness of AI systems. Previously, she spent over five years at Meta AI Research working on image and video understanding models, fair and inclusive computer vision models, and ML explainability. She served as a Program Chair for the NeurIPS 2022 Datasets and Benchmarks track and as an Area Chair for CVPR, ICCV, ECCV, and NeurIPS, and has hosted several tutorials and organized workshops.


Aditya Grover is a Professor at UCLA and a co-founder of Inception Labs. He leads the Machine Intelligence (MINT) group at UCLA, which develops AI systems that can interact and reason with limited supervision. His current research is at the intersection of generative models and sequential decision making. He has received many prestigious awards, including the NSF CAREER Award, the Schmidt AI 2050 Early Career Fellowship, recognition as a Kavli Fellow by the US National Academy of Sciences, and an Outstanding Paper Award at NeurIPS.


Organizers

Heng Wang
NVIDIA
Ming-Yu Liu
NVIDIA
Mike Zheng Shou
National University of Singapore
Jay Zhangjie Wu
National University of Singapore
Xihui Liu
University of Hong Kong


Deepti Ghadiyaram
Boston University
Gowthami Somepalli
University of Maryland, College Park
Huaxiu Yao
University of North Carolina at Chapel Hill
Wenhu Chen
University of Waterloo
Jiaming Song
Luma AI
Humphrey Shi
Georgia Tech


Contact

To contact the organizers, please email worldmodelbench@gmail.com.



Acknowledgments

Thanks to languagefor3dscenes for the webpage format.