WorldModelBench: The 1st Workshop on Benchmarking World Models

CVPR 2025 Workshop

Room 108, June 12, 2025 (8:30 am - 12:00 pm)


Introduction

World models refer to predictive models of physical phenomena in the world surrounding us. These models are fundamental for Physical AI agents, enabling crucial capabilities such as decision-making, planning, and counterfactual analysis. Effective world models must integrate several key components, including perception, instruction following, controllability, physical plausibility, and future prediction. Over the past year, we have seen remarkable progress in building such world models – from video models trained with text-only conditioning to those leveraging richer conditioning sources (image, video, control). Research teams from both academia and industry have released numerous open-source and proprietary models.

This proliferation of world models opens doors to their use in several downstream applications, ranging from content creation and autonomous driving to robotics. However, these models vary substantially in their training methodologies, data recipes, architectural designs, and input conditioning approaches. As a research community, we are compelled to critically examine their capabilities through comprehensive evaluation. This requires not only identifying relevant evaluation criteria (e.g., physical correctness, alignment with input prompts, generalizability) but also developing appropriate metrics and establishing standardized evaluation methodologies for fair assessment.

The goal of the WorldModelBench workshop is to provide a forum to facilitate in-depth discussions on evaluating world models. The workshop will cover a range of topics, including but not limited to:

  • Designing accessible benchmarks for evaluating world models
  • Designing methodologies, protocols, and metrics for quantitative evaluation
  • Downstream evaluation of models through different tasks
  • Considerations surrounding safety and bias in world models


Accepted Papers

Call For Papers

We welcome submissions on any aspect related to evaluating world models, including but not limited to:

  • Methods for developing world (and video) models, including novel architectures, training approaches, and scaling strategies
  • Applications of world foundation models and video generation models to downstream embodied tasks, such as robotics and autonomous driving
  • Novel metrics, benchmarks, or datasets to evaluate world models
  • Analysis of safety considerations and potential biases in world foundation models and video generation models

Submission Guideline:

  • Submission website: openreview submission page
  • Our workshop accepts both full paper submissions (4-8 pages excluding references) and extended abstract submissions (2-4 pages including references).
  • Full paper submissions (4-8 pages excluding references) must not have been previously published. Please refer to the CVPR 2025 author guidelines: https://cvpr.thecvf.com/Conferences/2025/AuthorGuidelines
  • Submission Format: official CVPR template (double column; no more than 8 pages, excluding references).
  • Our paper reviewing process is double blind.


Important Dates (Anywhere on Earth)

Paper submission deadline: April 18th, 2025
Notifications to accepted papers: May 16th, 2025
Paper camera ready: May 30th, 2025


Schedule

Opening Remarks (5 min) 8:30am - 8:35am
Oral Presentation 1 (15 min): WorldScore: A Unified Evaluation Benchmark for World Generation 8:35am - 8:50am
Oral Presentation 2 (15 min): WorldModelBench: Judging Video Generation Models As World Models 8:50am - 9:05am
Oral Presentation 3 (15 min): WorldSimBench: Towards Video Generation Models as World Simulators 9:05am - 9:20am
Invited Talk 1 (30 min) by Wenhu Chen: Frontier of World Model Evaluation: Understanding and Generation 9:20am - 9:50am
Poster Session & Coffee Break (40 min) 9:50am - 10:30am
Invited Talk 2 (30 min) by Deepti Ghadiyaram: Towards Safe and Sensible Generative Media 10:30am - 11:00am
Invited Talk 3 (30 min) by Aditya Grover: Diffusion Language Models for Multimodal Understanding 11:00am - 11:30am
Invited Talk 4 (30 min) by Haoqi Fan: BAGEL: Unified Multimodal Model as World Foundational Model 11:30am - 12:00pm
Closing Remarks (5 min) 12:00pm - 12:05pm


Invited Speakers

Wenhu Chen is a Professor at the University of Waterloo and the Vector Institute, and also a Research Scientist at Google DeepMind. His research interests lie in natural language processing, deep learning, and multimodal learning. He aims to design models to handle complex reasoning scenarios such as math problem-solving and structured knowledge grounding. He received the Area Chair Award at AACL-IJCNLP 2023, the Best Paper Honorable Mention at WACV 2021, and the UCSB CS Outstanding Dissertation Award in 2021.


Deepti Ghadiyaram is a Professor at Boston University and a member of Technical Staff at Runway. Her research focuses on improving the safety, interpretability, and robustness of AI systems. Previously, she spent over 5 years at Meta AI Research working on image and video understanding models, fair and inclusive computer vision models, and ML explainability. She has served as a Program Chair for the NeurIPS 2022 Datasets and Benchmarks track and as an Area Chair for CVPR, ICCV, ECCV, and NeurIPS, and has hosted several tutorials and organized workshops.


Aditya Grover is a Professor at UCLA and a co-founder of Inception Labs. He leads the Machine Intelligence (MINT) group at UCLA, which develops AI systems that can interact and reason with limited supervision. His current research is at the intersection of generative models and sequential decision making. He has received many prestigious awards, including the NSF CAREER Award, the Schmidt AI2050 Early Career Fellowship, being named a Kavli Fellow by the US National Academy of Sciences, and an Outstanding Paper Award at NeurIPS.


Haoqi Fan is a Research Scientist at Seed Edge, where he leads efforts to build world foundational models. He spent seven years at Facebook AI Research (FAIR), focusing on self-supervised learning and backbone design for image and video understanding. His work won the ActivityNet Challenge at ICCV 2019 and was nominated for Best Paper at CVPR 2020. He has also co-organized several tutorials at CVPR, ICCV, and ECCV.


Organizers

Heng Wang
NVIDIA
Ming-Yu Liu
NVIDIA
Mike Zheng Shou
National University of Singapore
Jay Zhangjie Wu
National University of Singapore
Xihui Liu
University of Hong Kong


Deepti Ghadiyaram
Boston University
Gowthami Somepalli
University of Maryland, College Park
Huaxiu Yao
University of North Carolina at Chapel Hill
Wenhu Chen
University of Waterloo
Jiaming Song
Luma AI
Humphrey Shi
Georgia Tech


Contact

To contact the organizers, please use worldmodelbench@gmail.com



Acknowledgments

Thanks to languagefor3dscenes for the webpage format.