Why Object State Change Matters
- Object state change (OSC) is ubiquitous in daily life and often signals whether a task has been completed.
- Existing text-to-video (T2V) evaluations focus mainly on semantic alignment, visual quality, and physical plausibility, overlooking whether the object reaches its intended target state.
- As a result, T2V-generated videos may look plausible yet still fail to produce correct, temporally consistent object state changes.
What is OSCBench
OSCBench evaluates whether T2V models can correctly reason about and render action-induced object state changes.
Overview of the OSCBench construction and evaluation pipeline. We build unified action and object categories from instructional cooking data via a human-in-the-loop process, and construct regular, novel, and compositional OSC scenarios as text prompts for video generation. The generated videos are scored by both humans and MLLMs, and we analyze the correlation between the two to assess the reliability of automatic evaluation.
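The human–MLLM reliability check above boils down to a rank-correlation computation. A minimal sketch (the scores and the choice of Spearman's rank correlation are illustrative assumptions, not the paper's exact protocol):

```python
# Sketch: correlating human and MLLM-based OSC scores per video to
# gauge how reliable the automatic evaluator is. All scores below are
# hypothetical; Spearman's rho is one common choice of agreement metric.

def rank(xs):
    """Return 1-based ranks, averaging ranks over tied values."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1  # extend the block of tied values
        avg = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    ra, rb = rank(a), rank(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = sum((x - ma) ** 2 for x in ra) ** 0.5
    sb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (sa * sb)

human = [4, 2, 5, 1, 3]             # hypothetical human OSC ratings
mllm = [3.5, 2.1, 4.8, 1.2, 2.9]    # hypothetical MLLM scores
print(round(spearman(human, mllm), 3))  # 1.0: identical rank order
```

A high rank correlation on held-out videos is what would justify substituting the MLLM evaluator for human raters.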
Benchmark Statistics
20 action elements → 8 action categories
134 object elements → 28 object categories
108 regular scenarios
20 novel scenarios
12 compositional scenarios
Each scenario contains 8 action–object pairs
1,120 prompts overall
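The statistics above are internally consistent; a one-line sanity check (numbers taken directly from this page):

```python
# 108 regular + 20 novel + 12 compositional scenarios, each with
# 8 action-object pairs, should give 1,120 prompts overall.
scenarios = {"regular": 108, "novel": 20, "compositional": 12}
pairs_per_scenario = 8
total_prompts = sum(scenarios.values()) * pairs_per_scenario
print(total_prompts)  # 1120
```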
Example prompts and failure cases from regular, novel, and compositional OSC scenarios.
Main Results
Overall performance of T2V models from human and MLLM-based evaluators.
OSC performance across action categories by human evaluation.
Interesting Results
Regular scenario: A chef is slicing leek at a street food stand
(Visual artifacts)
Regular scenario (minimal prompt): Peeling zucchini
(Visual artifacts)
Novel scenario: A woman is zesting grapefruit in the kitchen
(Memorization rather than understanding)
Compositional scenario: A robot with robotic hands is mincing and sauteing ginger in the kitchen
(Only one state change)
BibTeX
@article{OSCBench2026,
  title={OSCBench: Benchmarking Object State Change in Text-to-Video Generation},
  author={Han, Xianjing and Zhu, Bin and Hu, Shiqi and Li, Mingzhe Franklin and Carrington, Patrick and Zimmermann, Roger and Chen, Jingjing},
  year={2026},
  url={https://arxiv.org/abs/2603.11698}
}