Full-4D: Generating Full-Scope 4D Scene from a Single-View Video

Tingxi Chen^*1,2, Ke Hao^*1,2, Yabo Chen^†2, Zhengxue Cheng¹, Rong Xie¹, Song Li^✉1, Haibin Huang², Chi Zhang², Xuelong Li^✉2

¹Shanghai Jiao Tong University, ²Institute of Artificial Intelligence, China Telecom (TeleAI)

^*Equal Contributions, ^†Project Leader, ^✉Corresponding Author

arXiv

Given a single-view input video, Full-4D generates a synchronized multi-view video grid spanning large angular ranges and lifts it into a 4D scene representation. The rendered results show strong generalization across diverse scenes and styles.

Abstract

Generating 4D scenes from a single-view video is inherently ill-posed: a single viewpoint lacks the information needed to recover a complete, dynamic scene with full coverage. Existing methods are typically limited to monocular videos, simple 3D effects, or small perturbations around the original viewpoint, and so fall short of true 4D generation. The lack of large-scale datasets that capture full-scope 4D scenes with synchronized multi-view videos further hinders progress. We propose a novel single-view video-to-4D framework that casts full-scope 4D generation as multi-view video synthesis followed by optimization-based 4D reconstruction from the generated views. To instantiate this formulation end-to-end, we make three key contributions. First, we introduce Real-MV-4D, a large-scale dataset of synchronized multi-view videos captured in diverse real-world environments, which provides the 4D supervision. Second, we train a multi-view video diffusion model that incorporates spatial-aware projection guidance and a fused time (T)-view (V) attention mechanism to generate a synchronized, strictly aligned T×V video grid. Third, we lift the synthesized multi-view videos into an explicit 4D representation (i.e., 4DGS), regularized by a Flow Matching Distillation loss that exploits the multi-view prior to improve novel-view rendering. Extensive experiments demonstrate that our method outperforms existing approaches in both visual fidelity and geometric consistency, enabling full-scope 4D scene generation from single-view videos.
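To make the idea of attending jointly over a T×V video grid more concrete, the following is a minimal illustrative sketch, not the model described above. It assumes that "fused" means flattening the time, view, and per-frame token axes into one sequence and applying plain scaled dot-product self-attention. The function name `fused_time_view_attention`, the tensor layout `(T, V, N, D)`, and the absence of learned query/key/value projections and of the projection guidance are all simplifying assumptions made here for illustration.

```python
import numpy as np


def fused_time_view_attention(grid):
    """Joint self-attention over every token of a T x V video grid.

    grid: array of shape (T, V, N, D) with T frames, V views,
          N tokens per frame, and D channels (hypothetical layout).
    Returns the attended grid (same shape) and the attention
    weights of shape (T*V*N, T*V*N).
    """
    T, V, N, D = grid.shape
    # Fuse the time, view, and token axes into a single sequence so that
    # each token can attend across frames and across views at once.
    tokens = grid.reshape(T * V * N, D)
    # Scaled dot-product scores; identity projections are an assumption.
    scores = tokens @ tokens.T / np.sqrt(D)
    # Subtract the row maximum for a numerically stable softmax.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    out = weights @ tokens
    return out.reshape(T, V, N, D), weights


rng = np.random.default_rng(0)
grid = rng.standard_normal((3, 4, 2, 8))  # T=3 frames, V=4 views
out, weights = fused_time_view_attention(grid)
```

In this sketch the attention matrix has (T·V·N)² entries, so a token in one view at one timestep can draw directly on any other view or timestep. A factorized design would instead run separate temporal and cross-view attention passes; which variant the actual model uses is not specified in the abstract.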

Generated 4D Scenes

Baselines Comparison

More Results