Full-4D: Generating Full-Scope 4D Scene from a Single-View Video

Tingxi Chen*1,2, Ke Hao*1,2, Yabo Chen†2, Zhengxue Cheng1, Rong Xie1, Song Li✉1, Haibin Huang2, Chi Zhang2, Xuelong Li✉2
1Shanghai Jiao Tong University, 2Institute of Artificial Intelligence, China Telecom (TeleAI)
*Equal Contributions, †Project Leader, ✉Corresponding Author
Full-4D Teaser Image

Given a single-view input video, Full-4D generates a synchronized multi-view 4D video grid over large angular ranges and lifts it into a 4D scene representation. The rendered results demonstrate strong generalization across diverse scenes and styles.

Abstract

Generating 4D scenes from a single-view video is inherently ill-posed: a single viewpoint lacks the information needed to recover a complete, dynamic scene with full coverage. Existing methods are typically limited to monocular videos, simple 3D effects, or small perturbations around the original viewpoint, falling short of true 4D generation. Meanwhile, the lack of large-scale datasets capturing full-scope 4D scenes with synchronized multi-view videos further hinders progress in this direction. We propose a novel single-view video-to-4D framework that casts full-scope 4D generation as multi-view video synthesis followed by optimization-based 4D reconstruction from the generated views. To instantiate this formulation end-to-end, we make three key contributions. First, we introduce Real-MV-4D, a large-scale dataset of synchronized multi-view videos captured in diverse real-world environments, which provides the 4D supervision. Second, we train a multi-view video diffusion model that incorporates spatial-aware projection guidance and a fused time (T)-view (V) attention mechanism to generate a synchronized T×V video grid with strict alignment. Third, we lift the synthesized multi-view videos into an explicit 4D representation (i.e., 4DGS), regularized by a Flow Matching Distillation loss that exploits the multi-view prior to improve novel-view rendering. Extensive experiments demonstrate that our method outperforms existing approaches in both visual fidelity and geometric consistency, enabling full-scope 4D scene generation from single-view videos.
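
The abstract does not spell out how the fused time-view attention is built. One plausible reading is a single self-attention pass over all tokens of the T×V grid, so that every frame can attend to every other timestep and viewpoint at once, rather than alternating separate temporal and cross-view attention layers. The NumPy sketch below illustrates only that reading. The function name `fused_tv_attention`, the `(T, V, N, D)` token layout, and the single-head form without learned projections are all assumptions made for illustration, not the authors' architecture.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def fused_tv_attention(grid):
    """Joint self-attention over a T x V grid of frame tokens.

    grid: array of shape (T, V, N, D) holding N tokens of width D for each
    of T timesteps and V views (hypothetical layout).
    Returns an array of the same shape.
    """
    T, V, N, D = grid.shape
    # Flatten time, view, and token axes into one sequence so that attention
    # mixes information across timesteps and viewpoints in a single pass.
    seq = grid.reshape(T * V * N, D)
    # Single-head scaled dot-product attention; queries, keys, and values are
    # the tokens themselves (no learned projections in this sketch).
    scores = seq @ seq.T / np.sqrt(D)
    out = softmax(scores, axis=-1) @ seq
    return out.reshape(T, V, N, D)


rng = np.random.default_rng(0)
grid = rng.standard_normal((4, 3, 2, 8))  # T=4, V=3, N=2 tokens, D=8
out = fused_tv_attention(grid)
```

The flattened sequence has length T·V·N, so the attention cost grows quadratically in the size of the grid. A practical model would likely need learned projections, multiple heads, and some memory-saving strategy, none of which are specified in the abstract.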

Generated 4D Scenes

Baselines Comparison

More Results