In sports broadcasting, multiple static cameras capture exactly the same moment: at any time t, the views of a subject differ only by the known rigid transformations between calibrated cameras. This removes the need for deformation graphs, warping fields, 4D canonicalization, or Gaussian tracking across time, and it is the domain-informed insight that shapes how TACV models dynamic novel view synthesis.
Many dynamic view synthesis methods implicitly rely on some form of temporal correspondence: tracking points, tracking Gaussians, or tracking deformation fields. In fast-paced sports and multi-person stage performances, these assumptions frequently break due to:

- large, rapid non-rigid motion (flips, jumps, articulations);
- independent motion of multiple interacting subjects, which violates common Gaussian-tracking assumptions;
- sudden player-to-player transitions and heavy occlusions.
TACV takes a different stance. With a synchronized, calibrated multi-view rig, the scene at each time step is already strongly constrained by geometry. We model the sequence as a time-indexed set of archival multi-view snapshots: optimize each time step independently, store it, and later render it from any virtual camera pose—enabling “rewind” + novel-view replay.
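To make the archive-and-replay idea concrete, here is a minimal Python sketch of that loop. All names (`fit_snapshot`, `render_view`, the checkpoint layout) are illustrative placeholders, not the repository's actual API; the real per-time-step optimization and volume renderer are abstracted behind stubs.

```python
"""Minimal archive-and-replay sketch (hypothetical names, not the repo's API).

Each time step t is optimized independently from its synchronized views and
stored as a self-contained checkpoint; replay loads any archived t and renders
it from an arbitrary virtual camera pose.
"""
from pathlib import Path
import pickle


def fit_snapshot(images, cameras, steps=5000):
    # Placeholder for the per-time-step radiance-field optimization.
    # Returns whatever state is needed to render this time step later.
    return {"images": images, "cameras": cameras, "steps": steps}


def render_view(snapshot_state, virtual_camera):
    # Placeholder for volume rendering F_t from a virtual camera pose.
    return {"camera": virtual_camera, "n_views": len(snapshot_state["cameras"])}


def archive_sequence(sequence, out_dir="checkpoints"):
    """Fit every time step independently (no temporal coupling) and store it."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for t, (images, cameras) in enumerate(sequence):
        state = fit_snapshot(images, cameras)          # can run per-GPU in parallel
        with open(out / f"t_{t:04d}.pkl", "wb") as f:  # self-contained per-time checkpoint
            pickle.dump(state, f)


def replay(t, virtual_camera, ckpt_dir="checkpoints"):
    """'Rewind' to archived time t and synthesize a novel view."""
    with open(Path(ckpt_dir) / f"t_{t:04d}.pkl", "rb") as f:
        state = pickle.load(f)
    return render_view(state, virtual_camera)
```

Because each time step is fit and stored independently, there is no ordering constraint: time steps can be optimized in parallel across GPUs and revisited in any order at playback time.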
Below is a demo of the TACV GUI / interactive playback.
Below are selected quantitative results and qualitative figures. In many challenging sports/performance sequences, several baselines fail to produce usable reconstructions; therefore, this page emphasizes what TACV enables in the pre-calibrated multi-view setting, with comparisons shown where available.
TACV targets time-archival camera virtualization under a synchronized, calibrated multi-view capture setup
commonly used in sports broadcasting and stage performances. At each discrete time instance t,
N static cameras capture synchronized RGB images I_t = {I_t^(1), …, I_t^(N)}.
Camera intrinsics K_i and extrinsics (R_i, t_i) are known (or estimated),
providing strong geometric constraints at every time instance.
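For illustration, here is a minimal sketch of the per-time-step inputs and the standard pinhole projection they imply. The class and function names are hypothetical and only mirror the notation above (I_t^(i), K_i, R_i, t_i); they are not taken from the repository.

```python
"""Sketch of a time-t multi-view snapshot and pinhole projection (illustrative)."""
from dataclasses import dataclass
import numpy as np


@dataclass
class CalibratedView:
    image: np.ndarray   # H x W x 3 RGB image I_t^(i)
    K: np.ndarray       # 3 x 3 intrinsics K_i
    R: np.ndarray       # 3 x 3 world-to-camera rotation R_i
    t: np.ndarray       # (3,) world-to-camera translation t_i


@dataclass
class Snapshot:
    """All N synchronized views captured at one discrete time instance t."""
    time_index: int
    views: list  # list of CalibratedView


def project(view: CalibratedView, X_world: np.ndarray) -> np.ndarray:
    """Project world points (M, 3) into pixel coordinates (M, 2) of one camera:
    x ~ K (R X + t), followed by the perspective divide."""
    X_cam = X_world @ view.R.T + view.t   # rigid world-to-camera transform
    x_hom = X_cam @ view.K.T              # apply intrinsics
    return x_hom[:, :2] / x_hom[:, 2:3]   # dehomogenize
```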
For each time t, TACV learns a temporally indexed functional scene representation F_t
(a time-specific implicit radiance field) to model RGB appearance at that moment, enabling novel-view synthesis for
any past or current time. We implement F_t as a compact neural implicit model: a small MLP augmented
with a multi-resolution hash-grid encoding for efficient, high-detail rendering. Each time step is optimized
independently, stored as a self-contained checkpoint for archival access, and can be parallelized across time
when multiple GPUs are available.
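As an illustration of such a per-time-step field, here is a dependency-free PyTorch sketch. The actual implementation adapts instant-ngp's multi-resolution hash-grid encoding; a simple frequency encoding stands in for it here, and the class name and layer sizes are assumptions rather than the repository's configuration.

```python
"""Sketch of a per-time-step field F_t (illustrative, not the repo code)."""
import torch
import torch.nn as nn


def frequency_encode(x: torch.Tensor, n_freqs: int = 6) -> torch.Tensor:
    """Map (B, 3) points to (B, 3 + 6*n_freqs) sin/cos features.
    Stand-in for the multi-resolution hash-grid encoding used in practice."""
    feats = [x]
    for k in range(n_freqs):
        feats += [torch.sin((2.0 ** k) * x), torch.cos((2.0 ** k) * x)]
    return torch.cat(feats, dim=-1)


class TimeStepField(nn.Module):
    """One self-contained radiance field for a single time instance t."""

    def __init__(self, n_freqs: int = 6, hidden: int = 64):
        super().__init__()
        in_dim = 3 + 6 * n_freqs
        self.n_freqs = n_freqs
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),      # 1 density + 3 RGB channels
        )

    def forward(self, xyz: torch.Tensor):
        h = self.mlp(frequency_encode(xyz, self.n_freqs))
        sigma = torch.relu(h[:, :1])     # non-negative density
        rgb = torch.sigmoid(h[:, 1:])    # colors in [0, 1]
        return sigma, rgb
```

One such model is optimized per time step and saved as its own checkpoint, which is what makes the archival storage self-contained.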
TACV is evaluated on synthetic sports/performance datasets and real multi-camera captures. For research and evaluation, we provide example datasets and optional pretrained checkpoints:
| Dataset | Type | #Views | #Time instances | Notes |
|---|---|---|---|---|
| Dancing-Walking-Standing (DWS) | Synthetic | 100 | 65 | Multi-person motion |
| Soccer Penalty Kick (S-PK) | Synthetic | 60 | 109 | Soccer action |
| Soccer Multi-Player (S-MP) | Synthetic | 60 | 83 | Multiple players; occlusions |
| Baseball Bat | Real world | 31 | 100 | Fast motion |
| Hand Gesture | Real world | 31 | 201 | Non-rigid articulation |
Camera virtualization enables photorealistic novel-view synthesis for live performances and sports broadcasting using a limited set of synchronized, calibrated static cameras, but existing dynamic-scene methods still struggle to deliver spatially and temporally coherent rendering with practical time-archival capability under fast, multi-person motion. Dynamic 3D Gaussian Splatting variants can be real-time, yet they often depend on accurate SfM point clouds and fragile temporal-tracking assumptions that break under large non-rigid motion and independent multi-body interactions. TACV advocates revisiting a neural volume rendering formulation: it treats each time instant as a geometry-constrained multi-view snapshot (views differ only by rigid transforms at that time), learns a compact per-time-step implicit representation, and stores it for true time-archival, letting users “rewind” to any past moment and render novel viewpoints for replay, analysis, and long-horizon archival without requiring per-time-step point-cloud initialization or heavy temporal coupling.
Camera virtualization, an emerging solution to novel view synthesis, holds transformative potential for visual entertainment, live performances, and sports broadcasting by enabling the generation of photorealistic images from novel viewpoints using images from a limited set of calibrated static physical cameras. Despite recent advances, achieving spatially and temporally coherent, photorealistic rendering of dynamic scenes with efficient time-archival capabilities, particularly in fast-paced sports and stage performances, remains challenging for existing approaches. Recent methods based on 3D Gaussian Splatting (3DGS) for dynamic scenes can offer real-time view-synthesis results, yet they are hindered by their dependence on accurate 3D point clouds from structure-from-motion and by their inability to handle large, non-rigid, rapid motions of different subjects (e.g., flips, jumps, articulations, sudden player-to-player transitions). Moreover, independent motions of multiple subjects can break the Gaussian-tracking assumptions commonly used in 4DGS, ST-GS, and other dynamic splatting variants. This paper advocates reconsidering a neural volume rendering formulation for camera virtualization with efficient time-archival capabilities, making it useful for sports broadcasting and related applications. By modeling a dynamic scene as rigid transformations across multiple synchronized camera views at a given time, our method performs neural representation learning that provides enhanced visual rendering quality at test time. A key contribution of our approach is its support for time-archival, i.e., users can revisit any past temporal instance of a dynamic scene and perform novel view synthesis, enabling retrospective rendering for replay, analysis, and archival of live events, a functionality absent in existing neural rendering and novel view synthesis methods for dynamic scenes. While, in principle, dynamic 3DGS approaches can also perform time-archival, they would require either a multi-view structure-from-motion (SfM) point cloud stored at every time step or some form of additional multi-body temporal modeling constraint, both of which are complex, computationally expensive, and potentially memory-intensive. We argue that a dynamic scene observed under a well-constrained, synchronized multi-view setup, typical of sports and visual performance scenarios, is already strongly constrained by geometry, so neither a temporally coupled constraint nor a 3D point-cloud initialization may be needed. Extensive experiments and ablations on established benchmarks and our newly proposed dynamic scene datasets demonstrate that our method surpasses 4DGS-based baselines in rendered image quality and other performance metrics for time-archival view synthesis of dynamic scenes, setting a new standard for virtual camera systems in dynamic visual media. Furthermore, our approach could be an encouraging step towards compactly modeling the plenoptic function, allowing for time-archival of a long video sequence.
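For readers less familiar with the neural volume rendering formulation the abstract refers to, the sketch below shows the standard NeRF-style compositing step along a single ray. This is the generic quadrature, included for reference only; it is not claimed to match TACV's exact renderer.

```python
"""Standard volume-rendering quadrature along one ray (reference sketch)."""
import torch


def composite_ray(sigma: torch.Tensor, rgb: torch.Tensor, deltas: torch.Tensor) -> torch.Tensor:
    """sigma: (S,) densities, rgb: (S, 3) colors, deltas: (S,) sample spacings.

    Returns the composited (3,) pixel color:
        C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i,
        T_i = prod_{j<i} exp(-sigma_j * delta_j)
    """
    alpha = 1.0 - torch.exp(-sigma * deltas)                    # per-sample opacity
    ones = torch.ones(1, dtype=alpha.dtype, device=alpha.device)
    trans = torch.cumprod(torch.cat([ones, 1.0 - alpha + 1e-10])[:-1], dim=0)  # T_i
    weights = alpha * trans                                     # per-sample contribution
    return (weights[:, None] * rgb).sum(dim=0)                  # composited RGB
```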
Use the repo citation for now; update when the paper is public (DOI/arXiv).
@misc{zhang_tacv_code,
title = {Time-Archival Camera Virtualization (TACV) -- Code},
author = {Zhang, Yunxiao (Jack)},
year = {2026},
howpublished = {GitHub repository},
note = {Manuscript under revision at CVIU}
}
I am grateful to Prof. Suryansh Kumar for advising this project, for help in drafting the manuscript, and for insightful discussions that shaped the problem framing and evaluation. Parts of the implementation are adapted from NVIDIA's instant-ngp; we thank the authors for releasing their code, and we follow the original license and attribution requirements in this repository.