Bidirectional Flow Fields for Sparse Input Novel View Synthesis of Dynamic Scenes

IEEE ICIP, 2025

Kapil Choudhary, Nagabhushan Somraj, and Rajiv Soundararajan

Indian Institute of Science

Video Comparisons

We organize the video comparisons into 3 sets:

Baseline Comparison: We compare our model (BF-DeRF) with 4DGS, STGS, and RF-DeRF across different datasets.
Impact of Initialization & Hyperparameters: We present findings on how initialization and hyperparameter sensitivity affect Gaussian splatting models.
Visualization Results: Finally, we showcase the visualization results obtained by our model.

All videos are encoded with the H.264 codec, using the yuv420p pixel format, at 30 fps.

Comparisons with Baselines

4DGS vs BF-DeRF (ours)

Scene details: InterDigital dataset, Birthday scene with 3 input views. BF-DeRF preserves the writing on the hanging balloon, whereas 4DGS fails to capture fine details, including those of the moving person. Additionally, the red paper-flower disk on the left side of the scene is deformed in 4DGS, while BF-DeRF reconstructs its structure more accurately.

Scene details: InterDigital dataset, Theater scene with 3 input views. While 4DGS fails to fully reconstruct the puppet, BF-DeRF achieves a significantly better reconstruction.

STGS vs BF-DeRF (ours)

Scene details: Nvidia dataset, balloon2 scene with 3 input views. STGS fails to render the details (green stripes) on the dinosaur balloon, whereas our model reconstructs these finer details better. The STGS-rendered video appears darkly shaded because the model is very sensitive to hyperparameters. The lighting in the ground-truth video is similar to that of the BF-DeRF rendering (see the comparison just below).

Scene details: The same comparison as above, shown alongside the ground-truth video. This comparison shows that the STGS-rendered video appears dark, as if the model has learned a different color grading in the novel view, whereas the lighting of the BF-DeRF rendering is similar to the ground truth.

RF-DeRF vs BF-DeRF (ours)

Scene details: N3DV dataset, flame_salmon_1 scene with 3 input views. The reconstruction quality in both the moving regions (face, flame, and burner) and the static regions (objects on top of the cupboard, the poster hanging on the left wall, and the windows behind the person) is much better with our BF-DeRF model because of its bidirectional flow fields, whereas RF-DeRF fails to render good-quality videos due to its unidirectional motion model.

Impact of Initialization & Hyperparameters

4DGS vs BF-DeRF (ours)

Scene details: Effect of poor initialization on 4DGS with 2 input views, Nvidia dataset, balloon2 scene. Fine details of the scene cannot be perceived in the 4DGS-rendered video. In contrast, our model reconstructs the fine details of the scene (green stripes) relatively better.

STGS vs BF-DeRF (ours)

Scene details: Effect of hyperparameter sensitivity on STGS with 2 input views, Nvidia dataset, balloon1 scene. No information about the scene can be perceived in the STGS-rendered video, and its colors are very different from the original colors. In contrast, our volumetric model produces a relatively better reconstruction of the scene.

Scene details: Effect of hyperparameter sensitivity on STGS with 2 input views, Nvidia dataset, balloon2 scene. The STGS-rendered video is clearly tinted dark blue because of its hyperparameter sensitivity, and no information about the scene can be perceived. In contrast, our volumetric model produces a relatively better reconstruction of the scene.

Scene Flow Visualization

Pixel Tracking

Scene details: cook_spinach scene from the N3DV dataset. Flow visualization: the colored circles mark the starting pixels, which are tracked through the video. The trail behind each circle shows the path taken by that pixel.
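Trajectories like these can be drawn by chaining per-frame pixel displacements. The sketch below is a generic flow-chaining example, not the BF-DeRF tracking code: it assumes the estimated scene flow has already been projected into 2D displacement fields `flows[t]` of shape (H, W, 2), each mapping pixel (x, y) in frame t to frame t+1. The helpers `bilinear_sample` and `track_pixels` are hypothetical names used only for illustration.

```python
import numpy as np


def bilinear_sample(field, x, y):
    """Bilinearly sample an (H, W, C) field at float coordinates (x, y).

    Coordinates are clamped to the image bounds before sampling.
    """
    h, w = field.shape[:2]
    x = np.clip(x, 0.0, w - 1.0)
    y = np.clip(y, 0.0, h - 1.0)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (x - x0)[:, None]  # horizontal interpolation weight
    wy = (y - y0)[:, None]  # vertical interpolation weight
    top = field[y0, x0] * (1 - wx) + field[y0, x1] * wx
    bottom = field[y1, x0] * (1 - wx) + field[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def track_pixels(flows, start_points):
    """Trace pixels through a video by chaining forward displacement fields.

    flows: sequence of T arrays of shape (H, W, 2), holding (dx, dy) from
           frame t to frame t+1 (assumed projected scene flow).
    start_points: (N, 2) array of (x, y) pixel positions in frame 0.
    Returns a (T+1, N, 2) array: the trail of every tracked pixel.
    """
    points = np.asarray(start_points, dtype=float)
    trajectory = [points.copy()]
    for flow in flows:
        # Look up the displacement at each (sub-pixel) position and move along it.
        displacement = bilinear_sample(flow, points[:, 0], points[:, 1])
        points = points + displacement
        trajectory.append(points.copy())
    return np.stack(trajectory)


# Usage: a constant flow of (1, 0.5) pixels per frame over 4 frames.
flows = np.tile(np.array([1.0, 0.5]), (4, 8, 8, 1))
trail = track_pixels(flows, [[2.0, 2.0]])
```

Positions are kept as floats and the flow is sampled bilinearly, so sub-pixel motion accumulates across frames instead of being rounded away at every step; plotting `trail[:, i]` as a polyline over the rendered frames gives the kind of trail shown in the video.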