Self-Evolving Depth-Supervised 3D Gaussian Splatting from Rendered Stereo Pairs

BMVC 2024


KUIS AI Center, Koç University
University of Bologna

3D Gaussian Splatting (3DGS) struggles to model the underlying 3D geometry accurately, as is evident in the depth maps rendered by 3DGS itself. Supervising 3DGS with depth obtained from images through Structure-from-Motion (SfM), Depth Completion (DC), Monocular Depth Estimation (MDE), or Multi-View Stereo (MVS) may mitigate the problem. However, each of these techniques has shortcomings in terms of accuracy or generalization ability.


Method

We exploit 3DGS itself to render stereo pairs and process them to obtain more accurate depth supervision. Given a camera pose among those in the training set, we derive a corresponding right viewpoint in a fictitious stereo configuration according to an arbitrary stereo baseline. During training, for each image in the training set we render the corresponding right frame; we then process the two through a stereo network to obtain depth. We train 3DGS by minimizing the difference between rendered and real images, as well as between the rendered depth and the depth map obtained from stereo.
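
As a concrete illustration, the sketch below shows how one such training step could look in PyTorch. It is a minimal sketch under stated assumptions, not the paper's actual implementation: the renderer render_rgbd, the network stereo_net, and the hyperparameters baseline and lambda_d are hypothetical placeholders.

# Minimal sketch of one training step, assuming a differentiable renderer
# `render_rgbd(gaussians, pose, K)` returning an RGB image and a depth map,
# and a pretrained stereo network `stereo_net(left, right)` returning
# disparity. These names and the weight `lambda_d` are hypothetical
# placeholders, not the authors' actual interface.
import torch
import torch.nn.functional as F

def right_camera_pose(pose_c2w: torch.Tensor, baseline: float) -> torch.Tensor:
    """Translate a 4x4 camera-to-world pose along the camera's x-axis
    by `baseline` to obtain the fictitious right viewpoint."""
    pose_right = pose_c2w.clone()
    pose_right[:3, 3] += baseline * pose_c2w[:3, 0]  # camera x-axis in world coords
    return pose_right

def training_step(gaussians, render_rgbd, stereo_net,
                  image_gt, pose_c2w, K, baseline=0.1, lambda_d=0.5):
    # Render the training (left) view and its depth from the current Gaussians.
    rgb_left, depth_rendered = render_rgbd(gaussians, pose_c2w, K)
    # Render the fictitious right view of the same scene.
    rgb_right, _ = render_rgbd(gaussians, right_camera_pose(pose_c2w, baseline), K)
    # The stereo network predicts disparity from the rendered pair;
    # depth follows from depth = f * B / disparity. Treating the stereo
    # output as a fixed target (no gradient) is an assumption of this sketch.
    with torch.no_grad():
        disparity = stereo_net(rgb_left, rgb_right)
        depth_stereo = K[0, 0] * baseline / disparity.clamp(min=1e-6)
    # Photometric loss against the real image, plus depth supervision
    # from the stereo-derived depth map.
    loss_rgb = F.l1_loss(rgb_left, image_gt)
    loss_depth = F.l1_loss(depth_rendered, depth_stereo)
    return loss_rgb + lambda_d * loss_depth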


Results

[Interactive image comparisons: Ours vs. 3DGS, showing RGB renderings and depth maps across four scenes.]

BibTeX

@inproceedings{safadoust2024BMVC,
  title={Self-Evolving Depth-Supervised 3D Gaussian Splatting from Rendered Stereo Pairs},
  author={Safadoust, Sadra and Tosi, Fabio and G{\"u}ney, Fatma and Poggi, Matteo},
  booktitle={British Machine Vision Conference (BMVC)},
  year={2024}
}