ProDepth: Boosting Self-Supervised Multi-Frame Monocular Depth with Probabilistic Fusion

ECCV 2024

Sungmin Woo*, Wonjoon Lee*, Woojin Kim, Dogyoon Lee, Sangyoun Lee
Yonsei University

ProDepth performs uncertainty-aware adaptive fusion of the depth probability distributions derived from single-frame and multi-frame cues. The uncertainty indicates the probability that a pixel belongs to a dynamic object.

Abstract

Self-supervised multi-frame monocular depth estimation relies on the geometric consistency between successive frames under the assumption of a static scene. However, the presence of moving objects in dynamic scenes introduces inevitable inconsistencies, causing misaligned multi-frame feature matching and misleading self-supervision during training.

In this paper, we propose a novel framework called ProDepth, which effectively addresses the mismatch problem caused by dynamic objects using a probabilistic approach. We first infer the uncertainty associated with the static-scene assumption by adopting an auxiliary decoder. This decoder analyzes inconsistencies embedded in the cost volume, inferring the probability of areas being dynamic. We then directly rectify the erroneous cost volume for dynamic areas through a Probabilistic Cost Volume Modulation (PCVM) module. Specifically, we derive probability distributions of depth candidates from both single-frame and multi-frame cues, modulating the cost volume by adaptively fusing those distributions based on the inferred uncertainty. Additionally, we present a self-supervision loss reweighting strategy that not only masks out incorrect supervision in areas of high uncertainty but also mitigates the risk in the remaining potentially dynamic areas in proportion to that probability. Our proposed method outperforms state-of-the-art approaches in all metrics on both the Cityscapes and KITTI datasets, and demonstrates superior generalization ability on the Waymo Open dataset.
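The core fusion idea from the abstract can be sketched in a few lines: given per-pixel depth probability distributions from single-frame and multi-frame cues, blend them with the inferred per-pixel uncertainty as the mixing weight. This is a minimal, hedged illustration of the concept, not the paper's implementation; the function names and shapes below are assumptions for the sketch.

```python
import numpy as np

def fuse_depth_distributions(p_single, p_multi, uncertainty):
    """Uncertainty-weighted fusion of per-pixel depth distributions
    (a simplified sketch of the PCVM idea, not the paper's code).

    p_single, p_multi: (H, W, D) probability distributions over D depth
    candidates, each summing to 1 along the last axis.
    uncertainty: (H, W) probability that a pixel is dynamic, i.e. that
    the multi-frame cue is unreliable, in [0, 1].
    """
    w = uncertainty[..., None]                # broadcast over depth bins
    fused = w * p_single + (1.0 - w) * p_multi
    return fused / fused.sum(axis=-1, keepdims=True)  # renormalize

def expected_depth(p, depth_candidates):
    """Depth estimate as the expectation over candidate bins."""
    return (p * depth_candidates).sum(axis=-1)
```

At `uncertainty = 1` the fused distribution reduces to the single-frame cue (the multi-frame cost volume is untrustworthy for dynamic pixels), and at `uncertainty = 0` it reduces to the multi-frame cue.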

 

Methodology

An overview of the proposed ProDepth.

 

Depth and Error Maps

Error maps depict large depth errors in red and small in blue.

Identification of Dynamic Areas

Without using additional semantic information, ProDepth identifies the dynamic areas as a probabilistic representation.
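Because dynamic areas are represented as per-pixel probabilities rather than a hard mask, the self-supervision loss can be reweighted continuously. The sketch below illustrates the reweighting strategy described in the abstract under assumed details: the threshold value and the linear `1 - uncertainty` weighting are illustrative choices, not values from the paper.

```python
import numpy as np

def reweight_loss(per_pixel_loss, uncertainty, mask_threshold=0.9):
    """Sketch of uncertainty-based reweighting of a self-supervision loss.

    Pixels whose dynamic probability exceeds `mask_threshold` are masked
    out entirely; the remaining pixels are downweighted in proportion to
    their probability of being dynamic. Both the threshold and the linear
    weighting are assumptions made for this illustration.
    """
    weights = np.where(uncertainty > mask_threshold, 0.0, 1.0 - uncertainty)
    denom = weights.sum()
    if denom == 0.0:                  # every pixel masked: no supervision
        return 0.0
    return float((weights * per_pixel_loss).sum() / denom)
```

With zero uncertainty everywhere this reduces to the ordinary mean loss, while confidently dynamic pixels contribute nothing to training.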

Quantitative Results

Quantitative comparison of ProDepth with state-of-the-art self-supervised depth estimators on the KITTI and Cityscapes datasets.

BibTeX

@article{woo2024prodepth,
  title={ProDepth: Boosting Self-Supervised Multi-Frame Monocular Depth with Probabilistic Fusion},
  author={Woo, Sungmin and Lee, Wonjoon and Kim, Woo Jin and Lee, Dogyoon and Lee, Sangyoun},
  journal={arXiv preprint arXiv:2407.09303},
  year={2024}
}