Abhimanyu Bhowmik, Madhushree Sannigrahi, Krittapat Onthuam
Abstract: Accurate underwater depth estimation is vital for applications such as autonomous underwater vehicles, marine biology, and underwater archaeology. Traditional methods often rely on expensive and complex equipment, whereas monocular depth estimation offers a more cost-effective alternative. Despite significant advancements in terrestrial monocular depth estimation driven by deep learning, these models perform poorly in underwater environments due to challenges such as light attenuation, water turbidity, and data scarcity. This paper introduces DepthDive, a novel approach that adapts the Depth Anything Model (DAM) for underwater depth estimation using monocular images. The model is fine-tuned via parameter-efficient fine-tuning (PEFT), specifically low-rank adaptation (LoRA). In addition, this work proposes a data sample filtering method to improve the quality of underwater depth datasets. Experimental results demonstrate that DepthDive significantly improves depth estimation accuracy in underwater environments, even with limited training data, showcasing the potential of fine-tuning foundation models for specialized applications.
We conducted an extensive survey to compile multiple small underwater datasets with reliable depth annotations. The table below provides a comprehensive comparison of various datasets, detailing their attributes such as camera type, size, image type, depth type, lighting conditions, depth range, and estimation methods.
Name | Camera | Size | Image Type | Depth Type | Lighting | Depth (m) | Estimation Method |
---|---|---|---|---|---|---|---|
SQUID Berman et al., 2020 | Stereo | 57 (Video) | Natural | Real (Metric) | Clear | 3-30 | AprilTags with size reference |
Eiffel Tower Boittiaux et al., 2023 | Mono | 18,082 | Natural | Real (Relative) | Dark | 1,700 | Structure-From-Motion (SFM) |
NAREON Dionísio et al., 2023 | Mono | 7,000 | Natural | Real (Relative) | Varying | 0.01 - 2.5 | Hybrid imaging system |
FLSea VI Randall et al., 2023 | Mono | 22,451 | Natural | Real (Metric) | Varying | 0-12 | AprilTags with size reference |
SeaThru Akkaynak et al., 2019 | Mono | 1,157 | Natural | Real (Metric) | Clear | 4-10 | Structure-From-Motion (SFM) |
VAROS Zwilgmeyer et al., 2021 | Mono | 4,713 | Synthetic (Blender) | Real (Metric) | Dark | - | Information from Blender |
ATLANTIS Zhang et al., 2023 | Mono | 3,200 | Synthetic (Generated) | Real (Relative) | Varying | - | Using MiDaS |
DRUVA Varghese et al., 2023 | Mono | 20 (30 fps) | Natural | Generated (Relative) | Clear | 3-6 | Using USe-ReDI-Net |
USOD 10k Hong et al., 2023 | Mono | 10,255 | Natural | Generated (Relative) | Varying | 5-60 | Using DPT |
- Name: The dataset name and reference.
- Camera: Type of camera used (Mono for Monocular, Stereo for Stereoscopic).
- Size: Number of images or videos in the dataset.
- Image Type: Whether the images are Natural or Synthetic.
- Depth Type: Indicates if depth data is Real (either Metric or Relative) or Generated.
- Lighting: Describes the lighting conditions during image capture.
- Depth (m): The range of depth values present in the dataset.
- Estimation Method: Method used to estimate or generate depth information.
This table provides a quick reference for researchers and developers working with these datasets, allowing for easier comparison and selection based on specific project requirements.
Following the same approach as MiDaS, Depth Anything is a state-of-the-art monocular depth estimation model developed for general scene depth estimation. The model utilises both labelled and unlabelled datasets by adopting a teacher-student scheme: the teacher model is trained on the labelled data and predicts pseudo-labels for the unlabelled data, and the student model then learns from both. The model excels in zero-shot depth estimation and is a strong baseline for underwater depth estimation.
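To make the PEFT/LoRA adaptation concrete, the snippet below is a minimal sketch of attaching LoRA adapters to a Depth Anything checkpoint with the Hugging Face `peft` library. The checkpoint name, target module names, rank, and loss choice are illustrative assumptions, not values taken from this paper.

```python
# Minimal sketch: wrapping a Depth Anything checkpoint with LoRA adapters via PEFT.
# Checkpoint name, target modules, and hyperparameters are assumptions for illustration.
import torch
from transformers import AutoImageProcessor, AutoModelForDepthEstimation
from peft import LoraConfig, get_peft_model

checkpoint = "LiheYoung/depth-anything-small-hf"   # assumed DAM variant
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = AutoModelForDepthEstimation.from_pretrained(checkpoint)

# Add low-rank adapters to the attention projections of the ViT backbone;
# only the small adapter matrices are trained, the base weights stay frozen.
lora_config = LoraConfig(
    r=8,                                 # adapter rank (assumed)
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query", "value"],   # module names in the DINOv2 backbone (assumed)
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()       # reports the small trainable fraction

# A fine-tuning step would then follow the usual PyTorch loop, e.g. (loss is an assumption):
# outputs = model(pixel_values=batch["pixel_values"])
# loss = torch.nn.functional.l1_loss(outputs.predicted_depth, batch["depth"])
# loss.backward(); optimizer.step()
```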
Metric | Definition |
---|---|
Absolute Relative Error (AbsRel) | $\frac{1}{N}\sum_{i=1}^{N} \frac{\lvert \hat{d}_i - d_i \rvert}{d_i}$ |
Squared Relative Error (SqRel) | $\frac{1}{N}\sum_{i=1}^{N} \frac{(\hat{d}_i - d_i)^2}{d_i}$ |
Root Mean Squared Error (RMSE) | $\sqrt{\frac{1}{N}\sum_{i=1}^{N} (\hat{d}_i - d_i)^2}$ |
Logarithmic RMSE (RMSElog) | $\sqrt{\frac{1}{N}\sum_{i=1}^{N} (\log \hat{d}_i - \log d_i)^2}$ |
Scale Invariant MSE in Log Scale (SiLog) | $\frac{1}{N}\sum_{i=1}^{N} e_i^2 - \frac{\lambda}{N^2}\left(\sum_{i=1}^{N} e_i\right)^2$, with $e_i = \log \hat{d}_i - \log d_i$ |
Peak Signal-to-Noise Ratio (PSNR) | $10 \log_{10} \frac{\max(d)^2}{\mathrm{MSE}(d, \hat{d})}$ |
Structural Similarity Index (SSIM) | $\frac{(2\mu_d \mu_{\hat{d}} + c_1)(2\sigma_{d\hat{d}} + c_2)}{(\mu_d^2 + \mu_{\hat{d}}^2 + c_1)(\sigma_d^2 + \sigma_{\hat{d}}^2 + c_2)}$ |
Pearson Correlation | $\frac{\mathrm{cov}(d, \hat{d})}{\sigma_d \, \sigma_{\hat{d}}}$ |
Here $d_i$ denotes the ground-truth depth, $\hat{d}_i$ the predicted depth, and $N$ the number of valid pixels.
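As a quick reference, the snippet below shows one possible NumPy implementation of several of these metrics, together with the δ1/δ2/δ3 threshold accuracies reported in the result tables below; the zero-depth masking and the SiLog weighting λ = 0.5 are assumptions.

```python
# Illustrative NumPy implementation of common depth-evaluation metrics,
# assuming gt and pred are aligned depth maps and only pixels with gt > 0 are scored.
import numpy as np

def depth_metrics(gt: np.ndarray, pred: np.ndarray) -> dict:
    valid = gt > 0                       # mask out pixels with no ground-truth depth
    gt, pred = gt[valid], pred[valid]

    abs_rel = np.mean(np.abs(pred - gt) / gt)
    sq_rel = np.mean((pred - gt) ** 2 / gt)
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2))

    err = np.log(pred) - np.log(gt)      # SiLog with lambda = 0.5 (assumed)
    silog = np.mean(err ** 2) - 0.5 * np.mean(err) ** 2

    ratio = np.maximum(gt / pred, pred / gt)          # threshold accuracies
    d1, d2, d3 = (np.mean(ratio < 1.25 ** k) for k in (1, 2, 3))

    return dict(AbsRel=abs_rel, SqRel=sq_rel, RMSE=rmse, RMSElog=rmse_log,
                SiLog=silog, d1=d1, d2=d2, d3=d3)
```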
We analyzed nearly all publicly available underwater datasets with the aim of combining them to train our model. However, as summarised in the table above, many of these datasets are of poor quality. Benchmark datasets in the literature, such as SQUID and SeaThru, have unreliable depth maps with missing objects. These maps are typically generated using the Structure-from-Motion (SFM) technique, which often blurs distant objects and fails to capture the depth of moving objects. The most accurate ground truths are found in synthetically generated datasets such as VAROS and ATLANTIS. However, these synthetic datasets do not fully mimic real-world conditions, as they lack moving objects and the varying light and turbidity conditions found in actual underwater environments.
For real-world datasets, the ground-truth values are often inaccurate. If we fine-tune our model on those data points, the model may learn a biased distribution of depth maps. To avoid this, we developed a method that eliminates inaccurate ground truths, providing a better dataset for model fine-tuning. Our method converts RGB images to the RMI input space, which accounts for the propagation characteristics of light underwater. The red wavelength is attenuated far more aggressively underwater, so the relative difference between the {R} channel and the {G, B} channel values provides a useful depth cue for a given pixel. We take the maximum of the {G} and {B} channels and mask the pixels that have zero depth values in the ground truth. The resulting images are shown in the figure.
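The snippet below is a rough sketch of this conversion and masking step, assuming an H×W×3 RGB array in [0, 1] and an aligned ground-truth depth map; the exact composition of the RMI input and any filtering thresholds used by the authors are not reproduced here.

```python
# Rough sketch: extract the red channel and max(G, B), mask pixels with zero
# ground-truth depth, and form a coarse attenuation-based depth cue.
import numpy as np

def rmi_channels(rgb: np.ndarray, depth_gt: np.ndarray):
    """rgb: HxWx3 float image in [0, 1]; depth_gt: HxW ground-truth depth map."""
    r = rgb[..., 0]
    max_gb = np.maximum(rgb[..., 1], rgb[..., 2])   # G/B attenuate less than R underwater
    valid = depth_gt > 0                            # drop pixels with missing depth

    # The gap between the strongly attenuated R channel and max(G, B) acts as a
    # physics-motivated depth cue that can be checked against the ground truth.
    depth_cue = np.where(valid, max_gb - r, 0.0)
    return r, max_gb, valid, depth_cue
```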
Model | AbsRel ↓ | SqRel ↓ | RMSE ↓ | RMSElog ↓ | SiLog ↓ | log10 ↓ | PSNR ↑ | SSIM ↑ | Pearson corr ↑ | δ1 ↑ | δ2 ↑ | δ3 ↑ |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Without Training | 5.9228 | 52.6330 | 5.8734 | 1.8618 | 0.6001 | 0.7679 | 8.4031 | 0.7501 | 0.7093 | 0.0189 | 0.0404 | 0.0669 |
5 Epochs Training | 0.2336 | 0.2696 | 0.1581 | 0.2753 | 0.2195 | 0.0780 | 18.8912 | 0.9367 | 0.7885 | 0.7878 | 0.9220 | 0.9567 |
Model | AbsRel ↓ | SqRel ↓ | RMSE ↓ | RMSElog ↓ | SiLog ↓ | log10 ↓ | PSNR ↑ | SSIM ↑ | Pearson corr ↑ | δ1 ↑ | δ2 ↑ | δ3 ↑ |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Without Training | 3.8750 | 36.1175 | 7.5801 | 1.4900 | 0.9353 | 0.5726 | 11.7313 | 0.7005 | -0.8213 | 0.0796 | 0.1608 | 0.2440 |
5 Epochs Training | 0.0762 | 0.4483 | 0.7114 | 0.3690 | 0.3629 | 0.0404 | 24.3683 | 0.9488 | 0.8658 | 0.9633 | 0.9753 | 0.9794 |
Dataset | Model | AbsRel ↓ | SqRel ↓ | RMSE ↓ | RMSElog ↓ | δ1 ↑ | δ2 ↑ | δ3 ↑ |
---|---|---|---|---|---|---|---|---|
FLSeaVI | UW-Net\cite{gupta2019unsupervised} | 0.527 | 1.765 | 1.725 | 1.961 | 0.337 | 0.565 | 0.699 |
FLSeaVI | Amitai et al.\cite{amitai2023self} | 0.203 | 1.955 | 1.546 | 0.245 | 0.768 | 0.923 | 0.966 |
FLSeaVI | Ours | 0.0762 | 0.4483 | 0.7114 | 0.3690 | 0.9633 | 0.9753 | 0.9794 |
SeaThru (D3 and D5) | IDisc-KITTI\cite{piccinelli2023idisc} | 4.702 | 4.4288 | 5.891 | 1.192 | 0.093 | 0.241 | 0.359 |
SeaThru (D3 and D5) | IDisc-Atlantis \cite{zhang2023atlantis} | 1.630 | 1.4279 | 1.371 | 0.354 | 0.553 | 0.850 | 0.955 |
SeaThru (D3 and D5) | NewCRFs-KITTI\cite{yuan2022neural} | 2.874 | 1.5768 | 3.251 | 0.934 | 0.213 | 0.375 | 0.465 |
SeaThru (D3 and D5) | NewCRFs-Atlantis \cite{zhang2023atlantis} | 1.683 | 1.4764 | 1.435 | 0.378 | 0.476 | 0.837 | 0.952 |
SeaThru (D3 and D5) | Ours | 0.7925 | 0.9480 | 1.6575 | 0.8268 | 0.1797 | 0.4052 | 0.6128 |
For any queries, please contact: [email protected]