Alternative Approach

SliceNet

An alternative to solving MVS with Sphere Sweep and Cost Volume Computation is to predict depth from a single panorama image view. Intuitively, this makes sense: in either case, the model outputs a panorama depth map anyway.

SliceNet[1] estimates depth from a single input panorama image.

  • The panorama image is fed to a pretrained ResNet50 feature extractor.
  • The outputs of the last 4 layers are used so that both high-level details and spatial context are captured.
  • These outputs are passed through 3 asymmetric convolutional layers to reduce the channels and height by a factor of 8.
  • The width is then resized to 512 by interpolation, and the reshaped outputs are concatenated to form 512 column slices, each a feature vector of length 1024.
  • These slices sequentially represent the 360° view, so they are passed through a bi-directional LSTM.
  • The reshaped LSTM output is then upsampled to obtain the depth map.
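The slicing-and-recurrence pipeline above can be sketched in PyTorch. Everything here is illustrative rather than the paper's exact configuration: the per-level channel counts, the mean-pooling used as a stand-in for the asymmetric height-reduction convolutions, and the LSTM sizing are assumptions, chosen only so that 512 column slices of length 1024 come out as described.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ColumnSliceLSTM(nn.Module):
    """Illustrative sketch of SliceNet's slice-then-LSTM idea.

    All dimensions are hypothetical: four ResNet50-style feature maps are
    reduced, collapsed to height 1, resized to 512 columns, and concatenated
    into 512 slices of length 1024, which a bi-directional LSTM then processes
    as a sequence spanning the 360° view.
    """

    def __init__(self, in_channels=(256, 512, 1024, 2048),
                 slice_dim=1024, width=512):
        super().__init__()
        per_level = slice_dim // len(in_channels)  # 256 channels per level
        self.reduce = nn.ModuleList(
            nn.Conv2d(c, per_level, kernel_size=1) for c in in_channels
        )
        self.width = width
        self.lstm = nn.LSTM(slice_dim, slice_dim // 2,
                            bidirectional=True, batch_first=True)

    def forward(self, feats):
        cols = []
        for f, conv in zip(feats, self.reduce):
            f = conv(f)                                 # channel reduction
            f = f.mean(dim=2, keepdim=True)             # collapse height (stand-in for asymmetric convs)
            f = F.interpolate(f, size=(1, self.width))  # resize width to 512 columns
            cols.append(f.squeeze(2))                   # (B, C, W)
        x = torch.cat(cols, dim=1)                      # (B, 1024, 512)
        x = x.permute(0, 2, 1)                          # sequence of 512 column slices
        out, _ = self.lstm(x)                           # bi-directional pass over the panorama
        return out.permute(0, 2, 1)                     # (B, 1024, 512), ready for upsampling
```

A decoder would then upsample this (B, 1024, 512) tensor back to the full panorama resolution to produce the depth map.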

The authors used an Adaptive Reverse Huber Loss[2] to train this network.
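A minimal version of a reverse Huber (berHu) term might look like the following. The adaptive threshold c = 0.2 × (maximum residual) is a common choice in the depth-estimation literature, not necessarily the paper's exact schedule.

```python
import torch


def berhu_loss(pred, target, ratio=0.2):
    """Reverse Huber (berHu) loss: L1 for small residuals, a scaled L2
    beyond an adaptive threshold c (here c = ratio * max residual,
    a common convention; the paper's schedule may differ)."""
    diff = (pred - target).abs()
    c = (ratio * diff.max()).clamp(min=1e-8)
    l2 = (diff ** 2 + c ** 2) / (2 * c)
    return torch.where(diff <= c, diff, l2).mean()
```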

This loss combines L1 and L2: L1 for small residuals, and a scaled L2 beyond an adaptive threshold. However, this alone wasn't enough. As per studies[3], CNNs tend to lose fine details during dense prediction tasks such as depth estimation. Thus, the training signal also included loss terms penalizing errors in the depth gradients along X and Y. These gradients were computed using horizontal and vertical Sobel filters[4].
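The gradient terms can be sketched as follows. A single-channel depth map is assumed, and the relative weighting between the X and Y terms (and against the berHu term) is omitted, as the source does not specify it.

```python
import torch
import torch.nn.functional as F

# Standard 3x3 Sobel kernels for horizontal and vertical gradients
SOBEL_X = torch.tensor([[-1., 0., 1.],
                        [-2., 0., 2.],
                        [-1., 0., 1.]]).view(1, 1, 3, 3)
SOBEL_Y = SOBEL_X.transpose(2, 3)


def gradient_loss(pred, target):
    """Penalize differences between the Sobel gradients of the predicted
    and ground-truth depth maps (illustrative sketch; inputs are
    (B, 1, H, W) single-channel depth maps)."""
    gx = lambda d: F.conv2d(d, SOBEL_X, padding=1)
    gy = lambda d: F.conv2d(d, SOBEL_Y, padding=1)
    return ((gx(pred) - gx(target)).abs().mean()
            + (gy(pred) - gy(target)).abs().mean())
```

In training, a term like this would be added to the berHu loss so the network is rewarded for reproducing depth discontinuities, not just per-pixel values.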