We aim to explore the possibility of leveraging new capture techniques and sensors to recover the shape and appearance of a scene or object. Among the many advanced sensors that provide potential depth information (e.g., structured-light sensors, time-of-flight sensors), we focus on the focal stack because of its availability and passive nature. In this project, we address the question of recovering depth information from a focal stack input by learning a Multiplane Image (MPI) representation of the scene or object.


Recovering the shape of a scene or object has long been an interesting problem, especially in a monocular setting. The ability to acquire such information from a focal stack is even more appealing in certain scenarios, such as microscopy and photography. Most current approaches rely on priors learned from large datasets. We, on the other hand, focus on a single scene and formulate the task as an inverse rendering problem, learning a 3D representation of the scene or object. This allows us not only to recover depth information but also to re-render the scene with novel visual effects.

Problem Statement

Given a stack of focal images, an all-in-focus image, and known camera and lens parameters, we want to learn a Multiplane Image representation of the scene or object and recover a depth map from it.
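Once an MPI is learned, a depth map can be read off from its alpha layers: each pixel's depth is the expectation of the plane depths under the over-compositing weights. The sketch below illustrates this idea; the function name and array layout are our own assumptions, not part of the original formulation.

```python
import numpy as np

def depth_from_mpi(alphas, plane_depths):
    """Recover a depth map from MPI alpha layers (illustrative sketch).

    alphas: (D, H, W) per-plane alpha maps, nearest plane first.
    plane_depths: (D,) depth of each plane, nearest first.
    Each pixel's depth is the expected plane depth under the
    front-to-back over-compositing weights.
    """
    D, H, W = alphas.shape
    # Weight of plane d: alpha_d * prod_{d' < d} (1 - alpha_{d'}).
    transmittance = np.cumprod(1.0 - alphas, axis=0)
    transmittance = np.concatenate(
        [np.ones((1, H, W)), transmittance[:-1]], axis=0)
    weights = alphas * transmittance
    weights = weights / np.clip(weights.sum(axis=0), 1e-8, None)
    return (weights * plane_depths[:, None, None]).sum(axis=0)
```

For example, if one layer is fully opaque at a pixel, the recovered depth at that pixel is exactly that layer's plane depth.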


Overview of the Framework

Our approach directly optimizes a learnable Multiplane Image for a specific scene. A reconstruction of the scene at each focus setting is obtained by applying a disk blur kernel to the corresponding MPI layers and compositing them. We supervise the learning process by minimizing the photometric loss between the reconstruction and the known target images. More details of the approach, its limitations, and potential improvements are discussed in Approaches and Results.
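The rendering step described above can be sketched as follows. This is a minimal, hypothetical PyTorch implementation assuming a fixed 7x7 disk point-spread function and an RGBA MPI tensor; the function names, kernel size, and per-plane blur radii are illustrative assumptions rather than the project's actual code.

```python
import torch
import torch.nn.functional as F

def disk_kernel(radius, size=7):
    """Approximate a disk PSF of the given pixel radius on a size x size grid."""
    ax = torch.arange(size, dtype=torch.float32) - size // 2
    yy, xx = torch.meshgrid(ax, ax, indexing="ij")
    k = ((xx ** 2 + yy ** 2).sqrt() <= radius).float()
    return k / k.sum().clamp(min=1.0)

def render_focal_slice(rgba, radii, ksize=7):
    """Render one focal-stack image from a learnable MPI.

    rgba: (D, 4, H, W) MPI layers (RGB + alpha), nearest plane first.
    radii: (D,) defocus blur radius of each plane at this focus setting.
    Each layer is blurred with its disk kernel, then the blurred layers
    are over-composited back to front.
    """
    D = rgba.shape[0]
    blurred = []
    for d in range(D):
        k = disk_kernel(float(radii[d]), ksize).view(1, 1, ksize, ksize)
        layer = F.conv2d(rgba[d].unsqueeze(1), k, padding=ksize // 2).squeeze(1)
        blurred.append(layer)
    out = torch.zeros_like(blurred[0][:3])
    for d in reversed(range(D)):  # back-to-front compositing
        rgb, a = blurred[d][:3], blurred[d][3:4].clamp(0.0, 1.0)
        out = rgb * a + out * (1.0 - a)
    return out
```

Because the whole pipeline is differentiable, the photometric loss between the rendered slice and the captured focal-stack image can be minimized with any standard gradient-based optimizer acting directly on the `rgba` tensor.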