Overview
Three main classes of pose estimation methods have driven the ideas behind our project, along with an iterative refinement technique that can be layered on top of any of them.
Direct Regression Methods
Direct regression methods predict the rotation and translation parameters of the pose directly from the input image, with no intermediate representation such as keypoints. Although this approach is more efficient than keypoint-based methods, it tends to be less accurate. A popular work in this class of methods is “A Pose Proposal and Refinement Network for Better 6D Object Pose Estimation.”
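To make the regressed "pose parameters" concrete, here is a minimal sketch of how a raw network output could be turned into a valid rotation and translation. The 7-value layout (a quaternion followed by a translation) and the function name `pose_from_regression` are assumptions for illustration only; the cited work may use a different parameterization.

```python
import numpy as np

def pose_from_regression(raw):
    """Convert a raw 7-vector (qw, qx, qy, qz, tx, ty, tz) into (R, t).

    The 7-vector layout is an assumed, illustrative output format.
    """
    raw = np.asarray(raw, dtype=float)
    # A regressed quaternion is generally not unit length, so normalize it.
    w, x, y, z = raw[:4] / np.linalg.norm(raw[:4])
    # Standard unit-quaternion to rotation-matrix conversion.
    R = np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ])
    t = raw[4:7]
    return R, t

# Example: an unnormalized quaternion for a 90-degree rotation about z.
R, t = pose_from_regression([2.0, 0.0, 0.0, 2.0, 0.1, -0.2, 5.0])
```

Normalizing before conversion guarantees that the result is a proper rotation matrix even when the raw output is not, which is one reason this kind of post-processing step is common in regression pipelines.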
Keypoint-Based Methods + PnP
Keypoint-based methods take in an image and output a set of 2D keypoints. Since we have access to a 3D model of the aircraft at inference time, each predicted 2D keypoint can be matched to its known 3D location on the model, giving a set of 2D-3D correspondences. Using Perspective-n-Point (PnP), we can then solve for the desired rotation and translation from these correspondences. One downside of this approach is that it requires keypoints to be visible in the image: if there is significant occlusion, there may not be enough visible keypoints to find the correspondences needed to solve for the pose. A popular work in this class of methods is “6-DoF Object Pose from Semantic Keypoints.”
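As a rough illustration of the PnP step, the sketch below recovers a pose from 2D-3D correspondences using the Direct Linear Transform (DLT), one of the simplest PnP formulations. This is a swapped-in textbook technique and not necessarily the solver used in the project; in practice a library routine such as OpenCV's `cv2.solvePnP` is the more common choice. The function name `pnp_dlt`, the camera intrinsics, and the model points are all made up for the example, and DLT assumes at least six non-coplanar points with little noise.

```python
import numpy as np

def pnp_dlt(points_3d, points_2d, K):
    """Estimate (R, t) from n >= 6 non-coplanar 2D-3D correspondences via DLT."""
    X = np.asarray(points_3d, dtype=float)
    uv = np.asarray(points_2d, dtype=float)
    n = X.shape[0]
    # Remove the intrinsics: pixel coordinates -> normalized image coordinates.
    uv_h = np.hstack([uv, np.ones((n, 1))])
    xy = (np.linalg.inv(K) @ uv_h.T).T
    X_h = np.hstack([X, np.ones((n, 1))])
    # Each correspondence contributes two linear equations in the 12 entries of [R|t].
    A = np.zeros((2 * n, 12))
    A[0::2, 0:4] = X_h
    A[0::2, 8:12] = -xy[:, [0]] * X_h
    A[1::2, 4:8] = X_h
    A[1::2, 8:12] = -xy[:, [1]] * X_h
    # The solution (up to scale and sign) is the right singular vector
    # associated with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)
    if np.linalg.det(P[:, :3]) < 0:
        P = -P  # fix the sign so the rotation part has positive determinant
    # Project the 3x3 block onto the nearest rotation and undo the scale.
    U, S, Vt_r = np.linalg.svd(P[:, :3])
    R = U @ Vt_r
    t = P[:, 3] / S.mean()
    return R, t

# Synthetic check with made-up intrinsics, pose, and model points.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
a = np.deg2rad(30.0)
R_true = np.array([[np.cos(a), 0.0, np.sin(a)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(a), 0.0, np.cos(a)]])
t_true = np.array([0.2, -0.1, 6.0])
model = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.2], [0.0, 1.1, -0.3],
                  [0.3, 0.2, 1.0], [1.2, 0.9, 0.1], [-0.8, 0.4, 0.7],
                  [0.5, -1.0, -0.6], [-0.4, -0.7, 0.9]])
cam = model @ R_true.T + t_true      # model points in the camera frame
proj = cam @ K.T                     # apply the pinhole intrinsics
pixels = proj[:, :2] / proj[:, 2:3]  # perspective divide
R_est, t_est = pnp_dlt(model, pixels, K)
```

With occlusion, fewer rows are available for the linear system; below the minimum number of correspondences the system is underdetermined, which is the failure mode described above.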
Dense Prediction Methods
This class of methods predicts dense representations that can be leveraged to estimate pose. A popular work in this class of methods, “PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation,” predicts per-pixel vector fields for each keypoint, which are then used in conjunction with RANSAC to vote for keypoint locations. These 2D keypoint locations are then used with PnP to solve for object pose, just as in sparse keypoint methods. Another popular work in this class of methods is “ROCA: Robust CAD Model Retrieval and Alignment from a Single Image,” which predicts 2D-3D correspondences in the Normalized Object Coordinate (NOC) space along with depth maps to regress pose parameters. A downside of the dense prediction approach is that it requires high-quality annotations that are often difficult to acquire outside of simulation.
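The voting idea can be sketched as follows, assuming each pixel carries a predicted unit vector pointing toward the keypoint. Two random pixels propose a hypothesis where their rays intersect, and every pixel then votes for hypotheses that its own vector agrees with. This is a simplified illustration rather than PVNet's implementation: it keeps only the single best hypothesis, and the function name `vote_keypoint`, the thresholds, and the synthetic data are assumptions.

```python
import numpy as np

def vote_keypoint(pixels, directions, n_hypotheses=64, cos_thresh=0.99, seed=0):
    """RANSAC-style voting for one 2D keypoint from per-pixel unit vectors."""
    rng = np.random.default_rng(seed)
    best_h, best_score = None, -1
    for _ in range(n_hypotheses):
        i, j = rng.choice(len(pixels), size=2, replace=False)
        # Intersect the two rays: p_i + a * v_i = p_j + b * v_j.
        A = np.column_stack([directions[i], -directions[j]])
        if abs(np.linalg.det(A)) < 1e-8:
            continue  # (near-)parallel rays give no usable intersection
        a, _ = np.linalg.solve(A, pixels[j] - pixels[i])
        h = pixels[i] + a * directions[i]
        # A pixel votes for h if its vector points (almost) at h.
        to_h = h - pixels
        norms = np.linalg.norm(to_h, axis=1)
        valid = norms > 1e-9
        cos = np.sum(to_h[valid] * directions[valid], axis=1) / norms[valid]
        score = int(np.sum(cos >= cos_thresh))
        if score > best_score:
            best_h, best_score = h, score
    return best_h, best_score

# Synthetic data: a 10x10 grid of pixels, 10% with corrupted vectors.
kp_true = np.array([40.0, 25.0])
xs, ys = np.meshgrid(np.arange(0.0, 100.0, 10.0), np.arange(0.0, 100.0, 10.0))
pixels = np.column_stack([xs.ravel(), ys.ravel()])
directions = kp_true - pixels
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
data_rng = np.random.default_rng(1)
bad = data_rng.choice(len(pixels), size=10, replace=False)
angles = data_rng.uniform(0.0, 2.0 * np.pi, size=10)
directions[bad] = np.column_stack([np.cos(angles), np.sin(angles)])

kp_est, votes = vote_keypoint(pixels, directions)
```

Because a hypothesis needs support from many pixels, a minority of wrong vectors has little influence on the winner. The resulting 2D keypoints would then be passed to a PnP solver.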
Iterative Pose Refinement
Iterative refinement can be applied on top of any of the previously mentioned methods. As the name suggests, it starts from an initial pose estimate, predicts a correction, and feeds the corrected pose back in as the new initialization. This is repeated for multiple steps until the predicted correction falls below a certain threshold or a step limit is reached; the ground-truth pose is available only during training, so it cannot serve as the stopping test at inference time. Although this method yields accurate pose estimates, it comes at a cost in efficiency, since the predictor must be run once per step. A popular work that uses this iterative refinement method is “DeepIM: Deep Iterative Matching for 6D Pose Estimation.”
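The refinement loop itself is simple and can be sketched independently of the model that produces the corrections. In the sketch below, `predict_update` is a hypothetical callable standing in for a learned refiner; it returns a 4x4 relative transform that is composed with the current pose. The toy refiner, the target pose, and the tolerance are made up so that the loop can be run end to end, and the toy peeks at the target only to play the role of a trained network.

```python
import numpy as np

def refine_pose(initial_pose, predict_update, max_iters=20, tol=1e-3):
    """Repeatedly apply predicted corrections until they become negligible."""
    pose = np.array(initial_pose, dtype=float)
    for step in range(1, max_iters + 1):
        delta = predict_update(pose)  # 4x4 relative transform
        pose = delta @ pose           # the corrected pose seeds the next step
        # Size of the correction: rotation angle (radians) and translation norm.
        cos_angle = np.clip((np.trace(delta[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
        angle = np.arccos(cos_angle)
        shift = np.linalg.norm(delta[:3, 3])
        if angle < tol and shift < tol:
            break
    return pose, step

# Toy stand-in for a learned refiner (hypothetical): it corrects the rotation
# fully and half of the remaining translation error on every call.
c, s = np.cos(np.deg2rad(20.0)), np.sin(np.deg2rad(20.0))
target = np.eye(4)
target[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
target[:3, 3] = [0.5, 0.0, 4.0]

def toy_update(pose):
    delta = np.eye(4)
    delta[:3, :3] = target[:3, :3] @ pose[:3, :3].T
    delta[:3, 3] = 0.5 * (target[:3, 3] - delta[:3, :3] @ pose[:3, 3])
    return delta

start = np.eye(4)
start[:3, 3] = [0.0, 0.0, 3.0]
final_pose, n_steps = refine_pose(start, toy_update)
```

The efficiency cost is visible in the structure of the loop: every additional step is another call to the predictor, so tighter tolerances trade runtime for accuracy.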