Project Motivation

Up to this point, research on human pose estimation has focused only on certain keypoints, such as the joints. However, human joints alone are not enough to create a dense 3D avatar of a person.
Ideally, a complete mesh of the entire body surface would be needed for such a task.
To create such a mesh, we would need to know the locations of as many surface points as possible. That is, we need the ability to detect keypoints beyond the human joints, such as points on the surface of the body.

There are several difficulties. For example, creating datasets with actual 3D meshes of people would take a great deal of time. It would also be time-consuming to annotate datasets with many keypoints on the human body, especially when parts of the body are occluded or the image is blurry.

Hands are a key component of human communication. In particular, gestures often play an important role in supporting face-to-face verbal communication. The goal of this project is to build a 2D dense keypoint detector for hand images in the wild. Hands are very difficult to detect because they are small relative to the body, and they are often partially or wholly occluded.

Research on hand keypoints has likewise been limited to joints and fingertips. Typically, 21 points are detected, as in systems like OpenPose.
However, these points are not dense enough to generate 3D hand meshes for avatars.
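For context, a common convention (used by OpenPose-style hand detectors) indexes these 21 points as the wrist plus four points per finger. A minimal sketch of that layout follows; the generic joint labels are an assumption for illustration (the thumb's anatomical joint names differ slightly):

```python
# Sketch of the standard 21-keypoint hand layout: the wrist (index 0)
# plus four points per finger. Joint labels here are generic; the
# thumb's actual joints are CMC/MCP/IP rather than MCP/PIP/DIP.
FINGERS = ["thumb", "index", "middle", "ring", "pinky"]
JOINTS = ["mcp", "pip", "dip", "tip"]

def hand_keypoint_names():
    names = ["wrist"]
    for finger in FINGERS:
        for joint in JOINTS:
            names.append(f"{finger}_{joint}")
    return names

names = hand_keypoint_names()
print(len(names))  # 21 = 1 wrist + 5 fingers * 4 points
```

The 115 caging points described below extend this sparse set with additional points around the nails, palm, and wrist.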

In this project, we aim to detect many more keypoints on hands.
In particular, we want to detect “caging points”, such as those shown in the image on the left. These points “cage” the finger joints, wrist, nails, and palm.
There are 115 keypoints in total, including the 21 canonical joint and fingertip keypoints.

Because hands are among the most difficult body parts for keypoint detection, we expect that this research can also be extended to dense keypoint detection on other parts of the body.