Motivation
Modeling and understanding pedestrian behavior is an important part of building safe and secure smart cities. It is a core component of video surveillance and has drawn increasing attention in recent years for applications such as pedestrian walking path prediction, traffic flow segmentation, crowd counting and segmentation, and abnormal event detection.
Why is it challenging?
Pedestrian behavior modeling is challenging, especially for scenes with crowds. Previous studies have shown that the walking behavior of an individual can be influenced by a variety of factors including scene layout (e.g. entrances, exits, walls, and obstacles), pedestrian beliefs (the choice of source and destination), and interactions with other moving pedestrians.
As seen in the figure above, the same action can have different meanings based on the context and situation. Thus, understanding the context of specific human actions can help predict anomalous activities like crimes in advance. This will ultimately enable us to build behavioral twins of intersections in smart cities.
Problem Statement
As explained in our motivation above, we aim to model and understand human behavior at traffic intersections. To do so, we leverage 2D pose estimation in multiple views, triangulation, and 3D trajectory forecasting to predict 3D pose trajectories.
This problem can be approached as either trajectory forecasting or action forecasting. This project focuses on trajectory forecasting.
The project goals are:
1. Predict the 3D trajectory and pose of each pedestrian in the scene.
2. Model each pedestrian with a 3D skeleton and not just 2D point trajectories.
3. Leverage high-resolution, time-synchronized, bird's-eye-view static cameras with known camera matrices.
The three steps to solving this problem are:
Step 1: Pose estimation
Pose estimation predicts the 2D joint locations of every pedestrian in each frame.
Step 2: Triangulation
Triangulation is used to obtain ground-truth 3D pose sequences for each pedestrian. Given the camera matrices and the 2D pose of a pedestrian in at least two camera views, we can estimate that pedestrian's 3D pose.
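To make the triangulation step concrete, here is a minimal sketch of linear least-squares triangulation for a single joint. It assumes each camera is described by a 3x4 projection matrix and that the 2D joint locations are given in the same pixel coordinates those matrices produce. The function names (project, solve_3x3, triangulate) and the two toy cameras are hypothetical and chosen for illustration only; the project's repository may use a different formulation, such as an SVD-based solver.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # 3x4 projection matrix, stored row by row


def project(P: Matrix, X: Sequence[float]) -> Tuple[float, float]:
    """Project a 3D point X = (x, y, z) to 2D with a 3x4 projection matrix P."""
    Xh = (X[0], X[1], X[2], 1.0)
    u, v, w = (sum(p * x for p, x in zip(row, Xh)) for row in P)
    return u / w, v / w


def solve_3x3(A: Matrix, b: List[float]) -> List[float]:
    """Solve A x = b for a 3x3 system by Gaussian elimination with partial pivoting."""
    n = 3
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate camera configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def triangulate(Ps: List[Matrix], pts2d: List[Tuple[float, float]]) -> List[float]:
    """Estimate one 3D joint from its 2D locations in at least two views."""
    if len(Ps) < 2 or len(Ps) != len(pts2d):
        raise ValueError("need one 2D point per camera and at least two cameras")
    rows, rhs = [], []
    for P, (u, v) in zip(Ps, pts2d):
        # Each view gives two linear equations in (x, y, z):
        #   u * (P[2] . Xh) = P[0] . Xh   and   v * (P[2] . Xh) = P[1] . Xh
        for coord, k in ((u, 0), (v, 1)):
            rows.append([coord * P[2][j] - P[k][j] for j in range(3)])
            rhs.append(P[k][3] - coord * P[2][3])
    # Least-squares solution via the normal equations (A^T A) X = A^T b.
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(3)]
    return solve_3x3(AtA, Atb)


# Two hypothetical cameras: one at the origin, one shifted 1 unit along x.
P1 = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
P2 = [[1.0, 0.0, 0.0, -1.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
joint = (0.5, 0.2, 4.0)
observations = [project(P1, joint), project(P2, joint)]
print(triangulate([P1, P2], observations))  # recovers the joint up to rounding
```

Applying triangulate to every joint of a pedestrian in every frame would yield a 3D pose sequence. With noisy 2D detections, adding more views to the same least-squares system generally makes the estimate more stable.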
Step 3: Trajectory Forecasting
The final block in our pipeline is trajectory forecasting. It uses the 3D pose sequence information to predict the most probable 3D trajectories for each pedestrian.
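As a point of reference for this step, the sketch below shows a constant-velocity baseline that extrapolates every joint of a 3D skeleton from the last two observed poses. This is not the project's forecasting model, which is not described here; it is a simple baseline of the kind a learned forecaster could be compared against. The Pose type and the function name constant_velocity_forecast are hypothetical.

```python
from typing import List, Tuple

Joint = Tuple[float, float, float]  # (x, y, z) position of one joint
Pose = List[Joint]                  # one 3D skeleton: a list of joints


def constant_velocity_forecast(history: List[Pose], horizon: int) -> List[Pose]:
    """Predict `horizon` future poses by linearly extrapolating each joint."""
    if len(history) < 2:
        raise ValueError("need at least two observed poses")
    if horizon < 0:
        raise ValueError("horizon must be non-negative")
    prev, last = history[-2], history[-1]
    # Per-joint displacement between the last two observed frames.
    velocity = [tuple(b - a for a, b in zip(p, q)) for p, q in zip(prev, last)]
    future = []
    for step in range(1, horizon + 1):
        future.append([
            tuple(c + step * v for c, v in zip(joint, vel))
            for joint, vel in zip(last, velocity)
        ])
    return future


# A toy two-joint skeleton moving 1 unit per frame along x.
observed = [
    [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)],
    [(1.0, 0.0, 0.0), (1.0, 0.0, 1.0)],
]
predicted = constant_velocity_forecast(observed, horizon=2)
```

The input and output share the same layout (a list of poses, each a list of 3D joints), which matches the shape of the 3D pose sequences produced by triangulation.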
Code
The code for our project can be found at https://github.com/Michael-MuChienHsu/pedestrian_prediction