Synthetic Data Collection
Framework
Our models need enough training data to generalize across different kinds of aircraft and different hangar environments.
Because it is difficult to obtain enough training images of real aircraft, we create a synthetic dataset from a 3D CAD model of a C-17 aircraft downloaded from the internet. We render the model from thousands of randomly generated viewpoints, producing a dataset of aircraft images paired with their corresponding poses.
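
The viewpoint sampling step can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: the source does not say how viewpoints are distributed, so uniform random rotations are an assumption, and the function name `sample_poses`, the translation ranges, and the `[tx, ty, tz, qw, qx, qy, qz]` label layout are all hypothetical.

```python
import numpy as np

def sample_poses(n, rng, xy_range=(-5.0, 5.0), depth_range=(30.0, 80.0)):
    """Return an (n, 7) array of pose labels [tx, ty, tz, qw, qx, qy, qz].

    The ranges are placeholder values in arbitrary scene units (assumption).
    """
    # Normalizing 4D Gaussian samples gives rotations that are uniform on the
    # unit 3-sphere, i.e. uniformly distributed unit quaternions.
    q = rng.normal(size=(n, 4))
    q /= np.linalg.norm(q, axis=1, keepdims=True)
    # q and -q describe the same rotation; force w >= 0 so that every
    # rendered image gets a single, unambiguous label.
    q[q[:, 0] < 0] *= -1.0

    # Random camera-relative position of the aircraft.
    xy = rng.uniform(*xy_range, size=(n, 2))
    z = rng.uniform(*depth_range, size=(n, 1))
    return np.hstack([xy, z, q])

rng = np.random.default_rng(0)
poses = sample_poses(5000, rng)  # one pose per rendered image
```

Each row would then be passed to the renderer to produce one image, and stored alongside it as the regression target.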

However, this only produces aircraft rendered on plain backgrounds. To make the data more realistic and to teach the network to distinguish foreground from background, we composite the rendered aircraft images onto plausible background images downloaded from the internet. Compositing with random backgrounds also artificially expands the dataset. We generate 5000 aircraft images and download 300 background images, so randomly pairing foregrounds with backgrounds yields up to 5000 × 300 = 1.5 million distinct images. The figure above shows this data generation pipeline.
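
A minimal sketch of the compositing step, assuming each render is saved as an RGBA image whose alpha channel marks the aircraft pixels and that the background has already been resized to match. The source does not describe how the blending is done, so `composite` and `random_pair` are hypothetical helpers.

```python
import numpy as np

def composite(foreground_rgba, background_rgb):
    """Alpha-blend an RGBA render over an RGB background of the same size."""
    rgb = foreground_rgba[..., :3].astype(np.float32)
    # Scale alpha to [0, 1]; keep a trailing axis so it broadcasts over RGB.
    alpha = foreground_rgba[..., 3:4].astype(np.float32) / 255.0
    bg = background_rgb.astype(np.float32)
    out = alpha * rgb + (1.0 - alpha) * bg
    return np.round(out).astype(np.uint8)

def random_pair(num_foregrounds, num_backgrounds, rng):
    """Pick one (foreground index, background index) pair at random."""
    return int(rng.integers(num_foregrounds)), int(rng.integers(num_backgrounds))

# Tiny 2x2 demo: one opaque red "aircraft" pixel on a gray background.
fg = np.zeros((2, 2, 4), dtype=np.uint8)
fg[0, 0] = [255, 0, 0, 255]
bg = np.full((2, 2, 3), 100, dtype=np.uint8)
image = composite(fg, bg)

# Number of distinct foreground/background pairings quoted in the text.
num_combinations = 5000 * 300
```

Drawing a fresh pair at every training step means the composites never need to be stored on disk; only the 5000 renders and 300 backgrounds do.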
Dataset

The figure above shows examples of images used as inputs to the model during training.
Model Training
Overview

We train a model to directly regress the pose given an image containing an aircraft. We parametrize the rotation using quaternions to improve the trainability and training stability of our model. This simple framework is shown in the figure above.
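
To illustrate how a 7-value network output can be read as a pose, the sketch below splits it into a translation and a quaternion and renormalizes the quaternion, since only unit quaternions represent rotations. The ordering (translation first, then `w, x, y, z`) is an assumption, as are the helper names; the source only states that the rotation is parametrized as a quaternion.

```python
import numpy as np

def split_pose(raw, eps=1e-8):
    """Split raw outputs (..., 7) into a translation (..., 3) and a unit quaternion (..., 4)."""
    raw = np.asarray(raw, dtype=np.float64)
    translation = raw[..., :3]
    quat = raw[..., 3:]
    # A regressor's output is unconstrained, so project it back onto the unit sphere.
    norm = np.linalg.norm(quat, axis=-1, keepdims=True)
    return translation, quat / np.maximum(norm, eps)

def quat_to_matrix(q):
    """Convert a unit quaternion (w, x, y, z) into a 3x3 rotation matrix."""
    w, x, y, z = q
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

# An unnormalized prediction whose quaternion part points along the identity rotation.
raw_output = np.array([1.0, 2.0, 30.0, 2.0, 0.0, 0.0, 0.0])
t, q = split_pose(raw_output)
R = quat_to_matrix(q)
```

Compared with regressing the nine entries of a rotation matrix, the quaternion needs only this single normalization to become a valid rotation.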
Training Configuration
The table below lists the training configuration:
| Setting | Value |
| --- | --- |
| Encoder | ResNet-34 without final classifier |
| Regressor | Fully connected layer with output size 7 |
| Loss Function | L1 loss |
| Learning Rate | 1e-3 with step decay after each epoch |
| Batch Size | 16 |
| Epochs | 100 |
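
The loss and learning-rate schedule from this configuration can be sketched in a few lines. The decay factor is not given in the source, so `gamma=0.95` is purely a placeholder assumption, and `l1_loss` and `step_decay_lr` are hypothetical helper names.

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute error between predicted and ground-truth pose vectors."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(target))))

def step_decay_lr(epoch, base_lr=1e-3, gamma=0.95):
    """Learning rate after `epoch` completed epochs, decayed once per epoch.

    gamma is an assumed value; the source only says "step decay after each epoch".
    """
    return base_lr * gamma ** epoch

# Learning rate used in each of the 100 training epochs.
schedule = [step_decay_lr(epoch) for epoch in range(100)]

# Example: L1 loss on a batch of two 7-value pose predictions.
pred = np.zeros((2, 7))
target = np.ones((2, 7))
loss = l1_loss(pred, target)
```

In PyTorch, a schedule of this shape would typically be expressed with `torch.optim.lr_scheduler.StepLR` using `step_size=1`, again with whatever decay factor was actually used.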
