Experiments

Dataset

We conducted experiments on an aircraft fuselage to evaluate the performance of our pipeline (see image below). Artificial dents were introduced manually using a hammer, with depths ranging from 1 to 5 mm and an average depth of approximately 2 mm. In total, 391 dents were created across a 3 m × 0.75 m area. Scanning the entire fuselage with our pipeline took roughly 10 minutes.

The aircraft fuselage used in our experiments

Rig Setup

To set up the two-camera stereo system, we built a rig that holds both cameras and the laser (see images below). This ensures that the cameras and laser share a common baseline and maintain fixed relative positions. Note that this configuration is intended only as a convenient test platform for evaluating our pipeline. For practical deployment, the system could be mounted on a drone carrying the cameras and the laser.

Rig setup that holds cameras and the laser

First Stage Results

The results of the first-stage pipeline are shown below. For images captured at a distance of 1.5 m, we achieve a recall of 93.6%. When the camera is moved farther from the aircraft, to approximately 2.5 m, recall drops sharply to 76.2%. This decline occurs because, at greater distances, many laser-line deformations shrink below 1 px and become difficult for computer vision algorithms to detect. Precision at this stage is also low (73.6% at 1.5 m and 69.0% at 2.5 m): artifacts such as screws, holes, and rivets also distort the laser line, producing numerous false positives. These are filtered out in the second-stage pipeline.
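To make the distance effect concrete, here is a minimal sketch under a simplified laser-triangulation model. The model and every rig parameter below are assumptions for illustration only; the write-up does not specify the focal length or the laser-to-camera offset. In this model a surface point at distance Z images at an offset of f · b / Z, so a dent of depth d shifts the laser line by roughly f · b · d / Z² pixels.

```python
def line_shift_px(focal_px, baseline_m, dent_depth_m, distance_m):
    """Approximate image shift (in pixels) of the laser line caused by a dent.

    Simplified triangulation model (an assumption, not taken from the source):
    a point at distance Z images at offset f * b / Z, so a small depth change d
    moves it by about f * b * d / Z**2.
    """
    return focal_px * baseline_m * dent_depth_m / distance_m ** 2


# Hypothetical rig parameters, chosen only for illustration.
FOCAL_PX = 4000.0      # focal length expressed in pixels
BASELINE_M = 0.5       # laser-to-camera offset in metres
DENT_DEPTH_M = 0.002   # the ~2 mm average dent depth reported in the Dataset section

near = line_shift_px(FOCAL_PX, BASELINE_M, DENT_DEPTH_M, 1.5)
far = line_shift_px(FOCAL_PX, BASELINE_M, DENT_DEPTH_M, 2.5)

# The ratio depends only on the two distances, not on the hypothetical f and b.
ratio = far / near

print(f"shift at 1.5 m: {near:.2f} px")
print(f"shift at 2.5 m: {far:.2f} px")
print(f"far / near: {ratio:.2f}")
```

Whatever the real rig parameters are, this model predicts that moving from 1.5 m to 2.5 m shrinks the deformation to (1.5 / 2.5)² = 36% of its original size, which is consistent with more dents falling below the 1 px threshold at the larger distance.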

Images taken with the cameras. The under-exposed image (right) is the input to the first-stage pipeline.

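| Distance | TP | FN | FP | Precision | Recall |
|----------|-----|----|-----|-----------|--------|
| 1.5 m | 366 | 25 | 131 | 73.6% | 93.6% |
| 2.5 m | 298 | 93 | 134 | 69.0% | 76.2% |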

Result of the first-stage pipeline
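The precision and recall figures in the first-stage table follow from the TP, FN, and FP counts. The short sketch below recomputes them; the function and variable names are illustrative and not part of the original pipeline.

```python
def precision_recall(tp, fn, fp):
    """Return (precision, recall) as fractions in [0, 1]."""
    precision = tp / (tp + fp)  # share of detections that are real dents
    recall = tp / (tp + fn)     # share of real dents that were detected
    return precision, recall


# Counts copied from the first-stage results table.
first_stage_counts = {
    "1.5 m": {"tp": 366, "fn": 25, "fp": 131},
    "2.5 m": {"tp": 298, "fn": 93, "fp": 134},
}

first_stage_metrics = {
    distance: precision_recall(**counts)
    for distance, counts in first_stage_counts.items()
}

for distance, (precision, recall) in first_stage_metrics.items():
    print(f"{distance}: precision={precision:.1%}, recall={recall:.1%}")
```

In both rows TP + FN equals 391, the total number of dents created on the fuselage.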

Second Stage Results

For the second stage, we use the patches extracted in the first stage as our dataset. We evaluate the performance of the ML classifier using 5-fold cross-validation. The resulting confusion matrix is shown below, with the classifier achieving an accuracy of 89.6%. Note that this accuracy is somewhat limited by the relatively small size of our dataset (approximately 1,000 patches). We expect that expanding the dataset would substantially improve the classifier’s performance.

Input patches to the second stage pipeline

| | Predicted Negative | Predicted Positive |
|-----------------|--------------------|--------------------|
| Actual Negative | 0.89 | 0.10 |
| Actual Positive | 0.11 | 0.90 |

Confusion Matrix of the Second Stage ML classifier
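For readers unfamiliar with the evaluation protocol, the following is a minimal, dependency-free sketch of 5-fold cross-validation that accumulates a confusion matrix over the held-out folds and then row-normalizes it, as in the matrix above. The source does not name the classifier or the tooling used, so `fit_threshold`, `predict_threshold`, and the toy 1-D data are hypothetical stand-ins, not the authors' model or patches.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_confusion(features, labels, fit, predict, k=5):
    """Accumulate counts[actual][predicted] over k held-out folds (0 = negative, 1 = positive)."""
    counts = [[0, 0], [0, 0]]
    for held_out in k_fold_indices(len(labels), k):
        held_out_set = set(held_out)
        train = [i for i in range(len(labels)) if i not in held_out_set]
        # Train only on the samples outside the held-out fold.
        model = fit([features[i] for i in train], [labels[i] for i in train])
        for i in held_out:
            counts[labels[i]][predict(model, features[i])] += 1
    return counts


def row_normalize(counts):
    """Divide each row by its total so every row sums to 1."""
    return [[c / sum(row) for c in row] for row in counts]


def accuracy(counts):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in counts)
    return (counts[0][0] + counts[1][1]) / total


# Hypothetical stand-in classifier: a threshold halfway between the class means.
def fit_threshold(train_x, train_y):
    neg = [x for x, y in zip(train_x, train_y) if y == 0]
    pos = [x for x, y in zip(train_x, train_y) if y == 1]
    return (sum(neg) / len(neg) + sum(pos) / len(pos)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


# Toy 1-D data, purely illustrative: values below 0.5 are negatives.
features = [i / 100 for i in range(100)]
labels = [0 if x < 0.5 else 1 for x in features]

counts = cross_validated_confusion(features, labels, fit_threshold, predict_threshold)
normalized = row_normalize(counts)
print(normalized, accuracy(counts))
```

Because every patch is held out exactly once, the accumulated matrix covers the full dataset while each prediction still comes from a model that never saw that patch during training.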

Final Results

The table below shows the performance of the whole pipeline. Compared with the first stage alone, the second stage improves precision by roughly 24 percentage points at the cost of lowering recall by roughly 8 percentage points.

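| Distance | TP | FN | FP | Precision | Recall |
|----------|-----|-----|----|-----------|--------|
| 1.5 m | 329 | 62 | 14 | 95.8% | 84.2% |
| 2.5 m | 268 | 122 | 14 | 94.7% | 68.6% |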

Result of the whole pipeline
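The stated precision gain and recall loss can be checked against the two result tables. The sketch below uses the reported percentages as given and averages the change over both distances; that averaging is an assumption about how the summary figures were obtained.

```python
# Reported (precision, recall) in percent, copied from the two result tables.
first_stage = {"1.5 m": (73.6, 93.6), "2.5 m": (69.0, 76.2)}
whole_pipeline = {"1.5 m": (95.8, 84.2), "2.5 m": (94.7, 68.6)}


def stage_deltas(before, after):
    """Return per-distance (precision gain, recall loss) in percentage points."""
    return {
        distance: (
            after[distance][0] - before[distance][0],
            before[distance][1] - after[distance][1],
        )
        for distance in before
    }


deltas = stage_deltas(first_stage, whole_pipeline)
mean_precision_gain = sum(gain for gain, _ in deltas.values()) / len(deltas)
mean_recall_loss = sum(loss for _, loss in deltas.values()) / len(deltas)

for distance, (gain, loss) in deltas.items():
    print(f"{distance}: precision +{gain:.1f} points, recall -{loss:.1f} points")
```

Averaged over the two distances, the precision gain comes to about 24 points and the recall loss to about 8.5 points, in line with the summary above.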