
Boat Classification and Attention

Python PyTorch Captum

This project focuses on the development of a computer vision system for boat classification. The main objective is to detect the presence of boats in images and determine whether they are docked or navigating. We present the decisions made regarding data transformations, architecture, hyperparameters, and results obtained in this work.

1. Methodology

1.1. Image Preprocessing: Data Augmentation

The data transformations were chosen with the domain characteristics in mind: horizontal flips, rotations of at most 5 degrees, random adjustments of brightness, contrast, and saturation, and Gaussian blur. We highlight RandomCutout, a custom class that cuts out a random rectangle anywhere in the image without exceeding a given cut ratio. All of these transformations were applied stochastically using RandomApply.
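The original RandomCutout code is not shown in this document, but a minimal sketch of such a transform could look like the following (the `max_ratio` and `fill` parameters are illustrative assumptions):

```python
import random
import torch

class RandomCutout:
    """Cut out a random rectangle from a tensor image of shape (C, H, W).

    Hypothetical re-implementation of the class described in the text:
    the rectangle's height and width never exceed `max_ratio` times the
    image's height and width, and the cut region is filled with `fill`.
    """

    def __init__(self, max_ratio: float = 0.2, fill: float = 0.0):
        self.max_ratio = max_ratio
        self.fill = fill

    def __call__(self, img: torch.Tensor) -> torch.Tensor:
        _, h, w = img.shape
        cut_h = random.randint(1, max(1, int(h * self.max_ratio)))
        cut_w = random.randint(1, max(1, int(w * self.max_ratio)))
        top = random.randint(0, h - cut_h)
        left = random.randint(0, w - cut_w)
        out = img.clone()
        out[:, top:top + cut_h, left:left + cut_w] = self.fill
        return out
```

Such a transform composes naturally with the stochastic application mentioned above, e.g. `transforms.RandomApply([RandomCutout()], p=0.5)`.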

Example of transformed images

Figure 1: Example of transformed images.

Since a model pretrained on ImageNet images is used, normalization and resizing techniques based on that dataset's parameters were adopted.

We chose to resize images directly (accepting some loss of quality) rather than add padding, because during training the model focused on the added black borders instead of the image content. We can observe this with the help of Captum. Figure 2 shows which regions the pretrained model without data augmentation attended to most when predicting a test image (the model shown is the one with the lowest validation loss across all k folds).

Attention in model regions (with padding)

Figure 2: Attention in model regions (with padding).

Figure 3 shows how the same model, changing only the way the data is transformed, focuses on more specific parts of the image. This approach also improves the metrics on the test set (Table 1).

Attention in model regions (without padding)

Figure 3: Attention in model regions (without padding).

| Metric | Model with Padding | Model without Padding |
|---|---|---|
| Test Accuracy | 0.93 | 0.95 |
| Test Recall | 0.9232 | 0.9375 |
| Test F1 | 0.9287 | 0.9461 |
| Test Precision | 0.9367 | 0.9605 |
| Test AUC | 0.9738 | 0.9905 |

Table 1: Comparison of metrics between model with and without padding.

1.2. Neural Network Architecture

The architecture used is MobileNet, a relatively lightweight network, with its last layer replaced to perform binary classification.

1.3. Training and Validation Strategies

Stratified k-fold with k=5 was used so that every fold kept a balanced class distribution. Additionally, a sampler was used during training to balance the classes within each batch. Early stopping was applied, and the best model of each fold was stored according to the validation loss.
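The fold/sampler setup described above can be sketched as follows (the `labels` array and all variable names are illustrative; the original code is not shown in the text):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from torch.utils.data import WeightedRandomSampler

# Illustrative class labels for the full training set.
labels = np.array([0] * 12 + [1] * 8)

# Stratified folds keep the class ratio roughly constant in each split.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, val_idx in skf.split(np.zeros(len(labels)), labels):
    train_labels = labels[train_idx]
    # Per-sample weights inversely proportional to class frequency,
    # so each training batch is approximately class-balanced.
    class_counts = np.bincount(train_labels)
    sample_weights = 1.0 / class_counts[train_labels]
    sampler = WeightedRandomSampler(
        weights=sample_weights.tolist(),
        num_samples=len(train_labels),
        replacement=True,
    )
    # DataLoader(train_subset, batch_size=..., sampler=sampler)
```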

The choice of these hyperparameters depended mainly on the weight initialization: models with random weights converged much more slowly.

| Hyperparameter | Pretrained | Random weights |
|---|---|---|
| Epochs | 12 | 25 |
| Patience | 3 | 11 |
| Learning rate | 0.001 | 0.01 |
| Optimizer | Adam | Adam |
| Criterion | Cross-entropy | Cross-entropy |

Table 2: Selected hyperparameters

2. Experiments and Results

2.1. Boat/No-boat Classification Model

Below, we present the graphs of metric evolution in the validation set for the four models (with/without data augmentation and with/without random weights).

I. Pretrained model without data augmentation (preT_wF):

Validation set metrics pretrained without data augmentation

Figure 5: Validation set metrics pretrained without data augmentation.

II. Pretrained model with data augmentation (preT_wT):

Validation set metrics pretrained with data augmentation

Figure 6: Validation set metrics pretrained with data augmentation.

III. Random weights without data augmentation (preF_wF):

Validation set metrics random weights without data augmentation

Figure 7: Validation set metrics random weights without data augmentation.

IV. Random weights with data augmentation (preF_wT):

Validation set metrics random weights with data augmentation

Figure 8: Validation set metrics random weights with data augmentation.

We report the test-set metrics (Table 3) and ROC curves (Figures 9, 10, 11, and 12) for the k models obtained with each initialization (as noted above, in each fold we kept the model with the lowest validation loss).

| Model | Accuracy µ | Accuracy σ | Recall µ | Recall σ | F1 µ | F1 σ | Precision µ | Precision σ | AUC µ | AUC σ |
|---|---|---|---|---|---|---|---|---|---|---|
| preT_wF | 0.93 | 0.02 | 0.9083 | 0.0250 | 0.9197 | 0.0227 | 0.9446 | 0.0136 | 0.9688 | 0.0116 |
| preT_wT | 0.92 | 0.02 | 0.9092 | 0.0279 | 0.9139 | 0.0225 | 0.9260 | 0.0142 | 0.9805 | 0.0061 |
| preF_wF | 0.85 | 0.03 | 0.8232 | 0.0332 | 0.8354 | 0.0338 | 0.8793 | 0.0311 | 0.8992 | 0.0410 |
| preF_wT | 0.84 | 0.03 | 0.8149 | 0.0322 | 0.8269 | 0.0330 | 0.8752 | 0.0280 | 0.9195 | 0.0190 |

Table 3: Mean and standard deviation of best model metrics per fold in test.

ROC Curve preT_wF ROC Curve preT_wT ROC Curve preF_wF ROC Curve preF_wT

Figures 9, 10, 11, and 12: ROC curves for the different models.

Finally, we show the metrics of the best model for each initialization (best meaning the one with the lowest loss on the test set, rather than on validation).

| Model | Accuracy | Recall | F1 | Precision | AUC |
|---|---|---|---|---|---|
| preT_wF | 0.9492 | 0.9375 | 0.9461 | 0.9605 | 0.9905 |
| preT_wT | 0.9322 | 0.9232 | 0.9287 | 0.9367 | 0.9893 |
| preF_wF | 0.8983 | 0.8750 | 0.8891 | 0.9268 | 0.9333 |
| preF_wT | 0.8644 | 0.8464 | 0.8550 | 0.8731 | 0.9262 |

Table 4: Model metrics

Interestingly, all four models classify the image from Figure 3 correctly, but each pays attention to different parts of the image.

Attention model preT_wF Attention model preT_wT Attention model preF_wF Attention model preF_wT

Figures 13, 14, 15, and 16: Attention of models preT_wF, preT_wT, preF_wF, and preF_wT.

2.2. Docked/Not-docked Boat Classification Model

For this model, we start from the previously trained boat/no-boat model with the best loss and fine-tune it following the same process; instead of four different models, we use only the one already trained with data augmentation.

Training metrics on the validation set:

Validation set metrics pretrained with data augmentation

Figure 17: Validation set metrics pretrained with data augmentation.

Metrics and ROC curves on the test set of the best models per fold:

| Model | Accuracy µ | Accuracy σ | Recall µ | Recall σ | F1 µ | F1 σ | Precision µ | Precision σ | AUC µ | AUC σ |
|---|---|---|---|---|---|---|---|---|---|---|
| preT_wT | 0.87 | 0.05 | 0.8687 | 0.0484 | 0.8679 | 0.0496 | 0.8835 | 0.0342 | 0.9427 | 0.0218 |

Table 5: Mean and standard deviation of best model metrics per fold in test.

ROC preT_wT

Figure 18: ROC preT_wT.

Metrics of the best model on the test set:

| Model | Accuracy | Recall | F1 | Precision | AUC |
|---|---|---|---|---|---|
| preT_wT | 0.9189 | 0.9196 | 0.9189 | 0.9196 | 0.9766 |

Table 6: Statistics of the preT_wT model

Finally, it is interesting to see again how the attention of the new model changes compared to the previous pretrained one:

Original image Attention model preT_wF Attention model preT_wT

Figures 19, 20, and 21: Original image, attention model preT_wF (boats), and attention model preT_wT (docked boats).