Road defect detection using deep active learning
May 4 · 5 min read

At Element AI, our teams use our active learning library BaaL to move quickly from labelling to production models. If you would like a good introduction to active learning, we recommend reading our initial release blog post. Recently, the ability to detect road surface defects was identified as an interesting use case for active learning. The end goal was to automatically determine whether a segment of road needed to be resurfaced; more specifically, we needed a rough estimate of the defect area. For this reason, we treated this as a semantic segmentation problem.

Data definition

We were able to find a public dataset, but unfortunately, the labels provided were for bounding boxes only. Consequently, to generate the polygon labels required for semantic segmentation, we involved the Element AI data labelling team to help us define our task.

We identified three classes for detection, two defect types plus one additional road feature:

  1. Cracks
  2. Patches
  3. Manholes
*Example images of cracks, patches and manholes.*

Active learning model definition

To perform active learning, we rely on MC-Dropout (Gal et al.) and BALD (Houlsby et al.) to estimate the uncertainty of each unlabelled sample.

Our model is a U-Net (Ronneberger et al.), to which we added a Dropout layer before the last convolution. This Dropout layer allows us to use MC-Dropout (more on that later). We trained our network using standard weighted cross-entropy, where the class weights are recomputed automatically from the proportion of pixels per class at each active learning step.
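
As an illustration, here is a minimal PyTorch sketch of the two ideas above (hypothetical code, not the production model): a Dropout layer placed just before the final 1x1 convolution, and cross-entropy weights derived from the per-class pixel proportions of the currently labelled set.

```python
import torch
import torch.nn as nn

def class_weights(masks: torch.Tensor, n_classes: int) -> torch.Tensor:
    """Inverse-frequency weights from the pixel proportion of each class.

    `masks` is an integer tensor of shape (N, H, W). Classes absent from
    the batch get weight 0 so they do not produce infinities.
    """
    counts = torch.bincount(masks.flatten(), minlength=n_classes).float()
    freqs = counts / counts.sum()
    weights = torch.where(freqs > 0, 1.0 / freqs, torch.zeros_like(freqs))
    return weights / weights.sum()  # normalise so the weights sum to 1

def make_head(in_ch: int, n_classes: int) -> nn.Module:
    """Segmentation head with Dropout before the last 1x1 convolution,
    which is what later makes MC-Dropout possible at inference time."""
    return nn.Sequential(nn.Dropout2d(p=0.5), nn.Conv2d(in_ch, n_classes, 1))

masks = torch.randint(0, 4, (2, 64, 64))   # 4 classes incl. background
w = class_weights(masks, n_classes=4)
criterion = nn.CrossEntropyLoss(weight=w)
logits = make_head(16, 4)(torch.randn(2, 16, 64, 64))
loss = criterion(logits, masks)
```

In a full pipeline, `class_weights` would be called on the labelled pool at the start of each active learning step, so the loss tracks the evolving class balance.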

Without Active Learning (traditional method)

Linear process where labelling takes place before the model is trained.


With Active Learning (AI-enhanced method)

AI model is always in the loop and learns throughout the labelling process.


MC-Dropout and BALD

Monte-Carlo Dropout, otherwise known as MC-Dropout, is a technique proposed by Gal et al. that estimates the posterior distribution of the model's weights using Dropout. It can be shown that a network trained with Dropout acts approximately like a Bayesian ensemble. By drawing Monte-Carlo samples from this posterior distribution, we get a distribution of predictions whose variance is high when the model parameters are uncertain. Note that this technique can only estimate the epistemic uncertainty.
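
A minimal sketch of MC-Dropout in PyTorch (the function name and toy model are illustrative): the model is put in eval mode, the Dropout layers alone are switched back to train mode, and several stochastic forward passes are stacked.

```python
import torch
import torch.nn as nn

def mc_dropout_predict(model: nn.Module, x: torch.Tensor, n_samples: int = 20):
    """Draw `n_samples` stochastic forward passes with Dropout kept active.

    Returns softmax probabilities of shape (n_samples, N, C, H, W);
    their spread across the first axis reflects epistemic uncertainty.
    """
    model.eval()                       # freeze BatchNorm statistics, etc.
    for m in model.modules():          # ...but re-enable every Dropout layer
        if isinstance(m, (nn.Dropout, nn.Dropout2d)):
            m.train()
    with torch.no_grad():
        preds = torch.stack(
            [torch.softmax(model(x), dim=1) for _ in range(n_samples)]
        )
    return preds

# Toy model: Dropout placed before the final 1x1 convolution, as in the post.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.Dropout2d(0.5), nn.Conv2d(8, 4, 1))
probs = mc_dropout_predict(model, torch.randn(2, 3, 32, 32), n_samples=20)
```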

Bayesian Active Learning by Disagreement (BALD) is a heuristic that can be used with MC-Dropout to quantify the uncertainty of a predictive distribution. A nice property of BALD is that, unlike variance, it doesn't make a Gaussian assumption about that distribution.


Something to remember when using active learning in a real-world project is that uncertainty must be recomputed as fast as possible, so the number of MC estimations has to be limited because they slow down the process. In this case, we capped the number of MC samples at 20. In our experiments, this gave a good trade-off between speed and the quality of the computed uncertainties.
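
Concretely, BALD is the entropy of the mean prediction minus the mean entropy of the individual MC predictions (the mutual information between predictions and model parameters). Here is a NumPy sketch over a stack of 20 MC-Dropout outputs (a hypothetical helper, not BaaL's implementation):

```python
import numpy as np

def bald_score(probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """BALD: mutual information between predictions and model parameters.

    `probs` has shape (n_mc, n_classes, H, W): softmax outputs from the
    MC-Dropout passes. Returns a per-pixel score of shape (H, W); high
    values mean the stochastic passes disagree.
    """
    mean_p = probs.mean(axis=0)                                   # (C, H, W)
    entropy_of_mean = -(mean_p * np.log(mean_p + eps)).sum(axis=0)
    mean_of_entropy = -(probs * np.log(probs + eps)).sum(axis=1).mean(axis=0)
    return entropy_of_mean - mean_of_entropy

rng = np.random.default_rng(0)
# 20 MC passes over a 4-class, 8x8 prediction map.
logits = rng.normal(size=(20, 4, 8, 8))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
scores = bald_score(probs)
```

If every MC pass agrees exactly, the score is zero; the unlabelled samples with the highest scores are sent to the labellers first.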

Labelling with active learning

At Element AI, we have a team of labellers with whom the active learning team collaborates.

They work closely with the data scientist in charge of the project so that the data can be easily integrated into the machine learning pipeline. We believe that this ongoing conversation helps labellers produce high-impact labels and helps the machine learning experts better understand the data.

Cold-start problem

Active learning suffers from the cold-start problem: it cannot be used until a number of samples have already been labelled. While some approaches such as coresets (Sener and Savarese) and few-shot learning (Snell et al.) have been proposed, we have not yet experimented with them. Consequently, we randomly label a small amount of data to create a test set and an initial training set. In future work, we aim to integrate coresets as the first step of our active learning pipeline.
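
A random cold-start split can be sketched as follows (the pool sizes below are illustrative, not the project's exact numbers):

```python
import numpy as np

def cold_start_split(n_total: int, n_initial: int, n_test: int, seed: int = 0):
    """Randomly pick an initial labelled pool and a fixed test set.

    Returns (initial_idx, test_idx, unlabelled_idx) as index arrays;
    everything not drawn stays in the unlabelled pool, from which active
    learning will later select samples by uncertainty.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_total)
    test_idx = perm[:n_test]
    initial_idx = perm[n_test:n_test + n_initial]
    unlabelled_idx = perm[n_test + n_initial:]
    return initial_idx, test_idx, unlabelled_idx

init, test, pool = cold_start_split(n_total=9900, n_initial=100, n_test=90)
```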

When to stop

We monitored several metrics during labelling, including:

  • Precision
  • Recall
  • Validation Loss
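
One simple way to detect that the process has converged is a patience rule on the validation loss; the sketch below is illustrative, not the team's exact criterion.

```python
def has_converged(val_losses, patience: int = 3, min_delta: float = 1e-3) -> bool:
    """Return True when the validation loss has not improved by at least
    `min_delta` over the last `patience` active learning steps."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    return recent_best > best_before - min_delta

# Loss has been flat for three steps: time to stop labelling.
done = has_converged([1.0, 0.6, 0.6, 0.6, 0.6])
```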

We explained the metrics to the labelling team and showed them what a converged process looked like.

This way, the labelling team can be autonomous and connect with the data science expert assigned to the project when the process converges or when the labelling budget is reached.


After a few days of labelling, we decided to stop the labelling effort as the model converged. Here are our findings.

BaaL results

Left: Precision and recall improved for cracks, patches and manhole covers as the number of labelled samples increased. Right: The model prioritized labelling the samples that would most improve its performance, meaning it trained in significantly less time and on fewer data points.

Time saved

There are 9,900 samples in this dataset; we labelled 900 for training as well as 90 for validation.

Based on a market study aggregating crowdsourced data with other data sources, the cost of labelling one image is estimated at 50 cents when requiring high-quality labelling.

Based on the public dataset used, we also estimate the time to label one image at 30 seconds. In the end, because we only labelled 990 out of 9,900 images, we saved approximately $4,500 and 75 hours.
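
The arithmetic behind these savings can be checked directly; the exact figures are $4,455 and 74.25 hours, which the totals above round.

```python
# Back-of-envelope savings from skipping the unlabelled images.
total, labelled = 9900, 900 + 90      # training + validation labels
skipped = total - labelled            # images never sent to labellers
cost_saved = skipped * 0.50           # $0.50 per high-quality label
hours_saved = skipped * 30 / 3600     # 30 seconds per image
```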


Here are some predictions from the resulting models trained on less than 10% of the data.



We just released BaaL 1.2.0, which includes our new calibration module and more! See the changelist for details.

Authors: Parmida Atighehchian, Frédéric Branchaud-Charron and Lorne Schell