Accelerating the Creation of AI Models to Combat COVID-19 with Element AI Orkestrator
Issam Laradji
July 3 · 5 min read


In collaboration with the University of British Columbia (UBC), the Vancouver Coastal Health Research Institute (VCHRI), radiologists at Vancouver General Hospital (VGH), Amazon Web Services (AWS) and SapienML, Element AI postdoctoral researcher Issam Laradji contributed to a project to develop an open-source artificial intelligence (AI) model for analyzing COVID-19 infections in chest CT scans.

As an AI researcher, the potential for applying deep learning methods to medical science is truly exciting. So when I was recently invited to participate in a project that aims to use AI to diagnose COVID-19 based on lung scans, I jumped at the opportunity. Led by Dr. Savvas Nicolaou and Dr. William Parker, both of the University of British Columbia (UBC), the initiative collected hundreds of patient lung scans from countries around the world and enlisted radiologists and medical students to assess and label each one. Once these images were properly labelled, my collaborators and I used them to build AI models that can analyze new lung scans without the need for a medical expert to weigh in.

Using AI to more efficiently find patterns and matches in large numbers of CT scans of lungs would mean that diagnoses could potentially be made faster and at lower cost. But developing these kinds of AI models is not easy, and you need the right tools to make a difference.

Element AI is a company that develops transformative AI solutions to bridge human and machine collaboration. Our fundamental research teams are constantly launching compute-intensive experiments and projects that can put a strain on our IT infrastructure. To address the growing need for compute power and better access to our GPU clusters, we developed Element AI Orkestrator, software that actively manages the allocation of GPU resources. In this article, I'll explain some of the obstacles we faced in this research project and how Element AI Orkestrator was indispensable in helping us overcome them.

When you're dealing with a project of this scale, one of the most significant challenges is working with medical datasets, which often have imperfect segmentation labels. Segmentation labels identify the infected regions on a particular slice of a scan, and they are often incomplete and very noisy. The medical professionals who annotate these images sometimes label the infected regions in different ways, which can result in inconsistencies that make the data difficult for AI models to parse. Another challenge is that labelling a single slice of a scan (a 2D cross-section at a specific height of, in this case, the lungs) can take a long time. The outcome is that you end up with very limited annotated data.
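One simple way to quantify the labelling inconsistencies described above is to measure the overlap between two annotators' masks for the same slice. The sketch below is illustrative only (the masks and numbers are made up, and this is not the project's actual code); it computes the Dice score between two binary segmentation masks, where a low score flags slices whose labels disagree and may need review.

```python
import numpy as np

def dice_score(mask_a, mask_b):
    """Dice overlap between two binary segmentation masks (True = infected pixel)."""
    intersection = np.logical_and(mask_a, mask_b).sum()
    total = mask_a.sum() + mask_b.sum()
    return 2.0 * intersection / total if total > 0 else 1.0

# Two hypothetical annotators label the same 512x512 CT slice slightly differently.
rng = np.random.default_rng(0)
annotator_1 = rng.random((512, 512)) > 0.9
annotator_2 = rng.random((512, 512)) > 0.9

# A low Dice score signals that the two sets of labels disagree on this slice.
print(round(dice_score(annotator_1, annotator_2), 3))
```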

For this project, I launched hundreds of computationally heavy deep learning experiments every few days. This meant running a large number of hyperparameter searches: I searched over about 20 different model architectures and a range of batch sizes, loss functions and data augmentation methods. Without support from Element AI Orkestrator, running these experiments would have been much more difficult, if not impossible. Using this software together with our Python library Haven, we can run and manage thousands of experiments in parallel with just a few simple commands. Of course, one of the challenges of running so many experiments is understanding which ones are succeeding and which are doing poorly.
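To give a sense of how quickly such a search grows, the sketch below expands a small, hypothetical hyperparameter grid into one configuration per experiment. The model names and values are made up for illustration, and this is plain Python, not the actual Haven API:

```python
from itertools import product

# Hypothetical search space, for illustration only; the actual project
# searched over roughly 20 architectures plus many other settings.
search_space = {
    "model": ["unet", "fcn8", "deeplabv3"],
    "batch_size": [2, 4, 8],
    "loss": ["cross_entropy", "dice"],
    "augmentation": [True, False],
}

# Expand the grid into one config dict per experiment.
keys = list(search_space)
exp_configs = [dict(zip(keys, values)) for values in product(*search_space.values())]

print(len(exp_configs))  # 3 * 3 * 2 * 2 = 36 experiments from this small grid alone
```

Even this toy grid yields 36 runs; add a few more architectures and augmentation settings and the count quickly reaches the thousands that a scheduler like Orkestrator is built to manage.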

Ork and Haven

These tools allowed us to monitor all of our experiments at the same time and organize them so that they were easy to filter and interact with. Failures could be clustered together, so we could investigate the logs to understand why they failed, then debug and rerun just those specific experiments. Conversely, we could identify and extract the handful of most interesting experiments out of thousands, displayed in a dynamically generated leaderboard with plots that made it easy to evaluate their performance. In the end, I was able to narrow down to a small subset of successful experiments. Ultimately, the platform provided the scale the project needed and improved the overall workflow, eliminating a lot of the engineering I would otherwise have needed to do and allowing me to launch and effectively manage a huge number of experiments in parallel.
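The kind of bookkeeping described above can be sketched in a few lines of plain Python. The records, field names, and error messages below are hypothetical, not Orkestrator's or Haven's actual data format; the sketch only shows the idea of clustering failures by error message and ranking finished runs into a leaderboard.

```python
from collections import defaultdict

# Hypothetical experiment records; a real run would load these from saved results.
experiments = [
    {"id": 1, "model": "unet", "status": "done", "val_dice": 0.71},
    {"id": 2, "model": "fcn8", "status": "failed", "error": "CUDA out of memory"},
    {"id": 3, "model": "unet", "status": "done", "val_dice": 0.78},
    {"id": 4, "model": "deeplabv3", "status": "failed", "error": "CUDA out of memory"},
    {"id": 5, "model": "deeplabv3", "status": "done", "val_dice": 0.66},
]

# Cluster failures by error message so similar crashes can be debugged together.
failures = defaultdict(list)
for exp in experiments:
    if exp["status"] == "failed":
        failures[exp["error"]].append(exp["id"])

# Rank the finished experiments into a simple leaderboard by validation score.
leaderboard = sorted(
    (e for e in experiments if e["status"] == "done"),
    key=lambda e: e["val_dice"],
    reverse=True,
)

print(dict(failures))        # {'CUDA out of memory': [2, 4]}
print(leaderboard[0]["id"])  # 3
```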

This was the first phase of a project that I hope will continue. Given the time and resources it can take to label medical images, one path we'd like to explore is leveraging active learning to reduce the amount of labelled data we need to train our models. I'm looking forward to continuing to collaborate on this inventive project, which I hope will save lives and help us move forward from the coronavirus crisis.
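As a rough illustration of the active learning idea, a common starting point is uncertainty sampling: ask annotators to label the slices the current model is least sure about, so each new label carries as much information as possible. The sketch below assumes a hypothetical array of per-slice infection probabilities; it is not the project's method, just one standard selection rule.

```python
import numpy as np

def most_uncertain_slices(probs, k):
    """Pick the k slices whose predicted infection probability is closest to 0.5,
    i.e. where the model is least certain and a new label would help most."""
    uncertainty = 1.0 - np.abs(probs - 0.5) * 2.0  # 1.0 at p=0.5, 0.0 at p=0 or 1
    return np.argsort(uncertainty)[::-1][:k]

# Hypothetical per-slice predictions from a partially trained model.
probs = np.array([0.05, 0.48, 0.93, 0.55, 0.10, 0.51])
print(most_uncertain_slices(probs, 2))  # slices with probabilities 0.51 and 0.48
```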

Funding for this project is provided by the UBC Community Health and Wellbeing Cloud Innovation Centre (UBC CIC), powered by Amazon Web Services (AWS), as well as the AWS Diagnostic Development Initiative (DDI).