Machine learning-based tools in the analysis of RWD: focus on image recognition

Healthcare produces various types of Real World Data (RWD) that can be used to generate Real World Evidence (RWE). Traditional statistical approaches are suitable for most of this data (numbers, measurements, classes). However, realizing the full potential of more complex data types, such as images, free text, and signals from medical devices, requires more sophisticated analytic tools. This is where machine learning (ML) algorithms and artificial intelligence (AI) have a role to play.

The Finnish RWD/RWE landscape has a lot to offer. Registries, biobanks and hospital data lakes collect massive amounts of data from everyday patient care and beyond. Some offer nationwide coverage, others near-real-time data. These data sources can be used effectively to generate RWE through research projects. So far, the main interest has usually been in structured data (numbers and labels in a table), which can be easily processed and analyzed.

However, some of these data sources (namely hospital data lakes and biobanks) also collect and store more complex data types and formats: images (for example from pathology or radiology), unstructured text (such as patient records and medication prescriptions), and signals from measurement devices (such as ECG). These formats are difficult or impossible to analyze with traditional statistical approaches, so more sophisticated analysis tools are needed to unleash the full potential of the data.

Auria Biobank is one of the Finnish data sources that collect these data and use them for scientific research. Antti Karlsson, development manager at Auria Biobank, works hands-on with these tools. He spoke at Medaffcon’s Customer Evening and shared his insights on neural networks, with an emphasis on image processing.

Neural networks

Neural networks have solved many challenging problems that only a decade ago were believed to be almost impossible for computers. In image analysis, for example, neural networks can classify images with very high accuracy. These tools have made possible the recent leaps in fields such as computer vision and facial recognition. Everyday applications based on advanced image analytics include Google’s reverse image search and the real-time “selfie filters” used in Snapchat and other photo and video apps.

In the medical field, however, these approaches are still mainly limited to academic research, even though ML-based image classifiers can achieve impressive classification accuracy and detect patterns and differences that humans cannot.

One reason for this is the lack of training data in the medical field. In non-medical fields, images suitable for ML training are generally easy to obtain publicly, or quick and cheap to generate. Medical images, by contrast, are sensitive patient data and are not openly distributed, in order to protect patient privacy. Additionally, taking medical images is relatively slow and expensive, and may require invasive procedures (such as biopsies to produce histology images). Correctly pre-labeled medical images are especially hard to find, which makes classifier training difficult or even impossible for certain problems, at least for the time being.

Overall, there is a great need for high-quality, properly annotated medical images if the full potential of image recognition is to be realized in the medical field as well. This is also where Finnish hospital data lakes hold huge potential. The work they are now doing to collect and store such real-world data, for use now and in the near future, may be the key to bringing these approaches and tools into everyday medical practice.

Antti Karlsson, PhD (Theoretical Physics), Data Science Development Manager, Auria Biobank

The author of this post is Antti Karlsson, PhD, Development Manager at Auria Biobank (picture by Mikko Tukiainen).

For more information:

Possibilities of machine learning and artificial intelligence in RWE studies

Read more about Antti’s ideas and views on text mining and natural language processing in Auria Biobank’s blog post “Mikä ihmeen ULMFiT?” (“What on earth is ULMFiT?”, in Finnish).