Leveraging the Capabilities of Machine Learning in Healthcare Systems – A Brief Study on the Crossover of Medical Science and Computer Technologies

(Figure 1: Example of a mammogram. The left image is the raw mammogram and the right image has been CAD-enhanced using NASA software originally developed to enhance Earth imagery. Image credit: Barton Medical Imaging, sourced from a NASA press release [11].)

1. Introduction

The digital transformation of the healthcare system has been driven by continual improvement in both its applications and their practical implementation. A persistent challenge in healthcare is the wide diversity of sub-systems: in most parts of the world, a properly unified, fully integrated or centralised system has not yet been achieved. The complexity of human biology, together with the wide variation between any two individual patients, shows why the human element remains important in the diagnosis and treatment of disease. Nevertheless, digital technologies are undoubtedly becoming indispensable tools for providing the best possible healthcare ecosystem.

Recent developments in data technologies have enabled the widespread adoption of Machine Learning across many industries, including healthcare, with the aim of delivering high-quality health services. The goal is to use the large volume of healthcare data to analyse diagnostic decisions and build prediction models that help physicians make better decisions for individual patients. This complex Machine Learning ecosystem draws on data such as genetic information, medical imaging data, and drug combination and interaction data to improve outcomes. It also includes Natural Language Processing (NLP) of existing medical records.

In this study we focus on two of the largest applications of Machine Learning (ML) in the medical and biomedical field. As one of the most prominent emerging technologies, Machine Learning has found a vast range of applications that add value to the healthcare system; from these, we examine two of the most pivotal.

a) The first is the application of Machine Learning to processing and interpreting medical images such as Magnetic Resonance Imaging (MRI), Ultrasound (USG), Computerized Axial Tomography (CAT or CT) and Positron Emission Tomography (PET) scans. The output of these imaging systems is typically a series of images, which traditionally requires a radiologist to analyse and interpret before making a diagnostic decision. Machine Learning is progressing rapidly in this area, with models that detect and analyse patterns in the image data to indicate a disease state or its severity.

b) The second application concerns human genetics, with the aim of finding and predicting diseases and their causes. With the recent development of Next-Generation Sequencing (NGS) techniques and the growth of huge genetic databases, deriving meaningful interpretations of how genetics affects human health is now at the forefront of much research. By capturing how complex diseases arise and how genetics may increase or decrease an individual's risk, a predictive model built with Machine Learning can practically aid preventative healthcare. Such models can provide physicians with more precise, tailored information about a specific patient, reducing the risk of an incorrect treatment approach or of the patient developing more complex diseases.

The common challenge in both of these areas is how to translate the health data acquired from sophisticated medical devices and medical IoT systems into structured, understandable, useful and trustworthy information for patients and physicians.

2. Artificial Intelligence and the Evolution of Machine Learning

The history of Artificial Intelligence (AI) goes back to the era of World War II. The field is often said to have been fathered by John McCarthy, who coined the term “artificial intelligence”, with Alan Turing sharing the credit. McCarthy was one of the greatest computer scientists and also an eminent cognitive scientist. Turing's work on breaking the German Enigma cipher during the war became the basis of many later developments in computer science [1].

Machine Learning has its roots firmly planted in Artificial Intelligence. It is a subset of AI, and the term was coined by Arthur Samuel, whose work on training computers to play checkers was published in the late 1950s while he was working at IBM [2]. As a sub-field of AI and computer science, Machine Learning focuses on using data and algorithms to imitate the way the human brain learns, gradually improving its accuracy. It aims to mimic the decision-making ability and processing capacity of the human mind, giving machines the ability to learn and develop automatically with minimal human intervention. An important part of Machine Learning is Artificial Neural Networks (ANNs) [3, 4], which are loosely based on the structure of connections and interactions between human neurons. It is also important to note that computing technologies have not yet advanced enough to surpass human intelligence, but they do aim to reduce computation time and turnaround time. With IBM's Deep Blue and Google DeepMind's AlphaGo, we have witnessed several leaps in Artificial Intelligence that demonstrate the ability of Machine Learning to solve complex, real-world problems [5, 6]. The rapid spread of Machine Learning is mostly attributed to two driving factors: the availability of huge datasets, and refinements in computational techniques that reduce overfitting and improve trained models. When coupled with networks of interconnected devices (IoT systems), Machine Learning models create a rich and efficient infrastructure for building predictive and automated systems.
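To make concrete the idea of a model that learns from data and gradually improves its accuracy, the sketch below trains a single artificial neuron (a logistic unit, the basic building block of an ANN) by gradient descent in plain Python. The toy data, learning rate and number of epochs are illustrative assumptions, not taken from the cited works.

```python
import math


def sigmoid(z: float) -> float:
    """Squash a weighted sum into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Output of one artificial neuron: sigmoid of the weighted input sum."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_neuron(samples, labels, lr=0.5, epochs=200):
    """Fit one neuron with batch gradient descent on the cross-entropy loss.

    Returns the learned weights, the bias and the mean loss recorded in each
    epoch, so the gradual improvement can be inspected.
    """
    n_features = len(samples[0])
    weights, bias = [0.0] * n_features, 0.0
    history = []
    for _ in range(epochs):
        grad_w, grad_b, loss = [0.0] * n_features, 0.0, 0.0
        for x, y in zip(samples, labels):
            p = predict(weights, bias, x)
            p_safe = min(max(p, 1e-12), 1 - 1e-12)  # avoid log(0)
            loss -= y * math.log(p_safe) + (1 - y) * math.log(1 - p_safe)
            err = p - y  # gradient of the loss w.r.t. the weighted sum
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        m = len(samples)
        # Step every parameter against its averaged gradient.
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
        history.append(loss / m)
    return weights, bias, history


# Toy, made-up data: label 1 when both features are high.
samples = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [1.0, 1.0], [0.9, 0.8], [0.8, 1.0]]
labels = [0, 0, 0, 1, 1, 1]
weights, bias, history = train_neuron(samples, labels)
print(f"loss: {history[0]:.3f} -> {history[-1]:.3f}")
```

Each epoch nudges the weights against the gradient of the loss, which is what "gradually improving its accuracy" amounts to in practice; a real ANN stacks many such units in layers.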

Machine Learning has become a primary method for making sense of the massive influx of health data in today's advanced infrastructure, and many use cases have already shown promising and effective results.

3. Application of Machine Learning in Medical Images

In today's medical practice, images from USG, CT, PET or X-ray scans are digital, which makes it possible to use such image data effectively; doing so, however, requires several challenges to be addressed. Medical imaging is a collection of techniques for creating visual representations of the interior of the human body for diagnosis, analysis and appropriate medical intervention. Imaging is preferred as the initial tool for clinical diagnosis because it reveals the internal condition of the body while avoiding risks such as infections, strokes and other complications associated with surgical approaches.

The current standard of clinical practice depends on the assessment of medical images by trained physicians, pathologists or radiologists, whose responsibility is to examine the images and determine the root cause of a clinical ailment or patient complaint. This approach, however, carries the risk of human error and variability between assessors, incurs recurring costs, and often demands years of expertise. Andrew Ng's demonstration, in which images pulled from YouTube videos were used to train a network to recognise objects, helps explain why medical image processing was one of the first areas taken up in the initial adoption of Machine Learning in healthcare [7].

Accuracy in diagnosis is crucial because any misinterpretation or human error can have severe, sometimes fatal, consequences. As image processing has developed, most architectures now depend fundamentally on Deep Learning (DL), and more specifically on Artificial Neural Networks (ANNs). Recent research has adapted ANNs into Convolutional Neural Networks (CNNs) to push performance on medical image classification towards an optimal level, and CNNs are currently the leading approach for object detection in medical images [8]. Graphics Processing Unit (GPU) acceleration provided a solid foundation for developing efficient deep CNNs.

However, a prominent challenge in building a competent model persists. The biggest is the need for a large quantity of annotated medical image data: collating such a database is costly because it demands the time of trained physicians to annotate the images. In addition, patients' right to privacy limits the possibility of making such databases open source. This bottleneck increases the risk of overfitting and reduces the accuracy of the prediction models [9]. Objections to the use of Machine Learning in clinical diagnosis have been raised on the grounds that the prediction and analytical models lack proper validation. Validating results against other datasets can be difficult because sufficiently large reference datasets for a particular disease are often lacking, and the effort needed to aggregate such data can take longer than actually training the model. Medical imaging data is inherently more difficult to collect, and even more difficult to store and process.
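The operation that gives CNNs their name can be sketched in a few lines: a small kernel slides over the image and produces a feature map. In the minimal, dependency-free sketch below, the 3×3 vertical-edge kernel and the 5×5 toy image are hand-picked assumptions for illustration; in a real CNN the kernel weights are learned from annotated images, and frameworks run this operation on GPUs.

```python
def conv2d(image, kernel):
    """'Valid' 2D convolution with stride 1 over lists of rows.

    As in most CNN libraries, this is strictly a cross-correlation
    (the kernel is not flipped).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch currently under the kernel.
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        feature_map.append(row)
    return feature_map


def relu(feature_map):
    """Non-linearity applied after each convolution in a typical CNN layer."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Toy 5x5 "image": dark on the left, bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-picked kernel that responds to a dark-to-bright vertical edge.
kernel = [[-1, 0, 1] for _ in range(3)]
for row in relu(conv2d(image, kernel)):
    print(row)
```

The strong responses in the first two output columns mark where the edge lies. Stacking many learned kernels, non-linearities and pooling layers is what lets a CNN progress from simple edges to lesion-level patterns.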

a) Detection of Lesions and Computer-Aided Detection (CAD) Techniques

Currently, the most common use of Machine Learning in healthcare is Computer-Aided Detection (CAD) of lesions in brain scans, mammograms and other body scans [10]. These techniques use CNNs to estimate the probability that a suspected lesion is in fact a lesion. Training often uses several 2D slices taken from 3D rotational CT, MRI or even ultrasound scans, and a variety of methods, such as randomly rotating the images or centring the lesion in the image, are applied to the training data. In mammography in particular, CAD has reached the stage where it is used as a “second opinion” for clinical radiologists, helping considerably to improve screening accuracy without the additional cost of using a human as the “second opinion” (see Figure 1).
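The two preprocessing ideas mentioned above, centring the lesion and random rotation, can be sketched on a single 2D slice as follows. This is a minimal illustration under stated assumptions: slices are plain lists of rows, the lesion centre comes from an annotation, rotation is restricted to multiples of 90 degrees (arbitrary angles would need interpolation, typically via an image library), and the function names are hypothetical.

```python
import random


def rotate90(slice2d, k=1):
    """Rotate a 2D slice (a list of rows) clockwise by k * 90 degrees."""
    out = [list(row) for row in slice2d]
    for _ in range(k % 4):
        # Reversing the rows and then transposing gives one clockwise quarter turn.
        out = [list(col) for col in zip(*out[::-1])]
    return out


def centre_crop(slice2d, centre, size, fill=0.0):
    """Cut a size x size patch centred on an annotated lesion at (row, col).

    Pixels falling outside the slice are padded with `fill`.
    """
    h, w = len(slice2d), len(slice2d[0])
    r0, c0 = centre[0] - size // 2, centre[1] - size // 2
    return [
        [slice2d[r][c] if 0 <= r < h and 0 <= c < w else fill
         for c in range(c0, c0 + size)]
        for r in range(r0, r0 + size)
    ]


def augment(slice2d, lesion_centre, size, rng):
    """Centre the lesion in a fixed-size patch, then apply a random rotation.

    With an odd `size`, the lesion stays in the middle pixel after any rotation.
    """
    patch = centre_crop(slice2d, lesion_centre, size)
    return rotate90(patch, rng.randrange(4))


rng = random.Random(0)  # fixed seed so the augmentation is reproducible
toy_slice = [[r * 5 + c for c in range(5)] for r in range(5)]
for _ in range(3):
    print(augment(toy_slice, (2, 2), 3, rng))
```

Generating several rotated copies of each annotated slice is one inexpensive way to stretch a small annotated dataset, while centring keeps the lesion in a consistent position for the network.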

CAD is currently used predominantly for detection and diagnosis. While a lesion can be classified as benign or malignant using the physician's knowledge and assessment, detecting it in the first place is a crucial step in treating a patient, and CAD helps with recognising potential lesions in a medical image. For example, detecting and segmenting glioblastoma is difficult because these tumours are invasive and highly varied; they are not easily localised, and assessing the response to treatments such as chemotherapy is even more difficult. Deep Learning has helped by automating the assessment of glioblastoma MRIs [12]. Computer-aided diagnosis systems assess whether a lesion is malignant, improving diagnostic accuracy and supporting early diagnosis in clinical practice. Such technology is already in use predominantly for brain-related ailments because of the complexity of assessing brain health.

4. Application of Machine Learning in Genetics for the Prediction and Analysis of Complex Diseases

The study of genetics has advanced rapidly since around 2008, generating huge volumes of genetic information and datasets. This has created difficult challenges in handling the exponentially growing volume of data, as advances in sequencing technology, namely NGS, have steadily increased the speed at which a whole human genome can be sequenced. At a basic level, the human genome is a complex structure that carries the information responsible for human development, evolution and characteristics. It is highly interconnected, and much of it has yet to be decoded. The diversity of genomic structure between people adds to the complexity of understanding genetic interactions. Many healthcare approaches have therefore focused on acquiring large samples of human genomes to identify statistically relevant trends across different human populations. The human genome is organised into 23 pairs of chromosomes containing around 20,000 genes, which have been identified as the primary coding sequences for the proteins needed to build the biological components of the cell [13]. This count is an approximate estimate; some studies put the number as high as 25,000 or as low as 19,000 [14, 15]. A large part of the genome does not code for any proteins and is not included in these estimates. A growing body of research indicates that regions termed genetic “dark matter” or “missing heritability” do exist [16]. These terms refer to portions of DNA that play no direct part in protein coding but may affect the level of gene expression in a person's genetic code [17].

Differences in gene expression levels can alter protein synthesis, resulting in overproduction or deficiency. In addition, structural differences in how DNA is packaged into chromosomes, and subsequently unwound during both replication and transcription, can also affect gene expression. For example, methylation of DNA tends to keep it tightly packed and harder to unwind, whereas acetylation of the histone proteins around which DNA is wound loosens the packing and makes unwinding easier during normal cell processes such as replication or transcription. Understanding this highly interconnected, complex and nonlinear network across the human genome is very difficult. With the help of Machine Learning, researchers have taken a step towards finding patterns and trends that can be modelled predictively. Using the exponentially growing volume of genetic data, Machine Learning has the potential to predict accurately who may be at risk of certain diseases, such as cancers and Alzheimer's disease.

a) Prediction of Cancer by Germline Copy Number Variations.

One exciting area discussed in this section is the use of germline copy number variations to predict different types of cancer. Machine Learning models, specifically Gradient Boosting Machines (GBMs), which are ensembles of Decision Trees (DTs), can be used to predict whether a person has a particular type of cancer. Models built in this way were able to predict cancers such as Ovarian Cancer (OV) and Glioblastoma multiforme with an AUC (area under the Receiver Operating Characteristic, or ROC, curve) of 0.89 and 0.86 respectively [18], using copy number variation (CNV) data taken from germline blood samples. The results indicated a significant inherited contribution to cancer risk in many cases. Because these CNV data come from germline DNA, the variations can be passed on to future generations. Unlike other methods [19], this approach does not rely solely on SNPs (Single Nucleotide Polymorphisms); instead, it takes a whole-genome approach, using averaged copy numbers across an individual's entire genome as the basis for predicting cancer.
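The workflow described above (per-person copy number features, a gradient-boosted tree model, evaluation by AUC) can be sketched from scratch as follows. This is a simplified illustration, not the method or data of [18]: the patients are synthetic, the eight "genomic regions" and the simulated copy-number gain are hypothetical, the trees are depth-1 stumps, and leaf values are plain mean residuals rather than the refined values computed by production libraries such as scikit-learn's `GradientBoostingClassifier` or XGBoost. AUC is computed as the probability that a randomly chosen case is scored above a randomly chosen control, which equals the area under the ROC curve.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Best single-feature threshold split of the residuals under squared error.

    Returns (feature, threshold, left_value, right_value).
    """
    n, total = len(residuals), sum(residuals)
    best = None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(n - 1):
            left_sum += residuals[order[k]]
            v, v_next = X[order[k]][j], X[order[k + 1]][j]
            if v == v_next:
                continue  # can only split between distinct values
            nl, nr = k + 1, n - k - 1
            # Minimising the squared error is equivalent to maximising this score.
            score = left_sum ** 2 / nl + (total - left_sum) ** 2 / nr
            if best is None or score > best[0]:
                best = (score, j, (v + v_next) / 2, left_sum / nl, (total - left_sum) / nr)
    if best is None:  # every feature is constant: predict the mean residual
        return 0, float("inf"), total / n, total / n
    return best[1:]


def fit_gbm(X, y, n_rounds=50, lr=0.1):
    """Gradient boosting for binary classification with log loss and stumps."""
    rate = sum(y) / len(y)
    f0 = math.log(rate / (1 - rate))  # start from the log-odds of the base rate
    scores, stumps = [f0] * len(X), []
    for _ in range(n_rounds):
        # For log loss, the negative gradient is simply label minus probability.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, t, lv, rv = fit_stump(X, residuals)
        stumps.append((j, t, lv, rv))
        scores = [s + lr * (lv if row[j] <= t else rv) for s, row in zip(scores, X)]
    return f0, stumps, lr


def predict_proba(model, row):
    f0, stumps, lr = model
    return sigmoid(f0 + sum(lr * (lv if row[j] <= t else rv) for j, t, lv, rv in stumps))


def roc_auc(labels, scores):
    """Probability that a random case outscores a random control (ties count half)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Synthetic cohort (illustrative only): each person has average copy numbers
# for 8 hypothetical genomic regions around the diploid value of 2, and
# simulated cases carry an extra gain in region 3.
rng = random.Random(42)


def synthetic_person(is_case):
    regions = [2.0 + rng.gauss(0, 0.2) for _ in range(8)]
    if is_case:
        regions[3] += 0.5
    return regions


y_all = [i % 2 for i in range(140)]
X_all = [synthetic_person(label == 1) for label in y_all]
model = fit_gbm(X_all[:100], y_all[:100])  # train on the first 100 people
test_scores = [predict_proba(model, row) for row in X_all[100:]]
test_auc = roc_auc(y_all[100:], test_scores)  # evaluate on the held-out 40
print(f"held-out AUC on synthetic data: {test_auc:.2f}")
```

Holding out part of the cohort for evaluation matters here for the same reason it does for imaging models: an AUC measured on the training data would overstate how well the model generalises.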

Future studies are expected to improve the performance of these models, which could eventually be used as standalone tools to assess an individual's risk of disease, since such models can be designed and generalised to predict other serious diseases. Ongoing work extends to other complex diseases that may have inherited components.

5. Conclusion

The integration of digital technologies such as Machine Learning into healthcare is ushering in a revolutionary era. The amalgamation of bioinformatics, molecular biology, genetic engineering and computer science is shaping an infrastructure that will change the way traditional medical diagnosis, treatment, research and development have operated until now. It will also deepen our knowledge of hereditary and environmental factors that are still poorly understood and may be responsible for many complex diseases. The ability to use copy number variations (CNVs) to predict and diagnose cancer could be a breakthrough in the coming years. Understanding how the genomic landscape works, and how Machine Learning can be used to develop interpretable methods that link genes to decode inherited cancer risk, could improve healthcare at the level of the individual patient.

The Cancer Genome Atlas and UK Biobank are two of the many invaluable resources that are enhancing the statistical power of scientific studies and analyses.

Natural Language Processing plays a pivotal role and will be essential in making the findings of other digital models, such as Machine Learning, practical in the clinical setting. The two work hand in hand to deliver seamless, easily interpretable output. To achieve this, multiple sophisticated systems are integrated into a complex network that extracts the vast wealth of information into a specific format that physicians and healthcare professionals can use and understand effectively.

Medical image analysis is gradually becoming an indispensable part of many diagnostic processes, and it will continue to develop to improve the accuracy of radiological diagnosis. The detection and validation of malignant growths, and the verification of existing diagnoses, hold great potential to improve patient outcomes and reduce errors. Because medical imaging is a non-invasive way of looking inside the human body, any improvement in this space is beneficial and reduces the need for risky surgical approaches.

A wide variety of IoT devices and storage infrastructure need to be upgraded and standardised to ensure seamless exchange and processing of data, given the exponentially increasing volume of data being generated and collected. Knowing how human genetic variation contributes to an individual's risk enables patients and doctors to make effective lifestyle changes and take a preventative approach. Similarly, predictive models can inform physicians' prognostic and diagnostic decisions for an individual patient, saving both time and money.

Fields such as bioinformatics, genetic engineering, forensic science and radiology can become far more effective with the integration of emerging digital technologies such as Artificial Intelligence, Machine Learning and NLP. These technologies not only make lengthy analyses easier but also open wide opportunities for improvement and development while reducing cost, time and variability. Given the blazing speed of industrial development, it is high time that we invest in forward-looking approaches to meet market and quality demands.