The role of artificial intelligence in identifying asthma in pediatric inpatient setting
Introduction
Asthma is pathologically characterized by chronic airway inflammation. It is estimated that over 10 million children under the age of 14 have asthma. Furthermore, its incidence among Chinese children is rising rapidly, with a relatively poor control rate (1-3). This is in part attributable to the failure of primary pediatricians to distinguish asthma from common viral respiratory tract infections in children, with asthma often being misdiagnosed as bronchitis or pneumonia. The average accuracy of asthma diagnosis among pediatricians at all levels was reported to be only 59.29%, and diagnostic accuracy in primary units was significantly lower than in tertiary hospitals (4). Poor asthma control results in excessive and inefficient use of medical resources and leads to the abuse of antibiotics and systemic glucocorticoids. Moreover, it not only allows chronic airway inflammation to progress, which hampers the athletic abilities of children, but is also a significant cause of chronic airway diseases in adults, such as chronic obstructive pulmonary disease (COPD) (5). Therefore, an automatic asthma detection mechanism for children is critical for the efficient and accurate diagnosis of asthma by primary pediatricians.
With recent advancements in machine learning technologies, researchers have conducted novel studies on asthma. Latent class analysis can be used to identify a set of heterogeneous diseases with the diagnostic label of asthma (6), but such approaches cannot be directly applied to diagnosing asthma from the electronic medical records (EMRs) of patients. Prosperi and Marinho (7) used logistic regression and random forests to identify asthma, wheeze, and eczema. However, their model used a large set of attributes, including 223 non-genetic variables and 215 single-nucleotide polymorphisms. Requiring so many variables not only lowers the efficiency of the model but also reduces its applicability in real-world settings, since such a large set of variables is rarely collected in practice. Additionally, the dataset was small, with only 554 subjects, which might have been insufficient to obtain a general model. Tomita et al. (8) explored the effectiveness of deep neural networks in modeling combinations of symptom-physical signs and objective tests, using a total of 22 variables to predict the initial diagnosis of adult asthma. They used fewer variables than Prosperi and Marinho. However, deep neural network-based methods are more sensitive to the missing values commonly observed in real clinical practice, which makes their model less practical. The small dataset of only 566 EMRs was another limitation of their work.
The present study thus aimed to identify an effective and efficient artificial intelligence (AI) model which can be used to assist pediatricians in diagnosing asthma in real clinical settings. It is hoped that such a model can eventually be deployed as a personalized diagnostic tool to lower the misdiagnosis rate of asthma among children and to reduce the misuse of antibiotics and systemic glucocorticoids.
Methods
Study approval was granted by The Institutional Review Board (IRB) of the Medical Ethics Committee of Children’s Hospital, Zhejiang University School of Medicine, China (IRB approval ID: 2020-IRB-039). The procedures were performed in accordance with the Declaration of Helsinki (as revised in 2013) and relevant guidelines and regulations. Informed consent was obtained after the procedure was fully explained to all participants and their legal guardians.
Datasets
In this study, two sets of retrospective EMRs of patients under the age of 14 were collected from the Children’s Hospital, Zhejiang University School of Medicine, China. DataSet-1 consisted of 3,761 cases from the Pulmonology Department, with 1,624 positive asthma cases and 2,137 negative cases. DataSet-2 was composed of 2,123 cases, with 337 positives and 1,786 negatives, from non-pulmonology departments of the hospital, including the Cardiovascular, Endocrinology, Nephrology, Neurology, and Hematology Departments. All records were reviewed by at least two respiratory experts, who performed independent asthma diagnoses for each record based on the guide of Children’s bronchial asthma diagnosis and prevention (2016 version) (9). Initially, two respiratory experts issued independent assessments. When the two assessments agreed, the shared diagnosis was taken as the gold standard (i.e., the ground truth label for model training) for that record. Where the two independent judgments disagreed, additional independent assessments were performed by another two respiratory experts, and the second-round opinion prevailed if these experts reached agreement. If agreement was still not reached at the second review, the process was repeated until a diagnostic agreement was reached.
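The labeling procedure described above can be sketched as a short loop over review rounds. This is a minimal illustration only: the function name `adjudicate` and the representation of each round as a pair of 1/0 diagnoses are assumptions made for the example, not part of the study's tooling.

```python
from typing import Iterable, Optional, Tuple


def adjudicate(rounds: Iterable[Tuple[int, int]]) -> Optional[int]:
    """Return the first diagnosis on which both experts of a round agree.

    Each round is a pair of independent diagnoses (1 = asthma, 0 = not asthma)
    issued by two respiratory experts. A new pair of experts reviews the record
    only when the previous pair disagreed.
    """
    for first_opinion, second_opinion in rounds:
        if first_opinion == second_opinion:
            return first_opinion  # agreement becomes the ground truth label
    return None  # no agreement within the rounds supplied


# Round 1 disagrees, round 2 agrees on "asthma".
label = adjudicate([(1, 0), (1, 1)])
```

In the study the process continued until agreement was reached, so the `None` branch only covers the case where the caller supplies a finite number of rounds.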
Feature extraction
As shown in Table 1, the collected raw dataset consists of free texts arranged in a set of predefined fields in EMRs such as gender, birth date, chief complaint, physical examinations, lab tests, history of current disease, family history of diseases, history of diseases, and so on.
To train machine learning-based models, features were extracted from the relevant raw EMR fields. Because the raw EMR data are semi-structured, traditional rule-based natural language processing (NLP) methods were used for feature extraction. A set of NLP methods, including regular expressions, word distance, and synonym analysis, was applied to convert the raw free text into features with either numerical or yes/no (1/0 binary) values. For example, the regular expression for total serum immunoglobulin E (IgE) values was written as “IgE [:><]*?([+]+|[0-9]+\+) ”. For all binary (yes/no) features, zero was the default value; the feature value was set to 1 only when the corresponding regular expression identified the given keywords with no surrounding negation terms from a thesaurus. The features and their corresponding properties/values extracted from the EMR fields, including cough, wheeze, tachypnea, chest tightness, history of wheezing, family history of asthma and allergy, age, and various allergen test results, are listed in Table 2. The features extracted from the EMR fields were highly sparse, with many empty entries. To reduce the level of sparsity, we limited the number of regular expressions. As a result, 16 regular expressions were developed for feature extraction from fields commonly used in real clinical settings (as shown in Table 2). The statistics of missing features for both datasets are summarized in Table 3.
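The rule-based extraction described above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for the example: the English keyword patterns, the negation list, the three-word window, and the simplified IgE pattern are hypothetical stand-ins, since the study's 16 regular expressions and thesaurus are not given in full.

```python
import re

# Hypothetical English patterns and negation list, for illustration only;
# the study's 16 regular expressions and thesaurus are not reproduced here.
KEYWORDS = {
    "cough": re.compile(r"\bcough(?:s|ing)?\b", re.IGNORECASE),
    "wheeze": re.compile(r"\bwheez(?:e|es|ing)\b", re.IGNORECASE),
}
NEGATIONS = {"no", "not", "without", "denies", "denied"}
WINDOW = 3  # assumed word distance searched before each keyword
IGE_VALUE = re.compile(r"IgE\s*[:=]?\s*[<>]?\s*([0-9]+(?:\.[0-9]+)?)")


def binary_features(text):
    """Return 1 for a feature only when a keyword occurs without a nearby negation."""
    features = {name: 0 for name in KEYWORDS}  # zero is the default value
    for name, pattern in KEYWORDS.items():
        for match in pattern.finditer(text):
            # Look only at the clause that contains the keyword.
            clause = re.split(r"[.;,]", text[:match.start()])[-1]
            preceding = re.findall(r"[a-z]+", clause.lower())[-WINDOW:]
            if not NEGATIONS.intersection(preceding):
                features[name] = 1
                break
    return features


def ige_value(text):
    """Return the numeric total IgE value, or None when the field is empty."""
    match = IGE_VALUE.search(text)
    return float(match.group(1)) if match else None
```

For instance, `binary_features("Cough for 3 days, no wheezing.")` sets cough to 1 and leaves wheeze at 0, because "no" falls within the window before the keyword. Returning `None` for a missing IgE value keeps empty entries distinguishable from a measured zero.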
Statistical analysis
In this study, four sets of group comparisons were performed using t-tests: asthma-positive versus asthma-negative patients within the Pulmonology Department, asthma-positive versus asthma-negative patients within the non-Pulmonology Department, and the Pulmonology Department versus the non-Pulmonology Department among asthma-positive patients and among asthma-negative patients. As shown in Tables 4 and 5, most of the features extracted from the EMR data differed significantly (*P<0.05) between asthma-positive and asthma-negative groups within both the Pulmonology Department and the non-Pulmonology Department, which confirmed that the extracted features were indeed discriminative. Among asthma-positive patients, the majority of the extracted features showed no significant difference (*P>0.05) between the Pulmonology Department and the non-Pulmonology Department, which suggests that asthma-positive patients share similar characteristics regardless of department. Among asthma-negative patients, the majority of the extracted features differed significantly (*P<0.05) between the Pulmonology Department and the non-Pulmonology Department, which is expected since patients with diverse diseases should have different characteristics.
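As an illustration of how one such comparison might be computed for a single feature, the sketch below calculates a two-sample t statistic. The text does not state which t-test variant was used; Welch's unequal-variance form is an assumption here, and the sample values are invented for the example.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error of each group mean, from the sample variance.
    se2_a = variance(group_a) / n_a
    se2_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Invented binary "wheeze" indicators for two small groups.
asthma_positive = [1, 1, 1, 0, 1, 1, 0, 1]
asthma_negative = [0, 0, 1, 0, 0, 0, 1, 0]
t_stat, dof = welch_t(asthma_positive, asthma_negative)
```

The P value then follows from the t distribution with `dof` degrees of freedom. In practice a statistics library would usually be used instead, for example `scipy.stats.ttest_ind(a, b, equal_var=False)`.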
Machine learning models
In this study, 80% of DataSet-1 from the Pulmonology Department, comprising 1,299 asthma cases and 1,709 negative cases, was used as training data for modeling. The remaining 20%, comprising 325 positive and 428 negative cases, was used as TestSet-1 for performance evaluation. Four machine learning-based models, namely CatBoost, Naïve Bayes, Support Vector Machines (SVM), and Logistic Regression (10-12), were developed to identify asthma. CatBoost is an ordered gradient boosting algorithm that uses ordered target-based statistics to process categorical features and permutation strategies to avoid prediction shift. Its base learner is an oblivious tree, and each tree corresponds to a partition of the feature space. The model learns a feature space partition at each training iteration, and the final classification aggregates the outputs of all trees. Naïve Bayes is a generative model that applies Bayes’ theorem to compute class probabilities from the combination of feature values, under the assumption that the features are independent of one another. SVM aims to locate the hyperplane in the feature space that separates the classes with maximum margin. Logistic Regression maps a weighted sum of feature values to a class probability. We applied a grid-search strategy with 5-fold cross validation on the training data to find the optimal hyper-parameters for the models, including “maximum tree depth”, “maximum iteration number”, “learning rate”, “L2 leaf regularization”, and “early stopping rounds” for CatBoost; the “margin cost” parameter “C”, the kernel parameter “gamma”, and “kernel type” for SVM; and the “penalty strength” parameter “C” and “maximum iteration number” for Logistic Regression.
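The data split and the hyper-parameter search can be sketched in plain Python as follows. Several points are assumptions: the text does not say the split was stratified by label (although taking 80% of each class and rounding down reproduces the reported counts), the round-robin fold assignment is one of many possible choices, and `fit_and_score` is a hypothetical callback standing in for training a model on the training folds and scoring it on the held-out fold.

```python
import random
from itertools import product


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Split indices per class so each class keeps the same train fraction."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = int(len(idx) * train_fraction)  # rounds down
        train += idx[:cut]
        test += idx[cut:]
    return train, test


def k_fold(indices, k=5):
    """Yield (train_idx, validation_idx) pairs for k-fold cross validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        rest = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield rest, folds[i]


def grid_search(param_grid, indices, fit_and_score, k=5):
    """Return the parameter combination with the best mean validation score."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [fit_and_score(params, tr, va) for tr, va in k_fold(indices, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


# DataSet-1 label counts from the text: 1,624 positive and 2,137 negative.
labels = [1] * 1624 + [0] * 2137
train_idx, test_idx = stratified_split(labels)
```

With these labels the split yields 1,299 positive and 1,709 negative training cases, and 325 positive and 428 negative test cases, matching the counts reported above. Keeping the test indices out of `grid_search` ensures that hyper-parameters are chosen from the training data only.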
Performance evaluation metrics
Algorithm performance was measured by two metrics, namely, the accuracy rate (Eq. [1]) and the area under the receiver operating characteristic curve (AUC). The accuracy rate is the percentage of correctly predicted individuals in the whole test set, whereas the receiver operating characteristic (ROC) curve was generated by plotting sensitivity (Eq. [2]) against 1 − specificity (Eq. [3]) across decision thresholds.
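$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad [1]$$
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \qquad [2]$$
$$\mathrm{Specificity} = \frac{TN}{TN + FP} \qquad [3]$$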
where TP, FP, TN, and FN are the numbers of true positives, false positives, true negatives, and false negatives, respectively. TP and TN represent correctly predicted positives and negatives with respect to the ground truth labels. FP and FN represent incorrectly predicted positives and negatives with respect to the ground truth labels.
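A minimal sketch of these metrics is given below. The function names, the example labels and scores, and the 0.5 decision threshold are assumptions made for illustration. The AUC is computed through its rank interpretation, that is, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, which equals the area under the ROC curve.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, true negatives, false negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + tn + fp + fn)  # Eq. [1]


def sensitivity(tp, fn):
    return tp / (tp + fn)  # Eq. [2]


def specificity(tn, fp):
    return tn / (tn + fp)  # Eq. [3]


def auc(y_true, scores):
    """Fraction of positive/negative pairs ranked correctly; ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented labels and model scores; 0.5 is an assumed decision threshold.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas the AUC summarizes ranking quality over all thresholds. The pairwise loop is quadratic in the number of cases and is meant only to make the definition explicit.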
Results
Two independent test sets were used for performance evaluation. TestSet-1 consisted of 753 cases from the Pulmonology Department, with 325 positive asthma cases and 428 negative cases. Table 6 summarizes the overall performance of all four models on TestSet-1, where the CatBoost model outperformed all other models with an accuracy of 84.7% and an AUC of 90.0%. This result exceeded the second-best models, Logistic Regression and SVM, by about 4 percentage points in accuracy and 2 percentage points in AUC. Naïve Bayes scored lower, with an accuracy below 80% and an AUC of 84.62%. Figure 1 illustrates the AUC of these models more explicitly.
Similarly, Table 7 and Figure 2 show the overall performance of all four models on TestSet-2, composed of 2,123 cases (337 asthma-positive and 1,786 asthma-negative) from the non-pulmonology departments, including the Cardiovascular, Endocrinology, Nephrology, Neurology, and Hematology Departments. Since this dataset came from departments not involved in training, it served as a testbed for evaluating the robustness and generalization capacity of the models in a real clinical setting. As shown in Table 7 and Figure 2, the CatBoost model displayed superior performance with an accuracy of 96.7% and an AUC of 98.1%. Naïve Bayes also performed well on TestSet-2, with an accuracy of 91.0% and an AUC of 96.4%. Logistic Regression ranked last among the four models, with an accuracy of 88.6% and an AUC of 92.3%.
In addition to accuracy and AUC, feature importance was also explored. This was calculated based on each feature’s contribution to model performance. Figure 3 shows the feature importance calculated by the CatBoost model. Age, which may seem unrelated to asthma symptoms, was ranked as a highly important feature (Figure 3A). In clinical practice, pediatricians often consider wheezing and wheezing history together with age. For example, a patient over the age of three who exhibits wheeze is more likely to have asthma than a younger patient. Wheezing and wheezing history were ranked as the second and third most important features, with a combined weighted score of over 40%. While total serum IgE is an important feature in practice, it was not ranked among the top three most important features by CatBoost. This could be attributed to the fact that 40% of patients had not been administered serum IgE tests, resulting in empty entries in the EMR. The proportion of empty fields in this feature introduced a degree of uncertainty and potentially led to a lower ranking by the model. Figure 3B shows the polarity (positive or negative) of each feature’s impact on model performance.
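One generic way to quantify a feature's contribution to performance is permutation importance, sketched below. This is a swapped-in, model-agnostic technique used purely for illustration; it is not the importance measure built into CatBoost that produced Figure 3, and the toy classifier and data are invented.

```python
import random


def permutation_importance(predict, rows, labels, seed=0):
    """Return the accuracy drop observed after shuffling each feature column."""
    rng = random.Random(seed)

    def accuracy(data):
        return sum(predict(r) == y for r, y in zip(data, labels)) / len(labels)

    baseline = accuracy(rows)
    drops = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between feature j and the label
        shuffled = [list(r[:j]) + [v] + list(r[j + 1:]) for r, v in zip(rows, column)]
        drops.append(baseline - accuracy(shuffled))
    return drops


# Toy data: the label equals feature 0, while feature 1 carries no information.
rows = [[i % 2, (i // 2) % 2] for i in range(20)]
labels = [r[0] for r in rows]
drops = permutation_importance(lambda r: r[0], rows, labels)
```

A feature that the model ignores, such as feature 1 here, shows a drop of exactly zero, while a feature the model relies on shows a drop whose size depends on the shuffle. Features with many empty entries, as with serum IgE in this study, would be expected to show smaller drops because the model can rely on them less often.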
We also conducted ablation experiments using subsets of the top-ranked features. As illustrated in Table 8, performance generally improved as more features were included. However, the improvement became marginal once the top 12 features were included, which is consistent with the feature importance graph in Figure 3.
Discussion
Implications and findings
Our findings reveal that CatBoost exhibited the highest accuracy and AUC of all models on both test sets. CatBoost’s success may be explained by its ability to process categorical features and to model feature combinations. Additionally, CatBoost’s capacity for feature combination increases its nonlinear modeling abilities. On TestSet-1, Naïve Bayes performed the worst, which was likely due to the feature independence assumption of this model. In practice, features from allergen test results are related to serum IgE and thereby violate the independence assumption. On the other hand, Naïve Bayes performed well on TestSet-2; it is reasonable to conclude that, because this model assigns a higher probability to the negative label when negative individuals are more numerous, it was more likely to make negative predictions. In addition, the insensitivity of Logistic Regression to categorical features and its limited ability to model nonlinear functions, along with SVM’s assumption of kernel space, might have contributed to their lower ranking compared to CatBoost on both data sets. Generally, all models demonstrated improved performance on TestSet-2. This was due to the larger share of easily predictable negative instances, that is, patients unlikely to have asthma, such as those with no cough, wheeze, tachypnea, chest tightness, history of wheeze, or family history of asthma and allergies, and with negative allergen test results.
Advantages of AI
At present, the accuracy of Chinese primary pediatricians in diagnosing asthma is relatively low. One study of pediatricians in Jinhua City, Zhejiang Province, reported the average accuracy of asthma diagnosis among pediatricians at all professional levels to be only 59.29%. In addition, the diagnostic accuracy of primary units at lower professional levels (e.g., level 1 and level 2) was lower than that of level 3 (4). Therefore, if this AI system is routinely applied to the work of primary pediatricians, it will greatly increase the accuracy of their asthma diagnosis, and thus, significantly improve the management of asthma in Chinese children.
Limitations
Currently, this system is only suitable for identifying the existence of asthma, and it cannot provide disease severity classification or treatment suggestions. Furthermore, it is also unable to prompt clinicians to complete medical records and auxiliary examinations.
Future expectations
The complete AI system, including the NLP-based feature extraction module and the machine learning-based asthma identification module, can be integrated as a whole into hospitals’ information and data management systems to identify asthma in pediatric inpatient settings, with data extracted from EMRs serving as inputs. With a questionnaire-based interactive interface for feature inputs, the AI model could also be applied as an efficient screening tool in the outpatient service system to identify more patients with mild asthma.
At present, the developed asthma identification model has been integrated into a smartphone app with a questionnaire-based interactive interface for feature inputs and applied in the outpatient service of the Children’s Hospital, Zhejiang University School of Medicine, China.
On the basis of a pre-diagnosis function, systems focused on auxiliary treatment and follow-up could be simultaneously developed to utilize AI-assisted methods as complete management tools for asthma patients.
Conclusions
The AI model can quickly and accurately identify children with asthma, which can aid primary pediatricians in making more precise diagnoses, whilst further preventing undetected cases. Our findings are of great clinical value and practical significance in improving the asthma control level of children in China, optimizing medical resources, and decreasing the abuse of antibiotics and systemic glucocorticoids.
Acknowledgments
Funding: This study was supported in part by grants from the National Natural Science Foundation of China (No. 62076218), the Zhejiang Province Research Project of Public Welfare Technology Application (LGF18H260004), and the National Natural Science Foundation of China (No. 81870023 and 81000765).
Footnote
Data Sharing Statement: Available at http://dx.doi.org/10.21037/atm-20-2501a
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at http://dx.doi.org/10.21037/atm-20-2501a). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. Study approval was granted by The Institutional Review Board (IRB) of the Medical Ethics Committee of Children’s Hospital, Zhejiang University School of Medicine, China (IRB Approval ID: 2020-IRB-039). The procedures were performed in accordance with the Declaration of Helsinki (as revised in 2013) and relevant guidelines and regulations. Informed consent was obtained after the procedure was fully explained to all participants and their legal guardians.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
1. National Cooperative Group on Childhood Asthma. Institute of Environmental Health and Related Product Safety, Chinese Center for Disease Control and Prevention. Chinese Center for Disease Control and Prevention. Third nationwide survey of childhood asthma in urban areas of China. Zhonghua Er Ke Za Zhi 2013;51:729-35.
2. Sha L, Liu C, Shao M, et al. Ten years comparison of diagnosis and treatment of asthma in urban children in China. Zhonghua Er Ke Za Zhi 2016;54:182-6.
3. Wong GW, Kwon N, Hong JG, et al. Pediatric asthma control in Asia: phase 2 of the Asthma Insights and Reality in Asia-Pacific (AIRIAP 2) survey. Allergy 2013;68:524-30.
4. McGeachie MJ, Yates KP, Zhou X, et al. Patterns of Growth and Decline in Lung Function in Persistent Childhood Asthma. N Engl J Med 2016;374:1842-52.
5. Ye X, Xu C, Wang Z. Cognition of asthma guidelines among doctors in Jinhua city. Chinese J of Pub Health 2015;31:804-6.
6. Howard R, Rattray M, Prosperi M, et al. Distinguishing Asthma Phenotypes Using Machine Learning Approaches. Curr Allergy Asthma Rep 2015;15:38.
7. Prosperi MC, Marinho S, Simpson A, et al. Predicting phenotypes of asthma and eczema with machine learning. BMC Med Genomics 2014;7 Suppl 1:S7.
8. Tomita K, Nagao R, Touge H, et al. Deep learning facilitates the diagnosis of adult asthma. Allergol Int 2019;68:456-61.
9. Pediatric Branch of Chinese Medical Association. The guide of Children’s bronchial asthma diagnosis and prevention. China J Pediatr 2016;54:3.
10. Prokhorenkova L, Gusev G, Vorobev A, et al. CatBoost: unbiased boosting with categorical features. Adv in Neural Infor Proc Sys, 2018.
11. Cortes C, Vapnik V. Support-vector networks. Machine Learning 1995;20:273-97.
12. Hand D, Yu K. Idiot’s Bayes – not so stupid after all? International Statistical Review 2001;69:385-99.
(English Language Editors: E. Tan and J. Gray)