Big-data Clinical Trial Column
Missing values in big data research: some basic skills
Abstract
A missing value occurs when no data value is stored for a variable in an observation. Missing values are ubiquitous in clinical research involving big data. Nurses may forget to record urine output at a certain time point. Patients may have only one measurement of blood lactate, while the researcher is interested in exploring the impact of lactate trend on mortality outcome. Other reasons for missing values include, but are not limited to, coding errors, faulty equipment and nonresponse (1). In statistical packages, some commands (e.g., logistic regression) may automatically delete observations with missing values. This poses little problem when only a few observations are incomplete. However, when a large number of observations contain missing values, the default listwise deletion may result in a significant loss of information. In such situations, analysts should take a close look at the missing patterns and find appropriate means to cope with them. The present article introduces how missing values are handled in R and provides some basic skills for dealing with them.
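As a minimal sketch of the listwise deletion described above, the following base R code fits a logistic regression to a small made-up dataset. The data frame `dat` and its variables `lactate` and `death` are hypothetical and serve only as an illustration; the code assumes that R's default setting `na.action = na.omit` has not been changed.

```r
# Hypothetical toy data: 8 patients, lactate not recorded for two of them.
dat <- data.frame(
  lactate = c(1.2, NA, 3.5, 2.1, NA, 4.0, 1.8, 3.0),
  death   = c(0, 0, 1, 1, 1, 0, 0, 1)
)

# Number of missing values in each variable.
n_missing <- colSums(is.na(dat))

# Number of observations with no missing value in any variable.
n_complete <- sum(complete.cases(dat))

# With the default na.action (na.omit), glm() drops incomplete rows
# before fitting and does not issue a warning about it.
fit <- glm(death ~ lactate, data = dat, family = binomial)

# Compare the number of rows supplied with the number actually used.
cat("Rows in data:       ", nrow(dat), "\n")
cat("Rows used in model: ", nobs(fit), "\n")
```

Here the model is fitted on six of the eight observations, because the two patients without a lactate value are excluded. Comparing `nrow(dat)` with `nobs(fit)` is a quick way to see how much data a default analysis has discarded.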