Software Implementation of Missing Data Recovery: Comparative Analysis
The paper presents a comparative analysis of how different software products can be used to handle missing data, using a sample in which several variants of missing-data patterns are simulated. The study made it possible to identify the strengths and weaknesses of these software products and to determine how effective particular methods are for different amounts of missing information.
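The study design (a complete sample, artificially introduced gaps at several rates, then imputation) can be sketched in a few lines. This is an illustration, not the authors' procedure. It assumes gaps are missing completely at random (MCAR), uses mean substitution as the imputation method, and measures effectiveness only as the ratio of the post-imputation standard deviation to the true one. The function names and the synthetic sample are hypothetical.

```python
import random
import statistics


def mcar_mask(n, rate, rng):
    """Pick round(n * rate) positions completely at random (MCAR) to blank out."""
    k = round(n * rate)
    return set(rng.sample(range(n), k))


def with_gaps(values, missing_idx):
    """Copy `values`, replacing the masked positions with None."""
    return [None if i in missing_idx else v for i, v in enumerate(values)]


def mean_substitute(values):
    """Fill every None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    m = statistics.fmean(observed)
    return [m if v is None else v for v in values]


def simulate(values, rates, seed=1):
    """For each missing rate: introduce MCAR gaps, impute, and compare spread.

    Returns {rate: sd_after_imputation / sd_of_complete_sample}; a value
    below 1 means the imputation has shrunk the variability of the data.
    """
    rng = random.Random(seed)
    true_sd = statistics.stdev(values)
    results = {}
    for rate in rates:
        gapped = with_gaps(values, mcar_mask(len(values), rate, rng))
        results[rate] = statistics.stdev(mean_substitute(gapped)) / true_sd
    return results


if __name__ == "__main__":
    data_rng = random.Random(0)
    # Hypothetical complete sample: 1000 draws from N(50, 10).
    sample = [data_rng.gauss(50, 10) for _ in range(1000)]
    for rate, ratio in simulate(sample, [0.05, 0.10, 0.20, 0.30]).items():
        print(f"{rate:.0%} missing: SD ratio {ratio:.3f}")
```

Under MCAR the ratio falls roughly like the square root of (1 - rate). The loss of spread is small at 5 to 10% gaps and grows quickly beyond that, which is one way to see why simple substitution is only acceptable for a small share of omissions.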
Statistica offers the easiest way to handle missing data, but it provides only simple methods for processing data with missing values. It is therefore helpful mainly when the number of omissions is small (up to 10%). SPSS offers a wider range of imputation methods than Statistica and, at the same time, a more user-friendly interface than the R or SAS programming languages. The R and SAS software environments provide imputation methods ranging from the simplest to the most complex, such as multiple imputation. R and SAS are thus the most powerful programs for missing data recovery, but they are more demanding for users because they require knowledge of a programming language.
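To make the contrast between simple methods and multiple imputation concrete, the sketch below implements multiple imputation for a single normally distributed variable and pools the results with Rubin's rules. It is a teaching illustration under stated assumptions (one variable, a normal model with a noninformative prior, m = 5 imputations). It does not reproduce what SAS PROC MI or the R packages do internally, and the helper names are hypothetical.

```python
import math
import random
import statistics


def impute_once(values, rng):
    """One stochastic imputation under a normal model (hypothetical helper).

    The mean and variance are first drawn from their posterior given the
    observed values (noninformative prior); each gap is then filled with a
    random draw, so that uncertainty about the parameters is carried forward.
    """
    obs = [v for v in values if v is not None]
    n_obs = len(obs)
    xbar, s2 = statistics.fmean(obs), statistics.variance(obs)
    # sigma^2 = (n_obs - 1) * s^2 / chi2(n_obs - 1); chi2(k) is Gamma(shape=k/2, scale=2)
    sigma2 = (n_obs - 1) * s2 / rng.gammavariate((n_obs - 1) / 2, 2)
    mu = rng.gauss(xbar, math.sqrt(sigma2 / n_obs))
    return [rng.gauss(mu, math.sqrt(sigma2)) if v is None else v for v in values]


def mi_mean(values, m=5, seed=0):
    """Estimate the mean by multiple imputation; return (estimate, standard error)."""
    rng = random.Random(seed)
    estimates, within = [], []
    for _ in range(m):
        completed = impute_once(values, rng)
        estimates.append(statistics.fmean(completed))
        within.append(statistics.variance(completed) / len(completed))  # squared SE
    q_bar = statistics.fmean(estimates)  # pooled point estimate
    w = statistics.fmean(within)         # average within-imputation variance
    b = statistics.variance(estimates)   # between-imputation variance
    t = w + (1 + 1 / m) * b              # Rubin's total variance
    return q_bar, math.sqrt(t)


def mean_substitution_se(values):
    """Naive SE of the mean after single mean substitution, for comparison."""
    obs = [v for v in values if v is not None]
    m = statistics.fmean(obs)
    filled = [m if v is None else v for v in values]
    return math.sqrt(statistics.variance(filled) / len(filled))


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical sample of 500 values with about 20% removed at random.
    data = [rng.gauss(50, 10) for _ in range(500)]
    gapped = [None if rng.random() < 0.2 else x for x in data]
    est, se = mi_mean(gapped)
    print(f"MI estimate {est:.2f}, SE {se:.3f}")
    print(f"mean substitution SE {mean_substitution_se(gapped):.3f}")
```

The single-imputation standard error is typically smaller than the pooled one because it treats imputed values as if they had been observed. The within-imputation term uses the full spread of the imputed draws, and the between-imputation term B adds the uncertainty about the parameters.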
The study found that none of the software-analytical environments considered has built-in procedures for processing categorical data with missing values. For ordered categories, some approaches can be implemented by analogy in R and SAS, but they do not cover all the analytical needs of survey-based research, whose results are mostly recorded as categorical answers. Methods used to impute quantitative data cannot be applied to categorical data, even when numbers are used to encode the responses.
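The last point can be illustrated with a small example. In the sketch below a hypothetical nominal survey item is coded 1 to 4. Averaging the observed codes produces a value that is not a valid answer at all, whereas filling gaps with the most frequent category (mode) or with a randomly drawn observed answer (a simple hot-deck) always yields a valid category. These two alternatives are shown only for contrast; they are not procedures proposed in the paper, and they ignore relationships with other variables.

```python
import random
import statistics
from collections import Counter

# Hypothetical survey item "usual way to get to work":
# 1 = public transport, 2 = car, 3 = bicycle, 4 = walking; None = no response.
VALID_CODES = {1, 2, 3, 4}
responses = [1, 1, 3, None, 4, 2, None, 1, 4, 1]


def observed(values):
    return [v for v in values if v is not None]


def mean_fill(values):
    """Quantitative method misapplied: replace gaps with the mean code."""
    m = statistics.fmean(observed(values))
    return [m if v is None else v for v in values]


def mode_fill(values):
    """Replace gaps with the most frequent observed category."""
    mode = Counter(observed(values)).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def hot_deck_fill(values, rng):
    """Replace each gap with a randomly chosen observed answer."""
    pool = observed(values)
    return [rng.choice(pool) if v is None else v for v in values]


if __name__ == "__main__":
    print(mean_fill(responses))  # gaps become 2.125, which is not a category
    print(mode_fill(responses))  # gaps become 1 (public transport)
    print(hot_deck_fill(responses, random.Random(0)))
```

Rounding the mean to 2 would not help either: it would assign "car" to non-respondents purely because of how the categories happen to be numbered.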
The study showed that handling missing data, and choosing among the imputation methods available in different software environments, should be approached very carefully. The imputation problem should be solved case by case, based on a careful analysis of the existing database that takes into account not only the characteristics of the data and the number of gaps but also the specifics of the particular study.
Dealing with missing data involves a wide range of issues: exploring the nature of the gaps, choosing a methodology for data processing and imputation that depends not only on the nature of the gaps but also on the type of data, and using various software environments for missing data imputation.
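Exploring the nature of the gaps usually starts with a descriptive profile: how many values are missing in each variable, which variables tend to be missing together, and whether the gaps in one variable are associated with the values of another. The sketch below computes these summaries in plain Python. It is loosely similar in spirit to the missing-pattern tables of tools such as SPSS Missing Value Analysis, but it is not their output, and the records and function names are hypothetical.

```python
import statistics
from collections import Counter


def missing_profile(rows, columns):
    """Describe the gaps in a list of records (dicts); None marks a missing value.

    Returns the share of missing values per column and the frequency of
    each missingness pattern, a tuple of flags with 1 = missing.
    """
    n = len(rows)
    shares = {c: sum(r.get(c) is None for r in rows) / n for c in columns}
    patterns = Counter(tuple(int(r.get(c) is None) for c in columns) for r in rows)
    return shares, patterns


def mean_by_missingness(rows, target, by):
    """Mean of `target` where `by` is missing vs. where `by` is observed.

    A large difference hints that gaps in `by` are related to `target`,
    i.e. that they are not missing completely at random. This is only a
    descriptive check, not a formal test such as Little's MCAR test.
    """
    present = [r for r in rows if r.get(target) is not None]
    when_missing = [r[target] for r in present if r.get(by) is None]
    when_observed = [r[target] for r in present if r.get(by) is not None]
    return statistics.fmean(when_missing), statistics.fmean(when_observed)


if __name__ == "__main__":
    # Hypothetical survey records.
    rows = [
        {"age": 34, "income": 1200, "answer": 2},
        {"age": 51, "income": None, "answer": 1},
        {"age": None, "income": None, "answer": 3},
        {"age": 29, "income": 900, "answer": None},
        {"age": 42, "income": None, "answer": 1},
    ]
    shares, patterns = missing_profile(rows, ["age", "income", "answer"])
    print(shares)  # {'age': 0.2, 'income': 0.6, 'answer': 0.2}
    print(patterns.most_common())
    print(mean_by_missingness(rows, "age", "income"))  # (46.5, 31.5)
```

In this toy data, income is missing mostly for older respondents, which would argue against methods that assume MCAR and in favour of approaches that condition on age.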
Future research is planned to assess how effectively imputation methods in different software environments recover missing values, and to develop methodological principles for restoring gaps in categorical data and put them into practice.