“Big Data” as an Information Source and a Toolkit for Official Statistics: Capacities, Problems, Prospects
Keywords:
“Big Data”, information sources, statistical toolkit, official statistics, information technologies.
Abstract
This article discusses issues related to the potential use in official statistics of so-called “Big Data”, meaning data extracted from websites, mobile phones, cash registers in retail sales networks, traffic surveillance cameras, etc. These data are called “big” mainly because of their large volumes, which cannot be processed with standard statistical tools and require special software and techniques.
It is argued that “Big Data” offer advantages such as timeliness and wide coverage of targeted population segments; collecting them requires neither special questionnaires or surveys nor the recruitment and training of numerous paid staff such as supervisors or interviewers. When “Big Data” are used, accuracy requirements can be relaxed, and phenomena and processes can be analyzed with quite simple procedures. Because the volume of these data grows incessantly, often second by second, what remains is to process them properly and to analyze and use the resulting information.
It is emphasized that the use of “Big Data” is complicated by the need to address problems such as indeterminacy of what the data sets cover; bias of estimates; access to the data, which are mostly collected or owned by private companies; protection of private data; storage and processing of large volumes of “Big Data”; statistical integration of numerous large data sets; and the risk of data manipulation.
It is argued that applied and official statistics already have prototypes of tools that, once properly developed and adapted, can solve a major part of these problems. They include methods for calibration of survey results, statistical aggregation of data, and model-based assessment of data. As regards “cloud” technologies for data storage and processing, their use can address the limited storage capacity of statistical offices and the problems of storing private and confidential data.
Studies by leading contemporary statisticians show that official statistics has no alternative to using “Big Data”. The sooner this advanced field of statistics and information technologies comes into the focus of the State Statistics Service, universities and research institutions, the more easily the new information sources and the new statistical toolkit can be integrated into official statistics over the next ten to fifteen years.




