The Method for Comprehensive Quality Evaluation of Tests. Part 2
The article describes a method for the comprehensive evaluation of test quality that draws on classical methods, Data Mining, and Item Response Theory (IRT). The overall method consists of six steps; this article covers steps 4–6.
The fourth step of the method evaluates the reliability of the test. A universal two-stage procedure is proposed: the reliability of individual test items is assessed with the Kuder–Richardson internal consistency coefficient, and the reliability of the test as a whole is assessed with the generalizability coefficient. The first coefficient is considered acceptable at 0.7 and above, the second at 0.8 and above. The second coefficient was calculated using two-way ANOVA without replication in SPSS.
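The abstract does not reproduce the formulas used at this step, so the following is only a minimal sketch under stated assumptions: it uses the standard KR-20 formula and the single-facet (students × items) generalizability coefficient for relative decisions, computed from the mean squares of a two-way ANOVA without replication. The function names and the toy response matrix are hypothetical, and plain Python stands in for SPSS.

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for a students x items matrix of 0/1 scores."""
    n = len(responses)       # number of students
    k = len(responses[0])    # number of items
    # Sum of item variances p*q, where p is the share of correct answers.
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n
        sum_pq += p * (1 - p)
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    # Population variance of total scores (same convention as p*q above).
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    return k / (k - 1) * (1 - sum_pq / var_total)


def g_coefficient(responses):
    """Generalizability coefficient from two-way ANOVA without replication.

    Assumption: one facet (items), relative decisions, so the coefficient
    reduces to (MS_students - MS_residual) / MS_students.
    """
    n, k = len(responses), len(responses[0])
    grand = sum(map(sum, responses)) / (n * k)
    row_means = [sum(row) / k for row in responses]
    col_means = [sum(row[j] for row in responses) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in responses for x in row)
    ss_students = k * sum((m - grand) ** 2 for m in row_means)
    ss_items = n * sum((m - grand) ** 2 for m in col_means)
    ss_resid = ss_total - ss_students - ss_items
    ms_students = ss_students / (n - 1)
    ms_resid = ss_resid / ((n - 1) * (k - 1))
    return (ms_students - ms_resid) / ms_students


# Hypothetical data: 5 students (rows) answering 4 items (columns).
responses_example = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(responses_example), 3))
print(round(g_coefficient(responses_example), 3))
```

For this toy matrix both coefficients come out at 0.8. That agreement is expected for dichotomous items under the assumptions above, and it would not necessarily hold for the multi-facet designs that a full generalizability study can handle.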
At the fifth step, the method assesses how well the test under study differentiates students. The tools chosen for this are hierarchical clustering procedures, classification trees, and classification discriminant functions. The calculations were performed in Statistica and SPSS. Three clusters of students were identified, with high, medium, and low academic performance. The test under study is shown to differentiate students.
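To illustrate the clustering idea only, here is a small sketch of agglomerative (hierarchical) clustering on one-dimensional test scores. The centroid linkage rule, the function name, and the sample scores are assumptions made for the example. The article's actual calculations were done in Statistica and SPSS, and the abstract does not state the linkage or distance settings.

```python
def cluster_scores(scores, n_clusters=3):
    """Agglomerative clustering of 1-D scores with centroid linkage.

    Starts with one cluster per student and repeatedly merges the two
    clusters whose mean scores are closest, until n_clusters remain.
    Returns the clusters ordered from lowest to highest mean score.
    """
    clusters = [[s] for s in scores]

    def centroid(cluster):
        return sum(cluster) / len(cluster)

    while len(clusters) > n_clusters:
        best_pair, best_dist = None, float("inf")
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dist = abs(centroid(clusters[i]) - centroid(clusters[j]))
                if dist < best_dist:
                    best_pair, best_dist = (i, j), dist
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected

    return sorted((sorted(c) for c in clusters), key=centroid)


# Hypothetical percentage scores of nine students.
low, medium, high = cluster_scores([95, 90, 88, 70, 65, 62, 40, 35, 30])
print("low:", low)
print("medium:", medium)
print("high:", high)
```

If a test differentiates students well, the three groups are clearly separated, as in this toy data. Heavily overlapping or unstable clusters would suggest weaker differentiation.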
At the sixth and final step, the quality of the test is studied using the one-parameter Rasch model. The difficulty of a test item and the student's level of mastery of the study material are measured in logits. Analytical expressions for the characteristic curve of a test item and the characteristic curve of a student are given, along with the auxiliary formulas needed to compute them. The description is illustrated with a specific example. It is noted that the student characteristic curves, built from the Rasch model in MathCAD, clearly divide students into two groups: strong (positive logit) and weak (negative logit). Recommendations are formulated for interpreting the results obtained for particular test items. In particular, if the characteristic curves of different test items overlap, the duplicate items should be deleted (norm-referenced test) or reworked (criterion-referenced test). This paper does not consider how to determine which test item should be deleted or corrected, but it notes that this can be established with the two-parameter Birnbaum model. If the characteristic curves of the test items are unevenly spaced, it is recommended to add test items (norm-referenced test) or to change the duplicate items (criterion-referenced test) so as to fill the gaps along the abscissa where there are no characteristic curves.
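As a rough illustration of the one-parameter model, the sketch below evaluates the Rasch probability P = 1 / (1 + exp(-(theta - beta))), where theta is the student's ability and beta is the item difficulty, both in logits. The initial logit estimates ln(p/q) and ln(q/p) are a common textbook choice and are assumed here, since the abstract does not list the article's auxiliary formulas. The function names are hypothetical, and Python stands in for MathCAD.

```python
import math


def rasch_probability(theta, beta):
    """Probability of a correct answer under the one-parameter Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


def ability_logit(share_correct):
    """Initial ability estimate ln(p / q); p is the student's share of correct answers."""
    return math.log(share_correct / (1 - share_correct))


def difficulty_logit(share_correct):
    """Initial difficulty estimate ln(q / p); p is the share of students who solved the item."""
    return math.log((1 - share_correct) / share_correct)


# A student who solved 75% of the items has a positive logit ("strong"),
# one who solved 25% has a negative logit ("weak").
strong = ability_logit(0.75)
weak = ability_logit(0.25)
print(round(strong, 3), round(weak, 3))

# Characteristic curve of an item solved by half of the students (beta = 0):
beta = difficulty_logit(0.5)
for theta in (-3, -2, -1, 0, 1, 2, 3):
    print(theta, round(rasch_probability(theta, beta), 3))
```

Two items with nearly equal beta produce nearly coincident curves, which is the overlap the recommendations above refer to. Large gaps between consecutive beta values correspond to stretches of the abscissa with no characteristic curves.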
As the practical implementation of this method, the authors envisage developing a separate plug-in compatible with the Moodle distance learning platform.
As a direction for further theoretical research, the authors identify the study of the limits of applicability of the two-parameter and three-parameter Birnbaum models for improving the testing process and the test results of students in distance learning systems.
Kukharenko, V. M., Perkhun, L. P., & Tovmachenko, N. M. (2018). Metodyka kompleksnoho otsiniuvannia testiv. Ch. 1 [The Method for Comprehensive Quality Evaluation of Tests. Part 1]. Statystyka Ukrainy – Statistics of Ukraine, 3, 40–48 [in Ukrainian].
Morozov, S. M. (1994). Zasoby kontrolju diagnostychnych jakostej psychologichnych testiv [Means of control of diagnostic properties of psychological tests]. Kyiv: ISDO. Retrieved from https://psyfactor.org/lib/morozov3.htm [in Ukrainian].
Kroker, L., & Algina, J. (2010). Vvedenie v klassicheskuiu i sovremennuiu teoriiu testov [Introduction to Classical and Modern Test Theory]. Moscow: Logos [in Russian].
Sinytskyi, M. Ye. (2015). Statystychni instrumenty vymiriuvannia yakosti osvity. Ch. 2. Klassychnyi pidchid [Statistical tools for measuring the quality of education. Part 2. Classical approach]. Naukovyi visnyk Natsionalnoi akademii statystyky, obliku ta audytu – Scientific Bulletin of the National Academy of Statistics, Accounting and Audit, 1, 75–86 [in Ukrainian].
Kovalchuk, Yu. O. (2012). Teoriia osvitnich vumiriuvan [Theory of educational measurements]. Nizhyn: Vydavets PP Lysenko M. M. [in Ukrainian].
Kukharenko, V. M., Perkhun, L. P., & Tovmachenko, N. M. (2018). Testovyi kontrol znan: instrumenty intelektualnoho analizu ta Item Response Theory [Test Knowledge Control: Tools for Intellectual Analysis and Item Response Theory]. Proceedings from Innovative Computer Technologies in Higher School: Desiata naukovo-praktychna konferentsiia (21–23 lystopada 2018 hoda) – Tenth Scientific and Practical Conference. (pp. 71–78). Lviv: Lviv Polytechnic Publishing [in Ukrainian].
Fedorchuk, P. I. (2007). Adaptyvni testy: statystychni metody analizu rezultativ testovogo kontroliu znan [Adaptive tests: statistical methods for analyzing results of the test knowledge control]. Matematychni mashyny i systemy – Mathematical Machines and Systems, 3, 122–138. Retrieved from http://www.immsp.kiev.ua/publications/articles/2007/2007_3,4/Fedoruk_034_2007.pdf [in Ukrainian].
Rasch, G. (1980). Probabilistic Models for Some Intelligence and Attainment Tests. Chicago: The University of Chicago Press [in English].
Kim, V. S. (2007). Testirovanie uchebnych dostizhenii [Learning Achievement Testing]. Ussuriisk: UGPI. Retrieved from http://clipperkim.narod.ru/test/monotest/index.html [in Russian].
Shelyschkova, M. B. (2002). Teoriia i praktika konstruirovaniia pedagogicheskich testov [Theory and practice of constructing pedagogical tests]. Moscow: Logos [in Russian].