In a course, we have a final quiz of 24 random questions. These questions are drawn from six different question categories (4 questions x 6 categories = 24 random questions).
After the end of the course, we want to see which were the most difficult questions in order to improve our course.
How will we find these?
Don’t rush to answer… If we go through the Attempts link, we will see the attempts taken by each user, but the questions are not the same for every one of them. Q3 may be question A for Paul, but Q3 may be question Z for Simon… Remember, the questions are random, so each attempt is different.
So what about this…
- Quiz administration -> Results -> Statistics
Now, according to Moodle.docs “This report gives a statistical (psychometric) analysis of the quiz, and the questions within it.” http://docs.moodle.org/26/en/Quiz_statistics_report
WOW! I was really impressed by the psychometric analysis.
The only catch is that I would need a psychometrician or a statistician to translate the data…
Luckily, Moodle.docs are really powerful, so with some more research I found this article http://docs.moodle.org/dev/Quiz_statistics_calculations and this one http://docs.moodle.org/dev/Quiz_report_statistics
Scanning the articles and skipping the mathematics, here are the gems I collected that make sense to me:
Facility index: This is the mean score of students on the item, expressed as a percentage. The higher the facility index, the easier the question is (for this cohort of students).
|Facility index (%)|Interpretation|
|---|---|
|5 or less|Extremely difficult or something wrong with the question.|
|35-64|About right for the average student.|
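To make the facility index concrete, here is a minimal sketch in Python. It assumes you already have the score each student got on one question plus that question's maximum score; the function name and the sample scores are made up for illustration and are not taken from Moodle.

```python
from statistics import mean


def facility_index(scores, max_score):
    """Mean score on one question, as a percentage of its maximum score."""
    if not scores:
        raise ValueError("need at least one attempt")
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    return 100.0 * mean(scores) / max_score


# Hypothetical scores of eight students on a question worth 1 point.
sample_scores = [1, 0, 1, 1, 0, 0.5, 1, 0]
print(facility_index(sample_scores, 1.0))
```

For these sample scores the mean is 4.5 / 8, so the facility index falls in the "about right" band of the table above.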
Intended question weight: How much this question was supposed to contribute to determining the overall test score.
Random guess score (RGS): This is the mean score students would be expected to get for a random guess at the question. Random guess scores are only available for questions that use some form of multiple choice. All random guess scores are for deferred feedback only and assume the simplest situation, e.g. for multiple response questions students will be told how many answers are correct. Values above 40% are unsatisfactory, which shows that True/False questions must be used sparingly in summative tests.
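A quick sketch of why True/False questions land above the 40% line. This assumes the simplest case only: a single-answer question where a student picks one option uniformly at random, so the expected score is just the average of the grades attached to the options. The function name and the grade lists are my own illustration, not Moodle's code.

```python
def random_guess_score(option_grades):
    """Expected score (in %) when one option is picked uniformly at random.

    option_grades holds the fraction of the mark each option is worth,
    e.g. 1.0 for the correct option and 0.0 for a wrong one.
    """
    if not option_grades:
        raise ValueError("need at least one option")
    return 100.0 * sum(option_grades) / len(option_grades)


print(random_guess_score([1.0, 0.0]))            # True/False
print(random_guess_score([1.0, 0.0, 0.0, 0.0]))  # four options, one correct
```

With two options the guess is worth half the marks, while four options bring it down to a quarter, which is under the 40% limit mentioned above.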
Discrimination index: This is the correlation between the weighted scores on the question and those on the rest of the test. It indicates how effective the question is at sorting out able students from those who are less able. The results should be interpreted as follows…
|Discrimination index|Interpretation|
|---|---|
|50 and above|Very good discrimination|
|30 – 50|Adequate discrimination|
|20 – 29|Weak discrimination|
|0 – 19|Very weak discrimination|
|Negative|Question probably invalid|
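Here is a rough sketch of the idea behind the discrimination index: correlate each student's score on the question with their score on the rest of the test. It is a simplification, since it uses raw scores rather than the weighted scores mentioned above, and it multiplies the correlation by 100 on the assumption that the table uses a percentage scale. All names and numbers are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation, or None if either list has no variation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def discrimination_index(item_scores, total_scores):
    """Correlation (x100) between the item and the rest of the test."""
    if len(item_scores) != len(total_scores) or len(item_scores) < 2:
        raise ValueError("need two equally long lists with at least 2 students")
    # Remove the item's own contribution from each student's total.
    rest = [total - item for item, total in zip(item_scores, total_scores)]
    r = pearson(item_scores, rest)
    return None if r is None else 100.0 * r


# Hypothetical data: four students, item score and quiz total for each.
item = [0, 0, 1, 1]
totals = [1, 2, 4, 5]
print(discrimination_index(item, totals))
```

In this made-up data the students who got the question right also did better on the rest of the quiz, so the index comes out strongly positive. If everyone gets the same score on the question there is nothing to correlate, and the sketch returns None.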
Discrimination efficiency: This statistic attempts to estimate how good the discrimination index is relative to the difficulty of the question.
An item which is very easy or very difficult cannot discriminate between students of different ability, because most of them get the same score on that question. Maximum discrimination requires a facility index in the range 30% – 70% (although such a value is no guarantee of a high discrimination index).
The discrimination efficiency will very rarely approach 100%, but values in excess of 50% should be achievable. Lower values indicate that the question is not nearly as effective at discriminating between students of different ability as it might be and therefore is not a particularly good question.
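One way to picture discrimination efficiency, as far as I understand the Moodle developer docs (treat this as my assumption, not a confirmed formula): take the actual covariance between the question scores and the rest-of-test scores, and divide it by the largest covariance those same scores could produce, which you get by pairing both lists in sorted order. The helper names below are hypothetical.

```python
def covariance(xs, ys):
    """Sample covariance of two equally long lists."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equally long lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def discrimination_efficiency(item_scores, rest_scores):
    """Actual covariance as a percentage of the best achievable covariance."""
    actual = covariance(item_scores, rest_scores)
    # Assumed upper bound: pair the lowest item score with the lowest
    # rest-of-test score, the next lowest with the next lowest, and so on.
    best = covariance(sorted(item_scores), sorted(rest_scores))
    if best == 0:
        return None  # no variation, so efficiency is undefined
    return 100.0 * actual / best


# Hypothetical data: item scores and rest-of-test scores for four students.
print(discrimination_efficiency([0, 0, 1, 1], [1, 2, 3, 4]))
print(discrimination_efficiency([0, 1, 0, 1], [1, 2, 3, 4]))
```

The first call is the ideal case, where the students who did best on the rest of the test are exactly the ones who got the question right. In the second call the right answers are scattered among weaker and stronger students, so the efficiency drops.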
Well, I think I could spend days on end investigating these data. Actually, I am going to check this in practice by comparing a number of quizzes from different runs (same course, same quiz, different class) to see if it’s the same questions that students get wrong… Yay! Love this work!
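For that comparison across runs, something like the sketch below might do. It assumes you have copied the facility index of each question from the Statistics report of every run into a dictionary by hand; the run names, question names and the threshold of 35 (the lower edge of the "about right" band) are all my own placeholders.

```python
def consistently_hard(runs, threshold=35.0):
    """Questions whose facility index is below the threshold in every run.

    runs maps a run name to a dict of {question name: facility index in %}.
    A question that never came up in some run is not counted as hard there,
    so it is left out of the result.
    """
    hard_per_run = [
        {name for name, facility in facilities.items() if facility < threshold}
        for facilities in runs.values()
    ]
    if not hard_per_run:
        return set()
    return set.intersection(*hard_per_run)


# Hypothetical facility indices from two classes taking the same quiz.
runs = {
    "class 1": {"Question A": 20.0, "Question B": 60.0, "Question C": 30.0},
    "class 2": {"Question A": 25.0, "Question B": 55.0, "Question C": 50.0},
}
print(sorted(consistently_hard(runs)))
```

Here only Question A stays under the threshold in both classes, so it would be the first candidate for a rewrite.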