1
|
|
2
|
- by
- Mohammad H. Omar, Ph.D.
- November 29, 2005
|
3
|
- Theories of Educational Measurement
- Classical Test Theory (CTT)
- Item Response Theory (IRT)
- Item Response Data Matrix
- Important Statistics in Item Response Theory
- Methods for Estimating These Parameters
- Research Areas
|
4
|
- Educational Measurement: A field of inquiry on carefully developing
instruments that can effectively measure student educational achievement
- A rapidly growing scientific field that is in high demand in the US
|
5
|
- Theories of educational measurement explain
- How educational measurement tools such as tests are built
- How students respond to these educational measurement tools
- And what results from these educational measurement tools mean
- In educational measurement, an examinee’s true ability needs to be
estimated. So do the true parameters (difficulty, discrimination, etc.) of
the test items that measure these abilities.
|
6
|
- In CTT, a student’s true ability is given by a true number-right score
- True item parameters (item proportion correct and item discrimination)
are obtained as examinee sample sizes grow large (Central Limit Theorem
idea).
- However, the item proportion correct is defined as the proportion of
examinees who correctly endorsed the item. So, items are not totally free
of the examinees they measure.
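As a minimal illustration of these CTT item statistics, the sketch below computes item proportion correct and a corrected item-total (point-biserial) discrimination index from a simulated 0/1 response matrix; the data and variable names are hypothetical, not from the presentation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 0/1 response matrix: rows = examinees, columns = items.
responses = rng.integers(0, 2, size=(500, 5))

# Item proportion correct: share of examinees who endorsed each item.
p_values = responses.mean(axis=0)

# Item discrimination as the corrected point-biserial correlation between
# each item and the number-right score on the remaining items.
n_items = responses.shape[1]
total = responses.sum(axis=1)
discrimination = np.array([
    np.corrcoef(responses[:, i], total - responses[:, i])[0, 1]
    for i in range(n_items)
])

print("proportion correct:", np.round(p_values, 3))
print("point-biserial discrimination:", np.round(discrimination, 3))
```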
|
7
|
- IRT sees that as a disadvantage and provides an alternative
conceptualization of the examination process.
- In IRT, students’ true abilities are given by a latent (unobserved)
trait. It can be unidimensional or multidimensional, but the most common
use assumes a unidimensional trait such as ability on algebra content.
- Furthermore, in IRT, test items can measure examinee abilities with
- different difficulty levels
- different discrimination levels and
- different pseudo-guessing levels
- Depending on the trait measured, these item parameters can be
multidimensional or unidimensional.
- But, since most common uses in IRT assume a unidimensional trait, a
unidimensional IRT model to measure student abilities is commonly used.
|
8
|
- Observed Number Right Score = True Score + Measurement Error
- X = T + E
- Some Potential Sources of Error
- Errors pulling scores downwards
- Student feeling sick on the test day
- Errors pushing scores upwards
- Students guessing questions correctly
- Assumptions
- Error score is not correlated with anything else
- Typical Statistics
- Reliability (consistency index) of the Test
- Standard Error of Measurement
- Test Completion rate (Speededness measure)
- Test Difficulty Index (Test Mean)
- Test Variability Index (Standard Deviation, etc)
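To make these typical CTT statistics concrete, here is a small sketch that simulates a response matrix and computes the test mean, standard deviation, KR-20 reliability, and the standard error of measurement; all data and parameter values below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a hypothetical 0/1 response matrix (rows = examinees, columns =
# items) from a simple latent-ability model so the statistics look realistic.
n_examinees, n_items = 500, 20
theta = rng.normal(size=(n_examinees, 1))          # examinee ability
b = rng.normal(size=n_items)                       # item difficulty
prob = 1.0 / (1.0 + np.exp(-(theta - b)))
responses = (rng.random((n_examinees, n_items)) < prob).astype(int)

total = responses.sum(axis=1)                      # number-right scores X
var_total = total.var(ddof=1)

# KR-20: a common reliability (internal consistency) index for 0/1 items.
p = responses.mean(axis=0)
kr20 = (n_items / (n_items - 1)) * (1 - np.sum(p * (1 - p)) / var_total)

# Standard Error of Measurement: SEM = SD(X) * sqrt(1 - reliability).
sem = np.sqrt(var_total) * np.sqrt(1 - kr20)

print(f"test mean (difficulty index): {total.mean():.2f}")
print(f"test SD (variability index): {np.sqrt(var_total):.2f}")
print(f"KR-20 reliability: {kr20:.3f}")
print(f"SEM: {sem:.3f}")
```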
|
9
|
- How does an item relate to an examinee’s abilities?
- In IRT, the main focus is measuring examinees’ abilities with the
measuring instruments (test items) by
- modeling a student’s probability of correctly responding to this
measuring instrument (the test items).
- Main question
- What is a student’s probability of responding correctly to examination
questions, given their ability level?
|
10
|
- IRT attempts to answer this question with the following assumptions:
- local item independence
- dimensionality of the student ability space
- a model for the probability of responses to an item conditioned on
examinee ability
- Different Models
- One frequently used IRT model is the 3-parameter model
|
11
|
- One frequently used IRT model is the 3-parameter model.
- The assumptions of this model are as follows:
- local item independence
- student ability is one-dimensional (an N x 1 vector)
- the model for the probability of a correct response to an item
conditioned on examinee ability is the 3-parameter logistic model:
- P(Uik = 1 | θk) = ci + (1 - ci) / (1 + exp[-ai(θk - bi)])
- Where
- Uik = examinee k’s response to item i
- θk = ability for the kth examinee
- bi = difficulty parameter for item i
- ai = discrimination parameter for item i
- ci = pseudo-guessing parameter for item i
- Some Potential Sources of Error
- Measurement Error, Estimation error
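A minimal sketch of this 3PL response function in Python (written without the 1.7 scaling constant sometimes folded into ai; the example values are illustrative):

```python
import numpy as np

def p_3pl(theta, a, b, c):
    """3-parameter logistic model: probability of a correct response given
    ability theta and item parameters a (discrimination), b (difficulty),
    and c (pseudo-guessing)."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

# Example: a moderately discriminating item of average difficulty.
print(p_3pl(theta=0.0, a=1.2, b=0.0, c=0.2))   # 0.6 = c + (1 - c)/2
```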
|
12
|
- Typical Statistics
- Statistics on
- Examinee Ability parameters
- Item parameters
- Difficulty (bi)
- Discrimination (ai)
- Pseudo-guessing (ci)
- Probabilities of correct responses conditional on examinee ability
- Item Characteristic Curve (ICC) for 2 items
|
13
|
- Information Functions
- Item Information function
- Ii(θ) = [Pi'(θ)]² / [Pi(θ)(1 - Pi(θ))], where Pi'(θ) = the 1st
derivative of Pi(θ) at θ
- Max information is at θ = bi + (1/ai) ln[(1 + √(1 + 8ci)) / 2]
- Test (Exam) Information function (differs for different tests):
I(θ) = Σi Ii(θ)
- Conditional Standard Error of Measurement: SEM(θ) = 1 / √I(θ)
- Relative Efficiency of Exams
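The sketch below evaluates these information functions for a small hypothetical set of 3PL items: item information from the derivative formula above, test information as the sum over items, and the conditional SEM as its inverse square root.

```python
import numpy as np

def p_3pl(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta:
    I(theta) = [P'(theta)]^2 / (P(theta) * (1 - P(theta)))."""
    p = p_3pl(theta, a, b, c)
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - b)))   # logistic part
    dp = (1.0 - c) * a * p_star * (1.0 - p_star)      # dP/dtheta
    return dp**2 / (p * (1.0 - p))

# Hypothetical 3-item test (parameters are illustrative, not from the talk).
a = np.array([1.0, 1.5, 0.8])
b = np.array([-0.5, 0.0, 1.0])
c = np.array([0.2, 0.15, 0.25])

theta = 0.0
item_info = item_information(theta, a, b, c)
test_info = item_info.sum()                 # test information = sum of items
csem = 1.0 / np.sqrt(test_info)             # conditional SEM at theta

print("item information:", np.round(item_info, 3))
print(f"test information at theta=0: {test_info:.3f}")
print(f"conditional SEM at theta=0: {csem:.3f}")
```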
|
14
|
- Example data
- Need to estimate the examinee abilities θ and the test item parameters
(ai, bi, ci)
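A sketch of such example data can be simulated directly from the 3PL model; the true parameters drawn below are illustrative only, and in practice they are unknown and must be estimated from the observed 0/1 responses.

```python
import numpy as np

rng = np.random.default_rng(42)

# Draw hypothetical "true" abilities and item parameters for illustration.
n_examinees, n_items = 1000, 10
theta = rng.normal(0, 1, size=n_examinees)        # true abilities
a = rng.uniform(0.8, 2.0, size=n_items)           # discriminations
b = rng.normal(0, 1, size=n_items)                # difficulties
c = rng.uniform(0.1, 0.25, size=n_items)          # pseudo-guessing

# Item response data matrix: 1 if the examinee answers correctly, else 0.
prob = c + (1 - c) / (1 + np.exp(-a * (theta[:, None] - b)))
U = (rng.random((n_examinees, n_items)) < prob).astype(int)

print(U[:5])          # first 5 examinees' response patterns
```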
|
15
|
- Need a likelihood equation to maximize for the data. Under local
independence it is
- L(U | θ, a, b, c) = Πk Πi Pi(θk)^Uik [1 - Pi(θk)]^(1 - Uik)
- We need to maximize this equation with respect to both the examinee
ability parameters and the item parameters.
- Unfortunately, there is no closed-form solution to the zeroes of the
partial derivatives of the likelihood function. The log-likelihood
function has to be numerically optimized (see the sketch after this list).
- Different Approaches to Numerical Optimization
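As a sketch of this numerical optimization in the simplest case, the code below maximizes one examinee's log-likelihood over θ with the item parameters held fixed, using a Newton-Raphson iteration with numerical derivatives (the items and responses are hypothetical).

```python
import numpy as np

def p_3pl(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def mle_theta(u, a, b, c, n_steps=50):
    """Maximize the log-likelihood of one examinee's 0/1 response vector u
    over theta, holding item parameters fixed (a sketch using Newton-Raphson
    with numerical first and second derivatives)."""
    def loglik(theta):
        p = p_3pl(theta, a, b, c)
        return np.sum(u * np.log(p) + (1 - u) * np.log(1 - p))

    theta, h = 0.0, 1e-4
    for _ in range(n_steps):
        d1 = (loglik(theta + h) - loglik(theta - h)) / (2 * h)
        d2 = (loglik(theta + h) - 2 * loglik(theta) + loglik(theta - h)) / h**2
        if d2 >= 0 or abs(d1) < 1e-8:   # guard non-concave step / converged
            break
        theta -= d1 / d2                # Newton-Raphson update
    return theta

# Hypothetical items and one response pattern (illustrative values only).
a = np.array([1.2, 0.9, 1.5, 1.0])
b = np.array([-1.0, 0.0, 0.5, 1.2])
c = np.array([0.2, 0.2, 0.15, 0.25])
u = np.array([1, 1, 0, 0])

print(f"MLE of theta: {mle_theta(u, a, b, c):.3f}")
```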
|
16
|
- Different Approaches
- Joint Maximum Likelihood Estimation (JMLE) Method
- Estimates both ability and item parameters simultaneously
- LOGIST (ETS proprietary software)
- Conditional Maximum Likelihood Estimation (UCON) Method
- Conditions out the ability parameters when estimating the item
parameters
- Applies only to the one-parameter logistic (Rasch) model (ai = a,
ci = 0 for all i), where a sufficient statistic for θ exists.
The sufficient statistic is the number-right score.
- WINSTEPS (commercial)
- Marginal Maximum Likelihood Estimation (MMLE) Method
- Integrates out the ability parameters when estimating the item
parameters, using Gaussian quadrature points.
- BILOG (commercial)
- Empirical Bayes Estimation Method
- Expected A Posteriori (EAP) estimation
- Bayes Modal – Maximum A Posteriori (MAP) estimation
- Markov-Chain Monte Carlo (MCMC) using Gibbs sampling
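As one concrete example from this list, here is a sketch of EAP estimation of an examinee's ability, assuming a standard normal prior and a simple equally spaced quadrature grid (the item values are hypothetical):

```python
import numpy as np

def p_3pl(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def eap_theta(u, a, b, c, n_points=41):
    """Expected A Posteriori ability estimate: posterior mean of theta over
    a quadrature grid, weighting the likelihood by a N(0,1) prior."""
    grid = np.linspace(-4, 4, n_points)          # quadrature points
    prior = np.exp(-0.5 * grid**2)               # N(0,1), unnormalized
    # Likelihood of the response vector at each grid point.
    p = p_3pl(grid[:, None], a, b, c)            # shape (points, items)
    lik = np.prod(np.where(u == 1, p, 1 - p), axis=1)
    post = lik * prior
    return np.sum(grid * post) / np.sum(post)

# Same hypothetical items and response pattern as in the MLE sketch above.
a = np.array([1.2, 0.9, 1.5, 1.0])
b = np.array([-1.0, 0.0, 0.5, 1.2])
c = np.array([0.2, 0.2, 0.15, 0.25])
u = np.array([1, 1, 0, 0])

print(f"EAP estimate of theta: {eap_theta(u, a, b, c):.3f}")
```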
|
17
|
- Item Banking
- Are the same items from different administrations significantly
different in their statistical properties?
- Need Item Response Theory to calibrate all items so that there’s
one common scale.
- Advantage: can easily build test forms with similar test difficulty or
build optimum tests.
- Computer Adaptive Testing
- Increases measurement precision (the test information function) by
allowing students to take only items that are at their own ability level.
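A minimal sketch of the core adaptive step under this idea: among the items not yet administered, select the one with maximum information at the current ability estimate (the item bank below is hypothetical, and a full CAT would also re-estimate θ after each response).

```python
import numpy as np

def p_3pl(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b, c):
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    dp = (1.0 - c) * a * p_star * (1.0 - p_star)
    p = p_3pl(theta, a, b, c)
    return dp**2 / (p * (1.0 - p))

def next_item(theta_hat, a, b, c, administered):
    """Maximum-information item selection: among items not yet administered,
    pick the most informative one at the current ability estimate (one common
    CAT selection rule; a sketch, not a full adaptive testing engine)."""
    info = item_information(theta_hat, a, b, c)
    info[list(administered)] = -np.inf       # exclude items already given
    return int(np.argmax(info))

# Hypothetical item bank (illustrative parameters only).
a = np.array([1.2, 0.9, 1.5, 1.0, 1.8])
b = np.array([-1.0, 0.0, 0.5, 1.2, -0.3])
c = np.array([0.2, 0.2, 0.15, 0.25, 0.1])

print("next item for theta=0:", next_item(0.0, a, b, c, administered={2}))
```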
|
18
|
|