Overview
The item response theory (IRT), also known as latent response theory, refers to a family of mathematical models that attempt to explain the relationship between latent traits (unobservable characteristics or attributes) and their manifestations (i.e. observed outcomes, responses or performance). These models establish a link between the properties of items on an instrument, the individuals responding to these items and the underlying trait being measured. IRT assumes that the latent construct (e.g. stress, knowledge, attitudes) and the items of a measure are organized on an unobservable continuum. Its main purpose is therefore to establish the individual’s position on that continuum.

Description
Classical Test Theory
Item Response Theory vs. Classical Test Theory
IRT Assumptions
1) Monotonicity: as the trait level increases, the probability of a correct response also increases.
2) Unidimensionality: the model assumes that there is one dominant latent trait being measured and that this trait is the driving force for the responses observed for each item in the measure.
3) Local Independence: responses given to the separate items in a test are mutually independent given a certain level of ability.
4) Invariance: the item parameters can be estimated from any position on the item response curve. Accordingly, the parameters of an item can be estimated from any group of subjects who have answered the item.

If the assumptions hold, differences in observing correct responses between respondents will be due to variation in their latent trait.

IRT models predict respondents’ answers to an instrument’s items based on their position on the latent trait continuum and the items’ characteristics, also known as parameters. The item response function characterizes this association. The underlying assumption is that every response to an item on an instrument provides some information about the individual’s level of the latent trait or ability. In simple terms, the ability of the person (θ) determines the probability of endorsing the correct answer for that item: the higher the individual’s ability, the higher the probability of a correct response. This relationship can be depicted graphically and is known as the Item Characteristic Curve. As shown in the figure, the curve is S-shaped (sigmoid/ogive), and the probability of endorsing a correct response increases monotonically as the ability of the respondent becomes higher. Theoretically, ability (θ) ranges from -∞ to +∞; in applications, however, it usually ranges between -3 and +3.

Item Parameters
As people’s abilities vary, their position on the latent construct’s continuum changes and is determined by the sample of respondents and the item parameters. An item must be sensitive enough to rate the respondents within the suggested unobservable continuum.

Item Difficulty (bi) is the parameter that determines where the item is located along the ability scale. It is determined at the point of median probability, i.e. the ability at which 50% of respondents endorse the correct answer. On an item characteristic curve, items that are difficult to endorse are shifted to the right of the scale, indicating the higher ability of the respondents who endorse them correctly, while easier items are shifted to the left of the ability scale.

Item Discrimination (ai) determines the rate at which the probability of endorsing a correct item changes given ability levels.
This parameter is essential in differentiating between individuals possessing similar levels of the latent construct of interest. To design a precise measure, the aim is to include items with high discrimination, so that individuals can be mapped along the continuum of the latent trait. On the other hand, researchers should exercise caution if an item is observed to have a negative discrimination, because the probability of endorsing the correct answer shouldn’t decrease as the respondent’s ability increases. Such items should be revised. Theoretically, the scale for item discrimination ranges from -∞ to +∞, but in practice it usually doesn’t exceed 2; realistically, therefore, it ranges between 0 and 2.

Guessing (ci)
Item guessing is the third parameter; it accounts for guessing on an item. It sets a lower bound on the probability of endorsing the correct response as ability approaches -∞.

Population Invariance
In simple terms, the item parameters behave similarly in different populations. This is not the case when following CTT in measurement. As the unit of analysis in IRT is the item, the location of the item (difficulty) can be standardized (undergo linear transformation) across populations, and thus items can be easily compared. An important note is that, even after linear transformation, the parameter estimates derived from two samples will not be identical. The invariance, as the name states, refers to population invariance, and so it applies to item population parameters only.

IRT Model Types
Unidimensional Models
Unidimensional models describe responses to items that measure one dominant latent trait.

The 1-Parameter Logistic Model
The model is the simplest form of IRT models. It comprises one parameter that describes the latent trait (ability, θ) of the person responding to the items and another parameter for the item (difficulty).
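
The item parameters described above can be made concrete with a short numerical sketch. The Python function below is an illustration rather than part of the original text: it assumes the standard logistic form of the item response function, and the names `irf` and `difficulties`, as well as the five difficulty values, are hypothetical. With the default values a = 1 and c = 0 it corresponds to the 1-PL case, in which items differ only in difficulty.

```python
import math


def irf(theta, b, a=1.0, c=0.0):
    """Probability of a correct response under a logistic IRT model.

    theta: respondent ability
    b:     item difficulty
    a:     item discrimination (assumed 1.0 for the 1-PL case)
    c:     guessing parameter, i.e. the lower asymptote (assumed 0.0 if unused)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Five hypothetical items that differ only in difficulty (1-PL case).
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Print the response probabilities for a low, average and high ability.
for theta in (-3.0, 0.0, 3.0):
    probs = [round(irf(theta, b), 3) for b in difficulties]
    print(f"theta={theta:+.1f}: {probs}")
```

Under these assumptions, a respondent whose ability equals the item difficulty has a probability of 0.5 of answering correctly, harder items give lower probabilities at the same ability, and a non-zero c raises the lower asymptote of the curve.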
The following equation represents its mathematical form (the standard 1-PL logistic function, with X_i denoting the scored response to item i):

$$P(X_i = 1 \mid \theta, b_i) = \frac{e^{\theta - b_i}}{1 + e^{\theta - b_i}}$$

This is the item response function for the 1-Parameter Logistic Model; it predicts the probability of a correct response given the respondent’s ability and the difficulty of the item. In the 1-PL model, the discrimination parameter is fixed for all items, and accordingly all the Item Characteristic Curves corresponding to the different items in the measure are parallel along the ability scale. The figure shows 5 items; the one furthest to the right is the hardest and would probably be endorsed correctly only by those with a higher ability.

Test Characteristic Curve
The test characteristic curve is the sum of the probabilities of endorsing the correct answer across all the items in the measure, and therefore estimates the expected test score. In the figure, the red line depicts the sum of the probabilities of all 5 items (black). The test information function, by contrast, is the sum of the item information functions described next.

The Item Information Function
The amount of information at a given ability level is the inverse of the variance of the ability estimate at that level; hence, the larger the amount of information provided by the item, the greater the precision of the measurement. When item information is plotted against ability, the resulting graph shows how much information the item provides at each ability level. Items measured with more precision provide more information and are depicted as taller and narrower curves than items that provide less information. The apex of the curve corresponds to the value of bi, the ability at the point of median probability. The maximum amount of information is provided when the probabilities of answering correctly and incorrectly are equal, i.e. 50%. Items are most informative among respondents that represent the entire latent continuum, and especially among those who have a 50% chance of answering either way.

Estimating Ability
The Rasch Model vs. 1-Parameter Logistic Models
The 2-Parameter Logistic Model
The two-parameter logistic model predicts the probability of a successful answer using two parameters (difficulty bi and discrimination ai). The discrimination parameter is allowed to vary between items. Hence, the ICCs of the different items can intersect and have different slopes. The steeper the slope, the higher the discrimination of the item, as it will be able to detect subtle differences in the ability of the respondents.

The Item Information Function
As is the case with the 1-PL model, the information is calculated as the product of the probabilities of a correct and an incorrect response. In the 2-PL model, however, the product is multiplied by the square of the discrimination parameter. The implication is that the larger the discrimination parameter, the greater the information provided by the item. As the discrimination parameter is allowed to vary between items, the item information function graphs can look different too.

Estimating Ability
The 3-Parameter Logistic Model
The model predicts the probability of a correct response in the same manner as the 1-PL and 2-PL models, but it is constrained by a third parameter called the guessing parameter (also known as the pseudo-chance parameter), which sets a lower bound on the probability of endorsing a correct response as the ability of the respondent approaches -∞. When respondents answer an item by guessing, the amount of information provided by that item decreases, and the item information function peaks at a lower level compared to other functions. Additionally, difficulty is no longer demarcated at median probability: at θ = bi the probability of a correct response is (1 + ci)/2 rather than 0.5. Items answered by guessing indicate that the respondent’s ability is lower than the item’s difficulty.

Model Fit
Other IRT Models
These include models that handle polytomous data, such as the graded response model and the partial credit model. These models predict the probability of responding in each response category, from which an expected item score follows.
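
As an illustration of how a polytomous model assigns probabilities to ordered response categories, the sketch below computes category probabilities for the graded response model in its usual cumulative-logit form. It is a minimal sketch under stated assumptions and is not taken from the original text: the function names and the parameter values (a = 1.5 and thresholds -1.0, 0.0, 1.2) are hypothetical.

```python
import math


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under the graded response model.

    thresholds: increasing boundary locations b_1 < ... < b_{K-1}
    for an item with K ordered categories scored 0..K-1.
    """
    # Cumulative probabilities P(X >= k) for k = 1..K-1, padded with
    # P(X >= 0) = 1 at the start and P(X >= K) = 0 at the end.
    cumulative = [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]
    # P(X = k) is the difference between adjacent cumulative probabilities.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


def expected_score(probs):
    """Expected item score: category scores weighted by their probabilities."""
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical four-category item evaluated at a single ability level.
probs = grm_category_probs(theta=0.5, a=1.5, thresholds=[-1.0, 0.0, 1.2])
print([round(p, 3) for p in probs], round(expected_score(probs), 3))
```

The category probabilities sum to one at every ability level, and the expected score rises with ability, which mirrors the monotonicity assumption stated for dichotomous items.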
Other IRT models, such as the nominal response model, by contrast handle items with unordered response categories (e.g. Yes, No, Maybe).

In this brief summary, we focused on unidimensional IRT models, which are concerned with the measurement of one latent trait. These models would not be appropriate for measuring more than one latent construct or trait; in that case, the use of multidimensional IRT models is advised. Please see the resource list below for more information about these models.

Applications
IRT models can be applied successfully in many settings that use assessments (education, psychology, health outcomes research, etc.). They can also be used to design and hone scales/measures by including items with high discrimination, which adds to the precision of the measurement tool and lessens the burden of answering long questionnaires. As the unit of analysis in IRT models is the item, they can be used to compare items from different measures, provided that they measure the same latent construct. Furthermore, they can be used in differential item functioning analyses, to assess why items that have been calibrated and tested still behave differently among groups. This can lead research into identifying the causative agents behind differences in responses and linking them to group characteristics. Finally, they can be used in Computerized Adaptive Testing.

Readings
Textbooks & Chapters
These three books (Item response theory principles and applications, Item response theory and Handbook of modern item response theory) provide the reader with the fundamental principles of IRT models. However, they don’t cover recent updates or IRT software packages.
In 138 pages, DeMars C. has succeeded in producing a succinct yet extremely informative resource that demystifies even the hardest IRT concepts. It is an introductory book that addresses IRT assumptions, parameters and requirements, and then proceeds to explain how results can be described in reports and how researchers should consider the context of test administration, the respondent population and the effective use of scores.
The theory and practice of item response theory is an applied, practitioner-oriented book. It provides a thorough explanation of both unidimensional and multidimensional IRT models, highlighting each model’s conceptual development and assumptions. It then proceeds to demonstrate the underlying principles of each model through vivid examples.
The book was developed with behavioral research practitioners in mind. It helps them navigate statistical methods using R. Chapter 8 focuses on Item Response Theory and offers a set of notes and a plethora of annotated examples.
As the name suggests, the guide provides a visual representation of the basic concepts in IRT. Java applets permeate the text and make it easier to follow along while these basic concepts are explained. An excellent resource; I would recommend reading it a couple of times and practicing on the applets!
A one-of-a-kind book that focuses on offering the reader the joy of acquiring the basics of IRT without delving into mathematical complexities.
Methodological Articles
Application Articles
Software
For the complete list, please see the following link: http://www.umass.edu/remp/software/CEA-652.ZH-IRTSoftware.pdf
Websites
YouTube tutorials (extremely useful and informative)
Courses
Courses offered at Mailman School of Public Health