Which method is a list of questions developed by a researcher administered in paper and pencil form?

To accommodate the needs of patients with diverse linguistic, cultural, educational, and functional skills, clinicians and researchers require some flexibility in choosing appropriate methods and modes of questionnaire administration for PROMs.109 Numerous issues complicate scoring and analyzing PROM response data. We first describe these methods issues (Table 3)—sources of reports, modes of administration, methods of administration, settings, and scoring—and then discuss barriers.

Table 3. PRO methods: characteristics, strengths, and limitations.

As with the earlier descriptions of core PRO categories such as health-related quality of life, we highlight in this section the critical issues for measurement methods—i.e., the advantages or drawbacks that users would most need to take into account. This information reflects standard measurement theory (classical or contemporary) and is based on decades of published research and theoretical papers and inputs from experts involved with projects such as PROMIS.

Modes and Methods Issues

Administering PRO instruments (PROMs) requires users to make decisions about three aspects of data collection (Figure 1):

Figure 1. Types of respondent sources of data and modes and methods of administration.

  • Data source—i.e., the source of the PRO (the patient or, in some cases, a proxy or other reporter)

  • Mode by which information was recorded—i.e., self-administered or interviewer-administered

  • Method used to capture the information (such as paper-and-pencil questionnaire or telephone- or computer-assisted technologies).

Each of these aspects is described below. These three aspects can be combined in various ways. For example, a patient might use the telephone to self-administer a PROM, or an interviewer might use a computer to read questions and record answers.

The patient’s perspective is the focal point of PRO assessment. In some circumstances, directly obtaining this perspective may be difficult or impossible. In adults, cognitive and communication deficits and burden of disease, for example, can limit potential subjects’ ability to complete PROMs.110 This is especially likely to occur with the elderly and with people of any age who have severe disease or suffer from neurological disorders. Children’s participation can be limited by these same factors plus issues specific to their age and developmental level.110–112

Failing to include these populations can result in potentially misleading interpretations of results. Thus, attempting to include them in PRO assessment efforts is crucial. Using all possible mechanisms for obtaining self-reports is a high priority, but accomplishing this may be out of the question for some populations.

Proxy Report as a Substitute for Self-Report

One way to include the greatest number of patients is to use proxy respondents to obtain PRO information for patients who are unable to respond. Using either significant others (e.g., parents, spouses or other family members, friends) or formal caregivers (physicians, nurses, aides, teachers) as proxies can provide many potential benefits. It not only allows inclusion of a broader and more representative range of patients in the entire measurement effort, but it can also help minimize missing data and increase the feasibility of longitudinal assessment.

The usefulness of proxy responses as substitutes for patient responses depends on the validity and reliability of proxy responses compared with those attributes for patient responses. When evaluating the quality of proxy responses, analysts usually compare proxy responses with patient responses. This is a reasonable approach when proxy responses are being used to replace patient responses.

Agreement between the proxy and patient is typically assessed at either the subscale level, via the intraclass correlation coefficient (ICC), or the item level, by the kappa statistic, although other types of analyses have been advocated.113 Patient and proxy responses are also often compared at the group level by comparing mean scores. Group comparisons help detect the magnitude and direction of any systematic bias that might be present.
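The item-level and group-level comparisons described above can be sketched in a few lines. This is a minimal illustration, not a validated analysis: it assumes two raters (patient and proxy) answering the same categorical item, uses unweighted Cohen's kappa, and reports the proxy-minus-patient difference in mean subscale scores as a simple indicator of systematic bias. The function names and all data are hypothetical, and the ICC is not shown.

```python
from collections import Counter
from statistics import mean


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' categorical answers to one item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of pairs with identical answers.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions per category.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1 - expected)


def mean_difference(proxy_scores, patient_scores):
    """Group-level bias: mean proxy score minus mean patient score."""
    return mean(proxy_scores) - mean(patient_scores)


# Hypothetical data: one yes/no item (1 = yes) answered by 10 patient-proxy pairs.
patient_item = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
proxy_item = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
print(round(cohen_kappa(patient_item, proxy_item), 2))

# Hypothetical subscale scores for four patient-proxy pairs.
print(mean_difference([60, 55, 70, 45], [65, 60, 72, 50]))
```

A negative mean difference here would indicate that proxies, on average, score the subscale lower than patients do; whether lower means better or worse depends on the scoring direction of the instrument.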

Both the adult and pediatric literatures suggest that agreement between proxy and patient ratings is higher when rating observable functioning or HRQL dimensions such as physical and instrumental activities of daily living, physical health, and motor function. Agreement is typically lower for more subjective dimensions such as social functioning, pain, cognitive status or function, and psychological or emotional well-being.112,114–118 Using continuous rather than dichotomous ratings improves agreement.119 Extent of disagreement increases with increasing age of adolescents120 and as the severity of patient illness, cognitive impairment, or disability rises.121–124 Type of proxy (e.g., parent versus caregiver) and proxy characteristics such as age, education, and level of stress may also affect agreement.125,126 In terms of direction of disagreement, proxies for adults tend to rate them as having more symptoms, functional difficulties, emotional distress, and negative quality of life; the main exception is pain, about which proxies tend to under-report.114 Patterns of disagreement for child- versus proxy-reported outcomes are inconsistent.127 Even when self- and proxy reports disagree for either children or adults, differences tend to be small.127,128

Proxy Report as a Complement to Self-Report

Proxy assessment may substitute for patient assessment where needed, but it may also complement it. Proxies can be asked to assess the patient as they think the patient would respond (i.e., proxy-patient perspective), or they can be asked to provide their own perspective on the patient’s functioning or HRQL. This type of additional rating may be better described as either an external or “other” rating129 for the sake of clarity. An important consideration is that the measure make clear which perspective is desired.127

The external (i.e., “other”) perspective may provide particularly relevant information when the person is unable to provide any self-assessment, but it can be important even when the patient can give his or her own answers. In such cases, patient-other agreement may not necessarily be desirable. For example, patients in the earlier stages of dementia may be able to provide responses to PROMs but fail to recognize the extent of their impaired well-being and physical role functioning. In such cases, a next-of-kin caregiver such as a spouse could provide an external assessment that indicates that the patient has some degree of problems in functioning, such as getting the groceries from car to kitchen or being comfortable in a social setting. In these circumstances, external (proxy) respondents can clearly introduce clinically important information.

Mode: Self-Administration Versus Interviewer Administration

Self-administration of PROMs is neither expensive nor influenced by interviewer effects; for these reasons, this mode of administration has traditionally been preferred. However, self-administration is not feasible for some patient populations, such as those who may be too ill to self-administer a questionnaire. In these cases, interviewer administration is often required. Until recently, interviewer administration was also required for those with low literacy; however, new multimedia methods are now available to overcome this barrier.

Main Advantages and Disadvantages of Different Modes of Administration

Table 3 summarizes the principal benefits and drawbacks of different modes of administration, based on authoritative sources.130,131 Self-administered instruments are more cost-effective from a staffing perspective, and they may yield more patient disclosure, especially when collecting sensitive information.132 Disadvantages include the potential for more missing data and the inability to clarify any misunderstandings in questions or response options.

By contrast, interviewer-administered instruments allow for probes and clarification, and they permit more complexity in survey design (e.g., the use of complicated skip patterns or open-ended questions). This mode is also useful for persons with reading, writing, or vision difficulties. Disadvantages include the costs required to hire, train, and supervise interviewers and the potential pressure on respondents to answer quickly, rather than letting them proceed at their own pace. The potential for interviewer bias cannot be overlooked. It may arise from systematic differences from interviewer to interviewer or, occasionally, systematic errors on the part of many or even all interviewers.133

Additional Concerns About Sources of Bias

Other sources of bias for both administration modes include social desirability response set (the tendency to give a favorable picture of oneself) and acquiescent response set (the tendency to agree or disagree with statements regardless of their content).134,135

Legitimate concerns arise about the potential biasing effects of mode of administration on data quality and interpretation.136 Overall, evidence supports high reliability for instruments administered with different modes, but response effects have varied and have not been consistently in the same direction.121–124 For example, some studies have reported more favorable reports of well-being on self-administered questionnaires,137 whereas others have found the opposite effect.138–140 Still other studies reported mixed results141 or found no important differences attributable to mode of administration after adjusting for other factors.130,142,143 Fortunately, many types of error and bias can be overcome by appropriate selection and training of interviewers. Effects of different modes can also be evaluated with various psychometric and statistical techniques and models to determine the potential impact of response effects.144–148

Method of Administration

Advances in technology have changed the face of PROM assessment, increasing the number of administration options available. Multiple methods of self-administration currently exist, and the different methods may have different effects on the quality of the data.136 Although diverse administration methods provide more options for researchers and clinicians, they require different skills and resources of people being asked to respond to the questionnaire. This means that the choice of method of administration may pose differing levels of respondent burden.136

Several factors may account for differences in data quality across methods of administration: impersonality of the method, cognitive burden on the respondent, ability to establish the legitimacy of the reasons for which patients or others are even being asked to complete a questionnaire, control over the questionnaires, and communication style.136 Thus, when users are deciding on one (or more) appropriate methods of administration for a given PROM, they must give these factors due consideration.

Historically, paper-and-pencil administration served as the primary method of PROM assessment. Many PROMs were originally developed with the intention of paper-based administration, but they may be (and typically are) amenable to an electronic-based administration.149 Paper-and-pencil remains a widely used PROM administration method, with its primary advantage being cost-effectiveness in situations in which users face few mailing and follow-up costs.

However, the paper-and-pencil method has disadvantages. For example, it may require that a person’s responses be manually entered into a database for scoring purposes, raising the possibility of data entry errors that threaten the integrity of the results. Similarly, manual data entry and scoring can be time-intensive. Although optical mark recognition and optical character recognition allow scanning of paper-and-pencil PROMs, this process still requires an extra step on the part of staff and may limit the acceptability of paper-and-pencil administration for purposes in which timely scoring and interpretation are important.

Advances in technology and the increasingly widespread availability of electronic resources have provided several alternatives to paper-and-pencil administration. Improved telephone technology has enabled the use of interactive voice response to administer PROMs. Interactive voice response involves a computer audio-recording of PROM questions administered via telephone to which people indicate their responses by selecting the appropriate key.136,149

In addition, computer-based administration methods have emerged as feasible alternatives to paper-and-pencil, such as web-based platforms, touchscreen computers, and multimedia platforms that can accommodate people with a range of literacy and computer skills (e.g., Talking Touchscreen/la Pantalla Parlanchina, audiovisual computer-assisted self-interviewing).136,149–151 Newer mobile forms of technology such as tablet computers and smartphones also offer promise as methods of PROM administration.

Electronic administration methods have advantages that contribute to their increasingly widespread adoption. For example, because patients or respondents enter the data themselves, the opportunity for data entry errors is minimal compared with paper-and-pencil administration with separate data entry. These electronic methods also typically allow for immediate scoring and feedback, which enhances applications requiring timely results. Furthermore, electronic PROM administration has been shown to be practical, acceptable, and cost-effective.60 Electronic methods may also provide people with increased comfort when responding to questions about socially undesirable behaviors.152

Nonetheless, these advantages must be considered in light of several important disadvantages. First, the cost of purchasing technology-based platforms may exceed that of traditional paper-and-pencil methods. Additionally, some patients may experience discomfort with technology or lack the skills necessary to navigate electronic administration methods. Moreover, reliance upon methods such as web-based platforms or smartphones raises questions about people’s access to these technologies, if they are not provided in the relevant settings as part of clinical practice, quality improvement, or other assessment efforts.

The availability of multiple methods of PROM administration highlights the importance of measurement equivalence across methods.149 Measurement equivalence is determined by comparing the psychometric properties of data obtained via paper-based administration and data collected through electronic administration.149 It can be assessed via cognitive testing, usability testing, equivalence testing, or psychometric testing (or various combinations of these techniques).149 A growing body of research documents the equivalence of electronic and paper-and-pencil administration of PROMs.153–155 These findings support the viability of electronic PROM administration as an alternative to paper-and-pencil methods.

Beyond measurement equivalence, patient privacy is a concern that cuts across both paper-and-pencil and electronic administration methods, albeit in differing ways. For paper-based PROMs, physical transfer of the PROM from patient to provider, as well as the physical existence of the completed PROM, may pose risks to the privacy and confidentiality of patients’ responses. Privacy also emerges as a concern about electronic methods, given potential security breaches related to transfer of data, computer errors, or unauthorized access to patient-reported data. These threats underscore the need for reliable and secure electronic platforms to protect patients’ privacy in the context of PROM assessment.

Patient-Reported Outcome Measures in the Clinical Setting

Collecting PRO data as part of clinical care has become common.2,156–158 Facilitating introduction of these PROMs into clinical practice and decision making promises many benefits. Advocates for using PROMs in clinical care propose that the results assist clinical providers in managing their patients’ care,159 enhance the efficiency of clinical practice,155,160 improve patient-provider communication,155,160–162 identify patient needs in a timely manner,155,163 and facilitate patient-centered care.155 Other findings, however, suggest regional variation in perceived health and no positive effect of feedback via PROMs on care, even when combined with guideline-recommended interventions.164–166 As PROMs are used more in clinical practice, some methodological issues pertaining to the settings in which they are administered merit consideration.

A growing number of studies have investigated the use of PROMs in the clinic setting.155,160,162,163,167–170 When selecting PROMs for administration in clinical practice, users need to consider the efficiency of PROM administration, scoring, and interpretation. These factors are especially important because of the time-sensitive nature of the clinic workflow.155,167 In addition, acceptability of both the PROM and the data collection process for both patients and clinic staff is essential.155,167,171

Historically, several barriers have impeded widespread implementation of PRO data collection in clinical settings of all sorts, especially in smaller or private practices. Many of these barriers are associated with paper-and-pencil administration of PROMs. One involves concerns about the potential disruption to the clinical workflow if patients are asked to complete PROMs.159 In addition, staff burden and clinician disengagement may hamper obtaining PRO data in clinical settings.159

Fortunately, technology advances, and the increased opportunities for methods of PROM administration that they afford, may help to overcome some barriers to PRO data collection in clinical practices and settings.167 For example, research supports the feasibility of using tablet computers155,163 and touchscreen computers for these purposes.150,151,162,167,168 Employing computers to administer PROMs may streamline and expedite the process and minimize staff burden and impact on clinic flow.

Conversely, concerns arise regarding the impact of clinic flow on the integrity of data collection, because patients may be interrupted while completing PROMs, which could result in missing data.159 Another potential barrier is that patients may experience anxiety when completing PROMs in clinical settings before their appointments.159 Similarly, a possible lack of privacy when completing PROMs in waiting rooms or similar circumstances poses another potential obstacle to adequate PROM administration. Many of these concerns can be addressed by incorporating PROMs into the clinical workflow. This may also enhance completion rates. Both patients and providers will then be more likely to see this effort as integral to patient care.

Completing PROMs from home before or between medical appointments has been proposed as one strategy for overcoming the problems outlined above.159,172,173 Both web-based PROM administration and interactive voice response constitute possible methods for at-home PRO data collection.151,161,162 Although the home may serve as a feasible alternative to the clinical practice setting for various reasons, those considering implementing home-based PRO data collection need to consider several factors.159,172 First, for patients to be able to complete PROMs at home, they must have access to the type of technology by which the PROM is administered (e.g., Internet). Second, patients must find completing PROMs at home acceptable. Third, users should have a plan in place to address situations in which home-based PROM responses suggest critical or acute problems. This may pose a logistical challenge in comparison with PROMs completed in-clinic, where medical providers and access to intervention are readily available.

As with any setting, health information privacy is paramount; therefore, one barrier to home-based PROMs is the availability of secure data collection platforms.159,174 Finally, an especially difficult issue may be clinician acceptability of home-based PRO data collection. The problems include reimbursement for clinicians’ time using a website to address outcomes that patients report, rather than meeting directly with patients to discuss questions or problems that their patients raise through answers to the PROMs.159,174

Implementing PRO data collection in other settings, such as rehabilitation or skilled nursing facilities, may also yield valuable clinical information and guide interventions. Less research has addressed the issues in administering PROMs in these settings. However, handheld technology may offer a means of facilitating collection of PRO data in the rehabilitation setting following orthopedic surgery.175

Apart from technology per se, other issues in such facilities include the varying level of patients’ acuity status and levels of cognitive capacity to complete PROMs. In these cases, users may need to consider whether using proxy reports may be beneficial. In any case, the potential strengths and weaknesses of different modes and methods of administration still need to be taken into account.

Scoring: Classical Test Theory Versus Modern Test Theory

Many PROMs involve the measurement of latent (not directly observable) variables; examples might include symptoms (not signs) of gastrointestinal disease or pain. The only way to estimate a person’s level on a particular attribute is by asking questions that represent the attribute in question. Most PROMs comprise multiple items that are aggregated in some way to produce an overall score. The most common multi-item instruments are designed to reflect a single underlying construct. The item responses either are caused by or are manifestations of the underlying latent attribute, and the items are expected to correlate with one another.176–179

In some other kinds of multi-item measures, the items may cluster together but would not be expected to correlate. A common example of this latter measure is a comorbidity index comprising various health conditions, e.g., diabetes, asthma, and heart disease. Another example might be a measure of access to care consisting of problems with paying for care, having a regular provider, ease of transportation to care, and ease of making an appointment. Although such items would not necessarily be correlated, together they might form an adequate measure of access. The discussion on scoring below refers to the former type of instrument reflecting underlying constructs with items expected to correlate with one another.

Scoring is based on classical test theory (raw scores) or modern test theory (item response theory [IRT]).180–189 Multiple items are preferred because a response to a single item provides only limited information to distinguish among individuals.190 In addition, measurement error (the difference between the true score and the observed score) tends to average out when responses to individual items are summed to obtain a total score.190–192
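The claim that measurement error tends to average out across items can be illustrated with a small simulation. This is a sketch under stated assumptions: item errors are independent, have zero mean, and share the same standard deviation, which real PROM items may not satisfy. The function name and parameter values are hypothetical.

```python
import random
from statistics import pstdev


def error_sd_of_item_mean(n_items, n_people=2000, error_sd=1.0, seed=0):
    """Simulated SD of the per-item average measurement error in a multi-item score."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    mean_errors = []
    for _ in range(n_people):
        # Assumption: independent, zero-mean, equal-variance errors on every item.
        errors = [rng.gauss(0.0, error_sd) for _ in range(n_items)]
        mean_errors.append(sum(errors) / n_items)
    return pstdev(mean_errors)


for k in (1, 5, 10):
    print(k, round(error_sd_of_item_mean(k), 2))
```

Under these assumptions the error in the averaged score shrinks roughly in proportion to one over the square root of the number of items, which is one reason multi-item scales are preferred to single items.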

Classical test theory estimates the level of an attribute as the sum, perhaps weighted, of responses to individual items, i.e., as a linear combination.13,190,193–196 This approach requires all items on a particular PROM to be used in every situation for it to be considered valid. Hence, the instrument is test-dependent.194,196–198
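A classical test theory score of the kind described above can be sketched as a weighted sum. The function name, the use of `None` for a skipped item, and the example weights are assumptions for illustration; a real instrument's scoring manual defines its own weights and missing-data rules.

```python
def ctt_score(responses, weights=None):
    """Raw (optionally weighted) sum score: a linear combination of item responses."""
    if weights is None:
        weights = [1.0] * len(responses)  # unweighted sum by default
    if len(weights) != len(responses):
        raise ValueError("one weight is required per item")
    # Test dependence: in this sketch the score is only defined when every item is answered.
    if any(r is None for r in responses):
        raise ValueError("all items must be answered to compute the raw score")
    return sum(w * r for w, r in zip(weights, responses))


print(ctt_score([3, 4, 2, 5]))                # unweighted sum
print(ctt_score([3, 4, 2, 5], [1, 1, 2, 1]))  # third item counts double
```

Raising an error on a skipped item is a deliberately strict choice that mirrors the requirement that all items be used; published instruments often specify gentler rules, such as prorating when only a few items are missing.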

IRT, by contrast, enables test-free measurement; i.e., the latent trait can be estimated using different items as long as their locations (difficulty levels) have been calibrated on the same scale as the patients’ ability levels.13,190,196–201 IRT allows computer-adaptive testing (CAT) in which the number, order, and content of the questions are tailored to the individual patient. This approach has two distinct advantages: (1) questionnaires can be shorter, and (2) the scale scores can be estimated more precisely for any given test length. This also means that different patients do not need to complete the same set of items in every situation.13
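The idea of estimating the latent trait from whichever calibrated items a patient happens to answer can be sketched with the Rasch (one-parameter logistic) model. This is an assumption made for illustration: the text does not name a specific IRT model, many PROMs use polytomous items that need other models, and the item difficulties below are hypothetical values rather than calibrations from any real item bank.

```python
import math


def rasch_probability(theta, difficulty):
    """Probability of endorsing a dichotomous item under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def estimate_theta(responses, difficulties, iterations=25):
    """Maximum-likelihood trait estimate via Newton-Raphson for 0/1 responses."""
    if len(responses) != len(difficulties) or not responses:
        raise ValueError("one calibrated difficulty is required per response")
    raw = sum(responses)
    if raw == 0 or raw == len(responses):
        raise ValueError("the ML estimate is not finite for all-0 or all-1 patterns")
    theta = 0.0
    for _ in range(iterations):
        probs = [rasch_probability(theta, b) for b in difficulties]
        gradient = raw - sum(probs)                      # first derivative of the log-likelihood
        information = sum(p * (1.0 - p) for p in probs)  # test information at theta
        theta += gradient / information
    return theta


# Two patients answer different (hypothetical) item subsets from the same calibrated bank.
print(round(estimate_theta([1, 0], [-1.0, 1.0]), 3))
print(round(estimate_theta([1, 1, 0], [-1.0, 0.0, 1.0]), 3))
```

Because both estimates are expressed on the scale used to calibrate the items, they are comparable even though the two patients answered different items. A computer-adaptive test builds on the same machinery by choosing each next item to be informative at the current estimate.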

Using IRT poses nontrivial challenges, however. Understanding the assumptions and the psychometric jargon—e.g., “calibration,” “difficulty levels”—is not easy. The methodology and software are complex. IRT is also not appropriate for causal variables and complex latent traits.13,196,197,202 Overall, however, IRT offers a very convenient and efficient framework for PRO measurement, and it is becoming increasingly well understood and easier to adopt.

Linking or Cross-Talk Between Different Measures of the Same Construct

A common problem when using an array of health-related outcomes for diverse patient populations and subgroups is establishing the comparability of scales or units on which the outcomes are reported.203,204 Typically the metric has been emphasized more than the measure. Equating is a technique to convert the system of units of one measure to that of another. Analysts have successfully used this process of deriving equivalent scores in educational testing to compare test scores obtained from parallel or alternate forms that measure the same characteristic with or without having common anchor items.

Theoretically (and in practice when certain conditions are met), different age-specific measures could be linked, thus placing child, adult, and geriatric estimates on a common metric. For example, the many items that constitute a condition-specific (e.g., cancer) quality of life scale could be incorporated into a single shared bank and linked through a common-anchor design.203 The methods of establishing comparable scores—often called linking—vary substantially depending on the definition of comparability. For that reason, standardization is critical in comparing PROMs across studies. Two measures may be considered linked if they produce scores that match the first two moments of their distributions (i.e., mean and standard deviation for a specific group of examinees or two randomly equivalent groups). Another definition may involve matching scores with equal percentile ranks based on a single sample of examinees or random samples drawn from the same population.
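The two definitions of comparability described above can be sketched as follows. The first function matches the mean and standard deviation of the two score distributions (linear linking); the second matches percentile ranks in a simplified, uninterpolated way. Both assume the two samples come from the same or randomly equivalent groups. The function names and scores are hypothetical, and operational linking would normally use smoothing and interpolation that this sketch omits.

```python
import math
from statistics import mean, pstdev


def linear_link(x, scores_x, scores_y):
    """Map a score on measure X to the Y metric by matching mean and SD."""
    sd_x = pstdev(scores_x)
    if sd_x == 0:
        raise ValueError("scores on X must vary")
    z = (x - mean(scores_x)) / sd_x
    return mean(scores_y) + pstdev(scores_y) * z


def equipercentile_link(x, scores_x, scores_y):
    """Map a score on X to the observed Y score with the same percentile rank."""
    count = sum(s <= x for s in scores_x)  # number of X scores at or below x
    ordered_y = sorted(scores_y)
    index = max(0, math.ceil(count * len(ordered_y) / len(scores_x)) - 1)
    return ordered_y[index]


# Hypothetical scores from two measures of the same construct.
scores_x = [10, 20, 30, 40, 50]
scores_y = [40, 45, 50, 55, 60]
print(round(linear_link(40, scores_x, scores_y), 2))
print(equipercentile_link(40, scores_x, scores_y))
```

In this toy example the two methods agree because the two distributions have the same shape; when the shapes differ, the linked scores generally differ too, which illustrates why the chosen definition of comparability matters.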

Addressing Barriers to Patient-Reported Outcomes Measurement

Users need to address yet other barriers to PRO measurement. These include administering PROMs in vulnerable populations; literacy, health literacy, and numeracy; language and cultural differences; differences in functional abilities; response shift; use of different methods and modes of administration; and the impact of nonresponders to items and questionnaires. In discussing these issues below, we also note best practices and recommendations for addressing them.

Vulnerable Populations

Recognition is growing that some population subgroups are particularly vulnerable to receiving suboptimal health care and to failing to achieve health outcomes equivalent to those experienced by the general population.205–207 Vulnerability is multifaceted. It can arise from age, race, ethnicity, or sex (or gender); health, functional, or developmental status; financial circumstances (income, health insurance); place of residence; or ability to communicate effectively.205 Moreover, many of these factors are synergistic, so that vulnerability has many sources that present a complicated picture for persons in these groups. This definition encompasses populations who are vulnerable because of a chronic or terminal illness or disability and those with literacy or language difficulties.150,206 It also includes people residing in areas with health professional shortages.168

Administration of PROMs is usually performed with paper-and-pencil instruments, and multilingual versions of questionnaires are often not available. Interviewer administration is labor-intensive and cost-prohibitive in most health care settings. Therefore, patients with low literacy, those with certain functional limitations, and those who do not speak English are typically excluded, either explicitly or implicitly, from any outcome evaluation in a clinical practice setting in which patient-reported data are collected on forms.

As PROs continue to play a greater role in medical decision making and evaluation of the quality of health care, sensitive and efficient methods of measuring those outcomes among underserved populations must be developed and validated. Minority status, language preference, and literacy level may be critical variables in differentiating those who receive and respond well to treatment from those who do not. These patients may experience different health outcomes because of disparities in care or barriers to care. Outcome measurement in these patients may provide new insight into disease or treatment problems that may have gone undetected simply because many studies have not been able to accommodate the special needs of such patients.206,208

Literacy

Low literacy is a widespread but neglected problem in the United States. The 1992 National Adult Literacy Survey (NALS)209 and the 2003 National Assessment of Adult Literacy (NAAL)210 measured three kinds of English language literacy tasks that adults encounter in daily life (prose literacy, document literacy, quantitative literacy). Almost half of the adult population experiences difficulty in using reading, speaking, writing, and computational skills in everyday life situations. An additional seven million adults in the US population were estimated to be nonliterate in English. Generally speaking, health literacy problems complicate matters of both health care delivery and PRO measurement.211,212

Health literacy is “the degree to which individuals have the capacity to obtain, process, and understand basic health information and services needed to make appropriate health decisions.”213 This involves using a range of skills (e.g., reading, listening, speaking, writing, numeracy) to function effectively in the health care environment and act appropriately on health care information.214,215 Limited health literacy is widespread214,216 and is associated with medication errors, increased health care costs, hospitalizations, increased mortality, decreased self-efficacy, and inadequate knowledge and self-care for chronic health conditions.211,214,217–219 Health literacy may be more limiting than functional literacy because of the unfamiliar context and vocabulary of the health care system.212,214,220

Contributing to poor understanding of the importance of literacy skills is the fact that low literacy is often underreported. The NALS reported that 66 percent to 75 percent of adults in the lowest reading level and 93 percent to 97 percent in the second-lowest reading level described themselves as being able to read or write English “well” or “very well.”209 In addition, low-literacy individuals are frequently ashamed of their reading difficulties and try to hide the problem, even from their families.221,222 Lack of recognition and denial of reading problems create a barrier to health care. Some low-literacy patients have acknowledged avoiding medical care because they are ashamed of their reading difficulties.221,222 In addition, because everyday life may place only moderate reading demands on people, individuals may not even be aware of their reading problems until a literacy-challenging event occurs (e.g., reviewing treatment options, reading a consent document, completing health assessment forms).221,222

A reader’s comprehension of text depends on the purpose for reading, the ability of the reader, and the text that is being read. Two important factors in the readability of text are word familiarity and sentence length.223 Unfamiliar words are difficult when first encountered. Long sentences are likely to contain more phrases or clauses. Although longer sentences may communicate more information and more ideas, they are more difficult for readers to manage than more, but shorter, sentences that convey the same information. Moreover, longer sentences may also require the reader to retain more information in short-term memory.224–227
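The two readability factors named above, sentence length and word familiarity, can be approximated with a rough text check. This is an illustrative sketch only and not a published readability formula: the sentence and word splitting rules are simplistic, and the familiar-word list is a hypothetical stand-in for a validated vocabulary list.

```python
import re


def readability_features(text, familiar_words):
    """Average sentence length (in words) and share of words not on a familiar list."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    unfamiliar = [w for w in words if w not in familiar_words]
    return {
        "avg_sentence_length": len(words) / len(sentences),
        "unfamiliar_share": len(unfamiliar) / len(words),
    }


# Hypothetical familiar-word list and questionnaire instruction.
familiar = {"take", "one", "pill", "each", "day", "contact", "your", "if", "occurs"}
item_text = "Take one pill each day. Contact your physician if dyspnea occurs."
print(readability_features(item_text, familiar))
```

A draft PROM item that scores high on either feature may be a candidate for rewording or for cognitive testing with low-literacy respondents.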

Addressing health literacy is now recognized as critical to delivering person-centered health care.228 It is an important component of providing quality health care to diverse populations, and it will be incorporated into the National Standards for Culturally and Linguistically Appropriate Services.229 For example, translating highly technical medical and legal language into easily understood language is challenging, whether into English or another language. Health literacy practices are also included in the National Quality Forum 2010 updated set of safe practices.76 A recent discussion paper summarized 10 attributes that exemplify a “health literate health care organization.”228 These attributes cover practical strategies across all aspects of health care, from leadership planning and evaluation, to workforce training, to clear communication practices for patients.

Language and Culture

The availability of multiple language versions of PROMs has enabled users to administer them relatively routinely in diverse research and practice settings. For various purposes, doing analyses on data that have been pooled across all patients is desirable. Yet concern is often voiced about combining data from different cultures or languages.10 In some research and practice-based initiatives, evaluating cross-cultural differences in PROMs is of interest. In all these applications, researchers must use unbiased questionnaires that can detect important differences among patients.206,230,231

Possible cultural differences in interpreting questions and in response styles may limit data pooling or may constrain comparisons across members of different cultural groups.232–234 Similarly, poor quality translations can produce noncomparable language versions of PROMs.233,235,236 For a questionnaire to be suitable for use as an unbiased measure of a PRO, items in the questionnaire must perform similarly across different groups (i.e., they must be cross-culturally or cross-linguistically equivalent).231,237–248 Without assurances that the PROM is culturally and linguistically “fair,” detected treatment differences caused by items that function differently across groups could incorrectly be interpreted to reflect real treatment differences. Similarly, differences in questionnaire performance may mask true treatment differences, especially when language or cultural groups are not balanced across the populations, practices, or settings to be compared.

Functional Abilities

Ideally, PROMs that are intended to be used in performance measurement applications can be completed by all patients in the target populations. Otherwise, if a significant proportion of the population is left out, the remaining individuals being assessed may be unrepresentative of the whole practice or setting. This problem can (and probably will) compromise the validity of the performance measure.

Functional limitations associated with disability are one type of potential barrier to PRO assessment that could affect PRO use in performance measurement. The prevalence of disability, defined as specific functional or sensory limitations, is estimated at 47.5 million Americans, or 22 percent of the US population.249 People with a disability are more likely to develop health conditions and be consumers of health care than those with no disabilities of these types. Thus, they are an important group to include when evaluating health care, but one that is frequently not included in such clinical, quality improvement, or simulation initiatives.250,251

Common disabilities that can affect PROM assessment include problems with vision (e.g., decreased visual acuity, color-blindness), hearing, motor skills (e.g., upper extremity limitations), and cognitive deficits (e.g., impaired comprehension, reading). Fortunately, to address many of these barriers, those administering such measures have a variety of techniques: choosing appropriate methods and modes of data collection, enabling use of assistive devices and technology, and using principles of universal design when developing instruments.201–202

Universal design refers to designing products and environments in such a way that all people can use them, to the greatest extent possible, without adaptation or specialization.252,253 A well-known example of universal design is the use of curb cuts. Initially intended to facilitate the use of wheelchairs, curb cuts have also benefited bicycle riders and people pushing children in strollers, among others. An exhaustive examination of how to apply the principles of universal design to PROM assessment is beyond the scope of this paper, and those developing or modifying measures according to the principles of universal design are encouraged to consult with relevant experts. Also, if developers are creating an instrument based on information technologies, using the standards in Section 508 of the Rehabilitation Act Amendments of 1998 can maximize flexibility.254 Although we cannot list all potential ways to address functional limitations, we identify below some common ways to do so. Harniss and colleagues describe how PROMIS is taking a systematic approach to enhancing accessibility.255

In general, providing multiple means of understanding and responding to measures is important. These include visual, voiced, and tactile mechanisms. The specific means may differ depending on the method and mode of administration.

For instance, for people with impaired vision, one might consider using in-person or telephone interviews (advantages and disadvantages discussed in an earlier section), an interactive voice response system, Braille responses for Braille users, or a touchscreen with tactile or audio cues. Information technology-based systems should accommodate assistive devices such as screen readers and screen-enlargement software. For patients with hearing impairments, options include providing visual presentation of words or images, using TTY (text telephones) or a video relay service, and allowing the user to adjust the sound level. For persons with motor limitations, response modes that are easier to manipulate (e.g., a trackball) or are nonmotoric (e.g., using voice recognition software) can be helpful. For those with certain types of cognitive deficits (e.g., limited reading comprehension), the methods to address literacy described earlier should be considered. However, if cognitive deficits are severe, a proxy respondent may be more appropriate.

Allowing for multiple response modes or methods may lead to measurement error. In a later section, we discuss the potential impact of different methods and modes on response rate, reliability, and validity. The risk of introducing measurement error seems outweighed by the risk of excluding a significant segment of the population.

Response Shift, Adaptation, and Other Challenges to Detecting True Change

The ability to detect true change over time in PROMs poses another barrier to the integrity of valid PRO assessment. Often, detecting true change is associated with the phenomenon of response shift. This has been defined as “a change in the meaning of one’s self-evaluation of a target construct as a result of: (a) a change in the respondent’s internal standards of measurement (i.e., scale recalibration); (b) a change in the respondent’s values (i.e., the importance of component domains constituting the target construct); or (c) a redefinition of the target construct (i.e., reconceptualization).”256,p.1532 A change in perspective over time may result in patients’ attending to PROMs in a systematically different way from one time point to another.257

Response shift serves as a barrier to PRO assessment for several important reasons. For example, it threatens longitudinal PRO assessment validity, reliability, and responsiveness.257–260 Response shift can complicate the interpretation of PROM scores; a change in a PROM may occur because of response shift, an effect of treatment, or both.261

Monitoring for response shift can aid PROM users in interpreting longitudinal PRO data.259 Several strategies have been proposed to identify response shift, although each has limitations. The “then test” compares an actual pre-test rating and a retrospective pre-test rating to assess for shift, but it is less robust than other methods of detecting response shift257 and it is confounded with recall bias.260 Structural equation modeling has also been proposed as a way to identify response shift, but it is sensitive only if most of the sample is likely to make response shifts.262 Finally, growth modeling creates a predictive growth curve model to investigate patterns in discrepancies between expected and observed scores, thus assessing response shift at the individual level.263 Although growth modeling enables users to detect both the timing and shape of response shift,259 it cannot differentiate between random error and response shift.260
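The arithmetic behind the "then test" can be sketched as follows. The sketch assumes one common reading of the method: the mean difference between the retrospective ("then") rating and the actual pre-test rating is taken as an indication of scale recalibration, and the post-test is compared with the then rating to give a recalibration-adjusted change. The function and key names are hypothetical, and the sketch does nothing to address the recall bias noted above.

```python
from statistics import mean


def then_test_summary(pre, then, post):
    """Summarize a then-test for paired ratings from the same patients.

    pre  : actual pre-test ratings
    then : retrospective pre-test ratings collected at follow-up
    post : post-test ratings
    """
    if not pre or not (len(pre) == len(then) == len(post)):
        raise ValueError("pre, then, and post must be non-empty and equal in length")
    return {
        # Mean then-minus-pre difference: read here as possible recalibration.
        "recalibration": mean(t - p for t, p in zip(then, pre)),
        # Conventional change score: post minus actual pre-test.
        "observed_change": mean(q - p for q, p in zip(post, pre)),
        # Change judged against the retrospective baseline.
        "adjusted_change": mean(q - t for q, t in zip(post, then)),
    }
```

A nonzero recalibration value suggests, but does not prove, that patients are applying a different internal standard at follow-up.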

Implications of the Different Methods and Modes for Response Rate, Reliability, and Validity

Implementing Data Collection Methods

Users of PROMs must make a variety of decisions about the data collection method and the implications of those decisions for costs and errors in surveys.132 Two basic issues underlie these decisions: what is the most appropriate method to choose for a particular question, and what is the impact of a particular method on survey errors and costs?

Methods differ along a variety of dimensions.132 These include, although are not limited to, the degree of interviewer involvement and the level of interaction with the respondent. Channels of communication (sight, sound, touch) may prompt different issues of comprehension, memory stimulation, social influence affecting judgment, and response hurdles. Finally, the degree of technology use is a major consideration.

Using Different Method or Mode Than the One Originally Validated

Considering the implications of using a different method or mode than the one on which the PROM was originally validated is also important. Many existing PROMs were initially validated in paper-and-pencil form. However, potential differences exist between paper-and-pencil and electronic-based PROM administration, ranging from differences in how items and responses are presented (e.g., items presented one at a time, size of text) to differences in participant comfort level in responding (e.g., ability to interact with electronic-based platforms).153

As noted earlier, a growing body of research suggests measurement equivalence between paper- and computer-administered PROMs.153,264 However, the effect of a particular data collection method on a particular source of error may depend on the specific combination of methods used.132 Thus, as new methods are developed, studies comparing them with the methods they may replace must be done.

In framing expectations about the likely effect of a particular approach, developers need to invoke theories about that approach. Theory is informed by past mode-effects literature and by an understanding of the features or elements of a particular design.132 Similarly, mode choices involve trade-offs and compromises. Therefore, the choice of a particular approach must be made within the context of the particular objectives of the survey and the resources available.132

Using Multiple Methods and Modes

The implications of using multiple methods and modes also warrant consideration. One might choose to blend methods for one or more reasons: cost reduction, faster data collection, optimization of response rates.132 When combining methods or modes (or both), users must ensure that they can disentangle any effects of the method or mode from other population characteristics. This is especially true when respondents choose which method or mode they prefer or when access issues determine the choice of method or mode.132 As in the case of using a different method or mode than the one in which the PROM was originally validated, instruments and procedures should be designed with an eye to ensuring equivalence across both methods and modes.265

Accounting for the Impact of Nonresponders

Difficulties with data collection and questionnaire completion are major barriers to the successful implementation of PRO assessment. The principal problem is that missing data can introduce bias in analyses, findings, and conclusions or recommendations.13 The choice of mode and method of questionnaire administration can affect nonresponse rates and nonresponse bias.132 In addition, often the timing of the assessment can be very important, e.g., just before or just after surgery.

Missing data may be classified as either item nonresponse (one or more missing items within a questionnaire) or unit nonresponse (the whole questionnaire is missing for a patient). Evaluating the amount of, reasons for, and patterns of missing data is important.266–269 Some common strategies to evaluate nonresponse bias include

  • conducting an abbreviated follow-up survey with initial nonrespondents132

  • comparing characteristics of respondents and nonrespondents270,271

  • comparing respondent data with comparable information from other sources272

  • comparing on-time vs. late respondents.273
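One simple way to carry out the second strategy, comparing characteristics of respondents and nonrespondents, is a standardized difference on a characteristic known for both groups (e.g., age from administrative records). The sketch below is illustrative only; the function name is hypothetical, and what counts as a meaningful difference is left to the analyst.

```python
from math import sqrt
from statistics import mean, stdev


def standardized_difference(respondents, nonrespondents):
    """Standardized mean difference on one characteristic known for both groups."""
    if len(respondents) < 2 or len(nonrespondents) < 2:
        raise ValueError("each group needs at least two observations")
    # Pool the two sample variances with equal weight.
    pooled_sd = sqrt((stdev(respondents) ** 2 + stdev(nonrespondents) ** 2) / 2)
    if pooled_sd == 0:
        raise ValueError("no variation in either group")
    return (mean(respondents) - mean(nonrespondents)) / pooled_sd
```

A large absolute value indicates that the two groups differ on that characteristic, which raises the possibility of nonresponse bias if the characteristic is related to the PRO.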

When dealing with missing data, analysts can use various statistical methods of adjustment. For item nonresponse in multi-item scales, several useful techniques tend to yield unbiased estimates of scores: simple mean imputation, regression imputation, and IRT models. For both item and unit nonresponse, it is important to determine whether missing data are considered to be missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR).266,267 For unit nonresponse, users can implement a range of statistical techniques, depending on the reason for missing data.274–278
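For item nonresponse in a multi-item scale, simple mean imputation can be sketched as below. The sketch assumes the person-mean variant (each missing item is replaced by the mean of the items that the same respondent answered) and a hypothetical rule that at least half of the items must be answered before a score is computed; individual instruments publish their own scoring rules, which take precedence.

```python
def scale_score(responses, min_answered_fraction=0.5):
    """Prorated sum score for one respondent; None marks a missing item.

    Returns None when too few items were answered to score the scale.
    The 0.5 default threshold is an illustrative assumption.
    """
    answered = [r for r in responses if r is not None]
    if not responses or len(answered) < min_answered_fraction * len(responses):
        return None
    person_mean = sum(answered) / len(answered)
    # Filling each missing item with the person mean and summing is the same
    # as multiplying the person mean by the number of items.
    return person_mean * len(responses)
```

For example, `scale_score([3, 4, None, 5])` fills the missing item with 4 and returns 16.0, whereas a respondent who answered only one of four items receives no score.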

Selection of Patient-Level PROMs

Patient-Centered Outcomes Research

An essential aspect of patient-centered outcomes research (PCOR) is the integration of patient perspectives and experiences with clinical and biological data collected from the patient to evaluate the safety and efficacy of an intervention. Such integration recognizes that although traditional clinical endpoints such as laboratory values or survival are still very important, we also need to look at how disease and treatment affect patients’ health-related quality of life (HRQL). For such HRQL endpoints, in most cases, the patients are the best source for reporting what they are experiencing. The challenge is how best to capture patient data in a way that maximizes our ability to inform decision making in the research, health care delivery, and policy settings.

Access to psychometrically sound and decision-relevant PROMs will allow clinicians, investigators, administrators, and others to collect empirical evidence on the differential benefits and harms of a health-related intervention.279–282 Those obtaining such information can then disseminate findings to patients, clinicians and health care professionals, payers or insurers, and policy makers. Doing so may provide a richer perspective on the net impact of interventions on patients’ lives using endpoints that are meaningful to the patients.283

Increasingly, longitudinal observational and experimental studies have included PROMs. To optimize decision making in clinical care, users must assess these PROMs in a standardized way, using questionnaires that demonstrate specific measurement properties.279,282,284–287 Our group recently identified minimum standards for the design or selection of a PROM for use in PCOR activities.288 Central to this work was understanding which attributes would make a PROM appropriate or inappropriate for such purposes. We identified these standards through two complementary approaches. The first was to conduct an extensive review of the literature including both published and unpublished guidance documents. The second was to assemble a group of international experts in PROMs and PCOR efforts to seek consensus on the minimum standards.288

Attributes of PROMs

Many documents summarize attributes of a good HRQL measure. They include (an illustrative list) guidance documents from the FDA;289–292 the 2002 Medical Outcomes Trust guidelines on attributes of a good HRQL measure;293 the extensive, international expert-driven recommendations from COSMIN (COnsensus-based Standards for the selection of health Measurement INstruments);285,294–298 the EORTC (European Organization for Research and Treatment of Cancer) guidelines for developing questionnaires;299 the Functional Assessment of Chronic Illness Therapy (FACIT) approach;36 the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) task force recommendation documents;149,241,300,301 and several others.245,284,302–304 Since 2010, ISOQOL (the International Society of Quality of Life) has completed two important guidance documents on use of PROMs in comparative effectiveness research and on integrating PROMs in health care delivery settings.284,305 Finally, the NIH PROMIS network released a standards document in 2012 that is useful for informing the minimal and optimal standards for designing PROMs.306

Table 4 presents long-established criteria to consider in selecting PROMs for research, quality improvement activities, and now performance measurement. It specifies issues that PROM users need to consider when contemplating incorporating PROMs into performance measures and offers some best practices for evaluating PROMs in this context.

Table 4

Primary criteria for evaluating and selecting patient-reported outcome measures (PROMs) for use in performance measurement.

The eight primary criteria are the following: (1) conceptual or measurement model; (2) reliability and its subparts (e.g., internal consistency reliability); (3) validity and its subparts (e.g., content validity); (4) how scores are interpreted; (5) burden placed on respondents; (6) alternative modes and methods of administration; (7) cultural and language adaptations; and (8) use of electronic health records (EHRs). The table does not specify key issues and best practices for reliability or validity; that information is given only for the subcriteria. We illustrate these points with selected information pertaining to the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC).307

Important Differences in PROM Attributes

Selecting PROMs for use in performance measurement and related activities such as quality improvement programs raises the question of what key differences, if any, exist between selecting PROMs for these nonresearch purposes and selecting them for research. Generally speaking, the factors to consider are more similar than different across these uses. Thus, we focus here on the differences that users will need to take into account.

Instrument Length

One key difference involves the length of the PROM. Longer questionnaires may be better tolerated in the context of research than in clinical practice settings; thus, to facilitate widespread adoption, PROMs for performance measurement should be short surveys. Addressing the need for shorter PROMs may, however, compromise other important measurement characteristics, such as reliability (i.e., precision and reproducibility).
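The trade-off between length and reliability can be illustrated with the Spearman-Brown prophecy formula from classical test theory, which predicts the reliability of a lengthened or shortened scale. The formula assumes that the items removed or added are parallel to the remaining ones, which real items rarely satisfy exactly, so the result is an approximation. The function name is hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after changing scale length.

    reliability   : reliability of the original scale (between 0 and 1)
    length_factor : new length divided by original length (0.5 = half as long)
    """
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)
```

Under these assumptions, halving a scale with a reliability of 0.90 gives a predicted reliability of about 0.82.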

Implications of PRO Data for Action

Another key difference in factors to consider when selecting PROMs for clinical practice quality improvement, or performance measurement and accountability efforts, is the implications or consequences of the PRO data. Specifically, using PROMs for these purposes carries the expectation that important consequences will arise in terms of accountability for health care professionals, health care systems and plans, and clinical settings. Therefore, the stakes of PROMs are higher in the performance measurement context than in research applications.

The problem lies, in part, in constraints on measurement quality that arise from factors unique to performance measurement. These can include instrument length or representativeness of the patient or consumer populations surveyed. These considerations highlight the importance of emphasizing responsiveness and sensitivity to change when considering PROMs for use in the ways envisioned for NQF-endorsed measures.

History of Successful Use of PROMs

In selecting a PROM for these various purposes, a logical first step involves reviewing what measures have already been used successfully. Using PROMs for these programs remains an understudied area, but several examples of PROMs used as indexes of performance measurement provide an initial foundation upon which the field can expand.

The Veterans Health Study assessed PROs within the Veterans Health Administration (VA) system.326 In response to the VA’s incorporation of patient-reported functional status as a domain of interest in their performance measurement system, the Veterans RAND 36-Item Health Survey (VR-36) and the Veterans RAND 12-Item Health Survey (VR-12) have been administered within the VA system to evaluate veterans’ needs and to assess outcomes of clinical care at the hospital, regional, and health care system levels.326,327 The Centers for Medicare & Medicaid Services (CMS) and its Medicare Advantage Program328 have applied these methods for similar purposes, and CMS has also designated the VR-12 as the principal outcome measure of the Medicare Health Outcomes Survey (HOS).329

Research examining the VR-36 and SF-36 in such uses does inform the selection of PROs for performance measurement. Nevertheless, limitations remain to use of these measures as indicators of high-quality care and as sources of information for holding practices, providers, hospitals, health plans, or others accountable for their results. These limitations include the “static” nature of these measures, meaning that for analysts to be able to obtain an individual’s score, all items must be administered—even those items that add little to the precision of measurement. In addition, content is fixed by the composition of the scale. Therefore, attention has turned to alternative PRO tools and “dynamic” instruments with clear potential for these types of uses (i.e., as patient-reported performance measures).

PROMIS constitutes arguably the best example of a future direction of PROs that will be acceptable for use in practice, quality improvement, or performance measurement programs. Developed using IRT methodology, PROMIS offers a new generation of PROMs with better reliability, validity, precision, and other attributes than is typically true for so-called legacy instruments. These measures have the important attribute of being shorter than such older instruments as well.187 PROMIS measures form a hybrid between static generic PROMs and more flexible adaptive measures. They comprise items that are specific to the overall content of the measure but that are also applicable across the diverse spectrum of health status.

Although a growing body of literature provides preliminary evidence supporting the psychometric quality of the PROMIS measures, future work needs to explore applying PROMIS measures as tools for assessing the performance of health care organizations. Nevertheless, the PROMIS system provides a robust model by which the use of PROMs as performance measures can be expanded and elaborated upon, owing to its rigorous methodological characteristics.

Documentation of Particular Attributes of PROMs

Documentation, in peer-reviewed literature or on publicly accessible websites (or both), of the evidence that a PROM meets all of these measurement properties will improve acceptance of the PROM for use as a performance measure. The more closely the populations that supplied this evidence resemble the intended target populations, the more confidence clinicians, analysts, administrators, and policy makers can have that the PROM captures patients’ experiences and perspectives.

Applying any set of selection standards for PROs calls for attention to several considerations. One key issue is that the populations involved in these efforts will likely be quite heterogeneous. This population heterogeneity should be reflected in the people selected to participate in the various pilot tests or studies that are part of the evaluation of the measurement properties for the PROM. For example, both qualitative and quantitative studies may require quota sampling based on race and ethnicity that reflects the prevalence of the condition in the study target population. Additionally, patients must be actively engaged as stakeholders in identifying the domains most important to measure and in selecting specific PROMs for use in performance measurement.

Participants’ literacy is another important consideration for use of PROMs. Data collected from PROMs are valid only if the participants in a study can understand what is asked of them and can provide a response that accurately reflects their experiences or perspectives. Developers of PROMs must ensure that the questions and response options are clear and easy to understand. Pretesting of the instrument (e.g., cognitive testing) should include individuals with low literacy to evaluate the questions.330

Response burden must be considered when selecting a PROM. The instrument must not be overly burdensome for patients, as they are often sick and cannot be expected to tolerate completing lengthy questionnaires.

Finally, researchers must carefully consider the strength of evidence for the measurement properties. No threshold exists to indicate that an instrument is (or is not) valid for any or all populations or applications. In addition, no single study can confirm all the measurement properties for all contexts. Like any scientific discipline, measurement science relies on an iterative, accumulating body of evidence examining key properties in different contexts. Thus, it is the weight of the evidence that informs the evaluation of the appropriateness of a PROM. More established PROMs will have the benefit of having accrued more evidence than more recent entries; however, more recent entries tend to have improved measurement properties that warrant attention.

PROM Characteristics for Consideration

Generic Versus Condition-Specific Measures

One factor to consider when selecting a patient-level PROM is whether to use a generic instrument or a condition-specific instrument. Several considerations can inform this choice.331 First, the specific population of interest may guide whether one opts to use a generic or condition-specific PRO. For example, if the target population comprises mainly healthy individuals, or people with multiple comorbidities, a generic measure is the preferred choice. Conversely, if the goal is to examine a specific subset of patients with a particular diagnosis or receiving a common treatment, then a condition-specific measure may be more appropriate, but this is ideally evaluated in context.

In addition, outcomes of interest may guide the selection process. Generic measures may capture a different category of outcomes when compared with a condition-specific PROM. For example, a generic measure may assess domains of general function, well-being, or quality of life, whereas a condition-specific PRO may measure symptoms expected to be directly addressed by a condition-specific intervention. The more focused the interest in a specific symptom or set of symptoms that are unique to the condition, the more likely a condition-specific instrument will be preferred.332

Generic PROMs have some important advantages. They allow for comparability across patients and populations,331 although they are more suitable for comparison across groups than for individual use.333 Generic PROMs also allow assessments in terms of normative data that can be used to interpret scores.331 This enables evaluation against population norms or comparison with information about various disease conditions. They can also be applied to individuals without specific health conditions, and they can differentiate groups on indexes of overall health and well-being.331

Generic PROMs also have some disadvantages. They may tend to be less responsive than condition-specific measures to focal changes that are better detected with a condition-specific measure. For that reason, they may underestimate health changes in specific patient populations.334 Additionally, they may fail to capture important condition-specific concerns.334

Condition-specific PROMs are an alternative to generic PROMs. One advantage of condition-specific PROMs is the possibility for improved relevance and responsiveness.331 They also enable differentiation of groups at the level of specific symptoms or patient concerns.331 However, the condition-specific focus introduces the notable difficulty of making comparisons across patient populations with different diseases or health conditions.331

Given their respective benefits and limitations, we consider a combination of generic and condition-specific measures likely to be the best choice for the performance measurement purposes that those assessing or reporting on quality of care in this country, such as the NQF, have most in mind. Generic and condition-specific PROMs may measure different aspects of HRQL when administered in combination,335 resulting in more comprehensive assessment. Consequently, hybrid measurement systems have emerged to facilitate combining them. For example, the FACIT system consists of a generic HRQL measure plus condition-specific subscales. PROMIS, which was developed to create item banks that are appropriate for use across common chronic disease conditions,336 represents another example of a hybrid system of PROMs that combines both global and targeted approaches.

Measurement Precision

Another factor to consider when selecting a patient-level PROM is measurement precision. Measurement precision refers to the level of variation in multiple measurements of the same factor; measures with greater precision vary less across assessment time points. PROMs with greater measurement precision also demonstrate greater sensitivity to change.337 Given that most PROMs were originally developed as research tools, they may lack the level of precision necessary for assessing individuals on these types of outcomes.338 Although performance measures will aggregate to practice, provider, or organization levels, adequate measurement precision at the patient level is still needed.
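One conventional way to express patient-level precision in classical test theory is the standard error of measurement (SEM), computed from the score standard deviation and the reliability coefficient. The sketch below is illustrative rather than drawn from the cited sources; the function names are hypothetical, and the band assumes normally distributed measurement error.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """Classical SEM: score SD times the square root of (1 - reliability)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * sqrt(1 - reliability)


def score_band(score, sd, reliability, z=1.96):
    """Approximate band around one observed score (z = 1.96 for about 95%)."""
    half_width = z * standard_error_of_measurement(sd, reliability)
    return (score - half_width, score + half_width)
```

With a score standard deviation of 10, a reliability of 0.91 gives an SEM of about 3 points, so an individual score of 50 carries a band of roughly 44 to 56. A band of that width may be acceptable for group comparisons but wide for decisions about a single patient.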

Regarding measurement precision, measures based on IRT tend to have greater precision than measures based on classical test theory.338 Specifically, computerized adaptive tests (CATs) offer greater precision than static short-forms derived from item banks; however, short forms are an acceptable alternative when CAT approaches are infeasible.339,340 Although CATs include a greater number of items in an item bank, they allow tailored measurement, resulting in shorter instruments and better precision. Consequently, using PROMs derived from IRT techniques is recommended to achieve the greatest measurement precision.
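The tailoring step of a CAT can be sketched as choosing, from the items not yet administered, the one that is most informative at the respondent's current estimated trait level. The sketch below assumes a two-parameter logistic (2PL) model for dichotomous items purely for brevity; operational item banks with multi-category response options typically use polytomous models, and real CATs also re-estimate the trait level after each response and apply stopping rules. All names and item parameters are hypothetical.

```python
from math import exp


def item_information(theta, discrimination, difficulty):
    """Fisher information of a 2PL item at trait level theta."""
    p = 1 / (1 + exp(-discrimination * (theta - difficulty)))
    return discrimination ** 2 * p * (1 - p)


def next_item(theta, item_bank, administered):
    """Return the name of the most informative unused item, or None.

    item_bank    : dict mapping item name -> (discrimination, difficulty)
    administered : set of item names already asked
    """
    candidates = {k: v for k, v in item_bank.items() if k not in administered}
    if not candidates:
        return None
    return max(candidates, key=lambda k: item_information(theta, *candidates[k]))


# Hypothetical three-item bank for illustration only.
bank = {"easy": (1.0, -2.0), "medium": (1.0, 0.0), "hard": (1.0, 2.0)}
first = next_item(0.0, bank, set())        # item closest to theta = 0
second = next_item(1.5, bank, {"medium"})  # after an upward revision of theta
```

Because each respondent receives only the items that are informative at his or her level, a CAT can reach a given precision with fewer items than a fixed form drawn from the same bank.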

Sensitivity to Change, or Responsiveness

Sensitivity to change (also referred to as responsiveness) is another important factor to consider when selecting a PROM because the ability to detect a small, but important, change is necessary when monitoring patients and implementing clinical interventions.38 Sensitivity to change is a type of validity characterized by within-subject changes over time following an intervention.341,342

Responsiveness is conceptualized in many ways, which leads to different findings and interpretations.343 Definitions of sensitivity to change range from the ability to detect any kind of change, regardless of meaningfulness (e.g., a statistically significant change post-treatment), to the ability to detect a clinically important change. To be clinically useful, PROMs must demonstrate sensitivity to change both when individuals improve and when they deteriorate.342

Methods for assessing responsiveness vary markedly as well. These methods differ primarily in whether they are intended to demonstrate statistically significant change or to quantify the magnitude of change.343 The lack of equivalence across methods can be problematic for interpretation, given that different methods for detecting responsiveness produce different classifications of who has improved.344 Indeed, relying solely on statistical tests of responsiveness is not recommended, given that such findings may not accurately reflect what is meaningful to patients or clinicians.345

Several factors can limit responsiveness to change. First, multi-trait scales containing items that are not relevant to the population being assessed may fail to capture change over time.346 Second, responsiveness may be constrained by scales that offer categorical response options or only a limited range of response options.346 Third, PROMs that specify an extensive timeframe for reporting are unlikely to demonstrate change, particularly when administered regularly over a brief period of time.346 Fourth, responsiveness is limited when a PROM includes items that reflect stable characteristics that are unlikely to change. Fifth, scales that contain items with floor or ceiling effects are also problematic.346 Finally, a PROM’s sensitivity to change may depend on the direction of the change. For example, Eurich and colleagues found that PROMs were more responsive to change when patients got better clinically than when they got worse.38

In addition to these factors, a growing body of research suggests that condition-specific PROMs can be more sensitive to change than generic PROMs.38,40,347–349 Responsiveness to change is likely influenced by the purpose for which the measure was originally developed.349 For example, measures developed to emphasize specific content areas would be expected to show greater post-treatment change in those content areas.342 The greater sensitivity to change in condition-specific PROMs may be attributed to the strong content validity inherent in condition-specific measures.38 As a result, using a combination of condition-specific and generic PROMs may yield the most meaningful data.38,40

Minimally Important Differences

The difference between clinical and statistical significance also merits consideration when selecting a PROM. Historically, research has relied upon tests of statistical significance to examine differences in scores between patients or within patients over time. However, concerns arise regarding whether statistically significant differences truly reflect differences that would be perceived as important to the patient or the clinician. Consequently, attention has shifted to the concept of clinically significant differences in PROM scores.

Experts have proposed a variety of approaches to determining clinical significance. For example, clinically significant change has been defined as “changes in patient functioning that are meaningful for individuals who undergo psychosocial or medical interventions.”350 Similarly, meaningful change is defined (from the patient perspective) as “one that results in a meaningful reduction in symptoms or improvement in function . . . .”351

Minimally important differences (MIDs) represent a specific approach to clinical significance. They are defined as “the smallest difference in score in the outcome of interest that informed patients or informed proxies perceive as important.”352 Minimum clinically important differences (MCIDs) constitute an even more specific category of MID. MCIDs are defined as “the smallest difference in score in the domain of interest which patients perceive as beneficial and which would mandate, in the absence of troublesome side effects and excessive cost, a change in the patient’s management.”353

Examining clinically significant differences poses several important implications.352 First, investigating clinically significant (versus statistically significant) differences in scores aids users in interpreting PROMs. Second, focusing on clinically significant differences also emphasizes the importance of the patient perspective, which may not be adequately captured when looking mainly at statistically significant differences. Third, the ability to look at clinically significant differences in scores informs the evaluation of the success of a clinical intervention. Finally, in the context of clinical research, clinically significant differences can assist with sample size estimation.

Currently, no methodological gold standard exists for estimating MIDs.351,354 Two primary methods are currently in use: the anchor-based method and the distribution-based method.

The anchor-based method of establishing MIDs assesses the relationship between scores on the PROM and some independent measure that is interpretable.352 Evaluators have several options for the type of anchor they might select when using an anchor-based method. For instance, clinical anchors that are correlated with the PROM at the r ≥ 0.30 level may serve as appropriate anchors.304,317 Clinical trial experience can inform the selection of these clinical anchors,355 including the use of multiple clinical anchors.356

Transition ratings represent another potential source of anchors when establishing MIDs. Transition ratings are patients’ within-person ratings of change.317,357 However, because of concerns about validity, experts recommend that researchers or other users examine the correlation between pre- and post-test scores and the transition rating.358 Patients’ between-person differences can also be used as anchors when establishing MIDs for PROMs.314,317 Additional sources for anchors when establishing MIDs include HRQL-related functional measures used by clinicians317,357 and objective standards (e.g., hospital admissions, time away from work).358
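The anchor-based logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a method prescribed by the cited sources: the function names, the sample data, and the three-level transition scale are hypothetical; the r ≥ 0.30 screen is applied here to the correlation between change scores and the anchor; and the MID is taken as the mean change among patients who rate themselves "a little better," which is one common convention among several.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def anchor_based_mid(changes, ratings, minimal_rating=1, min_r=0.30):
    """Estimate an MID as the mean change among minimally improved patients.

    Returns None when the anchor correlates too weakly with the change
    scores (below min_r) or when no patient chose the minimal rating.
    """
    if pearson_r(changes, ratings) < min_r:
        return None
    minimal = [c for c, r in zip(changes, ratings) if r == minimal_rating]
    return mean(minimal) if minimal else None


# Hypothetical data: PROM change scores paired with transition ratings
# (0 = no change, 1 = a little better, 2 = much better).
changes = [0, 1, 4, 6, 5, 11, 9]
ratings = [0, 0, 1, 1, 1, 2, 2]
mid_estimate = anchor_based_mid(changes, ratings)
```

With these made-up numbers, the three patients who report being a little better changed by 4, 6, and 5 points, so the sketch returns 5 as the MID estimate.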

Although the anchor-based method offers promise for establishing MIDs in PROMs, several limitations should be considered. First, the transition rating approach to anchor selection is subject to recall bias on the part of the patient.351 Second, global ratings may account for only some variance in scores.351 Third, the anchor-based method does not take into consideration the measurement precision of the instruments being used.351

The distribution-based method represents the second method of establishing MIDs in PROMs. The distribution-based method uses the statistical characteristics of the scores when establishing MIDs.352 Specifically, the distribution-based approach evaluates change in scores in relation to the probability that the change occurred at random.351

As in the case of the anchor-based method, several methods are available when applying a distribution-based approach to establishing MIDs. First, the t-test statistic has been used to establish MIDs when examining change over time.351 However, given that this relies solely on statistical significance, it may not reflect change that is clinically meaningful, and it is also subject to variation due to sample size.351 Second, distribution-based methods may also be grounded in measurement precision and the standard error of measurement (SEM).351 Specifically, the 1 SEM criterion can be used as an alternative to MID when assessing the magnitude of PROM score changes.359 Third, indices based on sample variation, such as effect size and standardized response mean, constitute another method for establishing MIDs using the distribution-based method.351 When using this method, it is recommended that the effect size be specific to the population being studied.357 Evidence suggests that MID estimates using sample variation are approximately one-half of a standard deviation.360
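The distribution-based quantities named above are simple to compute, as the following sketch shows. It rests on several assumptions: SEM is read as the standard error of measurement from classical test theory, SD × √(1 − reliability); the effect size is the mean change divided by the baseline SD; the standardized response mean is the mean change divided by the SD of the change scores; and the function names and sample scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def half_sd(baseline):
    """Half of the baseline standard deviation (the one-half SD heuristic)."""
    return 0.5 * stdev(baseline)


def standard_error_of_measurement(baseline, reliability):
    """Classical test theory SEM: SD * sqrt(1 - reliability)."""
    return stdev(baseline) * sqrt(1.0 - reliability)


def effect_size(baseline, follow_up):
    """Mean change divided by the SD of the baseline scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the SD of the change scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Hypothetical scores for three patients before and after treatment.
baseline = [40, 50, 60]
follow_up = [45, 58, 62]

mid_half_sd = half_sd(baseline)
mid_one_sem = standard_error_of_measurement(baseline, reliability=0.91)
es = effect_size(baseline, follow_up)
srm = standardized_response_mean(baseline, follow_up)
```

Because the two benchmarks depend on the sample SD and on the reliability estimate, they should be recomputed for each population rather than carried over from another study.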

Finally, reliable change constitutes another method of using the distribution-based approach to establish MIDs.351 Reliable change is based on the standard error of measurement difference (SEMD); it indicates how much the observed change in an imprecise measure exceeds fluctuations that are random in nature.351 Although the distribution-based approach serves as a possible alternative to the anchor-based methods, little consensus exists on the benchmarks for establishing changes that are clinically significant.351
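As a rough illustration of reliable change, the sketch below divides an individual's observed change by the standard error of the difference between two scores. The formula used (standard error of the difference = SEM × √2, with SEM = SD × √(1 − reliability)) and the 1.96 cutoff are assumptions taken from the commonly used reliable change index; the source does not specify them, and the function names are hypothetical.

```python
from math import sqrt


def reliable_change_index(pre, post, sd_baseline, reliability):
    """Observed change divided by the standard error of the difference."""
    sem = sd_baseline * sqrt(1.0 - reliability)
    se_diff = sem * sqrt(2.0)
    return (post - pre) / se_diff


def is_reliable_change(pre, post, sd_baseline, reliability, z=1.96):
    """True when the change exceeds what random error alone would likely produce."""
    return abs(reliable_change_index(pre, post, sd_baseline, reliability)) > z
```

For example, with a baseline SD of 10 and a reliability of 0.91, a 10-point change would count as reliable under these assumptions, whereas a 5-point change would not.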

Given the limitations of the anchor- and distribution-based approaches, experts recommend that users apply multiple methods and triangulation to determine the MID.304,351,360 Moreover, the final selection of MID values should be based on systematic review and an evaluation process such as the Delphi method.304 MID values should also be informed by a stakeholder consensus, which includes patient engagement and input, about the extent of change considered to be meaningful. For example, in some cases, the desired outcome may be stable scores over time, such as in the case of interventions designed to preserve functioning and prevent declines. Consequently, the specific application of the PRO will inform the MID values, particularly when considering the contrasts between interventions for acute clinical conditions and interventions or support for long-term or chronic conditions.

When considering MIDs for PROMs, evaluators should not apply a single MID to all situations. MIDs may vary by population and by context.304 Consequently, those reporting such data should provide a range around the MID, rather than just a single MID value.356 Finally, because the criteria for assessing clinically important change in individuals do not directly translate to evaluating clinically important group differences,317 a useful strategy is to calculate the proportion of patients who experience a clinically significant change.280,317
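The responder strategy can be illustrated with a short sketch that reports the proportion of patients whose change meets or exceeds the MID, evaluated across a range of candidate MID values rather than a single one. The function names and data are hypothetical, and the sketch assumes that higher change scores indicate improvement.

```python
def responder_proportion(changes, mid):
    """Share of patients whose change score is at least the MID."""
    if not changes:
        raise ValueError("changes must not be empty")
    return sum(1 for c in changes if c >= mid) / len(changes)


def proportions_over_mid_range(changes, mid_values):
    """Responder proportion for each candidate MID value."""
    return {mid: responder_proportion(changes, mid) for mid in mid_values}


# Hypothetical change scores for eight patients.
changes = [0, 2, 3, 5, 6, 8, 9, 12]
by_mid = proportions_over_mid_range(changes, [3, 5, 7])
```

With these made-up scores, the proportion of responders falls from 0.75 at an MID of 3 to 0.375 at an MID of 7, which shows how sensitive the conclusion can be to the chosen value.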

Essential Conditions to Integrate PROMs Into the Electronic Health Record

General Considerations for Health Information Technology

Health information technology (HIT) has the potential to enable dramatic transformation in health care delivery. To date, however, the empirical research evidence base supporting its benefits is limited.361

E-health refers to health-related Internet applications that deliver a range of content, connectivity, and clinical care.11 This includes health information, online formularies, prescription refills, appointment scheduling, test results, advance care planning and health care proxy designation, and physician-patient communication.362 Patient-centered e-health (PCEH) is an emerging discipline that is defined as the combination of three themes:363

  • Patient focus: PCEH applications are developed primarily based on needs and perspectives of patients.

  • Patient activity: PCEH application designs assume that patients can participate meaningfully in providing and consuming information about, and of interest to, them.

  • Patient empowerment: PCEH applications assume that patients want to, and are able to, control far-ranging aspects of their health care via a PCEH application.

Although e-health applications have become common, they tend to focus on the needs of health care providers and organizations. Patients desire a range of services to be brought online by their own health care providers.364 However, little evidence is available as to whether the services offered by providers are services that patients desire.12 One important consideration is that providers attend to patient acceptability factors.12,365

Measuring PROMs will constitute an important aspect of future stages of “meaningful use” of electronic health records (EHRs).366,367 Access can be enhanced by allowing entry directly from commonly used devices such as smartphones. Providing structured data directly into EHRs enables clinical decision support and permits PROMs to be used for (1) tracking patient progress over time or (2) driving concurrent changes in care plans or care processes on the basis of individual question responses, thus improving outcomes over time. The use of a standardized instrument registered in an established code system (e.g., LOINC [Logical Observation Identifier Names and Codes]) enables EHRs to incorporate the instrument as an observation with a known set of responses using standard terminology (SNOMED-CT [Systematized Nomenclature of Medicine Clinical Terms]) or numerical responses. Each question in the standardized instrument can also be coded (structured) to drive changes based on those responses. Unfortunately, in an updated systematic review of HIT studies published between 2004 and 2007, PROMs were not mentioned at all.362
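To make the idea of a coded question with a known set of responses more concrete, the sketch below models one in Python. It is purely illustrative: the class names are hypothetical, the structure is not drawn from any EHR product or interchange standard, and every code value is an obvious placeholder rather than a real LOINC or SNOMED-CT code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedConcept:
    system: str   # code system name, e.g. "LOINC" or "SNOMED-CT"
    code: str     # placeholder value in this sketch, not a real code
    display: str  # human-readable text


@dataclass(frozen=True)
class CodedQuestion:
    concept: CodedConcept
    allowed_answers: tuple  # the known set of coded responses

    def record(self, answer_code):
        """Return a structured observation, rejecting answers outside the set."""
        for answer in self.allowed_answers:
            if answer.code == answer_code:
                return {"question": self.concept, "answer": answer}
        raise ValueError(f"unknown answer code: {answer_code}")


# Hypothetical question and answer list; all codes are placeholders.
pain_question = CodedQuestion(
    concept=CodedConcept("LOINC", "QUESTION-PLACEHOLDER", "Pain intensity"),
    allowed_answers=(
        CodedConcept("SNOMED-CT", "ANSWER-PLACEHOLDER-1", "Mild"),
        CodedConcept("SNOMED-CT", "ANSWER-PLACEHOLDER-2", "Moderate"),
        CodedConcept("SNOMED-CT", "ANSWER-PLACEHOLDER-3", "Severe"),
    ),
)
observation = pain_question.record("ANSWER-PLACEHOLDER-2")
```

Restricting responses to a fixed, coded answer list is what allows downstream rules to act on individual answers without parsing free text.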

The passage of the Health Information Technology for Economic and Clinical Health (HITECH) Act creates a mix of incentives and penalties that will induce a large proportion of physicians and hospitals to move toward EHR systems by the end of the 2010s.368 The discussion should now focus on whether HIT will support the models of care delivery that will help achieve broader policy goals: safer, more effective, and more efficient care.

Three features of EHRs are critical to enable accountable care organizations to succeed: interoperability and widespread health information exchange; automated, real-time quality and cost measurement; and smarter analytic capacities. Having a complete picture of the patient’s care is a critical start, yet most EHRs are not interoperable and have limited data-sharing capabilities.369 In summary, important issues include (1) the patient perspective (patients want to be involved “as a participant and partner in the flow of information” relating to their own health care);370 (2) clinical buy-in; (3) compatibility with clinical flow; and (4) meaningful use.

Examples of PROMs in Electronic Health Record Applications

Health care centers are beginning to implement ways to use patient-reported information (the voice of the patient) to provide higher quality care.371 Three recent case studies (two in the United States and one in Sweden) are particularly informative, because they illustrate lessons learned about such initiatives.371

The Dartmouth Spine Center collects health survey data from patients before each visit, either at home or in the clinic. Analysts summarize the data in a report and make it available for use by patients and clinicians to develop or modify care plans and to monitor results over time to guide treatment decisions. Longitudinal changes are incorporated into the report with each new assessment. At Group Health Cooperative in the State of Washington, an electronic health risk assessment has been integrated with the EHR. Patients can complete PROMs, make appointments, fill prescriptions, review health benefits, communicate with their providers, and get vetted health information. Customized reports are available to patients and providers. The Karolinska University Hospital in Stockholm, Sweden, developed a Swedish Rheumatology Quality registry in 1995 to improve the quality and value of care for people suffering from arthritis and other rheumatic diseases. Beginning in 2003, its web-based system replaced paper forms. The system uses real-time data provided by patients, clinicians, and diagnostic tests. Longitudinal summaries of PROMs and other health information are incorporated into graphical reports that are available to patients and providers.

Both patients and clinicians have generally favorable reactions to the patient-reported measurement systems implemented in these three very different health care settings. The information gathered helps to support patient-centered care by focusing attention on the health issues and outcomes that are important to patients. Although both patients and clinicians acknowledge that using PROMs takes extra time for data collection, both groups report that it makes the care more effective and efficient. Key design principles to successful use of patient-reported measurement systems include fitting PROMs into the flow of care, designing the systems with stakeholder engagement, merging data with other types of data (clinician reports, medical records, claims), and engaging in continuous improvement of the systems based on users’ experiences and new technology.

Other examples include use of PROMs in managing advanced cancer where the primary goals of care are to maximize symptom management and minimize treatment toxicity. Clinicians and patients often base treatment decisions on informal assessments of HRQL. Integrating formal HRQL assessment into treatment decision making can improve patient-centered care for cancer patients with advanced disease. Computer-based assessment can reduce patient and administrative burden while enabling real-time scoring and presentation of HRQL data. Two pilot studies conducted with patients with advanced lung cancer reported that the computer technology was acceptable and feasible for patients and physicians.167,372 Patients felt that the HRQL questionnaire helped them focus on issues to discuss with their physicians, and physicians indicated that the HRQL report helped them to evaluate patient responses over time.

A new initiative in the Robert H. Lurie Comprehensive Cancer Center at Northwestern University involves developing and implementing patient-reported symptom assessment in gynecologic oncology clinics. Before their clinic visits, outpatients complete instruments measuring fatigue, pain, physical function, depression, and anxiety through the EHR patient communication portal at home or in the clinic using an iPad. Results immediately populate the EHR. Severe symptoms trigger EHR notifications to providers. The EHR also provides automated triage for psychosocial and nutritional care when indicated.
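A threshold rule of the kind that could sit behind such notifications might look like the sketch below. The cutoff values, domain names, and function name are hypothetical and are not taken from the Northwestern implementation. Only symptom domains in which higher scores mean worse symptoms are included; physical function is left out because its scores run in the opposite direction.

```python
# Hypothetical cutoffs: scores at or above these values count as severe.
SEVERE_THRESHOLDS = {
    "fatigue": 70,
    "pain": 70,
    "depression": 70,
    "anxiety": 70,
}


def symptoms_needing_notification(scores, thresholds=SEVERE_THRESHOLDS):
    """Return the sorted symptom domains whose scores reach the severe cutoff."""
    return sorted(
        domain
        for domain, score in scores.items()
        if domain in thresholds and score >= thresholds[domain]
    )


# Hypothetical scores from one pre-visit assessment.
flagged = symptoms_needing_notification(
    {"fatigue": 72, "pain": 55, "anxiety": 70, "physical function": 30}
)
```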

Throughout this monograph, we have recommended several criteria that researchers and evaluators can use when assessing the appropriateness of a PROM for measuring quality of care and performance; Table 4 summarized critical points. Given that PROMs are not yet in widespread use in clinical practice, little is known about how best to aggregate these patient-level outcomes for measuring the quality of care or performance of the health care entity. Despite this limitation, accommodating the needs of patients with diverse linguistic, cultural, educational, and functional skills calls for evidence about the equivalence of multiple methods and modes of questionnaire administration. Additionally, scoring, analyzing, and reporting PRO response data all need to be user-friendly and understandable to clinicians for real-time use in clinical settings. Moreover, the timing of measurement must include administration before therapeutic interventions to allow for measuring responsiveness to change, doing risk adjustment, and screening patients for clinical intervention.

To illustrate the application of these recommended characteristics when evaluating the appropriateness of a PROM for these purposes, Table 4 included one illustration of these points related to determining the success of total hip arthroplasty. Total hip arthroplasty has emerged as an acceptable surgical treatment for individuals experiencing intractable pain and severe functional impairments for whom conservative treatment has yielded minimal improvement.373–376 The most common indication for total hip arthroplasty is joint deterioration secondary to osteoarthritis.377 Consequently, the aging of the population is likely to raise demand for both primary total hip arthroplasty and revision procedures.378–380

PROs have increasingly been included alongside more traditional indices of surgical outcome such as morbidity and mortality when evaluating the success of total hip arthroplasty. With the expanding focus on patient-reported outcomes, such as functioning and quality of life, numerous, diverse PROMs have been developed and applied in measuring total hip arthroplasty outcomes.377 Thus, this intervention provides a relevant context in which to review the use of recommended characteristics in the selection of PROMs, specifically with the characteristics of the WOMAC, a PROM developed to examine pain, stiffness, and physical function in individuals with osteoarthritis.307
