An impact evaluation provides information about the impacts produced by an intervention - positive and negative, intended and unintended, direct and indirect. This means that an impact evaluation must establish what has been the cause of observed changes (in this case ‘impacts’) referred to as causal attribution (also referred to as causal inference). Show
If an impact evaluation fails to systematically undertake causal attribution, there is a greater risk that the evaluation will produce incorrect findings and lead to incorrect decisions. For example, deciding to scale up when the programme is actually ineffective or effective only in certain limited situations, or deciding to exit when a programme could be made to work if limiting factors were addressed. 1 What is impact evaluation?An impact evaluation provides information about the impacts produced by an intervention. The intervention might be a small project, a large programme, a collection of activities, or a policy. Many development agencies use the definition of impacts provided by the Organisation for Economic Co-operation and Development – Development Assistance Committee: “positive and negative, primary and secondary long-term effects produced by a development intervention, directly or indirectly, intended or unintended.” (OECD-DAC 2010). This definition implies that impact evaluation:
2 Why do impact evaluation?An impact evaluation can be undertaken to improve or reorient an intervention (i.e., for formative purposes) or to inform decisions about whether to continue, discontinue, replicate or scale up an intervention (i.e., for summative purposes). While many formative evaluations focus on processes, impact evaluations can also be used formatively if an intervention is ongoing. For example, the findings of an impact evaluation can be used to improve implementation of a programme for the next intake of participants by identifying critical elements to monitor and tightly manage. Most often, impact evaluation is used for summative purposes. Ideally, a summative impact evaluation does not only produce findings about ‘what works’ but also provides information about what is needed to make the intervention work for different groups in different settings. 3 When to do impact evaluation?An impact evaluation should only be undertaken when its intended use can be clearly identified and when it is likely to be able to produce useful findings, taking into account the availability of resources and the timing of decisions about the intervention under investigation. An evaluability assessment might need to be done first to assess these aspects. Prioritizing interventions for impact evaluation should consider: the relevance of the evaluation to the organisational or development strategy; its potential usefulness; the commitment from senior managers or policy makers to using its findings; and/or its potential use for advocacy or accountability requirements. It is also important to consider the timing of an impact evaluation. When conducted belatedly, the findings come too late to inform decisions. When done too early, it will provide an inaccurate picture of the impacts (i.e., impacts will be understated when they had insufficient time to develop or overstated when they decline over time).
4 Who to engage in the evaluation process?Regardless of the type of evaluation, it is important to think through who should be involved, why and how in each step of the evaluation process to develop an appropriate and context-specific participatory approach. Participation can occur at any stage of the impact evaluation process: in deciding to do an evaluation, in its design, in data collection, in analysis, in reporting and, also, in managing it. Being clear about the purpose of participatory approaches in an impact evaluation is an essential first step towards managing expectations and guiding implementation. Is the purpose to ensure that the voices of those whose lives should have been improved by the programme or policy are central to the findings? Is it to ensure a relevant evaluation focus? Is it to hear people’s own versions of change rather than obtain an external evaluator’s set of indicators? Is it to build ownership of a donor-funded programme? These, and other considerations, would lead to different forms of participation by different combinations of stakeholders in the impact evaluation. The underlying rationale for choosing a participatory approach to impact evaluation can be either pragmatic or ethical, or a combination of the two. Pragmatic because better evaluations are achieved (i.e., better data, better understanding of the data, more appropriate recommendations, better uptake of findings); ethical because it is the right thing to do (i.e., people have a right to be involved in informing decisions that will directly or indirectly affect them, as stipulated by the UN human rights-based approach to programming). Participatory approaches can be used in any impact evaluation design. In other words, they are not exclusive to specific evaluation methods or restricted to quantitative or qualitative data collection and analysis. The starting point for any impact evaluation intending to use participatory approaches lies in clarifying what value this will add to the evaluation itself as well as to the people who would be closely involved (but also including potential risks of their participation). Three questions need to be answered in each situation: (1) What purpose will stakeholder participation serve in this impact evaluation?; (2) Whose participation matters, when and why?; and, (3) When is participation feasible? Only after addressing these, can the issue of how to make impact evaluation more participatory be addressed. For more information, see:
5 How to plan and manage an impact evaluation?Like any other evaluation, an impact evaluation should be planned formally and managed as a discrete project, with decision-making processes and management arrangements clearly described from the beginning of the process. Planning and managing include:
Determining causal attribution is a requirement for calling an evaluation an impact evaluation. The design options (whether experimental, quasi-experimental, or non-experimental) all need significant investment in preparation and early data collection, and cannot be done if an impact evaluation is limited to a short exercise conducted towards the end of intervention implementation. Hence, it is particularly important that impact evaluation is addressed as part of an integrated monitoring, evaluation and research plan and system that generates and makes available a range of evidence to inform decisions. This will also ensure that data from other M&E activities such as performance monitoring and process evaluation can be used, as needed. For more information, see:
6 What methods can be used to do impact evaluation?Framing the boundaries of the impact evaluationThe evaluation purpose refers to the rationale for conducting an impact evaluation. Evaluations that are being undertaken to support learning should be clear about who is intended to learn from it, how they will be engaged in the evaluation process to ensure it is seen as relevant and credible, and whether there are specific decision points around where this learning is expected to be applied. Evaluations that are being undertaken to support accountability should be clear about who is being held accountable, to whom and for what. Evaluation relies on a combination of facts and values (i.e., principles, attributes or qualities held to be intrinsically good, desirable, important and of general worth such as ‘being fair to all’) to judge the merit of an intervention (Stufflebeam 2001). Evaluative criteria specify the values that will be used in an evaluation and, as such, help to set boundaries. Many impact evaluations use the standard OECD-DAC criteria (OECD-DAC accessed 2015):
The OECD-DAC criteria reflect the core principles for evaluating development assistance (OECD-DAC 1991) and have been adopted by most development agencies as standards of good practice in evaluation. Other, commonly used evaluative criteria are about equity, gender equality, and human rights. And, some are used for particular types of development interventions such humanitarian assistance such as: coverage, coordination, protection, coherence. In other words, not all of these evaluative criteria are used in every evaluation, depending on the type of intervention and/or the type of evaluation (e.g., the criterion of impact is irrelevant to a process evaluation). Evaluative criteria should be thought of as ‘concepts’ that must be addressed in the evaluation. They are insufficiently defined to be applied systematically and in a transparent manner to make evaluative judgements about the intervention. Under each of the ‘generic’ criteria, more specific criteria such as benchmarks and/or standards* – appropriate to the type and context of the intervention – should be defined and agreed with key stakeholders. The evaluative criteria should be clearly reflected in the evaluation questions the evaluation is intended to address. *A benchmark or index is a set of related indicators that provides for meaningful, accurate and systematic comparisons regarding performance; a standard or rubric is a set of related benchmarks/indices or indicators that provides socially meaningful information regarding performance. Defining the key evaluation questions (KEQs) the impact evaluation should addressImpact evaluations should be focused around answering a small number of high-level key evaluation questions (KEQs) that will be answered through a combination of evidence. These questions should be clearly linked to the evaluative criteria. For example:
A range of more detailed (mid-level and lower-level) evaluation questions should then be articulated to address each evaluative criterion in detail. All evaluation questions should be linked explicitly to the evaluative criteria to ensure that the criteria are covered in full. The KEQs also need to reflect the intended uses of the impact evaluation. For example, if an evaluation is intended to inform the scaling up of a pilot programme, then it is not enough to ask ‘Did it work?’ or ‘What were the impacts?’. A good understanding is needed of how these impacts were achieved in terms of activities and supportive contextual factors to replicate the achievements of a successful pilot. Equity concerns require that impact evaluations go beyond simple average impact to identify for whom and in what ways the programmes have been successful. Within the KEQs, it is also useful to identify the different types of questions involved – descriptive, causal and evaluative.
Impact evaluations must have credible answers to all of these questions. For more information, see:
Defining impactsImpacts are usually understood to occur later than, and as a result of, intermediate outcomes. For example, achieving the intermediate outcomes of improved access to land and increased levels of participation in community decision-making might occur before, and contribute to, the intended final impact of improved health and well-being for women. The distinction between outcomes and impacts can be relative, and depends on the stated objectives of an intervention. It should also be noted that some impacts may be emergent, and thus, cannot be predicted. For more information, see:
Defining success to make evaluative judgementsEvaluation, by definition, answers evaluative questions, that is, questions about quality and value. This is what makes evaluation so much more useful and relevant than the mere measurement of indicators or summaries of observations and stories. In any impact evaluation, it is important to define first what is meant by ‘success’ (quality, value). One way of doing so is to use a specific rubric that defines different levels of performance (or standards) for each evaluative criterion, deciding what evidence will be gathered and how it will be synthesized to reach defensible conclusions about the worth of the intervention. At the very least, it should be clear what trade-offs would be appropriate in balancing multiple impacts or distributional effects. Since development interventions often have multiple impacts, which are distributed unevenly, this is an essential element of an impact evaluation. For example, should an economic development programme be considered a success if it produces increases in household income but also produces hazardous environment impacts? Should it be considered a success if the average household income increases but the income of the poorest households is reduced? To answer evaluative questions, what is meant by ‘quality’ and ‘value’ must first be defined and then relevant evidence gathered. Quality refers to how good something is; value refers to how good it is in terms of the specific situation, in particular taking into account the resources used to produce it and the needs it was supposed to address. Evaluative reasoning is required to synthesize these elements to formulate defensible (i.e., well-reasoned and well evidenced) answers to the evaluative questions. Evaluative reasoning is a requirement of all evaluations, irrespective of the methods or evaluation approach used. An evaluation should have a limited set of high-level questions which are about performance overall. Each of these KEQs should be further unpacked by asking more detailed questions about performance on specific dimensions of merit and sometimes even lower-level questions. Evaluative reasoning is the process of synthesizing the answers to lower- and mid-level questions into defensible judgements that directly answer the high-level questions. For more information, see:
Using a theory of changeEvaluations produce stronger and more useful findings if they not only investigate the links between activities and impacts but also investigate links along the causal chain between activities, outputs, intermediate outcomes and impacts. A ‘theory of change’ that explains how activities are understood to produce a series of results that contribute to achieving the ultimate intended impacts, is helpful in guiding causal attribution in an impact evaluation. A theory of change should be used in some form in every impact evaluation. It can be used with any research design that aims to infer causality, it can use a range of qualitative and quantitative data, and provide support for triangulating the data arising from a mixed methods impact evaluation. When planning an impact evaluation and developing the terms of reference, any existing theory of change for the programme or policy should be reviewed for appropriateness, comprehensiveness and accuracy, and revised as necessary. It should continue to be revised over the course of the evaluation should either the intervention itself or the understanding of how it works – or is intended to work – change. Some interventions cannot be fully planned in advance, however – for example, programmes in settings where implementation has to respond to emerging barriers and opportunities such as to support the development of legislation in a volatile political environment. In such cases, different strategies will be needed to develop and use a theory of change for impact evaluation (Funnell and Rogers 2012). For some interventions, it may be possible to document the emerging theory of change as different strategies are trialled and adapted or replaced. In other cases, there may be a high-level theory of how change will come about (e.g., through the provision of incentives) and also an emerging theory about what has to be done in a particular setting to bring this about. Elsewhere, its fundamental basis may revolve around adaptive learning, in which case the theory of change should focus on articulating how the various actors gather and use information together to make ongoing improvements and adaptations. A theory of change can support an impact evaluation in several ways. It can identify:
The evaluation may confirm the theory of change or it may suggest refinements based on the analysis of evidence. An impact evaluation can check for success along the causal chain and, if necessary, examine alternative causal paths. For example, failure to achieve intermediate results might indicate implementation failure; failure to achieve the final intended impacts might be due to theory failure rather than implementation failure. This has important implications for the recommendations that come out of an evaluation. In cases of implementation failure, it is reasonable to recommend actions to improve the quality of implementation; in cases of theory failure, it is necessary to rethink the whole strategy for achieving impacts. For more information, see:
Deciding the evaluation methodologyThe evaluation methodology sets out how the key evaluation questions (KEQs) will be answered. It specifies designs for causal attribution, including whether and how comparison groups will be constructed, and methods for data collection and analysis. ➟ Strategies and designs for determining causal attributionCausal attribution is defined by OECD-DAC as: “Ascription of a causal link between observed (or expected to be observed) changes and a specific intervention.” (OECD_DAC 2010) This definition does not require that changes are produced solely or wholly by the programme or policy under investigation (UNEG 2013). In other words, it takes into consideration that other causes may also have been involved, for example, other programmes/policies in the area of interest or certain contextual factors (often referred to as ‘external factors’). There are three broad strategies for causal attribution in impact evaluations:
Using a combination of these strategies can usually help to increase the strength of the conclusions that are drawn. There are three design options that address causal attribution:
Some individuals and organisations use a narrower definition of impact evaluation, and only include evaluations containing a counterfactual of some kind. These different definitions are important when deciding what methods or research designs will be considered credible by the intended user of the evaluation or by partners or funders. For more information, see:
➟ Data collection, management and analysis approachWell-chosen and well-implemented methods for data collection and analysis are essential for all types of evaluations. Impact evaluations need to go beyond assessing the size of the effects (i.e., the average impact) to identify for whom and in what ways a programme or policy has been successful. What constitutes ‘success’ and how the data will be analysed and synthesized to answer the specific key evaluation questions (KEQs) must be considered up front as data collection should be geared towards the mix of evidence needed to make appropriate judgements about the programme or policy. In other words, the analytical framework – the methodology for analysing the ‘meaning’ of the data by looking for patterns in a systematic and transparent manner – should be specified during the evaluation planning stage. The framework includes how data analysis will address assumptions made in the programme theory of change about how the programme was thought to produce the intended results. In a true mixed methods evaluation, this includes using appropriate numerical and textual analysis methods and triangulating multiple data sources and perspectives in order to maximize the credibility of the evaluation findings. Start the data collection planning by reviewing to what extent existing data can be used. After reviewing currently available information, it is helpful to create an evaluation matrix (see below) showing which data collection and analysis methods will be used to answer each KEQ and then identify and prioritize data gaps that need to be addressed by collecting new data. This will help to confirm that the planned data collection (and collation of existing data) will cover all of the KEQs, determine if there is sufficient triangulation between different data sources and help with the design of data collection tools (such as questionnaires, interview questions, data extraction tools for document review and observation tools) to ensure that they gather the necessary information. Evaluation matrix: Matching data collection to key evaluation questions
There are many different methods for collecting data. Although many impact evaluations use a variety of methods, what distinguishes a ’mixed methods evaluation’ is the systematic integration of quantitative and qualitative methodologies and methods at all stages of an evaluation (Bamberger 2012). A key reason for mixing methods is that it helps to overcome the weaknesses inherent in each method when used alone. It also increases the credibility of evaluation findings when information from different data sources converges (i.e., they are consistent about the direction of the findings) and can deepen the understanding of the programme/policy, its effects and context (Bamberger 2012). Good data management includes developing effective processes for: consistently collecting and recording data, storing data securely, cleaning data, transferring data (e.g., between different types of software used for analysis), effectively presenting data and making data accessible for verification and use by others. The particular analytic framework and the choice of specific data analysis methods will depend on the purpose of the impact evaluation and the type of KEQs that are intrinsically linked to this. For answering descriptive KEQs, a range of analysis options is available, which can largely be grouped into two key categories: options for quantitative data (numbers) and options for qualitative data (e.g., text). For answering causal KEQs, there are essentially three broad approaches to causal attribution analysis: (1) counterfactual approaches; (2) consistency of evidence with causal relationship; and (3) ruling out alternatives (see above). Ideally, a combination of these approaches is used to establish causality. For answering evaluative KEQs, specific evaluative rubrics linked to the evaluative criteria employed (such as the OECD-DAC criteria) should be applied in order to synthesize the evidence and make judgements about the worth of the intervention (see above). For more information, see:
7 How can the findings be reported and their use supported?The evaluation report should be structured in a manner that reflects the purpose and KEQs of the evaluation. In the first instance, evidence to answer the detailed questions linked to the OECD-DAC criteria of relevance, effectiveness, efficiency, impact and sustainability, and considerations of equity, gender equality and human rights should be presented succinctly but with sufficient detail to substantiate the conclusions and recommendations. The specific evaluative rubrics should be used to ‘interpret’ the evidence and determine which considerations are critically important or urgent. Evidence on multiple dimensions should subsequently be synthesized to generate answers to the high-level evaluative questions. The structure of an evaluation report can do a great deal to encourage the succinct reporting of direct answers to evaluative questions, backed up by enough detail about the evaluative reasoning and methodology to allow the reader to follow the logic and clearly see the evidence base. The following recommendations will help to set clear expectations for evaluation reports that are strong on evaluative reasoning:
For more information, see:
ResourcesOverviews / introductions to impact evaluationUNICEF Impact Evaluation Methodological Briefs and Videos: Overview briefs (1,6,10) are available in English, French and Spanish and supported by whiteboard animation videos in three languages; Brief 7 (RCTs) also includes a video. UNICEF-BetterEvaluation Impact Evaluation Webinar Series Throughout 2015, BetterEvaluation partnered with the UNICEF Office of Research – Innocenti to develop eight impact evaluation webinars for UNICEF staff. The objective was to provide an interactive capacity-building experience, customized to focus on UNICEF’s work and the unique circumstances of conducting impact evaluations of programs and policies in international development. The webinars were based on the Impact Evaluation Series – a user-friendly package of 13 methodological briefs and four animated videos – and presented by the briefs' authors. Each page provides links not only to the eight webinars, but also to the practical questions and their answers which followed each webinar presentation.
Discussion papers
GuidesInterAction Impact Evaluation Guidance Notes and Webinar Series:
Methods Lab Publications The Methods Lab was an action-learning collaboration between the Overseas Development Institute (ODI), BetterEvaluation (BE) and the Australian Department of Foreign Affairs and Trade (DFAT) conducted during 2012-2015. The Methods Lab sought to develop, test, and institutionalise flexible approaches to impact evaluations. It focused on interventions which are harder to evaluate because of their diversity and complexity or where traditional impact evaluation approaches may not be feasible or appropriate, with the broader aim of identifying lessons with wider application potential. See more information here. The Methods Lab produced several guidance documents including: Realist impact evaluation: an introduction - This guide explains when a realist impact evaluation may be most appropriate or feasible for evaluating a particular programme or policy, and outlines how to design and conduct an impact evaluation based on a realist approach. Read more. Addressing gender in impact evaluation - This paper is a resource for practitioners and evaluators who want to include a genuine focus on gender impact when commissioning or conducting evaluations. Read more. Evaluability assessment for impact evaluation - This guide provides an overview of the utility of and specific guidance and a tool for implementing an evaluability assessment before an impact evaluation is undertaken. Read more. When and how to develop an impact-oriented monitoring and evaluation system - Many development programme staff have had the experience of commissioning an impact evaluation towards the end of a project or programme only to find that the monitoring system did not provide adequate data about implementation, context, baselines or interim results. This guidance note has been developed in response to this common problem. Read more. Additional guidance documents can be found here Page contributorsThe content for this page was compiled by: Greet Peersman The content is based on ‘UNICEF Methodological Briefs for Impact Evaluation’, a collaborative project between the UNICEF Office of Research – Innocenti, BetterEvaluation, RMIT University and the International Initiative for Impact Evaluation (3ie).The briefs were written by (in alphabetical order): E. Jane Davidson, Thomas de Hoop, Delwyn Goodrick, Irene Guijt, Bronwen McDonald, Greet Peersman, Patricia Rogers, Shagun Sabarwal, Howard White. ReferencesBamberger M (2012). Introduction to Mixed Methods in Impact Evaluation. Guidance Note No. 3. Washington DC: InterAction. See: https://www.interaction.org/blog/impact-evaluation-guidance-note-and-webinar-series/ Funnell S and Rogers P (2012). Purposeful Program Theory: Effective Use of Logic Models and Theories of Change. San Francisco: Jossey-Bass/Wiley. OECD-DAC (1991). Principles for Evaluation of Development Assistance. Paris: Organisation for Economic Co-operation and Development – Development Assistance Committee (OECD-DAC). See: http://www.oecd.org/dac/evaluation/50584880.pdf OEDC-DAC (2010). Glossary of Key Terms in Evaluation and Results Based Management. Paris: Organisation for Economic Co-operation and Development – Development Assistance Committee (OEDC-DAC). See: http://www.oecd.org/development/peer-reviews/2754804.pdf OECD-DAC (accessed 2015). Evaluation of development programmes. DAC Criteria for Evaluating Development Assistance. Organisation for Economic Co-operation and Development – Development Assistance Committee (OECD-DAC). See: http://www.oecd.org/dac/evaluation/daccriteriaforevaluatingdevelopmentassistance.htm Stufflebeam D (2001). Evaluation values and criteria checklist. Kalamazoo: Western Michigan University Checklist Project. See: https://www.dmeforpeace.org/resource/evaluation-values-and-criteria-checklist/ UNEG (2013). Impact Evaluation in UN Agency Evaluation Systems: Guidance on Selection, Planning and Management. Guidance Document. New York: United Nations Evaluation Group (UNEG) . See: http://www.uneval.org/papersandpubs/documentdetail.jsp?doc_id=1434 Cite this page |