Tracking and Evaluation of Research, Development, and Demonstration Programs at the US Department of Energy (2024)

Program tracking and evaluation are carried out by all agencies to varying degrees, driven partly by legislative directives, such as the Evidence Act; by specific programs, such as the Small Business Innovation Research (SBIR) Program (Small Business Reauthorization Act of 2000, HR 5667, Pub. L. 106-554, 106th Congress, https://www.congress.gov/bill/106th-congress/house-bill/5667); by executive branch efforts; and by an agency's own commitment to success and accountability. This section discusses the evaluation landscape, including recent efforts, such as the Evidence Act, and the policy response from DOE and, specifically, its Office of Energy Efficiency and Renewable Energy (EERE). We also discuss the work of peer agencies, such as the Department of Health and Human Services (HHS) and the Environmental Protection Agency (EPA), both of which have a long history of conducting evaluations. Finally, we extract lessons learned from these peer agencies and consider how DOE can institutionalize a culture of evaluation.

3.1. The Evidence Act and Administration Support

The Evidence Act (US Congress, The Foundations for Evidence-Based Policymaking Act of 2018, HR 4174, Pub. L. 115-435, 115th Congress, signed into law in January 2019) is a recent installment in a series of legislative and executive branch efforts to promote program evaluation and evidence-building across the federal government (GAO 2021). Passed in 2019, the act is still being implemented. Each agency must designate three senior officials in charge of promoting evaluation and data governance: an evaluation officer, a chief data officer, and a statistical official. All agencies must produce an annual evaluation plan and develop a strategic approach to evidence-building. Recognizing that capacity differs by agency, the act also requires publishing capacity assessments, which allow researchers and decisionmakers to take stock of evaluation efforts and drive further research.

The current administration, through OMB (represented by Danielle Berman, who discussed OMB’s role in implementing Title I of the Evidence Act at our workshop), has supported evidence-based policymaking by publishing guidance documents for agencies (OMB Circular No. A-11 Section 290, M-21-27, M-20-12, M-19-23, and M-22-12). These documents describe the value and purpose of agency-wide learning agendas and annual evaluation plans and encourage using the most rigorous methods appropriate for the evidence need. The Biden–Harris administration has been especially committed to evidence-based policymaking, providing government-wide support for developing an evaluation culture, including professional development programs, technical assistance, and community-building efforts.

Drawing on comments by workshop speakers and documents from the agencies discussed below, we consider EERE’s evaluation efforts because it is widely considered DOE’s leading office for evaluation. We also consider HHS and EPA evaluation efforts because their relatively well-developed evaluation cultures could provide lessons for DOE. Finally, we discuss building an evaluation culture at DOE in the last part of this section.

3.2. DOE and EERE Efforts

In DOE’s fiscal year (FY) 2024 Evaluation Plan (DOE 2022), required under the Evidence Act and OMB guidelines, program evaluation is presented as key to managing a large portfolio of dissimilar programs and informing crucial decisions on planning and budget. However, DOE does not have an all-of-agency evaluation strategy. Rather, its plan focuses on processes and support at the agency level and delegates evaluation to functional offices and program managers.

DOE already uses a variety of evaluations to assess different aspects of its programs and offices. Although peer reviews (a form of process evaluation) have become common practice in most offices, the same cannot be said of systematic impact evaluations of RD&D programs’ effectiveness and efficiency, and agency-wide capacity to conduct them is insufficient. Between 2016 and 2021, 156 process evaluations (an average of 27 per year) but only 16 impact evaluations were completed (Dowd 2023). As new offices are created, such as the Office of Clean Energy Demonstrations and the Office of Manufacturing and Energy Supply Chains, it is important that programs are managed with future evaluations in mind.

Jeff Dowd, EERE’s lead program evaluator, described the office’s efforts at the workshop. According to members of the evaluation community we contacted in other agencies, EERE has a relatively good record among federal agencies, particularly technology agencies, for tracking and evaluating projects and programs. According to EERE, key projects must be peer-reviewed every two years by an independent panel of experts. EERE also encourages technology offices to conduct impact evaluations assessing the causal effects of programs and outcomes against planned goals. However, offices have varied track records in conducting impact evaluations. EERE has been performing more impact evaluations over time and with increasing rigor, including the use of peer-reviewed methods. Most use quasi-experimental or nonexperimental methods, with some based on expert elicitations.

Participants from across DOE discuss these efforts in monthly Evaluation Community of Practice meetings, which share best practices among evaluators and program managers. According to Dowd, EERE also plans to increase its institutional capacity for impact evaluation by hiring new federal staff with relevant expertise and incentivizing technology offices to improve their capacity. EERE’s strategy also places an increased focus on embedding evaluation in program planning, execution, and decisionmaking by allocating more funds for impact evaluations, establishing new guidelines for quasi-experimental methods, and communicating the results. This strategy intends to build an evidence culture within EERE and could spread to other parts of DOE. Although EERE has extensive data systems for tracking funding, project awards, and progress during the contract period, evaluators still face significant constraints when capturing outputs and outcomes, especially over the longer term, and these are critical for determining impacts. Establishing a data infrastructure to better support evaluation efforts is one element of EERE’s broader evaluation capacity-building strategy. Multiple efforts are ongoing to track the outputs and outcomes of EERE’s investments; the data collected would help conduct and improve impact evaluations. Notable efforts include developing an evaluation data platform and tracking patent data and commercial technologies enabled by EERE investments.

3.2.1. EERE Evaluation Data Platform

Since 2022, EERE has been developing a new data infrastructure to directly support impact evaluation. The EERE Evaluation Data Platform (Overview of Evaluation Data Platform, 2023, presentation available upon request from EERE) is a data repository and impact metrics reporting system that combines internal funding and project data (on awardees, contracts, project performance, etc.), external data (from non-DOE sources), and analytical tools to generate output, outcome, and impact metric reports and dashboards for EERE business users. It will also deliver project output and outcome data to commissioned evaluators for impact evaluations. The metrics will broaden the coverage of the EERE data systems by building on more than 300 data elements related to intellectual property, inventions and commercial technologies, attracted follow-on funding, energy savings and avoided emissions, and diversity and workforce, among other metrics. To the extent possible, these data collection and integration processes would be automated to better enable EERE to calculate impact metrics or provide data to evaluators so that they can estimate impacts via appropriate experimental or quasi-experimental research designs. However, data from awardees outside of the contract period may still suffer from a low-response-rate bias that increases with time. By deploying this improved evaluation data infrastructure across the office, EERE hopes to evaluate its results more efficiently and communicate the impacts that its funded RD&D is having across the US economy.
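To make the platform’s role concrete, the following is a minimal, illustrative sketch (in Python, using pandas) of how internal award records might be joined with externally sourced outcome data to produce office-level metric reports. The file names, columns, and metrics are hypothetical assumptions, not the platform’s actual schema.

```python
# Illustrative sketch only: joining hypothetical internal award data with
# hypothetical external outcome data to report simple output/outcome metrics.
import pandas as pd

awards = pd.read_csv("eere_awards.csv")           # award_id, office, awardee, amount_usd
patents = pd.read_csv("linked_patents.csv")       # award_id, patent_id, grant_year
follow_on = pd.read_csv("follow_on_funding.csv")  # award_id, follow_on_usd

# Aggregate external outcome data to the award level.
outcomes = (
    patents.groupby("award_id").size().rename("n_patents").to_frame()
    .join(follow_on.groupby("award_id")["follow_on_usd"].sum(), how="outer")
    .fillna(0)
    .reset_index()
)

# Merge with internal award records and roll up to office-level metrics.
merged = awards.merge(outcomes, on="award_id", how="left").fillna(0)
metrics = merged.groupby("office").agg(
    awards=("award_id", "count"),
    total_funding_usd=("amount_usd", "sum"),
    patents=("n_patents", "sum"),
    follow_on_usd=("follow_on_usd", "sum"),
)
metrics["follow_on_per_funding_dollar"] = (
    metrics["follow_on_usd"] / metrics["total_funding_usd"]
)
print(metrics)
```

In practice, the platform would also need to handle record linkage and missing data, but the sketch shows the basic join-and-aggregate pattern that a metrics dashboard relies on.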

3.2.2. Tracking Commercial Technologies

For several decades, the Pacific Northwest National Laboratory (PNNL) has been tracking technologies for several EERE technology offices to provide EERE management with information about supported commercial and emerging technologies (Steele and Agyeman 2021). A PNNL database documents the time to market of commercialized EERE-funded technologies by regularly soliciting projects’ points of contact and by collecting data from companies’ websites, scientific publications, and interviews with technology developers after projects are completed (Steele and Weakley 2020). This effort, although long-standing and sometimes extensive, has yet to be adopted by all EERE offices. The number of technologies tracked depends on the interest of EERE technology offices in information on their performance on commercialization metrics, which leads to some technologies being underrepresented in the database. The version we were able to access represented only a subset of the projects funded by the technology offices PNNL has worked with: Advanced Manufacturing, Bioenergy Technologies, Building Technologies, Fuel Cell Technologies, Geothermal Technologies, Vehicle Technologies, and Wind Energy Technologies. For instance, in 2011, the geothermal office had a portfolio of more than 270 RD&D projects (DOE EERE 2011), but the version of the database we accessed contained only 15 commercial technologies. However, it should be noted that many projects are intended to advance research progress in the field and do not directly result in a commercial technology.

This type of data collection work can, in some cases (e.g., when using surveys), be very resource-intensive and can suffer from low-response-rate bias. Since the pandemic, the PNNL team has seen response rates dwindle from close to 100 percent to 75–80 percent when reaching out to projects’ points of contact for updates. This decline can be explained by difficulties in locating contacts due to retirements or changes in company staff, or by an increase in innovators’ concerns about confidential business information. Moreover, outside of the contract period, data is provided only voluntarily.
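A simple, hypothetical calculation illustrates why falling response rates matter when nonresponse is correlated with outcomes (the numbers below are invented and are not PNNL figures):

```python
# Hypothetical illustration of low-response-rate bias; not PNNL data.
# Suppose 100 completed projects, 40 of which commercialized a technology.
true_rate = 40 / 100

# Assume teams that did not commercialize are harder to reach (staff turnover,
# confidentiality concerns): 90% of commercializers respond vs. 65% of others,
# giving an overall response rate of 75 percent, within the range PNNL reports.
respondents_commercialized = 40 * 0.90   # 36 respondents
respondents_other = 60 * 0.65            # 39 respondents
observed_rate = respondents_commercialized / (
    respondents_commercialized + respondents_other
)

print(f"true rate: {true_rate:.0%}, observed among respondents: {observed_rate:.0%}")
# -> true rate: 40%, observed among respondents: 48%
```

Even a modest gap in response rates between successful and unsuccessful projects inflates the observed commercialization rate, and the gap widens as overall response rates fall.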

Although this data collection effort was not intended to provide data for impact evaluations, the derived commercialization metrics could help answer research questions about the effects of EERE’s innovation programs. The aim was to comply with the Government Performance and Results Act (US Congress, Government Performance and Results Act of 1993, HR 826, Pub. L. 103-62, 103rd Congress, signed into law in August 1993) and the GPRA Modernization Act of 2010 (US Congress, GPRA Modernization Act of 2010, HR 2142, Pub. L. 111-352, 111th Congress, signed into law in January 2011) and to assess outcomes of EERE RD&D investments related to technology commercialization. Thus, this data has been leveraged to analyze EERE-funded technologies but not in the context of impact evaluations. One potential improvement that would make the data more useful would be to follow “losing” applicants (e.g., those that were favorably reviewed but fell below the funding threshold), which is important for controlling for all the nonprogrammatic reasons for projects’ successes and failures. Another improvement would be to standardize the type of data collected on commercialization and market penetration across projects to facilitate aggregation and comparison.
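If data on near-miss applicants were collected, it could support comparisons of funded and unfunded projects near the funding cutoff. The sketch below shows one simple, hypothetical way such a comparison might be set up (a naive local contrast, not a full regression discontinuity design); the file name, columns, and bandwidth are illustrative assumptions.

```python
# Illustrative sketch of comparing funded and near-miss applicants around a
# funding cutoff. Hypothetical data; not an EERE analysis.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical applicant-level data: reviewer score, cutoff, funded flag (0/1),
# and a later outcome such as whether a commercial technology resulted (0/1).
apps = pd.read_csv("foa_applicants.csv")  # score, cutoff, funded, commercialized

# Keep applicants within an arbitrary window of the cutoff, where funded and
# unfunded projects are most comparable.
apps["dist"] = apps["score"] - apps["cutoff"]
window = apps[apps["dist"].abs() <= 5]

# Naive local comparison controlling linearly for the running variable; a real
# evaluation would use a properly specified regression discontinuity design.
model = smf.ols("commercialized ~ funded + dist", data=window).fit(cov_type="HC1")
print(model.params["funded"], model.bse["funded"])
```

The point is less the specific estimator than the data requirement: outcomes must be tracked for unfunded applicants as well as awardees for any such comparison to be possible.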

3.2.3. Tracking Patent Data

Patents are also targets of data collection efforts, as a well-known but partial metric of technological innovation. In a study covering EERE-funded patents, researchers constructed a database containing all DOE grantees’ patents using tools such as the DOE Patents Database and iEdison, an interagency reporting system for recipients of federal funding agreements (DOE EERE 2022). Information on patents from DOE-funded projects can also be retrieved from the Government Interest section of US Patent and Trademark Office (USPTO) records and linked to project data. However, this data can require lengthy processing and verification to link patents with specific programs and funding contracts within DOE, which represents a barrier to broader use of this information in impact evaluations.
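As an illustration of the linking step, the sketch below extracts candidate DOE contract numbers from Government Interest statements and matches them to award records. The regular expression, file names, and columns are assumptions for illustration only and would need refinement against real statements.

```python
# Illustrative sketch: matching patents to DOE awards via contract numbers
# mentioned in Government Interest statements. Hypothetical files and pattern.
import pandas as pd

gov_interest = pd.read_csv("uspto_government_interest.csv")  # patent_id, gi_statement
awards = pd.read_csv("doe_awards.csv")                       # award_id, contract_number, office

# Approximate pattern for DOE contract numbers such as "DE-EE0001234";
# real statements use several formats, so this is only a starting point.
pattern = r"DE-[A-Z]{2}\d{7}"
gov_interest["contract_number"] = gov_interest["gi_statement"].str.findall(pattern)

# One row per extracted contract number, dropping patents with no match.
links = (
    gov_interest.explode("contract_number")
    .dropna(subset=["contract_number"])
)

# Join extracted contract numbers to award records to recover program and office.
linked = links.merge(awards, on="contract_number", how="inner")
print(linked[["patent_id", "contract_number", "office"]].drop_duplicates().head())
```

Much of the manual effort described above goes into handling statements that do not match a clean pattern, which is why automating this linkage only partially reduces the verification burden.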

3.3. Evaluation at HHS

Although no other agency is a perfect analogue to DOE, evaluation activities in other agencies might be relevant for clean energy technology innovation programs. For instance, evidence-building activities are well integrated in some agencies, such as HHS, which has an overarching evaluation strategy across programs carried out by multiple independent divisions and offices.

HHS, represented at our workshop by its evaluation officer, Susan Jenkins, has had a strong evaluation culture for decades, having established systematic agency-wide practices, committed to building adequate capacity, and implemented accountability for the evidence produced. Although HHS is decentralized and has varied operating divisions, similar to DOE’s organization, division liaisons attend the agency-level Evidence and Evaluation Council to ensure agency-wide coordination and feedback from the operational offices. This council predates the Evidence Act and includes senior evaluation staff and subject matter experts from each operating division. The Office of the Assistant Secretary for Planning and Evaluation (ASPE) is responsible for understanding and collating the wide range of evaluation, research, and analysis efforts across the department.

The evaluation officer holds monthly meetings with strategic planning and performance management staff. Staff in all divisions also have ample training opportunities, ranging from gaining a basic understanding of the “how” and “why” of evaluation to conducting evaluations and interpreting their results. Nevertheless, Jenkins identified areas to improve the agency’s evaluation strategy, with some divisions still working to build up the necessary capacity (in both resources and workforce), improve data quality and access, and engage department leaders in the process.

HHS’s evaluation efforts under the Evidence Act encompass its whole organization. Operating divisions formulate individual evaluation plans based on a centralized template and guidance. These plans highlight priority questions, specific programs to be analyzed, data, and methods. Leadership within each of the divisions is responsible for approving the plans before they are compiled into the department-wide evaluation plan by ASPE.

In the department’s latest evaluation plan (HHS ASPE 2022), operating divisions provided the council with examples of significant evaluations planned across five priority areas: health care, public health, human services, research and evidence, and strategic management. The plan includes 23 evaluations, spanning all the divisions and in differing stages of execution, with some falling into multiple priority areas.

Beyond the evaluation plan, the department is still implementing its learning agenda (HHS 2023), a four-year plan outlining priority questions and the methods and data required to answer them. It is also finalizing an update to its FY2023–2026 Capacity Assessment (HHS ASPE 2023), which assesses the “coverage, quality, methods, effectiveness, objectivity, scientific integrity, and balance” of the evaluation portfolio and, more broadly, of the department’s evidence-building activities (OMB 2019).

The longstanding focus on evaluations at HHS, which began before the Evidence Act, provides several useful lessons that are relevant to DOE. Most importantly, robust department-wide evaluation work requires buy-in from all operating divisions, especially from agency leadership. Having a template and best practices from the department gives divisions the necessary guidance to craft their individual evaluation plans. Finally, agency-level coordination (i.e., ASPE) can be a valuable resource for taking stock of the agency’s evaluation efforts and identifying overlapping priority areas between divisions and evaluations that could address multiple priority areas.

3.4. Evaluation at EPA

Lessons about program evaluation and evidence-building activities can also be learned from EPA, which has some functions similar to DOE’s. At EPA, implementation of the Evidence Act proceeds primarily through several priority areas (EPA 2022); the one most similar to DOE’s activities is evidence-building for the grantmaking process. EPA distributes over $4 billion each year through more than 100 grant and other assistance programs, with 1,400 employees responsible for managing and tracking them. Similar to DOE, EPA’s system lacks comprehensive tracking, which inhibits its ability to evaluate programs’ environmental outcomes.

At the workshop, Katherine Dawes, EPA’s acting evaluation officer, spoke about these challenges in data collection. Indeed, data can be challenging to standardize and consolidate across programs, making it difficult to track collective progress for the organization. In many cases, data may be provided in a variety of ways, including Word documents, PDFs, forms, or emails. Beyond data collection and the challenges in gathering confidential business information and personally identifiable information, the data quality itself may be poor, including issues such as missing or hard-to-find data and differing variable definitions. EPA has begun a three-year process to better understand the challenges with evaluating grants. The initial phase (Year 1) established a baseline to understand the existing grant award and reporting systems. In the second year, the Grant Commitments Workgroup examined specific practices and tools that could effectively track progress toward meeting workplan grant commitments.


The Year 2 report (EPA 2023) has produced several recommendations, some of which may be applicable to DOE. For example, the workgroup suggested categorizing grant programs based on their anticipated results, such as a focus on achieving long-term environmental or human health outcomes or whether these outcomes are predictable (see Table 1). Based on this categorization, the agency can provide additional guidance or support for evaluation activities. Additionally, the Year 2 report recommends creating a flexible storage system to accommodate grants’ variety of data types and reporting schedules. Data types may be wide ranging, from quantitative (with standardized or nonstandardized metrics across grantees) to narrative (which may require an open field to tell a comprehensive story). The database should also allow for centralized document storage and be searchable across data metrics and document text.
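As a concrete illustration of the kind of flexible storage the report describes, the sketch below defines a single record type that holds standardized metrics, grantee-specific metrics, narrative text, and document attachments side by side. The field names and example values are hypothetical, not EPA’s schema.

```python
# Minimal sketch of a flexible grant-report record accommodating quantitative
# and narrative reporting; field names and values are hypothetical.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class GrantReport:
    grant_id: str
    program: str
    reporting_period_end: date
    standard_metrics: dict = field(default_factory=dict)   # metrics shared across grantees
    custom_metrics: dict = field(default_factory=dict)     # grantee-specific metrics
    narrative: str = ""                                     # open field for context
    attachments: list = field(default_factory=list)         # paths to PDFs, Word files, etc.

# Example record (hypothetical values).
report = GrantReport(
    grant_id="EPA-2023-0001",
    program="Hypothetical Assessment Program",
    reporting_period_end=date(2023, 9, 30),
    standard_metrics={"sites_assessed": 12, "acres_ready_for_reuse": 34.5},
    narrative="Two assessments delayed by property access negotiations.",
    attachments=["reports/EPA-2023-0001_q4.pdf"],
)
```

Storing records like this in a document database or in a table with structured and free-text columns would keep the quantitative fields queryable while preserving narrative reports and attached documents, consistent with the report’s call for centralized, searchable storage.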

In addition to data handling and individual grant programs, the Year 2 report also recommended greater guidance and templates from EPA headquarters on the administration’s priorities. This could range from setting the appropriate metrics for agency priorities to templates for data collection from grantees. Relatedly, the report also recommends greater internal communication. This would be most beneficial between administrative and technical staff to ensure quality and relevance of grantee-reported data and between HQ staff and employees who implement grants to improve understanding of how the data is used to communicate outcomes.

Beyond the recommendations themselves, creating a group at DOE similar to the EPA’s Grant Commitments Workgroup could be useful to improve data storage and reporting. The in-depth surveys and interviews conducted by the workgroup (described in its Years 1 and 2 reports) have helped EPA fine-tune its evaluation policies, creating greater cohesion across the agency. If DOE also seeks to build an evaluation culture across its offices, a similar department-wide initiative to understand inter- and intraoffice challenges could be helpful.

3.5. Institutionalization and Evaluation Culture

At the workshop, several ideas were suggested for institutionalizing evaluations at DOE (or other agencies), drawing from the agency’s own experiences, examples from other agencies such as HHS and EPA, and guidance from the regulatory space. Joe Aldy (Harvard Kennedy School) presented a working paper on institutionalizing program evaluation by drawing lessons from regulatory agencies and past assessments of clean energy programs. Since 1981, regulatory agencies have been required to estimate the benefits and costs of major regulatory proposals as part of the regulatory review process, and a majority of these proposals have involved energy and environmental regulations. Policymakers can use the evidence generated to build a compelling argument for regulatory updates. Similarly, evaluations of tax incentives or spending programs, such as DOE’s RD&D grantmaking, can “enhance policymaker understanding of the most effective instruments for delivering on clean energy objectives” (Aldy 2022).

As highlighted, the DOE Evaluation Plan (DOE 2022) is limited in the breadth of its goals, as its four objectives mostly focus on the agency’s supporting activities and not its core programs. Yet, as Aldy highlighted at the workshop, identifying priority outcomes to evaluate is one of the main steps in developing an evaluation strategy. Generally, DOE headquarters delegates evaluation to program managers. This does not build transparency in evaluation planning, which is needed because most of DOE’s technology offices do not have a good track record in evaluating program impact or making the analyses publicly available.

As discussed, DOE can learn from the practices established by other federal agencies, such as HHS and EPA. Both agencies have worked to standardize procedures for evaluations, and this work has been strengthened by the Evidence Act’s requirement for a learning agenda and an annual evaluation plan. Such requirements can guide an agency toward outlining its key priorities for evaluation and taking inventory of planned evaluations. For an organization with multiple operating divisions, such as HHS, a central council to encourage communication and information sharing has deepened the culture of learning, similar to EERE’s Community of Practice. In outlining its evaluation priorities under the Evidence Act, EPA has initiated a multiyear plan to improve evidence-building and data collection around its grantmaking process, conducting organization-wide surveys and interviews. Not only are the recommendations from this process directly relevant to DOE, but a similar review process could identify and remediate gaps in DOE’s own evaluation and data collection practices.

In addition to institutionalizing these practices, DOE could do more to promote a culture of evaluation, with a focus on retrospective analysis and iterative policymaking. Lessons can be learned from the practice of regulatory reviews in federal agencies (Aldy 2022). Regulatory agencies’ practices in this regard offer mixed results and a cautionary tale for DOE. Historically, they have been directed by the White House to review existing rules, but a “failure to meaningfully institutionalize retrospective review, build a culture of such review within agencies, and [allocate] appropriate monies” (Aldy 2022) has hampered their effectiveness. The best-laid plans can come to naught without a strong commitment from top and middle management to see them through. The Evidence Act is a good start, as it endorses systematic evaluations and evidence-building activities, but history has indicated that such guidance may not be enough to overcome inertia within agencies. The administration’s budget proposal for 2024 (OMB 2023) affirms its commitment to evaluation, although no specific line item is dedicated to building evaluation capacity in DOE, where a large share of Infrastructure Investment and Jobs Act (IIJA) and Inflation Reduction Act (IRA) money will be spent. OMB supports the administration’s goal, but its authority is limited, so its guidance can only go so far in improving RD&D programs.

The barriers to developing evaluation capacities at DOE might be due to a lack of incentives. If the results of evaluations are not used to recommend or justify policy actions, process changes, or program improvements, program managers might see investing in them as a waste of time and resources. Thus, it is important that DOE’s learning agenda and evaluation plan integrate the results into revisions and improvements. DOE leadership could spearhead communicating evaluations to “policymakers, stakeholders, the media, and the public,” which could create durable value and demand. Consolidating this information could provide support for annual budget requests to Congress and be a useful resource for justifying funding on the scale of the IIJA and IRA once those spending provisions have sunset.

Another barrier might be the lack of qualified staff to conduct evaluations and initiate social experiments. One step that DOE could take to incentivize more evaluations of programs and processes at the agency level would be to create a chief economist position, as suggested by Kyle Myers (Harvard Business School) in recommendations for the NIH (Myers 2023). The chief economist would have a quantitative background and oversee the overarching evaluation effort and social experiments for the agency’s different programs (we detail the importance of experimentation in the Research Design section). Having such a position at the agency or division level would create a gold standard for evaluations across offices and programs. Most of these responsibilities could be carried out by the evaluation officer position, which is already required by the Evidence Act. Per the OMB (2019) definition, the evaluation officer should have “substantive expertise in evaluation methods and practices” and “maintain principles of scientific integrity throughout the evaluation process.” The specific skill set in social science experiments that a chief economist would bring can be included under the purview of the evaluation officer if the employee has the appropriate expertise. For an organization of DOE’s size, qualified evaluators should be present throughout the agency rather than only at the top; staff knowledgeable in social science methods are needed at the division level, where they can manage process and impact evaluations for the divisions’ programs. In addition, DOE could involve qualified social scientists in building experiments and evaluation strategies for programs by leveraging the Intergovernmental Personnel Act (IPA) Mobility Program, as Myers suggested at our workshop. The IPA allows federal agencies and other public bodies to hire from academia and other eligible institutions on a temporary basis.

Another significant barrier could be the cost and effort needed to access high-quality data for evidence-building activities. A prerequisite to RD&D program evaluation is access to data on applicants to measure outcomes and applicant characteristics. Yet the funding opportunity announcement (FOA) process is not designed to collect data for future impact evaluations. In addition, quantitative assessments are hampered by the lack of readily usable databases consolidating and organizing the data at the office or agency level and by the lack of available capacity to appropriately analyze the data and develop evidence-building activities. We discuss these data collection challenges in depth.

