Executive Summary
This report analyzes the integrity of public access to federal open government data assets during the disruptions to the federal data ecosystem in 2025 and early 2026. Here, data integrity is defined as the “maintenance of, and the assurance of, data accuracy and consistency over its entire life-cycle”, and public access to open government data assets is assumed to be essential to it. The report clarifies the scale and mechanisms of data disruptions, discusses specific threats to data integrity, highlights exemplar cases of data disruption from federal agencies, and delivers a transparent methodology for reproducing the auditing routines used in this assessment. The evidence used in this report was derived from multiple sources, including news reports, academic literature, materials from civic society and government oversight organizations, interviews with experts, archives, and government sources.
Primary findings
- As widely reported in the press and by advocacy organizations, 3,000 to 4,000 open government data assets were removed from public access in the last year. However, the data relied upon by popular reporting (changes to the Federal Data Catalog (FDC) as provided by Data.gov) are not reliable. Rather, the findings of this report point to significant dataset removals that were not widely reported, including many datasets that were not indexed in the FDC.
- The Trump Administration has engaged in large-scale dataset discontinuations using appropriate, lawful routes to end information collections. Between January 21st, 2025 and January 20th, 2026, this Administration discontinued at least 562 information collections through Paperwork Reduction Act (PRA) processes, which is 65% more than the Biden Administration discontinued in the same interval the year prior. While changes to information collections involving data on humans and organizations (i.e., those subject to the PRA) are easily documented through the public docket and reporting by the Office of Management and Budget, there is no equivalent docket to track changes to datasets not subject to the PRA (such as climate observation data).
- High-value data tools were taken down by the Administration while public access to their underlying data was largely retained. Many of these data tools, which often provide greater utility to the general public than raw datasets, were restored by civic society groups or by government contractors that retained access to them after contract cancellation. Even so, their removal from public access by the government adds friction for data users and breaks downstream workflows and applications that relied on the tools for information.
Figure: This figure summarizes the report’s findings on the scale, administrative mechanisms, and verification of disruptions to open government data. The leftmost panel describes the scale of public access loss and recovery of federal data, the central panel describes mechanisms and information gaps in the disruptions to federal data, and the third panel describes both litigation and forensic auditing. The figure is a custom modification of an image generated using Gemini Pro’s Nanobanana.
These findings are discussed in more detail below.
Actual Public Access Removal of Open Government Data
A primary finding of this report is that, while broadly reported estimates of 3,000 to 4,000 datasets being removed from public access are generally accurate in their total count, the evidence cited to support these figures is often technically flawed. Many observers relied on the topline dataset count displayed on Data.gov as a barometer for data integrity. However, the FDC is not a data repository, and the topline count is not sufficient to infer changes to underlying datasets. The dataset count fluctuates routinely due to normal harvesting cycles. A forensic audit conducted for this project indicates that the net change in the Data.gov topline count of assets in the FDC during the initial months of 2025 was approximately zero despite the observed fluctuations. There are almost 100,000 more datasets indexed in the FDC as of the publication of this report than there were at the height of reports of the changes to the Data.gov topline. Many of the new additions indexed in the FDC are high-value, such as the updated Census geographic shapefiles, which represented more than a third of the growth of the catalog.
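The gap between a topline count and the underlying holdings can be illustrated with a minimal sketch. It is not the audit code that accompanies this report; it assumes only that two catalog snapshots are available as collections of dataset identifiers. The identifiers, the `SnapshotDiff` class, and the `diff_snapshots` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnapshotDiff:
    """Identifier-level difference between two catalog snapshots."""
    added: frozenset
    removed: frozenset

    @property
    def net_change(self) -> int:
        # This is all that a topline dataset count can reveal.
        return len(self.added) - len(self.removed)


def diff_snapshots(earlier_ids, later_ids) -> SnapshotDiff:
    """Compare snapshots by dataset identifier rather than by total count."""
    earlier, later = frozenset(earlier_ids), frozenset(later_ids)
    return SnapshotDiff(added=later - earlier, removed=earlier - later)


# Hypothetical identifiers; a real audit would read them from harvested metadata.
before = {"agency-a/001", "agency-a/002", "agency-b/101"}
after = {"agency-a/001", "agency-b/101", "agency-c/201"}

diff = diff_snapshots(before, after)
print(diff.net_change)        # 0: the topline count is unchanged
print(sorted(diff.removed))   # ['agency-a/002']
print(sorted(diff.added))     # ['agency-c/201']
```

In this toy case the topline count does not move even though one dataset disappeared and another was added, which is why an identifier-level comparison is more informative than the count alone.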
The estimate of roughly 3,000 to 4,000 datasets removed from public access is more accurately derived by aggregating documented actions across specific cases:
- Office of Management and Budget (OMB): The revocation of public access to the apportionments database resulted in the loss of approximately 1,700 individual apportionment files. These data assets have all been restored by court order.
- United States Agency for International Development (USAID): The elimination of the Development Data Library removed public access to more than 2,000 data assets. These data assets are largely available through rescue efforts, but their federal records lifecycle is currently unknown.
- Department of Health and Human Services (HHS): Approximately 100 datasets were identified as removed from public access based on lawsuit filings and external monitoring. These data assets have been restored by court order.
- Targeted Program Removals: Several dozen evaluation datasets were removed due to shifts in administration priorities (e.g., Sustainable Development Goals, PEPFAR evaluation studies). The statuses of these data assets are unknown.
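As a rough check on how the cases listed above add up, the following sketch sums the three approximate counts stated in the list. It is an illustration, not part of the report's audit code. The dictionary keys are labels chosen here, and the "several dozen" targeted program removals are left out of the sum because the list gives no exact figure for them.

```python
# Approximate counts as stated in the list above.
documented_removals = {
    "OMB apportionment files": 1_700,         # "approximately 1,700"
    "USAID Development Data Library": 2_000,  # "more than 2,000", so a lower bound
    "HHS datasets": 100,                      # "approximately 100"
}

subtotal = sum(documented_removals.values())
print(subtotal)  # 3800

# The targeted program removals ("several dozen") would be added on top of this
# subtotal, which already falls within the reported 3,000 to 4,000 range.
print(3_000 <= subtotal <= 4_000)  # True
```

Because the USAID figure is a lower bound and the targeted removals are uncounted, the subtotal should be read as an approximation rather than an exact tally.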
Distinction Between Data Tools and Source Data
The report identifies a significant distinction between the removal of public-facing data tools and the removal of the underlying data assets. Throughout 2025, several high-profile data tools and dashboards were eliminated from agency websites. These removals significantly reduced the accessibility and utility of the information for the general public and data users. However, forensic auditing confirms that much of the source data provided by these tools remained accessible to the public through the agency servers or through federal contractors that hold the data on behalf of the originating agencies. The loss of these tools represents a degradation of federal data utility, interpretability, and accessibility rather than data loss. Because of the value these data tools bring to the public good, many of them have been functionally restored by civic society organizations.
Risks to Public Access and Data Integrity
While the direct removal of federal open government data from public access was rare between 2025 and 2026, the findings underscore that the primary risks involved a combination of political intervention and diminished administrative capacity. These risks include the removal of public access to existing assets and tools, errors affecting metadata, resource constraints, and the discontinuation, change, or expiration of information collections through the Paperwork Reduction Act (PRA) process. While certain political and administrative actions that threaten the integrity of federal open government data have been challenged in court (typically as violations of the Administrative Procedure Act (APA)), many of the disruptions are likely fully legal and represent an immutable exercise of Executive authority. For example, changes to information collections made through the PRA administrative process were likely entirely proper, albeit unusual in scale: the Trump Administration discontinued at least 562 information collections between January 2025 and January 2026, a 65 percent increase over the preceding year. Moreover, there is no transparent process for adequate public notice of changes to data collections that are exempt from the PRA or the APA (such as many non-human subjects research datasets, clinical research from the National Institutes of Health, and practically all other datasets from the Department of Health and Human Services associated with rulemaking or administrative procedures, given the Richardson Waiver rescission). As a result, oversight of future actions involving those federal data is potentially limited.
Need for Improved, Holistic Auditing
There are significant blind spots in the federal data ecosystem that obscure monitoring of open government data assets. While the FDC provided by Data.gov indexes nearly 500,000 data assets, there are challenges in using that resource for timely auditing. Moreover, despite the large number of indexed data assets, the FDC is not exhaustive of all open government data assets (for instance, the OMB apportionments data and many NIH datasets are not listed in the FDC). Accompanying this report is a replicable methodology for auditing open government data assets and metadata from both agency and FDC sources, including code and associated data. While not an enterprise solution scalable to the entire federal data ecosystem, the auditing workflow provides a framework that could be generalized for such cases.
Recommendations
The report findings highlight that while direct data loss was relatively rare, the integrity of federal information was frequently compromised through administrative discontinuations, public access removals, and severe resourcing constraints. Moreover, forensic auditing revealed significant gaps in the ability to holistically track federal data assets, owing both to limitations in the FDC and to a highly federated ecosystem. To safeguard the role of open government data as a vital public good, proactive measures are necessary across the legislative, executive, and civic sectors. The following recommendations provide a framework based on the findings of this report for Congress to strengthen statutory oversight, for federal agencies to improve transparency and metadata practices, and for outside stakeholders to adopt more rigorous monitoring and archival strategies.
Recommendations for Congress
- Reform Repository Requirements: Amend the Foundations for Evidence-Based Policymaking Act (Evidence Act) to require that the Federal Data Catalog (FDC) or a successor system support the actual acquisition and storage of high-value datasets rather than merely indexing metadata links, which are currently susceptible to link rot.
- Address Oversight Blind Spots: Close the Paperwork Reduction Act (PRA) exemption for the National Institutes of Health (NIH) and similar research entities to ensure all federal data collections adhere to government-wide standards, such as those regarding race and ethnicity data.
- Stabilize Statistical Agency Funding: Provide multi-year, protected funding for the 13 principal statistical agencies (such as BLS, NCES, and EIA) to prevent irrecoverable gaps in historic data series caused by government shutdowns and mass staff departures. These are the data that make the economy run.
- Mandate Transparency for Non-PRA Data: Establish a statutory requirement for a public docket to track changes, removals, or discontinuations of datasets not currently subject to the PRA or APA, such as climate observation and environmental monitoring data.
Recommendations for Federal Agencies
- Prioritize Metadata Fidelity: Ensure that Comprehensive Data Inventories (CDIs) follow the DCAT-3.0 metadata schema and include individual downloadURL properties for all distributions rather than just landing pages. This practice reduces user friction and preserves access when website structures change. Mint DOIs for all datasets and ensure that their resolvable, persistent URLs link directly to the asset.
- Adhere to Notification Standards: Follow the guidance in OMB Circular A-130 to provide “adequate notice” before terminating any significant information product, even if the action is not strictly required by the Administrative Procedure Act or the Paperwork Reduction Act.
- Protect Confidential Data Access: Prioritize the retention of specialized staff who manage CIPSEA-protected data and secure enclaves (FSRDCs) to ensure that the paralysis of the restricted data ecosystem seen in 2025, which cut off critical research, is not repeated.
- Standardize Data Asset Classification: Implement the guidance in M-25-05 to correctly distinguish between actual data assets and information products like infographics or reports, thereby improving the accuracy of the Federal Data Catalog.
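The metadata fidelity recommendation above can be checked mechanically. The sketch below is a minimal illustration, not the report's audit code. It assumes an inventory laid out like a data.json file, with a top-level `dataset` list whose entries carry `identifier` and `distribution` fields, and with `downloadURL` and `accessURL` on each distribution. A DCAT-3.0 serialization may name or nest these fields differently. The function name, identifiers, and example.gov URLs are hypothetical.

```python
def distributions_missing_download_url(catalog: dict) -> list[tuple[str, int]]:
    """Return (dataset identifier, distribution index) pairs lacking a downloadURL."""
    missing = []
    for dataset in catalog.get("dataset", []):
        identifier = dataset.get("identifier", "<no identifier>")
        for index, distribution in enumerate(dataset.get("distribution", [])):
            # A distribution with only an accessURL (a landing page) is flagged.
            if not distribution.get("downloadURL"):
                missing.append((identifier, index))
    return missing


# Hypothetical inventory fragment, for illustration only.
sample_catalog = {
    "dataset": [
        {
            "identifier": "example-001",
            "distribution": [
                {"downloadURL": "https://example.gov/data/example-001.csv",
                 "mediaType": "text/csv"},
            ],
        },
        {
            "identifier": "example-002",
            "distribution": [
                {"accessURL": "https://example.gov/example-002/landing-page"},
            ],
        },
    ]
}

print(distributions_missing_download_url(sample_catalog))  # [('example-002', 0)]
```

The check only confirms that a direct download link is declared; verifying that the link still resolves would require a separate request-based step.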
Recommendations for Other Stakeholders
- Utilize Replicable Auditing: Employ and fund independent, code-based auditing workflows. Support and leverage tools like the Internet Archive’s Wayback Machine to verify agency data holdings rather than relying on the unreliable contemporary topline dataset counts provided by Data.gov.
- Engage in “Market Test” Advocacy: Continue using targeted litigation as a “market test” for data value; court rulings have proven to be an effective mechanism for restoring high-value data assets removed from public access.
- Coordinate Data Rescue Efforts: While redundancy and overlap are desirable, the ecosystem can suffer from competition and a lack of coordination between data rescue and preservation efforts. Funders might consider conditioning projects on coordination and collaboration to ensure that redundant efforts work towards synergies and learn from one another.
- Formalize Public Comment: Use monitoring platforms (like dataindex.us) to keep track of changes to the information collections that create federal data assets, and to organize and submit public comments on proposed revisions.
Conclusion
Estimates of the scale of federal data loss and manipulation after the 2025 US Presidential Inauguration relied largely on observing changes to the topline dataset count within the FDC on Data.gov or on changes to metadata associated with those data. By mid-2025, observers had settled on a figure of around 3,000 to 4,000 datasets removed from public access. However, the topline count on the FDC is not an appropriate barometer of disruptions to the integrity of federal data assets. Still, the estimate is approximately accurate when accounting for data losses resulting from political interference and infrastructural losses. Some of the data assets removed from public access were never indexed by the FDC, and most have since been restored by court order, though the status of other assets is unknown. Moreover, discontinuations of information collections and revisions to existing collections to conform to Administration priorities and directives were carried out at unprecedented scale through the PRA processes. Data tools, which make underlying data assets accessible and usable for data users, were significantly affected by disruptions during the last year. Because of the value of these resources to the public, many data tools have been the focus of restoration efforts by civic society and other special interest groups.
While the risks that political interference poses to the integrity of both data assets and data tools are substantial and real, these activities are likely consistent with Executive authority; most lawsuits in this space focus on the administrative process involved in removal, manipulation, cessation, or deresourcing, rather than on the authorities for such actions themselves. Some legal challenges have been successful on their merits, while others have failed; no single case has wholly resolved the risks to public access to open government data. What both the lawsuits and the data tool restoration efforts reveal, however, is the magnitude of the community value of those public goods. Congress, the federal agencies, and civic society groups should work to improve the regulatory, administrative, and infrastructural durability of sustainable public access to federal data.