Open Evidence

Accelerating AI Innovation with Data Governance Principles for Federally Funded Research


These comments were presented during a lightning round at the Data Foundation’s webinar, “The Genesis Mission: Universities’ Opportunity to Shape AI-Driven Scientific Discovery,” on April 28, 2026.

The Genesis Mission calls for an historic shift in how the federal government curates federally funded research data, moving away from siloed data repositories toward an integrated “American Science and Security Platform” designed for the AI era. Research data are multiplex, and differing disciplinary norms make such a unified strategy difficult, so Genesis will need a robust data governance policy and implementation strategy to succeed.

“Data” is mentioned 83 times in the Genesis Mission’s critical science and technology (S&T) challenges report. That’s more mentions than “physics” and “energy” combined, if you exclude “Department of Energy” from the count. Clearly, data is fueling the engine of this mission and its proposed compute platform. But large-scale federal science initiatives require more than throwing computational power at data. They require a foundation of robust data governance, including strong semantic ontologies, controlled vocabularies, and metadata standards that together enable interoperability and AI readiness.

The policy mechanisms guiding the data governance responsibilities of researchers and their institutions are clearly outlined in the DOE’s notice of funding opportunity. Participation requires a firm commitment to both open science and research security. Under the DOE’s Public Access Plan, full-text versions of scientific publications must be publicly accessible at no charge. The DOE also expects grantees to implement a comprehensive Data Management and Sharing Plan. The Mission expects software and AI models developed through these grants to be made available under open-source licenses, complete with proper Software Package Data Exchange (SPDX) identifiers. Teams are expected to associate their data, models, and artifacts with high-quality metadata to maximize eventual integration into the platform, ensuring they are discoverable and reusable by other authorized researchers. Ideally, outputs created by researchers participating in the Mission should automatically comply with the federal government’s open-by-default mandate from the Evidence Act.
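As a rough illustration of what pairing an artifact with metadata and an SPDX license identifier might look like in practice, the sketch below checks a record before deposit. The field names, the `check_metadata` helper, and the example record are hypothetical and are not drawn from any DOE or Genesis Mission specification; only the license strings are real SPDX identifiers, and the set shown is a small subset.

```python
# Hypothetical required fields, chosen for illustration only.
REQUIRED_FIELDS = ("title", "creators", "identifier", "license", "description", "keywords")

# A small subset of real SPDX license identifiers; the full list is much longer.
KNOWN_SPDX_IDS = {"Apache-2.0", "MIT", "BSD-3-Clause", "CC-BY-4.0", "CC0-1.0"}


def check_metadata(record):
    """Return a list of human-readable problems found in a metadata record."""
    problems = []
    # Flag any required field that is absent or empty.
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing field: {field}")
    # Flag a license value that is not in the known SPDX subset.
    license_id = record.get("license")
    if license_id and license_id not in KNOWN_SPDX_IDS:
        problems.append(f"unrecognized SPDX license identifier: {license_id}")
    return problems


# Hypothetical record for a dataset; every value here is a placeholder.
example_record = {
    "title": "Example simulation outputs",
    "creators": ["Example Researcher"],
    "identifier": "doi:10.0000/example",
    "license": "CC-BY-4.0",
    "description": "Placeholder description of the dataset.",
    "keywords": ["example", "simulation"],
}

problems = check_metadata(example_record)
```

A check of this kind could run in an institutional repository's deposit workflow, so that gaps are caught by support staff and tooling and do not fall to individual investigators.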

For university research provosts and administrators, planning for these requirements involves several concrete steps. First, institutions must invest in data infrastructure and personnel. The responsibility for data governance compliance cannot rest solely on individual principal investigators; universities need institutional data repositories, data librarians, and support frameworks to prepare multi-modal datasets for AI ingestion. Second, data sharing needs a structural adjustment in how it is rewarded: curating high-quality, open-access datasets should be recognized by institutional guidelines as a valuable academic contribution alongside peer-reviewed papers. Finally, universities should proactively establish data governance standards that align with federal expectations. Because the Genesis Mission relies on platform integration, researcher output pipelines must be ready to interface smoothly with DOE National Laboratories and industry partners.

The Genesis Mission presents an opportunity to advance how we approach complex scientific challenges. However, the pace of that research will be directly tied to how well we manage the underlying data. By adhering to strong data governance, open science, and the FAIR (findable, accessible, interoperable, reusable) principles, we can ensure that the research ecosystem remains transparent, reproducible, and equipped to support these new initiatives.



Recommended Citation:
Christopher Steven Marcum (April 28, 2026). "Accelerating AI Innovation with Data Governance Principles for Federally Funded Research." Open Evidence. https://doi.org/10.59350/kfq7r-38070