The glossary provides a select list of terms and definitions used in this project. When feasible, term definitions were drawn from U.S. Federal Government sources, as federal definitions can sometimes deviate from consensus-based definitions in standard sources. Additional general resources for terms related to this project include the Open Data Handbook Glossary and the Turing Way Glossary. The code for this glossary was adapted from jekyll-glossary by the author.

A

Access Control

In physical security and information security, access control (AC) is the action of deciding whether a subject should be granted or denied access to an object (for example, a place or a resource). Source: Wikipedia

Administrative Procedure Act

A federal act that governs the procedures of administrative law, including notice and comment related to formal and informal rulemaking. Source: Cornell Legal Information Institute

Anonymization

The process of permanently and irreversibly transforming data so that it cannot be linked to any specific individual. Source: NIST.gov

Application Programming Interface (API)

A predefined protocol for reading and/or writing data using a filesystem, a database, or across a network. Source: Data.gov


B

Bulk Data

Data that is available for download in its entirety, allowing users to efficiently retrieve the complete dataset. Source: Open Data Handbook

Bureau of Economic Analysis (BEA)

An agency that produces economic accounts statistics, including the nation’s Gross Domestic Product (GDP). Source: BEA.gov

Bureau of Justice Statistics (BJS)

The principal statistical agency of the Department of Justice, providing information on crime and the justice system. Source: BJS.gov

Bureau of Labor Statistics (BLS)

The principal federal agency responsible for measuring labor market activity, working conditions, and price changes. Source: BLS.gov

Bureau of Transportation Statistics (BTS)

The principal statistical agency of the Department of Transportation, providing transportation statistics for the nation and its various regions and sectors. Source: BTS.gov


C

CDO Council

A cross-agency council that coordinates data management policy and establishes government-wide best practices for data use. Source: CDO.gov

CKAN

An open-source data portal platform used to store, manage, and distribute data assets, powering Data.gov. Source: CKAN.org

Census Bureau

The principal statistical agency responsible for the decennial census and producing data about the American people and economy. Source: Census.gov

Chief Data Officer (CDO)

An agency official responsible for data management, governance, and implementing the Evidence Act at a federal agency. Source: 44 U.S.C. § 3520

Comma-Separated Values (CSV)

A common file format for storing tabular data in plain text, where each row is a record and columns are separated by commas. Source: Open Data Handbook
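
As a minimal sketch of the row-and-column structure described above, the following parses a small CSV table with Python's standard `csv` module. The column names and values are invented for illustration and do not come from any real dataset.

```python
import csv
import io

# Hypothetical sample: a header row followed by two records.
SAMPLE = "agency,dataset_count\nBLS,120\nBEA,85\n"

def read_records(text):
    """Parse CSV text into a list of dicts, one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))

records = read_records(SAMPLE)
for record in records:
    # Every value is read as a string; convert types explicitly if needed.
    print(record["agency"], int(record["dataset_count"]))
```

Note that CSV carries no type information, which is one reason a data dictionary is often published alongside it.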

Comprehensive Data Inventory (CDI)

An inventory of all data assets held by a federal agency, typically stored in JSON format. An agency's comprehensive data inventory is usually the source that Data.gov harvesters use to populate the Federal Data Catalog. Source: M-25-05 and 44 U.S.C. § 3502

Confidential Information Protection and Statistical Efficiency Act (CIPSEA)

Title III of the Evidence Act providing legal protections for confidential information collected by federal agencies for statistical purposes. Source: Congress.gov

Controlled Unclassified Information (CUI)

Information that is not classified but requires safeguarding or dissemination controls pursuant to and consistent with law, regulations, and government-wide policies. Source: Archives.gov


D

DCAT-US

The metadata schema used for the Federal Data Catalog, based on the W3C Data Catalog Vocabulary (DCAT). Source: Resources.data.gov

Data Asset

A collection of data elements or data sets that may be grouped together. Source: 44 U.S.C. § 3502

Data Catalog Vocabulary (DCAT)

A vocabulary designed to facilitate interoperability between data catalogs published on the Web. Source: Data.gov

Data Deletion

The process whereby data is removed from active files and storage structures and rendered inaccessible except through specialized data recovery tools. Source: Society of American Archivists Glossary

Data Dictionary

A data dictionary is a document that outlines the structure, content, and variable definitions for a dataset or collection of data. A data dictionary is a critical tool for reproducibility because it allows others to understand your data. Source: Harvard

Data Governance

Data governance is the set of principles, policies, and processes that guide the effective and responsible use of data within an organization. Source: Wikipedia

Data Integrity

The maintenance of, and the assurance of, data accuracy and consistency over its entire life-cycle. Source: Wikipedia
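
One common way to detect unintended changes to a file over time is to record a cryptographic checksum and compare it later. The sketch below is illustrative only: the function names are hypothetical, and a checksum covers just one narrow aspect of integrity (detecting that bytes have changed), not accuracy of the content itself.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()

def is_unchanged(data: bytes, expected_hex: str) -> bool:
    """Check the data against a previously recorded digest."""
    return sha256_hex(data) == expected_hex

# Hypothetical file contents and the digest recorded at download time.
original = b"year,value\n2020,1\n"
recorded = sha256_hex(original)

print(is_unchanged(original, recorded))
print(is_unchanged(original + b"2021,2\n", recorded))
```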

Data Inventory

A list of data assets and their metadata maintained by an agency to track the information it collects and produces. Source: 44 U.S.C. § 3511

Data Management and Sharing Plan (DMSP)

A document describing how data will be managed, stored, protected, and shared throughout the lifecycle of a research project. Source: NIH

Data Quality

The fitness for use of data, often measured by its accuracy, completeness, consistency, timeliness, and validity. Source: FCSM

Data Standard

A technical specification that describes how data should be stored or exchanged for consistent collection and interoperability. Source: Data.gov

Data.gov

The federal government’s data catalog, which indexes public data assets from across agencies. Source: Data.gov

De-identification

The process of removing or masking identifying information from a dataset so that the individuals cannot be readily identified. Source: NIST.gov
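
A rough sketch of the "removing" half of this definition, assuming records are simple dictionaries. The field names and the set of identifiers are hypothetical. Dropping direct identifiers alone does not guarantee that individuals cannot be re-identified from the remaining fields.

```python
# Hypothetical list of direct identifiers; a real project would define its own.
DIRECT_IDENTIFIERS = frozenset({"name", "ssn", "email"})

def drop_identifiers(record, identifiers=DIRECT_IDENTIFIERS):
    """Return a copy of the record without the listed identifier fields."""
    return {key: value for key, value in record.items() if key not in identifiers}

row = {"name": "Jane Doe", "ssn": "000-00-0000", "state": "MI", "age_group": "30-39"}
cleaned = drop_identifiers(row)
print(cleaned)
```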

Differential Privacy

A mathematical framework for sharing information about a dataset while providing strong, quantifiable privacy guarantees for individuals. Source: Census.gov
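
To make the idea of a quantifiable guarantee concrete, here is a minimal sketch of the textbook Laplace mechanism applied to a counting query. It is a generic illustration, not the method used by any particular agency. The function names are hypothetical, and `epsilon` is the privacy-loss parameter (smaller values mean more noise and stronger privacy).

```python
import random

def laplace_noise(scale, rng=random):
    """Draw Laplace(0, scale) noise.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(true_count, epsilon, rng=random):
    """Release a count with noise calibrated to epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # Adding or removing one person changes a count by at most 1.
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)

print(noisy_count(100, epsilon=0.5))
```

Each call returns a different value; repeated releases of the same statistic consume additional privacy budget.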

Discontinuation

The termination of an Information Collection Request (ICR), ending the legal authority for an agency to gather specific data from the public under the Paperwork Reduction Act. Source: Digital.gov


E

Economic Research Service (ERS)

One of two principal statistical agencies of the Department of Agriculture that provides economic and social statistical data and analysis for agriculture, food, and the environment. Source: USDA.gov

Energy Information Administration (EIA)

The principal statistical agency of the Department of Energy responsible for collecting and analyzing independent energy information to promote sound policymaking. Source: EIA.gov

Evaluation Officer

An official designated to coordinate evidence-building activities and provide leadership over an agency’s evaluation functions. Source: 5 U.S.C. § 313

Evidence

Information produced as a result of statistical activities conducted for a statistical purpose, used to inform policymaking. Source: DOL.gov


F

FAIR Principles

Findable, Accessible, Interoperable, and Reusable (FAIR) principles aim to improve data use and reuse. Source: Go-FAIR.org

Federal Committee on Statistical Methodology (FCSM)

An interagency committee dedicated to improving the quality of Federal statistics. Source: StatsPolicy.gov

Federal Data Catalog

The government-wide catalog of federal data assets, hosted at Data.gov and populated from agency comprehensive data inventories; its metadata schema is DCAT-US. Source: M-25-05 and 44 U.S.C. § 3502

Federal Data Strategy

A framework for a consistent approach to federal data stewardship, use, and dissemination across the Executive Branch. Source: CIO.gov

Federal Information Security Modernization Act (FISMA)

A law that requires federal agencies to implement information security programs to protect their data and systems. Source: CISA.gov

Federal Statistical Research Data Center Program (FSRDC)

A network of 34 data centers across the United States working as a partnership between federal statistical agencies and research institutions that provides secure environments for authorized researchers to access confidential statistical data. Source: U.S. Census Bureau

Federal Statistical System (FSS)

The decentralized network of federal agencies that produce official statistics to inform the public and policy makers. Source: StatsPolicy.gov

Foundations for Evidence-Based Policymaking Act of 2018 (Evidence Act)

A 2018 law (Pub. L. 115–435) that requires federal agencies to modernize data management, increase data availability, and develop evidence to support policymaking. Source: Congress.gov

Freedom of Information Act (FOIA)

A law that provides the public the right to request access to records from any federal agency, subject to certain exemptions. Source: FOIA.gov


H

Harmonized Tariff Schedule of the United States (HTUS)

A federal data product by the USITC that provides the applicable tariff rates and statistical categories for all merchandise imported into the United States; it is based on the international Harmonized System, the global system of nomenclature that is used to describe most world trade in goods. Source: usitc.gov


I

Inter-university Consortium for Political and Social Research (ICPSR)

An American political science and social science research consortium based at the University of Michigan. ICPSR maintains and provides access to a vast archive of social science data for research and instruction (over 16,000 discrete studies/surveys with more than 70,000 datasets). Source: Wikipedia

Interagency Council on Statistical Policy (ICSP)

A council chaired by the U.S. Chief Statistician that coordinates the federal statistical system and sets government-wide best practices for data. Source: StatsPolicy.gov


J

JavaScript Object Notation (JSON)

A lightweight, text-based, language-independent data interchange format that is easy for humans to read and machines to parse. Source: Wikipedia
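
As a brief illustration of the format, the sketch below round-trips a small record through Python's standard `json` module. The field names are made up for the example and are not taken from any official schema.

```python
import json

# Hypothetical record; keys are illustrative only.
record = {"title": "Example dataset", "keyword": ["labor", "prices"], "public": True}

# Serialize to text, then parse the text back into Python objects.
text = json.dumps(record, indent=2, sort_keys=True)
restored = json.loads(text)

print(text)
```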


L

Learning Agenda

A multi-year plan (Agency Evidence-Building Plan) that identifies priority policy questions and the data/methods needed to answer them. Source: Evaluation.gov

Link Rot

The tendency of hyperlinks, over time, to stop pointing to their originally targeted file, web page, or server because the resource has been relocated or has become permanently unavailable. Source: Wikipedia


M

Machine-Readable Format

A format that can be easily processed by a computer without human intervention while ensuring no semantic meaning is lost. Source: 44 U.S.C. § 3502

Metadata

Data that defines and describes the characteristics of other data. Source: Wikipedia


N

National Agricultural Statistics Service (NASS)

One of two principal statistical agencies of the Department of Agriculture responsible for providing timely, accurate, and useful statistics in service to U.S. agriculture. Source: USDA.gov

National Center for Education Statistics (NCES)

The principal statistical agency of the Department of Education, housed within the Institute of Education Sciences, collecting and analyzing statistical data related to education in the United States. Source: NCES.ed.gov

National Center for Health Statistics (NCHS)

The nation’s principal health statistics agency, providing data to guide actions and policies to improve American health. Source: CDC.gov

National Center for Science and Engineering Statistics (NCSES)

The principal statistical agency of the National Science Foundation, providing statistics on the U.S. science and engineering enterprise and on research and development. Source: NSF.gov


O

OMB Circular A-130

OMB policy titled ‘Managing Information as a Strategic Resource’ which establishes general policy for information governance, data sharing, and privacy. Source: CIO.gov

OPEN Government Data Act (OGDA)

Title II of the Evidence Act which requires federal agencies to publish information online as open data using standardized, machine-readable formats. Source: Congress.gov

Office of Information and Regulatory Affairs (OIRA)

A statutory component of the Office of Management and Budget (OMB) that reviews federal regulations and oversees the implementation of the Paperwork Reduction Act. Source: National Archives

Office of Research, Evaluation, and Statistics (ORES)

The principal statistical agency of the Social Security Administration responsible for statistical data on social security programs and the beneficiaries they serve. Source: SSA.gov

Open Data Plan

A mandatory annual plan describing an agency’s progress in making its public data assets available as open data. Source: GSA.gov and M-25-05

Open Format

A file format for storing digital data, defined by an openly published specification usually maintained by a standards organization, and which can be used and implemented by anyone. Source: Wikipedia

Open Government Data Asset

A public data asset that is machine-readable, available in an open format, based on an open standard, and not encumbered by restrictions that impede use. Source: 44 U.S.C. § 3502

Open License

A legal guarantee that a data asset is made available at no cost and with no restrictions on copying, publishing, distributing, transmitting, citing, or adapting such asset. Source: 44 U.S.C. § 3502

Open Source Software

Open-source software (OSS) is computer software that is released under a license in which the copyright holder grants users the rights to use, study, change, and distribute the software and its source code to anyone and for any purpose. Source: Wikipedia


P

Paperwork Reduction Act (PRA)

A law governing how federal agencies collect information from the public. Source: Digital.gov

Personally Identifiable Information (PII)

Information that can be used to distinguish or trace an individual’s identity, either alone or when combined with other information. Source: NIH Privacy Glossary

Pregnancy Risk Assessment Monitoring System (PRAMS)

A site-specific population-based surveillance system designed to identify groups of women and infants at high risk for health problems, to monitor changes in health status, and to measure progress towards goals in improving the health of mothers and infants. Source: cdc.gov

Privacy Act of 1974 (Privacy Act)

A law that establishes a code of fair information practices governing the collection, use, and dissemination of information about individuals. Source: Justice Department Office of Privacy and Civil Liberties

Privacy Impact Assessment (PIA)

An analysis of how information is handled to ensure compliance with privacy requirements and evaluate risks to PII. Source: OMB Circular A-130

Public Access Removal

A disruption in federal data availability where proactive disclosure or dissemination is halted, often resulting in datasets or tools being withdrawn from public-facing portals like Data.gov. Source: Congressional Research Service (CRS) Report R48889

Public Data Asset

A data asset maintained by the Federal Government that has been, or may be, released to the public. Source: 44 U.S.C. § 3502 and M-25-05


R

Reproducible Research

Reproducible research is work that can be independently recreated from the same data and the same code that the original team used. Source: The Turing Way


S

Schema

A data model or database structure that defines the relationships between different pieces of information. Source: Resources.data.gov

Standard Application Process (SAP)

The centralized portal and process required by the Evidence Act for researchers to apply for access to confidential statistical data from federal statistical agencies, often through an FSRDC. Source: ResearchDataGov

Statistical Official

A designated agency official with expertise in statistics who advises on statistical policy, techniques, and procedures. Source: 5 U.S.C. § 314

Statistical Purpose

The description, estimation, or analysis of the characteristics of groups without identifying the individuals or organizations in those groups. Source: 44 U.S.C. § 3561

Statistics of Income Division (SOI)

The principal statistical agency of the Internal Revenue Service that compiles and publishes statistical data on the operation of the U.S. tax system. Source: IRS.gov

Synthetic Data

Information that is generated by a computer model that mimics the statistical properties of a real-world dataset but contains no real records. Source: Census.gov

System of Records Notice (SORN)

A public notice required by the Privacy Act that informs the public of the existence and character of a system of records. Source: GSA


U

United States Agency for International Development (USAID)

The USAID is a de jure agency of the executive branch of the United States federal government that, until its effective shuttering by the Trump Administration in 2025, served as the world’s largest funder of direct foreign assistance. Source: Wikipedia

United States Chief Statistician

A position within OMB responsible for coordinating the federal statistical system and official U.S. international statistical activities, and for setting government-wide statistical standards. Source: StatsPolicy.gov

United States International Trade Commission (USITC)

The USITC is an independent, nonpartisan, quasi-judicial federal agency that fulfills a range of trade-related mandates. The agency provides high-quality, leading-edge analysis of international trade issues to the President and the Congress. The USITC produces the HTUS dataset. Source: usitc.gov


V

Vocabulary

A set of standardized terms with consistent semantic definitions, typically constrained to a particular namespace or domain. Source: Resources.data.gov


W

Wayback Machine (WBM)

A digital archive of the World Wide Web provided by the Internet Archive that captures and preserves snapshots of websites to prevent the loss of information when pages are changed or removed. Source: Internet Archive
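
The sketch below shows one way a script might look up an archived snapshot. It assumes the Internet Archive's public availability endpoint at `https://archive.org/wayback/available` and a JSON response containing `archived_snapshots.closest`; both should be checked against the Internet Archive's own documentation. The helper names and the sample response are hypothetical, and no network request is made here.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint; verify against the Internet Archive documentation.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"

def availability_url(page_url, timestamp=None):
    """Build a lookup URL for a page, optionally near a YYYYMMDD timestamp."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)

def closest_snapshot(response_text):
    """Return the closest snapshot URL from a response body, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None

# Illustrative response body, not real output from the service.
SAMPLE_RESPONSE = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20200101000000/http://example.com/",
            "timestamp": "20200101000000",
            "status": "200",
        }
    }
})

print(availability_url("example.com", "20200101"))
print(closest_snapshot(SAMPLE_RESPONSE))
```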


X

XML (eXtensible Markup Language)

A markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. Source: W3C
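
For comparison with CSV and JSON, here is a minimal sketch that reads a small XML document with Python's standard `xml.etree.ElementTree` module. The element and attribute names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one catalog containing one dataset.
SAMPLE = "<catalog><dataset id='d1'><title>Example</title></dataset></catalog>"

root = ET.fromstring(SAMPLE)

# Map each dataset's id attribute to the text of its <title> child.
titles = {ds.get("id"): ds.findtext("title") for ds in root.iter("dataset")}
print(titles)
```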

