Building the CID

The fundamental insight of the CID is that no single data source on its own can provide a full or accurate measure of well-being. But when multiple data sources are linked together, the strengths of each data source can be harnessed while overcoming its limitations. Combining our unprecedented set of data sources in a way that maximizes the accuracy of our income measures requires rigorous research and the application of cutting-edge statistical techniques.

Three main types of data sources

The CID relies on three main types of data sources: household surveys, tax records, and federal and state administrative program data on government benefits. Each data source has unique strengths. Surveys provide rich demographic information that allows for analysis by race, educational attainment and other characteristics. Tax data contain highly accurate information on certain income sources such as earnings and have near-universal coverage, including many non-filers whose tax forms are supplied by employers and government agencies. Administrative data from government programs provide income information that is not captured well or at all by surveys or tax data.

Combining data sources to create a highly accurate income measure

We link all data sources at the individual level using anonymized identification codes created by the Census Bureau to ensure the confidentiality of personal data. We conduct rigorous research and apply cutting-edge statistical techniques to impute missing data, as well as to inform broader conceptual decisions about how to optimally combine data sources. For example, we are pioneering a new methodology that uses a novel set of dozens of material hardship measures (such as housing quality problems, food insecurity and mortality patterns) to validate decisions on how to construct a comprehensive measure of income. This evidence-based approach for constructing income measures represents a major step forward for the income measurement field, and it will ensure that our comprehensive income and poverty measures are as accurate and useful as possible.
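As a rough illustration of what linking at the individual level means, the sketch below joins records from the three source types on a shared anonymized identifier. It is a minimal sketch under stated assumptions, not the CID methodology: the identifier name `pid`, every field name, every value, and the rule of preferring tax-record earnings over survey-reported earnings are all hypothetical and chosen only to make the example concrete.

```python
# Hypothetical records keyed by an anonymized person identifier ("pid").
# All field names and values are illustrative, not actual CID variables.
survey = {
    "A1": {"education": "college", "reported_earnings": 41000},
    "B2": {"education": "high school", "reported_earnings": 0},
}
tax = {
    "A1": {"w2_earnings": 43500},
}
program = {
    "B2": {"ssi_benefits": 9800},
}


def link_person(pid):
    """Combine one person's records from the three source types."""
    record = {"pid": pid}
    record.update(survey.get(pid, {}))
    record.update(tax.get(pid, {}))
    record.update(program.get(pid, {}))
    # Assumed rule (for illustration only): use tax-record earnings when
    # present, otherwise fall back to the survey-reported amount.
    earnings = record.get("w2_earnings", record.get("reported_earnings", 0))
    record["combined_income"] = earnings + record.get("ssi_benefits", 0)
    return record


# Link every survey respondent to whatever tax and program records exist.
linked = {pid: link_person(pid) for pid in survey}
```

In this toy example the survey supplies the demographic detail, the tax record supplies earnings for one person, and the program record supplies a benefit amount that neither of the other sources contains, mirroring the complementary strengths described above.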

Our progress

The CID Project has so far linked four household surveys with an extensive set of tax records and twelve sources of federal and state administrative program data. To our knowledge, this is the most comprehensive set of linked income-related data ever created for the United States. We remain committed to pushing the frontier as far as possible by incorporating ever richer data and developing new income concepts and statistical techniques to maximize the accuracy of CID-based income measures.

The table below lists all data sources linked or planned to be linked to the CID.

Data Sources Linked or Planned to be Linked to the Comprehensive Income Dataset

Surveys
  Current Population Survey – Annual Social and Economic Supplement
  Survey of Income and Program Participation
  American Community Survey
  Decennial Census
  Consumer Expenditure Survey*
  Current Population Survey – Basic Monthly*

Tax data
  Basic data
    1040 forms
    W-2 forms
    1099-R forms
    Tax Credits (e.g., EITC, CTC)
    Unemployment Insurance*
  Extensive data
    1098 Forms: 1098, 1098-E, 1098-T
    1099 Forms: 1099-C, 1099-MISC, 1099-G, 1099-DIV, 1099-INT, 1099-LTC, 1099-OID, 1099-PATR, 1099-Q, 1099-S, SSA-1099
    1040 Forms: 1040-A, 1040-C, 1040-D, 1040-E, 1040-F, 1040-SE
    Other Forms: 5498, K-1

Administrative program data
  Federal sources (agency)
    Old Age and Survivors’ Insurance (SSA)
    Social Security Disability Insurance (SSA)
    Supplemental Security Income (SSA)
    Public housing (HUD)
    Housing Choice Vouchers (HUD)
    Section 8 Project-based housing (HUD)
    Medicare enrollment (HHS)
    Medicaid enrollment (HHS)
    Temporary Assistance for Needy Families (HHS)
    Veterans Benefits (VA)
  State sources
    Public Assistance
    Supplemental Nutrition Assistance Program
    Special Supplemental Nutrition Program for Women, Infants and Children*
    Low-income Home Energy Assistance Program*
    Workers’ compensation*
    Child support payments*

Other data sources
  Social Security Numident file
  Homeless Management Information System*

Notes: *Denotes data source not yet linked into the Comprehensive Income Dataset. Extensive tax data are available only for select years. EITC = Earned Income Tax Credit. CTC = Child Tax Credit. SSA = Social Security Administration. HUD = Department of Housing and Urban Development. HHS = Department of Health and Human Services. VA = Department of Veterans Affairs.
