US Health Disparities Browser

Created with data from the All of Us Researcher Workbench

Search or browse diseases of interest to explore differences in prevalence among All of Us participant self-identified racial and ethnic groups.

FAQ

Who are we?

We are a group of researchers at the National Institute on Minority Health and Health Disparities (NIMHD) and the Georgia Institute of Technology (GIT). We created this browser using data from the All of Us Researcher Workbench. The browser is not affiliated with, nor endorsed by, the All of Us Research Program, NIH, or HHS.

Vincent Lam | NIMHD

Leonardo Mariño-Ramírez | NIMHD

I. King Jordan | GIT



What is "All of Us"?

The NIH All of Us Research Program (All of Us) is a large prospective cohort study of US residents that combines participant genomic, phenotypic, and environmental data with disease outcome data drawn from electronic health records. All of Us has emphasized the recruitment of participants from groups that are underrepresented in biomedical research, including minority racial and ethnic groups, in an effort to ensure that the benefits of precision medicine are shared equitably among all people.

All of Us participant data can be accessed and analyzed on the Researcher Workbench.



Who are the All of Us participants?

All of Us is made up of volunteer participants, all of whom provide informed consent. Eligible participants are adults aged 18 and older who have the legal authority and decisional capacity to consent and who currently reside in the US or a US territory. The exclusion criteria cover minors under the age of 18 and vulnerable populations (prisoners and individuals without the capacity to give consent). Details on participant recruitment, informed consent, and inclusion and exclusion criteria are available online.

Results reported on the browser comply with the All of Us Data and Statistics Dissemination Policy. The browser shows summary statistics only, does not reveal participant-level data in any way, and does not display any participant group count ≤20.
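The small-count rule above can be illustrated with a short Python sketch. This is only an illustration of the stated threshold, not the browser's actual code; the names SUPPRESSION_THRESHOLD and displayable_count are hypothetical.

```python
from typing import Optional

# Threshold taken from the text: group counts of 20 or fewer are not displayed.
SUPPRESSION_THRESHOLD = 20


def displayable_count(count: int) -> Optional[int]:
    """Return the count if it may be shown, or None if it must be suppressed."""
    if count <= SUPPRESSION_THRESHOLD:
        return None
    return count


# A group of exactly 20 participants is suppressed; a group of 21 is shown.
print(displayable_count(20))
print(displayable_count(21))
```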



What are Health Disparities?

The US National Institute on Minority Health and Health Disparities (NIMHD) defines health disparities as “health differences that adversely affect disadvantaged populations”, and the US Centers for Disease Control and Prevention (CDC) defines health disparities as “preventable differences in the burden of disease, injury, violence, or opportunities to achieve optimal health that are experienced by socially disadvantaged populations”.

NIMHD is particularly interested in health disparities that burden racial and ethnic minority groups. The All of Us Health Disparities Browser operationally defines health disparities as differences in disease prevalence among All of Us participant self-identified race and ethnicity groups.



How are All of Us participant race and ethnicity defined?

All of Us participants completed a number of surveys, including “The Basics”. As part of this survey, participants were asked: “Which categories describe you? Select all that apply. Note, you may select more than one group.” The response options were:

• American Indian or Alaska Native

• Asian

• Black, African American, or African

• Hispanic, Latino, or Spanish

• Middle Eastern or North African

• Native Hawaiian or other Pacific Islander

• White

Note that participant data on American Indian or Alaska Native identity are currently unavailable on the All of Us Researcher Workbench.

This question is taken from the original draft version of the race and ethnicity question for the 2020 US Census. In 2017, in preparation for the 2020 Census, the U.S. Census Bureau tested new methods for capturing more accurate data on race and ethnicity than had been captured in previous census years. In 2018, the Census Bureau decided not to use the detailed versions of these questions.

Participant answers to this survey question were subsequently converted into separate race and ethnicity variables to conform to the current Office of Management and Budget (OMB) standards on race and ethnicity.



How are participant diseases characterized?

All of Us participants provide access to their electronic health records (EHR), which contain their medical and treatment histories. Disease diagnoses are represented in participants’ EHR as International Classification of Diseases (ICD) codes (ICD-9-CM and ICD-10-CM). Participant ICD codes were converted into Phecodes, which were then used to generate case-control cohorts for x diseases or health conditions.



How is disease prevalence calculated?

Case numbers represent the number of individuals in a cohort who have a particular disease. Control numbers represent the number of individuals in a cohort who do not have that particular disease. Prevalence percentages were calculated by dividing the total number of cases for a disease by the sum of the total number of cases and controls in the disease cohort. For instance, if Disease X’s disease cohort consisted of 15 cases and 85 controls, its estimated prevalence is 15/(15+85) = 15%. Such a value would be called a “raw” or “unadjusted” prevalence estimate, as it is the unmodified percentage of individuals in a cohort who have a particular disease.
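The raw prevalence calculation described above can be written as a short Python sketch. The function name raw_prevalence is illustrative and not taken from the browser's code; the numbers are the Disease X example from the text.

```python
def raw_prevalence(cases: int, controls: int) -> float:
    """Return the unadjusted prevalence as a fraction of the disease cohort."""
    total = cases + controls
    if total == 0:
        raise ValueError("the disease cohort is empty")
    return cases / total


# Disease X example from the text: 15 cases and 85 controls.
disease_x = raw_prevalence(15, 85)
print(f"{disease_x:.0%}")  # prints "15%"
```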

The prevalence estimates displayed in this browser are adjusted estimates. We applied adjustments because of the composition of the All of Us participant body: participants tend to be older than the general U.S. population, and most participants are female. As such, relying on unadjusted estimates may yield inaccurate prevalence estimates for diseases that vary by age and sex. In our adjustments, we scaled prevalence percentages based on age and sex distributions obtained from U.S. Census data. This was done according to the guidelines detailed here. For every racial category included in this browser, we sorted participants into groups corresponding to different age/sex combinations (e.g. males aged 20-24). A prevalence estimate was calculated for each such group, weighted by the group's corresponding census fraction, and the weighted estimates were summed.
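The adjustment described above (per-group prevalence, weighted by census fraction, then summed) can be sketched in Python as follows. This is a minimal illustration, not the browser's implementation. It assumes the census fractions for the listed age/sex groups sum to 1; the function name, the data layout, and all numbers are made up for the example.

```python
from typing import Dict, Tuple

Group = Tuple[str, str]  # (sex, age band), e.g. ("Male", "20-24")


def adjusted_prevalence(
    counts: Dict[Group, Tuple[int, int]],  # group -> (cases, group size)
    census_fraction: Dict[Group, float],   # group -> share of census population
) -> float:
    """Weight each group's prevalence by its census fraction and sum."""
    total = 0.0
    for group, (cases, size) in counts.items():
        if size == 0:
            raise ValueError(f"no participants in group {group}")
        total += census_fraction[group] * (cases / size)
    return total


# Made-up numbers for one racial category with two age/sex groups.
counts = {("Male", "20-24"): (5, 100), ("Female", "20-24"): (30, 200)}
fractions = {("Male", "20-24"): 0.5, ("Female", "20-24"): 0.5}

adjusted = adjusted_prevalence(counts, fractions)
raw = (5 + 30) / (100 + 200)
print(f"raw {raw:.2%}, adjusted {adjusted:.2%}")
```

In this made-up example the cohort over-represents the female group relative to the census, so the raw estimate sits above the adjusted one.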



What do the H-shaped symbols on the end of each plotted bar represent?

These symbols are referred to as “error bars”. The error bars displayed on each plot the browser generates represent 95% confidence intervals. They indicate the degree of uncertainty inherent in a particular prevalence estimate. Wider bars correspond to greater uncertainty. These intervals were calculated with a series of formulas devised by Dr. John Spouge from the National Library of Medicine.

The first formula, shown below, is used to calculate the standard error of a prevalence estimate for a particular racial group for a particular disease.

Standard error, in this case, is a rough measure of the statistical accuracy of the prevalence estimate. In this formula, f is the census fraction of a particular age-sex group, K is the number of cases in that age-sex group, and n is the number of individuals in that age-sex group. The inner term is computed for each age-sex group within a particular race and summed before the square root is applied, yielding the standard error for that racial group’s prevalence estimate. This value is passed to the formula shown below to yield the lower and upper bounds of a prevalence estimate’s confidence interval:

95% confidence interval = prevalence estimate ± 1.96 × standard error

In this case, if our standard error is 1%, the bounds would lie 1.96 percentage points below and above our prevalence estimate. For an estimate of 10%, for instance, we would have 95% confidence that the true prevalence lies between 10% - 1.96% = 8.04% and 10% + 1.96% = 11.96%.
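The two steps can be sketched in Python. The exact standard error formula is not reproduced in this text, so the sketch assumes the textbook binomial form for a weighted sum, the square root of the sum over age-sex groups of f² × p × (1 − p) / n with p = K / n. This may differ from the formula actually used by the browser. Only the ± 1.96 × standard error step is taken directly from the worked example above.

```python
import math
from typing import Iterable, Tuple


def standard_error(groups: Iterable[Tuple[float, int, int]]) -> float:
    """Standard error of a census-weighted prevalence estimate.

    Each group is (f, K, n): census fraction, cases, individuals.
    ASSUMPTION: binomial variance p * (1 - p) / n within each group.
    """
    total = 0.0
    for f, K, n in groups:
        p = K / n
        total += f ** 2 * p * (1 - p) / n
    return math.sqrt(total)


def confidence_interval(estimate: float, se: float, z: float = 1.96) -> Tuple[float, float]:
    """Lower and upper bounds of the 95% confidence interval."""
    return estimate - z * se, estimate + z * se


# Worked example from the text: estimate 10%, standard error 1%.
low, high = confidence_interval(0.10, 0.01)
print(f"{low:.2%} to {high:.2%}")  # prints "8.04% to 11.96%"

# Made-up age-sex groups, only to show the call.
se = standard_error([(0.5, 5, 100), (0.5, 30, 200)])
```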

You may notice that the prevalence plots for certain diseases have prevalence estimates with no visible error bars, such as in the plot for cancer of the gums:

These estimates have 95% confidence intervals which, if calculated, would have a negative lower bound. Because a true prevalence cannot be negative, such confidence intervals would be inaccurate and are thus not shown.
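The display rule above amounts to a simple check, sketched here in Python. The function name displayable_interval and the example numbers are hypothetical; how the browser implements the check is not stated.

```python
from typing import Optional, Tuple


def displayable_interval(
    estimate: float, se: float, z: float = 1.96
) -> Optional[Tuple[float, float]]:
    """Return the 95% confidence interval, or None if its lower bound is negative."""
    lower = estimate - z * se
    upper = estimate + z * se
    if lower < 0:
        return None  # no error bar is drawn for this estimate
    return lower, upper


# A rare disease with a standard error that is large relative to the estimate.
rare = displayable_interval(0.001, 0.001)
# A more common disease with a well-determined estimate.
common = displayable_interval(0.10, 0.01)
print(rare is None, common is not None)  # prints "True True"
```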



Acknowledgements

The All of Us Research Program is supported by the National Institutes of Health, Office of the Director: Regional Medical Centers: 1 OT2 OD026549; 1 OT2 OD026554; 1 OT2 OD026557; 1 OT2 OD026556; 1 OT2 OD026550; 1 OT2 OD026552; 1 OT2 OD026553; 1 OT2 OD026548; 1 OT2 OD026551; 1 OT2 OD026555; IAA #: AOD 16037; Federally Qualified Health Centers: HHSN 263201600085U; Data and Research Center: 5 U2C OD023196; Biobank: 1 U24 OD023121; The Participant Center: U24 OD023176; Participant Technology Systems Center: 1 U24 OD023163; Communications and Engagement: 3 OT2 OD023205; 3 OT2 OD023206; and Community Partners: 1 OT2 OD025277; 3 OT2 OD025315; 1 OT2 OD025337; 1 OT2 OD025276. In addition, the All of Us Research Program would not be possible without the partnership of its participants.