Editor’s note: Michael Sanky is global industry lead of healthcare and life sciences at Databricks, an enterprise software company.
As pervasive health disparities in the U.S. continue to widen, data and artificial intelligence offer the potential to help close that gap. New technologies can analyze large, diverse data sets, informing the work of researchers, decision makers and policymakers across healthcare. If done correctly, AI can ultimately improve care delivery, advance proactive healthcare planning and predictive treatments, reduce clinician burnout and drive better patient outcomes.
To this end, we’ve already seen groundbreaking advancements that move the needle in healthcare. For example, as of November 2022, the Food and Drug Administration had authorized more than 520 AI-enabled medical devices.
The challenges with using AI in healthcare start with data sets that are not always representative of the people and communities being served. If AI systems are trained on data sets that under-represent certain populations, the resulting algorithms and recommendations will encode bias that is likely to exacerbate health disparities.
It’s up to us as healthcare and technology leaders to build equitable, fair AI systems to better support patients — and it starts with data. Here’s how:
Trust and governance are essential
Due to geographic sprawl and the absence of a universal healthcare system, U.S. data sets are often single-center and difficult to access. As a result, bias crops up. Research in 2020 found that data cohorts from just three states — California, Massachusetts and New York — were disproportionately used to train clinical deep learning algorithms, “with little to no representation from the remaining 47 states.” That means much of the data technologists and healthcare professionals rely on to deliver care isn’t representative of their local populations.
This problem only worsens when you consider the broader picture. Social determinants of health greatly impact health equity, accounting for 30% to 55% of health outcomes. These non-medical socioeconomic factors, such as where people are born, grow up, work and live, are central to understanding gaps, needs and opportunities for improvement.
While some of this data does exist in various state and county departments, decision makers and researchers are often unable to access it in a way that allows for effective analysis. They still need computing infrastructure, access to a data platform, and data engineering and analytics tools to perform the analysis. And more often than not, the data is aggregated (i.e., not specific enough to subpopulations), low-quality or based on small samples.
Trust is a critical element of the equation, as patient engagement can bolster our data sets. Participation from users of both open source and commercial data platforms can help build high-quality real-world data sets, giving AI models less biased data to learn from. Genetic biomarkers, for example, are vital for developing drugs that can prevent and treat disease. But many patient populations don’t have sufficient trust in the system to make this happen; we must build trust by communicating how data is used, safeguarding data privacy and investing in education around the importance of good data.
Governance is another important component of building equitable AI systems. Models should be as transparent as possible, with a “glass box” approach and a high degree of explainability. Technology can support this process by logging the relative importance of features impacting a model’s output, and visually explaining the results through graphs.
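Logging feature importance of the kind described above can be done with off-the-shelf tooling. The sketch below is a hypothetical illustration on synthetic data using scikit-learn’s permutation importance; the clinical feature names are invented for the example and are not drawn from any real model.

```python
# Hypothetical sketch: logging the relative importance of the features behind
# a model's output, one building block of a "glass box" approach.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for clinical features (names are illustrative only).
feature_names = ["age", "bmi", "blood_pressure", "a1c", "zip_income"]
X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Permutation importance estimates how much each feature drives predictions
# by measuring the score drop when that feature is shuffled.
result = permutation_importance(model, X_test, y_test, n_repeats=10,
                                random_state=0)
ranked = sorted(zip(feature_names, result.importances_mean),
                key=lambda pair: -pair[1])
for name, score in ranked:
    print(f"{name:>15}: {score:.3f}")
```

Persisting a ranking like this alongside each model version gives reviewers a concrete artifact to inspect, and the same scores can feed the visual explanations mentioned above.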
Teams building AI systems need robust documentation around the models and proposed usage of models to prevent misuse. Model performance should be evaluated across different demographic groups to understand potential bias. Mature organizations should equip data scientists with appropriate training, and programmatically support responsible use of AI.
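Evaluating performance across demographic groups can be sketched as a per-subgroup metrics report. The example below is a hypothetical illustration on synthetic data; the group labels are invented stand-ins for real demographic attributes, and large metric gaps between groups would flag potential bias for further review.

```python
# Hypothetical sketch: evaluating model performance across demographic
# groups to surface potential bias. All data and group labels are synthetic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
rng = np.random.default_rng(0)
# Invented subgroup labels standing in for demographic attributes.
group = rng.choice(["group_a", "group_b", "group_c"], size=len(y))

X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X, y, group, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
preds = model.predict(X_te)

# Report accuracy and recall per subgroup; large gaps merit investigation.
report = {}
for g in np.unique(g_te):
    mask = g_te == g
    report[str(g)] = {
        "n": int(mask.sum()),
        "accuracy": accuracy_score(y_te[mask], preds[mask]),
        "recall": recall_score(y_te[mask], preds[mask]),
    }
for g, stats in report.items():
    print(g, stats)
```

A report like this can be generated automatically at training time and attached to the model documentation, so subgroup performance is reviewed before deployment rather than after an incident.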
Where we go from here
Implemented at scale, AI has vast potential. According to Harvard’s School of Public Health, AI can save the medical industry upward of $150 billion in costs by 2025. And McKinsey estimates that improving outcomes by reducing health disparities could drive $3 trillion in incremental annual GDP.
But perhaps most compelling is AI’s potential to reframe healthcare relationships from reactive to proactive, transforming the healthcare system from “sick care” into true healthcare.
To make this a reality, everyone in the industry has a role to play. Stakeholders and leaders must work to accelerate data literacy across the healthcare industry, and advocate for better national data standards. Technology platforms must support the secure transfer and access of data in a way that protects patient privacy, and offer AI tools that increase transparency and explainability.
Organizations must support data scientists with governance and empower them with training. Furthermore, we must drive initiatives that boost patient and industry trust: trust in community health centers, in pharmacies, in hospitals and in data at large. In doing so, we can improve data quality and, with it, the AI systems that will shape a better healthcare system.