Updating the State of the Art
Marzyeh Ghassemi is an assistant professor at MIT in the Department of Electrical Engineering and Computer Science and at the Institute for Medical Engineering and Science. She leads the Healthy ML Group at MIT, where she creates and applies machine learning to understand and improve health in ways that are novel, robust, private, and fair.
For the better part of a decade, we have been hearing about the power of machine learning to transform healthcare. There is no shortage of studies showing that machines, with their superhuman ability to recognize patterns in data, can outperform their human counterparts in the context of healthcare: from identifying skin cancer to predicting whether someone will wake up from a coma. It is an exciting field that has the potential to deliver highly personalized medicine, fast and efficient diagnoses, and rapid drug development, among other things. But the devil is in the details. For starters, health data can be unwieldy: it's heterogeneous, messy even, and there is a lot of noise in the system. Just ask Marzyeh Ghassemi, an assistant professor at the Institute who heads up a team of researchers in the Healthy ML group at MIT.
A few years ago, Ghassemi was nearing the end of her PhD candidacy at the MIT Computer Science and Artificial Intelligence Laboratory when one of her committee members asked if she had ever looked at the performance of her machine learning models across different types of patients. For close to seven years, much of Ghassemi’s research had focused on whether high-capacity neural network models could be used to understand when human physiology would break down, thereby anticipating the need for specific clinical interventions. Knowing ahead of time, for example, when a patient in septic shock needs a vasopressor to prevent a dangerously low dip in blood pressure could mean the difference between life and death. Up until that point, like most machine learning practitioners, Ghassemi had been reporting performance in aggregate. Her approach was, as she says, “a general machine learning thing.”
When Ghassemi examined the stratification of these state-of-the-art models across different subsets of patients, she couldn’t help but notice that they performed significantly worse for minoritized groups. It was an eye-opening experience that inspired her to refine her research methods and goals. These days, Ghassemi is still leveraging healthcare data with machine learning algorithms to improve patient outcomes. But, to great effect, she has brought questions around fairness, robustness, and privacy to the fore.
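The shift she describes is, at bottom, a change in how a model is scored. A minimal sketch of the idea (the data, model scores, and group labels below are entirely hypothetical, not drawn from her studies) reports a metric per demographic subgroup rather than only in aggregate:

```python
# Hypothetical sketch: aggregate vs. subgroup-stratified evaluation.
# `y_true`, `y_score`, and `group` stand in for held-out labels, model
# scores, and a demographic attribute; none of this is real patient data.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 2000
group = rng.choice(["A", "B"], size=n, p=[0.85, 0.15])   # imbalanced groups
y_true = rng.integers(0, 2, size=n)
# Simulate a model whose scores are noisier for the minority group.
noise = np.where(group == "A", 0.8, 1.4)
y_score = y_true + rng.normal(0, noise, size=n)

print(f"aggregate AUC: {roc_auc_score(y_true, y_score):.3f}")
for g in ["A", "B"]:
    mask = group == g
    print(f"group {g} AUC: {roc_auc_score(y_true[mask], y_score[mask]):.3f}")
```

An aggregate number can look strong while one subgroup's number lags well behind it, which is exactly the kind of gap Ghassemi observed.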
Looking back on that revelatory moment during her PhD program, Ghassemi says, “In healthcare, we know that social biases often result in decreased access or poorer standards of care. If that bleeds into our training of deep neural models, and a recommendation is given to a doctor with a sheen of objectivity, we risk propagating these problems further.” With that in mind, she and her research team, along with colleagues from the University of Toronto, recently evaluated a group of machine learning methods known as deep metric learning (DML), which are widely used for tasks related to image recognition. Any machine learning model depends heavily upon the information that it is trained on, so bias in a model is often tied to an underlying bias in the data. But practitioners like Ghassemi can only work with the data at hand, and there will always be minority populations in any data set. In a recently published paper, “Is fairness only metric deep?” Ghassemi describes a novel solution to the problem—which she calls PARADE (partial attribute decorrelation)—that reduces bias in a DML model even if it is trained on unbalanced data, thereby instilling fairness and improving performance for minority groups.
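The paper describes PARADE in full; as a loose, hypothetical illustration of the broader idea of keeping a learned embedding from encoding a sensitive attribute, one could add a penalty that suppresses the correlation between embedding dimensions and that attribute alongside a standard metric-learning loss. The sketch below (in PyTorch, with a triplet loss standing in for whatever objective a practitioner actually uses) is not the PARADE implementation:

```python
# Hypothetical sketch of attribute decorrelation in deep metric learning:
# penalize correlation between the learned embedding and a sensitive
# attribute. Illustrative only; not the method from the paper.
import torch
import torch.nn.functional as F

def decorrelation_penalty(embeddings, attribute):
    """Mean squared Pearson correlation between each embedding dimension and the attribute."""
    e = embeddings - embeddings.mean(dim=0, keepdim=True)
    a = attribute.float() - attribute.float().mean()
    e = e / (e.std(dim=0, keepdim=True) + 1e-8)
    a = a / (a.std() + 1e-8)
    corr = (e * a.unsqueeze(1)).mean(dim=0)   # per-dimension correlation with the attribute
    return (corr ** 2).mean()

def training_loss(anchor, positive, negative, attribute, lam=0.1):
    # Standard triplet loss drives the metric-learning objective...
    metric_loss = F.triplet_margin_loss(anchor, positive, negative, margin=1.0)
    # ...while the penalty discourages the embedding from encoding the attribute.
    return metric_loss + lam * decorrelation_penalty(anchor, attribute)
```

The weight `lam` sets how hard the decorrelation term pushes against the metric objective, which is itself a small instance of the trade-offs Ghassemi keeps pointing to.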
But fairness is not the only variable to consider, not when you are dealing with the intersection of machine learning technologies and healthcare. Rather, issues of privacy and utility must enter the equation. As Ghassemi puts it, “You can’t blindly throw state-of-the-art machinery at a healthcare problem. You can’t naively try to apply these large levers that we have at our disposal—there are always going to be trade-offs.” This point is emphasized in her group’s recent work on the failings of explainability metrics in “The road to explainability is paved with bias.”
In a similar vein, Ghassemi studied the interaction between differential privacy, fairness, and utility in a pair of papers titled “Can you fake it until you make it?” and “Chasing your long tails.” Considered the gold standard for gleaning useful information from personal data while maintaining privacy, differential privacy has proven useful for a wealth of real-world applications. Google uses a version of it to let users know how busy a business is throughout the day; Apple employs it to determine text and emoji suggestions; and Uber uses it when calculating users’ average trip distances. But it turns out that when differential privacy is added to standard prediction tasks in a hospital setting, models perform quite poorly. Specifically, Ghassemi found that models using differential privacy performed significantly worse for Black patients. In essence, adding differential privacy in a medical setting leads to a severe drop in utility.
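The mechanics behind that trade-off are easy to see in miniature. In the standard ε-differentially private Laplace mechanism, the noise added to a query grows as the privacy budget ε shrinks; the sketch below, on synthetic and purely illustrative “records,” shows the utility cost directly:

```python
# Minimal sketch of the epsilon-differentially private Laplace mechanism on a
# counting query: smaller epsilon (stronger privacy) means more noise, hence
# less utility. The "records" are synthetic and purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
records = rng.integers(0, 2, size=10_000)   # 1 = patient has the condition

def private_count(data, epsilon):
    true_count = data.sum()
    sensitivity = 1.0   # adding or removing one record changes the count by at most 1
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

print("true count:", records.sum())
for eps in [10.0, 1.0, 0.1]:
    print(f"epsilon={eps}: noisy count ≈ {private_count(records, eps):.1f}")
```

Training a clinical prediction model privately compounds this effect across every update, which helps explain why the utility loss Ghassemi measured can land hardest on the smallest patient groups.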
Most people working in her field are driven by real-world problems, says Ghassemi, which means there is tremendous potential for academic labs to work closely with industry. Currently, the Healthy ML group is collaborating with Microsoft, Takeda, and IBM AI. “It’s important for us to understand real problems in a hospital setting in order to incorporate those ideas into our research,” she says.
Ghassemi believes she works in a space of good intentions. Stakeholders at every level—from hospital systems and doctors to patient advocacy groups and startups—want to improve health and the lives of human beings. But there is still so much unknown that good intentions may not be enough. “Tight collaboration in this space is important,” says Ghassemi. “Even if you are not doing a deployment, somebody else could read your paper and deploy it. Understanding a healthier life cycle of data, technology, and deployments is critical for the field.”