MIT Corporate Relations
Prof. Stephen Bates
X-Window Consortium Career Development Assistant Professor of Electrical Engineering and Computer Science
Primary DLC
Department of Electrical Engineering and Computer Science
MIT Room:
32-D758
(617) 253-4600
stephenbates@mit.edu
https://stephenbates19.github.io/
Areas of Interest and Expertise
Artificial Intelligence and Machine Learning
Information Science and Systems
Optimization and Game Theory
Systems Theory, Control, and Autonomy
Research Summary
Bates uses data and AI for reliable decision-making in the presence of uncertainty. In particular, he develops tools for statistical inference with AI models, data impacted by strategic behavior, and settings with distribution shift. Bates also works on applications in life sciences and sustainability. He previously worked as a postdoc in the Statistics and EECS departments at the University of California at Berkeley (UC Berkeley). Bates received a B.S. in statistics and mathematics from Harvard University and a Ph.D. from Stanford University.
Professor Bates believes that conceptual, algorithmic, and mathematical advances enable us to use data and AI models to better understand complex patterns in the physical and social world and to build reliable automated systems. To this end, he focuses on developing statistical principles and formal frameworks for understanding challenging types of data that are increasingly important. In particular, Professor Bates works on:
(*) Statistical inference with AI systems. AI models based on deep neural networks are increasingly used in real-world systems because they perform best on high-dimensional data, such as image and natural language data. However, the standard statistical toolbox does not apply here; users seeking assurances about the reliability of these models, such as confidence intervals on predictions or bounds on the false discovery rate across multiple decisions, are left with little recourse in the existing literature. Bates seeks to build out a rich statistical toolbox for AI models, so that researchers can use these powerful systems while remaining on solid statistical ground. Work in this theme builds on core statistical techniques such as resampling methods, multiple hypothesis testing, and empirical process theory.
(*) Data impacted by strategic behavior and information asymmetry. Data emerging from systems with human decision-makers is increasingly important, and the possibility of strategic behavior raises new inferential challenges. For example, profit-sensitive pharmaceutical companies sponsor clinical trials, which are then analyzed according to some statistical protocol, and are heavily rewarded for drugs that are approved. Correctly analyzing data affected by strategic agents is critical, and Bates is developing methods for this, drawing on concepts from decision theory, game theory, and statistics.
(*) Shifting distributions and feedback loops. More broadly, data are increasingly collected from dynamic environments with shifting distributions, and these shifts can be caused by changes made to the system or policy. Bates works to extend statistical methods to such non-i.i.d. settings. For example, consider protein design, where the analyst has access to some set of proteins and an associated fitness score for each. The goal is to design a new protein that has higher fitness than those seen previously. The analyst might fit a model predicting fitness from protein structure, and then choose a promising candidate protein to synthesize and measure its fitness in a wet-lab experiment. This process is repeated several times, so there is a feedback loop: the model the analyst fits affects the subsequent data collection. Such non-i.i.d. settings with shifting distributions are increasingly relevant to modern data analysis, and it is essential to create techniques to address them.
He is especially interested in applications in the life sciences and sustainability.