Principal Investigator Stefanie Jegelka
Project Website https://stat.mit.edu/research/combinatorial-learning-with-set-functions/
Learning problems that involve combinatorial objects are ubiquitous: they include predicting graphs, assignments, rankings, trees, groups of discrete labels, or a user's preferred sets; expressing prior structural knowledge for regularization; identifying sets of important variables; and performing inference in discrete probabilistic models.
Many of these problems can be phrased in the language of set functions that express valuations, costs, or probabilities. Viewed through this lens, it becomes apparent that a surprisingly wide spectrum of combinatorial learning problems shares interesting mathematical properties. A prominent example is the combinatorial property of submodularity, also referred to as a discrete analog of convexity.
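For concreteness, the standard definition (textbook material, not specific to this project): a set function $f$ on a ground set $V$ is submodular if it exhibits diminishing marginal returns,

$$
f(A \cup \{v\}) - f(A) \;\geq\; f(B \cup \{v\}) - f(B)
\quad \text{for all } A \subseteq B \subseteq V \text{ and } v \in V \setminus B,
$$

or, equivalently, if $f(A) + f(B) \geq f(A \cup B) + f(A \cap B)$ for all $A, B \subseteq V$.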
Submodular functions are characterized by the notion of diminishing marginal returns (or costs). Many combinatorial functions are known to be submodular, ranging from graph cuts to rank functions and connectivity, and their importance in statistical learning has become increasingly apparent. Viewed as valuations, submodular functions naturally express diversity, coverage, repulsion, and information; examples include the influence of a set of nodes in a social network, the information a set of variables carries about an unknown quantity of interest, greedy variable selection, valuation functions for summarization, and rank functions. Viewed as costs, these functions lead to grouping and the selection of coherent items, and they occur when modeling spatial or temporal coherence or latent group structure.
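A minimal sketch of the diminishing-returns property on a toy coverage function (a standard example of a submodular valuation; the data below is purely illustrative and not taken from the project):

```python
def coverage(sets, S):
    """Coverage value of index set S: size of the union of the chosen sets."""
    covered = set()
    for i in S:
        covered |= sets[i]
    return len(covered)

# Hypothetical toy data: each index selects a subset of the universe {0,...,5}.
sets = [{0, 1, 2}, {2, 3}, {3, 4, 5}]

A = {0}       # smaller context
B = {0, 1}    # larger context, with A a subset of B
v = 2         # candidate index to add

gain_A = coverage(sets, A | {v}) - coverage(sets, A)   # gain relative to A
gain_B = coverage(sets, B | {v}) - coverage(sets, B)   # gain relative to B
assert gain_A >= gain_B  # diminishing marginal returns: 3 >= 2 here
```

The marginal gain of adding the same set can only shrink as the context grows, which is exactly the diminishing-returns characterization above.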
While the rich concept of submodularity offers close connections to both convexity and concavity, appropriate models and computationally efficient methods for learning with submodular functions on large modern data sets remain an ongoing research challenge. By carefully exploiting properties of submodular functions, and the discrepancy between average- and worst-case instances, we have been able to design accurate, scalable, and parallelizable algorithms for submodular optimization and learning. Our analysis identifies properties that determine complexity and helps explain empirical successes despite harsh theoretical worst-case results.
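As a concrete point of reference (the classical greedy baseline of Nemhauser, Wolsey, and Fisher, not the project's own algorithms), monotone submodular maximization under a cardinality constraint can be approximated to within a factor of 1 - 1/e by repeatedly adding the element with the largest marginal gain:

```python
def greedy_maximize(f, ground_set, k):
    """Greedily pick up to k elements, each time taking the largest marginal gain."""
    S = set()
    for _ in range(k):
        best, best_gain = None, float("-inf")
        for v in ground_set - S:
            gain = f(S | {v}) - f(S)   # marginal gain of adding v to S
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None or best_gain <= 0:  # stop if nothing improves the value
            break
        S.add(best)
    return S

# Usage with the coverage function sketched above (indices 0..2 as ground set):
# selected = greedy_maximize(lambda S: coverage(sets, S), {0, 1, 2}, k=2)
```

This naive loop evaluates f a quadratic number of times; the scalability questions mentioned above concern precisely how to avoid such costs on large modern data sets.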