Bayesian symbolic regression and the learnability of closed-form mathematical models
Abstract
Symbolic regression aims to obtain closed-form mathematical models from data. The two main challenges of traditional symbolic regression, and especially of approaches based on genetic algorithms, are the need to balance model complexity against goodness of fit, and the need to explore the vast space of closed-form models rigorously. In this talk, we will discuss a novel Bayesian approach to symbolic regression, which helps to address these challenges. With regard to the complexity-fit balance, the Bayesian approach amounts to choosing (or sampling) models based on their description length. With regard to the exploration of the space of models, we propose a Markov chain Monte Carlo approach with asymptotic guarantees of performance. We will illustrate the approach by showing how it has already helped to shed light on a number of real scientific problems of interest. Finally, we will discuss how observational noise in the data induces a transition between two phases: a learnable phase at low noise, in which the generating model can be discovered, and an unlearnable phase at high noise, in which no method can possibly recover the model that truly generated the data.
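The two ideas in the abstract, scoring candidate models by a description-length criterion and exploring the space of expressions with Markov chain Monte Carlo, can be sketched in a few lines. The following is a minimal toy illustration, not the authors' actual method: expressions are small trees over {+, -, *}, the score is a negative log-likelihood plus an assumed per-node complexity cost, and a Metropolis step accepts proposals with probability exp(-ΔDL). All operator choices, constants, and the node-cost prior here are illustrative assumptions.

```python
import math
import random

# Synthetic data from a known generating model y = 2x + 1 with Gaussian noise.
random.seed(0)
xs = [i / 10 for i in range(20)]
ys = [2 * x + 1 + random.gauss(0, 0.05) for x in xs]

# Expression trees: ('x',) is the variable, ('c', v) a constant,
# (op, left, right) a binary operation.
OPS = {'+': lambda a, b: a + b, '-': lambda a, b: a - b, '*': lambda a, b: a * b}

def evaluate(expr, x):
    if expr[0] == 'x':
        return x
    if expr[0] == 'c':
        return expr[1]
    op, left, right = expr
    return OPS[op](evaluate(left, x), evaluate(right, x))

def size(expr):
    """Number of nodes in the tree (a crude complexity measure)."""
    if expr[0] in ('x', 'c'):
        return 1
    return 1 + size(expr[1]) + size(expr[2])

def description_length(expr, sigma=0.05):
    # MDL-style score: negative log-likelihood of the data under the model
    # plus a complexity term proportional to tree size (assumed prior).
    nll = sum((y - evaluate(expr, x)) ** 2 for x, y in zip(xs, ys)) / (2 * sigma ** 2)
    return nll + size(expr) * math.log(8)

def random_leaf():
    return ('x',) if random.random() < 0.5 else ('c', random.choice([1, 2, 3]))

def mutate(expr):
    # Propose a new tree by replacing a random subtree with a leaf
    # or a shallow operator node.
    if expr[0] in ('x', 'c') or random.random() < 0.3:
        if random.random() < 0.5:
            return random_leaf()
        return (random.choice(list(OPS)), random_leaf(), random_leaf())
    op, left, right = expr
    if random.random() < 0.5:
        return (op, mutate(left), right)
    return (op, left, mutate(right))

def mcmc(steps=5000):
    current = random_leaf()
    cur_dl = description_length(current)
    best, best_dl = current, cur_dl
    for _ in range(steps):
        proposal = mutate(current)
        dl = description_length(proposal)
        # Metropolis rule on exp(-DL): shorter descriptions always accepted,
        # longer ones with probability exp(cur_dl - dl) <= 1.
        if dl < cur_dl or random.random() < math.exp(cur_dl - dl):
            current, cur_dl = proposal, dl
            if dl < best_dl:
                best, best_dl = proposal, dl
    return best, best_dl

best, dl = mcmc()
print(best, round(dl, 2))
```

The complexity term is what gives the Bayesian view its Occam's-razor behavior: among models that fit equally well, the sampler concentrates on the one with the shortest description.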