This talk introduces a new generation of machine learning methods that combine state-of-the-art performance with interpretability: optimal classification trees (OCT) and optimal regression trees (ORT), with and without hyperplane splits, for both prediction and prescription. We show that (a) these trees are highly interpretable, (b) they can be trained at large scale in practical times, and (c) across a large collection of real-world data sets they deliver comparable or better performance than random forests or boosted trees. Their prescriptive counterparts offer a significant edge in interpretability together with comparable or better performance than causal forests. Finally, we show that optimal trees with hyperplanes have at least as much modeling power as feedforward, convolutional, and recurrent neural networks, and comparable performance on a variety of real-world data sets. These results suggest that optimal trees are interpretable, practical to compute at large scale, and competitive with or superior to black-box methods.
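
To give a flavor of what "optimal" means here, the sketch below contrasts greedy, one-split-at-a-time induction with optimizing all splits of a fixed-depth tree jointly. It is a toy illustration only, not the mixed-integer optimization formulation from the talk: it brute-forces the globally optimal depth-2 axis-aligned tree on a synthetic XOR data set, where no single split helps but the jointly chosen depth-2 tree classifies perfectly. The data set and all names are illustrative assumptions.

```python
# Minimal sketch of joint (global) tree optimization, NOT the talk's MIO
# formulation: exhaustively search every depth-2 axis-aligned tree on a
# toy XOR data set and compare with the best single split.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = ((X[:, 0] > 0) ^ (X[:, 1] > 0)).astype(int)  # XOR labels: no single
                                                 # axis-aligned split helps

def leaf_errors(labels):
    """Misclassifications when a leaf predicts its majority class."""
    return 0 if labels.size == 0 else labels.size - np.bincount(labels).max()

def splits(X):
    """Candidate thresholds: midpoints between consecutive feature values."""
    for j in range(X.shape[1]):
        v = np.unique(X[:, j])
        for b in (v[:-1] + v[1:]) / 2.0:
            yield j, b

def best_split_errors(X, y):
    """Lowest total misclassification over all single (depth-1) splits."""
    best = leaf_errors(y)  # allow "no split": keep a single leaf
    for j, b in splits(X):
        mask = X[:, j] <= b
        best = min(best, leaf_errors(y[mask]) + leaf_errors(y[~mask]))
    return best

# Depth-2 "optimal" tree: once the root split is fixed, the two subtrees
# are independent, so we can optimize each child exhaustively and take
# the best combination -- a joint search over the whole tree.
best_d2 = best_split_errors(X, y)
for j, b in splits(X):
    mask = X[:, j] <= b
    total = (best_split_errors(X[mask], y[mask])
             + best_split_errors(X[~mask], y[~mask]))
    best_d2 = min(best_d2, total)

print("best depth-1 errors:", best_split_errors(X, y))  # roughly half wrong
print("best depth-2 errors:", best_d2)                  # 0: XOR is solved
# The hyperplane variants discussed in the talk generalize the axis-aligned
# split x_j <= b to a^T x <= b, which is the source of their extra
# modeling power.
```

Brute force works only for toy depths; the point of the OCT/ORT line of work is that the same joint optimization can be posed as a mixed-integer program and solved at practical scale.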