Robot perception and computer vision have witnessed unprecedented progress in the last decade. Robots and autonomous vehicles are now able to detect objects, localize them, and create large-scale maps of an unknown environment, capabilities that are crucial for navigation and manipulation. Despite these advances, both researchers and practitioners are well aware of the brittleness of current perception systems, and a large gap still separates robot and human perception. While many applications can afford occasional failures (e.g., AR/VR, domestic robotics), high-integrity autonomous systems (including self-driving vehicles) demand a new generation of algorithms. This talk discusses two efforts targeted at bridging this gap. The first focuses on robustness: I present recent advances in the design of certifiable perception algorithms that are robust to extreme amounts of outliers and provide performance guarantees. These algorithms are “hard to break” and work in regimes where related techniques fail. The second effort targets high-level understanding: while humans can quickly grasp both geometric and semantic aspects of a scene, high-level scene understanding remains a challenge for robotics. I present recent work on real-time metric-semantic understanding, which combines robust estimation with deep learning.