Hi! I'm a machine learning researcher interested in probabilistic machine learning.

I've recently become interested in neural theorem proving. Previously, I dabbled in methods for efficient human-robot interaction, drawing in particular on Bayesian optimization, experimental design, and program synthesis for optimal and trustworthy communication.

A bit about me: I am currently a hacker at Cohere. I obtained my PhD from Cornell, advised by Sasha Rush. I started grad school at Harvard, also with Sasha. Before that, I was a research engineer at Facebook AI Research. And before all that, I scraped by as an undergrad at UPenn CIS.

Research Topics

Interaction as optimal control
In human-robot interaction, many tasks are too complex to accomplish in a single turn. How can robots successfully collaborate with humans in as few turns as possible? We frame interaction as an optimal control problem and explore simple heuristics.
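One simple heuristic in this spirit is greedy, one-step-lookahead question selection: the robot asks whichever question is expected to leave the least uncertainty about the human's goal. The toy setup below (yes/no questions over a handful of candidate goals, with made-up likelihoods) is purely illustrative and not taken from any particular system of mine.

```python
import math

def entropy(p):
    # Shannon entropy (in nats) of a probability vector.
    return -sum(q * math.log(q) for q in p if q > 0)

def posterior(prior, likelihood):
    # Bayesian update: P(goal | answer) ∝ P(goal) * P(answer | goal).
    unnorm = [pr * lk for pr, lk in zip(prior, likelihood)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def expected_entropy(prior, question):
    # question[g] = P(answer "yes" | goal g).
    p_yes = sum(pr * q for pr, q in zip(prior, question))
    h = 0.0
    for answer_prob, lik in ((p_yes, question),
                             (1 - p_yes, [1 - q for q in question])):
        if answer_prob > 0:
            h += answer_prob * entropy(posterior(prior, lik))
    return h

def greedy_question(prior, questions):
    # One-step lookahead: ask the question whose answer is expected
    # to leave the least posterior uncertainty about the goal.
    return min(range(len(questions)),
               key=lambda i: expected_entropy(prior, questions[i]))
```

With a uniform prior over four goals, a question that splits the goals two-and-two beats both an uninformative question and a one-versus-three split, since it halves the uncertainty no matter the answer.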
Scaling discrete latent variable models
Discrete structure is common in the world (language, biology, code) and can yield efficient, interpretable models. However, learning with discrete structure is difficult because sampling discrete variables is non-differentiable. Can we scale models with discrete structure? And what structural properties can we take advantage of?
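To make the non-differentiability concrete, here is a minimal sketch of the Gumbel-max trick, a standard way to sample a discrete latent variable from logits: add independent Gumbel noise and take an argmax. The hard argmax blocks gradients; relaxing it to a temperature-controlled softmax gives the familiar Gumbel-softmax workaround. This is a generic illustration, not a description of any specific model of mine.

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def gumbel_max_sample(logits, rng=random):
    # Perturb each logit with Gumbel(0, 1) noise, then take the argmax;
    # the result is an exact sample from softmax(logits). The argmax is
    # the non-differentiable step that relaxations like Gumbel-softmax
    # replace with a soft maximum.
    noisy = [l - math.log(-math.log(rng.random())) for l in logits]
    return max(range(len(logits)), key=lambda i: noisy[i])
```

Drawing many samples and counting category frequencies recovers the softmax probabilities, which is a quick sanity check that the trick samples from the intended distribution.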