Measuring Goal-Directedness

Part of Advances in Neural Information Processing Systems 37 (NeurIPS 2024) Main Conference Track

Bibtex Paper

Authors

Matt MacDermott, James Fox, Francesco Belardinelli, Tom Everitt

Abstract

We define maximum entropy goal-directedness (MEG), a formal measure of goal-directedness in causal models and Markov decision processes, and give algorithmsfor computing it. Measuring goal-directedness is important, as it is a criticalelement of many concerns about harm from AI. It is also of philosophical interest,as goal-directedness is a key aspect of agency. MEG is based on an adaptation ofthe maximum causal entropy framework used in inverse reinforcement learning. Itcan measure goal-directedness with respect to a known utility function, a hypothesisclass of utility functions, or a set of random variables. We prove that MEG satisfiesseveral desiderata and demonstrate our algorithms with small-scale experiments.