Management of Substrate-Sensitive AI Capabilities (MoSSAIC)
Matthew Farr*, Chris Pang*, Sahil K
*equal contribution
Abstract
AI safety is a rapidly developing field with several competing theoretical perspectives. Much of the work being done today operates within what we identify as the mechanistic or causal paradigm, which seeks to decompose neural networks into sub-components and reliably extract safety insights from this decomposition. We describe several scenarios in which such an approach may be inadequate and propose a supplementary framework, MoSSAIC (Management of Substrate-Sensitive AI Capabilities), which addresses some of the core assumptions that underlie the mechanistic paradigm. We also set out the core motivations for this framework and a roadmap for initial work under it.