Date Submitted: February 2024
Journal/Venue: TMLR 2025 (under review)
Satchel Grant, Noah D. Goodman, James L. McClelland
Abstract:
What types of numeric representations emerge in neural systems, and
what would a satisfying answer to this question look like?
In this work, we interpret Neural Network (NN) solutions to sequence
based number tasks through a variety of methods in an effort to
understand how well we can interpret NNs through the lens of interpretable Symbolic
Algorithms (SAs)---defined by precise, abstract, mutable variables
used to perform computations. We use GRUs, LSTMs, and Transformers trained using
Next Token Prediction (NTP) on numeric tasks where the solutions
to the tasks vary in length and depend on numeric information only latent in
the task structure.
We show through multiple causal and theoretical methods that we can interpret
NNs' raw activity through the lens of simplified SAs when we frame the neural
activity in terms of interpretable subspaces rather than individual neurons.
Depending on the analysis, however, these interpretations can be graded, existing
on a continuum, highlighting the philosophical quandary of what it means to
"interpret" neural activity. We use this to motivate the introduction of Alignment
Functions: invertible, learnable functions that add flexibility to the
existing Distributed Alignment Search (DAS) framework.
Through our specific analyses we show the importance of causal interventions
for NN interpretability; we show that recurrent models
develop graded, symbol-like number variables within their neural activity;
we introduce Alignment Functions to frame NN activity in terms of general,
invertible functions of interpretable variables; and we show that Transformers must
use anti-Markovian solutions---solutions that avoid using cumulative, Markovian
hidden states---in the absence of sufficient attention layers. We use our
results to encourage NN interpretability at the level of neural subspaces
through the lens of SAs.
I must admit that I'm proud of this work. It provides a satisfying
answer to a question that has motivated a lot of my research:
what does it mean to "understand the brain"? We provide the notion of
Alignment Functions, which are invertible, learnable functions that
establish an explicit relationship between neural activity and
interpretable, understandable variables. I don't state this in the
paper, but a nice definition of "understanding a concept" is the
ability to form that new concept in terms of a system of concepts that
you already accept as true. Alignment Functions provide a way to
learn such a relationship for neural activity.
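To make the idea concrete, here is a minimal sketch (not the code from the paper) of a DAS-style interchange intervention with a learnable, invertible alignment map. An orthogonal matrix stands in for a general Alignment Function since it is the simplest invertible choice; the class name, dimensions, and subspace layout are illustrative.

import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

class RotationAlignment(nn.Module):
    """Swap a candidate 'number variable' subspace from a source run into a base run."""

    def __init__(self, hidden_dim: int, n_causal_dims: int):
        super().__init__()
        # Learnable orthogonal map: invertible by construction (inverse = transpose).
        self.rot = orthogonal(nn.Linear(hidden_dim, hidden_dim, bias=False))
        self.n_causal = n_causal_dims  # size of the interpreted subspace

    def forward(self, base_h: torch.Tensor, source_h: torch.Tensor) -> torch.Tensor:
        R = self.rot.weight
        base_rot = base_h @ R.T    # express both activity vectors in the aligned basis
        src_rot = source_h @ R.T
        # Interchange intervention: take the causal dims from the source run,
        # keep everything else from the base run, then map back to neural space.
        mixed = torch.cat(
            [src_rot[..., : self.n_causal], base_rot[..., self.n_causal :]], dim=-1
        )
        return mixed @ R           # inverse of the alignment map

The map is trained so that, when the intervened activity is substituted back into the network, the model produces the counterfactual behavior that the symbolic algorithm predicts for the swapped variable.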
Date Published: January 10, 2025
Journal/Venue: ICLR ReAlign Workshop 2025 (and under review at NeurIPS)
Satchel Grant
Abstract:
When can we say that two neural systems are the same? The answer to this question is goal-dependent, and it is often addressed through correlative methods such as Representational Similarity Analysis (RSA) and Centered Kernel Alignment (CKA). What nuances do we miss, however, when we fail to causally probe the representations? Do the dangers of cause vs. correlation exist in comparative representational analyses? In this work, we introduce a method for connecting neural representational similarity to behavior through causal interventions. The method learns orthogonal transformations that find an aligned subspace in which behavioral information from multiple distributed networks' representations can be isolated and interchanged. We first show that the method can be used to transfer the behavior from one frozen Neural Network (NN) to another in a manner similar to model stitching, and we show how the method can complement correlative similarity measures like RSA. We then introduce an efficient subspace orthogonalization technique using the Gram-Schmidt process---which can also be used for Distributed Alignment Search (DAS)---allowing us to perform analyses on larger models. Next, we empirically and theoretically show how our method can generalize to a form of model stitching, how it can be more restrictive than model stitching when desired, and how it reduces the number of required matrices for a comparison of n models from quadratic to linear in n. We then show how we can augment the loss objective with an auxiliary loss to train causally relevant alignments even when we can only read the representations from one of the two networks during training. Lastly, we use number representations as a case study to explore how our method can be used to compare specific types of representational information across tasks and models.
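As a rough illustration of the Gram-Schmidt idea (a hypothetical sketch, not the paper's implementation): rather than learning a full d x d rotation, we only learn k free directions, orthonormalize them on the fly, and swap the coordinates inside the resulting subspace between two runs.

import torch
import torch.nn as nn

class GramSchmidtSubspace(nn.Module):
    def __init__(self, dim: int, k: int):
        super().__init__()
        # k freely parameterized directions in the d-dimensional hidden space.
        self.vectors = nn.Parameter(torch.randn(k, dim) / dim ** 0.5)

    def basis(self) -> torch.Tensor:
        """Return an orthonormal (k, dim) basis via differentiable Gram-Schmidt."""
        ortho = []
        for v in self.vectors:
            for u in ortho:
                v = v - (v @ u) * u        # remove the component along u
            ortho.append(v / v.norm())
        return torch.stack(ortho)

    def swap(self, base_h: torch.Tensor, source_h: torch.Tensor) -> torch.Tensor:
        """Interchange intervention: replace base's subspace content with source's."""
        B = self.basis()                    # (k, dim)
        base_coords = base_h @ B.T          # coordinates inside the subspace
        src_coords = source_h @ B.T
        return base_h + (src_coords - base_coords) @ B

For a cross-model comparison, each network would get its own learned basis and the subspace coordinates would be exchanged across bases, which is where the quadratic-to-linear savings in the number of matrices comes from: one alignment per model rather than one per pair of models.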
This is a follow-up to the Emergent Symbol-like Number Variables paper
that allows us to causally compare representations across
multiple neural systems and brings us closer to performing
a DAS-like method on human brains. I feel the need to clarify that
the sole authorship was with the approval of my advisor, Jay McClelland.
I offered for him to be a co-author, but he felt that he had not
contributed enough to justify authorship (due to his stretched schedule).
He is an extremely supportive advisor, and I am grateful for both the
guidance and freedom that he has granted me. Similarly for Noah Goodman,
who has been a great mentor and collaborator on the Emergent Symbol-like
Number Variables paper, but who did not contribute to this work. I hope
to one day manage as many projects as they do!
Date Published: Sept. 6, 2023
Journal/Venue: Neuron
Niru Maheswaranathan*, Lane T McIntosh*, Hidenori Tanaka*, Satchel Grant*,
David B Kastner, Joshua B Melander, Aran Nayebi, Luke E Brezovec,
Julia H Wang, Surya Ganguli, Stephen A Baccus
Abstract:
Understanding the circuit mechanisms of the visual code for
natural scenes is a central goal of sensory neuroscience. We show
that a three-layer network model predicts retinal natural scene
responses with an accuracy nearing experimental limits. The
model’s internal structure is interpretable, as interneurons
recorded separately and not modeled directly are highly
correlated with model interneurons. Models fitted only to
natural scenes reproduce a diverse set of phenomena related
to motion encoding, adaptation, and predictive coding,
establishing their ethological relevance to natural visual
computation. A new approach decomposes the computations of
model ganglion cells into the contributions of model
interneurons, allowing automatic generation of new hypotheses
for how interneurons with different spatiotemporal responses
are combined to generate retinal computations, including
predictive phenomena currently lacking an explanation.
Our results demonstrate a unified and general approach to
study the circuit mechanisms of ethological retinal
computations under natural visual scenes.
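For readers who want a picture of the model class, here is a minimal sketch of a three-layer CNN of this kind (layer sizes and names are illustrative, not the exact architecture from the paper): a short movie clip goes in, firing rates for the recorded ganglion cells come out, and the hidden convolutional units are the model interneurons that get compared to real interneuron recordings.

import torch
import torch.nn as nn

class RetinaCNN(nn.Module):
    def __init__(self, n_frames: int = 40, n_cells: int = 5):
        super().__init__()
        self.layer1 = nn.Sequential(            # "bipolar-like" model interneurons
            nn.Conv2d(n_frames, 8, kernel_size=15), nn.ReLU())
        self.layer2 = nn.Sequential(            # "amacrine-like" model interneurons
            nn.Conv2d(8, 8, kernel_size=11), nn.ReLU())
        self.readout = nn.Sequential(           # ganglion-cell firing rates
            nn.Flatten(), nn.LazyLinear(n_cells), nn.Softplus())

    def forward(self, movie: torch.Tensor) -> torch.Tensor:
        # movie: (batch, n_frames, height, width) stack of stimulus frames
        return self.readout(self.layer2(self.layer1(movie)))

# Example: rates = RetinaCNN()(torch.randn(1, 40, 50, 50))  -> shape (1, 5)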
This was a big collaboration over the course of many years.
I love this work because it is a beautiful demonstration
of how to establish an isomorphism between biological and artificial
neural networks, and it shows how you can use that sort
of model for interpreting the real biological
neural code. I am a co-first author on this work for writing
most of the project code, developing many architectural improvements,
and developing much of the interneuron comparison analysis.
Date Published: March 4, 2022
Journal/Venue: Asilomar Conference on Signals, Systems, and Computers
Xuehao Ding, Dongsoo Lee, Satchel Grant, Heike Stein, Lane McIntosh, Niru Maheswaranathan, Stephen Baccus
Abstract:
The visual system processes stimuli over a wide range of
spatiotemporal scales, with individual neurons receiving
input from tens of thousands of neurons whose dynamics
range from milliseconds to tens of seconds. This poses a
challenge to create models that both accurately capture visual
computations and are mechanistically interpretable. Here we
present a model of salamander retinal ganglion cell spiking
responses recorded with a multielectrode array that captures
natural scene responses and slow adaptive dynamics. The model
consists of a three-layer convolutional neural network (CNN)
modified to include local recurrent synaptic dynamics taken
from a linear-nonlinear-kinetic (LNK) model. We presented
alternating natural scenes and uniform field white noise
stimuli designed to engage slow contrast adaptation. To overcome
difficulties fitting slow and fast dynamics together, we
first optimized all fast spatiotemporal parameters, then
separately optimized recurrent slow synaptic parameters. The
resulting full model reproduces a wide range of retinal
computations and is mechanistically interpretable, having
internal units that correspond to retinal interneurons with
biophysically modeled synapses. This model allows us to
study the contribution of model units to any retinal computation,
and examine how long-term adaptation changes the retinal neural
code for natural scenes through selective adaptation of
retinal pathways.
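The two-stage fitting procedure is the key trick, so here is a minimal sketch of the idea (the parameter-naming convention, loss function, and data loaders are hypothetical, not our actual code): fit the fast spatiotemporal parameters first, then freeze them and fit only the slow recurrent synaptic parameters.

import torch

def two_stage_fit(model, loss_fn, fast_loader, slow_loader):
    # Assumes the slow synaptic parameters live in submodules whose names
    # start with "kinetics" (a hypothetical naming convention).
    # Stage 1: fit the fast spatiotemporal filters; keep the slow kinetics fixed.
    for name, p in model.named_parameters():
        p.requires_grad = not name.startswith("kinetics")
    opt = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=1e-3)
    for stim, spikes in fast_loader:
        opt.zero_grad()
        loss_fn(model(stim), spikes).backward()
        opt.step()

    # Stage 2: freeze the fast parameters; fit only the slow synaptic kinetics.
    for name, p in model.named_parameters():
        p.requires_grad = name.startswith("kinetics")
    opt = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=1e-4)
    for stim, spikes in slow_loader:
        opt.zero_grad()
        loss_fn(model(stim), spikes).backward()
        opt.step()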
This project was a good extension of the CNN retinal model
that I listed earlier. In this work, we gave
the CNN model recurrence and used kinetic constants from a previously fit
LNK model to get it to exhibit slow adaptation (something
the previous model lacked).