The ability to represent emotion plays a significant role in human cognition and social interaction, yet the high-dimensional geometry of this affective space and its neural underpinnings remain debated. A key challenge, the ‘behavior-neural gap,’ is the limited ability of human self-reports to predict brain activity. Here we test the hypothesis that this gap arises from the constraints of traditional rating scales and that large-scale similarity judgments can more faithfully capture the brain's affective geometry. Using AI models as ‘cognitive agents,’ we collected millions of triplet odd-one-out judgments from a multimodal large language model (MLLM) and a language-only model (LLM) in response to 2,180 emotionally evocative videos. We found that the emergent 30-dimensional embeddings from these models are highly interpretable and organize emotion primarily along categorical lines, yet in a blended fashion that incorporates dimensional properties. Most remarkably, the MLLM's representation predicted neural activity in human emotion-processing networks with the highest accuracy, outperforming not only the LLM but also, counterintuitively, representations derived directly from human behavioral ratings. This result supports our primary hypothesis and suggests that sensory grounding—learning from rich visual data—is critical for developing a truly neurally-aligned conceptual framework for emotion. Our findings provide compelling evidence that MLLMs can autonomously develop rich, neurally-aligned affective representations, offering a powerful paradigm to bridge the gap between subjective experience and its neural substrates.
a, The study utilized a database of 2,180 emotionally evocative videos with rich, pre-existing annotations, including human ratings on discrete emotion categories and continuous affective dimensions, detailed textual descriptions, and corresponding fMRI data from viewers. b-d, Affective embeddings were derived for four systems (human categorical ratings, human dimensional ratings, LLM, and MLLM) using a triplet odd-one-out behavioral paradigm. Human similarity judgments were simulated from the cosine similarity of the corresponding prior ratings, while the models performed the task directly. e, Example prompts and responses for the LLM and MLLM. f-g, Latent embeddings were learned from over 7.1 million triplet judgments using sparse positive similarity embedding (SPoSE) (f), and the resulting representational spaces were compared to each other and to neural data (g).
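As a concrete illustration of how rating-based similarity judgments of the kind described in panels b-d could be simulated, the sketch below picks the odd one out of a triplet as the item excluded from the most cosine-similar pair of rating vectors. This is a minimal sketch under assumed array shapes and sampling; it is not the authors' exact pipeline, and the SPoSE fitting step (panel f) is not shown.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two rating vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def odd_one_out(ratings, i, j, k):
    """Pick the odd one out of a triplet (i, j, k).

    ratings: (n_videos, n_rating_dims) array of prior human ratings.
    The pair with the highest cosine similarity is kept together;
    the remaining item is the odd one out.
    """
    sims = {
        k: cosine_sim(ratings[i], ratings[j]),
        i: cosine_sim(ratings[j], ratings[k]),
        j: cosine_sim(ratings[i], ratings[k]),
    }
    # The key whose associated pair is most similar is the excluded (odd) item.
    return max(sims, key=sims.get)

# Toy example with random ratings (e.g. 34 categorical ratings per video).
rng = np.random.default_rng(0)
ratings = rng.random((2180, 34))
triplets = rng.choice(2180, size=(5, 3), replace=False)  # hypothetical triplet sampling
for i, j, k in triplets:
    print((i, j, k), "odd one out:", odd_one_out(ratings, i, j, k))
```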
a-b, Searchlight RSA between each model and the brain, averaged within subcortical (a) and cortical (b) ROIs and across subjects (N = 5). Dots represent individual subjects; error bars reflect standard deviation (s.d.); all statistics are two-tailed t-tests across subjects with false discovery rate (FDR) correction; stars indicate significant differences between the MLLM and the compared model (P < 0.05). c, Whole-cortex searchlight RSA maps for a representative subject, illustrating the MLLM's superior performance across distributed emotion-processing networks. All coloured voxels are predicted significantly (P < 0.05, FDR-corrected, two-tailed t-tests). d, Voxel-wise comparison of the MLLM's performance against the human categorical, dimensional, and concatenated ratings models, using searchlight RSA.
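For reference, searchlight RSA of the kind reported here reduces to correlating a model representational dissimilarity matrix (RDM) with a neural RDM computed from the voxels inside each searchlight. The snippet below sketches that core comparison only; the searchlight loop over voxels, subject averaging, and FDR-corrected statistics are omitted, and the choice of correlation-distance RDMs compared with Spearman correlation is an assumption rather than the authors' exact settings.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(features, metric="correlation"):
    """Condensed representational dissimilarity matrix (upper triangle)."""
    return pdist(features, metric=metric)

def rsa_score(model_embedding, neural_patterns):
    """Spearman correlation between model and neural RDMs.

    model_embedding : (n_stimuli, n_dims)   e.g. a 30-d affective embedding
    neural_patterns : (n_stimuli, n_voxels) responses within one searchlight
    """
    rho, _ = spearmanr(rdm(model_embedding), rdm(neural_patterns))
    return rho

# Toy example with random data (in practice, the 2,180 stimuli would be used).
rng = np.random.default_rng(0)
model_emb = rng.random((100, 30))
voxels = rng.standard_normal((100, 123))
print("searchlight RSA (toy):", rsa_score(model_emb, voxels))
```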
a, t-SNE visualization of the 2,180 stimuli reveals the global structure of the affective spaces. Points are colored by their highest-rated human emotion category, showing spontaneous clustering. b, Top-3 nearest-centroid accuracy, quantifying the categorical structure. c-e, Examples of shared (c), unique (d), and blended (e) affective components, with top-weighted video frames and word clouds (label size is proportional to the correlation coefficient with human ratings). f, Proportion of components that were interpretable versus uninterpretable. g, Proportion of interpretable components best described as purely categorical, purely dimensional, or a mix of both.
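Top-3 nearest-centroid accuracy (panel b) can be read as: for each stimulus, is the centroid of its own highest-rated category among the three closest category centroids in the embedding space? A minimal sketch, assuming Euclidean distance and no leave-one-out correction (both assumptions):

```python
import numpy as np

def top_k_nearest_centroid_accuracy(emb, labels, k=3):
    """Fraction of stimuli whose own category centroid is among the
    k nearest centroids in the embedding space (Euclidean distance).

    emb    : (n_stimuli, n_dims) embedding, e.g. 30 learned components
    labels : (n_stimuli,) integer labels (highest-rated emotion category)
    """
    classes = np.unique(labels)
    centroids = np.stack([emb[labels == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(emb[:, None, :] - centroids[None, :, :], axis=-1)
    top_k = np.argsort(dists, axis=1)[:, :k]          # indices into `classes`
    label_idx = np.searchsorted(classes, labels)
    return float(np.mean([label_idx[i] in top_k[i] for i in range(len(labels))]))

# Toy example with random data and, e.g., 34 emotion categories.
rng = np.random.default_rng(0)
emb = rng.random((200, 30))
labels = rng.integers(0, 34, size=200)
print(top_k_nearest_centroid_accuracy(emb, labels, k=3))
```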
Correlation heatmaps showing the relationship between the 30 learned affective components (y-axes) and the 48 categories/dimensions from human self-reports (x-axes; 34 categories and 14 dimensions from Cowen et al.). Each cell represents the Pearson correlation coefficient (PCC) between a learned component and a human-rated category or dimension. The strong diagonal patterns observed for the LLM (c) and MLLM (d), mirroring the pattern from human categorical data (a) but not dimensional data (b), indicate that the models' affective spaces are predominantly structured along categorical, rather than dimensional, lines.
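The matrix underlying these heatmaps is a straightforward computation: the Pearson correlation, across the 2,180 stimuli, between each learned component and each human-rated category or dimension. A minimal sketch, with array names and shapes assumed:

```python
import numpy as np

def component_rating_pcc(components, ratings):
    """Pearson correlation matrix between learned components and human ratings.

    components : (n_stimuli, 30)  learned affective components
    ratings    : (n_stimuli, 48)  human ratings (34 categories + 14 dimensions)
    returns    : (30, 48) matrix of Pearson correlation coefficients
    """
    c = (components - components.mean(0)) / components.std(0)
    r = (ratings - ratings.mean(0)) / ratings.std(0)
    return c.T @ r / len(components)

# Toy example: the result would be plotted with components on the y-axis.
rng = np.random.default_rng(0)
pcc = component_rating_pcc(rng.random((2180, 30)), rng.random((2180, 48)))
print(pcc.shape)   # (30, 48)
```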
Illustration of example video stimuli and their dominant dimensions. Each petal's length corresponds to the expression magnitude of a particular dimension; dimensions with negligible weight contributions are left unlabeled for visualization clarity.
Heatmap visualization of affective elicitation in video stimuli. The color gradient indicates each region's contribution to affective elicitation, with red areas representing stronger effects and blue areas weaker effects.
a, Specific emotional experiences were suppressed by reducing the activation values of targeted dimensions in the affective (SPoSE) embeddings. The first row displays frames from the original video. In the second row, red bounding boxes highlight the target dimension to be manipulated, with its corresponding label and original activation value (normalized to [0, 1]) annotated above. The third row presents the activation heatmap for this dimension. The fourth row shows video frames after decreasing the original activation value to 0.2, demonstrating that precisely the regions highlighted in the heatmap were modified. b, The corresponding emotional experience can be elicited in videos by augmenting specific dimension values in the affective embeddings. The panel displays edited video frames obtained by increasing the value of a specific dimension in the affective embedding to 0.8. Note: owing to space constraints, only the dimension label with the highest PCC is shown in the figure; see the main text for complete labels.
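At the embedding level, the manipulation described here amounts to clamping one coordinate of a video's affective (SPoSE) embedding to a target activation before the video is re-synthesized; the generative editing step itself is not shown. A minimal sketch, with the function name and dimension index chosen purely for illustration:

```python
import numpy as np

def set_dimension(embedding, dim, value):
    """Return a copy of the affective embedding with one dimension
    set to a target activation (embeddings normalized to [0, 1])."""
    edited = embedding.copy()
    edited[dim] = np.clip(value, 0.0, 1.0)
    return edited

# Toy example: a 30-d affective embedding for one video.
rng = np.random.default_rng(0)
z = rng.random(30)
z_suppressed = set_dimension(z, dim=7, value=0.2)  # attenuate the target dimension (panel a)
z_amplified = set_dimension(z, dim=7, value=0.8)   # amplify the target dimension (panel b)
# The edited embedding would then condition the downstream video-editing model.
```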