On Localization in Language Models

Yonatan Belinkov (Technion - Israel Institute of Technology)
Large Language Models and Transformers

Abstract

Questions of distributivity or localization of information have long plagued the fields of artificial intelligence and cognitive science. Should a single unit encode a single concept, or should all units encode all concepts? Distributed representations power today’s successful neural models in NLP and other domains. As models scale to billions of parameters, we seem to be moving further away from a localist view. In this talk, I will review recent work on identifying the role of individual components such as neurons and attention heads in language models. I will show that such components can be characterized, and that analyzing the internal structure and mechanisms of language models can elucidate their behavior in various cases, including memorization, gender bias, and factual recall. I will conclude by demonstrating how such analyses can inform mitigation procedures to make these models more robust and up-to-date.
