Teaching

Modern NLP: Mechanistic Interpretability of Large Language Models

A 20-hour course held within the national PhD program in AI and Society. It covers recent topics in the mechanistic interpretability of Large Language Models (LLMs), including:

Probing
Outliers in LLMs
Activation Steering
Sparse Autoencoders

The course includes guest lectures by Fabio Brau, Gabriele Sarti and William Rudman.

Giovanni Puccetti
