Seminar Explainable Machine Learning
(Summer term 2024)

Who, when, where

Who: Ulrike von Luxburg together with Robi Bhattacharjee, Eric Günther, Gunnar König, and Sebastian Bordt
When: Wednesdays 14:15 - 16:00 and two compact days, details below
Where: Maria-von-Linden-Strasse 6, Lecture hall ground floor / Seminar room 3rd floor
Language: English
Credit points: 3 CP

Ilias link


Modern machine learning methods, such as deep learning, random forests, or XGBoost, typically produce "black box models": they can excel at prediction, but it is completely unclear which criteria these predictions are based on. While there are many applications where this might not be an issue, there are others where a deeper understanding of the models is necessary: applications in medicine, applications in society (say, credit scoring), or applications in science, where we want to understand the underlying processes. Also, from the legal point of view, explanations are arguably requested in the new AI Act that regulates machine learning applications, and explanations are often considered a means to establish trust in machine learning applications. The field of explainable machine learning, often abbreviated as XAI, tries to develop methods and algorithms that supposedly produce explanations for machine learning models. In this seminar, we are going to discuss many of the standard approaches in this field. We will also critically discuss whether and under which conditions the suggested methods might achieve their goal.


This seminar is intended for master's students in machine learning, computer science, or related fields. Basic knowledge of machine learning, for example from at least one of the standard classes in the ML master's program, is required.


Participants can start registering now on Ilias. Registration will remain open until the end of the first week of term. If more students register than we can accommodate, we will make a random selection in week three of the term. Details will be explained in the seminar.

Please note that more than 50 people have already registered, and we can accommodate only around 20 students with a presentation. So if you are unsure whether you like the topic, you might register for some other seminar instead ...


  • Phase 1 (Mid April - Mid May): During the first four weeks, the seminar proceeds as a lecture: Ulrike von Luxburg and three of her postdocs give an overview of the basic methods and questions in the field.
  • Phase 2 (Mid May - End June): The seminar participants work on their individual presentations.
  • Phase 3 (End June / Beginning July): Presentations take place on two whole days (for dates, see the schedule below). Presentations need to be in English.
To pass the seminar, each participant has to give a presentation, act as a sparring partner for another paper, and be present at the compact seminar days. Details will be explained during the seminar.


  • 17.4. Lecture 1 (Ulrike von Luxburg): Introduction and Orga. Venue: Lecture hall MvL6, 14:15 - 15:45
  • 24.4. Lecture 2 (Gunnar König): title (Lecture hall, MvL6)
  • (1.5. no seminar, public holiday)
  • 8.5. Lecture 3 (Robi Bhattacharjee): LIME and SHAP (Lecture hall, MvL6)
  • 15.5. Lecture 4 (Sebastian Bordt): Interpretable Machine Learning. Attention, this lecture takes place in the MvL6 Glassroom, 3rd floor
  • 26.6. All day: Presentations by seminar participants (Lecture Hall MvL6)
  • 3.7. All day: Presentations by seminar participants (Lecture Hall MvL6)

List of all papers

  • Paper 1: D. S. Watson and M. N. Wright. "Testing conditional independence in supervised learning algorithms." Machine Learning 110.8 (2021): 2107-2129.
  • Paper 2: S. Wachter, B. Mittelstadt, and C. Russell. "Counterfactual explanations without opening the black box: Automated decisions and the GDPR." Harv. JL & Tech. 31 (2017): 841.
  • Paper 3: P. W. Koh, T. Nguyen, Y. S. Tang, S. Mussmann, E. Pierson, B. Kim, and P. Liang. "Concept bottleneck models." International conference on machine learning (ICML), 2020.
  • Paper 4: B. Kim, M. Wattenberg, J. Gilmer, C. Cai, J. Wexler, F. Viegas, and R. Sayres. "Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV)." International conference on machine learning (ICML), 2018.
  • Paper 5: M. Sundararajan, A. Taly, and Q. Yan. "Axiomatic attribution for deep networks." International conference on machine learning (ICML), 2017.
  • Paper 6: I. C. Covert, S. Lundberg, and S. Lee. "Understanding global feature contributions with additive importance measures." Neural Information Processing Systems (NeurIPS), 2020.
  • Paper 7: D. Slack, S. Hilgard, E. Jia, S. Singh, and H. Lakkaraju. "Fooling LIME and SHAP: Adversarial attacks on post hoc explanation methods." Conference on AI, Ethics, and Society, 2020.
  • Paper 8: J. Adebayo, J. Gilmer, M. Muelly, I. Goodfellow, M. Hardt, and B. Kim. "Sanity checks for saliency maps." Neural information processing systems (NeurIPS), 2018.
  • Paper 9: F. Poursabzi-Sangdeh, D. G. Goldstein, J. M. Hofman, J. W. Vaughan, and H. Wallach. "Manipulating and measuring model interpretability." CHI conference on human factors in computing systems, 2021.
  • Paper 10: S. Hooker, D. Erhan, P. Kindermans, and B. Kim. "A benchmark for interpretability methods in deep neural networks." Neural information processing systems (NeurIPS), 2019.
  • Paper 11: N. Nanda, L. Chan, T. Lieberum, J. Smith, and J. Steinhardt. "Progress measures for grokking via mechanistic interpretability." arXiv preprint arXiv:2301.05217 (2023).
  • Paper 12: A. Karimi, B. Schölkopf, and I. Valera. "Algorithmic recourse: from counterfactual explanations to interventions." Conference on fairness, accountability, and transparency (FAccT), 2021.
  • Paper 13: A. Geiger, H. Lu, T. Icard, and C. Potts. "Causal abstractions of neural networks." Neural Information Processing Systems (NeurIPS), 2021.
  • Paper 14: S. Dasgupta, N. Frost, and M. Moshkovitz. "Framework for evaluating faithfulness of local explanations." International Conference on Machine Learning (ICML), 2022.
  • Paper 15: D. Janzing, L. Minorics, and P. Blöbaum. "Feature relevance quantification in explainable AI: A causal problem." International Conference on artificial intelligence and statistics (AISTATS), 2020.
  • Paper 16: G. Hooker, L. Mentch, and S. Zhou. "Unrestricted permutation forces extrapolation: variable importance requires at least one more model, or there is no free variable importance." Statistics and Computing 31: 1-16, 2021.
  • Paper 17: S. Srinivas and F. Fleuret. "Full-gradient representation for neural network visualization." Neural information processing systems (NeurIPS), 2019.
  • Paper 18: S. Krishna, T. Han, A. Gu, J. Pombra, S. Jabbari, S. Wu, and H. Lakkaraju. "The disagreement problem in explainable machine learning: A practitioner's perspective." arXiv preprint arXiv:2202.01602 (2022).
  • Paper 19: A. Dombrowski, M. Alber, C. Anders, M. Ackermann, K. Müller, and P. Kessel. "Explanations can be manipulated and geometry is to blame." Neural information processing systems (NeurIPS), 2019.
  • Paper 20: B. Lengerich, S. Tan, C. Chang, G. Hooker, and R. Caruana. "Purifying interaction effects with the functional ANOVA: An efficient algorithm for recovering identifiable additive models." International Conference on Artificial Intelligence and Statistics (AISTATS), 2020.
  • Paper 21: J. D. Janizek, A. B. Dincer, S. Celik, H. Chen, W. Chen, K. Naxerova, and S. Lee. "Uncovering expression signatures of synergistic drug response using an ensemble of explainable AI models." BioRxiv (2021): 2021-10.
  • Paper 22: E. Candes, Y. Fan, L. Janson, and J. Lv. "Panning for gold: 'model-X' knockoffs for high dimensional controlled variable selection." Journal of the Royal Statistical Society Series B: Statistical Methodology 80.3 (2018): 551-577.