Moritz Haas
University of Tübingen
Department of Computer Science
Maria von Linden Str. 6
72076 Tübingen
Germany
Room: 30-5/A15
Phone: +49 (0)7071 29-70848
E-mail: mo.haas(at)uni-tuebingen.de
For an up-to-date list of my latest publications, please see Google Scholar. My goal is to develop a mechanistic understanding of deep learning that results in practical benefits. On the side, I am trying to improve statistical methods in climate science. Feel free to reach out if you are interested in either of the two topics!
Research
In May 2021, I started my PhD under the joint supervision of Ulrike von Luxburg in the Theory of Machine Learning group (TML) at the Department of Computer Science, University of Tübingen, and Bedartha Goswami from the Machine Learning in Climate Science group. I am a scholar in the International Max Planck Research School for Intelligent Systems (IMPRS-IS), a graduate school for PhD students from both the universities and the Max Planck Institute in Tübingen and Stuttgart.
For my master's thesis, I analysed Wasserstein GANs statistically. (pdf) Interestingly, our excess risk bound for unconditional WGANs captures a key advantage of generative models: since we can generate as many samples as we want, the generalization error is limited only by the critic network and the dataset size. If we generate enough samples (and assume that we find a global optimizer), the generator network may be arbitrarily large. We also show that large critic networks metrize weak convergence, that is, they are able to distinguish arbitrary pairs of distributions and guide the generator to reproduce the data distribution.
At the beginning of my PhD, we explored empirical distortions in climate networks that arise from limited amounts of noisy data (pdf). We also found that common resampling procedures for quantifying significant behaviour in climate networks do not adequately capture intrinsic network variance. While we propose a new resampling framework, the question of how to reliably quantify intrinsic network variance from complex climatic time series remains a matter of ongoing work.
In my second year, we explored when kernel and neural network models that overfit noisy data can generalize nearly optimally. Previous literature had suggested that kernel methods can only exhibit such "benign overfitting" if the input dimension grows with the number of data points. Together with David Holzmüller and Ingo Steinwart, we show that, while overfitting leads to inconsistency with common estimators, adequately designed spiky-smooth estimators can achieve benign overfitting in arbitrary fixed dimension. For neural networks with NTK parametrization, you just have to add tiny fluctuations to the activation function. It remains to be studied whether a similar adaptation of the activation function or some other inductive bias towards spiky-smooth functions can also lead to benign overfitting with feature-learning neural architectures and complex datasets. (pdf)
Lately, I have been interested in the signal propagation theory of neural networks, in particular throughout training. Naively scaling standard neural network architectures and optimization algorithms loses desirable properties such as feature learning in large models (see the Tensor Programs series by Greg Yang et al.). We show the same for sharpness-aware minimization (SAM) algorithms: there exists a unique nontrivial width-dependent and layerwise perturbation scaling for SAM that effectively perturbs all layers and results in width-independent dynamics. A crucial practical benefit is the joint transfer of the optimal learning rate and perturbation radius across model scales. In a second paper, we show that for the popular Mamba architecture, the maximal update parameterization and its related spectral scaling condition fail to induce the correct scaling properties, due to Mamba's structured HiPPO matrix and its selection mechanism. We derive the correct scaling using random matrix theory, which necessarily goes beyond the Tensor Programs framework.
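To make the idea of a layerwise perturbation radius concrete, here is a minimal sketch of one SAM step in plain Python. It is only an illustration under stated assumptions: the function names, the toy loss, and the radius values are hypothetical, the perturbation is normalized by the global gradient norm as in standard SAM, and the radii are not the width-dependent scaling derived in the paper.

```python
import math

def sam_step(params, grad_fn, lr, radii):
    """One SAM step with a separate perturbation radius per layer.

    params:  list of layers, each a list of float weights
    grad_fn: maps params to gradients in the same nested layout
    radii:   one perturbation radius per layer (hypothetical values)
    """
    grads = grad_fn(params)
    # Global gradient norm, as in standard SAM (fall back to 1.0 if it is zero).
    norm = math.sqrt(sum(g * g for layer in grads for g in layer)) or 1.0
    # Ascent step: move each layer along the normalized gradient,
    # scaled by that layer's own radius.
    perturbed = [
        [w + rho * g / norm for w, g in zip(layer_w, layer_g)]
        for layer_w, layer_g, rho in zip(params, grads, radii)
    ]
    # Descent step: apply the gradient taken at the perturbed point
    # to the original (unperturbed) weights.
    sam_grads = grad_fn(perturbed)
    return [
        [w - lr * g for w, g in zip(layer_w, layer_g)]
        for layer_w, layer_g in zip(params, sam_grads)
    ]

def toy_grad(params):
    # Gradient of the toy loss 0.5 * sum(w ** 2), which is simply w itself.
    return [[w for w in layer] for layer in params]

params = [[1.0, -2.0], [0.5]]
new_params = sam_step(params, toy_grad, lr=0.1, radii=[0.05, 0.01])
```

Setting every radius to zero recovers plain gradient descent, and using the same radius for all layers recovers standard SAM, so the per-layer radii are the only extra degree of freedom in this sketch.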
Publications
- Moritz Haas, Jin Xu, Volkan Cevher, Leena Chennuru Vankadara. Effective Sharpness Aware Minimization Requires Layerwise Perturbation Scaling, pdf (preprint coming soon), ICML 2024 Workshop "High-dimensional Learning Dynamics 2024: The Emergence of Structure and Reasoning".
- Leena Chennuru Vankadara, Jin Xu, Moritz Haas, Volkan Cevher. On Feature Learning in Structured State Space Models, (preprint coming soon), Spotlight at ICML 2024 Workshop "Next Generation of Sequence Modeling Architectures".
- Moritz Haas*, David Holzmüller*, Ulrike von Luxburg, Ingo Steinwart. Mind the spikes: Benign overfitting of kernels and neural networks in fixed dimension, pdf, NeurIPS 2023.
- Moritz Haas, Bedartha Goswami, Ulrike von Luxburg. Pitfalls of Climate Network Construction: A Statistical Perspective, pdf, Journal of Climate 2023.
- Moritz Haas, Stefan Richter. Statistical analysis of Wasserstein GANs with applications to time series forecasting, pdf, arXiv:2011.03074.