Tips for writing a Bachelor / Master thesis

This page is permanently under construction and not exhaustive. Whenever an important point occurs to me I am going to add it.

How long should it be?

A BSc thesis should be around 20-30 pages, an MSc thesis between 30 and 50 pages, in a standard page style. It can be shorter, in particular if it is a theoretical thesis. If you have many plots or many references, it can become a bit longer. But I am not going to read 100 pages of text, for sure.

Outline

The outline of your thesis can be quite flexible. But it has to ask a concrete question in the beginning, explain your approach to answer this question (think about whether your approach is really appropriate!), and give a concrete answer in the end.

By which criteria do I evaluate your thesis?

The most important aspect of a thesis is NOT that you prove your own new theorem or invent your own new algorithm. It is supposed to demonstrate that you can argue scientifically. My focus of evaluation is on the following aspects: Is the question well formulated? Did he/she attack the question in a reasonable way, obeying scientific standards? How did he/she evaluate the results? Did he/she argue in a correct way? Did he/she see the limitations of his/her own approach? Are the results convincingly described and interpreted? Just a tiny fraction of the overall mark has to do with "novelty" or "originality" in the sense of coming up with new ideas / proofs / algorithms.

German or English?

I do not care, as long as the language is correct and understandable. If you realize that you have difficulty expressing yourself in English, it might be better to write in German. For people whose English is already good, but who still want to improve, I warmly recommend the following book: Joseph Williams, Joseph Bizup: Style: Lessons in Clarity and Grace.

What plots are supposed to look like

A plot always has axis labels in a readable font and a title (describing parameter settings, for example). A plot always has a caption, and this caption needs to contain a concise summary of what the plot shows. The plot should be understandable without flipping back and forth between text and plot. If the plot shows experimental results, the caption should summarize the setup of the experiment (e.g., parameter choices), so that one would be able to reproduce the plot. The interpretation/discussion of the results in the plot typically belongs in the main text, not in the caption. If possible, plots should be black-and-white readable (if I print them on my black-and-white printer, I should be able to distinguish different curves).
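
A minimal sketch of such a plot, assuming you work in Python with matplotlib (any plotting tool is fine). The helper names bw_styles and plot_curves are made up for illustration. The idea is to draw every curve in black and to distinguish curves by line style and marker instead of color, so they survive a black-and-white printer.

```python
# Distinct line styles and markers keep curves apart without any color.
LINE_STYLES = ["-", "--", "-.", ":"]
MARKERS = ["o", "s", "^", "x"]


def bw_styles(n):
    """Return n distinct (linestyle, marker) pairs for black-and-white plots."""
    if n > len(LINE_STYLES) * len(MARKERS):
        raise ValueError("too many curves for one readable plot")
    return [(LINE_STYLES[i % len(LINE_STYLES)], MARKERS[i // len(LINE_STYLES)])
            for i in range(n)]


def plot_curves(x, curves, xlabel, ylabel, title, path):
    """Save a plot of several curves (dict: label -> y values) to `path`."""
    # Imported here so that bw_styles works even without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(5, 3.5))
    for (label, y), (ls, marker) in zip(curves.items(), bw_styles(len(curves))):
        ax.plot(x, y, color="black", linestyle=ls, marker=marker, label=label)
    ax.set_xlabel(xlabel, fontsize=12)  # readable axis labels
    ax.set_ylabel(ylabel, fontsize=12)
    ax.set_title(title, fontsize=12)    # e.g. the parameter settings
    ax.legend()
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
```

A call could look like plot_curves(n_values, {"algorithm A": err_a, "algorithm B": err_b}, "number of sample points n", "clustering error", "p_in = 0.5, p_out = 0.1", "error_vs_n.pdf"), where all names and values are placeholders. The caption still has to be written by hand in the thesis.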

References

Use BibTeX to generate references and use the natbib package in the author-year format:
\usepackage[round,comma]{natbib}
\bibliographystyle{plainnat}
This is important so that people who know the field can recognize a reference from the citation itself and do not have to look up every single one in the list of references.

Find out what a reference list is supposed to look like! For example, when you cite a journal paper, a complete reference consists of authors, title, journal, volume (number if it exists), page numbers, year. When you cite a book, always cite the publisher and always mention the section / page / theorem you refer to (so the reader can find it quickly in the book if he/she wants to look it up). When you cite conference papers, do cite them properly (at least the name of the conference and the year; page numbers are not necessary here). For the same venue always use the same citation style. Typically, we don't need URLs or ISBN numbers, unless you cite something that cannot be described otherwise. In the reference list, try to cite journal or conference papers, not technical reports or arXiv preprints (unless the paper only exists in this form). When you download BibTeX references from Google Scholar, you will need to hand-edit many of them, because they are often not accurate! Please also check out the comments on the paper writing page.

Evaluate your algorithms thoroughly

In many theses, you will have to implement an algorithm and evaluate its performance. My experience shows that nearly all students underestimate what this means. The goal of such an evaluation is NOT to show that it works in simple cases. The goal of an evaluation is to really test in which cases an algorithm works, in which cases it does not work, whether you can break its performance, how it behaves when you change its parameters and when you choose different types of input data. Note that I do not talk about a correct implementation here. I talk about all the questions that arise once you have a correct implementation.

Example: Suppose you come up with a new algorithm to find clusters in a graph and you want to evaluate its performance and compare to the state of the art. Then what you do is the following:
  • In the beginning you might just want to play with your algorithm and data sets to gain intuition about its behavior (exploratory phase). But at some point, you will have to formulate a concrete question that you want to answer by simulations, and think about the most appropriate setup to answer that question. Both the question and the choice of the setup should be discussed in the thesis.
  • You generate lots of toy graphs for which you know the ground truth (the cluster structure). To this end, you try to choose a large variety of such graphs that cover lots of different properties a graph could have: hidden partition model (expander graph, small world), k-nearest neighbor graph (no expander, long paths), preferential attachment graph (power law behavior), and so on.
  • In all these models, you vary lots of parameters (number of sample points, number of clusters, clusters of different sizes or densities, dimensionality of your data points, ... )
  • Then you run your algorithm and you systematically play with the parameters your algorithm has.
  • You think about various criteria to evaluate the result of an algorithm on a data set (error with respect to ground truth, cut size, balance of the clusters, running time, etc.).
  • You generate lots (!) of plots (!), not tables, that systematically show the evaluation criteria against the parameters and changes in the model. Take your time to think about what and how to visualize.
  • Up to here, everything is pretty automatic, but now the work starts: you actually look at your plots. For each scenario you think about what result you would have expected, and whether this is what you see in the plot. If yes, good. If no, you start thinking, debugging, understanding. Often, if plots do not follow your expectation, you still have bugs in your code. If you are reasonably sure that this is not the problem, then there might be something about the algorithm that you still have to understand. This implies that you are not done once you have produced 100 plots and put them in the thesis. You need to tell me why they are interesting, whether they show what you expected, what I am supposed to conclude from them. This, in my opinion, is one of the most important parts of your thesis.
  • In your thesis, you might just describe a condensed version of all your results. What is the most important message? I am not going to look at 100 plots, so pick the most relevant ones. And don't use tables...
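
The systematic part of the steps above can be sketched roughly as follows, in plain Python. Everything here is an assumption made for illustration: the toy model is a two-cluster hidden partition graph, majority_vote_clustering is a hypothetical stand-in for the algorithm you actually want to evaluate, and the error is the misclassification rate with respect to the ground truth. A real evaluation would cover more graph models, more criteria and the parameters of the algorithm itself.

```python
import itertools
import random
import statistics


def planted_partition(n, p_in, p_out, rng):
    """Toy graph with known ground truth: two equal-sized clusters."""
    truth = [0 if i < n // 2 else 1 for i in range(n)]
    neighbors = [[] for _ in range(n)]
    for i, j in itertools.combinations(range(n), 2):
        p = p_in if truth[i] == truth[j] else p_out
        if rng.random() < p:
            neighbors[i].append(j)
            neighbors[j].append(i)
    return neighbors, truth


def majority_vote_clustering(neighbors, rng, sweeps=10):
    """Hypothetical stand-in for the algorithm under test."""
    labels = [rng.randrange(2) for _ in neighbors]
    for _ in range(sweeps):
        for v, nbrs in enumerate(neighbors):
            ones = sum(labels[u] for u in nbrs)
            if 2 * ones != len(nbrs):  # keep the old label on a tie
                labels[v] = int(2 * ones > len(nbrs))
    return labels


def clustering_error(pred, truth):
    """Fraction of misclassified nodes, up to swapping the two labels."""
    wrong = sum(p != t for p, t in zip(pred, truth)) / len(truth)
    return min(wrong, 1.0 - wrong)


def sweep(sizes, p_outs, p_in=0.5, repeats=5):
    """Run the algorithm over a parameter grid, several seeds per setting."""
    rows = []
    for n, p_out in itertools.product(sizes, p_outs):
        errors = []
        for seed in range(repeats):
            rng = random.Random(seed)  # fixed seeds make every row reproducible
            neighbors, truth = planted_partition(n, p_in, p_out, rng)
            pred = majority_vote_clustering(neighbors, rng)
            errors.append(clustering_error(pred, truth))
        rows.append({"n": n, "p_in": p_in, "p_out": p_out,
                     "mean_error": statistics.mean(errors),
                     "std_error": statistics.stdev(errors)})
    return rows


if __name__ == "__main__":
    for row in sweep(sizes=[20, 40], p_outs=[0.05, 0.2, 0.4]):
        print(row)
```

Each row holds one parameter setting together with the mean and standard deviation of the error over several random graphs, which is the kind of data you would then plot against n or p_out. The code only produces the numbers. Looking at them and explaining them is still your job.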

Write concisely and correctly

  • In the part where you introduce the background, describe the maths or the algorithm, make sure your mathematical notation is correct, all notions get introduced properly, etc.
  • Make sure you describe all your experiments in such a way that I really know what you have done (science needs to be reproducible). How exactly did you generate the toy data sets? Which algorithms exactly did you use (and did you implement them yourself or did you use a toolbox)? How did you set the parameters of your algorithms? What exactly is shown in the plots? etc.
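
One possible way to make this easier, sketched in Python under the assumption that each experiment is driven by a small dictionary of parameters: store the parameters (including the random seed) in the same file as the results. The functions run_experiment and run_and_record and the keys of the config dictionary are hypothetical, and the body of run_experiment is only a placeholder for the real computation.

```python
import json
import platform
import random
import sys


def run_experiment(config):
    """Hypothetical experiment: replace the body with the real computation."""
    rng = random.Random(config["seed"])
    samples = [rng.gauss(0.0, config["noise"]) for _ in range(config["n_samples"])]
    return {"mean": sum(samples) / len(samples)}


def run_and_record(config, path):
    """Store the results together with everything needed to rerun them."""
    record = {
        "config": config,                    # all parameter choices, incl. seed
        "python": sys.version.split()[0],    # interpreter version used
        "platform": platform.platform(),
        "results": run_experiment(config),
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f, indent=2, sort_keys=True)
    return record
```

With such a record next to every plot, the parameter choices that go into the caption and the experiment description can be copied from the file instead of being reconstructed from memory.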

Lists of figures, tables, abbreviations

List of figures and tables: not needed and useless.

List of abbreviations: if you need a list of abbreviations, then it likely means that your thesis is hard to read (which is bad). You may use a few abbreviations (introduce them in the text!). But only use abbreviations that you really need often, otherwise just use the full word. In general, try to avoid abbreviations.

How much time do I need to grade your thesis?

If you need to receive the grade for your thesis by a certain date, then you have to inform me well in advance. Typically, I need a time window of about four weeks to read and grade your thesis (longer during vacation time).