Since writing a dissertation is usually something that you do only once, I gathered a lot of knowledge that I probably won't need anymore. This is a brief summary of how the LaTeX code of my dissertation is structured. I hope it can be an inspiration for someone else. As I used continuous integration to automatically build the PDF and check PDF/A-1b validity, I will also talk about this in the end.
Include vs. Input
If you want to split your LaTeX code into multiple files you can use
'\input
' or '\include
'.
\input{mychapter}
will have the same effect as copying everything from mychapter.tex
here.
\include{mychapter}
will create a page break before it includes the file. It does not work in the
preamble though. The benefit of \include
is that you can put
\includeonly{mychapter, mychapter2}
in the preamble to only compile a selected subset of all chapters to make compilation much faster. It does not mess up the references (e.g., table of contents, bibliography, and glossary). This can be used if you work on a specific chapter. I found this out at the end of writing my thesis. That is why I put it in the beginning here. Don't make the same mistake. This will save you lots of time.
I use \input
for everything in the preamble (e.g., package imports,
glossary entries) and \include
for my chapters.
Document Class
\documentclass[
draft=false,
paper=a4,
paper=portrait,
pagesize=auto,
fontsize=11pt,
version=last,
headings=twolinechapter,
listof=totoc,
listof=chapterentry]
{scrbook}
I highly recommend to use the book class of Koma-Script for a dissertation.
This is my configuration for the final document. One of the more obscure
options here is 'headings=twolinechapter
', which will print "Chapter X."
in a separate line before the title of a chapter.
'listof=totoc,listof=chapterentry
' will add lists of tables and figures
as chapters to the table of contents.
I recommend to use Koma-Script in combination with the following package:
\usepackage{scrhack}
This will fix some common problems
Links:
Document Structure
This is the overall structure of the LaTeX document.
A large document is often split into (1) something before the main text (front matter), (2) main text (main matter), and (3) something after the main text (back matter).
\documentclass{...}
% ... many imports and definitions here ...
\begin{document}
\frontmatter
% title page(s)
\tableofcontents
% acknowledgments
% abstract
This is everything before the main text. '\frontmatter
' will activate
roman page numbers and deactivate chapter numbering.
\mainmatter
\part{Title}
\chapter{Title}
Text
\section{Title}
Text
\subsection{Title}
Text
\subsubsection{Title}
Text
'\mainmatter
' will switch to Arabic page numbers and turn on chapter
numbering. So here we have all the main text with numerous subdivisions.
I avoided to have less than two subdivisions per level. If you have,
e.g., a section with only one subsection, you should think about whether
you really need a subsection here. You might want to integrate the subsection
in the section or make a new section instead of a subsection.
\appendix
\chapter{Title}
'\appendix
' will switch chapter numbering to letters.
\backmatter
\printglossaries
\listoffigures
\listoftables
\chapter{Bibliography}
\printbibliography[heading=none]
\end{document}
'\backmatter
' will turn of chapter numbering. Putting the list of figures
and tables to the end is unusual. These typically are part of the front
matter. I found it to distracting at the beginning though and put it to
the end. We will come to the glossary and the bibliography later.
Links:
Bibliography and Citations
Although I used bibtex before, I chose to use biber and biblatex for my thesis.
\usepackage[
style=alphabetic,
natbib=true,
backend=biber,
maxnames=2,
maxbibnames=99]{biblatex}
\addbibresource{literature.bib}
The main reason for this decision is that it allows to easily create environments for which you can generate a specific bibliography. I used this feature to generate for each chapter a list of corresponding publications. Here is an example:
Code:
\paragraph{Related Publications} \mbox{}
\begin{refsection}
\printbibliography[heading=none]
Blablabla.
Blablabla \citep{key1}.
Blablabla.
Blablabla \citep{key2}.
Blablabla.
Blablabla \citep{key3}.
\end{refsection}
|
Result: |
The option 'natbib
' ensures that you can use '\citep{key}
' to generate
citations of the form '(Names et al., Year)' and '\citet{key}
' to generate
'Names et al. (Year)'.
Now you have to run 'biber documentname
' (without file ending) after the
first pass of your LaTeX engine. A useful feature of biber is that it can
also check your .bib files for validity with 'biber --tool -V literature.bib
'.
For better line breaks I used the command '\sloppy
' before the complete
bibliography. Otherwise I had many lines (URLs, DOIs, etc.) that would go over
the text borders. It allows more space between words though.
Note that "[w]hen using babel [...] with biblatex, loading csquotes is recommended to ensure that quoted texts are typeset according to the rules of your main language." (source)
\usepackage[english=american]{csquotes}
csquotes
is also good if you want to quote in the text:
\textcquote{CitationKey}{Some text here with ellipses at the end \textelp{}}
or as a new block:
\blockcquote[pages 100--101]{CitationKey}{Some long text.}
Links:
List of Abbreviations and Acronyms
In smaller documents it is easy to keep track of abbreviations and whether you already introduced them or not. With a document exceeding 100 pages I would suggest to use tools that support a proper glossary. LaTeX can do this automatically.
\usepackage[toc]{glossaries}
To create a glossary you have to call
\makeglossaries
in the document. Then you can define abbreviations with
\newacronym{MDP}{MDP}{Markov decision process}
Now
\gls{MDP} or \glspl{MDP}
will result in "Markov Decision Process (MDP) or MDPs" ('\glspl
' means plural).
The abbreviation will be introduced once and then only the abbreviation will be
printed. No need to think about it. Later in the document you can print the
glossary with
\printglossaries
The package option 'toc
' will add it to the table of contents.
Note that the external command 'makeglossaries documentname
' (without file
ending) has to be executed after the first pass of your LaTeX engine.
Links:
Document Layout
There are several packages that can help you to fine-tune the layout.
\usepackage{layout}
This package provides the command \layout
that will print page layout
variables for your document. This is the result in my case (shows both sides):
If you want to find out specific dimensions in the unit that you are interested in, the following package is useful:
\usepackage{layouts}
For example, put
\printinunitsof{mm}\prntlen{\textwidth}
\printinunitsof{mm}\prntlen{\textheight}
in the document for millimeter. I found it useful to find out the text dimensions and then scale figures appropriately without rescaling them in LaTeX.
A good way to check if there are any overfull horizontal boxes is the
'showframe
' option of the geometry package:
\usepackage[showframe]{geometry}
This will draw lines around the important areas of each page.
The following package allows you to draw a grid on every page to precisely place elements.
\usepackage[grid=true,gridBG=true]{eso-pic}
Links:
The Mean Printer and Marginal Notes
I printed my dissertation on my own. It turned out that the printer that I used (with the specific driver that I used) didn't print 4.23 mm at each side of the paper. Unfortunately, I had some marginal notes that the printer wouldn't print completely; one and a half letter were missing. I fixed that by setting
\usepackage[marginparwidth=75pt]{geometry}
after using the layout package to find out the previous marginparwidth
and
converting 4.23 mm to pt according to this table.
Since there was now less space for the margin notes, LaTeX couldn't find
a good way to break lines and I modified my command to make notes on
the margin to
\newcommand{\marginparc}[1]{%
\ifthispageodd
{\marginpar{\raggedright\footnotesize #1}}
{\marginpar{\raggedleft\footnotesize #1}}}
which works only with Koma-Script in this form. The text is smaller and it is aligned to the direction of the page's center. But the conclusion of this should be: check the printer before you print!
Encoding
If you use pdflatex and have to write accented characters (and this is very likely if you want to cite anything from authors with Portuguese, Spanish, French, Italian, German, Danish, Norwegian, Swedish, Finnish, ... names) you should better import the fontenc package before inputenc. It enables proper hyphenation of words that contain accented characters and allows to copy those characters from the output document. If you want to write your thesis in UTF-8 you should import inputenc. This is not required if you build your thesis with a UTF-8 based engine like lualatex though.
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
Links:
Language
This is required in my thesis because it is written in English but has a German summary in the beginning because this is required by my university.
\usepackage[ngerman,english]{babel}
Babel changes the language of a document, that is, translates "table of contents", "chapter", citations, and dates if the default language is not English. It will also apply proper hyphenation to the text. It is possible to use multiple languages. The default language is the last option of the package. We can switch to other languages per section with
\begin{otherlanguage}{ngerman}
Etwas Text in deutscher Sprache...
\end{otherlanguage}
Links:
Figures and Illustrations
Wide Image
\usepackage{chngpage}
This is something that you usually should not do, but sometimes I found it aesthetic to extend a figure a bit on one side:
\hfuzz=100pt % suppress warnings
\begin{adjustwidth}{-\evensidemargin-0.8in}{-\rightmargin}
\includegraphics[width=0.92\paperwidth]{path}
\end{adjustwidth}
\hfuzz=0pt
Note that this is for an image on a left page. On a right page
you have to play around with \oddsidemargin
and \leftmargin
.
Links:
Full Page Image
\usepackage{floatpag}
This package provides the command '\thisfloatpagestyle
' to remove the header
and the page number and I used it for large figures that fill an entire page.
\begin{figure}
\thisfloatpagestyle{empty}
\includegraphics[width=\textwidth]{figure}
\caption{Bla bla.\label{fig:bla}}
\end{figure}
Links:
Barrier for Figures
Precise figure placement is hard.
\usepackage{placeins}
This package provides the command
\FloatBarrier
that prevents floats from moving past this line. I found this useful to force LaTeX to make a page with multiple figures.
Links:
Figure Surrounded by Text
\usepackage{wrapfig}
Example (figure on the right side):
\begin{wrapfigure}{r}{0.5\textwidth}
\centering
\includegraphics{figure}
\caption{Bla bla.\label{fig:bla}}
\end{wrapfigure}
Links:
Multiple Subfigures
In papers, you often see floats that consist of multiple figures that were arranged manually, for example, with a tabular environment. You don't have to do that. Use this package.
\usepackage[caption=false]{subfig}
It provides the command '\subfloat
':
\begin{figure}
\centering
\subfloat[Caption of first subfigure.\label{fig:bla1}]
{\includegraphics[width=0.45\textwidth]{subfigure1}}
\hfill
\subfloat[Caption of second subfigure.\label{fig:bla2}]
{\includegraphics[width=0.45\textwidth]{subfigure2}}
\caption{Caption of figure.\label{fig:bla}}
\end{figure}
If you have multiple subfigures with different heights it is difficult to arrange them vertically. There is a package for that.
\usepackage[export]{adjustbox}
Now, to arrange two figures vertically at the center, you can use the option
'valign=m
' of '\includegraphics
' that is provided by adjustbox and you
should add a phantom box of the size of the second figure to the first one.
\begin{figure}
\centering
\subfloat[Bla 1.]
{%
\includegraphics[width=0.45\textwidth,valign=m]{subfigure1}%
\vphantom{\includegraphics[width=0.45\textwidth,valign=m]{subfigure2}}%
}
\hfill
\subfloat[Bla 2.]
{\includegraphics[width=0.45\textwidth,valign=m]{subfigure2}}
\caption{Bla.}
\end{figure}
Links:
Tables
More Beautiful Tables
Here (in particular slide 8) is a good guide on how to make nice tables.
Table over Multiple Pages
In a few cases I had to make tables that are longer than one page.
\usepackage{longtable}
You can define a table head that appears on every page
(everything before '\endhead
').
\begin{longtable}{p{2cm}p{11.8cm}}
\toprule % you need the package 'booktabs' for this
A & B\\
\midrule
\endhead
1 & 2\\
1 & 2\\
1 & 2\\
1 & 2\\
\bottomrule
\end{longtable}
The rules are defined in the package booktabs.
\usepackage{booktabs}
Links:
Code Listing
For code formatting I use the following package:
\usepackage{listings}
Example:
\begin{algorithm}[t]
\begin{lstlisting}[language=Python,basicstyle=\footnotesize]
import numpy as np
x = np.linspace(0, 1, 101)
\end{lstlisting}
\caption{Some code.}
\label{alg:example}}
\end{algorithm}
Links:
Pseudocode
I use the following two packages to typeset pseudocode.
\usepackage{algorithm}
\usepackage{algpseudocode}
An algorithm in a float environment looks like this.
\begin{algorithm}
\begin{algorithmic}[1]
\Require algorithm inputs here
\State $a \gets 5$\\
\Comment{assign 5 to $a$}
\While{not converged}
\State $b \gets a + 1$
\For{$i \in \{1, \ldots, N\}$}
\State $b \gets 2 b$
\EndFor
\EndWhile
\end{algorithmic}
\caption{Caption.}
\label{alg:example}
\end{algorithm}
Links:
Math Fonts
\usepackage{newtxmath}
%\usepackage{amssymb} % conflicts with newtxmath
A font similar to Times New Roman for math. The package also provides several
typical mathematical symbols. Thus, it collides with amssymb
and you can't
use both. This is how the result looks like:
To change the appearance of '\mathcal
' I use the following package:
\usepackage[mathcal]{euscript}
Links:
SI Units
For consistent appearance of units I highly recommend this package.
\usepackage{siunitx}
Now you can write in your text:
\SI{1}{\metre}
\SI{2}{\centi\meter}
and much more complex units. You also don't have to think about whether you had put a space between the number and the unit before. The package automatically does this for you.
Links:
Todo Notes
I found it useful to put notes in the PDF output.
\usepackage{todonotes}
Use the option [disable]
for final version.
\todo{this will create a note on the margin}
\todo[inline]{this will create a comment in the text}
You can also print a list of all notes with
\listoftodos
Links:
Beginning of a Chapter
I like to put a related quote at the beginning of a chapter. You can do that with the scrbook class.
Code:
\renewcommand{\dictumwidth}{0.66\textwidth}
\dictum[\citet{Asada1996}]{%
The ultimate goal of AI and Robotics is to realize autonomous agents that
organize their own internal structure in order to behave adequately with
respect to their goals and the world.\\\textbf{That is, they learn.}}
|
Result: |
Another chapter looks like this at the beginning:
There are two elements here. The first one is the style of the heading, which can be defined in scrbook like this:
\definecolor{chaptercolor}{RGB}{152,152,152}
\renewcommand*\chapterheadstartvskip{\vspace*{45pt}}
\renewcommand*{\chapterformat}{%
\parbox{\textwidth}{\hfill\fontsize{35.83}{42}\selectfont\color{chaptercolor}\chapappifchapterprefix{\ }\bfseries\thechapter\autodot\enskip}}
\addtokomafont{disposition}{\normalfont\bfseries}
\addtokomafont{chapterprefix}{\mdseries}
The image at the top is included with tikz
:
% before document:
\usepackage{tikz}
% include image:
\tikz[remember picture,overlay] \node[opacity=0.75,inner sep=0pt] at (8,6.95){\includegraphics[width=\paperwidth]{image}};
If you want to do this, the image should of course be somewhat related to the content of the chapter.
Widows and Orphans
"In typesetting, widows and orphans are lines at the beginning or end of a
paragraph which are left dangling at the top or bottom of a page or column,
separated from the rest of the paragraph."
(Wikipedia)
A single line at the top or bottom of a page is just not aesthetic and we want
to avoid that. The simplest trick is to set high penalties for them globally in
the preamble (which means before '\begin{document}
'):
% penalizes orphans
\clubpenalty=10000
% penalizes widows
\widowpenalty=10000
% penalizes widows that are immediately followed by a formula \[ ... \]
\displaywidowpenalty=10000
Another trick that I used often is to enlarge a page by one line with
\enlargethispage{\baselineskip}
It looks better if both the left and right page have the same length, so I typically extended them both.
Links:
Prior Publications
Although my dissertation is a monograph, large parts of it have been published before. Prior publication must be indicated. This is good scientific practice. Otherwise, for example, survey papers can be distorted. I put marginal notes at the beginning of a chapter or section that refer to publications that the text is based on. At the end of each chapter the corresponding publications are listed. At the same place I describe my contributions and the contributions of my co-authors.
PDF/A-1b
My library requires a specific format for electronic publication of the thesis: PDF/A-1b.
The first step to ensure that your document is compatible is to test for
compatibility. There are many tools out there that claim to enable this but
many of them don't actually work. The tool that I relied on in the end is
veraPDF. Since it is used more as a GUI tool, I had
to find something that has a similar functionality but works from a command
line to integrate it into continuous integration. For that I use Apache
preflight 1.8.16 (available here),
which was compatible to the package openjdk-13-jdk
that I used in my Ubuntu
20.04 docker image that I used for continuous integration. I had problems
setting up preflight 2.X. Although preflight complains about some issues that
veraPDF does not see, these can be ignored so that I could test for PDF/A-1b
during continuous integration with good enough certainty.
If you only want to check whether all fonts are embedded, you can use
pdffonts document.pdf
and check the column emb.
The main problem with PDF/A-1b compatibility are PDF figures. I work a lot with inkscape for illustrations and matplotlib for plots. My solution for this problem is to use ghostscript to convert every PDF that is included in the main document to PDF/A-1b:
gs -dPDFA -dBATCH -dNOPAUSE -dNOOUTERSAVE -sColorConversionStrategy=UseDeviceIndependentColor -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite -sPDFACompatibilityPolicy=1 -sOutputFile=output_figure.pdf input_figure.pdf
To make the main LaTeX document compatible I use the package pdfx, which already takes care of a lot of problems and should be important almost at the beginning:
\usepackage[a-1b]{pdfx}
However, this is not enough to fix all problems in my case so that I also had to let ghostscript convert the whole document.
Continuous Integration
I used git for version control and continuous integration to automatically build and validate the latest version of my thesis. I set up a docker container based on a recent Ubuntu to match my system.
FROM ubuntu:20.04
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update -qq && apt-get install -y \
texlive-full \
libreoffice \
pandoc \
git \
wget \
sudo \
openjdk-13-jdk \
poppler-utils \
ghostscript \
pdftk
RUN mkdir -p /opt/bin
COPY preflight-app-1.8.16.jar /opt/bin/preflight-app-1.8.16.jar
You certainly don't need all of these packages (e.g., libreoffice or pdftk).
As mentioned previously, preflight was used to check PDF/A-1b compatibility.
The docker image is also available as 'af01/dissertation
' from dockerhub.
I used GitLab CI. My setup looks similar to this (some checks are omitted):
stages:
- generate
- postprocessing
- evaluate
genpdf:
stage: generate
image: af01/dissertation
script:
- pdflatex -interaction=nonstopmode dissertation.tex
- biber dissertation
- makeglossaries dissertation
- pdflatex -interaction=nonstopmode dissertation.tex
- pdflatex -interaction=nonstopmode dissertation.tex
- cat dissertation.log
artifacts:
paths:
- $CI_PROJECT_DIR/dissertation.pdf
- $CI_PROJECT_DIR/dissertation.xmpdata
expire_in: 2 weeks
when: always
tags:
- docker
postprocesspdfa:
stage: postprocessing
image: af01/dissertation
script:
- scripts/convert_pdfa.sh dissertation dissertation_pdfa
- cp dissertation.xmpdata dissertation_pdfa.xmpdata
artifacts:
paths:
- $CI_PROJECT_DIR/dissertation_pdfa.pdf
- $CI_PROJECT_DIR/dissertation_pdfa.xmpdata
expire_in: 4 days
when: always
tags:
- docker
checkpdfa:
stage: evaluate
image: af01/dissertation
script:
- java -jar /opt/bin/preflight-app-1.8.16.jar dissertation_pdfa.pdf > pdfareport.txt || echo "Preflight report is not empty"
- cat pdfareport.txt
- scripts/check_pdfareport.sh pdfareport.txt
artifacts:
paths:
- $CI_PROJECT_DIR/pdfareport.txt
expire_in: 1 week
when: always
tags:
- docker