Complex text formatting in Matplotlib using LaTeX

To create complex text formatting in Matplotlib we have to use LaTeX. The standard text methods in Matplotlib use a text object that formats the entire string: this means we can make all of a string bold but not part of it. For complex formatting Matplotlib lets us delegate all text handling to LaTeX. By using LaTeX mark-up within the string we can do complex formatting like partially bolding some words. Figure 1 below demonstrates some of the formatting possibilities using LaTeX.

Figure 1: LaTeX formatting example

This post covers how to set-up LaTeX on Linux (Ubuntu), how to format labels and annotations, and some of the gotchas I've discovered. We'll reserve colouring text for another time.

It builds on previous posts that covered text handling and describing plots in Matplotlib: see styling plots and standard text handling.

Matplotlib was primarily developed as a scientific visualisation library, so it's not that surprising that it uses LaTeX [1] for complex formatting. It's fairly complicated to set up and use if you don't know LaTeX. I'm not an expert on LaTeX so treat this post with caution! Everything was tested on Ubuntu 14.04 LTS and Python 3.

To use LaTeX to format Matplotlib text we have to use LaTeX for all text formatting [2]. The downside is that processing a plot is significantly slower. The output formats (backends) that support LaTeX are limited to AGG, PS, PDF and PGF. Consequently, to create SVG it's necessary to follow a conversion process, which means two steps to create a Web ready image. Finally, the grey backgrounds in some styles didn't show up (e.g. ggplot, fivethirtyeight and BMH), which makes ggplot in particular unusable.

With those constraints in mind, our first challenge is to set-up LaTeX.

Install LaTeX

TeX Live is a distribution of LaTeX [3] that's available through the Ubuntu repositories. Installation is simple:

$ sudo apt-get install texlive
$ sudo apt-get install texlive-latex-extra
$ sudo apt-get install texlive-latex-recommended
$ sudo apt-get install dvipng

This is a very large download (~500MB); you can reduce it a little by removing the documentation.
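Before going further it can help to confirm that the programs are actually on the PATH. The following is a minimal sketch using only the Python standard library. The helper name missing_tools and the list of tool names are assumptions taken from the commands used in this post (xelatex is only needed for the XeTeX set-up later on).

```python
import shutil

# Executables this post relies on. The list is an assumption drawn from the
# commands used in the text; adjust it to match your own set-up.
REQUIRED_TOOLS = ["latex", "pdflatex", "dvipng", "xelatex"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

An empty result only shows that the executables exist, not that the LaTeX installation is complete, so the tests in the next section are still worth running.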

Test LaTeX

There are a couple of resources on checking that your LaTeX environment is working correctly [4]. The core of the instructions is to create a file called 'test.tex' and put the following in it [5].

\documentclass[a4paper,12pt]{article}
\begin{document}

The foundations of the rigorous study of \emph{analysis}
were laid in the nineteenth century, notably by the
mathematicians Cauchy and Weierstrass. Central to the
study of this subject are the formal definitions of
\emph{limits} and \emph{continuity}.

Let $D$ be a subset of $\bf R$ and let
$f \colon D \to \mathbf{R}$ be a real-valued function on
$D$. The function $f$ is said to be \emph{continuous} on
$D$ if, for all $\epsilon > 0$ and for all $x \in D$,
there exists some $\delta > 0$ (which may depend on $x$)
such that if $y \in D$ satisfies
\[ |y - x| < \delta \]
then
\[ |f(y) - f(x)| < \epsilon. \]

One may readily verify that if $f$ and $g$ are continuous
functions on $D$ then the functions $f+g$, $f-g$ and
$f.g$ are continuous. If in addition $g$ is everywhere
non-zero then $f/g$ is continuous.

\end{document}

Run the following:

# converts the tex file to a dvi file
$ latex test.tex
# open the dvi file viewer
$ xdvi test.dvi
# convert the tex file to a pdf
$ pdflatex test.tex
# open the pdf with whatever pdf viewer is installed
$ xdg-open test.pdf

The first command converts test.tex to test.dvi and the second command lets you view it with xdvi so you can check the formatting. Rather than converting to DVI, the third command converts from LaTeX directly to PDF, and the fourth opens the result.

Test LaTeX in Matplotlib

With LaTeX installed and functioning on the system, the next step is to confirm it's working correctly in Matplotlib. The Matplotlib documentation covers using LaTeX extensively, see Text rendering with LaTeX. The easiest way to check is:

  1. Download the example from the documentation.
  2. Run the demo as python3 tex_demo.py
  3. Check the created file with xdg-open tex_demo.png

You should see something similar to Figure 2.

Figure 2: Standard Matplotlib TeX demo

Setting Matplotlib for PDF and XeTeX

To create plots we have to decide which output format to use, as conversion from LaTeX to an output format isn't handled correctly by all of them. For example, if we format some text using LaTeX in Matplotlib and save directly as SVG (with plt.savefig()) the backend will not process this correctly and we'll lose the formatting. Outputting from Matplotlib as PDF is the best option as it supports vector graphics output. As a second step we can use pdf2svg or Inkscape to convert the PDF to SVG.

For processing we'll hand off fonts and the mark-up of our plot to LaTeX. This means we have to set up LaTeX fully for processing all text handling elements, and tell Matplotlib's native text handling to get out of the way. It's surprisingly complicated!

LaTeX is a mature system that was created before modern standard fonts (e.g. TTF and OpenType) so by default it's unaware of them. To use standard fonts we set Matplotlib to use PGF [6] and the XeTeX processor as these are font aware. The first step is to install XeTeX:

$ sudo apt-get install texlive-xetex

In our plotting source code we tell Matplotlib to use the PGF backend for processing PDF output. This fragment of code goes at the top of a plot:

import matplotlib
from matplotlib.backends.backend_pgf import FigureCanvasPgf
matplotlib.backend_bases.register_backend('pdf', FigureCanvasPgf)

By default LaTeX will use a different font to the general font that Matplotlib uses, so it's best to set the fonts explicitly and to tell it how we want the Figure set up. It's possible to define these settings with the more Pythonic plt.rcParams['blah']='blah' but there are a lot of them, so it's easier to do this [7]:

pgf_with_latex = {
    "pgf.texsystem": "xelatex",         # use Xelatex which is TTF font aware
    "text.usetex": True,                # use LaTeX to write all text
    "font.family": "serif",             # use serif rather than sans-serif
    "font.serif": "Ubuntu",             # use 'Ubuntu' as the standard font
    "font.sans-serif": [],
    "font.monospace": "Ubuntu Mono",    # use Ubuntu mono if we have mono
    "axes.labelsize": 10,               # LaTeX default is 10pt font.
    "font.size": 10,
    "legend.fontsize": 8,               # Make the legend/label fonts a little smaller
    "xtick.labelsize": 8,
    "ytick.labelsize": 8,
    "pgf.rcfonts": False,               # Use pgf.preamble, ignore standard Matplotlib RC
    "text.latex.unicode": True,
    "pgf.preamble": [
        r'\usepackage{fontspec}',
        r'\setmainfont{Ubuntu}',
        r'\setmonofont{Ubuntu Mono}',
        r'\usepackage{unicode-math}',
        r'\setmathfont{Ubuntu}'
    ]
}

matplotlib.rcParams.update(pgf_with_latex)

The first option, pgf.texsystem, tells Matplotlib to use the xelatex program to process the PGF backend. The second option, text.usetex, tells Matplotlib that all text should be processed using LaTeX.

The various font and axes lines set how Matplotlib processes parts of the plot, we covered many of these in a previous post. Defining pgf.rcfonts to False means that the backend will obey the fonts defined in the pgf.preamble rather than over-riding and using whatever is in your Matplotlib configuration parameters (e.g. ~/.matplotlibrc). The benefit is we're explicitly defining how XeTeX will function, and there's no risk of confusion with Matplotlib using settings from elsewhere. We also tell XeTeX to use unicode (text.latex.unicode) which allows us to send extended characters [8].

At this point we've told Matplotlib how to handle the plot and that it should hand over to the LaTeX system. Next, we have to tell the LaTeX processor how it should handle the incoming stream.

The pgf.preamble section has directives that control how the xelatex command processes the document for PGF output. These are LaTeX commands to load packages and alter settings. If we were using the standard LaTeX backend then we'd provide an equivalent text.latex.preamble section. We set \usepackage{fontspec} so that we can define the fonts in LaTeX output, specifically \setmainfont{Ubuntu} and \setmonofont{Ubuntu Mono}. The fontspec package is part of the Ubuntu texlive-latex-recommended package which you may need to install.

This set-up is sufficient to show the Ubuntu font for text output like annotate() and xlabel().
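The dictionary above reflects the Matplotlib release this post was tested with. As far as I'm aware, newer releases expect pgf.preamble as a single string rather than a list, and no longer recognise text.latex.unicode. The sketch below builds the same settings under that assumption. The helper build_pgf_rc is hypothetical (it is not part of Matplotlib), so check the result against the rcParams documentation for your installed version.

```python
def build_pgf_rc(main_font="Ubuntu", mono_font="Ubuntu Mono"):
    """Return rcParams for PGF output via XeLaTeX (hypothetical helper)."""
    # Assumption: pgf.preamble is supplied as one newline-joined string.
    preamble = "\n".join([
        r"\usepackage{fontspec}",
        r"\setmainfont{" + main_font + "}",
        r"\setmonofont{" + mono_font + "}",
        r"\usepackage{unicode-math}",
        r"\setmathfont{" + main_font + "}",
    ])
    return {
        "pgf.texsystem": "xelatex",     # XeLaTeX is TTF/OpenType font aware
        "text.usetex": True,            # use LaTeX to write all text
        "font.family": "serif",
        "font.serif": [main_font],
        "font.sans-serif": [],
        "font.monospace": [mono_font],
        "pgf.rcfonts": False,           # take fonts from pgf.preamble only
        "pgf.preamble": preamble,
        # Assumption: "text.latex.unicode" is deliberately left out because
        # newer releases do not accept it.
    }


if __name__ == "__main__":
    # With Matplotlib installed, the settings would be applied with:
    #     import matplotlib
    #     matplotlib.rcParams.update(build_pgf_rc())
    print(build_pgf_rc()["pgf.preamble"])
```

Building the preamble from a list and joining it keeps one LaTeX command per line in the source, which makes a missing comma between entries much harder to overlook.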

Axis Font

At the moment the Axis (i.e. the numbers along the X and Y axes) will be in the default sans-serif font: many people consider this to be correct for complex maths which is why it's the default. However, I want the same font on all elements of my plot. Matplotlib defines the font LaTeX should use for the Axis in the Mathsfont setting which also controls how maths equations display. There are a few options for changing it, depending on your requirements.

Many free fonts cannot display maths script completely [9]. If you use maths script then the easiest option is to use the cmbright sans serif font package. To install it get the package texlive-fonts-extra and put the following in pgf.preamble:

r'\usepackage{cmbright}'

An alternative way to solve this is to update the Axis after it's been plotted [10], but while a clever hack it's messy as it intermingles standard matplotlib labelling and LaTeX labelling.

The last option is to change the font Matplotlib uses for maths. The Ubuntu font is not fully maths capable, but as I'm not using maths script (my graphs are simple numbers) it's fine for my purposes. Consequently, I define the maths font using the unicode-math package. Install the texlive-math-extra package, then in Matplotlib we can do:

r'\usepackage{unicode-math}',
r'\setmathfont{Ubuntu}'

It's possible to use any TTF font. The easiest way to see which ones are available is to search the font list on the command line, here for Libertine:

$ fc-list : family file | grep -i libertine

Running gnome-font-viewer gives a visual view of what the font looks like.

Formatting strings

Having completed the set-up, formatting text is pretty straightforward!

LaTeX uses a lot of backslashes to express formatting. To represent them in a normal Python string literal requires doubling up the slashes so that Python knows we're not trying to create an escape sequence (e.g. \n). It's nicer to define formatting strings as raw string literals for Python, which is just a string with r at the start. An example of this:

plt.ylabel(r'\textbf{A Python raw string} with LaTeX formatting')

This code sets the part of the string "A Python raw string" to be bold. We can use many of the common LaTeX mark-up commands for formatting text strings. The common ones are:

Format string                       Output
\textbf{words to bold}              Bold text
\underline{some underlined text}    Underlined text
\textit{words to italic}            Italics
\newline or \\                      Embed a newline in the text (not with the PGF backend)

Standard string formatting options in Matplotlib still work alongside the mark-up, e.g. plt.text(x, y, r'\textit{Some text}', fontsize=16, color='blue').
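To see why the raw string matters, the short sketch below compares three ways of writing the same label in plain Python. It needs neither Matplotlib nor LaTeX, and the label text is just an illustration.

```python
# Three ways of writing the same LaTeX mark-up as a Python string.
raw = r'\textbf{Sales} by quarter'        # raw string: backslash kept as-is
escaped = '\\textbf{Sales} by quarter'    # normal string: backslash doubled
unescaped = '\textbf{Sales} by quarter'   # normal string: '\t' becomes a tab

# The raw and the doubled-backslash forms are the same string.
print(raw == escaped)

# The unescaped form has silently lost the backslash to a tab character.
print(repr(unescaped[:6]))

# Other commands fail differently: '\newline' would start with a real
# newline, and '\underline' in a normal string is a syntax error because
# Python reads '\u' as the start of a unicode escape.
```

The tab case is the awkward one because nothing fails in Python: the string is simply wrong by the time LaTeX sees it.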

The PGF backend doesn't support using LaTeX codes for newlines: according to this GitHub issue [11] the underlying problem is that LaTeX doesn't support newlines in a variety of situations. This is problematic if you're doing a multi-line annotate() or text() but there are a couple of options. The first is to mix raw text strings and normal strings together, putting newlines in the normal string:

txtcomment = r'\textbf{First line} of text' + '\n'

The downside with this approach is you have to manually work out where you want newlines.

The second option is to use textwrap.dedent() and textwrap.fill() with multi-line strings. The advantage of the multi-line string is we can tab the text in nicely in the source code, and in the output we can automatically wrap it at whatever length we want. We have to use double backslashes to escape the LaTeX codes properly, and add normal strings if we want to specifically force a newline at a set point in the string:

import textwrap as tw
import matplotlib.pyplot as plt

note1_txt = 'This is the first line, with a line break\n'
note1_multiline = '''\
                \\textit{These lines are} tabbed in to match
                but will be displayed using textwrap's width
                argument. Both strings can have LaTeX in them
                '''
# Remove the indents from the multi-line text, then reformat it to 80 chars
# Add the two strings together so we make a final one to put on the plot
note1_txt += tw.fill(tw.dedent(note1_multiline.rstrip()), width=80)
plt.text(0.6, 130, note1_txt)
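If a note needs several forced line breaks, adding strings together gets tedious. One possible variation, sketched below, treats blank lines in the multi-line string as forced breaks and wraps each paragraph separately. The helper name wrap_note is hypothetical and the approach is my assumption about a convenient convention, not something Matplotlib provides.

```python
import textwrap as tw


def wrap_note(text, width=80):
    """Dedent `text`, then wrap each blank-line-separated paragraph.

    Paragraphs are joined with a single newline, so a blank line in the
    source string becomes a forced line break in the output.
    """
    paragraphs = tw.dedent(text).strip().split('\n\n')
    return '\n'.join(tw.fill(p, width=width) for p in paragraphs)


if __name__ == "__main__":
    note = wrap_note('''\
        \\textbf{Notes:}

        Sales for \\textit{Product A} have been flat
        through the year.
        ''', width=40)
    print(note)
```

The returned string can be passed to plt.text() or plt.figtext() in the same way as note1_txt above. Note that textwrap knows nothing about LaTeX, so a narrow width can split a braced group such as \textit{Product A} across two lines.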

Post processing

The output with plt.savefig() should either be a PGF image to use within a LaTeX document, or a PDF document. We can convert the PDF image into an SVG image suitable for the Web [12] using inkscape [13], pdf2svg or pdftocairo:

$ /usr/bin/pdftocairo -svg some-example.pdf file-to-publish.svg
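The conversion step can also be run from Python so that a plotting script produces the SVG in one go. This is a minimal sketch that wraps the same pdftocairo command. The function names are hypothetical, and it assumes pdftocairo is installed and on the PATH.

```python
import shutil
import subprocess


def pdftocairo_command(pdf_path, svg_path):
    """Build the pdftocairo command line used in this post."""
    return ["pdftocairo", "-svg", str(pdf_path), str(svg_path)]


def convert_pdf_to_svg(pdf_path, svg_path):
    """Convert a PDF to SVG, raising if pdftocairo is missing or fails."""
    if shutil.which("pdftocairo") is None:
        raise FileNotFoundError("pdftocairo was not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(pdftocairo_command(pdf_path, svg_path), check=True)


# Example call, to be run after plt.savefig() has written the PDF:
#     convert_pdf_to_svg("some-example.pdf", "file-to-publish.svg")
```

Keeping the command builder separate from the function that runs it makes it easy to print or log the exact command when a conversion goes wrong.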

Generally, the best results come from using transparent=True and bbox_inches='tight' in the call to plt.savefig().

LaTeX example

In this example we use the PGF backend with LaTeX to do complex formatting on strings. It was output as a PDF and then converted to SVG for display with pdftocairo. The results are shown in Figure 1 at the top of this post.

#!/usr/bin/env python3
# Set-up PGF as the backend for saving a PDF
import matplotlib
from matplotlib.backends.backend_pgf import FigureCanvasPgf
matplotlib.backend_bases.register_backend('pdf', FigureCanvasPgf)

import matplotlib.pyplot as plt
import textwrap as tw

# Style works - except no Grey background
plt.style.use('fivethirtyeight')

pgf_with_latex = {
    "pgf.texsystem": "xelatex",     # Use xetex for processing
    "text.usetex": True,            # use LaTeX to write all text
    "font.family": "serif",         # use serif rather than sans-serif
    "font.serif": "Ubuntu",         # use Ubuntu as the font
    "font.sans-serif": [],          # unset sans-serif
    "font.monospace": "Ubuntu Mono",# use Ubuntu for monospace
    "axes.labelsize": 10,
    "font.size": 10,
    "legend.fontsize": 8,
    "axes.titlesize": 14,           # Title size when one figure
    "xtick.labelsize": 8,
    "ytick.labelsize": 8,
    "figure.titlesize": 12,         # Overall figure title
    "pgf.rcfonts": False,           # Ignore Matplotlibrc
    "text.latex.unicode": True,     # Unicode in LaTeX
    "pgf.preamble": [               # Set-up LaTeX
        r'\usepackage{fontspec}',
        r'\setmainfont{Ubuntu}',
        r'\setmonofont{Ubuntu Mono}',
        r'\usepackage{unicode-math}',
        r'\setmathfont{Ubuntu}'
    ]
}

matplotlib.rcParams.update(pgf_with_latex)

fig = plt.figure(figsize=(8, 6), dpi=400)
plt.bar([1, 2, 3, 4], [125, 100, 90, 110], label="Product A",
        width=0.5, align='center')
ax1 = plt.axis()

# LaTeX \newline doesn't work, but we can add multiple lines together
annot1_txt = r'Our \textit{"Green Shoots"} Marketing campaign, started '
annot1_txt += '\n'
annot1_txt += r'in Q3, shows some impact in Q4. Further \textbf{positive} '
annot1_txt += '\n'
annot1_txt += r'impact is expected in \textit{later quarters.}'

# Annotate using an altered arrowstyle for the head_width, the rest
# of the arguments are standard
plt.annotate(annot1_txt, xy=(4, 80), xytext=(1.50, 105),
            arrowprops=dict(arrowstyle='-|>, head_width=0.5',
                            linewidth=2, color='black'),
            bbox=dict(boxstyle="round", color='yellow', ec="0.5",
                    alpha=1))

# Adjust the plot upwards at the bottom so we can fit the figure
# comment as well as the ylabel()
plt.subplots_adjust(bottom=0.15)

# We want a figure text with a separate new line
fig_txt = '\\textbf{Notes:}\n'
comment2_txt = '''\
Sales for \\textit{Product A} have been flat
through the year. We expect improvement after the new release
(codename: \\underline{Starstruck}) in Q2 next year.
'''
fig_txt += tw.fill(tw.dedent(comment2_txt.rstrip()), width=80)
# The YAxis value is -0.06 to push the text down slightly
plt.figtext(0.5, -0.06, fig_txt, horizontalalignment='center',
        fontsize=12, multialignment='left',
        bbox=dict(boxstyle="round", facecolor='#D8D8D8',
                ec="0.5", pad=0.5, alpha=1))

# Standard description of the plot
# Set xticks, font for them is set globally
plt.xticks([1, 2, 3, 4], ['Q1', 'Q2', 'Q3', 'Q4'])
plt.xlabel(r'\textbf{Time} - FY quarters')
plt.ylabel(r'\textbf{Sales} - unadjusted')
plt.title('Total sales by quarter')
plt.legend(loc='best')

plt.savefig('matplot-latex.pdf', bbox_inches='tight', transparent=True)

LaTeX resources

These are the most useful resources I found for Matplotlib and LaTeX:

Final words

With LaTeX handling the formatting of all text we can mark-up our plots with any form of complex formatting we want. The constraints are that there are two steps to creating Web ready images, grey backgrounds on styles don't display properly and the processing of a plot is slow. Despite those issues I think the results are worth the extra effort.

[1]Overview of LaTeX on Wikipedia and TeX.
[2]This is not strictly true. If you're only interested in maths text then LaTeX input is supported by default, see the documentation.
[3]LaTeX on Ubuntu provides a good introduction to the LaTeX distribution options on Linux.
[4]James Trimbles' answer to Getting started on LaTeX and Manuel Quintero's short tutorial How to install LaTex on Ubuntu 14.04 LTS.
[5]Example from the LaTex Primer via James Trimbles answer.
[6]PGF provides direct embedding in LaTeX documents and matplotlib, for my use case it's really that we're using XeTeX for handling the LaTeX input which is key.
[7]Most of these settings are from Bennett Kanuka's great post on Native LaTeX plots.
[8]Matplotlib and Xelatex explains the main settings, note that your source file also needs to be set for Unicode e.g. coding:utf-8 in vim.
[9]Linux maths fonts
[10]Latex font issues using amsmath and sfmath for plot labelling
[11]PGF backend: Lines in multi-line text drawn at same position
[12]Convert PDF to clean SVG
[13]Using Inkscape is covered in Wikipedia PDF Conversion to SVG.

Posted in Tech Sunday 13 March 2016
Tagged with Python Matplotlib