How to write mathematics clearly and keep more readers

by Matthew Leitch, 4 September 2009

Good reasons to write mathematics clearly

If you read or write mathematics, at school, college, as a teacher, as a researcher, as an author – for any reason whatsoever – you may have noticed already that mathematical writing often isn't as clear as it might be.

But however many opportunities for improvement you see today, the chances are there are many more than you realize.

The two main benefits of writing mathematics that is clearer and more interesting to read are as follows:

  1. You will make fewer mistakes.

  2. You will have more readers and they will appreciate your work more.

How many more readers? I wish I knew, but the possible improvement is so great that I would not be surprised by a 100-fold increase in the proportion of readers who get at least 50% of the way through a paper or book chapter they have started, assuming it is not compulsory reading for them.

Not only do people read more of documents that are easier to read but also they rate the contents and the author more highly. It is possible, sometimes, to get away with the ploy of displaying cleverness by being baffling but it is better to be the first person to write something on a topic that readers can understand. That makes you the expert they like.

A message for people who find mathematics confusing

If you find maths hard to understand then let me reassure you that it is not entirely your fault. Most published mathematics is unclear, confusing, and riddled with errors. This even includes many maths textbooks and is normal in technical and scientific papers with some mathematical content.

Why mathematical writing is often so hard to read

Of course mathematics is often hard to read, in part, because what has to be said is inherently hard to grasp. The density of very specific and often unfamiliar ideas makes it slow and tiring work for most people.

At other times the problem may be that someone with limited mathematical skill is writing mathematics, perhaps thinking it will make a paper more impressive and likely to get published. At other times the writer has English as a second language and would struggle to write anything clearly in English.

However, things are made much, much worse by patterns of writing that have been passed down from one writer to another, usually without conscious thought. I can imagine the pioneers of mathematics, including famously haughty and difficult characters like Newton, Gauss, and the Bernoullis, writing in Latin maybe, establishing the basic character of mathematical writing. Mathematics is often written as a sequence of commands to the reader ‘Let x be...’ and ‘Consider the plane...’ with little attempt to give reasons for making that effort. Writers of mathematics often write in the first person plural with ‘...then we have...’. Could it be that these go right back to the days when the great mathematicians personally instructed their pupils, constantly asserting their higher social status in a way that today would seem inappropriate in the West?

Writing patterns get passed on in other disciplines too. When I graduated in psychology many years ago my writing was a copy of the writing I had been exposed to in hundreds of papers in psychology journals and books. It was long-winded, pointlessly abstract, and boring. Likewise, graduates in mathematics write in the special style of mathematical journals and books. Similarly, graduates in science write mathematical material in the style they have seen in journals.

These writing patterns are pervasive, largely unconscious, and often regarded as traditional. When mathematicians make a special effort to write ‘properly’ or in a ‘formal’ style they write in a more old-fashioned way. The text becomes more complete and correct but also harder to read, which is the reverse of what should happen. Even mathematicians who make a special effort to be clear and interesting may be surprised to find just how much more they can do.

Happily, shedding those habits and learning to write clearly is possible. We can do it for mathematical writing just as for writing in English. It took me a concentrated effort over several months to rid myself of my bad writing habits and start writing things people actually wanted to read. I believe the same can be done for mathematical writing once we understand exactly what to change and how. I know mathematics can be written more clearly.

Why things are changing

Mathematical writing has evolved down the centuries. Typically, words have been replaced by an ever increasing range of symbols. There was also the transition away from Latin. The expressions and attitudes of society presumably seep into mathematical writing eventually.

However, in the last two or three decades one influence has stood out: computer software. Program code is somewhat similar to mathematical writing in that it involves an artificial language with many special symbols and demands more precision than ordinary conversations. However, program code takes things a long way beyond mathematical writing.

Not surprisingly, programmers who also do mathematics (including those who use mathematical ideas to improve their programming) tend to prefer notation that is less ambiguous and more powerful than the traditional notations of mathematics and mathematical logic, use variable names with more than one character in them, favour systematic layouts and referencing schemes, and prefer to get things formally correct rather than just give a general impression to guide intuition.

Overview of guidelines for clarity

This article goes into the details of the skill of writing mathematics that is clear and interesting. It breaks down the skill into a set of specific habits, offering justification for each one, and examples. However, it does not advocate any specific style of notation or layout for workings. You don't have to agree with and adopt all the skills to make huge improvements in your writing or the writing of people you teach. Every habit you decide to adopt is a step in the right direction.

This is where we need to look in detail at what makes so much mathematical writing confusing and what can be done differently. The specific habits are grouped under some general guidelines.

  1. Motivate the reader and provide signposts

  2. Limit mental strain

  3. Provide a clear context

  4. Always be unambiguous

  5. Be complete and correct

  6. Maximize basic legibility

(Please note: When ordinary English is rewritten for clarity the end result is almost always shorter than the original, as well as clearer and easier to read. The reduction can easily be 30% – 40%. However, when mathematics is rewritten for clarity there is usually an increase in length. This is because in mathematical writing it is common to leave out a lot of things that need to be said.)

1. Motivate the reader and provide signposts

1.1 Reasons to read

Reading mathematics is, for nearly everyone, tiring and time consuming work. The reader needs good reasons to make that effort. Even professional mathematicians have other demands on their time.

However, mathematical writing often starts with something like ‘Let C(X) be ...’ or ‘Consider a point ...’ without giving the slightest hint as to why the reader should make the effort. What is the motivation, if any, for this section of text? For non-mathematicians this is one of the biggest annoyances and if they don't believe following the writer's instructions is going to be worthwhile then they won't read on.

Most often a good motivation would be to give the reader some mathematical tools to apply to an important problem. Perhaps it is a single equation, or a model, or a set of theorems or proof strategies.

Unfortunately, it is quite common to prove various ‘interesting’ results without saying why they are interesting, or to pose ‘interesting’ questions without saying why they are interesting. I suspect that often they are not interesting but having worked them out the writer is determined to share them.

Books sometimes start with a chapter of ‘useful results’. Useful for what? Sometimes the author says the first chapter is the foundation for what is to come but after 20 pages of apparently aimless definitions, notation, and proofs it is hard to believe so much pain is necessary.

The cure is to have a strong motivation and explain it up front. If the result to be proved has a useful application, even if it is just in proving some other piece of pure mathematics, it should be stated.

If there really isn't any motivation then don't write anything at all.

Some specific fields, such as game theory, are plagued by largely pointless mathematics. The writer starts by making a set of assumptions that set up a model that should represent a realistic situation. Sadly, the assumptions are highly restrictive to stop the mathematics getting too messy and boringly long. As a result the highly technical conclusions reached rest on those highly technical assumptions and nothing of practical value is established.

At the start of each section and whenever asking the reader to take a step (e.g. ‘Let...’, ‘Consider...’), check you have a good reason and state it clearly before you ask the reader to work. Take other easy opportunities to remind the reader of the value of what they are trying to understand.

In the example below, bold has been used to highlight the text that illustrates this particular technique. Other improvements have also been made, though not all that are possible.

Avoid: ‘Let V be a (non-empty) universe of discourse. A Σ-field F on V is a collection of subsets of V, such that:
1. V ∈ F;
2. If A ∈ F then A' ∈ F , where A' is the complement of A;
3. If A, B... ∈ F, then A ∪ B ∪ ... ∈ F.
A probability function P over V is a mapping from F to [0;1] satisfying the three axioms of probability [Ash, 1972].’

Prefer: ‘Our approach to linking probability with similarity is based on the fundamental properties of a probability space, which are as follows. A probability space has three parts to it:
(1) the sample space of some experiment V, which is a set of possible outcomes and is assumed to be non-empty;
(2) a set of subsets of V, which we will call F; and
(3) a probability function P that maps from F to real numbers in the range [0,1] and satisfies the three axioms of probability [Ash, 1972].
The elements of F are called events and F has special properties with respect to V so that P is defined for all the events we might want to consider. These properties are those of a sigma-field:
  a. V ∈ F.
  b. If A ∈ F then A' ∈ F , where A' is the complement of A.
  c. If A, B... ∈ F, then A ∪ B ∪ ... ∈ F.’

1.2 Interpretations

Good mathematical ideas often start with a practical problem. At that early stage the objects in the mathematics represent things people can imagine and perhaps even touch. However, as time passes and the mathematics is developed further there is a tendency towards making it more abstract as well as more formal. Connections with other mathematical topics are made. Ideas that started simply become elements in a much grander and more general scheme with layers of elaborate definitions beneath it and less and less connection with the real world. References to the real world may be removed deliberately.

This process makes the mathematics increasingly difficult for beginners to understand and lessens the pressure to develop techniques with useful applications.

Referring to practical interpretations helps readers to understand and motivates them because it reminds them of practical applications.

When introducing models, refer early and often to interpretations of the mathematics even if it is only to give an example.

Avoid: ‘Let V be a (non-empty) universe of discourse. A Σ-field F on V is a collection of subsets of V, such that:
1. V ∈ F;
2. If A ∈ F then A' ∈ F , where A' is the complement of A;
3. If A, B... ∈ F, then A ∪ B ∪ ... ∈ F.
A probability function P over V is a mapping from F to [0;1] satisfying the three axioms of probability [Ash, 1972].’

Prefer: ‘Our approach to linking probability with similarity is based on the fundamental properties of a probability space, which are as follows. A probability space has three parts to it:
(1) the sample space of some experiment V, which is a set of possible outcomes and is assumed to be non-empty;
(2) a set of subsets of V, which we will call F; and
(3) a probability function P that maps from F to real numbers in the range [0,1] and satisfies the three axioms of probability [Ash, 1972].
The elements of F are called events and F has special properties with respect to V so that P is defined for all the events we might want to consider. These properties are those of a sigma-field:
  a. V ∈ F.
  b. If A ∈ F then A' ∈ F , where A' is the complement of A.
  c. If A, B... ∈ F, then A ∪ B ∪ ... ∈ F.’

1.3 Lists

Something else that readers don't enjoy is toiling on through paragraphs of dense mathematical text and symbols without a sense of progress or heading towards an end. They want structure.

Mathematical writing often gives opportunities for lists, such as lists of steps in a proof, lists of elements in a model, lists of conditions in a definition, and lists of axioms and theorems. It helps readers if you format these as lists, either with bullet points or numbers, and make sure that every item that should be on the list is included in the list rather than mentioned in text introducing the list.

Lists can be structured to have sub-lists within them, with indenting and/or reference numbers used to make the structure clear.

Some people think that writing mathematics in paragraphs even when lists could be used is a more sophisticated style. In particular, they regard writing proofs using lists of steps as being for beginners only. However, there are others, just as expert in advanced mathematics, who prefer lists because they reduce mistakes, allow computer support, and make the work easier to read.

When writing lists, try to itemize thoroughly rather than bundling several items into one bullet. For example, if your assumptions consist of ten facts, do not present them as a list of seven, say, with some bundled together.

When writing items that could be presented as a list (e.g. steps in a proof, properties of a definition, elements of a model, axioms, and theorems), present all the items in a list format (e.g. bullets or numbered).

Avoid: ‘Let V be a (non-empty) universe of discourse. A Σ-field F on V is a collection of subsets of V, such that V ∈ F, if A ∈ F then A' ∈ F (where A' is the complement of A), and if A, B... ∈ F then A ∪ B ∪ ... ∈ F.
A probability function P over V is a mapping from F to [0;1] satisfying the three axioms of probability [Ash, 1972].’

Prefer: ‘Our approach to linking probability with similarity is based on the fundamental properties of a probability space, which are as follows. A probability space has three parts to it:
(1) the sample space of some experiment V, which is a set of possible outcomes and is assumed to be non-empty;
(2) a set of subsets of V, which we will call F; and
(3) a probability function P that maps from F to real numbers in the range [0,1] and satisfies the three axioms of probability [Ash, 1972].
The elements of F are called events and F has special properties with respect to V so that P is defined for all the events we might want to consider. These properties are those of a sigma-field:
  a. V ∈ F.
  b. If A ∈ F then A' ∈ F , where A' is the complement of A.
  c. If A, B... ∈ F, then A ∪ B ∪ ... ∈ F.’

1.4 Strategy signposts

Proofs and derivations often proceed with no outline of the strategy being followed, or hints as to why the reader should assume something. The theorem to be proved should be stated first and proof strategies should be stated before each is used. If your strategy is a common one it is usually enough to name it.

More skilled readers need fewer signposts than novices, but even professionals will be spared a little bit of effort by systematic strategy signposts.

When writing a step of a proof or derivation that begins a new proof strategy, describe or name the strategy before continuing.

Avoid: ‘Proof that (P ∧ Q) ⇒ ¬ (¬P ∧ ¬Q). Assume (P ∧ Q)...’

Prefer: ‘The rule that, for all propositions P and Q, (P ∧ Q) ⇒ ¬ (¬P ∧ ¬Q) will be proved if, assuming the antecedent, we can deduce the consequent. So, assume (P ∧ Q)...’

or

‘Proof that (P ∧ Q) ⇒ ¬ (¬P ∧ ¬Q). By direct proof: Assume (P ∧ Q)...’

or

‘Proof that (P ∧ Q) ⇒ ¬ (¬P ∧ ¬Q).
1.   (P ∧ Q)       by assumption for direct proof
...’
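As a hedged illustration, the numbered form can be carried through to the end. The sketch below is one possible completion, not necessarily the one the truncated example had in mind. The rule names (∧-elimination, ¬-introduction, ⇒-introduction) are the usual natural-deduction names and are an assumption about the proof system the reader knows. Each change of strategy is named on the line where it begins.

```latex
\begin{enumerate}
  \item $P \land Q$ \hfill assumption, for direct proof of the implication
  \item $\lnot P \land \lnot Q$ \hfill assumption, for proof by contradiction
  \item $P$ \hfill from 1, by $\land$-elimination
  \item $\lnot P$ \hfill from 2, by $\land$-elimination
  \item $\bot$ \hfill from 3 and 4, a contradiction
  \item $\lnot(\lnot P \land \lnot Q)$ \hfill from 2--5, by $\lnot$-introduction
  \item $(P \land Q) \Rightarrow \lnot(\lnot P \land \lnot Q)$ \hfill from 1--6, by $\Rightarrow$-introduction
\end{enumerate}
```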

1.5 Plain English previews

Sometimes authors start a book with an overview of what is covered in each chapter, or start a chapter with an overview of what is covered in that chapter. This is only helpful if the information makes sense to readers. It is a mistake to write such an overview using terms that will be explained later in the book/chapter. It is more sensible to assume that the reader does not understand those terms until they have been explained in the book.

To some extent this also applies to headings and the table of contents.

When tempted to write this kind of preview, consider how useful it will be, particularly if you also have a table of contents. When an author hits me with one of these previews at the start of a book I usually feel bored by it and begin to expect to be bored by the rest of the book! Some text that conveys the value of what is covered would be more motivating.

When previewing the content of a book, chapter, or article, do so in language that does not use concepts and terms to be explained later.

2. Limit mental strain

Aim for a wide audience. By ‘wide audience’ I mean ‘a bit wider than you would think reasonable’ and there are two reasons for doing this:

  1. Few of the educated, specialist readers most writers have in mind are as brilliant as they appear. Like anyone else they can't remember things they used to know, don't want to stop and look up details in another book, don't see things exactly the same way as the writer, and if made to work hard then they get tired and stop.

  2. Aiming for a wider audience means you might get a wider audience.

This doesn't mean you should write the same way for professional mathematicians as for school children. It just means you should write for a wider audience than most writers at the same level do. Experts are not insulted by rigorous clarity.

Reaching a wider audience requires concern for the reader's mental strain.

2.1 Thinking guidance

Ideally, your writing will take the reader down an easy but correct route to understanding. However, sometimes it can help readers to give them some informal advice on how to think something through, perhaps based on your own struggles with a particular topic.

When writing about something that you found hard to understand, consider giving some informal advice to the reader on how to understand more easily or how to avoid mistakes.

2.2 Achievable jumps

Readers get discouraged if they think they should be able to follow the steps in an argument but find they cannot. This happens very often and for a number of reasons.

Sometimes the jumps in an argument are consistently too big for the audience. It is true that writing every tiny detail of a mathematical argument can create long and boring explanations with many details that even moderately skilled readers could fill in for themselves. However, most writers go too far in the other direction.

Another reason for frustrating jumps is that the difficulty of the steps left to the reader is uncontrolled. In some sections the painstaking detail is enough for the slowest reader but a moment later a huge leap will be taken without comment. This makes matters worse than ever because it creates the expectation of being able to follow the argument only to dash those hopes a moment later.

The famous French mathematician, Pierre-Simon Laplace (1749-1827), wrote ‘Il est facile de voir que...’ meaning ‘It is easy to see that...’ when he had proved something but mislaid the proof or found it clumsy. In other words, it was a signal for things he believed true but which were hard to prove. From the reader's point of view this is a maddening habit.

There are two good reasons for avoiding overly big gaps in arguments. Firstly, it is easier to make a mistake if you leave a gap. Secondly, the bigger the gaps left the smaller the number of readers who will stay with you.

When writing successive steps in a proof, derivation, or other argument, make each step small enough that the reader can easily follow.

Avoid: ‘Given that L(I_k) = ε/2^{k-1},

    ∑_{n∈1..∞} L(I_n)

    = ∑_{k∈1..∞} ε/2^{k-1}

    = ε ∑_{k∈0..∞} 1/2^k

    = 2ε.’

Up to the last line the steps can be followed with some basic algebra knowledge, but the last step assumes readers know the sum of the infinite series involved.

Prefer: ‘Given that L(I_k) = ε/2^{k-1}, and using the fact that ∑_{r∈0..∞} 1/2^r = 2,

    ∑_{n∈1..∞} L(I_n)

    = ∑_{k∈1..∞} ε/2^{k-1}

    = ε ∑_{k∈0..∞} 1/2^k

    = 2ε.’
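For readers who do not remember the sum of the series, the fact used in the last step can itself be derived in a few lines. This is the standard argument, sketched here with a partial sum S_N (a symbol introduced only for this illustration):

```latex
\begin{align*}
S_N &= \sum_{r=0}^{N} \frac{1}{2^r}
     = 1 + \frac{1}{2} + \frac{1}{4} + \dots + \frac{1}{2^N}, \\
\tfrac{1}{2} S_N &= \frac{1}{2} + \frac{1}{4} + \dots + \frac{1}{2^{N+1}}, \\
S_N - \tfrac{1}{2} S_N &= 1 - \frac{1}{2^{N+1}}
  \quad\Longrightarrow\quad S_N = 2 - \frac{1}{2^N}, \\
\sum_{r=0}^{\infty} \frac{1}{2^r} &= \lim_{N \to \infty} S_N = 2 .
\end{align*}
```

Whether to include such a derivation or only state the fact depends on the audience. Stating the fact is the minimum needed to make the jump achievable.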

2.3 Missing proof warning

Still another problem is that writers sometimes do not make it clear if they are trying to present an argument or just some selected results without their derivation. The reader wonders if he/she is supposed to be able to see why one statement follows from the others and perhaps wastes time trying to see the connection where there isn't one or it is too remote to be seen without help. This can be avoided by saying something like ‘It can be shown that...’ to let the reader know the derivation is omitted.

Don't say that a proof is ‘left as an exercise for the reader’ unless you want to appear rude.

When presenting results without proof or derivation, make it clear that the justification is omitted.

Avoid: ‘From standard matrix algebra we then obtain,

π(z|y) ∝ N(0, I_n + xvx') Ind(y,z)’

Prefer: ‘Using standard matrix algebra it can be shown that,

π(z|y) ∝ N(0, I_n + xvx') Ind(y,z)’

2.4 Deferred proof warning

When you write something you intend to prove later, state this to stop the reader wondering where the proof is.

2.5 True by definition

Similarly, if it is unclear whether something is true by definition or by deduction then some readers will worry about whether there is a proof they should understand.

This includes situations where what is being defined is an abbreviation introduced to make some workings more convenient.

When making a statement that is true by definition, state that it is true by definition.

Avoid: ‘To derive Bayes's Theorem we start with the fact that:

P(A|B) = P(A∩B)/P(B)’

Prefer: ‘To derive Bayes's Theorem we start with the definition of conditional probabilities, which states that, for any two events, A and B, such that the probability of B is greater than zero:

P(A|B) = P(A∩B)/P(B)

If P(B) = 0 then P(A|B) is undefined.’
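Starting from that definition, the derivation itself takes only a few steps. A sketch, assuming both P(A) > 0 and P(B) > 0 so that each conditional probability is defined, with every line labelled as true by definition or by deduction:

```latex
\begin{align*}
P(A \mid B) &= \frac{P(A \cap B)}{P(B)}
  && \text{by definition, since } P(B) > 0 \\
P(B \mid A) &= \frac{P(B \cap A)}{P(A)}
  && \text{by definition, since } P(A) > 0 \\
P(A \cap B) &= P(B \mid A)\, P(A)
  && \text{rearranging the second line, with } A \cap B = B \cap A \\
P(A \mid B) &= \frac{P(B \mid A)\, P(A)}{P(B)}
  && \text{substituting into the first line}
\end{align*}
```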

2.6 True by assumption

Furthermore, if something is true by assumption (e.g. it is an axiom) then it should be clear that this is so or some readers will be distracted by worry that there is a missing proof.

When making a statement that is true by assumption, state that it is true by assumption.

Avoid: ‘For all probability functions P,
P(S) = 1.’

Prefer: ‘For all probability functions P,
P(S) = 1.     (Axiom 1)’

2.7 Restated definitions

Mathematics has an extensive vocabulary of specialized terms and they are essential. However, relying on readers to know the terms and to recall all the underlying assumptions and details is dangerous and usually derails more readers. How many people know what a ‘simple probability measure’ is, or a ‘compound lottery’ or a ‘Darboux sum’?

If the terms used are not even in a fairly big dictionary like the Unwin Hyman Dictionary of Mathematics then it is usually unreasonable to expect readers to know the jargon already or go to another book or journal to find it. Self-contained writing is far more accessible.

All important definitions should be given if they are not a standard part of the literature and it is better to err on the side of defining more rather than less. There are differences in definitions between experts.

Typically, definitions should go before the results that use them.

When using a specialist term, consider if the definition will be known to all readers in full without risk of different definitions being applied. If not, state the definition in your text.

Avoid: ‘Our calculation uses the upper quartile of the distribution...’

Prefer: ‘Our calculation uses the upper quartile of the distribution, defined as follows:...’

2.8 External references internalized

In a lot of mathematical and scientific writing it is common to write as if readers have access to a comprehensive library of books and journals and can retrieve any document in it within seconds.

Referencing is essential in many cases but even readers with access to a big academic library cannot find everything, let alone find it quickly. There are also many potential readers who do not have access to such a library at all.

Therefore it is usually a mistake to write so that access to another document is essential for understanding. Important points need to be restated, and can usually be restated more clearly than in the original version.

When you refer to another document, consider if access to that document is needed for the reader to understand your writing. If so, restate the relevant points in your own document.

2.9 Consistent terms

Sometimes writers use different words just for variety even though they intend them to have the same meaning. It creates the impression that perhaps the writer sees differences in meaning between the words. For example, if the words set, class, family, and collection are used interchangeably then this is likely to cause confusion, even if the intention is to be clearer.

Using the words ‘contains’ and ‘includes’ interchangeably when talking about sets can also frustrate readers. A sensible policy would be to say ‘A contains B’ when B is a subset of A, but say ‘A includes B’ when B is an element of A.
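A small example, constructed for illustration, shows why the distinction matters. With the sets below, B is both a subset and an element of A, so the two statements are both true yet say different things:

```latex
\[
A = \{\,1,\ 2,\ \{1\}\,\}, \qquad B = \{1\}
\]
\[
B \subseteq A \quad (\text{``$A$ contains $B$'', because } 1 \in A),
\qquad
B \in A \quad (\text{``$A$ includes $B$'', because } \{1\} \text{ is listed as an element of } A).
\]
```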

When you need to refer to the same thing more than once in a document, even within a single sentence, use the same term each time, even if the repetition sounds ugly.

Avoid: ‘A Σ-field F on V is a collection of subsets of V, such that V...’

Prefer: ‘A Σ-field F on V is a set of subsets of V, such that V...’

2.10 Statements in words and mathematical symbols

Many statements in mathematical symbols can also be stated in words that are fairly easy to read. This is helpful to readers.

Sometimes it's not practical to restate a line of mathematical notation in plain English, but when it is you should do so. At least state in words the meaning of each important definition and result.

When writing important definitions and results, provide a translation in words and give it before giving the symbolic version.

Avoid: ‘We propose to define the probability that Y_{n+1} = 1, given the function S, by

Y_{n+1} = ∑_{i≤n} S(X_i, X_{n+1}) Y_i  /  ∑_{i≤n} S(X_i, X_{n+1})’

Prefer: ‘We propose to define the probability that R_{n+1} be 1 (i.e., that the result of the treatment will be success in the case of patient P_{n+1}), given the similarity function S, as the total s-weight of all past successes divided by the total s-weight of all past cases, successes and failures alike. That is:

Pr(R_{n+1} = 1) = ∑_{i≤n} S(P_i, P_{n+1}) R_i  /  ∑_{i≤n} S(P_i, P_{n+1})’
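The definition in words translates almost directly into code, which is one way to check that the words and the symbols say the same thing. A minimal Python sketch, assuming results are coded as 1 for success and 0 for failure. The function names, the age-based similarity function, and the sample data are all hypothetical and chosen only for illustration.

```python
def success_probability(past_patients, past_results, new_patient, similarity):
    """Similarity-weighted share of past successes.

    past_results[i] is 1 for a success and 0 for a failure.
    similarity(a, b) returns a non-negative weight (the S of the formula).
    """
    weights = [similarity(patient, new_patient) for patient in past_patients]
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("no past case is similar to the new patient")
    # Failures have result 0, so only successes contribute to the numerator.
    success_weight = sum(w * r for w, r in zip(weights, past_results))
    return success_weight / total_weight


# Hypothetical similarity: patients are described only by age,
# and closer ages give a weight nearer to 1.
def age_similarity(age_a, age_b):
    return 1.0 / (1.0 + abs(age_a - age_b))


past_ages = [30, 40, 50]      # hypothetical past patients
past_results = [1, 0, 1]      # success, failure, success
probability = success_probability(past_ages, past_results, 40, age_similarity)
```

Here the one failure has the same age as the new patient and so carries most of the weight, which pulls the estimate well below the unweighted success rate of two in three.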

2.11 Familiar symbols

One of the simplest ways to repel readers is to use symbols that readers cannot name. Usually this involves using the less well known letters from the ancient Greek alphabet. English readers know f, g, and h, but how many know Γ, ζ, and ξ? Unfamiliarity makes the work a bit harder for the reader.

Some characters are very hard to distinguish, such as ℜ, ℘, ℑ and ℵ.

To some extent this problem is reduced if most writers on a particular topic have already used the same symbol, making it more familiar and memorable for that reason. It may be that the established set of symbols has become so familiar to likely readers that a change would be confusing. In that case the symbols can stay.

However, if you are the first person to choose a symbol for the particular idea then choose something familiar that people can name, and if the precedent is not strongly established then consider seriously the benefits of establishing a new look with easier symbols.

When choosing symbols for objects (e.g. variables, functions, sets, relations) prefer symbols likely to be familiar to readers and avoid using the less common letters of the ancient Greek alphabet.

Avoid: ‘∫ dξ_2 ∫ dξ_1 (ξ_1)^p Ψ(ξ_1) Ψ(ξ_2)’

Prefer: ‘∫ de_2 ∫ de_1 (e_1)^p S(e_1) S(e_2)’

2.12 Dissimilar symbols

Some symbols are visually similar to each other, especially when hand written, making it just a little bit harder to distinguish between them.

As with familiarity, your choice may also be driven by established usage, but there are benefits to switching to more distinguishable symbols that may add up over time.

When choosing sets of symbols for objects (e.g. variables, functions, sets, relations) prefer symbols that are easy to distinguish from each other.

Avoid: u, v, w

Prefer: a, b, c

Avoid: i, j

Prefer: x, y

Avoid: p, q

Prefer: e, f

Avoid: m, n

Prefer: s, t

2.13 Memorable symbols

In mathematics it is sometimes necessary to introduce a large number of objects and everyone gets lost quickly if the symbols used to represent them are not easy to remember. Things get worse still if the symbols you choose tend to suggest that they are something else.

In mathematics it has been traditional to use single letters only (perhaps with subscripts or some other decoration) but this can be a tiresome restriction with big problems, especially if you are restricted to ASCII characters.

Choosing letters that are the first letter of a word that describes the object is traditional in computer programming but not quite as routine in mathematics. (Examples: f for a function, n for a number, p for a probability or a prime number, S for a set, v for a vector.) Choosing names that have multiple letters in them is also a long tradition in computer programming but even rarer in mathematics. I think the programming style should be used more.

Something else that helps is to use upper and lower case letters in conventional ways. Firstly, upper and lower case versions of the same letter are often used to represent a thing and parts of the thing respectively. For example, S could be the set whose elements are written as s_1, s_2, s_3, etc. Secondly, upper case letters are often used as the limit of an index that is written as the lower case letter. For example, j might range from 1 to J.

Another useful technique is to use letters that are consecutive in the alphabet to represent objects of the same type. For example, if one function is f then two more might be g and h.

Other traditional techniques in mathematical writing can be helpful, but less so. For example, letters can be decorated with little hats and bars, but these are sometimes very small and similar looking. Different fonts are often used to distinguish symbols, but it is very hard to write mathematics with a pen once such distinctions have become conventional, and it is hard to remember the different functions of lots of fonts in one document. Using bold and italic can also help, but again this does not work well when writing in pen.

These techniques can be used to make symbols that are different appear even more different and easier to distinguish in a printed text. However, relying on readers to distinguish between different objects solely on the basis of a different font or emboldening is dangerous.

When choosing symbols for objects (e.g. variables, functions, sets, relations) prefer symbols that are easy to remember, perhaps because they are the first letter of the most obvious word, and aim for consistency.

Avoid: The beta probability density function is sometimes written as

‘f(p) = k p^(α−1) (1−p)^(β−1)’

However, f(p) most often in practice represents a probability density and p is a relative frequency of two alternative outcomes. It would be hard to choose more confusing symbols.

Prefer: It is better to stay neutral with

‘beta(x) = k x^(α−1) (1−x)^(β−1)’

Or choose something suggesting the usual interpretation like

‘d(f) = k f^(α−1) (1−f)^(β−1)’
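The programming style of multi-letter names can be sketched for the same density. This is only an illustration: the names beta_density, frequency and normalising_constant are my own, the constant k is assumed to be the standard normalising constant Γ(α+β)/(Γ(α)Γ(β)), and the sketch is restricted to frequencies strictly between 0 and 1.

```python
import math


def beta_density(frequency: float, alpha: float, beta: float) -> float:
    """Beta probability density at a relative frequency strictly between 0 and 1."""
    if frequency <= 0.0 or frequency >= 1.0:
        raise ValueError("frequency must lie strictly between 0 and 1")
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must both be greater than zero")
    # Assumption: k is the usual normalising constant of the beta distribution.
    normalising_constant = math.gamma(alpha + beta) / (
        math.gamma(alpha) * math.gamma(beta)
    )
    return (
        normalising_constant
        * frequency ** (alpha - 1)
        * (1 - frequency) ** (beta - 1)
    )
```

With names like these a reader does not need to remember that the argument is a frequency rather than a probability.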

2.14 Explain mnemonics

Assuming you have chosen a name or symbol for a good reason it sometimes helps to explain that reason.

When introducing a name or symbol for an object consider if explaining the reason for the choice will make it more memorable and, if so, state the reason.

Avoid: ‘Where I is the total stock level,...’

Prefer: ‘Where I is the total stock (i.e. Inventory) level,...’

2.15 Cutting symbols

Sometimes writers seem driven to give everything a symbol, even if they don't actually use it or don't need to use it.

Do not introduce symbols you don't really need.

Avoid: ‘Let Δ = b^2 − 4ac. If Δ >= 0, then the roots are real.’

Prefer: ‘If b^2 − 4ac >= 0, then the roots are real.’

2.16 Avoided indices

A common way to refer to particular items in a set or to all the items in a set is to introduce an index variable. However, often these index variables are unnecessary and it is easier to use the usual symbols of set theory. The index can be awkward to use if you want to talk about subsets.

Since the items are in a set, not a sequence, there is usually nothing special about the order in which the items are considered so, in a way, the index is misleading.

The style used so often is to write something like ‘Let X = {x_1, x_2,..., x_N} be a set of ...’ I know this is a shorthand, but taken literally it is wrong because ‘X = {x_1, x_2,..., x_N}’ is an equation, not a set. The set is just X. Very often the ‘{x_1, x_2,..., x_N}’ bit isn't used at all. When it is, a style without the subscript is usually cleaner looking and easier to understand.

When introducing a set, avoid index variables unless they are really necessary.

Avoid: ‘Let A_w be a set of n ACTs, where F(A_w) = G for w = 1..n.’

Prefer: ‘Let A be a set of ACTs, where F(a) = G for all acts a in A.’

Avoid: ‘Let U = {U_n}_{n∈N} be a uniform ensemble and X = {X_n}_{n∈N} be an ensemble. The ensemble X is called pseudorandom if X and U are indistinguishable in polynomial time.’

Prefer: ‘If U is a uniform ensemble and X is an ensemble, then X is called pseudorandom if X and U are indistinguishable in polynomial time.’
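The same contrast shows up in program code. The following is a small sketch with made-up data: acts, goal and outcome_of are hypothetical stand-ins for A, G and F.

```python
# Hypothetical data: A is a set of acts and outcome_of plays the role of F.
acts = {"walk", "cycle", "drive"}
goal = "arrive"


def outcome_of(act: str) -> str:
    """Stand-in for F: in this toy example every act leads to the same goal."""
    return "arrive"


# Indexed style: an arbitrary ordering and an index variable w must be invented.
ordered_acts = sorted(acts)
indexed_check = all(
    outcome_of(ordered_acts[w]) == goal for w in range(len(ordered_acts))
)

# Set style: "for all acts a in A", with no index and no ordering.
set_check = all(outcome_of(a) == goal for a in acts)
```

Both checks give the same answer, but the second says only what is meant.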

2.17 Same spread

Where a reader will probably want to refer back from one paragraph/formula to another, try to position them so that they can be seen at the same time i.e. not on opposite sides of the same sheet of paper.

2.18 Accurate restatement

When writers say they are restating a result I notice that they often change details, perhaps driven by a feeling that they shouldn't just repeat themselves. For example, they may omit details, change notation for no particular reason, make small inferences forward from the previous statement of the result, or generalize it.

When restating a result, restate it exactly or explain the changes.

2.19 Plain English

Writing clear English is a major skill in itself with many patterns that improve clarity. For example, it is usual to prefer verb phrases to loading meaning into noun phrases.

Sometimes there is an ordinary English word that means exactly the same as some mathematical word, so it is usually best to use the ordinary English word.

Occasionally, mathematical writing uses the words ‘on’, ‘by’, ‘through’, or ‘over’ in odd ways that usually are not deliberately controlled.

When writing the text of mathematics, use Plain English as far as possible.

Avoid: ‘X possesses an observation...’

Prefer: ‘X observes...’

2.20 Short sentences

Except when they make lists, sentences are easier to read if they are short.

When writing sentences, keep them short, especially where there are formulae included.

2.21 Build up

Although it is important to let readers know where the text is heading (e.g. by stating a theorem to be proved before proving it) it is generally easier for readers to build up gently from simple to complex.

Avoid: ‘To illustrate our viewpoint, let us consider formally a deterministic optimal control problem. We have a discrete-time system described by the system equation

    x_{k+1} = f(x_k, u_k),        (3)

where x_k and x_{k+1} represent a state and its succeeding state and will be assumed to belong to some state space S; u_k represents a control variable chosen by the decision maker in some constraint set U(x_k), which is in turn a subset of some control space C. The cost incurred at the kth stage is given by a function g(x_k, u_k). We seek a finite sequence of control functions π = (μ_0, μ_1, ..., μ_{N-1}) (also referred to as a policy) which minimizes the total cost over N stages. The functions μ_k map S into C and must satisfy μ_k(x) ∈ U(x) for all x ∈ S. Each function μ_k specifies the control u_k = μ_k(x_k) that will be chosen when at the kth stage the state is x_k. Thus the total cost corresponding to a policy π = (μ_0, μ_1, ..., μ_{N-1}) and initial state x_0 is given by

    J_{N,π}(x_0) = ∑_{k=0..N-1} g[x_k, μ_k(x_k)],

where the states x_1, x_2, ..., x_{N-1} are generated from x_0 and π via the system equation

    x_{k+1} = f(x_k, μ_k(x_k)),     k = 0,...,N-2

Corresponding to each initial state x_0 and policy π, there is a sequence of control variables u_0, u_1, ..., u_{N-1}, where u_k = μ_k(x_k) and x_k is generated by (3). Thus an alternative formulation of the problem would be to select a sequence of control variables minimising ∑_{k=0..N-1} g(x_k, u_k) rather than a policy π minimising J_{N,π}. The formulation we have given here, however, is more consistent with the DP framework we wish to adopt.’

Prefer: ‘Our approach is illustrated by a deterministic optimal control problem, using discrete time points. The system goes through a sequence of states s: seq STATE and the decision maker takes a corresponding sequence of control actions a: seq ACTION, where:

(1) the next state is determined by a function f from state and control action to a new state i.e. for all t in 0..N-2, s[t+1] = f[ s[t], a[t] ];

(2) for all t in 0..N-1, the choice of control action is constrained by a function A that maps states to sets of available actions so that a[t] ∈ A[s[t]];

(3) the cost incurred in each state is given by a function c for a given state and control action.

In this context, the total cost incurred is

    ∑_{t ∈ 0..N-1} c[ s[t], a[t] ].

A common formulation of this problem is that we seek a sequence of N control actions that will minimize this total cost.

However, consistent with our DP framework, our approach is to minimize the total cost by choosing a policy, i.e. a sequence of functions p: seq (STATE → ACTION) that map states to actions within the constraints imposed by A so that, for all t in 0..N-1, a[t] = p[t][s[t]].’
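The built-up version translates almost line for line into a program, which is one sign that it is clear. The sketch below is only an illustration under my own assumptions: states and actions are integers, and the functions passed in (next_state for f, available for A, cost for c) and the example policy are invented for the purpose.

```python
from typing import Callable, Sequence, Set

State = int   # assumption: states are integers in this toy example
Action = int  # assumption: control actions are integers too


def total_cost(
    initial_state: State,
    policy: Sequence[Callable[[State], Action]],
    next_state: Callable[[State, Action], State],
    available: Callable[[State], Set[Action]],
    cost: Callable[[State, Action], float],
) -> float:
    """Total cost of following the policy p for N = len(policy) stages."""
    state = initial_state
    total = 0.0
    last_stage = len(policy) - 1
    for t, rule in enumerate(policy):
        action = rule(state)                   # a[t] = p[t][s[t]]
        if action not in available(state):     # a[t] must be in A[s[t]]
            raise ValueError(f"action {action} not available in state {state}")
        total += cost(state, action)           # adds c[ s[t], a[t] ]
        if t < last_stage:
            state = next_state(state, action)  # s[t+1] = f[ s[t], a[t] ]
    return total


def step_towards_zero(state: State) -> Action:
    """Hypothetical policy rule: move one unit towards zero."""
    if state > 0:
        return -1
    if state < 0:
        return 1
    return 0


example_total = total_cost(
    initial_state=2,
    policy=[step_towards_zero] * 3,
    next_state=lambda s, a: s + a,
    available=lambda s: {-1, 0, 1},
    cost=lambda s, a: s * s + abs(a),
)
```

Here the three stages cost 5, 2 and 0, so example_total is 7.0.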

2.22 Working memory capping

When we read ordinary text, our minds load facts from the text and hold them in ‘working memory’, gradually slotting them together when we can. When we reach the end of a sentence or clause we often drop some of those facts because we know we won't need them again. When we reach the end of a paragraph we tend to drop lots of facts.

We struggle with long, fact-heavy sentences and struggle even more if we keep having to dive into sub-clauses. We also struggle if there is no punctuation to tell us a sentence has finished and when paragraphs are very long.

Mathematical writing, with its inherently tough content, can easily swamp a reader's working memory. Consider this example, which is the set up for a theorem whose grand finale is a complicated equation:

‘Let C be a positively oriented, piecewise smooth, simple closed curve in the plane R^2, and let D be the region bounded by C. If L and M are functions of (x, y) defined on an open region containing D and have continuous partial derivatives there, then...’

The set up text alone seems to have about 15 facts in it that a reader needs to load up. To me this seems too many. Other techniques covered in this article will look at ways to cope when lots of facts need to be dealt with, but the first technique is simply to try to avoid theorems and definitions with too much information building up in one go.

When writing paragraphs, such as the set up for a theorem, avoid piling up too many facts in the reader's mind.

2.23 Chunks

Another tactic for reducing the risk of working memory overload is to help the reader form well-chosen ‘chunks’ that bundle more than one fact together into a familiar single unit. You can do this by giving names to chunks, by putting ideas together in a single sentence, and by using a build up earlier on to get the reader familiar with commonly-occurring combinations of conditions that are part of a definition or model.

One useful type of chunk is a form for expressions, which is just a consistent way of writing expressions that are the same.

When looking to reduce working memory load, identify useful chunks and encourage readers to form them and recognize them in your text.

2.24 Ladders

Chunking conditions and other model elements may help readers to take on what they are reading but it won't necessarily help them check it or reason from it. This in turn may hinder understanding.

If readers have to ‘unpack’ definitions (and perhaps definitions within them) in order to start reasoning, and if you don't do this for them in the text, then they will struggle.

If readers are to climb a hierarchy of definitions then they need to learn, along the way, reasoning rules expressed using the terms at each level. This allows them to reason at each level without unpacking definitions.

Ladders seems a good name for the tactic of building up explanations of these rules because, at each level, the reader has something to stand on to work at that level.

When building up a definition hierarchy, provide reasoning rules at each level to reduce the amount of unpacking readers have to do, or explicitly unpack for them in your explanations.

2.25 Depth control

Mathematics on the web is, overall, a wonderful thing. If you are reading something and come across an unfamiliar term or result, often there is a hyperlink to a page that tells you more about it. This wonderful convenience makes it possible to read material that otherwise would be too hard.

However, sometimes the page ‘below’ is impossible to understand without diving below it, and that leads to a page that is also baffling, and to pages lower down. It's not long before the reader must give up, tired and frustrated.

The cause of this problem seems to be a tendency for mathematical writers to pull in terminology and theory from related fields, making it seem that the definition hierarchy goes on forever without getting to simple ideas anyone can grasp.

When writing a definition hierarchy, especially using hyperlinked pages, avoid creating a very deep hierarchy. Reach a simple starting point as quickly as possible.

2.26 Subdivisions

Long explanations are easier to digest if divided, perhaps repeatedly, into subsections of some kind, such as paragraphs.

When writing long explanations, divide them, perhaps repeatedly, into subsections of some kind and use new lines, space, and/or referencing to emphasize the subdivisions and structure.

2.27 Consistent forms

This technique is so commonly used I hardly need to point it out. If readers see expressions in familiar forms it helps them to recognize and remember the expressions.

For example 3x^2 + 4x + 1 is a familiar form of quadratic expression. Writing it as 4x + 1 + 3x^2 makes recognition just a tiny bit harder. The current tradition is to write with the higher powers of x first for quadratics and other polynomials of low order. (Oddly, when writing a general polynomial it is usual to write it with the lower powers first, like this: a_0 + a_1 x + a_2 x^2 + ... + a_n x^n. Don't ask me why.)

When writing expressions, try to stick to familiar, consistent forms.

2.28 Just in time explanations

Starting a book with a chapter on notation, terminology, or other material to be assumed later can work, but more often it is better to tell readers these things just before they need it.

Otherwise readers tend to forget things before they have a chance to use them.

When tempted to write an introductory chapter on notation, terminology, or other knowledge to be assumed later, consider the problem of forgetting and decide to introduce each piece of background knowledge just before it is needed instead.

2.29 Parallelism

Similar statements or formulae appearing together can be easier to understand if they are expressed in a parallel way. By removing unnecessary differences in appearance between the two statements we make it easier for the reader to identify the important, real differences in meaning.

When writing two statements together that are similar, maximize their similarity to show clearly the remaining differences.

Avoid: ‘If i < j then s(j) > s(i).’

Prefer: ‘If j < k then s[j] < s[k].’

3. Provide a clear context

3.1 Explicit platform

I was a teenager when I first faced a problem at school that began with the terrifying words ‘Prove that...’ They put my mind into a spin because it was not clear what I was allowed to take as already established. In theory I could even start from the silly position that the theorem to be proved was already established so I need say nothing other than name the theorem. At the other extreme I perhaps needed to start at the beginning with the most basic axioms (i.e. starting assumptions) of numbers and work up from there, establishing every last substitution, working my way through the basics of algebra, deriving calculus, and so on before finally concluding with the theorem to be proved.

As far as I know this huge cloud of uncertainty still sits over school mathematics and probably mathematics at other levels, even though there is a simple solution to it.

What could be done to clarify the problem for students is to put axioms and results established from them into standard groups with standard names. The grouping would be by topic and by level (with the axioms usually being at the lowest level). These would then be bundled further so that a question can say something like, ‘Using Level 2 Trigonometry, prove that...’ This would tell the well-prepared student exactly which results are established already, which can be used without comment because they are so simple and often used, and which need to be named when used.

This might be useful for professional mathematicians too. It is quite possible that some published proofs are really circular, since they rely on some result that itself was derived from the very thing the author is trying to prove by some new method.

In general it is helpful to structure mathematical writing so that it is clearer what is taken as established at each point.

Several projects to build mathematics from the ground up in a rigorous and formal way have made impressive progress. In all cases that I have seen (e.g. Mizar, HOL-Light, Isabelle, MetaMath) every step is checked by a computer program and has to be tied back to previously established statements.

Unfortunately, these rigorous proofs are not easy to follow either; there is more to clarity than rigorous cross referencing. However, they are easier to follow than they seem at first. If you take a look at them, make allowance for the fact that they are aimed at advanced mathematical work, which is one of the reasons they are so baffling. When they cover familiar school algebra they can be quite lucid at times.

Always make it clear what has already been established or assumed.

Avoid: ‘If P(A) > 0, then the quotient

     P_A(B) = P(AB)/P(A)     (5)

is defined to be the conditional probability of the event B under the condition A.

From (5) it follows immediately that

     P(AB) = P(A)P_A(B)     (6)

[...some more paragraphs of text...]

From (6) and the analogous formula that

     P(AB) = P(B)P_B(A)

we obtain the important formula:

     P_B(A) = P(A)P_A(B)/P(B),     (12)

which contains, in essence, the Theorem of Bayes.’

In this tiny example it is not quite clear when the assumption that P(A) > 0 stops being in force, and it is not made clear that P(B) > 0 is also necessary when the ‘analogous formula’ is used.

Prefer: ‘If P(A) > 0, then the quotient

     P_A(B) = P(AB)/P(A)     (5)

is defined to be the conditional probability of the event B under the condition A.

From (5) it follows immediately that if P(A) > 0 then

     P(AB) = P(A)P_A(B)     (6)

[...some more paragraphs of text...]

From (6) and the analogous formula that, if P(B) > 0 then

     P(AB) = P(B)P_B(A)

we obtain the important formula:

     P(A) > 0 and P(B) > 0 ⇒ P_B(A) = P(A)P_A(B)/P(B),     (12)

which contains, in essence, the Theorem of Bayes.’

3.2 Stated scope for assumptions

One pitfall when trying to be clear about what assumptions are currently applicable is to introduce assumptions but not make it clear when they cease to apply. Perhaps the writer intended them to be in force only for the current page, or proof, or section, or chapter, or perhaps for the whole book. Often it is only by clever deduction that the reader knows.

The scope can be made clear by saying ‘within this chapter’ or ‘for the purposes of this model’, or by restating assumptions (perhaps by referencing) on every line, or by using a box or lines to show the scope. In the mathematical specification language of Z, rather attractive looking lines enclose formulae and clarify scope fully.

Perhaps the most useful Z technique for establishing a clear set up of objects and assumptions is ‘schema inclusion’. The idea is very simple. Imagine you want to write a paper about an area of mathematics called something like ‘happy spaces’ and a happy space has various objects in it with their own special properties and conventional names. In many of the formulae in your paper this basic set up needs to be in scope, with perhaps some extra objects and assumptions thrown on top.

The schema inclusion technique involves writing out the basic set up once, in a box (i.e. a ‘schema’), and giving it a name. Then, whenever you want that set up to apply in a formula you just write its name in an appropriate style and ‘by inclusion’ all the details are in place.

In Z, schema names are included in other schemas, keeping the scope clear at all times.

Without something like schema inclusion it is easy to create doubt by leaving the set up unstated, trusting to the reader to guess it from the context, or to generate confusion by reminding the reader of some parts of it but not all.

When stating assumptions, say what their scope will be or use notation to do so, such as brackets, a box of some kind, or schema inclusion.

3.3 Stated scope for objects

As with assumptions, objects introduced need to have a clear scope so that readers know when they cease to exist as well as when they come into existence.

When introducing objects, say what their scope will be or use notation to do so, such as brackets or a box of some kind.

4. Always be unambiguous

4.1 Unambiguous notation

Work to support mathematics on computer systems has revealed how much of traditional notation is irritatingly ambiguous and almost guaranteed to cause problems for readers at many levels of skill. Here are some examples:

More confusion can be created by inventing new notation, especially if it involves something that already has other meanings, like subscripts and superscripts.

A lot has been done to establish unambiguous ways to type mathematics into computer systems and these techniques will probably get used in mathematics more over coming decades. For example, since curved parentheses are used too much, Mathematica requires inputs to functions to be contained in square brackets instead (e.g. log(x) becomes Log[x]). It's a nice idea.

Also, cracking the simple problem of typing mathematics using ordinary ASCII text symbols has shown some helpful techniques. For example, raising to a power is shown by the ‘^’ symbol so that a² becomes a^2.

In LaTeX (and mimeTeX) some of these symbols are translated into the traditional form for display, so getting used to the LaTeX script version helps to prepare the way for reading and writing in the ASCII notation.

When choosing a style of notation, carefully consider the problem of ambiguity and prefer notation that is unambiguous without clever inferences from the context, wherever possible. Do not shy away from small departures from old fashioned notation as there are often precedents for changes you might want to adopt.

Avoid: cos⁻¹x

Prefer: arccos[x]

Avoid: cos²x

Prefer: cos[x]^2

Avoid: cos x²

Prefer: cos[x^2]

Avoid: dy/dx

Prefer: f'

Avoid: The price at time t is P_t

Prefer: The price at time t is P[t]
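For readers who have not seen the LaTeX script version, here is a minimal sketch of how some of these pairs might be typed. It is only an illustration; the square brackets are simply typed as ordinary characters, and nothing beyond a standard LaTeX installation is assumed.

```latex
\documentclass{article}
\begin{document}

% Avoided forms, as traditionally typeset
$\cos^{-1} x$, \quad $\cos^2 x$, \quad $\cos x^2$, \quad $P_t$

% Preferred forms: square brackets mark function inputs, ^ marks a power
$\arccos[x]$, \quad $\cos[x]^2$, \quad $\cos[x^2]$, \quad $P[t]$

\end{document}
```

Notice that the typed script already uses ^ for a power and _ for a subscript, so the typed form is unambiguous even where the displayed form is not.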

4.2 Object introductions

One of the most basic disciplines in writing mathematics is to introduce each new object (e.g. variable, function, matrix, operator) before using it, saying what it represents and what type of object it is. Occasionally authors start using a symbol as if they have already introduced it but in fact they haven't. Often what they introduce looks similar to a variable already introduced but it has a different subscript or some other decoration so that in truth it is really a new object.

When you want to introduce a new object of any kind, introduce it properly first (or immediately after using the ‘where x is...’ style), stating clearly and precisely what it means and what type of thing it is.

Avoid: ‘This leads to the equation s = rt, where s is the distance, r is the rate, and t is the time.’

Prefer: ‘This leads to the equation s = rt, where s:ℜ is the distance from my car to my house in metres, r:ℜ is the velocity at which I'm travelling in metres per second, and t:ℜ is the number of seconds for which I have been travelling.’
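Programming languages with type annotations encourage the same discipline. In this small sketch the function and parameter names are my own invention, and Python's float stands in for a real number.

```python
def distance_from_house_m(
    velocity_m_per_s: float, seconds_travelled: float
) -> float:
    """Distance from my car to my house in metres, using s = rt.

    velocity_m_per_s is the velocity in metres per second and
    seconds_travelled is the number of seconds spent travelling.
    """
    return velocity_m_per_s * seconds_travelled
```

Each object is introduced with its meaning, its type, and its units before it is used.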

4.3 Diagram variable introductions

It is tempting to think that a variable introduced on a diagram will be clear to readers because ‘a picture is worth a thousand words,’ which may be why variables on diagrams sometimes slip past the usual discipline of introductions.

Unfortunately, diagrams are not the same as pictures – they are more abstract and limited – and frequently they are as baffling as any text.

When using a variable in a diagram, check that it has been introduced properly in the text, preferably just before or just after the diagram.

4.4 Non-reuse of symbols

It can be confusing to reuse a variable name (or symbol for another object) for something else, even if you say clearly that you have done so. A common example of this is when authors subtly alter the meaning of a symbol, sometimes without realizing it themselves.

Another common example of re-using symbols is where the writer wants to introduce an abbreviation for something that is going to be written lots of times, but instead of choosing a new symbol for the abbreviation writes something bizarre like Φ = Φ(σ,γ).

When choosing symbols, do not reuse symbols within the same document.

Avoid: ‘Thus:

    Φ = Φ(σ,γ)’

Prefer: ‘A convenient abbreviation is:

    φ = Φ(σ,γ)’

4.5 Unambiguous quantifiers

It is easy to write subtly ambiguous or misleading statements involving quantifiers.

When writing about something that could be singular or plural, take care to use the right words.

Avoid: ‘f(x) = g(x)     (x is an element of X)’

leaves doubt as to whether the equation holds for all elements of X or only for a particular element, x.

Prefer: ‘For all x in X, f(x) = g(x).’

or

‘For some x, an element of X, f(x) = g(x).’

Avoid: ‘Let x be an element of X...’

Again, this is ambiguous.

Prefer: ‘Let x be any element in X ...’

or

‘Let x be a specific element in X that satisfies...’

Avoid: ‘A solution of the equation x = 20/10 is x = 2’

Prefer: ‘The solution of the equation x = 20/10 is x = 2’

Avoid: ‘The solution of the equation x^2 = 4 is x = 2’

Prefer: ‘A solution of the equation x^2 = 4 is x = 2’
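Program code forces the choice between ‘for all’ and ‘for some’ to be made explicitly. The following sketch uses a made-up finite set in place of X and two made-up functions f and g.

```python
domain = {-2, -1, 0, 1, 2}  # hypothetical finite stand-in for the set X


def f(x: int) -> int:
    return x * x


def g(x: int) -> int:
    return 2 * x


# "For all x in X, f(x) = g(x)."
holds_for_all = all(f(x) == g(x) for x in domain)

# "For some x, an element of X, f(x) = g(x)."
holds_for_some = any(f(x) == g(x) for x in domain)

# "A solution", not "the solution": within this domain x*x = 4 has two.
solutions = {x for x in domain if x * x == 4}
```

With these definitions the ‘for all’ statement is false and the ‘for some’ statement is true, so the bare ‘f(x) = g(x)   (x is an element of X)’ could be read either way with different results.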

4.6 Precise inequalities

Sometimes it is not made clear whether inequalities are strict or not. For example, ‘y is between 3 and 5’ leaves doubt as to whether y can be equal to 3 or equal to 5. This can lead to errors.

When stating inequalities always remember the ‘or equal’ bit if appropriate.

4.7 Broken chains

What does ‘A = B = C’ mean to you? Probably you would interpret that as A = B and also B = C (which also implies A = C). However, this sort of thing is not allowed, or is given a different meaning, in many computer programming languages and should not appear in mathematical writing because it has other interpretations. It could mean that A is equal to the truth value of B = C, which would be True or False. It could also mean that C is equal to the truth value of A = B.

In short, ‘A = B = C’ is ambiguous and should be avoided.

Similarly, although traditional, the notation ‘0 < x < 1’ is ambiguous and should be avoided.

When writing a series of equalities or inequalities use connectives or a list format rather than a chain on one line.

Avoid: A = B = C

Prefer: A = B and B = C

or

A
= B
= C

Avoid: ‘0 < x < 1’

Prefer: ‘0 < x and x < 1’

or the less well known form

‘x ∈ (0,1)’
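The different readings of a chain can be seen by trying them in a programming language. Python happens to accept chains and reads them as ‘and’, but adding brackets gives the truth-value readings. The values of a, b, c and x below are chosen purely to make the readings disagree.

```python
a, b, c = 2, 2, True

chained = a == b == c                 # Python reads this as (a == b) and (b == c)
conjunction = (a == b) and (b == c)   # the reading most people intend
left_grouped = (a == b) == c          # truth value of a == b, compared with c
right_grouped = a == (b == c)         # a compared with the truth value of b == c

x = -5
chained_range = 0 < x < 1             # (0 < x) and (x < 1)
grouped_range = (0 < x) < 1           # False < 1, because False counts as 0
```

Here chained and conjunction are False but left_grouped is True, and chained_range is False but grouped_range is True. A reader, or a program written in another language, that takes the other reading gets the opposite answer.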

4.8 Implications

As a schoolboy learning algebra I was taught to use the symbol ∴, meaning ‘therefore’. I used it all over the place, as instructed, without really thinking about what it meant.

Today I recognize more than one type of inference and believe using more symbols helps. Specifically, if B follows from A, but A does not follow from B, then the symbol to use is ⇒, giving A ⇒ B, but if B follows from A and also A follows from B then I can write A ⇔ B.

This applies where reasoning progresses from one line of workings to the next.

When writing successive lines of formulae that follow from each other, use appropriate symbols for the inference involved.

Avoid: ‘P ∧ Q
∴ P’

Prefer: ‘P ∧ Q
⇒ P’

Avoid: ‘P ∧ Q
∴ ¬(¬P ∨ ¬Q)’

Prefer: ‘P ∧ Q
⇔ ¬(¬P ∨ ¬Q)’
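The difference between the two kinds of inference can be checked mechanically by trying every combination of truth values. This is just a sketch; the helper function implies is my own.

```python
from itertools import product


def implies(antecedent: bool, consequent: bool) -> bool:
    """Material implication: false only when a true statement leads to a false one."""
    return (not antecedent) or consequent


assignments = list(product([False, True], repeat=2))

# P ∧ Q ⇒ P holds for every assignment...
forward = all(implies(p and q, p) for p, q in assignments)

# ...but P ⇒ P ∧ Q does not, so writing ⇔ on that line would be wrong.
backward = all(implies(p, p and q) for p, q in assignments)

# P ∧ Q ⇔ ¬(¬P ∨ ¬Q) holds in both directions.
equivalent = all(
    (p and q) == (not ((not p) or (not q))) for p, q in assignments
)
```

The check fails for backward when P is true and Q is false, which is exactly the information that ∴ hides.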

4.9 Three examples

If you have an abstraction that can be applied to a variety of more specific cases, and if you want readers to learn to recognize when your abstraction can be applied, then you need to give them some practice. Research suggests that people usually need at least three examples to get the hang of it.

When you want to teach application of an abstraction, give three varied examples. Also consider giving some kind of exercise to practise.

4.10 Boxes

One technique that helps to clarify scope as well as improve basic legibility is to draw boxes around certain parts of the text. Things you could choose to box include important formulae, theorems, proofs, examples, and bundles of formulae you want to reuse by reference. It is easier for readers to see that objects introduced within the box don't exist outside it.

The mathematical specification language Z uses special boxes that are open on the right hand side and also have a horizontal line through them to separate declarations (i.e. introduction of objects) from statements about the objects. The boxed text is called a schema and each schema is given a name so that its contents can be included in other schemas by just giving the name, rather like a subroutine in a computer program.

When choosing formats for a book or paper, consider which elements will be put inside boxes, if any. In particular, consider doing this for key results, proofs, and examples.

4.11 Avoided pronouns

In mathematical writing, the word ‘it’ is lethal. Very often the reader cannot be quite sure which object ‘it’ refers to.

When referring to an object use the object's name or symbol, never ‘it’, ‘they’, or another pronoun.

5. Be complete and correct

Some say that when you write mathematics correctly it becomes long-winded and hard to follow. It can, but it doesn't have to. More important is the fact that mathematics is easier to understand when it makes sense.

The problems caused by incomplete, incorrect mathematics are serious. Even when readers can't say exactly why they feel confused and concerned they may still have good reasons for feeling that way. Mathematical writing, especially in journal papers with some mathematical content, is riddled with small but worrying mistakes and omissions.

5.1 Comprehensive assumptions

When I first started using the Z language for specifying computer systems I was surprised at how many statements were needed to set up the basic model of a system. It was easy to forget something but also possible to pick up the gaps by looking over my formulae.

In other mathematics it is also possible to leave out the less interesting assumptions and so fail to set up everything needed.

When setting up a model, include all necessary assumptions.

5.2 Provisos

In some areas of mathematics there are lots of fiddly qualifications to general rules that we should keep in mind but often forget about.

For example, in calculus it is easy to forget that functions need to be continuous, differentiable, or integrable if a theorem is to be true. Sometimes a writer will state the theorem in one place with all the proper provisos but restate it again later without some of them. It is easy for the reader to be led astray by this.

From here it is just a small step to making a mistake when a function used in a practical problem is not continuous and differentiable in the range needed.

Similarly, in probability theory the conditional probability P(A|B) (i.e. the probability of A given B) is only defined if P(B) > 0 but this crucial point is often left out when conditional probabilities are discussed.

Assumptions can be restated succinctly by bundling them up under one name so that only the name needs to be restated. This is better than not stating them at all or stating some of them in a seemingly haphazard way.

When stating assumptions and results, state all assumptions and qualifications whenever applicable. Do not skimp.

Avoid: ‘To derive Bayes's Theorem we start with the definition of conditional probability as:

P(A|B) = P(A∩B)/P(B)’

Prefer: ‘To derive Bayes's Theorem we start with the definition of conditional probabilities, which states that, for any two events, A and B, such that the probability of B is greater than zero:

P(A|B) = P(A∩B)/P(B)

If P(B) = 0 then P(A|B) is undefined.’
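One way to see how a proviso can be kept attached to a definition is to write the definition as a small function that refuses to work when the proviso fails. This is only a sketch; the function name and its parameters are my own, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def conditional_probability(p_a_and_b: Fraction, p_b: Fraction) -> Fraction:
    """Return P(A|B) = P(A∩B)/P(B), which is defined only when P(B) > 0."""
    if p_b <= 0:
        raise ValueError("P(A|B) is undefined unless P(B) is greater than zero")
    return p_a_and_b / p_b


# Example: P(A∩B) = 1/4 and P(B) = 1/2 give P(A|B) = 1/2.
example = conditional_probability(Fraction(1, 4), Fraction(1, 2))
```

Every use of the function restates the proviso automatically, which is the effect the written text should aim for.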

5.3 Type declarations

In mathematics it matters a lot what type of thing you are talking about. Is a number a rational number, a real, an integer? Is something a function or a relation? If it is a function what is the type of its domain and the type of its range?

Sometimes writers introduce something and give it a name without showing its type. This introduces great doubts into the alert reader's mind.

This is staggeringly common, perhaps because the idea of rigorously typing every object has come to prominence only recently thanks to computer programming languages and systems designed to perform, support, or check mathematics.

For example, the classic school textbook question says something like ‘Solve 3x + 2 = 0’ but does not say what type of number x is. If it has to be a whole number then there are no values for x that will do the trick. The book may have stated at some point that all numbers are real numbers unless otherwise stated, so strictly speaking this is not wrong, but it fails to reinforce the good habit of always stating types.

In solving real life problems this could cause problems. For example, if a variable represents a number of children then it must be non-negative and a whole number.

Staying with equations for a moment, have you ever wondered what it means to say ‘Solve the equation ...’? It's actually quite a sophisticated concept that means different things in different contexts. However, for something like ‘Solve the quadratic equation x^2 + 2x − 8 = 0’ (with x as a real number) most people would say there are two solutions, -4 and 2. These are the values that, when substituted for x make the equation true. But what is the type of that answer?

Is the solution a number? No, because even with a quadratic equation there can be 0, 1, or 2 solutions. Is it a list of numbers? No, because the order of the solutions does not matter. A way to look at this that I like is to say that ‘solving’ one of these equations means enumerating the elements of the set of numbers for which the equation is true. On this view the solution is {-4,2}.

Mistakes happen when we try to ‘solve’ equations because we think of it as an exercise in finding one number that is the solution. We should be thinking of finding a set of numbers, moving from an implicit definition of the set to one in which the actual values are listed.
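As a rough illustration of treating the answer as a set, here is a minimal JavaScript sketch. The function name realSolutionSet and the choice of a Set as the return type are assumptions made for this example only. It returns the set of real numbers satisfying a.x² + b.x + c = 0, which may have 0, 1, or 2 elements. The results are floating point values, so this is a demonstration rather than an exact method.

```javascript
// Hypothetical helper (not from the article): the set of real solutions
// of a*x^2 + b*x + c = 0, where a, b, c are real numbers and a is not 0.
function realSolutionSet(a, b, c) {
  if (a === 0) {
    throw new RangeError("a must be non-zero for a quadratic equation");
  }
  const discriminant = b * b - 4 * a * c;
  if (discriminant < 0) {
    return new Set(); // no real solutions: the empty set
  }
  if (discriminant === 0) {
    return new Set([-b / (2 * a)]); // exactly one solution
  }
  const root = Math.sqrt(discriminant);
  return new Set([(-b - root) / (2 * a), (-b + root) / (2 * a)]);
}

// x^2 + 2x - 8 = 0 gives a set with the two elements -4 and 2.
const solutions = realSolutionSet(1, 2, -8);
```

Because the result is a Set, the order of the solutions carries no meaning, and an equation with no real solutions gives an empty set instead of an error.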

The following examples show a style of equation solving in which the types involved are made clear:

Types unclear:
x² = 4
∴x = 2

Types clear:
{x: ℜ | x² = 4}
= {2,-2}

Types unclear:
x² + x = 4x
∴x(x + 1) = 4x
∴x + 1 = 4
∴x = 3

Types clear:
{x: ℜ | x² + x = 4x}
= {x: ℜ | x² − 3x = 0}
= {x: ℜ | x(x − 3) = 0}
= {0,3}

Types unclear: Solve for b:
b.d + e = k
∴b.d = k − e
∴b = (k − e) / d

Types clear: Solve for b, where b, d, k, and e are real numbers.
{b: ℜ | b.d + e = k}
= {b: ℜ | b.d = k − e}
= {b: ℜ | (d = 0 ∧ k = e) ∨ (d ≠ 0 ∧ b = (k − e) / d)}
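For readers who prefer to see the cases separated, the solution set of b.d + e = k can be written out as follows. This is a sketch in LaTeX, assuming the amsmath and amssymb packages, with b, d, e, and k all real numbers as stated above.

```latex
\{\, b \in \mathbb{R} \mid b\,d + e = k \,\} =
\begin{cases}
  \{ (k - e)/d \} & \text{if } d \neq 0, \\
  \mathbb{R}      & \text{if } d = 0 \text{ and } k = e, \\
  \emptyset       & \text{if } d = 0 \text{ and } k \neq e.
\end{cases}
```

The ‘types unclear’ version silently assumes the first case by dividing by d.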

Calculus gives another example of problems arising from not being clear about types. Finding definite integrals and indefinite integrals involves almost the same set of techniques. However, a definite integral is a number (e.g. the area under a graph between two points on the x axis) whereas the type of an indefinite integral of a function is almost never mentioned let alone made clear. I think it is a set of functions all of which have the property that their derivative is the same as the function you started with. The traditional way to represent this set of functions is to write what looks like one function but add ‘+ C, where C is an arbitrary constant’ to the end. No wonder beginners get confused. What they are seeing looks like one function and they've never before heard of an arbitrary constant.
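One way to make the type of an indefinite integral explicit is sketched below in LaTeX (assuming amssymb). The symbols I, F, and F₀ are introduced here for illustration: I is an interval on which f is defined, and F₀ is any one antiderivative of f on I.

```latex
\int f(x)\,dx
  \;=\; \{\, F : I \to \mathbb{R} \mid \forall x \in I,\ F'(x) = f(x) \,\}
  \;=\; \{\, x \mapsto F_0(x) + C \mid C \in \mathbb{R} \,\}
```

Read this way, ‘+ C’ is shorthand for a whole set of functions, one for each real number C. The second equality relies on the domain being a single interval.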

Probably the most common way to state types is using a colon followed by the set to which the object belongs. For example, ‘x : ℜ’ introduces a real number called x. (The ugly ℜ symbol is the best HTML can do, but there are clearer versions of ‘R’ that can be used with other software.)

When introducing an object, always state its type in words or using the ‘:’ notation, or by saying what set it is a member of using the ∈ notation.

Avoid: ‘...where X is the value of the investment...’

Prefer: ‘...where X, a real number, represents the value of the investment...’

Avoid: ‘Let S be the sum of the values of a collection of investments...’

Prefer: ‘Let S: (bag ℜ) → ℜ be a function that returns the sum of the values of a collection of investments...’

5.4 Setting up types

If you've been reading carefully you may have noticed that in the last example I mentioned ‘bag ℜ’ and this was perhaps an unfamiliar notation. What is a bag? A bag is a mapping from a set of anything to the set of natural numbers. Imagine you have a bag of groceries and some of the items are the same as each other, perhaps because you took advantage of a ‘buy one get one free’ offer. The mathematical bag is like a list of your groceries with the number of units of each written by the side.

The point of the notation ‘bag’ is to provide an abbreviated way to express a type that is often needed. It replaces ‘ℜ → ℕ’.
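To make the idea concrete, here is a small JavaScript sketch of a bag of real numbers and of a function with the type (bag ℜ) → ℜ that sums it. The names bagFromList and sumBag, and the use of a Map from each value to its count, are assumptions for this illustration and are not part of Z.

```javascript
// A bag of numbers modelled as a Map from each value to the number of
// times it occurs (hypothetical representation, chosen for illustration).
function bagFromList(values) {
  const bag = new Map();
  for (const value of values) {
    bag.set(value, (bag.get(value) || 0) + 1);
  }
  return bag;
}

// S: (bag of reals) -> real. Returns the sum of all items in the bag,
// counting each value as many times as it occurs.
function sumBag(bag) {
  let total = 0;
  for (const [value, count] of bag) {
    total += value * count;
  }
  return total;
}

// Three investments, two of which happen to have the same value.
const investments = bagFromList([100, 250, 100]);
const totalValue = sumBag(investments);
```

A set could not represent this collection because the repeated value 100 would be counted only once.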

In serious mathematics it can be very useful to have a way to create new types and abbreviations for more complex types. The mathematical toolkit of Z, which is a way to use mathematics to describe computer systems, has a particularly well developed approach to types and its tricks could and should be used much more widely.

To introduce a new basic type you just give its name, in capitals, inside square brackets. For example, a line in a document that just reads ‘[HORSE]’ introduces a new basic type called HORSE. After that you can introduce a new object and say it is a horse by writing x : HORSE.

To introduce an abbreviation or alternative name for a type just use a double equals sign. For example, a line in a document that just reads ‘HERD == ℘ HORSE’ introduces an abbreviation for the power set of HORSE. After that you can introduce a new object and say it is a set of horses by writing y : HERD.

When you need to use types beyond the usual number types, or need to use complicated types, set them up with helpful names using a systematic approach such as that of Z.

5.5 Type rigour

Not only is it common to find that types are unstated, but also it is common to find that types are not enforced consistently. Objects appear to change their types during documents and have done so down the generations.

In probability theory I wonder if there has been a change in views about what type of object a random variable is. Technically it is now a function that maps outcomes within a sample space to real numbers. So, if X is a random variable then X stands for the function, not for the number that the function X returns. However, it is traditional to write expressions as if X is the number returned as in ‘P(X < 4)’. If we were taking types seriously then we would need to mention the event and say something like ‘P(X(e) < 4)’ instead. What we actually write is incorrect and confusing but traditional.
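The distinction can be seen in a short JavaScript sketch. Everything here is an assumption made for illustration: the sample space is one roll of a fair die, all outcomes are taken to be equally likely, and the names sampleSpace, X, and P are invented for the example.

```javascript
// Sample space: the possible outcomes of one roll of a die.
const sampleSpace = [1, 2, 3, 4, 5, 6];

// The random variable X is an ordinary deterministic function from
// outcomes to real numbers. Here it simply returns the number rolled.
const X = (e) => e;

// P takes an event, expressed as a predicate on outcomes, and returns its
// probability, assuming every outcome in the sample space is equally likely.
function P(event) {
  return sampleSpace.filter(event).length / sampleSpace.length;
}

// 'P(X < 4)' written with the types taken seriously: the event is the
// set of outcomes e for which X(e) < 4.
const pLessThan4 = P((e) => X(e) < 4); // 0.5
```

Writing X < 4 directly would compare a function with a number, which is the type error the traditional notation glosses over.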

I have seen functions transmogrify into the numbers they return in other places too.

Books on Bayesian statistics often represent probabilities and probability densities with the same notation, excusing this as an ‘abuse of notation’ that is intended to simplify their exposition. I think it is wrong and one of the consequences is that these books tend to prove Bayes's Theorem only in the simple case of a discrete prior distribution with discrete likelihood function, but then merrily apply other versions with a continuous prior and/or a continuous likelihood function.

This knocks the reader's confidence, leaving a nagging feeling that something isn't quite clear and could lead to error somewhere along the line.

Another bizarre example from calculus is the pattern of teaching where differentiation is explained as an operation on a function but at some point, without warning or explanation, differentiation gets used on things that aren't functions – or don't seem to be.

Then there is the infamous notation ‘dy/dx’ which is introduced as a single symbol that means the derivative of the variable y with respect to the variable x. The young mathematician has hardly grasped this before the dy and the dx start flying around on their own, appearing in integrals, in expansions of dy/dx, and even in equations.

It is better to be strictly correct with types, though you can appreciate that it will seem unfamiliar in a few cases because mistakes are so common and so entrenched.

When writing mathematics, make sure types are always consistent.

Avoid: ‘...the function y = f(x) ...’

Prefer: ‘...the function f ...’

Avoid: ‘Let A = {1,2,3}. Then 3 is a subset of A.’

Prefer: ‘Let A = {1,2,3}. Then {3} is a subset of A.’

Avoid: ‘The cumulative probability, P(X < x)...’

Prefer: ‘The cumulative probability, P(X(w) < x)...’
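Programming languages tend to enforce the element versus subset distinction, which makes it easy to demonstrate. The following JavaScript sketch uses hypothetical helper names isElement and isSubset.

```javascript
const A = new Set([1, 2, 3]);

// Membership: is the number x an element of the set?
const isElement = (x, set) => set.has(x);

// Inclusion: is every element of the set 'sub' also an element of 'set'?
const isSubset = (sub, set) => [...sub].every((x) => set.has(x));

const threeIsElement = isElement(3, A);          // true: 3 is an element of A
const threeIsSubset = isSubset(new Set([3]), A); // true: {3} is a subset of A
```

The number 3 and the set {3} are different types of object, so each question only makes sense for one of them.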

5.6 Domain definitions

Functions map objects in one set to objects in another. The objects for which a function returns a value are its domain and defining that domain is part of defining a function.

For example, here is a complete function definition for a function called ‘sqrt’ that returns the square root of numbers.

sqrt: ℜ → ℜ
∀ x: ℜ | x ≥ 0 • sqrt(x) × sqrt(x) = x

This definition clearly states the types involved (sqrt: ℜ → ℜ) and also points out that the domain is restricted to non-negative numbers (∀ x: ℜ | x ≥ 0).

It is common to say nothing about the domain when defining functions. This can cause problems in calculus, where many useful functions cannot handle all real numbers, or where particular points are problematic and need to be excluded from the domain for some theorems to hold.
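In software the same idea appears as checking arguments before using them. Here is a minimal JavaScript sketch of a square root function that states and enforces its domain. The choice of error types is an assumption for this example, and the function returns the non-negative root.

```javascript
// sqrt: reals -> reals, defined only for x >= 0.
// Inputs outside the domain are rejected rather than silently mapped to NaN.
function sqrt(x) {
  if (typeof x !== "number" || Number.isNaN(x)) {
    throw new TypeError("sqrt expects a real number");
  }
  if (x < 0) {
    throw new RangeError("sqrt is only defined for x >= 0");
  }
  return Math.sqrt(x);
}

const three = sqrt(9); // 3
```

Calling sqrt(-1) throws a RangeError, which makes the restriction on the domain impossible to overlook.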

Avoid: ‘If f(x) = x² + 2, what is f(3)?’

Prefer: ‘If f:ℜ → ℜ is a function where, ∀ x:ℜ • f[x] = x² + 2, what is the value of f[3]?’

5.7 Labelled axes

Each axis on a graph should be labelled to show which variable it represents. This is so basic you might wonder if anyone could fail to do it and yet it happens. In one case a graph called the ‘dog bone’ produced by consultants had axis labels that did not identify the variables. It looked pretty and the story told as it was displayed on the screen was plausible enough at a very superficial level. However, asking what the axes represented revealed immediately that it was consultant's nonsense.

When creating a graph, ensure that all axes are clearly labelled to identify the variables they represent.

5.8 Genuine implications

It can be confusing to imply that something follows from what has been said already using a word like ‘thus’ when in fact it does not follow. Too often, what comes after the word ‘thus’ is a non sequitur.

Occasionally, what follows ‘thus’ actually contradicts what precedes it.

When tempted to write ‘thus’, consider if the next statement really does follow.

Avoid: ‘A random variable can be thought of as an unknown value that may change every time it is inspected. Thus, a random variable can be thought of as a function mapping the sample space of a random process to the real numbers.’

Prefer: ‘A random variable is sometimes thought of as an unknown value that may change every time it is inspected. However, a random variable is really a function mapping the sample space of a random process to the real numbers. Thus, a random variable is neither random nor a variable.’

5.9 Limit directions

One of the most devious yet useful mathematical techniques is to study the value an expression moves towards as numbers in it are changed. Sometimes it is possible to say that the value of the expression is clearly heading to a limit as a crucial number in it gets closer and closer to (typically) either zero or infinity.

In calculus it is often possible to approach a limiting point either from above or below but sometimes authors forget to mention which direction they are approaching from. In some cases the limit idea works from one direction but not from the other, so it may matter.

For example, x/sin(x) is undefined when x = 0 because sin(0) = 0. However, the limiting value of x/sin(x) as x gets closer to 0 is 1, whether you approach from below or above. The traditional notation for limit when approaching from below (i.e. x starts as less than 0 and moves up towards 0) is:

lim       x/sin(x) = 1
x → 0⁻

The notation for the limit when approaching from above (i.e. x starts as more than 0 and moves down towards 0) is:

lim       x/sin(x) = 1
x → 0⁺

In this case the limits are the same from either direction so it is correct to write:

lim       x/sin(x) = 1
x → 0

When I searched the internet for a proof of this the first one I found ignored the issue of direction, made no distinction, and demonstrated the result from above only. This is not unusual but is incomplete and may lead to error.

When working with limits, always consider the issue of direction of approach and state the direction of approach if you are assuming one.
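A quick numerical probe can show whether direction matters, though it proves nothing about the limit itself. In this JavaScript sketch the helper probe and the second function g, defined by g(x) = |x|/x, are assumptions added for contrast. The function f agrees from both sides of 0 while g does not.

```javascript
// Evaluate fn a small distance below or above a point. This only hints at
// a one-sided limit; it is not a proof that the limit exists.
function probe(fn, point, direction, distance = 1e-6) {
  if (direction !== "below" && direction !== "above") {
    throw new RangeError('direction must be "below" or "above"');
  }
  const sign = direction === "below" ? -1 : 1;
  return fn(point + sign * distance);
}

const f = (x) => x / Math.sin(x); // both one-sided limits at 0 are 1
const g = (x) => Math.abs(x) / x; // limit at 0 is -1 from below, 1 from above

const fBelow = probe(f, 0, "below"); // very close to 1
const fAbove = probe(f, 0, "above"); // very close to 1
const gBelow = probe(g, 0, "below"); // -1
const gAbove = probe(g, 0, "above"); // 1
```

For g, writing ‘the limit as x → 0’ without a direction would be wrong, because the two one-sided limits disagree.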

5.10 Write limit whenever needed

Writing out the ‘limit’ part of a statement about a limit is work and some lazy writers stop mentioning it prematurely.

When working with limits, remember to keep stating the ‘limit’ part as long as needed.

Avoid: ‘Lim       x/x²
x → ∞

= 1/x

= 0’

Prefer: ‘Lim       x/x²
x → ∞

= Lim       1/x
  x → ∞

= 0’

5.11 And only if

When writing ‘if and only if’ do not be lazy and stop mentioning the ‘and only if’ part.

When writing the phrase ‘if and only if’ do not leave out the ‘and only if’ part if it is true, even if you think it is obvious.

5.12 Explicit quantifiers

Often quantifiers are missed out and this can leave the reader in doubt.

When writing statements, be explicit about quantifiers.

Avoid: ‘The product rule can be stated as:

(f ⋅ g)'(x) = f'(x)g(x) + f(x)g'(x)’

Prefer: ‘The rule can be stated as follows: For all x, real numbers, and all functions from real numbers to real numbers, f and g, with certain provisos,

(f ⋅ g)'(x) = f'(x)g(x) + f(x)g'(x)’
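Written fully in symbols, the quantified statement might look like the following LaTeX sketch (assuming amsmath and amssymb). Here the ‘certain provisos’ are taken, as an assumption for this illustration, to be that f and g are both differentiable at x.

```latex
\forall f, g : \mathbb{R} \to \mathbb{R},\ \forall x \in \mathbb{R} :\quad
f \text{ and } g \text{ differentiable at } x
\;\Longrightarrow\;
(f \cdot g)'(x) = f'(x)\,g(x) + f(x)\,g'(x)
```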

5.13 Explicit overloading

Overloading is where a symbol for an operator (often + or . or simply juxtaposition) is used for analogous operations on different types of object. For example, adding two numbers is different from adding two matrices, which is also very different from pointwise addition of functions. I argue against reusing symbols (which is what overloading is) but some examples are standard (e.g. the = sign is used for equating many different things). If you want to do it then be explicit. Tell the reader what is going on.

I've noticed that overloading is very common in calculus rules. For example, here is a familiar rule that uses indefinite integrals (i.e. integrals that give sets of functions):

∫ (u + v) dx = ∫ u dx + ∫ v dx

The first time the symbol '+' appears it is the pointwise addition of the two functions of x, u and v. However, when it appears again on the right hand side of the equation it is now combining two sets of functions (the antiderivatives of u and v) in a specific way that forms a new set of functions.

If you're still following this, consider the version of the rule that uses definite integrals (i.e. integrals that give numbers):

∫A (u + v) dx = ∫A u dx + ∫A v dx

In this version the '+' on the left hand side is, again, pointwise addition of two functions, but this time the '+' on the right hand side is simply addition of two numbers.
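The two meanings of '+' in the definite integral rule become visible when they have to be written as separate operations. In this JavaScript sketch, addFunctions is pointwise addition of functions and the ordinary + adds numbers. The names, the example functions u and v, the interval from 0 to 2, and the midpoint rule used for integrate are all assumptions chosen for illustration, and the integrals are numerical approximations.

```javascript
// Pointwise addition of two functions: takes functions, returns a function.
const addFunctions = (u, v) => (x) => u(x) + v(x);

// Approximate definite integral of fn over [a, b] using the midpoint rule.
function integrate(fn, a, b, n = 1000) {
  const h = (b - a) / n;
  let total = 0;
  for (let i = 0; i < n; i++) {
    total += fn(a + (i + 0.5) * h);
  }
  return total * h;
}

const u = (x) => x * x;
const v = (x) => 3 * x;

// Left hand side: '+' combines two functions before integrating.
const lhs = integrate(addFunctions(u, v), 0, 2);

// Right hand side: '+' combines two numbers after integrating.
const rhs = integrate(u, 0, 2) + integrate(v, 0, 2);
```

Both sides come out close to 26/3, but u + v would not even be meaningful in JavaScript without defining addFunctions, which is exactly the overloading that the mathematical notation hides.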

Reform of the notation for calculus as it is taught to school children will take some time. The notations (and associated ideas) created by pioneers of calculus are still around, jostling for position in books and websites today.

Pioneer                   Notation for second derivative
Newton (1643 – 1727)      ü
Leibniz (1646 – 1716)     d²y ⁄ dx²
Euler (1707 – 1783)       D² f
Lagrange (1736 – 1813)    f''

When overloading operator symbols, seriously consider using separate symbols instead, or explain the overloading to the reader when they need to know.

Avoid: ‘The rule can be stated as:

(f ⋅ g)' = f' ⋅ g + f ⋅ g' ’

Prefer: ‘The rule can be stated as follows: For all functions from reals to reals, f and g, with certain provisos:

(f ⋅ g)' = f' ⋅ g + f ⋅ g'

(Note that the operators ⋅ and + are pointwise function operators for multiplication and addition.)’

5.14 Reliable terminology

Misleading terminology can be a huge problem in any field, and especially in mathematics, where understanding is often so hard to achieve in the best circumstances.

Terminology should be reliable in the sense that obvious inferences the reader might make from the terminology should usually be correct. In other words, the thing is what it sounds like. Unfortunately, some mathematical terminology is anything but reliable.

For example, the phrase ‘random variable’ should, surely, refer to something that is (1) random, and (2) a variable. Neither of these is true! A random variable is actually an entirely deterministic function from items within a sample space to real numbers. So, it's deterministic, not random, and it's a function, not a variable.

As another example, a ‘normed vector space’ sounds like it will be a vector space that's had some kind of ‘norming’ done to it so now it's ‘normed’. Not so. It is a pair of objects, one of which is a vector space and the other is a function that takes vectors from the vector space and returns positive scalars. The vectors in the vector space haven't actually been ‘normed’ but the function is there and could be applied to any of them. Incidentally, the terminology causes more problems because writers flip between mentioning the norm function and not bothering. Readers are supposed to use the context to work out what is going on.

When inventing terminology, make it reliable. Carefully consider what alternative terms suggest and whether these suggestions could be misleading.

6. Maximize basic legibility

6.1 Proper sentences

Even with mathematical writing, readers look out for punctuation, capital letters, and sentence structure as clues to the beginning and end of sentences and subclauses. They can be thrown into confusion by missing or incorrect punctuation and other writing errors.

When writing in paragraph form, create proper sentences with proper sentence punctuation, even around formulae.

6.2 Separate formulae and text

Although sentences may contain formulae it is unsettling and can lead to errors if you use mathematical symbols outside formulae.

When writing sentences, do not use symbols like ∃, ∀, λ, ⇒, ≈, =, > unless they are part of a formula; replace them by words.

Avoid: ‘Let S be the set of all numbers of absolute value = 1.’

Prefer: ‘Let S be the set of all numbers of absolute value equal to 1.’

6.3 Equals not is

Conversely, using words within what is very close to being a complete formula looks odd and can be an error or ambiguous.

When writing an equation in a sentence, use the appropriate equality symbol rather than saying ‘is’.

Avoid: ‘... so therefore R is nV.’

Prefer: ‘... so therefore R = nV.’

6.4 Initial word

When starting a sentence, begin with a word, not a mathematical symbol.

Avoid: ‘... prove the continuity of f(x) = 2.cos(x).sin(x). cos(x) being continuous...’

Prefer: ‘... prove the continuity of f(x) = 2.cos(x).sin(x). Since cos(x) is continuous...’

6.5 Separated formulae

When two formulae follow each other in a sentence, separated by only some punctuation, readers may be slow to realize there are two formulae.

Sometimes, the words are needed for the formulae to make sense.

When writing a sentence where two formulae appear close together put at least one appropriate word between them.

Avoid: ‘Consider Sₚ, p = 1, ... ,n.’

Prefer: ‘Consider Sₚ for p = 1, ... ,n.’

Avoid: ‘If x = 2, y = 3, z = 4.’

Prefer: ‘If x = 2 and y = 3, then z = 4.’

6.6 Line breaks in formulae

Formulae are usually easier to read if they are not wrapped onto a second line. However, some formulae are very long and others have lots of parentheses nested inside each other.

Software source code also contains lots of parentheses and other constructs that are deeply nested. The solution programmers use is to be systematic about line breaks and use increasingly indented lines to indicate lower levels of a hierarchy. When this is done automatically it is called pretty printing.

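Here's a little bit of JavaScript from one of my websites to show what that looks like:

function lvl_name(txt) {
  // Collect the characters of txt that come before the first colon.
  var lvl_txt = "";
  var ptr = 0;
  while (ptr < txt.length && txt.substring(ptr, ptr + 1) != ":") {
    lvl_txt += txt.substring(ptr, ptr + 1);
    ptr++; // move on to the next character so the loop can finish
  }
  return lvl_txt;
}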

Some work has been done to apply this to mathematical formulae and in some cases it seems to help. However, in others the pretty printing style breaks a perfectly readable formula over several lines making it harder to read, not easier. Perhaps the best approach is to go to new lines only when it helps.

What would you do with this formula?

|- (((a → (a → c)) → (a → c)) → (((a → b) → (a → (a → c))) → ((a → b) → (a → c))))

It fits on one line with no problem but the brackets are hard to keep track of. I had to count them carefully to make sure I had typed them correctly. In this version some light ‘pretty printing’ has been used.

|- (
     ((a → (a → c)) → (a → c))
     → (
          ((a → b) → (a → (a → c)))
          → ((a → b) → (a → c))))

I think this helps a bit (if your screen is big enough), particularly for people who have learned to recognize, at a glance, formulae such as ((a → b) → (a → c)).

When writing a long formula, try to fit it onto one line. Otherwise, break onto a new line at a sensible place and indent the second and subsequent lines a little.

6.7 Italicized variables

Variables in text don't stand out very well, especially if they are common letters such as ‘a’ and ‘I’. Variable names in formulae are often in italics (by default with some software), so they should also be in italics when they appear alone in a sentence. If your variable names are not in italics in formulae then they should not be in italics when appearing alone in text.

When writing variables, format them in italics within formulae and when they appear alone in text.

6.8 Little spaces

Formulae are often easier to read with some spaces included. Some formatting software puts them in automatically.

When writing formulae, consider where a little space would improve readability, especially around =, <, ≤, >, ≥, ∨, ∧, ⇒, ⇐, and ⇔.

Avoid: ‘4-2=2’

Prefer: ‘4 − 2 = 2’

Avoid: ‘p∧q⇒p’

Prefer: ‘p ∧ q ⇒ p’

6.9 Comma-less introduction

Putting commas around the symbol for a new object when introducing it is not necessary and can make the variable name a little harder to read.

When introducing the symbol for a new object do not put commas around it unless a comma afterwards is needed for the sentence as a whole.

Avoid: ‘If the discriminant, Di, is non-negative, then the roots are real.’

Prefer: ‘If the discriminant Di is non-negative then the roots are real.’

6.10 Spelling out abbreviations

Some abbreviations that can be handy when doing some calculations for your own benefit look odd and don't help other readers.

When you want to say ‘with respect to’, ‘without loss of generality’, or ‘if and only if’, use the full words, not the abbreviations.

6.11 Separated proofs

Proofs are easier to see and to follow if formatted in a systematic way.

When beginning a proof, start with some extra white space then a statement of what is to be proved.

When ending a proof, use a simple word or phrase to show the end has been reached then add a little extra white space.

6.12 Summary tables

Sometimes it is helpful to readers to summarize useful facts, definitions, formulae, etc in one place, usually in some kind of table.

When you have several facts of the same kind that readers might like to see summarized, insert a table to show them in one place.

6.13 If then

The word ‘if’ used in a sentence creates the expectation of a ‘then’ and if the word ‘then’ does not appear when it should then some readers will be confused.

When you introduce a condition with ‘if’, introduce the conclusion with ‘then’.

6.14 Descriptive titles

When presenting a graph or table, provide a descriptive title.


Guides to writing mathematics

‘The most common errors in undergraduate mathematics’ by Eric Schechter, Associate Professor, Math Department, Vanderbilt University.

‘Guidelines on Mathematical Style’ by Jeremy L. Martin, Assistant Professor, Department of Mathematics, University of Kansas.

‘Algorithmic Proof Style Guide’ (February 9, 2009) by Ned Dimitrov, a post-doc in the Operations Research Program at UT Austin.

‘Writing math in paragraph style’ by Tim Hsu, associate professor in the Math department at San José State University.

I also picked up some useful points from Norman Megill of Metamath fame when he commented on an earlier version of this article.

Proposed styles for writing proofs

Wim Feijen's website has some amazing publications and some of the clearest mathematical writing I've seen. Computer programming has obviously been an influence and Wim and his associates have devised their own notation that improves on traditional notation in many ways.

‘The notational conventions I adopted, and why’ by Professor Edsger Dijkstra is recommended by Wim and terrific.

The same sort of thinking is explained in ‘Calculational mathematics: writings on the predicate, relational, and other calculi’ by Edsger Dijkstra, Wim Feijen, and Netty van Gasteren.

‘Designing a Calculational Proof of Cantor's Theorem’ by Edsger W. Dijkstra and Jayadev Misra gives another example of the style.

A more conventional proof style is proposed by Leslie Lamport in ‘How to Write a Proof’ (1993).

Information on computer supported mathematics

Z (the language for specifying computer systems; has excellent type system)

Metamath (the most user friendly proof system I've seen)

Mizar (tries to support very traditional looking proofs)

Isabelle (another sophisticated one)

HOL-light (and another)

If you have any comments on this article, questions, or ideas for new topics, please let me know.

About the author: Matthew Leitch has been studying the applied psychology of learning and memory since about 1979 and holds a BSc in psychology from University College London.