4 Proofs to Mantel's Theorem

The following is a re-upload of a post in my previous blog (which was destroyed by mistake).

Things will be slower these 3 weeks as I'm travelling on my grad trip. I'll be attending ICAPS 2016 in Kings College London (12-17 June 2016). Maybe the next post will be something from it :)

Edit: Wow this took way too long as I had much less free time than expected

I looked a little into extremal graph theory during my undergraduate thesis on maintaining maximal independent sets on dynamic graphs. In particular, I came upon triangle-free graphs which have some interesting properties:

  • The neighbourhood set of any vertex in a triangle-free graph form an independent set
  • A triangle-free graph with n vertices will have at most $\frac{n^2}{4}$ edges

The first property is a direct implication of a graph being "triangle-free". That means for any 2 adjacent vertices x and y, they cannot share a common neighbour z. In other words, no 2 vertices in any neighbourhood set can be adjacent (otherwise, they share a common neighbour).

The second property was a special case of Turán's theorem with r = 2. It turns out that the special case of r = 2 was also named after Mantel. Unfortunately, I was unable to find any references to the original paper on Mantel's theorem, and all books and articles referencing the Turán's theorem pointed to a Hungarian paper1 (I was also unable to find the paper itself).

In the past few days I have been browsing through "Extremal combinatorics: with applications in computer science" by Stasys Jukna2 and watching some YouTube videos by Robert Morris3 who has a book called "Extremal and Probabilistic Combinatorics". Mantel's Theorem is always mentioned because Turán's theorem is a rather basic result in extremal graph theory.

Jukna's book is a nice read. It's quite concise but that means sometimes I have to sit on a page for a while to follow his succinct argument. I never knew the name for the "double counting" technique (Chapter 1 Section 4) till now - I've just been using it without knowing its name!4 I've also noticed similarities between the "weight shifting argument" (A counting technique in Chapter 4 Section 7) and "cut and paste argument" which is commonly used to prove correctness of greedy algorithms.

In Chapter 4, 4 different proofs for Mantel's theorem are provided, each highlighting a different counting technique. I find this particularly interesting and worthy of sharing in a self-contained manner.

In this post, I will cover the following:

I will not directly copy the proofs from the book. Instead, I will reconstruct the proofs myself in a way that I feel is clearest to understand. Hence, if you read the book, you might notice that I added more details or rephrased certain things in certain ways.


Mantel's Theorem (1907) states that:

If a graph G with n vertices contains more than $\frac{n^2}{4}$ edges, then G contains a triangle.

It's a special case of Turán's theorem, with r = 2, proven 34 years earlier by Mantel.

The 4 proof techniques that we will be looking at are

  1. Double counting4
  2. Mathematical induction
  3. Geometric mean $\leq$ Arithmetic mean ; Maximum Independent Sets
  4. Weight shifting argument

"Double counting" usually carries a different connotation of over-counting something, like in the case of counting set unions (hence we need to subtract their intersection: $|A \cup B| = |A| + |B| - |A \cap B|$). However, I will stick to Jukna's naming convention for this technique for this blog post.

The 3$^{\text{rd}}$ point isn't really a technique but just combining the relation of means with the first observation in my preamble. In the sections below, I will introduce each approach before going into the proof that uses that technique.

Proof 1: Double counting4

The principle of double counting tells us that counting the same thing in different ways will give us the same value. This allows us to establish relationships between mathematical terms.

Before I give the proof, let me illustrate the idea behind double counting using a graph. I will state and prove 2 simple results using double counting in this example.

Graph example

Consider the following graph:
Example graph

We can represent graphs in a few ways (see below). Note that only the black text are used for the graph representation. Red and blue text are for annotation.

  1. Adjacency list

    Adjacency list

  2. Adjacency matrix

    Adjacency matrix

    The symmetry along the diagonal is because the graph is undirected. This is a 0-1 matrix because the graph is unweighted.

  3. Edge list

    Edge list

    An edge list simply stores pairs of vertex IDs. Strangely, Wikipedia doesn't have an article on this.

  4. Incidence matrix

    Incidence matrix

    Instead of showing the relationship between vertices, an incidence matrix captures the relation between a vertex and edge. The (i,j)-entry of this 0-1 matrix indicates 1 if and only if vertex $v_i$ is one of the endpoints of the edge $e_j$.

The first 2 are commonly taught in typical computer science courses and the last is more often seen in applied mathematics. Here's a Khan Academy article explaining the first 3.

Let us focus our attention to the incidence matrix. Clearly summing up the matrix row-by-row should give us the same value as summing it up column-by-column. This particular application of double counting yields the well-known Handshaking Lemma.

Theorem (Handshaking Lemma):
In any graph $G = (V,E)$, the sum of degrees of all vertices is twice the total number of edges, i.e.
$$\sum_{v \in V} d(v) = 2 \cdot |E|$$

In a incidence matrix, the sum across row i yields the degree of vertex i. Summing up the matrix row-by-row is then equivalent to taking the sum of degrees of all vertices.

At the same time, each column contains exactly 2 entries of "1" with the rest being "0" since each edge involves exactly 2 vertices.

A less obvious result from double counting the incidence matrix is as follows.

In any graph $G = (V,E)$, the squared sum of degrees of all vertices is equal to the sum of degree of endpoints of all edges, i.e.
$$\sum_{v \in V} d(v)^2 = \sum_{(x,y) \in E} (d(x) + d(y))$$

Note that $\sum_{v \in V} d(v)^2 = \sum_{v \in V} d(v) \cdot d(v)$. That means instead of counting "1", we count "d(v)" as we sum up each row.

To visualise this sum on the left hand side, consider the following matrix conversion from A to B, where Matrix A is the incidence matrix and Matrix B is derived from A by replacing all 1's with the degree of the vertex.

Matrix A to Matrix B

The left sum $\sum_{v \in V} d(v)^2$ is then the sum of Matrix B, row-by-row. It is clear by looking at Matrix B that summing it up column-by-column corresponds to the right sum $\sum_{(x,y) \in E} (d(x) + d(y))$.

Proof of Mantel's Theorem

This proof actually uses the 2$^{\text{nd}}$ result that we proved in the example and the Cauchy-Schwartz inequality, one of the most important and frequently used inequalities in Mathematics (alongside the triangle inequality). For completeness, I'll state the Cauchy-Schwartz inequality here: For 2 sequences of real numbers $a_1, a_2, ..., a_n$ and $b_1, b_2, ..., b_n$,
$$(\sum_{i=1}^n a_ib_i)^2 \leq (\sum_{i=1}^n a_i^2)(\sum_{i=1}^n b_i^2)$$

If a graph $G$ with $n$ vertices contains more than $\frac{n^2}{4}$ edges, then $G$ contains a triangle.

Suppose $G$ has no triangles, we show that $m \leq \frac{n^2}{4}$.

Consider any 2 adjacent vertices $x$ and $y$ and their neighbourhood sets (excluding $x$ and $y$ themselves).

$d(x) + d(y) \leq n$

Since $G$ does not have any triangles, these neighbourhood sets (the red and blue parts in the diagram) cannot have any intersection (the purple part in the diagram). i.e. $N(x) \cap N(y) = \emptyset$. Otherwise, for any vertex $z$ in the intersection, "$x-y-z$" forms a triangle.

This implies that adding the vertices x and y with their neighbourhood sets (the red and blue parts in the diagram) yield at most all possible vertices: $2 + |N(x) \setminus \{y\}| + |N(y) \setminus \{x\}| \leq n$. i.e. $d(x) + d(y) \leq n$

Using the Cauchy-Schwartz inequality with $a_i = d(x_i)$ and $b_i = 1$ $\forall i = 1, ... , n$, we have: $(\sum_{x \in V} d(x))^2 \leq (\sum_{x \in V} d(x)^2)(\sum_{i=1}^n 1^2)$. Rearranging, we get: $\frac{(\sum_{x \in V} d(x))^2}{n} \leq \sum_{x \in V} d(x)^2$.


$$ \begin{array}{rcll} \frac{(2m)^2}{n} & = & \frac{(\sum_{x \in V} d(x))^2}{n} & \text{By handshaking lemma}\\ & \leq & \sum_{x \in V} d(x)^2 & \text{From Cauchy-Schwartz inequality above}\\ & = & \sum_{(x,y) \in E} (d(x) + d(y)) & \text{From 2$^{\text{nd}}$ result above}\\ & \leq & \sum_{(x,y) \in E} n & \text{Since $d(x) + d(y) \leq n$}\\ & = & mn \end{array} $$

Rearranging $\frac{(2m)^2}{n} \leq mn$, we get our desired result $m \leq \frac{n^2}{4}$.

Proof 2: Mathematical induction

There are 2 forms of mathematical induction:

  1. (Weak/Regular) Mathematical Induction

    • Statement holds for some base case (say $n = 1$)
    • If statement holds for some case (say $n = k$), then it holds for the next case ($n = k+1$).
  2. Strong Mathematical Induction

    • Statement holds for some base case (say $n = 1$)
    • If statement holds for all smaller cases (say $n \leq k$), then it holds for the next case ($n = k+1$).

Although the 2nd lines look different, both version of mathematical induction are actually equivalent. i.e. they are equally powerful - what you can prove with one, you can prove with the other. There are many proofs of equivalence out there, but here's one that may be useful. Here and here are forum exchanges regarding the equivalence.

In practice, people use strong induction if they require more than 1 previous step.

While using induction, one needs to be clear about what is being inducted over. In simple statements like "$n^2 \leq 2^n, \forall n \geq 4, n \in \mathbb{N}$", it is clear that we are inducting over the natural number $n$. In graphs, we could induct over several things - the number of vertices $n$, the number of edges $m$, size of groups of certain vertices, etc.

Proof of Mantel's Theorem

This proof is also mentioned in Robert Morris's lecture video at roughly 25min 37seconds.

If a graph $G$ with $n$ vertices contains more than $\frac{n^2}{4}$ edges, then $G$ contains a triangle.

Consider the statement "If $m \geq \frac{n^2}{4}$, then $G$ contains a triangle". Let us induct over the number of vertices $n$.

When $n = 1$, we cannot have $\frac{1^2}{4} + 1 = \frac{5}{4}$ edges.

When $n = 2$, we cannot have $\frac{2^2}{4} + 1 = 2$ edges.

In either case, the statement is vacuously true since the antecedent does not hold.

Suppose the statement holds for all $n \leq k$ and $k \geq 2$. Let us consider the case when $n = k+1$. Then, $m = \frac{(k+1)^2}{4} + 1$.

Pick any edge $(x,y)$. Consider the subgraph $G' = G \setminus \{x,y\}$ that has $k-1$ vertices - $G$ originally had $k+1$ vertices and $G'$ excluded vertices $x$ and $y$.

Subgraph $G' = G \setminus \\{x,y\\}$

Case 1: $G'$ has $m' \geq \frac{(k-1)^2}{4} + 1$ edges

Then we are done: Our inductive hypothesis tells us that $G'$ will contain a triangle, so $G$ (which contains $G'$) will contain a triangle.

Case 2: $G'$ has $m' \leq \frac{(k-1)^2}{4}$ edges

$$ \begin{array}{rcll} m & = & \frac{(k+1)^2}{4} & \text{$G$ has $m = \frac{(k+1)^2}{4}$ edges}\\
& = & d(x) + d(y) - 1 + m' & \text{Subtract 1 as edge $(x,y)$ is counted both in $d(x)$ and $d(y)$}\\ & = & d(x) + d(y) - 1 + \frac{(k-1)^2}{4} & \text{$G'$ has $m' = \frac{(k-1)^2}{4}$ edges}\\ \end{array} $$ Rearranging $\frac{(k+1)^2}{4} = d(x) + d(y) - 1 + \frac{(k-1)^2}{4}$, we get $d(x) + d(y) = k + 2$. This tells us that there is a vertex, say $z$ that lie both in $N(x)$ and $N(y)$. Hence, "$x-y-z$" forms a triangle.


In either case, we will find a triangle in the original graph $G$. Hence, the statement holds by mathematical induction.

Proof 3: Inequality of means & Maximum Independent Sets

Inequality of means

There are many ways of computing the "mean" of $n$ items $x_1, ..., x_n$:

  • Root Mean Square: $\sqrt{\frac{x_1^2 + ... + x_n^2}{n}}$

  • Arithmetic Mean: $\frac{x_1 + ... + x_n}{n}$

  • Geometric Mean: $\sqrt[n]{x_1 \cdot ... \cdot x_n}$

  • Harmonic Mean: $\frac{n}{\frac{1}{x_1} + ... + \frac{1}{x_n}}$

Though they look different, there is an inequality equation relating each of them:

$$ \frac{n}{\frac{1}{x_1} + ... + \frac{1}{x_n}} \leq \sqrt[n]{x_1 \cdot ... \cdot x_n} \leq \frac{x_1 + ... + x_n}{n} \leq \sqrt{\frac{x_1^2 + ... + x_n^2}{n}}$$

To find out more, check out here and here. We will only require $\sqrt[n]{x_1 \cdot ... \cdot x_n} \leq \frac{x_1 + ... + x_n}{n}$ (Geometric mean $\leq$ Arithmetic mean) in this proof.

Maximum Independent Sets

An independent set is a subset of vertices such that no 2 vertices are adjacent with each other. Any single vertex can be a valid independent set and it is of interest to find a large independent set.

There are 2 notions of "as large as possible". A maximal independent set (MIS) is an independent set such that adding any other vertices to this set will invalidate its independence property. A maximum independent set (MaxIS), however, is the largest possible sized independent set within a graph. Clearly, any MaxIS is also a MIS but not the other way round (Analogy: Any square is a rectangle).

Star graph $S\_8$

The image about shows the only 2 possible MIS choices in a star graph $S_8$. The independent set on the left is the MaxIS of this graph. This example illustrates the point that that the size difference between a MaxIS and MIS can indeed be very large (as much as $\mathcal{O}(n)$), depending on the graph.

Independent sets are wildly interesting in their own right. In this proof, we will only be using 2 things:

  • Any 2 vertices in an independent set cannot be adjacent
  • The definition of maximum independent set MaxIS

Proof of Mantel's Theorem

If a graph $G$ with $n$ vertices contains more than $\frac{n^2}{4}$ edges, then $G$ contains a triangle.

Let $A$ be the maximum independent set of $G$ and $B = V \setminus A$.

Since A is an independent set, there cannot be an edge joining 2 vertices in A. So, all edges have endpoints in B. That means $m \leq \sum_{x \in B}d(x)$.

Note that handshaking lemma does not apply here ($\sum_{x \in B}d(x) \neq 2m$) since we are not counting the degree of the vertices in A, whose sum is non-zero. If $\sum_{x \in A}d(x) = 0$, we can increase the set A by adding any vertex from B, contradicting its assumed maximality.

In any triangle free graph, neighbours of any vertex $x \in V$ form an independent set. So, $d(x) \leq |A|, \forall x \in V$.

$$ \begin{array}{rcll} m & \leq & \sum_{x \in B}d(x) & \text{From observation above}\\
& \leq & \sum_{x \in B}|A| & \text{Since $d(x) \leq |A|, \forall x \in V$}\\ & = & |A| \cdot |B| & \text{Taking summation}\\ & \leq & (\frac{|A| + |B|}{2})^2 & \text{(Geometric mean)$^2$ $\leq$ (Arithmetic mean)$^2$}\\ & = & \frac{n^2}{4} & \text{Since |A| + |B| = n} \end{array} $$ $\Box$

Proof 4: Weight shifting argument

The weight shifting argument is very similar in flavour the standard "Exchange" and/or "Cut and paste" proof techniques used to prove optimality of greedy algorithms. (The links given for the techniques are examples from actual slides that I found online.)

Informally, there are 2 ways of looking at those approaches:

  1. From any optimal solution, we can replacing/exchanging certain parts of the solution with something that our greedy solution will produce, and still yield an optimal result
  2. Suppose one does not use our greedy approach, then we can show that we can "cut" out one part of his/her solution and "paste" in a replacement from the greedy solution to yield a better output (hence contradicting the assumption that his/her solution was optimal to begin with)

The weight shifting argument is somewhat similar - some weight is distributed among items and the weights are shifted around to optimise some condition.


Here's a trivial example from Jukna's book (Page 62, Proposition 4.12).

Let $n \leq m < 2n$. For any distribution of pigeons into $n$ holes such that no hole is left empty, at most $2(m-n)$ pigeons will not be alone.

Consider any valid assignment of $m$ pigeons into $n$ holes. If there exists a hole $A$ with $>2$ pigeons, there exists a hole $B$ with only 1 pigeon (since $n \leq m < 2n$). Shifting a pigeon from $A$ to $B$ increases the total number of non-lonely pigeons by 1. Hence, the optimal way to allocate would be to no more than 2 pigeons in any hole.

This is equivalent to: (i) Put 1 pigeon in each hole to ensure that no holes are empty; (ii) Put the remaining $(m-n)$ pigeons in different holes. This gives us $(m-n)$ holes with 2 pigeons and $2(m-n)$ non-lonely pigeons.

In this example, the pigeons can be seen as weights on the holes that are being shifted around. In less trivial examples, the weight that is being moved around is just a number/value (and it may be hard to assign a meaning to that number). The reason for the moving is to optimise some objective, which in this case was to maximise non-lonely pigeons.

Proof of Mantel's Theorem

If a graph $G$ with $n$ vertices contains more than $\frac{n^2}{4}$ edges, then $G$ contains a triangle.

Suppose $G$ has no triangles, we show that $m \leq \frac{n^2}{4}$.

  1. Assign weights $w_1, ... , w_n$ to each of the $n$ vertices such that $\sum_{i=1}^n w_i = 1$.
  2. Define $W_x :=$ Sum of weights of vertices adjacent to vertex $x$
  3. Define $S := \sum_{(x,y) \in E} w_x w_y = \frac{1}{2}\sum_{x \in V}w_x \cdot W_x$
  4. Define $S^* :=$ Maximum $S$ over all possible weight assignments

We will now show that (i) $\frac{m}{n^2} \leq S^*$; (ii) $S^* = \frac{1}{4}$, hence yielding $\frac{m}{n^2} \leq \frac{1}{4}$, which rearranges to our desired $m \leq \frac{n^2}{4}$.

Proof for (i) $\frac{m}{n^2} \leq S^*$:

Consider a valid weight assignment of $\frac{1}{n}$ to every vertex. Then $S = \sum_{(x,y) \in E} w_x w_y = \sum_{(x,y) \in E} \frac{1}{n} \cdot \frac{1}{n} = \sum_{(x,y) \in E} \frac{1}{n^2} = \frac{m}{n^2}$.

Since this is a valid assignment and $S^*$ is defined to be the maximum possible $S$, we have $\frac{m}{n^2} \leq S^*$.

Proof for (ii) $S^* = \frac{1}{4}$:

If there is an adjacent pair of vertices $x$ and $y$ such that $W_x > W_y$, then we can shift all the weight from $y$ to $x$ to yield a larger $S$ value.

$\forall \epsilon \geq 0$, $$ \begin{array}{rcll} 2 \cdot \text{(New $S$)} & = & \sum_{v \in V}w_v \cdot W_v\\
& = & [\sum_{v \in V, v \neq x, v \neq y}w_v \cdot W_v] +(w_x + \epsilon)W_x + (w_y - \epsilon)W_y & \text{By shifting $\epsilon$ from $y$ to $x$}\\ & = & [\sum_{v \in V, v \neq x, v \neq y}w_v \cdot W_v] + w_xW_x + w_yW_y + \epsilon(W_x - W_y)\\ & > & [\sum_{v \in V, v \neq x, v \neq y}w_v \cdot W_v] + w_xW_x + w_yW_y & \text{Since $W_x > W_y$}\\ & = & 2 \cdot \text{(Original $S$)} \end{array} $$

Note: The claim always holds. But if vertex y has no weight, then there's nothing to move.

The claim implies that the weight shifting stops when all vertices with weights have greater or equal weights compared to all their neighbours. This happens only when weights are concentrated in complete subgraphs.

It takes another observation to note that we attain larger values for $S$ if all the weight accumulates on one single complete subgraph than being split across multiple disjoint complete subgraphs.

Since $G$ is triangle free, the largest complete subgraph is $K_2$ (a single edge), hence $S^* = \max_{w_x \in [0,1], w_y \in [0,1]}\{w_x w_y: w_x + w_y = 1\} = \frac{1}{4}$, where $w_x = w_y = \frac{1}{2}$.


By combining (i) $\frac{m}{n^2} \leq S^*$ and (ii) $S^* = \frac{1}{4}$, we get $\frac{m}{n^2} \leq \frac{1}{4}$, which rearranges to our desired $m \leq \frac{n^2}{4}$.


It's always interesting when there are multiple ways of getting something done. Usually, either each way reveal some new property of the problem or we learn relationships between seemingly different approaches.

This post is also the first post that I took over 1 week to draft (usually I take only 1~2 days). I tried hard to maintain a consistent tone throughout. Hope it worked!

Going further

Definitely give the books by Jukna2 and Morris5 a go. If you prefer videos, you can also check out Robert Morris's lecture series3.

References and Footnotes

  1. Turán, P. (1941). On an extremal problem in graph theory. Mat. Fiz. Lapok, 48(436-452), 137.

  2. Jukna, S. (2011). Extremal combinatorics: with applications in computer science. Springer Science & Business Media.

  3. Such as this series of lectures

  4. I've also been told by Shunhao that what Jukna calls "double counting" is also better known as "combinatoric proof". I'm not entirely sure which is "better". On one hand, "double counting" clearly tells you what it's trying to do and "combinatoric proof" is a little vague (like how the term "dynamic programming" doesn't tell much about "avoiding repeated computation"). However, there is already an existing connotation for "double counting", which may lead to confusion. So... shrug. For this post, let's just stick to Jukna's since the idea for this post came from his book.

  5. Morris, R., & Imbuzeiro, R. (2011). Extremal and Probabilistic Combinatorics. SBM. 28o CBM.