
An Intuitive Proof That Every Real Symmetric Matrix Can Be Diagonalized by an Orthogonal Matrix

It is well known that the eigenvalues of a real symmetric matrix are real and that its eigenvectors can be chosen to form an orthonormal basis. This theorem plays an important role in many fields. For example, principal component analysis relies on it.
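
To make the statement concrete, here is a minimal numerical illustration with NumPy. It is only a sanity check, not part of the argument, and the random symmetric matrix is an arbitrary choice.

```python
import numpy as np

# A random real symmetric matrix (symmetrizing a random matrix is just a convenient choice).
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2

eigvals, Q = np.linalg.eigh(A)   # eigh is NumPy's routine for symmetric matrices

print(np.isrealobj(eigvals))                        # True: the eigenvalues are real
print(np.allclose(Q.T @ Q, np.eye(4)))              # True: the eigenvectors are orthonormal
print(np.allclose(Q.T @ A @ Q, np.diag(eigvals)))   # True: Q^T A Q is diagonal
```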

Although every textbook on linear algebra contains a proof of this theorem, the proof is not easy to understand intuitively. In this post, I will give an intuitive explanation of the theorem.

Review: A Typical Proof

We first review a typical proof. If you are familiar with it, you can skip this part and jump to the “main part.” You do not need to follow this proof in detail; it is enough to see that it is not intuitive. The results in this section will not be used in the following discussion.

Existence of eigenvalues

The existence of eigenvalues is not trivial. Most textbooks define characteristic polynomials based on determinants and prove the existence of the roots of the polynomial by the fundamental theorem of algebra.

Determinants and the fundamental theorem are not intuitive. I think these tools make it difficult to understand the existence of eigenvalues.

As we will see later, for symmetric matrices we can prove the existence of eigenvalues without using determinants or the fundamental theorem of algebra.

Proof that eigenvalues of a real symmetric matrix are real

Let \(A\) be a real symmetric matrix and \(\lambda\) be a complex eigenvalue of \(A\). Then there exists a complex vector \(x\) such that

\[A x = \lambda x, x \neq 0,\]

by the definition of eigenvalues. We take the complex conjugates of both sides. Since \(A\) is real, \(\bar{A} = A\) holds, where the overline denotes the element-wise complex conjugate. Thus,

\[A \bar{x} = \bar{\lambda} \bar{x}.\]

We take the transpose of both sides. Since \(A\) is symmetric, \(A^\top = A\), and therefore

\[\bar{x}^\top A = \bar{\lambda} \bar{x}^\top.\]

Here,

\[\bar{\lambda} \bar{x}^\top x = \bar{x}^\top A x = \bar{x}^\top \lambda x = \lambda \bar{x}^\top x. \quad \cdots (1)\]

Besides, since \(x \neq 0\),

\[\bar{x}^\top x = \bar{x}_1 x_1 + \cdots + \bar{x}_n x_n > 0.\]

Therefore, if we divide both sides of (1) by \(\bar{x}^\top x\), we obtain \(\bar{\lambda} = \lambda\), which means that \(\lambda\) is real.
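
The identity (1) and the conclusion can be checked numerically. The following sketch is only an illustration; it deliberately uses NumPy's general eigenvalue routine np.linalg.eig (which does not exploit symmetry) on a random symmetric matrix.

```python
import numpy as np

# A random real symmetric matrix.
rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2

# Use the general (non-symmetric) routine in complex arithmetic on purpose.
eigvals, eigvecs = np.linalg.eig(A.astype(complex))
lam, x = eigvals[0], eigvecs[:, 0]

lhs = np.conj(lam) * (np.conj(x) @ x)   # \bar{lambda} \bar{x}^T x
mid = np.conj(x) @ A @ x                # \bar{x}^T A x
rhs = lam * (np.conj(x) @ x)            # lambda \bar{x}^T x
print(np.isclose(lhs, mid), np.isclose(mid, rhs))   # True True, as in (1)
print(np.allclose(eigvals.imag, 0.0))               # True: every eigenvalue is real
```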

Proof that eigenvectors of different eigenvalues are orthogonal

Let \(A\) be a real symmetric matrix, \(\lambda_1 \neq \lambda_2\) be different eigenvalues, and \(x_1, x_2\) be corresponding real eigenvectors. Then,

\[\lambda_1 x_1^\top x_2 = x_1^\top A^\top x_2 = x_1^\top A x_2 = \lambda_2 x_1^\top x_2.\]

Therefore,

\[(\lambda_1 - \lambda_2) x_1^\top x_2 = 0.\]

Since \(\lambda_1 \neq \lambda_2\), this indicates that \(x_1^\top x_2 = 0\). Therefore, \(x_1\) and \(x_2\) are orthogonal.
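
Again, this can be observed numerically. A minimal sketch, assuming the random symmetric matrix has distinct eigenvalues (which holds with probability one):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2

# General routine: it does not enforce orthogonality of the returned eigenvectors.
eigvals, eigvecs = np.linalg.eig(A)
x1, x2 = eigvecs[:, 0], eigvecs[:, 1]

assert not np.isclose(eigvals[0], eigvals[1])   # the two eigenvalues are distinct
print(np.isclose(x1 @ x2, 0.0))                 # True: x1 and x2 are orthogonal
```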

Impressions

These proofs are counterintuitive because complex conjugates are involved even though we only consider real-valued matrices and vectors. Besides, the desired results appear suddenly as the outcome of abstract algebraic manipulations.

Intuitive Proof (Main Part)

Definition of a real symmetric matrix as an operator

A real symmetric matrix is defined by a relation between its elements, i.e., \(A_{ij} = A_{ji}\). However, this definition does not tell us what the matrix does as an operator. Here, we consider a characteristic property of real symmetric matrices.

Let \(\langle x, y \rangle\) denote the inner product of vectors \(x\) and \(y\). Intuitively, this measures the similarity of \(x\) and \(y\), cf. cosine similarity.

The most important property of a real symmetric matrix \(A\) is that the inner product does not change whichever of the two vectors is multiplied by \(A\).

Proof: \(\langle Ax, y \rangle = (Ax)^\top y = x^\top A^\top y = x^\top A y = \langle x, Ay \rangle\)

This property is called self-adjointness. We can regard it as the definition of real symmetric matrices.

We assume this property of real symmetric matrices is intuitive in the following discussion. Alternatively, you can regard it as the definition, in which case it holds trivially. This self-adjoint property plays a pivotal role in the following.
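
For the record, here is a one-line numerical check of the self-adjoint property; the random matrix and vectors are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                  # real symmetric
x = rng.standard_normal(4)
y = rng.standard_normal(4)

# <Ax, y> equals <x, Ay>: the inner product is the same whichever vector A multiplies.
print(np.isclose((A @ x) @ y, x @ (A @ y)))   # True
```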

Intuition behind the existence of real eigenvalues

Suppose \(\langle x, Ay \rangle\) is larger than \(\langle x, y \rangle\). In other words, multiplying \(y\) by \(A\) moves \(y\) closer to \(x\).

Let’s multiply the other vector. From the self-adjoint property, \(\langle Ax, y \rangle\) is also larger than \(\langle x, y \rangle\). In other words, multiplying \(x\) by \(A\) moves \(x\) closer to \(y\).

Therefore, multiplying by \(A\) makes \(x\) and \(y\) closer by pulling them to each other. Intuitively, in the “middle” of \(x\) and \(y\), the force of \(A\) is balanced, and there may be a fixed point (i.e., an eigenvector).

Proof of the existence of real eigenvalues

We construct a real eigenvalue and eigenvector. Let \(\|\cdot\|\) denote the length of a vector.

Let \(A\) be a real symmetric matrix, \(x\) be a unit vector such that \(\|Ax\|\) is maximized, and \(\alpha = \|Ax\|\). Such a vector exists because the unit sphere \(\{x \mid \|x\| = 1\}\) is compact (closed and bounded) and \(x \mapsto \|Ax\|\) is continuous. If there is more than one such vector, we use an arbitrary one.

Let's consider the inner product of \(Ax\) and \(Ax\). From the self-adjoint property, the value of the inner product does not change whichever vector is multiplied. Therefore,

\[\langle Ax, Ax \rangle = \langle x, AAx \rangle.\]

The left-hand side is \(\alpha^2\) by the definition of \(\alpha\). The right-hand side is at most \(\alpha^2\): by the Cauchy–Schwarz inequality, \(\langle x, AAx \rangle \le \|x\| \|AAx\|\), multiplying by \(A\) scales the length of a vector by at most a factor of \(\alpha\), and \(x\) is a unit vector. Moreover, Cauchy–Schwarz holds with equality only if \(x\) and \(AAx\) are parallel. Since the two sides above are equal, \(\alpha^2\) is attained, and thus \(x\) and \(AAx\) are indeed parallel.

Moreover, since \(x\) is a unit vector and \(\langle x, AAx \rangle = \alpha^2\), we have \(AAx = \alpha^2 x\). Therefore, \(x\) goes to \(y := Ax\) under \(A\), and \(y\) goes back to the direction of \(x\) under \(A\). It means that \(A\) swaps the \(x\)-direction and the \(y\)-direction. Intuitively, the “middle” of \(x\) and \(y\) is an eigenvector.

(Case 1) when \(x\) and \(y\) are parallel: since \(y = Ax\) is parallel to \(x\) and \(\|y\| = \alpha\), we have \(Ax = \pm\alpha x\), so \(x\) is an eigenvector corresponding to the eigenvalue \(\alpha\) or \(-\alpha\).

(Case 2) otherwise: \(y + \alpha x \neq 0\) because \(x\) and \(y\) are not parallel, and \(A (y + \alpha x) = Ay + \alpha Ax = \alpha^2 x + \alpha y = \alpha (y + \alpha x)\). Therefore, \(y + \alpha x\) is an eigenvector corresponding to the eigenvalue \(\alpha\).

This completes the proof.
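
The construction can be mirrored numerically. The sketch below is only an illustration: the SVD is used merely as an off-the-shelf way to compute a unit vector \(x\) maximizing \(\|Ax\|\) (any other maximization method would do), and the rest follows the two cases of the proof.

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                   # a random real symmetric matrix

# Unit vector x maximizing ||Ax||; the top right-singular vector is exactly this maximizer.
_, s, Vt = np.linalg.svd(A)
x, alpha = Vt[0], s[0]
y = A @ x

if np.isclose(abs(x @ y), alpha):   # Case 1: x and y are parallel
    v = x
else:                               # Case 2: take the "middle" of x and y
    v = y + alpha * x

lam = v @ A @ v / (v @ v)           # Rayleigh quotient of v; should be alpha or -alpha
print(np.isclose(abs(lam), alpha))  # True
print(np.allclose(A @ v, lam * v))  # True: v is an eigenvector of A
```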

Orthogonality

Let \(A\) be a real symmetric matrix and \(v\) be a real eigenvector of \(A\).

Let’s take an arbitrary vector \(u\) orthogonal to \(v\). The inner product of \(v\) and \(u\) is zero.

If we multiply \(v\) by \(A\), the direction of \(v\) does not change since \(v\) is an eigenvector. Thus \(Av\) and \(u\) are still orthogonal. Therefore \(\langle Av, u \rangle = 0\).

Let’s multiply the other vector. From the self-adjoint property, the inner product \(\langle v, Au \rangle\) is zero.

This means that multiplying \(u\) by \(A\) yields a vector that still has no component in the \(v\)-direction.

Therefore, the \(v\)-direction and the subspace orthogonal to it are completely independent under \(A\): each is mapped into itself. This is the source of the orthogonality of the eigenvectors of \(A\).
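
A quick numerical check of this invariance; it is only an illustration, with an eigenvector taken from np.linalg.eigh for convenience.

```python
import numpy as np

rng = np.random.default_rng(5)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2

eigvals, eigvecs = np.linalg.eigh(A)
v = eigvecs[:, 0]                   # some eigenvector of A

u = rng.standard_normal(4)
u = u - (u @ v) * v                 # remove the v-component so that u is orthogonal to v

print(np.isclose(u @ v, 0.0))       # True: u is orthogonal to v
print(np.isclose((A @ u) @ v, 0.0)) # True: Au still has no component in the v-direction
```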

We can construct an orthogonal basis of eigenvectors: if we carry out the same construction as in the “proof of the existence of real eigenvalues” inside the subspace orthogonal to \(v\), the resulting vectors never leave that subspace, so they have no component in the \(v\)-direction. Thus the eigenvectors obtained there are orthogonal to \(v\). We can obtain eigenvectors one by one in this manner. Formally, this is proved by mathematical induction.

Orthogonal eigenvectors mean that if we rotate the axes by an orthogonal matrix so that they coincide with the directions of the eigenvectors, the linear map only stretches along the axes; in other words, the matrix is reduced to a diagonal matrix.
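
The induction can be turned into a concrete procedure. The sketch below is only an illustration of the idea, not a recommended algorithm: it finds one eigenvector with the construction from the existence proof (again using the SVD merely to carry out the maximization), passes to the orthogonal complement via a Householder reflection, and repeats. The helper names (one_eigenpair, complement_basis, symmetric_eigenbasis) are mine.

```python
import numpy as np

def one_eigenpair(A):
    """One real eigenpair of a symmetric matrix A, following the constructive proof."""
    _, s, Vt = np.linalg.svd(A)          # only used to maximize ||Ax|| over unit vectors
    x, alpha = Vt[0], s[0]
    y = A @ x
    v = x if np.isclose(abs(x @ y), alpha) else y + alpha * x   # the two cases
    v = v / np.linalg.norm(v)
    return v @ A @ v, v

def complement_basis(w):
    """Orthonormal basis (columns) of the hyperplane orthogonal to the unit vector w,
    read off from a Householder reflection whose first column is +w or -w."""
    e1 = np.zeros_like(w)
    e1[0] = 1.0
    u = w + e1 if w[0] > 0 else w - e1   # choose the sign to avoid cancellation
    H = np.eye(len(w)) - 2.0 * np.outer(u, u) / (u @ u)
    return H[:, 1:]

def symmetric_eigenbasis(A):
    """Orthonormal eigenbasis of a real symmetric A, built one eigenvector at a time
    by repeating the construction inside the orthogonal complement of those found so far."""
    n = A.shape[0]
    Q = np.eye(n)              # columns: orthonormal basis of the current subspace
    vals, vecs = [], []
    for _ in range(n - 1):
        lam, w = one_eigenpair(Q.T @ A @ Q)   # A restricted to the current subspace
        vals.append(lam)
        vecs.append(Q @ w)                    # eigenvector in the original coordinates
        Q = Q @ complement_basis(w)           # pass to the orthogonal complement
    v_last = Q[:, 0]                          # only one direction is left
    vals.append(v_last @ A @ v_last)
    vecs.append(v_last)
    return np.array(vals), np.column_stack(vecs)

rng = np.random.default_rng(6)
B = rng.standard_normal((6, 6))
A = (B + B.T) / 2

lams, V = symmetric_eigenbasis(A)
print(np.allclose(V.T @ V, np.eye(6)))          # True: the eigenvectors are orthonormal
print(np.allclose(V.T @ A @ V, np.diag(lams)))  # True: rotating by V diagonalizes A
```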

Conclusion

The operator theoretic point of view of eigenvalues and eigenvectors is based on an excellent textbook, “Linear Algebra Done Right” [1]. However, this textbook uses the fundamental theorem of algebra to prove the existence of eigenvalues. Actually, I read this textbook when I was looking for an intuitive proof of this theorem. I devised this alternative proof because the proof in the book was not satisfactory to me.

If you know a more elegant proof, please let me know via Twitter or email.

[1] Sheldon Axler. Linear Algebra Done Right. Springer.