Data Processing Club studies machine learning and data mining

An Intuitive Proof That Every Real Symmetric Matrix Can Be Diagonalized by an Orthogonal Matrix

It is well known that eigenvalues of a real symmetric matrix are real values, and eigenvectors of a real symmetric matrix form an orthonormal basis. This theorem plays important roles in many fields. For example, the principal component analysis relies on this theorem.

Although every textbook on linear algebra contains a proof of this theorem, it is not easy to understand it intuitively. I will give an intuitive explanation of this theorem in this post.

Review: A Typical Proof

We review a typical proof. If you are familiar with it, you can skip this part to the “main part.” You don’t need to understand this proof completely but only have to understand that this proof is not intuitive. The results in this section will not be used in the following discussion.

Existence of eigenvalues

The existence of eigenvalues is not trivial. Most textbooks define characteristic polynomials based on determinants and prove the existence of the roots of the polynomial by the fundamental theorem of algebra.

Determinants and the fundamental theorem are not intuitive. I think these tools make it difficult to understand the existence of eigenvalues.

As we will see later, we can prove the existence of eigenvalues without determinants nor the fundamental theorem as for symmetric matrices.

Proof that eigenvalues of a real symmetric matrix are real

Let be a real symmetric matrix and be a complex eigenvalue of . Here, there exists a complex vector such that

by the definition of eiganvalues. We take the complex conjugates of both sides. Since is real, holds, where the overline denotes the element-wise complex conjugate. Thus,

We take the transpose of both sides. As is symmetric.

Here,

Besides, since ,

Therefore, if we devide the both sides of (1) by , we have , which means that is real.

Proof that eigenvectors of different eigenvalues are orthogonal

Let be a real symmetric eigenvalue, be different eigenvalues, and be corresponding real eigenvectors. Then,

Therefore,

Since , this indicates that . Therefore, and are orthogonal.

Implessions

These proofs are counterintuitive because complex conjugates are involved even though we consider real-valued matrices and vectors. Besides, desired results appear suddenly as results of abstract mathematical deformations.

Intuitive Proof (Main Part)

Definition of a real symmetric matrice as an operator

A real symmetric is defined by the relations of elements, i.e., . However, this does not indicate the function of a matrix. Here, we consider a characteristic property of real symmetric matrices.

Let denote the inner product of vectors and . Intuitively, this measures the similarity of and , cf. cosine similarity.

The most important property of a real symmetric vector is that the inner product does not change whichever vector is multiplied by .

Proof:

This property is called self-adjoint. We can think that this is the definition of real symmetric matrices.

We assume this property of real symmetric vectors is intuitive in the following discussion. Or, you can think this is the definition and a trivial property. This self-adjoint property plays a pivotal role in the following.

Intuition behind the existence of real eigenvalues

Suppose is larger than . In other words, multiplying by moves closer to .

Let’s multiply the other vector. From the self-adjoint property, is also larger than . In other words, multiplying by moves closer to .

Therefore, multiplying by makes and closer by pulling them to each other. Intuitively, in the “middle” of and , the force of is balanced, and there may be a fixed point (i.e., an eigenvector).

Proof of the existence of real eigenvalues

We construct a real eigenvalue and eigenvector. Let denote the length of a vector.

Let be a real symmetric matrix, be a unit vector such that is maximized, and . Indeed, there exists such a vector because is a closed set. If there are many, we use an arbitrary one.

Let’s consider the inner product of and . Fron the self-adjoint property, the value of the inner product does not change whichever vector is multiplied. Therefore,

The left-hand side is by the definition of . The right-hand side is at most because multiplying by makes the length of a vector at most times, and is a unit vector. Besides, the right-hand side attains only if and are parallel. From the equality, is attained, thus and are indeed parallel.

Therefore, goes to by , and goes to the direction of by . It means that switches the -drection and -direction. Intuitively, the “middle” of and is an eigenvector.

(Case 1) when and are paralell: is an eigenvector corresponding to eigenvalue or

(Case 2) otherwise: and . Therefore, is an eigenvector corresponding to eigenvalue .

This completes the proof.

Orthogonality

Let be a real symmetric matrix and be a real eigenvector of .

Let’s take an arbitrary vector orthogonal to . The inner product of and is zero.

If we multuply by , the direction of does not change since is a eigenvector. Thus and are still otrhogonal. Therefore .

Let’s multiply the other vector. From the self-adjoint property, the inner product is zero.

This means that multiplying by does not affect the direction of .

Therefore, the -direction and the subspace orthogonal to the -direction are completely independent for . This indicates the orthogonality of .

We can construct an orthogonal basis by eigenvectors: If we carry out the same thing we did in the “proof of the existence of real eigenvalues” in the subspace orthogonal to , the resulted vectors do not stick out the -direction. Thus the obtained eigenvectors are orthogonal to . We can obtain eigenvectors one by one in this manner. Formally, we can prove it by mathematical induction.

Orthogonal eigenvectors indicate that if we rotate the axes by an orthogonal matrix so that they coincide with the directions of eigenvectors, the linear map stretches only along the axes, which means that the matrix is reduced to a diagonal matrix.

Conclusion

The operator theoretic point of view of eigenvalues and eigenvectors is based on an excellent textbook, “Linear Algebra Done Right” [1]. However, this textbook uses the fundamental theorem of algebra to prove the existence of eigenvalues. Actually, I read this textbook when I was looking for an intuitive proof of this theorem. I devised this alternative proof because the proof in the book was not satisfactory to me.

If you know a more elegant proof, please let me know on Twitter or email.

[1] Sheldon Axler. Linear Algebra Done Right. Springer.