Author: Eiko

Tags: matrix, matrix analysis, eigenvalues, singular values, p-adic

Time: 2024-11-29 17:41:45 - 2024-11-30 18:03:17 (UTC)

Reference: p-adic Differential Equations by Kiran S. Kedlaya

Matrix With Complex Numbers

For complex numbers we use the \(L^2\) norm for vectors and the operator norm \(|A| = \sup_{v\neq 0} \frac{|Av|}{|v|}\) for matrices.

We always arrange the eigenvalues \(\lambda_1,\dots,\lambda_n\) of \(A\) in decreasing order of absolute value.

Singular Values

It is a familiar fact, proved using the inner product and orthogonalization, that real symmetric matrices have real eigenvalues. Positive semi-definite matrices have non-negative eigenvalues; the same statements hold for complex Hermitian matrices.

Definition. The singular values of \(A\) are the square roots of the eigenvalues of \(A^*A\).

For a real symmetric matrix these are exactly the absolute values of the eigenvalues.

Singular Value Decomposition

There are unitary \(U,V\in U(n)\) such that

\[UAV^* = \Sigma = \Sigma(\sigma_1,\dots,\sigma_n)\]

is a diagonal matrix, where \(\sigma_1\geq \sigma_2\geq \dots \geq \sigma_n\geq 0\) are the singular values of \(A\).

  • The singular value decomposition preserves the metric on both sides, so it gives a good measurement of the size of the matrix.

  • Changing notation, we can write it as \(AV = U\Sigma\), which means we can find orthonormal bases \((v_i)\) and \((u_i)\) such that \(Av_i = \sigma_i u_i\).

  • For any vector \(v\) we have \(|Av|\le \sigma_1|v|\).

  • For any two-dimensional subspace \(W\) there is a non-zero vector \(w\) such that \(|Aw|\le \sigma_2|w|\). This generalizes to higher dimensions.

  • \(\sigma_d\) is the smallest number such that every \(d\)-dimensional subspace contains a non-zero vector \(v\) with \(|Av|\le \sigma_d|v|\), i.e.

    \[\sigma_d(A) = \sup_{\dim W=d} \inf_{v\in W-\{0\}} \frac{|Av|}{|v|}.\]

  • Using this formula it is clear that no singular value exceeds the operator norm, \(\sigma_i \le |A|\); in fact \(\sigma_1 = |A|\). So you can view the singular values as a generalization of the operator norm: instead of one number, you have a sequence of numbers describing the size of the matrix in different dimensions.
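These facts are easy to check numerically. A minimal sketch with NumPy (the random matrix, seed, and tolerances are illustrative choices, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

# Singular values of A are the square roots of the eigenvalues of A^T A.
sigma = np.linalg.svd(A, compute_uv=False)        # sorted decreasing
eig = np.sqrt(np.linalg.eigvalsh(A.T @ A)[::-1])  # eigvalsh sorts ascending
assert np.allclose(sigma, eig)

# The largest singular value equals the operator (spectral) norm.
assert np.isclose(sigma[0], np.linalg.norm(A, 2))

# |Av| <= sigma_1 |v| for every vector v.
v = rng.standard_normal(4)
assert np.linalg.norm(A @ v) <= sigma[0] * np.linalg.norm(v) + 1e-12
```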

Remarks. Singular values are the invariants of a linear map \(f\) between two inner product spaces \((V,\langle\cdot,\cdot\rangle_V)\) and \((W,\langle\cdot,\cdot\rangle_W)\); for matrices, \(V\) and \(W\) come with fixed bases and are equipped with the standard inner products. The inner product structures on the source and target spaces are both preserved.

Interaction With Exterior Algebra

For an inner product space \(V\) with orthonormal basis \(e_1,\dots,e_n\), the exterior space \(\wedge^k V\) has inner product structure and an orthonormal basis \(e_{i_1}\wedge \dots \wedge e_{i_k}\) for \(i_1<\dots <i_k\).

Since, in the bases coming from the singular value decomposition, we have

\[\begin{align*} \wedge^k A (v_{i_1}\wedge \dots \wedge v_{i_k}) &= Av_{i_1}\wedge \dots \wedge Av_{i_k} \\ &= \sigma_{i_1}\dots \sigma_{i_k} u_{i_1}\wedge \dots \wedge u_{i_k}, \end{align*}\]

  • we can see the singular values of \(\wedge^k A\) are \(\{\sigma_{i_1}\dots \sigma_{i_k} : i_1<\dots <i_k\}\).

  • Similarly we can see the eigenvalues of \(\wedge^k A\) are \(\{\lambda_{i_1}\dots \lambda_{i_k} : i_1<\dots <i_k\}\).

  • In fact this observation gives another interpretation of singular values in terms of exterior product and orthonormal basis,

    \[(\sigma_1\cdots \sigma_k)(A) = \sigma_1(\wedge^k A) = \sup_{|o|=|o'|=1} \left|\langle o',(\wedge^k A)o\rangle\right|.\]

  • In particular this means all minors of \(A\) are bounded by the partial products of singular values, if \(|I|=|J|=k\) then

    \[\left|\det A^J_I\right| \le \sigma_1\dots \sigma_k.\]

  • From \(Av = \lambda_1 v\) for a unit eigenvector \(v\), and \(|\lambda_1||v| = |Av| \le \sigma_1 |v|\), we see that \(\sigma_1\ge |\lambda_1|\). We have

    \[|\lambda_1|\le \sigma_1= |A|.\]

  • Applying the above to \(\wedge^k A\) we obtain Weyl’s inequality

    \[\sigma_1 \dots \sigma_k\ge |\lambda_1\dots \lambda_k|,\]

    which is an equality when \(k=0,n\).

  • Weyl’s inequality has a converse: if \(\{\sigma_i\}\subset \mathbb{R}_{\ge 0}\) and \(\{\lambda_i\}\subset \mathbb{C}\), both in decreasing order of absolute value, satisfy \(\sigma_1 \dots \sigma_k\ge |\lambda_1\dots \lambda_k|\) for all \(k\), with equality for \(k=n\), then there is a matrix \(A\) whose singular values are the \(\sigma_i\) and whose eigenvalues are the \(\lambda_i\).
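The compound matrix \(\wedge^k A\) can be built directly from \(k\times k\) minors, which makes these statements easy to test numerically. A NumPy sketch (matrix size, seed, and tolerances are arbitrary choices):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n, k = 4, 2
A = rng.standard_normal((n, n))

# k-th exterior power of A: entries are the k x k minors det A^J_I.
idx = list(combinations(range(n), k))
wedge = np.array([[np.linalg.det(A[np.ix_(I, J)]) for J in idx] for I in idx])

sigma = np.linalg.svd(A, compute_uv=False)
lam = np.linalg.eigvals(A)
lam = lam[np.argsort(-np.abs(lam))]   # decreasing |lambda|

# Singular values of wedge^k A are the k-fold products of singular values.
prods = sorted((sigma[list(I)].prod() for I in idx), reverse=True)
assert np.allclose(np.linalg.svd(wedge, compute_uv=False), prods)

# Weyl's inequality: sigma_1 ... sigma_m >= |lambda_1 ... lambda_m|.
for m in range(n + 1):
    assert sigma[:m].prod() >= abs(lam[:m].prod()) * (1 - 1e-9)
```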

Remarks. Unlike eigenvalues, singular values do not behave well under polynomial composition, since the singular value decomposition essentially treats the source and target spaces as different spaces. But they do behave well under inversion: from the singular value decomposition you can see that the singular values of \(A^{-1}\) are \(1/\sigma_i\).

Perturbations

  • The singular values behave well under additive perturbations,

    \[|\sigma_i(A+B) - \sigma_i(A)|\le |B|.\]

    This can be proved easily from the sup-inf formula for \(\sigma_i\), together with the two facts \(|x+y|\le |x|+|y|\) and \(|x+y|\ge |x|-|y|\).

  • The eigenvalues cannot easily be controlled by additive perturbations, but we can say something about the characteristic polynomial.

    \[\left|\chi_{A+B}[t^{n-m}] - \chi_A[t^{n-m}]\right|\le (2^m-1)\binom{n}{m} |B|\prod_{j<m} \max(\sigma_j, |B|).\]

    This can be seen as follows: the coefficient \(\chi_M[t^{n-m}]\) is, up to the sign \((-1)^m\), the sum \(\sum_{|I|=m} \det M^I_I\) of all principal \(m\times m\) minors, so we have

    \[\begin{align*} \chi_{A+B}[t^{n-m}] &= (-1)^m\sum_{|I|=m} \det (A+B)^I_I \\ &= (-1)^m\sum_{|I|=m} \det A^I_I + (-1)^m\sum_{|I|=m} \sum_{\gamma_s \in \{\alpha_s,\beta_s\},\, s\in I} \det(\gamma_{i_1},\dots,\gamma_{i_m})^I_I \\ &= \chi_A[t^{n-m}] + O_1\left((2^m-1)\binom{n}{m} \sup_{k<m} \sigma_1(A)\dots \sigma_k(A) |B|^{m-k}\right) \end{align*}\]

    where \(\alpha_s,\beta_s\) denote the \(s\)-th rows of \(A\) and \(B\), the inner sum runs over the \(2^m-1\) choices other than all rows taken from \(A\), and \(O_1(x)\) denotes a quantity of absolute value at most \(x\).

  • For multiplicative pertubation, if \(B\) is invertible we have

    \[\sigma_i(BA) \le |B|\sigma_i(A).\]

    This can be seen using the formula

    \[\sigma_i(BA) = \sup \inf \frac{|BAv|}{|v|} \le |B| \sup \inf \frac{|Av|}{|v|} = |B|\sigma_i(A).\]
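Both perturbation bounds can be sanity-checked numerically. A small NumPy sketch (matrix sizes, seed, and perturbation scale are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
B = 0.1 * rng.standard_normal((5, 5))

sA = np.linalg.svd(A, compute_uv=False)
sAB = np.linalg.svd(A + B, compute_uv=False)
nB = np.linalg.norm(B, 2)   # operator norm of B

# Additive perturbation: |sigma_i(A+B) - sigma_i(A)| <= |B|.
assert np.all(np.abs(sAB - sA) <= nB + 1e-12)

# Multiplicative perturbation: sigma_i(BA) <= |B| sigma_i(A).
sBA = np.linalg.svd(B @ A, compute_uv=False)
assert np.all(sBA <= nB * sA + 1e-12)
```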

Matrix With \(p\)-adic Numbers

For \(p\)-adic fields (or actually, any other non-archimedean valued fields) the infinity norm on vectors is more appropriate

\[|v| := \max_i |v_i|.\]

The operator norm is defined as usual using this norm.

  • As a corollary we have

    \[\begin{align*} |Av| &= \max_i \left|\sum a_{ij}v_j\right| \\ &\le \max_i \left(\max_j |a_{ij}| |v_j|\right) \\ &\le \max_{i,j} |a_{ij}| \left(\max_j |v_j|\right) \\ &= \max_{i,j} |a_{ij}| |v|. \end{align*}\]

    This means \(|A|\le \max_{i,j} |a_{ij}|\).

  • Note also that \(|A|\ge \max_j \frac{|Ae_j|}{|e_j|} = \max_{i,j} |a_{ij}|\), therefore

    \[|A| = \max |a_{ij}|.\]

  • A matrix \(P\in \mathrm{GL}_n(\mathcal{O}_p)\) preserves the vector norm, \(|Pv| = |v|\). This can be seen by

    • \(|P|=\max |p_{ij}|\le 1\),

    • \(|Pv|\le |P||v| \le |v|\),

    • \(|v| = |P^{-1}Pv| \le |P^{-1}||Pv| \le |Pv|\).

    • The fact that equality holds also tells us that \(|P|=|P^{-1}|=1\).
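The bullet points above can be illustrated with a small hand-rolled sketch in pure Python (the prime \(p=5\) and the matrices below are arbitrary choices made for illustration):

```python
from fractions import Fraction

p = 5

def vp(x):
    """p-adic valuation of a nonzero rational."""
    x = Fraction(x)
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v

def pabs(x):
    """p-adic absolute value |x| = p^{-v(x)}."""
    return 0.0 if x == 0 else float(p) ** (-vp(x))

def vec_norm(v):
    """Sup norm |v| = max_i |v_i|."""
    return max(pabs(c) for c in v)

# |A| = max |a_ij| for the sup norm on vectors.
A = [[5, 1], [25, 3]]
matrix_norm = max(pabs(a) for row in A for a in row)
assert matrix_norm == 1.0   # the entries 1 and 3 are p-adic units

# P in GL_2(Z_p): integer entries with det a p-adic unit => |Pv| = |v|.
P = [[1, 2], [3, 7]]        # det = 1, a unit at p = 5
v = [Fraction(1, 5), Fraction(3)]
Pv = [sum(Fraction(P[i][j]) * v[j] for j in range(2)) for i in range(2)]
assert vec_norm(Pv) == vec_norm(v)
```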

Hodge Polygon

For a sequence \((s_i)_{i=1}^n\in \mathbb{R}^n\), its associated polygon is the path joining the points \(\{(-n+k,\, s_1+\dots+s_k): k=0,\dots,n\}\). The \(s_i\) are understood as successive slopes, and if the \(s_i\) are non-decreasing, the polygon is convex.

Given a matrix \(A\), let \(s_i\) be the sequence whose partial sums \(S_i=s_1+\dots+s_i\) satisfy

\[S_i = \min_{|I|=|J|=i} v(\det A^J_I).\]

  • The \(s_i\) are called the elementary divisors or invariant factors of \(A\).

  • The Hodge polygon of \(A\) is the associated polygon of this sequence, and the singular values of \(A\) are defined by \(\sigma_i = p^{-s_i}\), where \(p\) is the base used in the absolute value \(|\cdot|=p^{-v}\).

  • As a corollary of the definition,

    \[\sigma_1\cdots \sigma_k = \max_{|I|=|J|=k} \left|\det A^J_I\right|.\]

  • These \(s_i\) are clearly invariant under the action of \(\mathrm{GL}_n(\mathcal{O}_p)\times \mathrm{GL}_n(\mathcal{O}_p)^{\mathrm{op}}\); as a result the singular values and Hodge polygons are also invariant.

  • Since we are in a valuation ring, the same process that puts a matrix with PID coefficients into normal form gives the Smith normal form of \(A\): \(PAQ=\Sigma(c_1,\dots,c_n)\) is diagonal, where \(P,Q\in \mathrm{GL}_n(\mathcal{O}_p)\) and \(|c_i|=\sigma_i = p^{-s_i}\). This definition of singular values makes sense because the vector norms on the source and target spaces are preserved by \(\mathrm{GL}_n(\mathcal{O}_p)\).

  • The process guarantees that \(c_1\mid \dots\mid c_n\), which means \(s_1\le \dots \le s_n\), i.e. the Hodge polygon is convex.

  • Similarly we have the subspace characterization of singular values

    \[\sigma_d(A) = \sup_{\dim W=d} \inf_{v\in W-\{0\}} \frac{|Av|}{|v|}.\]
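As a sanity check, the partial sums \(S_k\) and the convexity of the Hodge polygon can be computed directly from the minors, using exact integer arithmetic. A pure-Python sketch (the matrix and the prime \(p=2\) are arbitrary choices; zero minors are skipped since their valuation is infinite):

```python
from itertools import combinations

p = 2

def det(M):
    """Exact determinant by cofactor expansion (fine for small matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j+1:] for row in M[1:]])
               for j in range(len(M)))

def vp(n):
    """p-adic valuation of a nonzero integer."""
    v = 0
    while n % p == 0:
        n //= p
        v += 1
    return v

A = [[2, 1, 0],
     [4, 6, 2],
     [0, 8, 12]]
n = len(A)

# Hodge partial sums: S_k = min over k x k minors of v_p(det A^J_I).
S = [0]
for k in range(1, n + 1):
    vals = []
    for I in combinations(range(n), k):
        for J in combinations(range(n), k):
            d = det([[A[i][j] for j in J] for i in I])
            if d != 0:
                vals.append(vp(abs(d)))
    S.append(min(vals))

# Elementary divisors s_k are the successive slopes; convexity = sorted.
slopes = [S[k] - S[k - 1] for k in range(1, n + 1)]
assert slopes == sorted(slopes)
```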

Remarks. Here \(\mathrm{GL}_n(\mathcal{O}_p)\) preserves the vector norm, so it is the natural choice of group action. Unlike the complex case we do not try to preserve an inner product structure; we choose to preserve the vector norm instead, because the \(p\)-adic norm does not come from an inner product. The singular values are the invariants of a linear map \(f\) between two normed spaces \((V,|\cdot|_V)\) and \((W,|\cdot|_W)\).

Newton Polygon Associated To Eigenvalues

For a matrix \(A\) with eigenvalues \(\lambda_1,\dots,\lambda_n\), the Newton polygon is the associated polygon of the sequence \(v(\lambda_1)\le \dots\le v(\lambda_n)\), so that \(|\lambda_1|\ge \dots \ge |\lambda_n|\).

  • Like the eigenvalues themselves, the Newton polygon is invariant under conjugation by \(\mathrm{GL}_n(k)\).

  • Weyl’s inequality holds as well

    \[\sigma_1\cdots \sigma_k\ge |\lambda_1\cdots \lambda_k|,\]

    with equality when \(k=0,n\). This means the Newton polygon lies above the Hodge polygon, with the same starting and ending points.
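For an upper triangular integer matrix the eigenvalues are just the diagonal entries, so both polygons can be computed exactly and compared. A pure-Python sketch (the matrix and \(p=2\) are arbitrary choices):

```python
from itertools import combinations

p = 2

def det(M):
    """Exact determinant by cofactor expansion (fine for small matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j+1:] for row in M[1:]])
               for j in range(len(M)))

def vp(n):
    """p-adic valuation of a nonzero integer."""
    v = 0
    while n % p == 0:
        n //= p
        v += 1
    return v

# Upper triangular, so the eigenvalues are the diagonal entries 2, 4, 8.
A = [[2, 1, 0],
     [0, 4, 1],
     [0, 0, 8]]
n = len(A)

# Hodge partial sums S_k = min v_p over k x k minors.
S = [0]
for k in range(1, n + 1):
    vals = []
    for I in combinations(range(n), k):
        for J in combinations(range(n), k):
            d = det([[A[i][j] for j in J] for i in I])
            if d != 0:
                vals.append(vp(abs(d)))
    S.append(min(vals))

# Newton partial sums from eigenvalue valuations, sorted increasing.
ev = sorted(vp(A[i][i]) for i in range(n))
N = [0]
for v in ev:
    N.append(N[-1] + v)

# Weyl: the Newton polygon lies on or above the Hodge polygon...
assert all(S[k] <= N[k] for k in range(n + 1))
# ...with the same starting and ending points.
assert S[0] == N[0] and S[n] == N[n]
```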

Hodge-Newton Decomposition

If the Hodge and Newton polygons of a matrix ‘break into two parts’, i.e. for some \(i\) we have

\[ |\lambda_i|>|\lambda_{i+1}|, \quad \sigma_1\cdots \sigma_i = |\lambda_1\dots \lambda_i|\]

then the matrix has a Hodge-Newton decomposition at \(i\): there is an integral matrix \(U\in \mathrm{GL}_n(\mathcal{O}_p)\) such that

\[ U^{-1}AU = \begin{pmatrix} B & C \\ 0 & D \end{pmatrix} \]

If moreover \(\sigma_i>\sigma_{i+1}\), then \(C\) can be chosen to be zero.