
Determining the Determinant

More often than not, the most important number associated with a matrix is the value of its determinant. But at the school level we are usually only taught the determinants of 2 \times 2 and 3 \times 3 matrices, via the so-called Laplace Expansion of Determinants. What is easily forgotten is that the Laplace Expansion becomes rapidly tedious as one goes to higher order determinants; indeed, for those well versed in computational complexity theory, the Laplace Expansion takes O(n!) time, whereas simple Gaussian reduction takes O(n^3), a dramatic improvement. It is therefore essential to have the right ideas in mind while looking at the determinant, and for that it is important to look at where the determinant originates from.
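
To get a feel for the difference in cost, here is a minimal Python sketch (the function name laplace_det is my own, purely for illustration) of the naive cofactor expansion. Every call on an n \times n matrix spawns n calls on (n-1) \times (n-1) minors, which is exactly where the factorial blow-up comes from.

```python
# A naive determinant via cofactor (Laplace) expansion along the first row.
# Each call on an n x n matrix makes n recursive calls on (n-1) x (n-1)
# minors, so the total work grows roughly like n!, in contrast to the
# O(n^3) cost of Gaussian reduction.
def laplace_det(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]  # delete row 1 and column j+1
        total += (-1) ** j * A[0][j] * laplace_det(minor)
    return total

print(laplace_det([[1, 2], [3, 4]]))                   # -2
print(laplace_det([[2, 0, 1], [1, 3, 2], [1, 1, 4]]))  # 18
```
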
So, where exactly does the idea of a determinant come from? Of course, a square matrix is called non-singular if its determinant is non-zero, i.e. the system of linear equations having the entries of the square matrix as its coefficients has a unique solution. It is important to start from here, since the existence or non-existence of solutions of systems of linear equations is the basis of all of linear algebra.

 

So, let us look at the determinant from the point of view of a person with just the fundamental idea that the determinant is non-zero whenever the matrix is non-singular, and vice versa (and who perhaps knows the 2 \times 2 and 3 \times 3 formulas). With this fundamental idea in place, we will look for methods to determine the non-singularity of a matrix. And the most basic method to accomplish this is most certainly the Gaussian Row Reduction Method.

 

For the uninitiated, the Gaussian Row Reduction Method is what is also called the 'Elimination Method'. Suppose our equations are x+y+z = 1, 2x+3y+4z = 2, 3x +9y + 8z = 3. One starts by taking the first equation, multiplying it by a suitable constant (in the example, it is 2), and subtracting it from the second equation so that the first variable (x) is eliminated from the second equation. Then one does the same thing again to remove x from the third equation, and then uses the new second equation to remove y from the third equation (here I am assuming that the second equation still contains the variable y; if not, and the third equation has y, then we swap those equations, and if neither has y, the system is singular). If one obtains a non-zero coefficient of z at the end, then trivially we will get a solution; otherwise the system is singular. The difference between the 'Elimination Method' and the Gaussian Row Reduction Method is that the latter performs the reduction on the coefficient matrix to bring it into what is called the 'Row-Echelon Form' of the matrix. This is an upper triangular matrix, and the leading non-zero elements of its rows are called the pivots of the matrix.
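
To make the procedure concrete, here is a small Python sketch (the variable names are mine) that reduces the augmented matrix of the example system above to row-echelon form:

```python
# Row-reduce the augmented matrix of
#    x +  y +  z = 1
#   2x + 3y + 4z = 2
#   3x + 9y + 8z = 3
# swapping rows when a pivot position happens to contain a zero.
M = [[1.0, 1.0, 1.0, 1.0],
     [2.0, 3.0, 4.0, 2.0],
     [3.0, 9.0, 8.0, 3.0]]

n = len(M)
for col in range(n):
    # Look for a row at or below position `col` with a nonzero entry in this column.
    pivot_row = next((r for r in range(col, n) if M[r][col] != 0), None)
    if pivot_row is None:
        continue  # no pivot in this column
    M[col], M[pivot_row] = M[pivot_row], M[col]  # swap it into place
    for r in range(col + 1, n):
        factor = M[r][col] / M[col][col]
        M[r] = [a - factor * b for a, b in zip(M[r], M[col])]

for row in M:
    print(row)
# Echelon form: pivots 1, 1 and -7, so the system is non-singular
# and has the unique solution x = 1, y = 0, z = 0.
```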

 

So, at this point, our primary way to decide whether a matrix is singular is to do Gaussian reduction and then check whether the diagonal of the resulting echelon form matrix has any zeroes (that is, to check whether the product down the diagonal is zero). So we may expect that a proof that a formula determines singularity will involve applying Gauss' method to the matrix, to show that in the end the product down the diagonal is zero if and only if the determinant formula gives zero. This suggests our initial plan: we will look for a family of functions with the property of being unaffected by row operations and with the property that the determinant of an echelon form matrix is the product of its diagonal entries. Under this plan, a proof that the functions determine singularity would go: "Where T \rightarrow \cdots \rightarrow \overline{T} is the Gaussian reduction, the determinant of T equals the determinant of \overline{T} (because the determinant is unchanged by row operations), which is the product down the diagonal, which is zero if and only if the matrix is singular."

 

But of course, being a good problem solver also entails checking our idea in the special cases of the 2 \times 2 and the 3 \times 3. The first step in checking the plan is to test whether the 2 \times 2 and 3 \times 3 formulas are unaffected by the row operation of combining rows. Carrying through the calculations, we find that they indeed remain the same. So there seems to be promise in the plan. Of course, perhaps the 4 \times 4 determinant formula is affected by row combinations; we are exploring a possibility here and we do not yet have all the facts. Nonetheless, so far, so good.
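
For instance, in the 2 \times 2 case, adding k times the first row to the second leaves the familiar formula unchanged:

\left\vert\begin{array}{cc}a&b\\ka+c&kb+d\end{array}\right\vert = a(kb+d) - b(ka+c) = ad - bc = \left\vert\begin{array}{cc}a&b\\c&d\end{array}\right\vert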

 

The next step is to compare the determinants before and after a row swap. But things don't always go according to plan, and we find that the sign of the determinant changes after a row swap, in both the 2 \times 2 and 3 \times 3 cases. Thus, row swaps appear to change the sign of a determinant. This modifies our plan, but does not wreck it. We intend to decide non-singularity by considering only whether the determinant is zero, not by considering its sign. Therefore, instead of expecting determinants to be entirely unaffected by row operations, we will look for them to change sign on a swap.
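
In the 2 \times 2 case, for instance, swapping the two rows gives

\left\vert\begin{array}{cc}c&d\\a&b\end{array}\right\vert = cb - da = -(ad - bc) = -\left\vert\begin{array}{cc}a&b\\c&d\end{array}\right\vert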

 

Finally, we compare the determinants before and after multiplying a row by a constant. Again, unexpectedly, the determinant does not remain the same: it gets multiplied by the same constant upon performing this transformation. This still fits with our modified plan, because we are only asking that the zero-ness of the determinant be unchanged; we are not focusing on the determinant's sign or magnitude.
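
The 2 \times 2 case again shows the pattern: multiplying the first row by a constant k gives

\left\vert\begin{array}{cc}ka&kb\\c&d\end{array}\right\vert = kad - kbc = k(ad - bc) = k\left\vert\begin{array}{cc}a&b\\c&d\end{array}\right\vert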

 

With these ideas in mind, we are ready to find the general formula for a determinant. First of all, some notation. We shall denote the determinant of an n \times n matrix A, having rows r_1,r_2,\cdots, r_n, either by \det(A), by |A|, or by \det(r_1,r_2, \cdots , r_n), whichever is convenient.

Definition:    An n \times n determinant is a function \det : \mathcal{M}_{n \times n} \rightarrow \mathbb{R} such that

1. \det(r_1, \cdots, k \cdot r_i + r_j,\cdots, r_n) = \det(r_1, \cdots, r_j, \cdots, r_n) for i \not = j.

2. \det(r_1, \cdots, r_i, \cdots, r_j, \cdots, r_n) = - \det(r_1, \cdots, r_j, \cdots, r_i, \cdots, r_n) for i \not = j.

3. \det(r_1, \cdots, k \cdot r_i,\cdots, r_n) = k \cdot \det(r_1, \cdots, r_i, \cdots, r_n) for k \not = 0.

4. \det(I) = 1 where I is the identity matrix.

 

The way that we have defined the determinant, it's pretty simple to find it for any given matrix: just perform Gaussian reduction to bring the matrix into row-echelon form, keeping track of all the sign changes along the way. Now we show that the determinant of the row-echelon form matrix is just the product down the diagonal. To prove this, first observe that if some row of our REF (Row-Echelon Form) matrix is the zero vector, then condition 3 implies that the determinant is 0 (a zero row equals twice itself, so the determinant equals twice itself). Therefore, if we have fewer than n pivots, the nth row is the zero vector, and hence the determinant is 0, which is also the product down the diagonal. Otherwise, we have n pivots, all of them on the diagonal. In that case we can continue with Gauss-Jordan elimination (which just means using the lowest row to clear the entries above the last pivot, then the second lowest row to clear the entries above the second-to-last pivot, and so on) to obtain a diagonal matrix. Now, when we factor out the pivots using condition 3, their product comes outside and we are left with the identity matrix, which has determinant 1 by condition 4.
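
The procedure just described translates directly into code. The following is a minimal sketch in Python (the function name is mine) that row-reduces a copy of the matrix, flips the sign on every swap as condition 2 demands, and returns the product down the diagonal:

```python
# Determinant via Gaussian reduction: reduce to row-echelon form,
# flip the sign on every row swap, and take the product down the diagonal.
def det_by_reduction(A):
    M = [row[:] for row in A]  # work on a copy
    n = len(M)
    sign = 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return 0  # fewer than n pivots: the matrix is singular
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            sign = -sign  # condition 2: a swap changes the sign
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    product = sign
    for i in range(n):
        product *= M[i][i]
    return product

print(det_by_reduction([[1, 1, 1], [2, 3, 4], [3, 9, 8]]))  # -7.0
```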

 

Now we know how to find the determinant of every matrix (and how to find it fast!). What we don't know, however, is whether this determinant is unique for every matrix. The procedure we just gave, Gaussian elimination, can in fact be carried out in many different ways, and we do not yet know whether each different sequence of operations gives the same result for the determinant. (We could do row swaps at any time, which would give us totally different pivots, so it is not at all obvious how to settle this directly.)

 

So what we do now is develop a complete formula for the determinant in terms of the entries of the matrix. To accomplish this, we first prove the following assertion:

\det(r_1, \cdots, av + bw,\cdots, r_n) = a \cdot \det(r_1, \cdots, v,\cdots, r_n) + b \cdot \det(r_1, \cdots, w,\cdots, r_n)

This is done by noticing first that if the n-1 rows other than av + bw are linearly dependent, then all three determinants are 0 (one of those rows is a combination of the others, so condition 1 lets us produce a zero row). If they are independent, then since they span only an (n-1)-dimensional subspace of the n-dimensional row space, we can extend them by one more independent vector r. Now write v and w in terms of r_1, \cdots, r_{i-1}, r, r_{i+1}, \cdots, r_n, and use conditions 1 and 3 to turn both sides into a constant times \det(r_1, \cdots, r_{i-1}, r, r_{i+1}, \cdots, r_n).
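
For instance, in the 2 \times 2 case, with the first row written as a(v_1,v_2) + b(w_1,w_2), the formula ad - bc splits exactly as claimed:

\left\vert\begin{array}{cc}av_1+bw_1&av_2+bw_2\\c&d\end{array}\right\vert = a(v_1 d - v_2 c) + b(w_1 d - w_2 c) = a\left\vert\begin{array}{cc}v_1&v_2\\c&d\end{array}\right\vert + b\left\vert\begin{array}{cc}w_1&w_2\\c&d\end{array}\right\vert
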
Now that we have this crucial result in hand, we take it to the extreme. First, we prove by induction that the above result extends to splitting the ith row into m different vectors for any m, with the determinant splitting correspondingly into m pieces.
Now, take the ith row, r_i = a_{i1} \cdot v_1 + a_{i2} \cdot v_2 + a_{i3} \cdot v_3 + \cdots + a_{in} \cdot v_n, where

v_1 = [1,0,0, \cdots, 0], v_2 = [0,1,0, \cdots, 0], \cdots, v_n = [0,0,0, \cdots, 0,1]

Let E_{ij} denote the matrix with 1 in the (i,j)th position and 0s in the other positions. Using the above strategy, our determinant breaks up into n^n different pieces,

|A| = \sum_{t_1,t_2, \cdots, t_n=1}^{n} a_{1t_1}a_{2t_2} \cdots a_{nt_n} |E_{1t_1} + E_{2t_2} + \cdots + E_{nt_n}|

Of course, writing out n^n terms in summation form is going to look really scary, but I hope the following shows what's going on (for a 3 \times 3 matrix there are actually 27 such determinants; I'm not going to show all of them, just enough to give you a hint):

{\left\vert\begin{array}{ccc}a_{11}&0&0\\a_{21}&0&0\\a_{31}&0&0\end{array}\right\vert} + {\left\vert\begin{array}{ccc}a_{11}&0&0\\a_{21}&0&0\\0&a_{32}&0\end{array}\right\vert} + {\left\vert\begin{array}{ccc}a_{11}&0&0\\a_{21}&0&0\\0&0&a_{33}\end{array}\right\vert} + \cdots

Now, as you notice, most of these matrices have a column of zeros, so we might expect their determinants to be 0. But hold on: we haven't yet proven anything about the determinant being zero when a column is zero. In this special case of very sparse matrices, however, we are saved by noticing that whenever we have a zero column, we also have some column containing two of the entries. Since each of these matrices has one entry in each row, this means we can subtract one of those two rows (multiplied by a suitable constant) from the other to get a zero row, and by conditions 1 and 3 this implies that the determinant of such a matrix is zero.

 

This means that many of the matrices we get have determinant zero! So we're only left with matrices in which there is exactly one entry in every row and exactly one entry in every column. Therefore, if we let t_1 be the column in which the entry in row 1 appears, t_2 the column in which the entry in row 2 appears, and so on up to t_n, the column in which the entry in row n appears, then t_1,t_2, \cdots, t_n is a permutation of 1,2, \cdots, n, since every column has to be occupied. Thus, we have reduced the determinant of our large matrix to finding the determinant of a so-called permutation matrix. Now, we might think this is easy enough: one could just swap rows repeatedly to bring the permutation matrix to the identity matrix, whose determinant we know to be 1. But we have to keep track of the number of swaps! Every swap changes the sign of the determinant, so our determinant will be (-1) raised to the number of swaps. Also, note that the swaps we perform to bring the permutation matrix to the identity are the same as the swaps we perform on t_1,t_2, \cdots, t_n to bring that permutation to 1,2,3, \cdots, n.
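
The swap-counting in this paragraph is easy to mechanize. Here is a short Python sketch (names mine) that sorts a permutation by swaps and flips a sign once per swap; the result is the determinant of the corresponding permutation matrix.

```python
# Sign of a permutation: sort it into 1, 2, ..., n by swaps and
# flip the sign once per swap.  This is the determinant of the
# corresponding permutation matrix.
def sign_by_swaps(perm):
    p = list(perm)
    sign = 1
    for i in range(len(p)):
        # Put the correct value into position i, one swap at a time.
        while p[i] != i + 1:
            j = p.index(i + 1)
            p[i], p[j] = p[j], p[i]
            sign = -sign
    return sign

print(sign_by_swaps([1, 2, 3]))  #  1 (identity, no swaps)
print(sign_by_swaps([2, 1, 3]))  # -1 (one swap)
print(sign_by_swaps([2, 3, 1]))  #  1 (two swaps)
```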

 

So, do you see what our whole theory of a unique determinant boils down to? Perhaps not yet. But imagine some permutation at hand. One could think that the ways we could swap the elements of this permutation to bring it to the identity are fairly arbitrary, and therefore one could very well believe that the number of swaps could vary too: even in one reduction and odd in another. But suppose we have caught hold of some permutation which we can turn into the identity by an even number of swaps as well as by an odd number of swaps. Notice what this would do to the determinant of our permutation matrix: it would make the determinant equal to 1 and to -1 at the same time! Thus, in a very crucial manner, the uniqueness of our determinant hinges on the well-definedness of the function called the sign of a permutation. We will not elaborate on this theory right now, though, as it would prove too large a digression.

Therefore, in the above discussion, we have proven that if we denote the permutation

\sigma = \left(\begin{array}{ccccc}1&2&3&\cdots&n\\t_1&t_2&t_3&\cdots&t_n\end{array}\right)

and let S_n denote the symmetric group on n letters, then we have

\det(A) = \sum_{\sigma \in S_n} sgn(\sigma) a_{1,t_1}a_{2,t_2} \cdots a_{n,t_n}

where sgn(\sigma) is the sign of a permutation, being +1 for an even permutation, and -1 for an odd permutation.
Thus we have finally established the Leibniz Formula for the determinant, which shows in particular that the determinant is unique for every matrix. Further, it is now also easy to see that the determinant is a continuous and differentiable function of the entries, since it is a polynomial in them.
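
As a sanity check, the Leibniz formula can be coded directly with itertools (the helper names below are mine); it agrees with the Gaussian-reduction computation, though summing over all n! permutations is of course far too slow for large n.

```python
from itertools import permutations

def parity(perm):
    # Sign of a 0-indexed permutation, computed by counting inversions.
    inversions = sum(1 for i in range(len(perm))
                       for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def leibniz_det(A):
    # Leibniz formula: sum over all n! permutations of a signed product of entries.
    n = len(A)
    total = 0
    for perm in permutations(range(n)):
        term = parity(perm)
        for i in range(n):
            term *= A[i][perm[i]]
        total += term
    return total

print(leibniz_det([[1, 1, 1], [2, 3, 4], [3, 9, 8]]))  # -7, matching the reduction above
```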

 

Finally, we come to the Laplace Formula. We show this for the expansion along the first row; the proof for expansion along the other rows is analogous. First, we plan out our way to prove this.

For now, the only formula we have for the determinant is the Leibniz formula, and this is the formula we are going to manipulate to obtain the Laplace formula. We know that the Laplace Formula looks like |A| = a_{11} C_{11} + a_{12} C_{12} + \cdots + a_{1n} C_{1n} (we are expanding along the first row here), where C_{ij} is called the cofactor of the element a_{ij}. We obtain this by fixing t_1 = 1 in the Leibniz Formula and taking a_{11} common from those terms, and similarly for the other columns.

 

Now, that we’ve charted out our plan, we bring it into action. Let us consider what C_{1i} looks like. As we’ve fixed t_i = i, taken a_{1i} common, and named the rest as C_{1i}, we have

C_{1i} = \sum_{\sigma \text{ s.t. } \sigma(1) = i} sgn(\sigma)a_{2,t_2}a_{3,t_3}\cdots a_{n,t_n}

Now we have to prove that this is (-1)^{1+i} times the determinant of the matrix formed by deleting the first row and the ith column. This, in turn, hinges on proving that if \lambda is a permutation of 1, 2, \cdots, n-1, and we transform it by adding 1 to every term \ge i and then prepend i as the image of 1, calling the resulting permutation \sigma, then sgn(\sigma) = (-1)^{1+i} sgn(\lambda). The proof is not very difficult, and is left as an exercise.
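
Before writing out the sign argument, one can at least check the first-row expansion numerically; the following quick Python sketch (leaning on numpy's determinant as an oracle) verifies it on a few random 4 \times 4 matrices.

```python
import numpy as np

# Check the first-row Laplace expansion
#   |A| = sum_j a_{1j} * (-1)^(1+j) * |minor_{1j}|
# against numpy's determinant on a few random matrices.
rng = np.random.default_rng(0)
for _ in range(5):
    A = rng.integers(-3, 4, size=(4, 4)).astype(float)
    expansion = sum(
        (-1) ** j * A[0, j]
        * np.linalg.det(np.delete(np.delete(A, 0, axis=0), j, axis=1))
        for j in range(4)
    )
    assert np.isclose(expansion, np.linalg.det(A))
print("first-row expansion matches np.linalg.det")
```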

Well, maybe I’ll do another post on determinants, maybe not. So try to prove the following yourself, as they’re not very difficult.

1. Prove that \det(A) = \det(A^T), where A^T is the transpose of A.

2. Prove that

3. Prove that if \text{adj}(A) is the adjugate matrix of A, whose (i,j)th entry is C_{ji}, where C_{ij} is the cofactor of a_{ij} (note that this adjugate matrix is the cofactor matrix transposed), then A \cdot \text{adj}(A) = \det(A) \cdot I.

4. (Vandermonde Determinant) Prove that

\left\vert\begin{array}{ccccc}1&x_1&x_1^2&\cdots&x_1^{n-1}\\1&x_2&x_2^2&\cdots&x_2^{n-1}\\\vdots&\vdots&\vdots&\ddots&\vdots\\1&x_n&x_n^2&\cdots&x_n^{n-1}\end{array}\right\vert = \prod_{1 \le i < j \le n} (x_j - x_i)