Dimensionality Reduction in Machine Learning using PCA
By Priyal Walpita
When you are pre-processing your machine learning data sets, you may need to deal with a huge number of dimensions. Processing all of these dimensions is a resource-hungry task for your machine learning models. But can we simply get rid of the extra dimensions? We cannot just drop them from our data, because that would also remove the effect of those dimensions from our analysis. The answer to this problem is Principal Component Analysis (PCA). What PCA does is reduce the number of dimensions in your data set by projecting the data onto a lower-dimensional subspace. Confused? :) Let's see how we can do it in this post.
By the end of this article you will know how to implement PCA yourself and use it to reduce the dimensionality of your data.
Note: This article is based on the teachings of Dr. Andrew Ng @ Deep Learning
Data Pre-Processing
The pre-processing to be performed is mean normalization, optionally followed by feature scaling. We do this for the unlabeled data x(1) through x(m). For mean normalization, we first calculate the mean of each feature and then subtract that mean from every value of the feature, which makes the mean of each feature exactly zero. Then, if the features are on very different ranges of values, each feature should also be scaled so that they end up in comparable value ranges.
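As a concrete sketch of this pre-processing in Octave (assuming, purely for illustration, that the examples are stored as the rows of an m x n matrix X):

```octave
% Minimal pre-processing sketch: mean normalization plus optional feature scaling
[m, n] = size(X);                     % m examples, n features
mu = mean(X);                         % 1 x n vector of per-feature means
X_norm = X - repmat(mu, m, 1);        % subtract the mean: every feature now has zero mean
s = std(X_norm);                      % per-feature standard deviations
X_norm = X_norm ./ repmat(s, m, 1);   % optional: scale features to comparable value ranges
```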
Principal Component Analysis — PCA
What does the PCA algorithm do?
It finds a low-dimensional subspace onto which to project the data so as to minimize the squared projection error, that is, the squared distance between each point and its projection.
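Written out, the quantity PCA minimizes is the average squared distance between each example x(i) and its projection x(i)approx onto the subspace; in the notation of Dr. Ng's course this is:

```latex
\min \; \frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - x^{(i)}_{\mathrm{approx}} \right\|^{2}
```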
PCA Intuition
In Figure 1 we are given examples x(i) in R2, and what we are going to do is find a set of numbers z(i) in R with which to represent our data. That is what a reduction from 2D to 1D means: by projecting onto the red line, we need only one number to specify the position of each point on the line. We will call that number z1, so each x(i) on the line is represented by its value z(i). What PCA therefore has to do is give us a way to compute two things: one is the vector u1 (and, in the bottom picture, the vectors u1 and u2), and the other is the numbers z. This is how we reduce the data from 2D to 1D.
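To make the 2D-to-1D case concrete, here is a tiny Octave example with made-up numbers; the direction u1 below is just an illustrative unit vector, not one computed by PCA:

```octave
X  = [2 3; 4 5; 6 7];      % three examples x(i) in R^2, one per row
u1 = [1; 1] / sqrt(2);     % a unit vector defining the 1-D subspace (the "red line")
z  = X * u1                % one number z(i) per example: its position along the line
```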
In Figure 2 we reduce data from three dimensions, in R3, to vectors z(i) that are now two dimensional. These z vectors have two components, z1 and z2, and we need a way to compute these new representations of the data as well. But how do we calculate all of these quantities? There is a mathematical derivation that gives the right values for u1, u2, z1 and z2.
Procedure
To compute all of these quantities, the vectors u1, u2 and the vector z, we use the following procedure. Suppose we want to reduce the data from dimension n to dimension k. The first thing we do is compute something called the covariance matrix, which is commonly denoted by the capital Greek letter sigma (Σ).
Next, how do we compute this matrix? Let's say we want to store it in an Octave variable called Sigma. What we need are the eigenvectors of the matrix Sigma, and in Octave you get them with the command [U, S, V] = svd(Sigma), where SVD stands for singular value decomposition.
You can also use another Octave function (eig), which computes the same vectors for a covariance matrix, although svd is a little more numerically stable. The same commands work whether you implement this in Octave or in Matlab. If you implement it in a different language, you should find a numerical linear algebra library that can compute the SVD, and apply it to the covariance matrix Sigma. This covariance matrix Sigma is an n by n matrix. One way to see this is from its definition: each x(i) is an n by 1 vector, its transpose is 1 by n, and the product of the two is an n by n matrix. The SVD outputs three matrices, U, S and V, and the U matrix is the one you really need. U is also an n by n matrix, and its columns are exactly the vectors u1, u2 and so on.
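In Octave, with X again holding the mean-normalized examples as rows (my assumption, not fixed by the course), these two steps look like this:

```octave
Sigma = (1 / m) * (X' * X);   % n x n covariance matrix: (1/m) * sum over i of x(i) * x(i)'
[U, S, V] = svd(Sigma);       % columns of U are the vectors u1, u2, ..., un
```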
If you want to reduce the data from n dimensions down to k dimensions, you take the first k vectors, u1 through uk; these give the k directions onto which we want to project the data.
Since U is an n x n matrix, taking its first k columns gives an n x k matrix, which we call the Ureduce matrix.
Thereafter we use this matrix to reduce the data. Ureduce is an n x k matrix and x is an n x 1 vector, so z = Ureduce' * x is a k x 1 vector, which is the dimensionally reduced data we want to get.
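A short Octave sketch of this projection step, where x is a single n x 1 example:

```octave
Ureduce = U(:, 1:k);    % n x k: the first k columns (eigenvectors) of U
z = Ureduce' * x;       % k x 1: the reduced representation of the example x
```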
Principal Component Analysis (PCA) algorithm summary
Finally, to summarize the PCA algorithm: first perform mean normalization, to ensure that every feature has zero mean, and optionally feature scaling, which you should really do if the features are on very different ranges of values. After this pre-processing we compute the covariance matrix Sigma. For example, if your data matrix X holds the examples as rows, x(1) transpose down to x(m) transpose, then the covariance matrix has a nice vectorized implementation: in Octave you can simply run Sigma = (1/m) * X' * X. This single expression computes the matrix Sigma, and you can check in Octave that it gives the same answer as summing the outer products of the examples one at a time.
Then we apply the SVD routine to get U, S and V, take the first k columns of the U matrix to form Ureduce, and finally compute z = Ureduce' * x, which defines how we go from a feature vector x to its reduced-dimension representation z. Note that PCA is applied to the vectors x in Rn, without the extra convention x0 = 1. So, this was the PCA algorithm.
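Putting the whole pipeline together, a minimal Octave sketch might look like the following; the function name pca_reduce and the row-per-example layout of X are my own assumptions for illustration:

```octave
function Z = pca_reduce(X, k)
  % Minimal PCA sketch: mean normalization, covariance matrix, SVD, projection
  [m, n] = size(X);
  mu = mean(X);
  X_norm = X - repmat(mu, m, 1);         % mean normalization
  Sigma = (1 / m) * (X_norm' * X_norm);  % n x n covariance matrix
  [U, S, V] = svd(Sigma);                % singular value decomposition
  Ureduce = U(:, 1:k);                   % first k eigenvectors
  Z = X_norm * Ureduce;                  % m x k matrix of reduced examples
end
```

When the examples are stacked as rows of X, projecting them all at once is Z = X_norm * Ureduce, which is just the row-wise form of z = Ureduce' * x for each example.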
The PCA algorithm can be implemented in not too many lines of code, and if you implement it in Octave or Matlab you get a very effective dimensionality reduction algorithm.