Principal Component Analysis
Overview
Principal Component Analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of linearly uncorrelated variables called Principal Components.
In quantitative finance, PCA can be applied directly to the risk management of interest rate derivative portfolios. It helps reduce the complexity of swap trading from a function of 30-500 market instruments to, usually, just 3 or 4 components, which can represent the interest rate paths on a macro basis.
Implementation
The PCA with N components of an m-by-n matrix A (m observations of n variables) is computed by the following process:
- Calculate the n-by-n covariance matrix of A.
- Solve the covariance matrix for its n eigen-vectors (the columns of the n-by-n matrix \(V\)) and its n eigen-values (\(D\)).
- Sort the eigen-values from largest to smallest, then select the top \(N\) eigen-values and their corresponding eigen-vectors.
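The three steps above can be sketched with NumPy. This is a minimal illustration, not the library's implementation: the function name `pca_sketch` and the sample data are hypothetical, and it assumes that the rows of A are observations and the columns are variables.

```python
import numpy as np

def pca_sketch(A, N):
    """Return the top-N eigen-values and eigen-vectors of the covariance of A.

    Hypothetical helper for illustration; assumes rows of A are observations
    and columns are variables.
    """
    A = np.asarray(A, dtype=float)
    # Step 1: n-by-n covariance matrix (rowvar=False treats columns as variables).
    cov = np.cov(A, rowvar=False)
    # Step 2: eigen-decomposition. eigh is meant for symmetric matrices and
    # returns the eigen-values in ascending order.
    D, V = np.linalg.eigh(cov)
    # Step 3: sort from largest to smallest and keep the top N.
    order = np.argsort(D)[::-1][:N]
    return D[order], V[:, order]

# Made-up data: 6 observations of 3 variables.
A = np.array([
    [2.5, 2.4, 1.2],
    [0.5, 0.7, 0.3],
    [2.2, 2.9, 1.1],
    [1.9, 2.2, 0.9],
    [3.1, 3.0, 1.6],
    [2.3, 2.7, 1.0],
])
explained_variance, components = pca_sketch(A, 2)
```

Here `explained_variance` has length N and each column of `components` is one selected eigen-vector, so `components` has shape n-by-N.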
Once the process is completed, several outputs are available from the library:
- ExplainedVariance: A vector of length N containing the selected eigen-values, sorted from largest to smallest.
- Components: The N eigen-vectors of the covariance matrix associated with the selected eigen-values.
- LoadingsMatrix: The loadings matrix represents the weights associated with each original variable when calculating the principal components. It can be computed as follows:
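Under the common definition of loadings (an assumption here, since conventions vary between libraries), each selected eigen-vector is scaled by the square root of its eigen-value:

\[
\text{LoadingsMatrix} = \text{Components} \cdot \operatorname{diag}\left(\sqrt{\text{ExplainedVariance}}\right)
\]

With this scaling, column \(k\) of the loadings matrix is the \(k\)-th eigen-vector multiplied by \(\sqrt{\lambda_k}\), where \(\lambda_k\) is the \(k\)-th entry of ExplainedVariance.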
Note
Because the sign of an eigen-vector is arbitrary and implementation dependent, the computed loadings matrix could come out with inverted signs in a non-deterministic way. To avoid that, we use the same convention as MATLAB: the first element of each eigen-vector must be positive; otherwise, the whole vector is multiplied by \(-1\).
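The sign convention described in this note can be sketched as follows. The helper name `fix_signs` and the example matrix are hypothetical, and the sketch assumes the eigen-vectors are stored as the columns of the matrix.

```python
import numpy as np

def fix_signs(V):
    """Flip every column of V whose first element is negative.

    Hypothetical helper; assumes eigen-vectors are the columns of V.
    Columns whose first element is exactly zero are left unchanged.
    """
    V = np.array(V, dtype=float)  # work on a copy
    # One multiplier per column: -1 where the first row is negative, else +1.
    signs = np.where(V[0, :] < 0, -1.0, 1.0)
    # Broadcasting applies each multiplier to its whole column.
    return V * signs

# Made-up example: the first column starts with a negative entry.
V = np.array([[-0.6,  0.8],
              [-0.8, -0.6]])
V_fixed = fix_signs(V)
```

Flipping a whole eigen-vector keeps it an eigen-vector for the same eigen-value, so the normalisation only makes the output reproducible.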
Below is a diagram of the internal implementation of PCA: