Canonical Correlation Analysis

In multiple regression, a single dependent variable is modeled against a set of regressors. Canonical correlation analysis instead models the relationship between two sets of variables, each of which may be multidimensional.

cca

Description

Canonical correlation analysis finds basis vectors for two sets of variables such that the correlation between the projections of the variables onto these basis vectors is mutually maximized.

$$ (a',b') = \underset{a,b}{\operatorname{argmax}} \; \operatorname{corr}(a^T X, b^T Y) $$

A simple scenario illustrates the intuition. Consider a group of subjects for whom we have exercise-related variables such as timed runs, weight lifted in deadlifts, and the number of push-ups and sit-ups. We also have their blood glucose levels, BMI and blood pressure. Canonical correlation analysis lets us relate these two multivariate sets of variables to each other.

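More precisely, we define the canonical variates $(U_{i},V_{j})$ as linear combinations of the original variables $X_1, \dots, X_p$ and $Y_1, \dots, Y_q$: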

$$ \begin{matrix} U_{1} = a_{11}X_{1} + a_{12}X_{2} + \dots + a_{1p}X_p\\ \vdots \\ U_{p} = a_{p1}X_{1} + a_{p2}X_{2} + \dots + a_{pp}X_p \end{matrix} $$

$$ \begin{matrix} V_{1} = b_{11}Y_{1} + b_{12}Y_{2} + \dots + b_{1q}Y_q\\ \vdots \\ V_{p} = b_{p1}Y_{1} + b_{p2}Y_{2} + \dots + b_{pq}Y_q \end{matrix} $$

The canonical correlation to maximize is the following: $$ \rho^*_i = \dfrac{\text{cov}(U_i, V_i)}{\sqrt{\text{var}(U_i) \text{var}(V_i)}} $$

where the covariance between U and V is: $$ \text{cov}(U_i, V_j) = \sum\limits_{k=1}^{p} \sum\limits_{l=1}^{q}a_{ik}b_{jl}\text{cov}(X_k, Y_l) $$

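and the variances are: $$ \text{var}(U_i) = \sum\limits_{k=1}^{p} \sum\limits_{l=1}^{p} a_{ik}a_{il}\text{cov}(X_k, X_l), \qquad \text{var}(V_j) = \sum\limits_{k=1}^{q} \sum\limits_{l=1}^{q} b_{jk}b_{jl}\text{cov}(Y_k, Y_l) $$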
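
The maximization above can be sketched numerically. The following is a minimal illustration in NumPy, not the implementation behind `cca`: it assumes both sample covariance matrices are positive definite, whitens each block with a Cholesky factor, and reads the canonical correlations off a singular value decomposition. The function name `canonical_correlations` and the synthetic data are hypothetical.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Sketch: canonical correlations and coefficients for data matrices
    X (n x p) and Y (n x q), with rows as observations."""
    X = X - X.mean(axis=0)              # center each column
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / (n - 1)             # cov(X_k, X_l)
    Syy = Y.T @ Y / (n - 1)             # cov(Y_k, Y_l)
    Sxy = X.T @ Y / (n - 1)             # cov(X_k, Y_l)

    # Cholesky factors: Sxx = Lx Lx^T, Syy = Ly Ly^T (assumes full rank).
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)

    # M = Lx^{-1} Sxy Ly^{-T}; its singular values are the canonical correlations.
    M = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, M.T).T
    Uw, rho, Vwt = np.linalg.svd(M, full_matrices=False)

    # Map the whitened directions back to coefficients on the original variables.
    A = np.linalg.solve(Lx.T, Uw)       # columns are the vectors a_i
    B = np.linalg.solve(Ly.T, Vwt.T)    # columns are the vectors b_i
    return rho, A, B

# Synthetic example: Y depends linearly on part of X plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = X[:, :2] @ np.array([[1.0, 0.5], [0.2, 1.0]]) + 0.5 * rng.normal(size=(200, 2))

rho, A, B = canonical_correlations(X, Y)
U = (X - X.mean(axis=0)) @ A            # canonical scores for X
V = (Y - Y.mean(axis=0)) @ B            # canonical scores for Y
print(rho)
```

With this scaling each canonical variate has unit sample variance, so the sample correlation between `U[:, i]` and `V[:, i]` equals `rho[i]`.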

Returns

c_corrs: Canonical correlations
dfn: Degrees of freedom numerator
dfd: Degrees of freedom denominator
f stat: F-statistic
f p: Right-tailed p-value for the F-statistic
chisq stat: Chi-square statistic
chisq p: Right-tailed p-value for the chi-square statistic
lr (wilks): Wilks' lambda (likelihood ratio). Proportion of variability not explained by the model. Ranges from 0 to 1.
A: Canonical coefficients of $ \mathrm{X} $
B: Canonical coefficients of $ \mathrm{Y} $
U: Canonical scores for X matrix
V: Canonical scores for Y matrix
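
To make the relationship between the canonical correlations and the test statistics concrete, here is a small sketch using only the Python standard library. It uses the textbook definition of Wilks' lambda as the product of $1 - \rho_i^2$ and Bartlett's chi-square approximation with $pq$ degrees of freedom. Whether `cca` uses exactly these formulas, or reports one value per canonical dimension, is an assumption; the function names below are hypothetical.

```python
import math

def wilks_lambda(c_corrs):
    """Wilks' lambda for the test that all canonical correlations are zero:
    the product of (1 - rho_i^2) over the canonical correlations."""
    return math.prod(1.0 - r * r for r in c_corrs)

def bartlett_chisq(c_corrs, n, p, q):
    """Bartlett's chi-square approximation (assumed form, not necessarily
    the one used by cca). n is the number of observations; p and q are the
    numbers of X and Y variables. Returns (statistic, degrees of freedom)."""
    lam = wilks_lambda(c_corrs)
    stat = -(n - 1 - (p + q + 1) / 2.0) * math.log(lam)
    return stat, p * q

# Illustrative values only, not output from a real dataset.
lam = wilks_lambda([0.8, 0.3])
stat, df = bartlett_chisq([0.8, 0.3], n=100, p=3, q=2)
print(lam, stat, df)
```

A value of lambda near 0 means the canonical variates account for most of the shared variability, while a value near 1 means the two sets are nearly uncorrelated.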
