## Making sense of Statistical tests

As I mentioned in an earlier post, I stumbled upon this marvel of a book, “The Geometry of Multivariate Statistics” by Thomas D. Wickens. It contains the best explanation of Statistical tests I have come across; in fact, the only explanation I know of that doesn’t just hand you a formula in a box but really makes sense.

I liked it so much that I couldn’t take the chance of forgetting it; so here I am, blogging about it.

Regression consists of splitting an observed vector $\vec{y}$ into a systematic component $\vec{\hat{y}}$ and a random component $\vec{\epsilon}$.
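This decomposition is easy to see numerically. Here is a minimal numpy sketch (the data and variable names are invented for illustration): least squares projects $\vec{y}$ onto the column space of the predictors, and the two components come out orthogonal, so their squared lengths add up by Pythagoras.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a known linear signal plus noise (purely illustrative).
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one predictor
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

# Least squares projects y onto the column space of X.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta   # systematic component
eps = y - y_hat    # random component

# The decomposition is orthogonal: y_hat is perpendicular to eps,
# so the squared lengths add up (Pythagoras).
print(np.dot(y_hat, eps))                              # ~0 up to floating point
print(np.allclose(y @ y, y_hat @ y_hat + eps @ eps))   # True
```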

The null hypothesis is that all the regression coefficients are zero. If it is true, then the model is simply $\vec{y}=\vec{\epsilon}$.

If the null hypothesis is false, then the systematic component will be longer than the error and closer to the observed vector. So the angle between the observed and predicted vectors can form one measure of the effect. Another could be a comparison of the length of the predicted vector with that of the error vector.
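The angle measure can be made concrete. Because $\vec{\hat{y}} \perp \vec{\epsilon}$, the cosine of the angle between $\vec{y}$ and $\vec{\hat{y}}$ is just $|\vec{\hat{y}}|/|\vec{y}|$: the two measures are really one. A quick numpy check, using regression through the origin to keep the picture simple (the data here is made up):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative regression through the origin, so y, y_hat, eps sit in
# one picture without centering subtleties.
n = 40
x = rng.normal(size=n)
y = 1.5 * x + rng.normal(size=n)

beta = (x @ y) / (x @ x)   # least-squares slope (no intercept)
y_hat = beta * x
eps = y - y_hat

# Since y_hat is perpendicular to eps, y . y_hat = |y_hat|^2, so the
# angle between y and y_hat satisfies cos(theta) = |y_hat| / |y|.
cos_theta = (y @ y_hat) / (np.linalg.norm(y) * np.linalg.norm(y_hat))
print(np.isclose(cos_theta, np.linalg.norm(y_hat) / np.linalg.norm(y)))  # True
```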

Now the thing is, the predicted vector, the observed vector and the error vector all lie in subspaces of different dimensions. How does one compare the lengths of vectors of different dimensions? Of course you can compare them, but it wouldn’t make a whole lot of sense. What do we do then?

Consider a d-dimensional random vector $\vec{\epsilon}$.

$|\vec{\epsilon}|^2=\epsilon_1^2+\epsilon_2^2+\ldots+\epsilon_d^2 \Rightarrow E(|\vec{\epsilon}|^2)=\sum\limits_{i=1}^{d}{E(\epsilon_i^2)}$

For a random vector such as we are talking about, the component along each axis has the same expected squared length, say $\sigma^2$. Hence the expected squared length of the whole vector is proportional to the dimension of the space: $E(|\vec{\epsilon}|^2)=d\sigma^2$. That is why it makes sense to divide the squared length of a random vector by the dimension of the space it is in, to get a measure of what the author calls “per-dimension squared length”. And that is where and why the degrees of freedom in the F statistic come from.
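A quick Monte Carlo check of the proportionality (a sketch, with standard-normal components so $\sigma^2 = 1$): the mean squared length grows with $d$, but the per-dimension squared length stays near 1 for every dimension.

```python
import numpy as np

rng = np.random.default_rng(2)

# For a standard-normal random vector in d dimensions, E(|eps|^2) = d,
# so the per-dimension squared length should hover around 1.
ratios = []
for d in (2, 10, 100):
    eps = rng.normal(size=(20000, d))            # 20000 draws of a d-dim vector
    mean_sq_len = np.mean(np.sum(eps**2, axis=1))
    ratios.append(mean_sq_len / d)
    print(d, ratios[-1])                         # each ratio is ≈ 1
```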

When the predicted vector is much longer than the error vector, the F statistic is much greater than 1, which is the case when we reject the null hypothesis. Otherwise it is close to 1, in which case, sadly, we fail to reject the null hypothesis.
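To watch the F statistic behave this way, here is a small simulation. This is my sketch of the usual overall-regression F (model with an intercept plus one predictor; all names and data are invented): the ratio of per-dimension squared lengths sits near 1 when the null is true and explodes when there is a real effect.

```python
import numpy as np

rng = np.random.default_rng(3)

def f_stat(X, y):
    """Ratio of per-dimension squared lengths of the model and error parts.

    Sketch of the standard overall-regression F statistic; assumes the
    first column of X is an intercept.
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta
    eps = y - y_hat
    # Remove the mean: the effect lives in the p-1 dims beyond the intercept.
    model_sq = np.sum((y_hat - y.mean()) ** 2)
    error_sq = np.sum(eps**2)
    return (model_sq / (p - 1)) / (error_sq / (n - p))

n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])

y_alt = X @ np.array([0.0, 2.0]) + rng.normal(size=n)  # real effect
print(f_stat(X, y_alt))                                # much greater than 1

# Under the null, y is pure noise; averaged over many draws, F is ~1.
null_fs = [f_stat(X, rng.normal(size=n)) for _ in range(2000)]
print(np.mean(null_fs))                                # close to 1
```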

A lot of explanation follows this in the book, much too long for me to blog in its entirety. But that makes me think about something else that has been on my mind for a while now: what is the best way for me to store the things I learn from reading and listening? It’s not organized enough to be taken down in the form of notes, and besides, papers are easily lost. A bigger problem is that of accessing it when needed: sequential storage makes access almost impossible, and haphazard storage is even worse. As I think about this problem, I realize that the most efficient way of storing things is the way my mind does it: making connections upon connections upon connections which feed themselves. How interesting it would be to replicate this mode of information storage and retrieval in a computer!