Random projection

We say we want to choose vectors uniformly, but that isn’t actually possible.

What we actually mean is we want to pick vectors so that their directions are uniform.

That’s possible, and we only need to work with unit vectors anyway.

The way to generate uniform random samples of unit vectors in n dimensions is to first choose a standard normal random value for each coordinate, then normalize so the vector has length 1.

Suppose we do as we discussed above.

We work in 20,000 dimensions and choose 10,000 random vectors.

Call their span S.

Then we choose another random vector v.

What angle does v make with S?.You can’t simply project v onto each of the basis vectors for S because even though they are nearly orthogonal, they’re not exactly orthogonal.

It turns out that it doesn’t matter what the vectors are that defined S.

All that matters is that S is almost certainly a 10,000 dimensional subspace.

By symmetry it doesn’t matter which subspace it is, so we’ll assume for convenience that it’s the subspace spanned by the first 10,000 unit vectors.

The angle that v makes with S is the angle between v and its projection w onto S, and w is simply the vector you get by zeroing out the last 10,000 components of v.

The cosine of the angle between two unit vectors is their dot product.

Our vector v is a unit vector, but its projection w is not.

So the cosine of the angle iscos θ = v·w/||w||Now v·w is the sum of the squares of the first half of v‘s components.

This is probably very close to 1/2 since v is a unit vector.

This is also w·w, and so ||w|| is probably very close to the square root of 1/2.

And so cos θ is very nearly √2/2, which means θ = 45°.

So in summary, if you generate 10,000 vectors in 20,000 dimensions, then find the angle of a new random vector with the span of the previous vectors, it’s very likely to be very near 45°.

Related postsNearly all the volume is in a thin shellNearly all the area is near the equator.

. More details

Leave a Reply