A Practical Look at Vectors and Your Data

But in order for R² to be called a vector space, you must verify that it follows all the rules; which it does.

In short, there are many sets that follow the rules above so naturally you give it a name.

And now when someone talks about some arbitrary set, you know that if the elements of the set follow the rules of a vector space, then the set must be a vector space.

FieldsAnother note to make is about these scalars with which you can multiply vectors.

Scalars are elements that belong to a set called a field which has the same rules as a vector space, but with the bonus of having a multiplicative inverse as well.

Since a field is a set that has all the rules a vector space has, then a field is also a vector space.

You’ve actually been using fields all along, like the set of real numbers or the set of complex numbers.

Any time you draw a plot on a two-dimensional grid, you’re actually drawing on a two-dimensional field.

SubspacesIn practice, you might think that it’s tedious to check every rule against a set to see if the set is a vector space.

Well, you’re right.

So that’s why it’s far easier to identify a set as a subset of a vector space you already know (like the set of real numbers) and prove that this subset is nonempty and that it’s closed under the same operations as the vector space (i.

e.

under addition and scalar multiplication).

If you’re able to do this, then your subset is called a subspace which also happens to be a vector space in and of itself (again, to see for yourself, take a subspace and verify that it has all the properties of a vector space).

So if you want to prove that a set is a vector space, try to prove that it’s a subspace instead.

Since subspaces are vector spaces in their own right, you’ll have successfully shown that a set is a vector space.

Real-World DataNow all this theory is not very helpful if you can’t apply it.

So consider the first few rows of the classic Iris dataset, which is a dataset containing samples of three different species of the Iris flower.

Some rows of the Iris datasetAs you can see from the header, the features of each sample are:sepal lengthsepal widthpetal lengthpetal widthEach row of features can be viewed as an ordered list.

The first list would be (5.

1, 3.

5, 1.

4, 0.

2), the second (4.

9 , 3, 1.

4, 0.

2), and so on.

But are these ordered lists vectors?.Well, each entry in the lists is a real number and thus belongs to the set of real numbers, R.

And since each of these ordered lists are four-dimensional, then they live in R⁴.

Since R⁴ is a vector space, then these ordered lists can be called vectors.

Notice that each entry in these vectors represents a dimension in R⁴ where each dimension corresponds to a feature in the dataset (like sepal length, sepal width, etc…).

That is, each feature in your data can be considered as random variables.

And since they’re random variables, we can do some descriptive statistics like find their means and standard deviations.

As you’ve already represented each row in the table as a vector, you can calculate the means and standard deviations of each random variable in one fell swoop.

This is the power of vectors.

Pretend that the rows/vectors (5.

1, 3.

5, 1.

4, 0.

2) and (4.

9 , 3, 1.

4, 0.

2) were all the data you had in the table above.

Using two of the vector space rules, scalar multiplication and addition, we can easily calculate the means of every random variable we have.

Scaling and adding vectors to find the meanSo the mean of sepal length is 5, the mean of sepal width is 3.

25, and so on.

Representing your data as a set of vectors is not just aesthetically pleasing to look at, but also more performant in calculations since computers are optimized for computations involving vectors (turns out you can replace a lot of for loops with vector and matrix operations instead).

ConclusionIn the end, because you can represent your data as vectors which belong to a set with special rules called a vector space, many of the awesome things you do with your data like: linear feature transformations, standardizations, and dimensionality-reduction techniques can be justified by the rules of vector spaces.

And not only are vector spaces foundational to any work involving data, it constitutes the bedrock of linear algebra which is central to almost all areas of mathematics.

Even if the phenomenon you’re studying is nonlinear, linear algebra is the go-to tool to use as a first-order approximation.

In short, vectors are tightly intertwined with your data and are very useful!.So next time you’re importing data into a dataframe and performing a bunch of operations, remember that your computer is treating your data as a set of vectors and is happily applying transformations in a performant way.

And all thanks to these little guys called vectors.

Originally published at bobbywlindsey.

com on March 5, 2019.

.

. More details

Leave a Reply