Easy K-Means Clustering with C# and ML.NET

Easy K-Means Clustering with C# and ML.

NETMark FarragherBlockedUnblockFollowFollowingMar 28Building machine learning apps in C# has never been easier!ML.

NET is Microsoft’s new machine learning library.

It can run linear regression, logistic classification, clustering, deep learning, and many other machine learning algorithms.

And it’s super easy to use too.

Watch this, I’m going to build a C# app that can automatically classify flowers based on their sepal and petal characteristics.

The first thing I need is a data file to train on.

I’ll use the famous Iris Flower Dataset for this.

The Iris flower data set or Fisher’s Iris data set is a multivariate data set introduced by the British statistician and biologist Ronald Fisher in 1936.

The data set consists of 50 samples each of three species of flowers: Iris setosa, Iris virginica and Iris versicolor.

Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters.

The training data file looks like this:There are five columns in the data file:The length of the Sepal in centimetersThe width of the Sepal in centimetersThe length of the Petal in centimetersThe width of the Petal in centimetersThe type of Iris flowerI will build a machine learning model that reads in the sepal and petal dimensions, and then predicts the corresponding type of Iris flower.

However, I am going to be using K-Means Clustering.

This means I will not train a machine learning model on the labelled data.

Instead, I’ll ignore the labels completely and build a model that will attempt to figure out the flower types on its own.

My model will look for patterns in the sepal and petal dimensions and attempt to group all flowers into three distinct clusters that will hopefully correspond to the three flower types.

This is called unsupervised learning.

Here’s how to set up the project in NET Core:$ dotnet new console -o Clustering$ cd ClusteringNext, I need to install the ML.

NET package:$ dotnet add package Microsoft.

ML –version 0.

10.

0Now I’m ready to add some classes.

I’ll need one to hold a bug report, and one to hold my model’s predictions.

I will modify the Program.

cs file like this:The IrisData class holds one single Iris flower measurement.

Note how each field is adorned with a Column attribute that tell the CSV data loading code which column to import data from.

I’m also declaring a ClusterPrediction class which will hold a single cluster prediction.

Now I’m going to load the training data in memory:This code calls ReadFromTextFile<…>(…) to load the entire CSV data file in memory.

Now I’m ready to start building the machine learning model:Machine learning models in ML.

NET are built with pipelines, which are sequences of data-loading, transformation, and learning components.

My pipeline is super-simple and only has two components:Concatenate which combines all input data columns into a single column called Features.

This is a required step because ML.

NET can only train on a single input column.

A KMeans clustering learner which will attempt to find 3 distinct clusters in the dataset.

With the pipeline fully assembled, I can train the model with a call to Fit(…).

I now have a fully- trained model.

So let’s wrap this up and use the trained model to make a prediction.

I’m going to input an Iris flower with sepals of 3.

3 by 1.

6 centimeters, and petals of 0.

2 by 5.

1 centimeters.

What will the model make of it?Here’s how to make the prediction:I use the CreatePredictionEngine<…>(…) method to set up a prediction engine.

The two type arguments are the input data class and the class to hold the prediction.

And once my prediction engine is set up, I can simply call Predict(…) to make a single prediction.

Here’s the code running in the Visual Studio Code debugger on my Mac:The output is a bit small, so here’s the app again running in a zsh shell:The model has placed this flower in cluster number #1.

For each cluster, the model calculates the center point or centroid.

You can imagine this being the ‘perfect’ flower of that specific type.

The reported distances are how far away my test flower is from the three cluster centroids.

I get distances of 31.

39, 39.

11, and 54.

66, which means my test flower is closest to the centroid of cluster #1.

Here’s what that looks like in a graph:The model has identified three distinct clusters of flowers: the green, blue, and red markers in the graph.

The big plus, cross, and circle are the three centroids of each cluster.

My test flower is closest to the big red plus sign, which puts it in cluster 1.

So this is an example of unsupervised learning.

I didn’t show any labels to my model during training and it figured out the three flower clusters entirely on its own.

The model has no idea what the type of my test flower is, but it can tell me that it’s in cluster 1.

Unsupervised learning is a great machine learning tool for when we are looking for structure in data and we don’t have any labels.

The model will figure out the labels on its own.

So what do you think?Are you ready to start writing C# machine learning apps with ML.

NET?Add a comment and let me know!.

. More details

Leave a Reply