Step by Step Scikit

By Ankit Bhadoriya, Mar 24

Python is fascinating.

It has an enormous list of application areas, and machine learning is one of them.

The answer to machine learning in Python is its very own ML library, Scikit-learn.

It works with other Python libraries like NumPy, SciPy and Matplotlib.

It has an open-source BSD license, a stable list of expert contributors, and tools for most machine learning tasks, which makes it an easy pick.

But for someone just beginning, Scikit-learn gets really tough when there is no clear procedure to follow, right? So here's your guide to tackle this problem and get your first shot at the bull's eye.

Because “ain't no problem that can't be solved.”

A machine learning problem might not always be a linear one.

But you can stick to a few steps and find your way through most of them.

While using Scikit-learn, you can follow these steps:

1. Install Scikit-learn with NumPy, SciPy and Matplotlib (if you haven't already).
2. Load the dataset.
3. Summarize the dataset.
4. Visualize the dataset.
5. Evaluate multiple algorithms on the problem.
6. Select the best-fitting algorithm by analyzing the accuracy of each.
7. Make some predictions.

Step 1. A) Install Scikit-learn with NumPy, SciPy and Matplotlib

You can install Scikit-learn from the command line using pip: pip install -U scikit-learn

Or, if you have a Conda distribution, you can use: conda install scikit-learn

Step 1. B) Start Python and Check Versions

Start your Python environment and check its version. It can be Python 2 or Python 3, although Scikit-learn 0.21 and later support only Python 3.

Or, you can use Jupyter Notebook.
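A quick way to do the version check is sketched below. It is only an illustration: the helper name installed_versions is made up for this example, and it relies on importlib.metadata, which assumes Python 3.8 or newer.

```python
import sys
from importlib import metadata

# Distribution names as published on PyPI.
PACKAGES = ["numpy", "scipy", "matplotlib", "scikit-learn"]


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print("Python:", sys.version.split()[0])
for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'not installed'}")
```

If any line prints "not installed", go back to the install step before moving on.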

Step 2. Load the Data

Here Scikit-learn offers a lot of functionality, as a number of datasets already come with the library.

You can use one of these datasets if it relates to your problem. Some of the datasets that Scikit-learn provides are:

1. Boston house prices dataset (removed in Scikit-learn 1.2)
2. Iris plants dataset
3. Diabetes dataset
4. Optical recognition of handwritten digits dataset
5. Linnerrud dataset
6. Wine recognition dataset
7. Breast cancer Wisconsin (diagnostic) dataset

Real world datasets:

1. The Olivetti faces dataset
2. The 20 newsgroups text dataset

Or you can import external datasets too.
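As a minimal sketch, here is one way to load a bundled dataset. It assumes the Iris plants dataset and pandas; the variable name dataset and the column name 'class' are choices made for this example so that the summary calls used later in this guide apply to it.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# Assumption: wrap the arrays in a pandas DataFrame named `dataset`
# and add a 'class' column holding the species name of each row.
dataset = pd.DataFrame(iris.data, columns=iris.feature_names)
dataset["class"] = iris.target_names[iris.target]

print(dataset.head())
```

For an external dataset, something like pd.read_csv on your own file would play the same role as load_iris here.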

Step 3. Summarize the Dataset

Once you load the dataset, take a quick look at it and judge it on a few different factors. (In the code below, dataset is a pandas DataFrame whose label column is named 'class'.)

1. The dataset dimensions: use the shape property to get an idea of how many instances (rows) and how many attributes (columns) the data contains.

Code:
# shape
print(dataset.shape)

2. Statistical summary of the attributes: observe the count, mean, min and max values, as well as some percentiles.

Code:
# descriptions
print(dataset.describe())

3. Breakdown of the data by the class variable: take a look at the number of instances (rows) that belong to each class. It can be viewed as an absolute count.

Code:
# class distribution
print(dataset.groupby('class').size())

You can also add a “Data Visualization” step and examine the data visually by plotting graphs and histograms.

These can be univariate or multivariate plots.
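Here is a rough sketch of what such plots could look like, assuming the Iris dataset, pandas and Matplotlib. The name features is made up for this example, and the Agg backend is selected only so the script also runs on a machine without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless backend; remove this line to open plot windows

import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import scatter_matrix
from sklearn.datasets import load_iris

iris = load_iris()
features = pd.DataFrame(iris.data, columns=iris.feature_names)

# Univariate plots: one box-and-whisker plot per attribute.
features.plot(kind="box", subplots=True, layout=(2, 2), sharex=False, sharey=False)

# Univariate plots: one histogram per attribute.
hist_axes = features.hist()

# Multivariate plot: pairwise scatter plots of all attributes.
scatter_axes = scatter_matrix(features)

# In an interactive session, call plt.show() here instead of closing the figures.
plt.close("all")
```

Box plots and histograms show how each attribute is distributed on its own, while the scatter matrix hints at relationships between pairs of attributes.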

Step 4. Evaluate Algorithms

Now this is a crucial step that we would love not to mess up.

So, to implement it effectively, we will follow this procedure.

1. Separate the validation dataset.

We will split the loaded dataset in two: suppose we use 80% of it to train our models, while we hold back 20% as a validation dataset.
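The 80/20 split can be sketched with train_test_split. This example assumes the Iris dataset; the seed value 7 is arbitrary, and the variable names are chosen to match the prediction code at the end of this guide.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, Y = load_iris(return_X_y=True)

# Hold back 20% of the rows for validation.
# random_state is an arbitrary seed (assumption) so the split is repeatable.
X_train, X_validation, Y_train, Y_validation = train_test_split(
    X, Y, test_size=0.20, random_state=7
)

print("training rows:", X_train.shape[0])
print("validation rows:", X_validation.shape[0])
```

The validation rows are not touched again until the very end, when the chosen model is checked against them.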

2. Set up the test harness.

We will use 10-fold cross-validation to estimate accuracy.

That is, we split our dataset into 10 parts, train on 9 and test on 1, and repeat for all combinations of train-test splits.
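To see what 10-fold splitting does, here is a small sketch using KFold on a stand-in array. The 120-row array is hypothetical (it mimics an 80% training portion of a 150-row dataset), and the seed is arbitrary.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical stand-in for a training set with 120 rows.
X_train = np.arange(120).reshape(-1, 1)

# shuffle and random_state are assumptions made for a repeatable example.
kfold = KFold(n_splits=10, shuffle=True, random_state=7)

fold_sizes = [
    (len(train_idx), len(test_idx)) for train_idx, test_idx in kfold.split(X_train)
]
for i, (n_train, n_test) in enumerate(fold_sizes, start=1):
    print(f"fold {i}: train on {n_train} rows, test on {n_test} rows")
```

Every row ends up in the test part exactly once, so the averaged score uses all of the training data.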

3. Build different models.

Why? To get as accurate a prediction as possible, you can try a combination of linear models (LR and LDA) and nonlinear models (the rest):

· Logistic Regression (LR)
· Linear Discriminant Analysis (LDA)
· K-Nearest Neighbors (KNN)
· Classification and Regression Trees (CART)
· Gaussian Naive Bayes (NB)
· Support Vector Machines (SVM)

In Scikit-learn, we do this as follows:

from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Spot Check Algorithms
models = []
models.append(('LR', LogisticRegression()))
models.append(('LDA', LinearDiscriminantAnalysis()))
models.append(('KNN', KNeighborsClassifier()))
models.append(('CART', DecisionTreeClassifier()))
models.append(('NB', GaussianNB()))
models.append(('SVM', SVC()))

Once you are done with this, you can write the code that computes the cross-validation score of each model to get the results and the accuracy percentage of each.
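One possible way to score the models is sketched below with cross_val_score. It assumes the Iris dataset and an 80/20 split; the seeds, the max_iter value for logistic regression and the names results and best_name are choices made for this example, and the scores you see will depend on the dataset and the split.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, Y = load_iris(return_X_y=True)
X_train, X_validation, Y_train, Y_validation = train_test_split(
    X, Y, test_size=0.20, random_state=7
)

models = [
    # max_iter is raised (assumption) so the solver converges without warnings.
    ("LR", LogisticRegression(max_iter=1000)),
    ("LDA", LinearDiscriminantAnalysis()),
    ("KNN", KNeighborsClassifier()),
    ("CART", DecisionTreeClassifier(random_state=7)),
    ("NB", GaussianNB()),
    ("SVM", SVC()),
]

# Score every model with the same 10-fold split of the training data.
results = {}
for name, model in models:
    kfold = KFold(n_splits=10, shuffle=True, random_state=7)
    scores = cross_val_score(model, X_train, Y_train, cv=kfold, scoring="accuracy")
    results[name] = scores
    print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")

best_name = max(results, key=lambda name: results[name].mean())
print("Best mean accuracy on this split:", best_name)
```

Only the training portion is used here; the validation rows stay untouched until the final check.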

4. Select the best model.

Based on the accuracy, choose the model (the ML algorithm to be used). Once you know which algorithm worked best, you can start making predictions.

According to the figure given above, KNN was the most accurate algorithm for creating the model.

Finally, we validate the result by running that individual algorithm on the validation dataset, and that's how you complete your first Scikit-learn based machine learning prediction.

Code to apply the individual algorithm (here KNN):

from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.neighbors import KNeighborsClassifier

# Make predictions on validation dataset
knn = KNeighborsClassifier()
knn.fit(X_train, Y_train)
predictions = knn.predict(X_validation)
print(accuracy_score(Y_validation, predictions))
print(confusion_matrix(Y_validation, predictions))
print(classification_report(Y_validation, predictions))

And here we are getting about 90% accuracy with our prediction.

Before enjoying our little victory over this Scikit-learn task, there's this dude called TensorFlow we should talk about.

So, “TensorFlow is an alternative to Scikit-learn.”

TensorFlow: Rather than referring to it as a machine learning library like Scikit-learn, it can be described as “an open source math library for numerical computation using data flow graphs.”

In these graphs, nodes represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them.
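To make the idea of a data flow graph concrete, here is a toy sketch written with NumPy only. It is not TensorFlow's API: the Node class and the constant helper are invented for this illustration.

```python
import numpy as np


class Node:
    """A graph node: an operation plus the nodes whose outputs feed into it."""

    def __init__(self, op, *inputs):
        self.op = op
        self.inputs = inputs

    def evaluate(self):
        # Each edge carries the array produced by an upstream node.
        return self.op(*(node.evaluate() for node in self.inputs))


def constant(value):
    """A node with no inputs that always produces the same array."""
    array = np.asarray(value)
    return Node(lambda: array)


a = constant([[1.0, 2.0], [3.0, 4.0]])
b = constant([[1.0, 0.0], [0.0, 1.0]])
product = Node(np.matmul, a, b)   # node: matrix multiplication
total = Node(np.add, product, a)  # node: element-wise addition

result = total.evaluate()
print(result)
```

Nothing is computed while the graph is being built; the arrays only flow along the edges when evaluate is called on the final node.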

TensorFlow has advantages over Scikit-learn in some areas, which can be seen in the number of people using it for machine learning applications, a number that is huge in comparison to the Scikit-learn community.

There are many reasons for TensorFlow to be the better option, as it serves as a complete environment for an AI developer.

We will discuss these in detail in future blogs.
