I would love to know.
Share your model and score in the comments below and I will try to reproduce it and update the post (and give you full credit!)Let’s dive in.
Classification is a predictive modeling problem that predicts one label given one or more input variables.
The baseline model for classification tasks is a model that predicts the majority label.
This can be achieved in scikit-learn using the DummyClassifier class with the ‘most_frequent‘ strategy; for example:The standard evaluation for classification models is classification accuracy, although this is not ideal for imbalanced and some multi-class problems.
Nevertheless, for better or worse, this score will be used (for now).
Accuracy is reported as a fraction between 0 (0% or no skill) and 1 (100% or perfect skill).
There are two main types of classification tasks: binary and multi-class classification, divided based on the number of labels to be predicted for a given dataset as two or more than two respectively.
Given the prevalence of classification tasks in machine learning, we will treat these two subtypes of classification problems separately.
In this section, we will review the baseline and good performance on the following binary classification predictive modeling datasets:The complete code example for achieving baseline and a good result on this dataset is listed below.
Running the example, you should see the following results.
The complete code example for achieving baseline and a good result on this dataset is listed below.
Running the example, you should see the following results.
Note: you may see some warnings, but they can be safely ignored.
The complete code example for achieving baseline and a good result on this dataset is listed below.
Running the example, you should see the following results.
The complete code example for achieving baseline and a good result on this dataset is listed below.
Running the example, you should see the following results.
Note: you may see some warnings, but they can be safely ignored.
The complete code example for achieving baseline and a good result on this dataset is listed below.
Running the example, you should see the following results.
In this section, we will review the baseline and good performance on the following multiclass classification predictive modeling datasets:The complete code example for achieving baseline and a good result on this dataset is listed below.
Running the example, you should see the following results.
The complete code example for achieving baseline and a good result on this dataset is listed below.
Running the example, you should see the following results.
The complete code example for achieving baseline and a good result on this dataset is listed below.
Running the example, you should see the following results.
The complete code example for achieving baseline and a good result on this dataset is listed below.
Running the example, you should see the following results.
Regression is a predictive modeling problem that predicts a numerical value given one or more input variables.
The baseline model for classification tasks is a model that predicts the mean or median value.
This can be achieved in scikit-learn using the DummyRegressor class using the ‘median‘ strategy; for example:The standard evaluation for regression models is mean absolute error (MAE), although this is not ideal for all regression problems.
Nevertheless, for better or worse, this score will be used (for now).
MAE is reported as an error score between 0 (perfect skill) and a very large number or infinity (no skill).
In this section, we will review the baseline and good performance on the following regression predictive modeling datasets:The complete code example for achieving baseline and a good result on this dataset is listed below.
Running the example, you should see the following results.
The complete code example for achieving baseline and a good result on this dataset is listed below.
Running the example, you should see the following results.
The complete code example for achieving baseline and a good result on this dataset is listed below.
Running the example, you should see the following results.
The complete code example for achieving baseline and a good result on this dataset is listed below.
Running the example, you should see the following results.
This section provides more resources on the topic if you are looking to go deeper.
In this post, you discovered standard machine learning datasets for classification and regression and the baseline and good results that one may expect to achieve on each.
Specifically, you learned:Did I miss your favorite dataset?.Let me know in the comments and I will calculate a score for it, or perhaps even add it to this post.
Can you get a better score for a dataset?.I would love to know; share your model and score in the comments below and I will try to reproduce it and update the post (and give you full credit!)Do you have any questions?.Ask your questions in the comments below and I will do my best to answer.
with just a few lines of scikit-learn codeLearn how in my new Ebook: Machine Learning Mastery With PythonCovers self-study tutorials and end-to-end projects like: Loading data, visualization, modeling, tuning, and much more.
Skip the Academics.
Just Results.
.