Python Machine Learning: Scikit-Learn Tutorial


The text that follows is owned by the site referred to above.

This is only a small part of the article; for more, please follow the link.



Karlijn Willems

Machine Learning with Python

Machine learning is a branch of computer science that studies the design of algorithms that can learn.

Typical tasks are concept learning, function learning or “predictive modeling”, clustering and finding predictive patterns. These tasks are learned through available data that were observed through experiences or instructions, for example.

The hope that comes with this discipline is that incorporating experience into its tasks will eventually improve the learning. The ultimate goal is for this improvement to happen in such a way that the learning itself becomes automatic, so that humans like ourselves don’t need to interfere anymore.

There are close ties between this discipline and Knowledge Discovery, Data Mining, Artificial Intelligence (AI) and Statistics. Typical applications can be classified into scientific knowledge discovery and more commercial applications, ranging from the “Robot Scientist” to anti-spam filtering and recommender systems.

But above all, you will know this discipline because it’s one of the topics that you need to master if you want to excel in data science.

Today’s scikit-learn tutorial will introduce you to the basics of Python machine learning. Step by step, it will show you how to use Python and its libraries to explore your data with the help of matplotlib, and how to work with the well-known algorithms KMeans and Support Vector Machines (SVM) to construct models, fit the data to these models, predict values and validate the models that you have built.

If you’re more interested in an R tutorial, check out our Machine Learning with R for Beginners tutorial.

Loading Your Data Set

The first step in just about anything in data science is loading in your data. This is also the starting point of this scikit-learn tutorial.

This discipline typically works with observed data. You might collect this data yourself, or you can browse through other sources to find data sets. But if you’re not a researcher or otherwise involved in experiments, you’ll probably do the latter.

If you’re new to this and you want to start working on problems on your own, finding these data sets might prove to be a challenge. However, you can typically find good data sets at the UCI Machine Learning Repository or on the Kaggle website. Also, check out this KDnuggets list with resources.

For now, you should warm up, not worry about finding any data by yourself and just load in the digits data set that comes with a Python library, called scikit-learn.
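As a minimal sketch of that loading step, assuming scikit-learn is installed and importable as `sklearn`, the digits data set can be loaded with the library's `load_digits()` function. The variable name `digits` is only an illustrative choice.

```python
# Load the digits data set that ships with scikit-learn.
# Assumes scikit-learn is installed (for example via `pip install scikit-learn`).
from sklearn import datasets

# load_digits() returns a dictionary-like object holding the data and metadata.
digits = datasets.load_digits()

# `data` holds one flattened 8x8 image per row, `images` holds the same
# pixels as 8x8 arrays, and `target` holds the digit label for each sample.
print(digits.data.shape)    # (1797, 64)
print(digits.images.shape)  # (1797, 8, 8)
print(digits.target.shape)  # (1797,)
```

Printing the shapes is a quick sanity check that the data arrived as expected before you start exploring it.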

Fun fact: did you know the name originates from the fact that this library is a scientific toolbox built around SciPy? By the way, there is more than just one scikit out there. This scikit contains modules specifically for machine learning and data mining, which explains the second component of the library name. 🙂

