Next, we need to walk through some terminology. Like most fields, machine learning has its own jargon, which we need to understand.
Types of Machine Learning
Before we look at the types of machine learning, we must understand what is called training data. Training data is simply data that has been prepared so that it can be used to create a model. Why is it called training data? In machine learning jargon, creating a model is called training a model, so the data used to create the model is called training data.
Now let’s look at the types of machine learning. There are two broad categories:
1. Supervised learning – Supervised learning means that the value you want to predict is available in the training data.
For example, in the credit card fraud example discussed earlier, whether or not a transaction is fraudulent is stored in each record. In machine learning jargon, such data is called labelled data, so when we try to predict whether a new transaction is fraudulent, we are doing what is called supervised learning.
2. Unsupervised learning – The alternative is called unsupervised learning. Here the value you want to predict is not available in the training data.
Both approaches are used, but supervised learning is the more common of the two.
Data Processing with Supervised Learning
The machine learning process starts with data. It might be relational data, data stored in a NoSQL database, or binary data. Whatever the source, you have to read this raw data into one or more data preprocessing modules, typically chosen from those your machine learning technology provides.
Raw data is rarely in the right shape to be processed by machine learning algorithms. Most of the effort in the machine learning process goes into reading the raw data and converting it into training data.
For example, there could be holes in your data (missing values), duplicates, or redundant data. The same information could be expressed in two different ways, or a piece of information may not be predictive and won’t help create a good model. You need to deal with all of these issues to create training data.
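The cleanup steps above can be sketched in plain Python. This is a minimal sketch, not a real preprocessing module: the record fields (`txn_id`, `issued_country`, `used_country`, `amount`, `fraudulent`), the `COUNTRY_ALIASES` mapping, and the `to_training_data` helper are all hypothetical. A real project would usually rely on the preprocessing tools its machine learning technology provides.

```python
# Hypothetical raw transaction records; field names are illustrative only.
raw_records = [
    {"txn_id": 1, "issued_country": "US", "used_country": "US", "amount": 25.0, "fraudulent": False},
    {"txn_id": 2, "issued_country": "USA", "used_country": "RU", "amount": 980.0, "fraudulent": True},
    {"txn_id": 2, "issued_country": "USA", "used_country": "RU", "amount": 980.0, "fraudulent": True},  # duplicate
    {"txn_id": 3, "issued_country": "GB", "used_country": None, "amount": 40.0, "fraudulent": False},  # hole
    {"txn_id": 4, "issued_country": "GB", "used_country": "GB", "amount": 12.5, "fraudulent": False},
]

# Assumption: the same country can appear under two spellings; map them to one code.
COUNTRY_ALIASES = {"USA": "US", "UK": "GB"}

# Assumption: an arbitrary transaction ID carries no predictive signal.
NON_PREDICTIVE = {"txn_id"}


def to_training_data(records):
    seen = set()
    cleaned = []
    for rec in records:
        # Skip records with holes (missing values).
        if any(value is None for value in rec.values()):
            continue
        # Skip exact duplicates.
        key = tuple(sorted(rec.items()))
        if key in seen:
            continue
        seen.add(key)
        row = dict(rec)
        # Normalize values that are expressed in two different ways.
        for col in ("issued_country", "used_country"):
            row[col] = COUNTRY_ALIASES.get(row[col], row[col])
        # Drop columns that will not help create a good model.
        for col in NON_PREDICTIVE:
            row.pop(col, None)
        cleaned.append(row)
    return cleaned


training_data = to_training_data(raw_records)
for row in training_data:
    print(row)
```

Each check is deliberately simple; real datasets often call for filling missing values rather than dropping rows.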
The training data in the credit card fraud example is organized into columns. In machine learning jargon, these columns are called features. In the credit card example, we saw columns such as the country where the card was issued, the transaction amount, and the country where the card was used. These are all features.
Since we are talking about supervised learning, the value we are trying to predict (whether a transaction is fraudulent) is available in the training data. In machine learning jargon, this is called the target value.
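Splitting training data into features and a target value might look like the sketch below. The column names and the `FEATURES`/`TARGET` constants are hypothetical; naming the feature rows `X` and the targets `y` is a common convention, not a requirement.

```python
# Hypothetical training data: feature columns plus the target value in each record.
training_data = [
    {"issued_country": "US", "used_country": "US", "amount": 25.0, "fraudulent": False},
    {"issued_country": "US", "used_country": "RU", "amount": 980.0, "fraudulent": True},
    {"issued_country": "GB", "used_country": "GB", "amount": 12.5, "fraudulent": False},
]

FEATURES = ["issued_country", "used_country", "amount"]  # inputs to the model
TARGET = "fraudulent"                                    # value we want to predict

# X holds one feature vector per record; y holds the matching target values.
X = [[rec[col] for col in FEATURES] for rec in training_data]
y = [rec[TARGET] for rec in training_data]

# Supervised learning needs the target in every record (labelled data);
# in unsupervised learning only X would be available.
is_labelled = all(TARGET in rec for rec in training_data)

print(X)
print(y, is_labelled)
```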
Categories of Machine Learning Problems
It is common to group machine learning problems into categories. There are many categories, but three of them come up an awful lot: regression, classification, and clustering. Let us discuss them.
Regression – Here we have data, and we would like to find a line or curve that best fits that data. Regression problems are typically supervised learning scenarios. An example question would be: how many units will we sell next month?
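As a minimal sketch of regression, the code below fits a straight line to a hypothetical sales history with ordinary least squares (one common way to define "best fits") and uses it to forecast next month's units. The data and the `fit_line` helper are illustrative only.

```python
# Hypothetical monthly sales history: (month number, units sold).
history = [(1, 100), (2, 110), (3, 125), (4, 130), (5, 145)]


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Slope = covariance(x, y) / variance(x).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    slope = cov_xy / var_x
    # The fitted line passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(history)
next_month = 6
forecast = slope * next_month + intercept
print(f"Forecast for month {next_month}: {forecast:.1f} units")  # prints 155.0 units
```

For this data the fitted line is 11 units per month with an intercept of 89, so month 6 forecasts 155 units.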
Classification – In this case, we have data and we want to group it into classes: at least two, sometimes more. When new data comes in, we want to determine which class it belongs to.
Classification is commonly used with supervised learning. An example question is the one we have used throughout this machine learning tutorial: is this transaction fraudulent?
Whenever a new transaction comes in, we want to predict which class it is in: fraudulent or not fraudulent.
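Classification can be sketched with a k-nearest-neighbours classifier, one simple technique among many; it is not implied by the fraud example itself. The two numeric features (amount in hundreds, and a 0/1 flag for use abroad), the labelled examples, and the `knn_classify` helper are hypothetical.

```python
import math
from collections import Counter

# Hypothetical labelled transactions:
# (amount in hundreds, 1 if used abroad else 0) -> class label.
labelled = [
    ((0.25, 0), "not fraudulent"),
    ((0.40, 0), "not fraudulent"),
    ((1.20, 0), "not fraudulent"),
    ((0.30, 1), "not fraudulent"),
    ((9.80, 1), "fraudulent"),
    ((7.50, 1), "fraudulent"),
]


def knn_classify(point, examples, k=3):
    """Predict the most common label among the k nearest labelled examples."""
    # Sort examples by Euclidean distance to the new point and keep the k closest.
    nearest = sorted(examples, key=lambda ex: math.dist(point, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# New transactions arrive: predict which class each belongs to.
print(knn_classify((8.0, 1), labelled))  # fraudulent
print(knn_classify((0.5, 0), labelled))  # not fraudulent
```

Because the classifier relies on distances, features on very different scales would normally be rescaled first; expressing the amount in hundreds is a crude stand-in for that here.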
Clustering – In this scenario, we have data and we want to find clusters in that data. This is a very good example of when to use unsupervised learning, because we don’t have labelled data and don’t necessarily know what we are looking for.
For example, what are our customer segments? We might not know these things up front, but we can use unsupervised machine learning to help us figure them out.
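A minimal k-means sketch shows how clusters can be found without any labels. The customer data, the choice of k = 2, and the deterministic initialization from the first k points are assumptions made for illustration; practical implementations usually use randomized or smarter initialization.

```python
import math

# Hypothetical unlabelled customer data: (visits per month, average spend).
customers = [(1, 20), (2, 25), (1, 22), (8, 200), (9, 210), (10, 190)]


def kmeans(points, k, iterations=10):
    """Plain k-means: alternate assignment and centroid-update steps."""
    # Assumption: start from the first k points as centroids (simple and deterministic).
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, clusters


centroids, clusters = kmeans(customers, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

With this data the algorithm separates occasional low spenders from frequent high spenders, which could be read as two customer segments. Spend dominates the distance here; real data would usually be scaled first.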
Styles of Machine Learning Algorithms
The kinds of problems that machine learning addresses aren’t the only thing that can be categorized. It is also useful to think about the styles of machine learning algorithms used to solve those problems.
There are decision tree algorithms; there are neural network algorithms, which in some ways emulate how the brain works; there are Bayesian algorithms, which use Bayes’ theorem to work out probabilities; there are k-means algorithms, which are used for clustering; and lots more.
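To give a broad sense of how a Bayesian approach works out probabilities, without going into any particular algorithm, the sketch below applies Bayes’ theorem to a single piece of evidence. The rates are made-up numbers chosen for illustration, and `posterior` is a hypothetical helper, not part of any library.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E)."""
    # P(E) sums over the hypothesis being true and being false.
    p_evidence = p_evidence_if_true * prior + p_evidence_if_false * (1 - prior)
    return p_evidence_if_true * prior / p_evidence


# Made-up rates for illustration only:
p_fraud = 0.01            # P(fraud)
p_abroad_if_fraud = 0.60  # P(card used abroad | fraud)
p_abroad_if_legit = 0.05  # P(card used abroad | not fraud)

p = posterior(p_fraud, p_abroad_if_fraud, p_abroad_if_legit)
print(f"P(fraud | used abroad) = {p:.3f}")  # 0.108
```

Even though use abroad is common among fraudulent transactions in this made-up scenario, the low prior keeps the posterior probability of fraud near 11%.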
The details of these are well outside the scope of this course, but having a broad sense of what the styles are is certainly useful.