Guide to Supervised Machine Learning

March 15, 2019


The competitive advantage of any company is built upon insights. Understanding what the information holds for you is one of the most vital requirements for business success.

Supervised machine learning helps uncover irregular or hidden patterns in data, turning raw data into insights that show you how to move forward and accomplish your goals.

The secret of the successful use of machine learning lies in knowing what exactly you want it to do. In this article, we will take a closer look at business applications of supervised learning algorithms.

What Is Supervised Machine Learning?

Supervised learning is a type of machine learning in which an algorithm learns from examples whose correct answers are already known, so that it can predict the answers for new, unseen examples.

For example, you have known input (x) and output (Y). A simplified supervised machine learning algorithm would look like an equation:

Y = f(x)

The goal is to train a model that approximates f well enough to tell you what Y to expect for a new or changed x. In less technical terms, the algorithm sorts through the data and distills it into patterns, so that you can estimate what future outcomes are likely to be.

Supervised machine learning is all about:

  • Scaling the scope of data;
  • Uncovering the hidden patterns in the data;
  • Extracting the most relevant insights;
  • Discovering relationships between entities;
  • Enabling predictions of future outcomes based on available data.

How does it work?

The supervised learning algorithm is trained on a labeled dataset, i.e., the one where input and output are clearly defined.

Data Labeling means:

  • Defining an input - the types of information in the dataset that the algorithm is trained on. It shows what types of data are present and what their defining features are;
  • Defining an output - labeling sets the desired results for the algorithm. It determines what the algorithm should produce for each piece of data (for example, matching data on “yes/no” or “true/false” criteria).

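The labeled dataset contains everything the algorithm needs to operate while setting the ground rules. The dataset is usually split into a training set, which the algorithm learns from, and a held-out test set, which is used to check how well it handles data it has not seen. A common split is 80% training data and 20% testing data.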
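The split described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the dataset, the `train_test_split` helper, and its parameters are hypothetical and exist only for this example.

```python
import random

# Hypothetical labeled dataset: (hours of product usage, converted to paid?)
labeled_data = [(hours, hours >= 5) for hours in range(10)]

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and hold out a fraction of them for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

train_rows, test_rows = train_test_split(labeled_data)
print(len(train_rows), "training rows,", len(test_rows), "test rows")
```

Shuffling before splitting matters when the rows are stored in some meaningful order (by date, by customer, and so on), because otherwise the test set would not be representative of the whole.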

With clearly determined values, the “learning” process is enabled, and the algorithm can "understand" what it is supposed to be looking for. From the algorithm's perspective, the whole process turns into something akin to a “connect the dots” exercise.

Now let’s look at two fundamental processes of supervised machine learning - classification and regression.

Classification - Sorting out the Data

Classification is the process of differentiating the types of information presented in the dataset and assigning them to discrete categories. In other words, it is the “sorting out” part of the operation.

Here’s how it works:

  1. The algorithm labels the data according to the input samples on which the algorithm was trained.
  2. It recognizes certain types of entities, looks for similar elements and couples them into relevant categories.
  3. The algorithm is also capable of detecting anomalies in the data.

Classification covers optical character recognition and image recognition, as well as binary classification (deciding whether a particular piece of data complies with certain requirements, in a “yes” or “no” manner).
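To make the idea of sorting data into discrete categories concrete, here is a minimal sketch of one of the simplest classifiers, a nearest-centroid classifier. It is not one of the algorithms discussed later in this article; it is used here only because it fits in a few lines. The lead data and the function names are hypothetical.

```python
from collections import defaultdict
from math import dist

def fit_centroids(samples, labels):
    """Average the feature vectors of each class into one centroid per label."""
    grouped = defaultdict(list)
    for features, label in zip(samples, labels):
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }

def classify(centroids, features):
    """Return the label whose centroid is closest to the new sample."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Hypothetical leads: (site visits, emails opened) labeled "hot" or "cold"
X = [(9, 7), (8, 6), (10, 8), (1, 0), (2, 1), (0, 2)]
y = ["hot", "hot", "hot", "cold", "cold", "cold"]

centroids = fit_centroids(X, y)
print(classify(centroids, (7, 5)))  # hot
print(classify(centroids, (1, 1)))  # cold
```

The labeled examples define the categories, and every new sample is assigned to the category it resembles most.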

Regression - Calculating the Possibilities

Regression is the part of supervised learning that is responsible for estimating numeric outcomes from the available data. It is a method of predicting a target value based on specific predictors, and it describes how the target changes as those predictors change. Note that this relationship is statistical and does not by itself prove cause and effect.

The process of regression can be described as finding a model that maps the data to continuous real values. In addition to that, regression can identify the trend (the direction in which values move) in past data.

The purpose of regression is:

  • To understand the values in the data.
  • To identify the relations or patterns between them.
  • To calculate predictions of certain outcomes based on past data.

Supervised Machine Learning Real Life Examples


Decision Trees - Sentiment Analysis & Lead Classification

Decision trees are a basic form of organizing decisions in machine learning, and they can be used for both classification and regression models. The decision tree breaks down the dataset into progressively smaller subsets, each with a more specific definition of an entity. It provides the algorithm with the decision framework.

Structure-wise, decision trees are comprised of branches with different options (nodes) going from general to specific. Each branch constitutes a sequence based on compliance to the node requirements.

Usually, the requirements of the nodes are formulated as simple “yes” or “no” questions. Each answer either leads to a further question or ends the sequence with a final result.

The depth of the decision tree depends on the requirements of the particular operation. For example, the algorithm should recognize the images of apples out of the dataset. One of the primary nodes is based on color “red,” and it asks whether the color on the image is red. If “yes”, the sequence moves on. If not, the image is rejected.
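The way a tree chooses the question for a node can be sketched as follows. This minimal example learns a single node (a one-question tree) by trying every feature and threshold and keeping the split with the lowest Gini impurity, which is one common splitting criterion. The image features and their values are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of boolean labels (0.0 means a pure group)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows, labels):
    """Try every feature/threshold pair and keep the lowest weighted impurity."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[feature] < threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] >= threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best

# Hypothetical image features: (redness 0-1, roundness 0-1); True means "apple"
rows = [(0.9, 0.8), (0.8, 0.9), (0.85, 0.7), (0.2, 0.9), (0.1, 0.8), (0.3, 0.2)]
labels = [True, True, True, False, False, False]

score, feature, threshold = best_split(rows, labels)
print(f"ask: is feature {feature} >= {threshold}? (weighted Gini {score})")
```

On this toy data the learned question is about redness (feature 0), which mirrors the “is the color red?” node in the example above. A full decision tree repeats the same search on each resulting subset until the subsets are pure or a depth limit is reached.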

Overall, decision trees use cases include:

  • Customer’s Sentiment Analysis
  • Sales Funnel Analysis

Linear Regression - Predictive Analytics

Linear Regression is a type of machine learning model that is commonly used to extract insights from available information.

It involves determining the linear relationship between one or more input variables and a single output variable. The output value is calculated as a linear combination of the input variables.

There are two types of linear regression:

  1. Simple linear regression - with a single independent variable used to predict the value of a dependent variable
  2. Multiple linear regression - with numerous independent variables used to predict the output of a dependent variable.

It is a simple and easily interpretable way of extracting insights from data.
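For simple linear regression, the best-fitting line can be computed directly with the ordinary least-squares formulas: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means. The sketch below assumes hypothetical ad-spend and sales figures.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for one input variable: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: monthly ad spend (thousand USD) vs. units sold
ad_spend = [1, 2, 3, 4, 5]
units_sold = [12, 14, 16, 18, 20]

slope, intercept = fit_simple_linear_regression(ad_spend, units_sold)
predicted = intercept + slope * 6  # forecast for a spend of 6
print(slope, intercept, predicted)  # 2.0 10.0 22.0
```

Multiple linear regression follows the same principle with one coefficient per input variable, and in practice it is usually solved with a linear algebra library rather than by hand.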

Examples of linear regression include:

  • Predictive Analytics
  • Price Optimization (Marketing and sales)
  • Analyzing sales drivers (pricing, volume, distribution, etc.)


Logistic Regression - Audience Segmentation and Lead Classification

Logistic regression is similar to linear regression, but instead of a numeric dependent variable, it predicts a categorical one, most commonly a binary “yes/no” or “true/false” outcome.

Its primary use case is binary prediction. For example, it is used by banks to determine whether to issue a credit card to a customer or decline the application.

Logistic Regression also involves certain elements of classification in the process as it classifies the dependent variable into one of the available classes.
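A minimal sketch of how this works: the model passes a linear combination of the inputs through the sigmoid function to get a probability between 0 and 1, and the class is chosen by comparing that probability to 0.5. The example below fits one weight and one bias by gradient descent on the log loss. The applicant data, learning rate, and epoch count are assumptions made for illustration.

```python
import math

def sigmoid(z):
    """Squash any real number into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_regression(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit a weight and a bias by batch gradient descent on the log loss."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(weight * x + bias) - y for x, y in zip(xs, ys)]
        weight -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        bias -= learning_rate * sum(errors) / n
    return weight, bias

# Hypothetical applicants: income relative to the average (tens of thousands)
# and whether the card was approved (1) or declined (0)
income = [-3, -2, -1, -0.5, 0.5, 1, 2, 3]
approved = [0, 0, 0, 1, 0, 1, 1, 1]

weight, bias = fit_logistic_regression(income, approved)
for x in (-2, 2):
    probability = sigmoid(weight * x + bias)
    print(x, "approve" if probability > 0.5 else "decline")
```

Because the output is a probability, the decision threshold can be moved away from 0.5 when one kind of mistake is more costly than the other.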

Business examples of logistic regression include:

  • Classifying the contacts, leads, customers into specific categories
  • Segmenting target audience based on relevant criteria
  • Predicting various outcomes out of input data

Random Forest Classifier - Recommender Engine, Image Classification, Feature Selection

Random Forest Classifier is one of the more elaborate variations of the decision trees.

It builds a collection of decision trees, each trained on a random sample drawn from the training dataset. Then it combines the predictions of all the trees, typically by majority vote, to decide on the final class of the test object.

The difference from the traditional decision trees is that random forest applies an element of randomness to a bigger extent than usual. Instead of simply looking for the most important feature upon the node split, it tries to find the best feature in the random selection of features.

This brings a large degree of diversity to the model and generally improves the quality of its predictions.

Deep decision trees may suffer from overfitting, but random forests reduce overfitting by building trees on random subsets of the data and features. The forest then takes a majority vote (or, for regression, the average) of all the predictions, which cancels out much of the error of the individual trees.
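The two sources of randomness can be sketched as follows. To keep the example short, each "tree" here is a one-level stump that sees a single randomly chosen feature, which is a deliberate simplification of a real random forest. The user data, labels, and function names are hypothetical.

```python
import random
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def gini(labels):
    counts = Counter(labels)
    return 1.0 - sum((c / len(labels)) ** 2 for c in counts.values())

def fit_stump(rows, labels, feature):
    """One-level tree on one feature: keep the threshold with the lowest weighted Gini."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [lab for row, lab in zip(rows, labels) if row[feature] < threshold]
        right = [lab for row, lab in zip(rows, labels) if row[feature] >= threshold]
        if not left:
            continue  # skip splits that leave one side empty
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, threshold, majority(left), majority(right))
    if best is None:  # every sampled row has the same value for this feature
        label = majority(labels)
        return feature, float("-inf"), label, label
    _, threshold, left_label, right_label = best
    return feature, threshold, left_label, right_label

def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bagging: draw a bootstrap sample (with replacement) of the training rows.
        picks = [rng.randrange(len(rows)) for _ in rows]
        sample_rows = [rows[i] for i in picks]
        sample_labels = [labels[i] for i in picks]
        # Feature randomness: each stump only sees one randomly chosen feature.
        feature = rng.randrange(len(rows[0]))
        forest.append(fit_stump(sample_rows, sample_labels, feature))
    return forest

def predict(forest, row):
    """Every tree votes; the most common answer wins."""
    votes = [right if row[feature] >= threshold else left
             for feature, threshold, left, right in forest]
    return majority(votes)

# Hypothetical users: (sessions per week, articles read, comments posted)
rows = [(1, 2, 1), (2, 1, 3), (3, 3, 2), (2, 2, 2),
        (8, 7, 9), (7, 9, 8), (9, 8, 7), (8, 8, 8)]
labels = ["casual"] * 4 + ["power"] * 4

forest = fit_forest(rows, labels)
print(predict(forest, (0, 0, 0)), predict(forest, (10, 10, 10)))
```

An individual stump can be wrong when its bootstrap sample happens to be unrepresentative, but the vote across many stumps is much more stable.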

Random Forest Classifier use cases include:

  • Content Customization according to the User Behavior and Preferences
  • Image recognition and classification
  • Feature selection of the datasets (general data analysis)

Gradient Boosting Classifier - Predictive Analysis

Gradient Boosting Classifier is another method of making predictions. The process of boosting can be described as combining weaker (less accurate) learners into a stronger whole.

Instead of creating a pool of independent predictors, as in bagging, boosting produces a cascade of them, where each learner builds on the output of the previous ones. It is used to reduce prediction bias.

Gradient boosting takes a sequential approach to obtain predictions. Each new decision tree is trained to predict the errors (residuals) left by the trees before it, and adding its prediction to the running total gradually reduces the overall error. The name comes from the fact that these residuals correspond to the negative gradient of the loss function.
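The sequential idea can be sketched with a small regression example (the classifier variant works the same way but uses a different loss function). Each round fits a one-split regression stump to the current residuals and adds a scaled-down version of it to the prediction. The demand figures, the number of rounds, and the learning rate are assumptions chosen for illustration.

```python
def fit_regression_stump(xs, residuals):
    """Single-threshold split of xs that minimizes squared error on the residuals."""
    best = None
    for threshold in sorted(set(xs))[1:]:  # skip the minimum so the left side is never empty
        left = [r for x, r in zip(xs, residuals) if x < threshold]
        right = [r for x, r in zip(xs, residuals) if x >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]

def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return right_mean if x >= threshold else left_mean

def fit_gradient_boosting(xs, ys, n_rounds=20, learning_rate=0.5):
    base = sum(ys) / len(ys)  # start from the mean prediction
    predictions = [base] * len(xs)
    stumps, mse_history = [], []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_regression_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump_predict(stump, x)
                       for p, x in zip(predictions, xs)]
        mse_history.append(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys))
    return base, stumps, mse_history

def predict(base, stumps, x, learning_rate=0.5):
    """Sum the base value and the scaled contribution of every stump."""
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)

# Hypothetical weekly demand for one product
weeks = [1, 2, 3, 4, 5, 6, 7, 8]
demand = [10, 12, 11, 15, 20, 22, 21, 25]

base, stumps, mse_history = fit_gradient_boosting(weeks, demand)
print("training error after round 1:", mse_history[0])
print("training error after round 20:", mse_history[-1])
```

The training error shrinks with every round. In practice the number of rounds and the learning rate are tuned on held-out data, because driving the training error to zero usually means overfitting.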

Gradient Boosting is widely used in sales, especially in retail and eCommerce sectors. The use cases include:

  • Inventory Management
  • Demand Forecasting
  • Price Prediction

Support Vector Machines (SVM) - Data Classification

Support Vector Machines (aka SVM) is a type of algorithm that can be used for both Regression and Classification purposes.

At its core, it finds decision boundaries that separate the data. Each side of a boundary corresponds to a different class of entities.

The algorithm performs classification by finding the hyperplane (a flat boundary that generalizes a line or a plane to any number of dimensions) that maximizes the margin between the two classes. The support vectors are the data points closest to this boundary, and they alone determine where it lies. This shows what the features of the data are and what they might mean in a specific context.
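The notion of a margin can be illustrated without training a full SVM. The sketch below does not implement an SVM solver. It only measures the margin of two hand-picked candidate hyperplanes and keeps the wider one, which is the quantity an SVM maximizes over all possible hyperplanes. The points and the candidate hyperplanes are hypothetical.

```python
import math

def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w·x + b = 0."""
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / math.hypot(*w)

def margin(w, b, points, labels):
    """Smallest label-adjusted distance; negative if any point is misclassified."""
    return min(label * signed_distance(w, b, x) for x, label in zip(points, labels))

# Hypothetical 2-D points with class labels +1 and -1
points = [(1, 1), (2, 1), (1, 2), (4, 4), (5, 4), (4, 5)]
labels = [-1, -1, -1, 1, 1, 1]

# Two hand-picked candidate hyperplanes (w, b); both separate the classes
candidates = {
    "diagonal": ((1.0, 1.0), -5.5),
    "vertical": ((1.0, 0.0), -3.0),
}

margins = {name: margin(w, b, points, labels) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
print(margins, "-> widest margin:", best)
```

Both candidates classify every point correctly, but the diagonal one keeps the closest points farther away, so it is the safer boundary for new data. The points that sit exactly at the margin distance are the support vectors.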

Support Vector Machines algorithms are widely used in ad tech and other industries for:

  • Segmenting audience
  • Managing ad inventory
  • Estimating the likelihood of conversions in specific audience segments for particular types of ads
  • Text Classification

Naive Bayes - Data Classification and Sentiment Analysis

Naive Bayes classifier is based on Bayes’ theorem with an independence assumption between predictors, i.e., it assumes that the presence of a feature in a class is unrelated to the presence of any other feature. Even if these features depend on each other or upon the existence of other features, the classifier treats each of them as contributing independently to the probability. Hence the name Naive Bayes.

It is used for classification. One common variant, Gaussian Naive Bayes, assumes that numeric features follow a normal distribution within each class.

Naive Bayes model is easy to build, with no complicated iterative parameter estimation, which makes it particularly useful for very large datasets.
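Training really does amount to counting, as the following minimal sketch of a word-count (multinomial) Naive Bayes text classifier shows. It uses add-one (Laplace) smoothing so that a word never seen in a class does not zero out the whole probability. The training messages and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(documents):
    """documents: list of (text, label). Training is just counting words per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in documents:
        words = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary

def classify(model, text):
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # log P(class) + sum of log P(word | class), with add-one smoothing
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word in vocabulary:  # ignore words never seen in training
                score += math.log((word_counts[label][word] + 1)
                                  / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training messages
training = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project status and agenda", "ham"),
]

model = train_naive_bayes(training)
print(classify(model, "claim your free money"))   # spam
print(classify(model, "monday project meeting"))  # ham
```

Each word contributes to the score independently of the others, which is exactly the "naive" assumption described above. Working with logarithms avoids multiplying many small probabilities together.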

Naive Bayes use cases include:

  • Data Classification (such as spam detection)
  • Lead Classification
  • Sentiment Analysis (based on input texts, such as reviews or comments)


The vast majority of business cases for machine learning use supervised machine learning algorithms to enhance the quality of work and understand what decision would help to reach the intended goal.

As we have seen in this article, numerous business areas can benefit from the implementation of ML, from sales and marketing to executive decision-making, and the list goes on.

You've got business data, so make the most of it with machine learning. 
