Types and Workflow of Machine Learning

Siddharth Das
4 min readAug 14, 2017

In today’s world, machine learning is everywhere. Even if you have no idea about it or have never worked with it as a developer, you are familiar with it as a consumer. When Amazon shows you a list of other recommended products you might also like — that’s an example of machine learning. Not only Amazon, but also Netflix, Google, Siri, and Pandora use it. Virtually every moment of our lives is touched at some point by machine learning.

What is machine learning?

Machine learning means giving a computer the ability to learn to make decisions from data without being explicitly programmed. Tom Mitchell defines ML as follows: “A computer program is said to learn from experience E, with respect to some task T, and some performance measure P, if its performance on T as measured by P improves with experience E.”

Types of Machine Learning Algorithms

AI and Machine Learning Diagram

Supervised: The training data consist of labeled inputs and known outcomes; from these, the algorithm generates a function that maps inputs to desired outputs. The model is required to learn (to approximate the behavior of) a function which maps a vector into one of several classes by looking at many input-output examples of that function.

Supervised learning is again categorized into:

Classification: Here the target variable consists of categories. In other words, the outputs are discrete: inputs are divided into two or more classes, and the learner must produce a model that assigns unseen inputs to one or more (multi-label classification) of these classes. Spam filtering is an example of classification, where the inputs are email (or other) messages and the classes are “spam” and “not spam”. Methods used for classification are

• Linear Classifiers: Logistic Regression, Naive Bayes Classifier, Perceptron, Support Vector Machine (SVM)

• Quadratic Classifiers
• Boosting
• Neural networks
• Bayesian Networks
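As a minimal sketch of supervised classification — assuming scikit-learn is installed and using its built-in Iris dataset purely for illustration — a logistic regression classifier can be trained on labeled examples and then asked to label unseen inputs:

```python
# Classification sketch: learn a mapping from inputs to discrete classes.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)                  # learn input -> class mapping
test_accuracy = clf.score(X_test, y_test)  # fraction correctly classified
```

The held-out test set gives an honest estimate of how the model will behave on inputs it has never seen.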

Regression: This is also a supervised problem, but the outputs are continuous rather than discrete. Methods used for regression are

• Linear Regression: one of the most widely known modeling techniques.

• Logistic Regression (despite the name, it models the probability of a categorical outcome)
• Polynomial Regression
• Stepwise Regression
• Ridge Regression
• Lasso Regression
• ElasticNet Regression
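To make the idea of a continuous target concrete, here is a linear regression sketch using only NumPy. The data is synthetic — a line with slope 3 and intercept 2 plus noise, chosen just for this example — and the fit uses ordinary least squares:

```python
# Regression sketch: fit y = w*x + b to noisy synthetic points.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 2.0 + rng.normal(scale=0.5, size=100)  # true slope 3, intercept 2

A = np.column_stack([x, np.ones_like(x)])   # design matrix [x, 1]
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)
```

With 100 points and modest noise, the recovered `w` and `b` land close to the true values used to generate the data.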

Unsupervised: The machine analyzes unlabeled data and categorizes it based on similarities it has identified. These programs are used to identify groupings within data sets that may be difficult or impossible for a human to see.

Clustering: A set of inputs is to be divided into groups. Unlike in classification, the groups are not known beforehand, which makes this typically an unsupervised task. Methods used for clustering are

• Gaussian Mixtures
• K-Means Clustering
• Hierarchical Clustering
• Spectral Clustering
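A clustering sketch, assuming scikit-learn is available: k-means is given two well-separated synthetic blobs of points with no labels at all, and it recovers the two groups on its own:

```python
# Clustering sketch: k-means groups unlabeled points by similarity.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# two well-separated synthetic blobs; no labels are given to the algorithm
blob_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=5.0, scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_  # cluster assignment for each point
```

Note that k-means only assigns arbitrary cluster IDs (0 or 1); unlike classification, there is no notion of which group is "correct", only which points belong together.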

Semi-supervised: A combination of the above, used when there is a large amount of data but only some of it is labeled. Unsupervised learning techniques might be used to group and cluster the unlabeled data, while supervised learning techniques can be used to predict labels for it.
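One simple semi-supervised strategy is pseudo-labeling, sketched below with scikit-learn (assumed installed) on the Iris dataset, keeping only 30 of the 150 labels to simulate scarce labeled data: a model trained on the labeled subset guesses labels for the rest, and everything is then used for a final fit.

```python
# Semi-supervised sketch (pseudo-labeling): train on the few labeled points,
# predict labels for the unlabeled pool, then retrain on everything.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
labeled = rng.choice(len(X), size=30, replace=False)   # only 30 labels kept
unlabeled = np.setdiff1d(np.arange(len(X)), labeled)

base = KNeighborsClassifier().fit(X[labeled], y[labeled])
pseudo = base.predict(X[unlabeled])        # guessed labels for the rest

full_X = np.vstack([X[labeled], X[unlabeled]])
full_y = np.concatenate([y[labeled], pseudo])
final = KNeighborsClassifier().fit(full_X, full_y)
```

The pseudo-labels inherit the base model's mistakes, so in practice this only helps when the base model is already reasonably accurate on the labeled subset.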

Reinforcement learning: Here the algorithm learns a policy for how to act given an observation of the world. Every action has some impact on the environment, and the environment provides feedback that guides the learning algorithm. Simple reward signals are used to train the machine toward ideal behavior within a specific context.
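As a toy sketch of this idea — the environment below is a made-up five-state corridor invented for illustration — tabular Q-learning lets an agent discover, from reward alone, that walking right leads to the goal:

```python
# Reinforcement-learning sketch: tabular Q-learning on a tiny corridor.
# The agent starts at state 0; reaching state 4 earns a reward of 1.
import numpy as np

n_states, n_actions = 5, 2             # actions: 0 = step left, 1 = step right
Q = np.zeros((n_states, n_actions))    # action-value table
alpha, gamma, epsilon = 0.5, 0.9, 0.5  # learning rate, discount, exploration
rng = np.random.default_rng(0)

for episode in range(300):
    s = 0                              # every episode starts at the left end
    for _ in range(500):               # cap episode length
        # epsilon-greedy: explore at random, otherwise act greedily
        if rng.random() < epsilon:
            a = int(rng.integers(n_actions))
        else:
            a = int(Q[s].argmax())
        s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
        r = 1.0 if s_next == n_states - 1 else 0.0   # reward only at the goal
        # Q-learning update: nudge Q[s, a] toward r + gamma * max_a' Q[s', a']
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
        if s == n_states - 1:
            break

greedy_policy = Q.argmax(axis=1)       # learned best action for each state
```

No labeled examples are ever provided; the reward signal at the goal propagates backwards through the Q-table until "step right" becomes the preferred action in every state.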

Workflow for machine learning:

Machine Learning Work-flow
  • Evaluating the problem: Before you even start thinking about how you might solve a problem with machine learning, take some time to think carefully about the problem you are trying to solve.
  • Data Acquisition: First, you need large amounts of data. This data can be collected from any number of sources, including sensors and other devices, the cloud, and the Web. The following are places where you can get data:

http://archive.ics.uci.edu/ml/index.php

https://www.kaggle.com

  • Data Aggregation and Curation: Once the data is collected, data scientists aggregate and label it (in the case of supervised machine learning) and split it into training data and testing data. Python modules such as pandas and numpy are used for this purpose.
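A minimal curation sketch with pandas and numpy (both assumed installed), using a small made-up table: shuffle the rows, then hold out a fraction for testing.

```python
# Data curation sketch: build a labeled table and split it 80/20.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "feature_a": np.arange(10, dtype=float),
    "feature_b": np.arange(10, dtype=float) * 2,
    "label":     [0, 1] * 5,
})

# shuffle the rows, then hold out 20% for testing
shuffled = df.sample(frac=1.0, random_state=0).reset_index(drop=True)
n_test = int(len(shuffled) * 0.2)
test_df = shuffled.iloc[:n_test]
train_df = shuffled.iloc[n_test:]
```

Shuffling before splitting matters: if the rows arrive sorted (by time, by label, by source), an unshuffled split gives the model a test set that looks nothing like its training data.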
  • Model Development: Next, a suitable model is chosen to achieve the desired result. The training data is then used to develop the model, which is trained for accuracy and optimized for performance.
  • Model Deployment and Scoring: The model is deployed in an application, where it makes predictions on new data (the testing data). A score is then calculated using the values held in the confusion matrix.
Confusion Matrix: a 2×2 table counting the true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN) that a classifier produces on the test data.

Accuracy : This is the simplest scoring measure. It calculates the proportion of correctly classified instances.

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Sensitivity (also called Recall or True Positive Rate): Sensitivity is the proportion of actual positives which are correctly identified as positives by the classifier.

Sensitivity = TP / (TP + FN)

Specificity (also called True Negative Rate): Specificity relates to the classifier’s ability to identify negative results. Consider the example of a medical test used to identify a certain disease. The specificity of the test is the proportion of patients that do not have the disease and successfully test negative for it. In other words:

Specificity = TN / (TN + FP)

Precision: This is the proportion of instances the classifier labeled positive that are actually positive. In other words:

Precision = TP / (TP + FP)
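The four scores above are straightforward to compute once the confusion-matrix counts are in hand. The counts below are made-up numbers used purely to illustrate the formulas:

```python
# Scoring sketch: the four metrics computed from confusion-matrix counts.
tp, tn, fp, fn = 40, 45, 5, 10   # illustrative counts, not real results

accuracy    = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)     # recall / true positive rate
specificity = tn / (tn + fp)     # true negative rate
precision   = tp / (tp + fp)
```

With these counts: accuracy = 85/100 = 0.85, sensitivity = 40/50 = 0.8, specificity = 45/50 = 0.9, and precision = 40/45 ≈ 0.89.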

  • Update with New Data: As more data comes in, the model becomes even more refined and more accurate.

Hope you find this useful :)


Siddharth Das

Research Associate @EVSTS | Ex Machine Learning Engineer @IntellibotRPA