ARTIFICIAL INTELLIGENCE
AN INNOVATIVE PARADIGM FOR SMART COMPUTING
DR. P. SHANMUGAVADIVU
PROFESSOR, DEPT. OF COMPUTER SCIENCE & APPLICATIONS
GANDHIGRAM RURAL INSTITUTE (DEEMED TO BE UNIVERSITY)
GANDHIGRAM, DINDIGUL, TAMIL NADU, INDIA.
COURSE OUTLINE
Day 1: Artificial Intelligence – An Overview
Day 2: Machine Learning Algorithms – Part 1
Day 3: Machine Learning Algorithms – Part 2
Day 4: Neural Networks & Deep Learning
Day 5: Convolutional Neural Networks
DAY 2: MACHINE LEARNING ALGORITHMS – PART 1
AGENDA
1 Machine Learning – An Overview
2 Applications of Machine Learning
3 Types of Machine Learning
4 Machine Learning Algorithms
5 Q & A
MACHINE LEARNING
Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed.
- [Arthur Samuel, 1959]
A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
- [Tom Mitchell, Carnegie Mellon University, 1997]
WHY MACHINE LEARNING?
No human experts
• Industrial/Manufacturing control
• Mass spectrometer analysis, drug design, astronomical discovery
Black-box human expertise
• Face/Handwriting/Speech recognition
• Driving a car, flying a plane
Rapidly changing phenomena
• Credit scoring, financial modeling
• Diagnosis, fraud detection
Need for customization/personalization
• Personalized news reader
• Movie/Book recommendation
MACHINE LEARNING - CONCEPT
ML employs a variety of statistical, probabilistic, and optimization techniques.
It uses algorithms that can learn from observational data and make predictions based on it.
It is used to make decisions based on learned patterns and to create an analytical model for future predictions.
Data → find patterns → train itself → produce an output.
The accuracy of classification is highly influenced by the distribution and diversity of the data.
APPLICATIONS OF MACHINE LEARNING
Healthcare Services
Language Translation
Online Fraud Detection
Online Customer Support
Email Spam and Malware Filtering
Social Media Services (Face Recognition, Similar Pins)
Virtual Personal Assistants (Smart Speakers, Smart Phones, Mobile Apps)
Prediction while commuting (Traffic Prediction, Online Transportation Networks)
https://medium.com/app-affairs/9-applications-of-machine-learning-from-day-to-day-life-112a47a429d0
THE POTENTIALS OF ML
DETECTION
Text, Speech & Image Interpretation
Human Behaviour & Identity
Abuse & Fraud Detection
PREDICTION
Recommendation on Individual
Behaviour & Condition
Collective Behaviour
GENERATION
Visual Art
Music
Text
Design
ML FOR PREDICTIVE ANALYTICS
The Mechanics
TYPES OF MACHINE LEARNING
Machine Learning
Supervised
• Regression
• Classification
• Recommendation
Unsupervised
• Clustering
• Association Rules
SUPERVISED LEARNING
Labelled data are used to train the algorithms
Algorithms are trained using annotated data, where the input and the output are known
Uses the learned data patterns to predict the output labels for new data
It is mainly used in Predictive Modelling
UNSUPERVISED LEARNING
Unlabelled data are used to train the algorithm, which means it is used on data that has no historical labels.
The purpose is to explore the data and find some structure within.
This learning technique works well on transactional data.
It is mainly used in Descriptive Modelling
Supervised Learning
Regression
• Linear
• Logistic
Classification
• Decision Trees
• Random Forest
• Naïve Bayes
• Support Vector Machine
• Neural Networks
• K-Nearest Neighbors
Recommendation
• User-based
• Item-based
Unsupervised Learning
Clustering
• K-Means
Association Rules
• Market Basket Analysis
SAMPLE ILLUSTRATION
REGRESSION
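A regression problem is one where the output variable is a real or continuous value.
Examples:
Predicting the age of a person (regression: the output is a continuous value)
Predicting the nationality of a person (not regression: the output is a category, so this is classification)
Predicting whether the stock price of a company will increase tomorrow (not regression: the output is yes/no, so this is classification)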
CLASSIFICATION
Classification is the process of categorizing a given set of data into classes.
It learns a mapping function from the input variables to discrete output variables.
It can be applied to structured or unstructured data.
The main goal is to identify the category or class into which new data will fall.
MACHINE LEARNING ALGORITHMS
I. LINEAR REGRESSION
It is a kind of predictive modelling where the possible output (Y) for the given input (X) is predicted based on the previous data or values.
The main aim is to find the best fit line, which minimizes the error.
It is used to predict values within a continuous range rather than trying to classify them into categories.
The known parameters are used to make a continuous line with a constant slope, which is used to predict the unknown or the result.
Types:
• Simple Linear Regression: It is characterized by one independent variable.
• Multiple Linear Regression: It is characterized by multiple independent variables.
While training the model:
x: input training data (univariate: one input variable (parameter))
y: labels for the data (supervised learning)
When trained, the model fits the best line to predict the value of y for a given value of x.
The model gets the best regression fit line by finding the best a and b values.
a: coefficient of x
b: intercept
Once the best a and b values are found, the best fit line is obtained. Finally, when the model is used for prediction, it predicts the value of y for an input value of x.
Equation: $y = a x + b$
The values a and b must be chosen so that the error is minimum.
If the sum of squared errors is taken as the metric to evaluate the model, then the goal is to obtain the line that minimizes this error.
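The fitting step can be sketched in plain Python. This is a minimal illustration, assuming the line has the form y = a*x + b (a: coefficient of x, b: intercept) and using the standard closed-form least-squares solution for a single input variable. The function names fit_line, predict and sum_squared_error and the sample data are illustrative assumptions, not taken from the slides.

```python
def fit_line(xs, ys):
    """Return (a, b) that minimize the sum of squared errors for y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (closed-form least squares).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx              # coefficient of x (slope)
    b = mean_y - a * mean_x    # intercept
    return a, b


def predict(a, b, x):
    """Predict y for a new input x using the fitted line."""
    return a * x + b


def sum_squared_error(a, b, xs, ys):
    """Sum of squared differences between the labels and the predictions."""
    return sum((y - predict(a, b, x)) ** 2 for x, y in zip(xs, ys))


# Illustrative training data (assumed): generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

a, b = fit_line(xs, ys)
best_error = sum_squared_error(a, b, xs, ys)
other_error = sum_squared_error(1.5, 2.0, xs, ys)  # an arbitrary competing line
```

Any other choice of a and b, such as the competing line above, gives a sum of squared errors at least as large as that of the fitted line.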
II. LOGISTIC REGRESSION
It is the appropriate regression analysis to conduct when the dependent variable is binary (the output belongs to either of two classes, 1 or 0).
It is a classification algorithm that uses one or more independent variables to determine an outcome.
Goal – to find the best fitting relationship between the dependent variable and a set of independent variables.
Merits – Understanding the influence of the independent variables on the outcome of the dependent variable.
Demerits – In its basic (binomial) form, it only works if the predicted variable is binary.
Types of Logistic Regression (type: target variable; examples)
Binomial: 2 possible types; “win” vs “loss”, “pass” vs “fail”
Multinomial: 3 or more types (not ordered); “disease A” vs “disease B” vs “disease C”
Ordinal: ordered categories; “very poor”, “poor”, “good”, “very good” (scores: 0, 1, 2, 3)
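The two important parts of logistic regression are the Hypothesis and the Sigmoid Curve.
The hypothesis gives the likelihood of the event.
Hypothesis Expectation: $0 \le h(x) \le 1$, where $h(x)$ denotes the hypothesis (the predicted probability).
The data are fitted with a logistic function, which produces an S-shaped curve known as the “sigmoid”. (It converts any value from -∞ to +∞ into a value between 0 and 1, which can then be thresholded to give a discrete class.)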
The hypothesis of logistic regression limits its output to values between 0 and 1. A linear function therefore fails to represent it, since a linear function can take values greater than 1 or less than 0, which is not possible under the hypothesis of logistic regression.
The sigmoid function maps any real value into another value between 0 and 1. In machine learning, it is used to map predictions to probabilities, using the formula:
$f(x) = \frac{1}{1 + e^{-x}}$
Where,
f(x) = output between 0 and 1 (probability estimate)
x = input to the function
e = base of the natural logarithm
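The mapping from predictions to probabilities can be sketched as follows. This is a minimal illustration that uses the standard sigmoid f(x) = 1 / (1 + e^(-x)). The one-feature model in predict_class, its weight w, its bias b and the 0.5 threshold are hypothetical choices made for the example and do not come from the slides.

```python
import math


def sigmoid(x):
    """Map any real x to a value between 0 and 1."""
    # Two branches avoid overflow in math.exp for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict_class(x, w, b, threshold=0.5):
    """Hypothetical one-feature logistic model.

    Returns (label, probability), where label is 1 if
    sigmoid(w*x + b) >= threshold and 0 otherwise.
    """
    probability = sigmoid(w * x + b)
    label = 1 if probability >= threshold else 0
    return label, probability


# Illustrative (assumed) parameters: w = 1.5, b = -1.0.
label_high, p_high = predict_class(2.0, w=1.5, b=-1.0)  # linear score  2.0
label_low, p_low = predict_class(0.0, w=1.5, b=-1.0)    # linear score -1.0
```

Thresholding the probability is what turns the continuous sigmoid output into one of the two classes.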
III. DECISION TREE
It is a tree that is developed based on decisions taken by the algorithm in accordance with the data it has been trained on.
A Decision Tree uses the features in the given data to perform Supervised Learning and develops a tree-like structure (data structure) whose branches are developed in such a way that, given the feature set, the decision tree can predict the expected output relatively accurately.
First, it breaks down a data set into smaller subsets while an associated decision tree is developed.
A decision node has two or more branches; a leaf node represents a classification or decision.
The topmost decision node, which corresponds to the best predictor, is called the root node.
Entropy is a measure of the impurity (uncertainty) of an arbitrary collection of information; the lower the entropy, the purer the collection.
Information Gain: the amount of relevant information gained by splitting the data on a given attribute, calculated as the resulting reduction in entropy.
Entropy (E) is used to calculate Information Gain, which is used to identify which attribute of a given dataset provides the highest amount of information.
The attribute that provides the highest amount of information for the given dataset is considered to contribute most towards the outcome of the classifier and hence is given the higher priority in the tree.
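The attribute selection can be sketched as follows. This is a minimal illustration that assumes the usual Shannon entropy, E = -Σ p_i log2(p_i), and defines information gain as the entropy of the labels minus the weighted entropy of the subsets produced by a split. The toy dataset and the attribute names "outlook" and "windy" are invented for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(rows, labels, attribute):
    """Entropy of the labels minus the weighted entropy after splitting on attribute."""
    total = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


# Toy data (assumed): "outlook" separates the classes perfectly, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

gains = {attr: information_gain(rows, labels, attr) for attr in ("outlook", "windy")}
best_attribute = max(gains, key=gains.get)  # attribute to place highest in the tree
```

In this toy data, splitting on "outlook" removes all uncertainty while splitting on "windy" removes none, so "outlook" would be chosen for the root node.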
Advantages:
• Specific
• Easy to use
• Versatile
• Resistant to data abnormalities
• Visualization of the decision taken
Applications:
• Selecting a flight to travel
• Selecting alternative products
• Sentiment Analysis
• Energy Consumption
• Fault Diagnosis
Limitations:
• Sensitivity to hyperparameter tuning
• Overfitting
• Underfitting
https://www.knowledgehut.com/blog/data-science/classification-and-regression-trees-in-mach
TRAINING, VALIDATION, AND TESTING/HOLD OUT DATASETS
The first step in developing a machine learning model is to partition the dataset for training and validation, which involves choosing what percentage of the data to use for the training, validation, and holdout sets.
Training Set (60-80%): The sample of data used to fit the model. It uncovers or learns relationships between the features and the target variable.
Validation Set (15-25%): The sample of data used to provide an unbiased evaluation of a model fit on the training dataset while tuning model hyperparameters. It is used to find how accurately the model identifies relationships between the known outcomes for the target variable and the dataset’s other features.
Testing Set (15-25%): The sample of data used to provide an unbiased evaluation of a final model fit on the training dataset. It provides a final estimate of the model’s performance after it has been trained and validated.
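The partitioning step can be sketched in plain Python. This is a minimal illustration that assumes a 70% / 15% / 15% split, chosen from within the ranges given above, with a random shuffle before splitting. The function name split_dataset, its parameters and the fixed seed are illustrative assumptions.

```python
import random


def split_dataset(samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle samples and split them into training, validation and testing sets.

    The testing (holdout) set receives whatever remains after the
    training and validation fractions are taken.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave room for a testing set")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Illustrative data (assumed): 100 sample identifiers.
train, val, test = split_dataset(range(100))
```

Every sample lands in exactly one of the three sets, so the testing set stays unseen during both fitting and hyperparameter tuning.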
OVERFITTING AND UNDERFITTING
Underfit Model: A model that fails to sufficiently learn the problem; it performs poorly on the training dataset and also on a holdout sample.
Overfit Model: A model that learns the training dataset too well; it performs well on the training dataset but poorly on a holdout sample.
Good Fit Model: A model that suitably learns the training dataset and generalizes well to the holdout dataset.
BIAS-VARIANCE TRADEOFF
Bias is the difference between the model’s average prediction and the correct value it is trying to predict.
Variance is the variability of the model’s predictions when different training data are used.
Characteristics of a high-bias model:
• Underfitting
• Low Training Accuracy
• Inability to solve complex problems
Characteristics of a high-variance model:
• Overfitting
• Low Testing Accuracy
• Overcomplicating simpler problems
Detection and Solution to High Bias problem - if the training error is high:
• Train longer
• Train a more complex model
• Obtain more features
• Decrease regularization
• New model architecture
Detection and Solution to High Variance problem - if the validation error is high (while the training error is low):
• Obtain more data
• Decrease number of features
• Increase regularization
• New model architecture
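The detection rules can be sketched as a small rule-of-thumb check. This is only an illustration: the function name diagnose, the thresholds acceptable_error and max_gap, and the example error values are assumptions made for the sketch, and suitable thresholds depend on the problem.

```python
# Remedies as listed on the slide, keyed by diagnosis.
REMEDIES = {
    "high bias": [
        "Train longer",
        "Train a more complex model",
        "Obtain more features",
        "Decrease regularization",
        "New model architecture",
    ],
    "high variance": [
        "Obtain more data",
        "Decrease number of features",
        "Increase regularization",
        "New model architecture",
    ],
    "good fit": [],
}


def diagnose(train_error, val_error, acceptable_error=0.10, max_gap=0.05):
    """Classify a model from its training and validation error rates.

    acceptable_error and max_gap are illustrative thresholds (assumed).
    """
    if train_error > acceptable_error:
        return "high bias"       # the model does not even fit the training data
    if val_error - train_error > max_gap:
        return "high variance"   # fits the training data but fails to generalize
    return "good fit"


underfit = diagnose(train_error=0.30, val_error=0.32)
overfit = diagnose(train_error=0.02, val_error=0.25)
balanced = diagnose(train_error=0.04, val_error=0.06)
suggestions = REMEDIES[overfit]
```

The check looks at the training error first, because a large validation error only signals high variance once the model is known to fit the training data.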
TAKE-AWAY POINTS
The two variants of Machine Learning algorithms are Supervised Learning and Unsupervised Learning
Machine Learning is used for Prediction, Decision-making & Generation
Regression techniques are used for Predictive & Descriptive Analysis
Decision trees are used for Classification
Q&A