Whichever ML course you choose, one of the first topics in its data science statistics module will be linear regression (LR) or, more precisely, Simple Linear Regression in Machine Learning. This widely used ML algorithm is commonly abbreviated as SLR.
In this blog, we’ll look at the foundations of Simple Linear Regression in Machine Learning and how it is used in ML modelling.
What is SLR in Machine Learning?
Simple Linear Regression in Machine Learning (SLR) is a technique for examining the relationship between two variables. One of them is the independent variable, also referred to as the ‘explanatory’, ‘stimulus’ or ‘predictor’ variable. The other is the dependent variable, also known as the ‘response’ or ‘outcome’ variable.
Why ‘simple’? The word “simple” refers to the fact that this regression method uses only two variables, one independent and one dependent. A straight line is used to model the regression and explain the association between the variables.
When you work on Machine Learning problems and want to arrive at useful outcomes, you often need to find the relationship between a pair of variables of the two types described above. This is where Simple Linear Regression in Machine Learning comes in.
What are the real-life applications of SLR algorithms?
If we sat down to list the real-life uses of SLR in ML, the list would be endless. Here are a few handy real-world examples.
- Suppose you have decided to train your company's employees in the basics of data analytics to improve your business outcomes. The amount you invest in this training is the independent variable, and the percentage ROI resulting from improved business decisions is the outcome variable.
- Suppose you plan to buy a second-hand car but find it difficult to set your budget based on car performance. To ensure performance and parts availability, you decide to consider cars only up to a certain age. In such a scenario, you can apply SLR to set your budget: the age of the car is the independent variable, while the budget is the outcome variable.
- Suppose you work in the marketing team of an e-commerce company. A few months back your company implemented new advertising strategies, and now you want to evaluate how monthly advertising cost relates to monthly sales. Here you can apply SLR for ML modeling.
SLR can help with many business problems, from moderate to complex. Just keep one thing in mind: make sure the linearity condition is satisfied.
What is the linearity condition in SLR?
SLR tries to explain the changes in the value of the dependent variable Y using knowledge of the values of the independent (predictor) variable X.
The expression α + β X_i gives the predicted value of Y_i for a given value of X_i. In other words, you can consider α + β X_i as the conditional expected value of Y_i given the value of X_i.
Here α and β are the linear regression coefficients: α is the intercept and β is the slope.
The most important thing to remember is that the linearity in linear regression refers to the regression coefficients, not to the explanatory variable. A model may be non-linear in X and still be a linear regression, as long as it is linear in α and β.
For example, both of the following are valid forms of SLR, where ε_i is the error term.
Y_i = α + β X_i^2 + ε_i
Y_i = α + β ln(X_i) + ε_i
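To make the linearity condition concrete, here is a minimal sketch in plain Python. It assumes a small made-up dataset that follows Y = 2 + 3 ln(X) exactly, with no error term; the helper name `fit_simple_ols` and the data are illustrative assumptions, not part of any library. Transforming X first and then fitting an ordinary straight line recovers the coefficients, because the model is still linear in α and β.

```python
import math


def fit_simple_ols(x, y):
    """Return (alpha, beta) that minimise the squared error of y = alpha + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


# Hypothetical data that follows Y = 2 + 3 * ln(X) exactly (no error term).
x_raw = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [2.0 + 3.0 * math.log(v) for v in x_raw]

# The model is non-linear in X but linear in alpha and beta,
# so we transform X and then fit an ordinary straight line.
x_log = [math.log(v) for v in x_raw]
alpha, beta = fit_simple_ols(x_log, y)
print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")
```

The same helper would work for the squared form by passing `[v ** 2 for v in x_raw]` instead of the logged values.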
What can simple linear regression tell us that correlation does not tell us?
Although correlation may seem similar to simple linear regression, there are important differences between the two.
Difference 1: Correlation quantifies the degree to which two variables are related. It does not fit a line through the data set, whereas simple linear regression does.
Difference 2: Correlation is typically used when both variables are measured. It is rarely appropriate when one variable is something that you deliberately control. With simple linear regression, on the other hand, X is often something that you manipulate or choose (for example time, a range of salary or price, etc.), and Y is something that you measure.
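The difference can be seen on a small example. The sketch below uses made-up monthly advertising cost and sales figures (an assumption for illustration only) and plain Python arithmetic. Correlation gives a single symmetric number, while regression gives a fitted line for Y on X that can be used to predict Y for a new X, and that line changes if the roles of X and Y are swapped.

```python
import math

# Hypothetical monthly figures: advertising cost (X) and sales (Y).
x = [10.0, 20.0, 30.0, 40.0, 50.0]
y = [25.0, 41.0, 62.0, 79.0, 103.0]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Correlation: one symmetric, unit-free number between -1 and 1.
r = sxy / math.sqrt(sxx * syy)

# Regression of Y on X: a fitted line that can be used for prediction.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Regression of X on Y gives a different slope, although r is unchanged.
slope_x_on_y = sxy / syy

# Prediction for a cost of 60, which is not in the data.
predicted_sales = intercept + slope * 60.0

print(f"r = {r:.4f}")
print(f"Y on X: slope = {slope:.4f}, intercept = {intercept:.4f}")
print(f"X on Y: slope = {slope_x_on_y:.4f}")
print(f"predicted Y at X = 60: {predicted_sales:.2f}")
```

Note that the product of the two slopes equals r squared, which is one way to see how the two ideas are connected without being the same thing.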
How does SLR work?
To use SLR to solve your identified problem, you need to follow a seven-step mathematical process, as follows.
Step#1: Visualise the relationship between the identified variables graphically. The standard type of graph used in SLR is a scatter plot.
Step#2: Use the OLS (ordinary least squares) technique to estimate the regression parameters and define the functional form of the relationship between the variables.
Step#3: Calculate the standard error of the regression estimate.
Step#4: Calculate prediction intervals for a given value of X, based on the assumption that the errors are normally distributed.
Step#5: Test the significance of the regression parameters obtained.
Step#6: Test the goodness of fit of the model as a whole. Keep in mind that in SLR the p-value of the F-test and the p-value of the t-test on the slope coefficient are identical.
Step#7: Compute the coefficient of determination and the correlation coefficient.
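The numerical steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the car age and price figures are made up, and step 4 (prediction intervals) and the p-values themselves are left out because they need t-distribution quantiles, which the standard library does not provide. The sketch does show why the F-test and the slope test agree in SLR: the F statistic equals the square of the slope's t statistic.

```python
import math

# Hypothetical sample: X = age of a car in years, Y = price in thousands.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [19.0, 17.5, 15.0, 14.5, 12.0, 11.0]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Step 2: OLS estimates of the slope and intercept.
beta = sxy / sxx
alpha = mean_y - beta * mean_x

# Step 3: standard error of the regression (n - 2 degrees of freedom).
residuals = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)
std_error = math.sqrt(sse / (n - 2))

# Step 5: t statistic for the slope coefficient.
t_slope = beta / (std_error / math.sqrt(sxx))

# Step 6: F statistic for the whole model (1 and n - 2 degrees of freedom).
ssr = syy - sse
f_stat = ssr / (sse / (n - 2))

# Step 7: coefficient of determination and correlation coefficient.
r_squared = ssr / syy
r = sxy / math.sqrt(sxx * syy)

print(f"alpha = {alpha:.4f}, beta = {beta:.4f}, std error = {std_error:.4f}")
print(f"t^2 = {t_slope ** 2:.4f}, F = {f_stat:.4f}")
print(f"R^2 = {r_squared:.4f}, r = {r:.4f}")
```

In practice a statistics library would normally report all of these values, including the p-values and prediction intervals, in one call.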
Why use a scatter diagram in SLR?
When you choose SLR as your regression model, the first thing you need to do is assess the relationship between your identified variables.
For this, the best graphical visualization is the scatter plot. The reasons for choosing the scatter plot are as follows.
- Apart from the best-fit line, the dots (data points of the identified variables) help a lot in visualizing any hidden pattern in the relationship between the variables.
- If the variables prove to be related, the equation for the identified relationship can be estimated. With the help of this estimated equation, you can then proceed with your ML algorithm modeling.
When simple linear regression is applied to a business problem, the scatter plot of the identified variables usually takes one of the following six forms:
Fig:1
The above plot indicates a direct (positive) linear relationship between the two types of variables (dependent and independent).
Fig:2
The above plot indicates a direct but curvilinear relationship between the two types of variables.
Fig:3
The above plot indicates an inverse (negative) linear relationship between the two types of variables.
Fig:4
The above plot indicates an inverse and curvilinear relationship between the two types of variables.
Fig:5
The above plot again indicates a linear relationship between the two types of variables but, unlike figure 3, the points are scattered much more widely around the line.
Fig:6
The above plot indicates a non-linear relationship between the variables.
How to calculate the SLR in ML modeling?
An ML model using SLR can be built in either Python or R. Here, I will explain the Python variant.
To program an SLR model in Python, six main steps have to be followed carefully. They are as follows.
- #1: Dataset Importing
- #2: Data Pre-processing
- #3: Segregation of the train and test sets
- #4: Fitting the linear regression model to the training dataset
- #5: Predicting the test set results
- #6: Visualising the results
When programming in Python, steps 1 to 5 remain almost the same from problem to problem. Step 6, however, changes a bit depending on which type of graph or chart you use. The generic Python code for SLR is as follows.
Dataset Importing
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
dataset = pd.read_csv('file name.csv')
dataset.head()
Data pre-processing
X = dataset.iloc[:, :-1].values  # X is the array of independent variable values (every column except the last)
Y = dataset.iloc[:, -1].values  # Y is the vector of the dependent variable (the last column)
Segregation of the train and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=1/3, random_state=0)  # one third of the rows is held out for testing; 20-80 and 30-70 splits are also common
Fitting the linear regression model to the training dataset
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, Y_train)  # This step estimates the linear equation (intercept and slope) from the training data.
Predicting the test set results
y_pred = regressor.predict(X_test)
y_pred  # predicted values for the test set
Y_test  # actual values of the test set, for comparison
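The code above stops at step 5. As a rough sketch of step 6 (visualising the results), the block below draws a scatter plot with the fitted line using matplotlib. It is self-contained and therefore makes several assumptions that are not from the steps above: the data is synthetic, the train and test sets are split by position rather than with train_test_split, `np.polyfit` stands in for the fitted regressor, and the output file name `slr_fit.png` is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in data (assumption): Y is roughly 5 + 2 * X plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=60)
Y = 5.0 + 2.0 * X + rng.normal(0, 1.5, size=60)

# Hold out the last third as a test set, mirroring test_size=1/3.
X_train, X_test = X[:40], X[40:]
Y_train, Y_test = Y[:40], Y[40:]

# Degree-1 least-squares fit; polyfit returns the slope first, then the intercept.
slope, intercept = np.polyfit(X_train, Y_train, deg=1)

# Evaluate the fitted line across the observed range of X.
line_x = np.linspace(X.min(), X.max(), 100)
line_y = intercept + slope * line_x

fig, ax = plt.subplots()
ax.scatter(X_train, Y_train, label="training data", alpha=0.6)
ax.scatter(X_test, Y_test, label="test data", marker="x")
ax.plot(line_x, line_y, color="black", label="fitted line")
ax.set_xlabel("X (independent variable)")
ax.set_ylabel("Y (dependent variable)")
ax.set_title("Simple linear regression fit")
ax.legend()
fig.savefig("slr_fit.png")
```

With the sklearn code from the earlier steps, the line could instead be drawn from `regressor.predict(X_train)`, and in an interactive session `plt.show()` would replace the `savefig` call.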
Where to learn SLR?
If you want to learn more about the application of SLR in ML, you can join the IBM certified Learnbay Data Science and AI certification courses. The data science course syllabus of Learnbay offers balanced coverage of both statistics and programming, the two key pillars of data science career growth. Our AI and ML courses are available for both fresh graduates and working professionals. All of our courses include real-time industrial projects and live online classes. Our courses are available in all the prime cities across India, such as Mumbai, Kolkata, Bengaluru, Hyderabad, Delhi, Lucknow, and Patna.