Gradient Descent for Machine Learning

By Admin · Published in Data Science · 5-7 min read

Gradient descent is an optimization algorithm that finds the values of the parameters (coefficients) of a function f that minimize a cost function. It is used when the parameters cannot be calculated analytically with linear algebra. Given a cost function J(w) parameterized by a model parameter w, the gradient tells us the slope of the cost function at the current value of w, and to minimize the cost we move in the direction opposite to the gradient. Gradient descent also scales well to large datasets.

Linear regression fits a straight line to the data, and logistic regression fits a squiggle; gradient descent is the procedure that optimizes the fit in both of these cases and many more.

For example, we have a simple dataset with Weight on the x-axis and Height on the y-axis:

(x1, y1) = (0.4, 1.3)

(x2, y2) = (1.2, 1.6)

(x3, y3) = (2, 3.1)

When we fit a line with linear regression, we optimize the intercept and the slope:

Height = intercept + slope*Weight (simple line equation)

To keep the example simple, we fix the slope at its least-squares estimate of 0.64 and use gradient descent to find only the intercept:

Height = intercept + slope*Weight

Height = intercept + 0.64*Weight

The very first step is to pick a random value for the intercept. This initial guess gives gradient descent something to improve upon. Starting with an intercept of 0, the predicted height for the first point is:

Predicted Height = 0 + 0.64*0.4

Predicted Height = 0.26

Residual = Observed – Predicted

1.3 – 0.26 = 1.04

Similarly, we calculate the residuals for the other two points, so the three residuals are 1.04, 0.83, and 1.82.

Now we calculate the Sum of Squared Residuals = (1.04)^2 + (0.83)^2 + (1.82)^2 ≈ 5.09
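
The residual and SSR calculation above can be sketched in a few lines of Python (the variable names are ours, not from the article):

```python
# Example dataset from the article: (weight, height) pairs.
weights = [0.4, 1.2, 2.0]
heights = [1.3, 1.6, 3.1]
intercept, slope = 0.0, 0.64  # initial intercept guess; slope fixed at 0.64

# Predicted height for each point, then residual = observed - predicted.
predicted = [intercept + slope * w for w in weights]
residuals = [h - p for h, p in zip(heights, predicted)]
ssr = sum(r ** 2 for r in residuals)  # sum of squared residuals

print([round(r, 2) for r in residuals])  # [1.04, 0.83, 1.82]
print(round(ssr, 2))                     # 5.09
```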

If we plot this value, 5.09, on the y-axis against an intercept of 0 on the x-axis, we get one point on a curve: the sum of squared residuals as a function of the intercept. If we increase the intercept to 0.5, the sum of squared residuals comes down on the graph. We could trace the whole curve by trying many intercept values, but gradient descent finds the optimal value with only a few calculations, concentrating those calculations near the optimum.

It takes big steps when the current value is far from the optimum and baby steps when it is close.

The sum of squared residuals, viewed as a function of the intercept, is a curve, so we can take the derivative of this function and determine the slope at any value of the intercept.

d/d intercept SSR = d/d intercept (1.3 – (intercept + 0.64*0.4))^2

+ d/d intercept (1.6 – (intercept + 0.64*1.2))^2

+ d/d intercept (3.1 – (intercept + 0.64*2))^2

We apply the chain rule to evaluate the derivative:

d/d intercept = 2(1.3 – (intercept + 0.64*0.4))(-1), so that we have

d/d intercept = -2(1.3 – (intercept + 0.64*0.4))

– 2(1.6 – (intercept + 0.64*1.2))

– 2(3.1 – (intercept + 0.64*2))
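
The chain-rule derivative can be written as a small Python function (a sketch using the article's data; the function name is ours):

```python
weights = [0.4, 1.2, 2.0]
heights = [1.3, 1.6, 3.1]
slope = 0.64  # fixed least-squares slope from the article

def d_ssr_d_intercept(intercept):
    # Chain rule: each squared residual contributes -2 * (observed - predicted).
    return sum(-2 * (h - (intercept + slope * w))
               for w, h in zip(weights, heights))

print(round(d_ssr_d_intercept(0.0), 2))  # -7.39
```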

Now that we have the derivative, we can use it to find where the sum of squared residuals is lowest. The least-squares method finds the optimal intercept analytically, by solving for the value where the slope is 0. Gradient descent instead finds the minimum by taking steps from the initial guess until it reaches the best value.

This makes gradient descent very useful when it is not possible to solve directly for where the derivative is 0. The closer we get to the optimal intercept, the closer the slope of the curve gets to 0.

When the slope is near 0 we are close to the optimal value, so we take baby steps; when the slope is far from 0 we are far from the optimal value, so we take big steps.

d/d intercept = -2(1.3 – (intercept + 0.64*0.4))

– 2(1.6 – (intercept + 0.64*1.2))

– 2(3.1 – (intercept + 0.64*2))

Plugging in our initial guess of 0 for the intercept:

d/d intercept = -2(1.3 – (0 + 0.64*0.4))

– 2(1.6 – (0 + 0.64*1.2))

– 2(3.1 – (0 + 0.64*2)) = -7.39

Step size = slope * learning rate

With a learning rate of 0.1:

Step size = -7.39*0.1 = -0.74

New parameter = Old parameter – step size

New intercept = 0 – (-0.74) = 0.74
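
One update step looks like this in code (a learning rate of 0.1 is assumed here purely for illustration):

```python
derivative = -7.39   # slope of the SSR curve at intercept = 0 (computed above)
learning_rate = 0.1  # assumed value for this example; any small positive number works

step_size = derivative * learning_rate  # step is large because the slope is steep
new_intercept = 0.0 - step_size         # old parameter minus step size
print(round(new_intercept, 2))  # 0.74
```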

What is the learning rate?

In Machine Learning and statistics, the learning rate is a tuning parameter in an optimization algorithm that determines the step size at each iteration while moving toward a minimum of a loss function.
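
To see the effect of the learning rate, here is a small demo (the two rates, 0.01 and 0.1, are chosen for illustration) that takes 10 gradient descent steps on this article's SSR curve with each rate:

```python
weights = [0.4, 1.2, 2.0]
heights = [1.3, 1.6, 3.1]
slope = 0.64

def grad(intercept):
    # Derivative of the SSR with respect to the intercept (chain rule).
    return sum(-2 * (h - (intercept + slope * w))
               for w, h in zip(weights, heights))

results = {}
for lr in (0.01, 0.1):
    intercept = 0.0
    for _ in range(10):
        intercept -= lr * grad(intercept)  # step = slope * learning rate
    results[lr] = round(intercept, 2)
    print(lr, results[lr])
# lr = 0.01 ends at 0.57, still far from the optimum;
# lr = 0.1 ends at 1.23, essentially the optimal intercept for this curve.
```

A rate that is too small makes progress painfully slow; a rate that is too large can overshoot the minimum, so it is a tuning parameter in practice.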

Gradient descent follows these steps:

  1. Take the derivative of the loss function.
  2. Pick random values for the parameters.
  3. Plug the parameters into the derivative.
  4. Calculate the step size: Step size = slope * learning rate
  5. Calculate the new parameters: New parameters = Old parameters – step size

Repeat steps 3-5 until the step size is very small or a maximum number of steps is reached.
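
The steps above can be put together into a complete loop (a sketch using this article's example; the stopping threshold and learning rate are assumed values):

```python
weights = [0.4, 1.2, 2.0]
heights = [1.3, 1.6, 3.1]
slope = 0.64          # fixed least-squares slope
learning_rate = 0.1   # assumed for illustration

intercept = 0.0  # step 2: initial guess for the parameter
for _ in range(1000):
    # Step 1 + 3: plug the current parameter into the derivative of the SSR.
    derivative = sum(-2 * (h - (intercept + slope * w))
                     for w, h in zip(weights, heights))
    step_size = derivative * learning_rate  # step 4
    intercept -= step_size                  # step 5
    if abs(step_size) < 0.001:              # stop when steps become tiny
        break

print(round(intercept, 2))  # 1.23 — the intercept with the lowest SSR
```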

Learnbay provides industry-accredited data science courses in Bangalore. We understand how technology is applied in the field of data science, so we offer courses covering Machine Learning, TensorFlow, IBM Watson, Google Cloud Platform, Tableau, Hadoop, time series, R, and Python, with authentic real-time industry projects. Students become industry-ready by being certified by IBM, and hundreds of students have been placed in promising companies in data science roles. By choosing Learnbay you can reach one of the most aspirational jobs of the present and the future.

Learnbay's data science course covers Data Science with Python, Artificial Intelligence with Python, and Deep Learning using TensorFlow. These topics are covered and co-developed with IBM.


#Data Science