The objective of this post is to explain the steps I took to implement univariate linear regression in Python. Do note that I'm not using libraries with inbuilt ML models such as `sklearn` or `scipy` here.

Here is our gradient descent function, utilising mean squared error as the cost function.

```python
def gradientDescent(theta, alpha, iterations):
    m = ex1data.shape[0]  # finding the number of training examples
    n = ex1data.shape[1]  # finding the number of features + 1
    for iteration in range(iterations):
        total0 = 0
        total1 = 0
        for row in range(m):  # iterating over each training example
            hypothesis = 0
            for val in range(n - 1):
                hypothesis += ex1data.values[row][val] * theta[val][0]
            load = hypothesis - ex1data.values[row][n - 1]  # prediction error
            total0 += load * ex1data.values[row][0]
            total1 += load * ex1data.values[row][1]
        temp0 = theta[0][0] - ((alpha * total0) / m)
        temp1 = theta[1][0] - ((alpha * total1) / m)
        theta = [[round(temp0, 4)], [round(temp1, 4)]]  # simultaneous update
    return theta
```

We carry out gradient descent 1500 times here, by setting `iterations` equal to 1500. Our starting values of theta are 0. Now, for those of you who don't know how gradient descent works, here's a short explanation that attempts to cover the crux of it. Intuitively, for each training example we take the output given by the hypothesis function, subtract the target value we're trying to predict from it, and square the difference; averaging these squared errors over all examples gives us the mean squared error cost. The gradient descent update rule subtracts the partial derivative of this cost (beyond us mortals for now) from the existing values of theta, updating them. This entire process repeats 1500 times until gradient descent converges to an optimal value of theta, the minimum point of the cost function.
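The same batch update rule can be sketched in a vectorised form with NumPy. This is my own illustrative version, not the post's code: the names `X`, `y`, and `gradient_descent` are assumptions, with `X` standing for the m-by-2 feature matrix (first column all 1s) and `y` for the target vector.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, iterations=1500):
    m = X.shape[0]
    theta = np.zeros(X.shape[1])
    for _ in range(iterations):
        error = X @ theta - y                 # h(x) - y for every example at once
        theta -= (alpha / m) * (X.T @ error)  # simultaneous update of both thetas
    return theta
```

On synthetic data generated from y = 2 + 3x, this recovers parameters close to [2, 3], which is a quick sanity check that the update rule is doing what the explanation above describes.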

Now, we plot our data to see what it looks like.

```python
m = ex1data.shape[0]
print('No. of training examples --> {}'.format(m))  # outputting the number of training examples for the user

eye = []
for i in range(0, m):
    eye.append(1)  # creating an array of 1s to add to X

if len(ex1data.columns) == 2:  # to avoid an error wherein ex1data already has the column of 1s
    ex1data.insert(0, "feature1", eye)

print('here is theta (initial)')
theta = [[0], [0]]
matrix_print(theta)
```
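The intercept-column step above can be tried in isolation. The tiny DataFrame below is a hypothetical stand-in for `ex1data` (the real dataset comes from the course files), just to show what `insert` does to the column layout:

```python
import pandas as pd

# Hypothetical two-column dataset standing in for ex1data: one feature, one target
ex1data = pd.DataFrame({"population": [6.1, 5.5, 8.5], "profit": [17.6, 9.1, 13.7]})

if len(ex1data.columns) == 2:  # skip if the 1s column has already been inserted
    ex1data.insert(0, "feature1", [1] * len(ex1data))

print(ex1data.columns.tolist())  # feature1 is now the leftmost column
```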

Now, we first add a column vector consisting entirely of 1s, of dimensions m by 1, to our feature matrix. Hence, we have an m by 2 feature matrix, where one column holds our variable data and the other is the column vector of 1s. We then initialise both values of theta to 0. Our learning rate, `alpha`, will be set to 0.01 for gradient descent, and we will execute gradient descent 1500 times. After running gradient descent and plotting our predicted values against the actual dataset, this is what we get:

Pretty cool, right?
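For completeness, the prediction step behind that plot is just the hypothesis h(x) = theta0 + theta1 · x evaluated at each x. The theta values below are placeholders for illustration, not the actual output of the run above:

```python
# Evaluate the univariate hypothesis for a theta in the same nested-list
# shape the gradientDescent function returns.
def predict(theta, x):
    return theta[0][0] + theta[1][0] * x

theta = [[-3.6], [1.17]]  # hypothetical learned parameters
print(predict(theta, 7.0))
```

Calling `predict` over the full range of x values gives the straight line drawn through the scatter of training examples.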

This entire example was based on solving the Week 1 problem set of Andrew Ng’s machine learning course on coursera.org through Python. So credits to Stanford University. Stay quarantined, stay safe!