Understanding Internals of Linear Regression

As I discussed in one of my previous posts, regression is a type of supervised machine learning in which, given a set of input features, the underlying algorithm returns a continuous-valued output. Let’s take an example to understand it better:

In the above data-set, we provide a number of independent features such as house size, number of bedrooms, number of floors, number of bathrooms and age of the house. The rightmost column is the actual price at which the house was sold. This is generally called the target or output. When a similar data-set is provided to an appropriate learning algorithm, we would expect the algorithm, once the learning phase is complete, to predict the expected selling price of a house with similar features. One such typical requirement is shown in the last row of the above data-set. As you can imagine, depending upon the values of input features like house size, bedrooms, age etc., the price of the house can be anywhere between 100K USD and 1 million USD. That is why it is also called a continuous-valued output.

Some common use-cases of regression are predicting the population of a city, the Sensex index, stock inventory and many more. In fact, regression is one of the earliest forms of statistical analysis and also one of the most widely used machine learning techniques.

In this blog, we will try to implement the simplest form of regression, referred to as Linear Regression. For any machine learning algorithm, a typical work-flow looks like:

We provide a training data-set to a given algorithm. It uses a hypothesis, which is essentially a mathematical formula, to derive inferences about the data-set. Once the hypothesis has been formulated, we say that the algorithm has been trained. When we then provide the algorithm with a set of input features, it returns an output based upon its hypothesis. In the above scenario, we would provide details about house size, number of bedrooms, number of floors, number of bathrooms and age to the algorithm, and it in turn would respond with the expected price the house can fetch in the market.

Let’s see what can be the hypothesis / formula for our scenario. For easier understanding, let’s simplify the context a bit and think about having only house size as a single input feature against which we need to predict house prices. If we plot both these variables on a graph, it would look something like:

If we can draw a straight line across the points, then we should be able to predict the values of houses when different sizes are provided as inputs. Let’s take a look at different lines we can draw on the above graph:

The line which touches or is quite close to as many points as possible on the graph is most likely to provide us the best predictions, while the line which touches the fewest points or is far from the majority of them is likely to provide the worst predictions. In our case, line h1 appears to be the best fit. The ability to draw such a straight line across a given data-set is in essence the goal of any Linear Regression algorithm. As we are attempting to draw a straight line, we call it a Linear Regression solution. In mathematical terms, we can formulate our hypothesis as follows:

h_{\theta}(x) = \theta_{0} + \theta_{1}x

where \theta_{0} is the constant term (the intercept of the line) and \theta_{1} is the value we multiply by a given house size (the slope of the line). The goal is to find values of \theta_{0} and \theta_{1} such that the difference between the actual values provided in our training data-set and the predicted values is minimum. Again, if we have to represent it in mathematical terms, the formula would be:
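As a rough illustration of the hypothesis, here is a minimal Python sketch. The function name `hypothesis` and the parameter values are made up for this example and do not come from any real data-set; they only show how a prediction is computed from \theta_{0}, \theta_{1} and a house size.

```python
def hypothesis(theta0, theta1, x):
    """Predict an output for input x using the straight line theta0 + theta1 * x."""
    return theta0 + theta1 * x


# Hypothetical parameters (assumed for illustration only):
# a constant term of 50 and a slope of 0.125 per unit of house size.
theta0 = 50.0
theta1 = 0.125

# Predicted price for a house of size 2000 under these assumed parameters.
predicted_price = hypothesis(theta0, theta1, 2000)
print(predicted_price)  # 300.0
```

Changing \theta_{0} shifts the whole line up or down, while changing \theta_{1} changes how quickly the predicted price grows with house size.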

J(\theta_{0}, \theta_{1}) = \dfrac {1}{2m} \sum \limits_{i=1}^{m} (h_{\theta} (x^{(i)}) - y^{(i)})^2

where \theta_{0} and \theta_{1} are the chosen parameters, m is the number of rows in our training data-set, h_{\theta}(x^{(i)}) is the predicted value for the ith element in our training data-set and y^{(i)} is the actual target value of the ith element. In machine learning terminology, this function is popularly referred to as the Cost Function or the Squared Error Function. Let’s take a couple of examples to understand how it works:
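One way to see the cost function at work is the small Python sketch below. The three training points are invented for illustration (they lie exactly on the line y = x), and the helper names `hypothesis` and `cost` are assumptions of this sketch, not part of any library.

```python
def hypothesis(theta0, theta1, x):
    """Predicted value for input x: theta0 + theta1 * x."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Squared error cost J(theta0, theta1) = 1/(2m) * sum((h(x_i) - y_i)^2)."""
    m = len(xs)
    squared_errors = [
        (hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys)
    ]
    return sum(squared_errors) / (2 * m)


# Made-up training data for illustration: every point lies on the line y = x.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

# A line that passes through every point has zero cost.
print(cost(0.0, 1.0, xs, ys))  # 0.0

# A flatter line misses the points, so its cost is higher:
# (0.25 + 1.0 + 2.25) / (2 * 3) = 3.5 / 6, roughly 0.583.
print(cost(0.0, 0.5, xs, ys))
```

The candidate line with the lower cost is the better fit, which is the same idea as picking the line that stays closest to the plotted points.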


Indic Threads Conference comes to Delhi

IndicThreads has been organizing independent technology conferences in Pune since 2006. They have conducted several very successful, high-quality conferences in the areas of Java, cloud computing, mobile development and others. A complete listing of their conferences is available here.

I have spoken at IndicThreads conferences on a couple of occasions, in 2009 and 2010. After having thoroughly enjoyed the experience, I talked several times with Harshad Oak and Sangeeta Oak, the people behind Indic Threads, about the need for such a conference in the Delhi area as well. At the start of 2012, the idea finally materialized and they decided to plan an Indic Threads conference in Gurgaon. This was followed by several discussions about a suitable venue, partners, spreading the word, speakers and all the logistics needed for an impactful inaugural edition of the conference.

With the track record of Indic Threads and our local contacts, we managed to put together an impressive list of speakers covering a wide array of topics. We were able to find an appropriate venue in the form of Fortune Select Hotel in Gurgaon.

Finally, everything came together on Friday, 13th July, when the conference started at 9:00 AM with more than 70 participants joining in. Proceedings were kick-started by an inaugural address from Sanket Atal, CTO of MakeMyTrip.com, about how to become a 10x software engineer. Considering that it was a technology conference for and by software developers, I felt it was a very apt way of getting started. The next 2 days were full of interesting talks on the wide range of domains we as software developers are expected to work on. These included the JavaEE7 platform, NoSQL databases, mobile applications, Scala, Hadoop and Node.js.

I also presented my experiences on Machine Learning. The slides for my talk are embedded below:
