Regression: Regression means fitting a curve to the given data such that the error is as small as possible. The method used here is called the Least Squares Method, which minimizes the sum of the squared errors.
Least Squares Method to fit a Straight Line
Let the input and output data be given as pairs \( \left(x_i , y_i \right)\) for \( i = 1, 2, 3, \ldots, N\), where \(N\) denotes the number of data pairs.
Let the straight line fitted to the data be given as
\[y=\alpha+\beta x \hspace{8.0 cm}(1.1)\]
where the values of the unknowns \(\alpha\) and \(\beta\) are to be determined.
Writing eqn. (1.1) for each data pair \( \left(x_i , y_i \right)\) and summing from \(i = 1 \) to \(N\) on both sides
\[\Rightarrow \sum\limits_{i = 1}^N {y_i} =\alpha\sum\limits_{i = 1}^N {1} +\beta \sum\limits_{i = 1}^N {x_i} \]
Since \(\sum\limits_{i = 1}^N {1} = N\)
\[\Rightarrow \sum\limits_{i = 1}^N {y_i} =\alpha N +\beta\sum\limits_{i = 1}^N {x_i} \hspace{8.0 cm}(1.2)\]
Multiplying eqn. (1.1) by \(x\)
\[xy=\alpha x+\beta x^2 \]
and again summing over the data pairs from \(i = 1 \) to \(N\) on both sides
\[\Rightarrow \sum\limits_{i = 1}^N {x_i y_i} =\alpha\sum\limits_{i = 1}^N {x_i} +\beta\sum\limits_{i = 1}^N {x_i^2} \hspace{8.0 cm}(1.3)\]
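Equations (1.2) and (1.3) are the normal equations of the least-squares problem. As a brief sketch of where they come from (the symbol \(S\) for the sum of squared errors is introduced here for illustration only), minimizing
\[S(\alpha,\beta)=\sum\limits_{i = 1}^N {\left( y_i-\alpha-\beta x_i \right)^2}\]
requires both partial derivatives to vanish:
\[\frac{\partial S}{\partial \alpha}=-2\sum\limits_{i = 1}^N {\left( y_i-\alpha-\beta x_i \right)}=0 , \qquad \frac{\partial S}{\partial \beta}=-2\sum\limits_{i = 1}^N {x_i\left( y_i-\alpha-\beta x_i \right)}=0\]
Dividing by \(-2\) and rearranging, the first condition gives eqn. (1.2) and the second gives eqn. (1.3).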
Writing eqns. \((1.2)\) and \((1.3)\) in matrix form
\[\left( \begin{array}{cc}
N & \sum\limits_{i = 1}^N {x_i} \\
\sum\limits_{i = 1}^N {x_i} & \sum\limits_{i = 1}^N {x_i^2}
\end{array} \right)\left( \begin{array}{l}
\alpha \\
\beta
\end{array} \right) = \left( \begin{array}{l}
\sum\limits_{i = 1}^N {y_i} \\
\sum\limits_{i = 1}^N {x_i y_i}
\end{array} \right)\hspace{5.0 cm}(1.4)\]
Solving this system for the unknowns \(\alpha\) and \(\beta\), we get
\[\alpha = \frac{\sum\limits_{i = 1}^N {y_i} \sum\limits_{i = 1}^N {x_i^2} - \sum\limits_{i = 1}^N {x_i} \sum\limits_{i = 1}^N {x_i y_i}}{N\sum\limits_{i = 1}^N {x_i^2} - \left( \sum\limits_{i = 1}^N {x_i} \right)^2}\]
\[\beta = \frac{N\sum\limits_{i = 1}^N {x_i y_i} - \sum\limits_{i = 1}^N {x_i} \sum\limits_{i = 1}^N {y_i}}{N\sum\limits_{i = 1}^N {x_i^2} - \left( \sum\limits_{i = 1}^N {x_i} \right)^2}\]
Substituting these values of \(\alpha\) and \(\beta\) into eqn. \((1.1)\) gives the straight line fitted to the data.
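The closed-form expressions for \(\alpha\) and \(\beta\) translate almost directly into code. The following is a minimal Python sketch, not a reference implementation; the function name `fit_line` and the sample points are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Return (alpha, beta) of the least-squares line y = alpha + beta*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sx = sum(xs)                               # sum of x_i
    sy = sum(ys)                               # sum of y_i
    sxx = sum(x * x for x in xs)               # sum of x_i^2
    sxy = sum(x * y for x, y in zip(xs, ys))   # sum of x_i * y_i
    denom = n * sxx - sx ** 2                  # common denominator of both formulas
    if denom == 0:
        raise ValueError("all x values are equal; the line is not determined")
    alpha = (sy * sxx - sx * sxy) / denom
    beta = (n * sxy - sx * sy) / denom
    return alpha, beta


# Hypothetical points lying exactly on y = 1 + 2x.
alpha, beta = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(alpha, beta)  # 1.0 2.0
```

The check on `denom` guards the one case in which the formulas break down: when every \(x_i\) is the same, the denominator is zero and no unique line exists.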
Example (1)
Use the least-squares method to fit a straight line to the data points \((0, 1), (1, 2), (3,4), (7,5),(9,8)\).
Solution:
Suppose the linear model is of the form
\[y=\alpha+\beta x \hspace{12.0cm}(1)\]
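Calculating the sums required by eqns. (1.2) and (1.3) from the given data, where the number of data pairs is \( N=5 \):
\[\sum\limits_{i = 1}^5 {x_i}= 20 , \sum\limits_{i = 1}^5 {y_i} = 20, \sum\limits_{i = 1}^5 {x_i^2} = 140 ,\sum\limits_{i = 1}^5 {x_iy_i}= 121\]
Substituting these values into eqns. (1.2) and (1.3)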
\[\Rightarrow 20=5\alpha+20\beta \hspace{11.0cm}(2)\]
\[\Rightarrow 121=20\alpha+140\beta \hspace{10.0cm} (3)\]
Solving eqns. (2) and (3)
\[\Rightarrow \alpha=1.2667 , \beta= 0.6833\]
Substituting into eqn. (1), we obtain the fitted straight line
\[ y =1.2667+0.6833 x \]
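The arithmetic of this example can be reproduced in a few lines. This is a sketch in Python that computes the four sums from the five data points and then applies Cramer's rule to the resulting \(2 \times 2\) system; the variable names are chosen here for illustration.

```python
points = [(0, 1), (1, 2), (3, 4), (7, 5), (9, 8)]

n = len(points)
sx = sum(x for x, _ in points)        # 20
sy = sum(y for _, y in points)        # 20
sxx = sum(x * x for x, _ in points)   # 140
sxy = sum(x * y for x, y in points)   # 121

# Cramer's rule on  n*alpha + sx*beta = sy  and  sx*alpha + sxx*beta = sxy
det = n * sxx - sx * sx               # 300
alpha = (sy * sxx - sx * sxy) / det   # 380 / 300
beta = (n * sxy - sx * sy) / det      # 205 / 300
print(round(alpha, 4), round(beta, 4))  # 1.2667 0.6833
```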
Example (2)
Fit a linear curve to the data given in the table
Solution:
Let the linear fit to the given data be
\[y=\alpha+\beta x \hspace{8.0 cm}(1)\]
The number of data pairs is \( N = 8\).
The sums required by the matrix form \((1.4)\) are
\[\Rightarrow \sum\limits_{i = 1}^8 {x_i}= 36 , \sum\limits_{i = 1}^8 {y_i} = 69.31, \sum\limits_{i = 1}^8 {x_i^2} = 204 ,\sum\limits_{i = 1}^8 {x_iy_i}= 396.47\]
Substituting into eqn. (1.4)
\[\left( \begin{array}{cc}
8 & 36 \\
36 & 204
\end{array} \right)\left( \begin{array}{l}
\alpha \\
\beta
\end{array} \right) = \left( \begin{array}{l}
69.31 \\
396.47
\end{array} \right)\]
Solving this system of linear equations
\[\Rightarrow \alpha =- 0.3979 , \beta = 2.0137 \]
Substituting into eqn. \((1)\)
\[y=- 0.3979+2.0137 x \]
Plot of data pairs and fitted straight line
Example (3)
The table gives sample data for a product: its advertising cost and the corresponding sales profit. For the relationship between advertising cost and sales profit, find the slope of the least-squares regression line, then write the best-fit linear model.
Solution:
The size of the data set is \(N =9\).
Let the relationship between advertising cost and sales profit be \[y=\alpha+\beta x \hspace{8.0 cm}(1)\]
where \(\beta\) is the slope of the line.
Table of the required data
Substituting into the formulas for the coefficients of the regression line
\[\alpha = \frac{3191.5 \times 2025029250-45480 \times 12215408 }{9 \times 2025029250-45480^2} = 365.6239\]
\[\beta =\frac{9 \times 12215408-45480 \times 3191.5 }{9 \times 2025029250-45480^2} =-0.00218 \]
Thus the least-squares line is \( y =365.6239-0.00218x \)
Its slope is \(\beta =-0.00218\)
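When only the summary sums are available, as in this example, the coefficient formulas can be evaluated directly from them. The sketch below uses the four sums that appear in the fractions above; the variable names are assumptions for illustration. The printed values should agree with the worked result up to rounding in the last digit.

```python
n = 9
sum_x, sum_y = 45480, 3191.5            # sums of x_i and y_i
sum_xx, sum_xy = 2025029250, 12215408   # sums of x_i^2 and x_i*y_i

denom = n * sum_xx - sum_x ** 2
alpha = (sum_y * sum_xx - sum_x * sum_xy) / denom
beta = (n * sum_xy - sum_x * sum_y) / denom

print(f"alpha = {alpha:.4f}, beta = {beta:.6f}")
```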
Least Squares Method to fit a second degree polynomial
Let the given data pairs be \( \left(x_i , y_i \right)\) for \( i = 1, 2, 3, \ldots, N\).
To fit a polynomial of degree \(2\) of the form
\[y=\alpha+\beta x +\gamma x^2 \hspace{8.0 cm}(2.1)\]
we need to calculate the values of the coefficients \(\alpha , \beta , \gamma\).
Writing eqn. (2.1) for each data pair and summing from \(i = 1 \) to \(N\)
\[\Rightarrow \sum\limits_{i = 1}^N {y_i} =\alpha\sum\limits_{i = 1}^N {1} +\beta \sum\limits_{i = 1}^N {x_i} +\gamma \sum\limits_{i = 1}^N {x_i^2}\]
we have \(\sum\limits_{i = 1}^N {1} = N\)
\[\Rightarrow \sum\limits_{i = 1}^N {y_i} =\alpha N +\beta\sum\limits_{i = 1}^N {x_i}+\gamma \sum\limits_{i = 1}^N {x_i^2} \hspace{8.0 cm}(2.2)\]
Multiplying eqn. (2.1) by \(x\)
\[xy=\alpha x+\beta x^2+\gamma x^3 \]
and summing from \(i = 1 \) to \(N\)
\[\Rightarrow \sum\limits_{i = 1}^N {x_i y_i} =\alpha\sum\limits_{i = 1}^N {x_i} +\beta\sum\limits_{i = 1}^N {x_i^2} +\gamma \sum\limits_{i = 1}^N {x_i^3} \hspace{6.0 cm}(2.3)\]
Now multiplying eqn. (2.1) by \(x^2\)
\[x^2y=\alpha x^2+\beta x^3+\gamma x^4 \]
and summing from \(i = 1 \) to \(N\)
\[\Rightarrow \sum\limits_{i = 1}^N {x_i^2 y_i} =\alpha\sum\limits_{i = 1}^N {x_i^2} +\beta\sum\limits_{i = 1}^N {x_i^3} +\gamma \sum\limits_{i = 1}^N {x_i^4} \hspace{6.0 cm}(2.4)\]
Writing eqns. \( (2.2), (2.3), (2.4)\) in matrix form
\[\left( \begin{array}{ccc}
N & \sum\limits_{i = 1}^N {x_i} & \sum\limits_{i = 1}^N {x_i^2} \\
\sum\limits_{i = 1}^N {x_i} & \sum\limits_{i = 1}^N {x_i^2} & \sum\limits_{i = 1}^N {x_i^3} \\
\sum\limits_{i = 1}^N {x_i^2} & \sum\limits_{i = 1}^N {x_i^3} & \sum\limits_{i = 1}^N {x_i^4}
\end{array} \right)\left( \begin{array}{l}
\alpha \\
\beta \\
\gamma
\end{array} \right) = \left( \begin{array}{l}
\sum\limits_{i = 1}^N {y_i} \\
\sum\limits_{i = 1}^N {x_i y_i} \\
\sum\limits_{i = 1}^N {x_i^2 y_i}
\end{array} \right)\hspace{5.0 cm}(2.5)\]
Solving this system gives \(\alpha , \beta , \gamma\);
substituting these values into eqn. \((2.1)\) then gives the fitted second degree polynomial.
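The system (2.5) can be assembled and solved numerically. The following Python sketch builds the matrix from the power sums and solves it by Gaussian elimination with partial pivoting. The helper names `solve_linear` and `fit_quadratic`, the fixed singularity tolerance, and the sample data are all assumptions made for illustration; a linear-algebra library would normally be used instead of a hand-written solver.

```python
def solve_linear(a, b):
    """Solve the square system a*x = b by Gaussian elimination."""
    n = len(b)
    # Augmented matrix, copied so that the inputs are not modified.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest entry of this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:   # crude tolerance, assumed here
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_quadratic(xs, ys):
    """Return (alpha, beta, gamma) of y = alpha + beta*x + gamma*x**2."""
    s = [sum(x ** k for x in xs) for k in range(5)]                  # sums of x^k
    t = [sum(x ** k * y for x, y in zip(xs, ys)) for k in range(3)]  # sums of x^k * y
    a = [[s[i + j] for j in range(3)] for i in range(3)]             # matrix of (2.5)
    return tuple(solve_linear(a, t))


# Hypothetical points lying exactly on y = 2 - x + 3x^2.
xs = [-2, -1, 0, 1, 2]
ys = [2 - x + 3 * x * x for x in xs]
print(fit_quadratic(xs, ys))
```

Entry \((i, j)\) of the matrix is the sum of \(x^{i+j}\), which is why a single list of power sums is enough to fill it.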
Standard Error
\[ SE =\sqrt {\frac{\sum\limits_{i = 1}^N {\left( y_i - f(x_i) \right)^2}}{N}} \hspace{3 cm}(2.6)\]
Correlation coefficient
\[r = \frac{\sum\limits_{i = 1}^N {x_i y_i} - \frac{\sum\limits_{i = 1}^N {x_i} \sum\limits_{i = 1}^N {y_i}}{N}}{\sqrt {\left( \sum\limits_{i = 1}^N {x_i^2} - \frac{\left( \sum\limits_{i = 1}^N {x_i} \right)^2}{N} \right)\left( \sum\limits_{i = 1}^N {y_i^2} - \frac{\left( \sum\limits_{i = 1}^N {y_i} \right)^2}{N} \right)} }\hspace{3 cm}(2.7)\]
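Both quantities are straightforward to compute once the data and the fitted function are available. The following is a small Python sketch of eqns. (2.6) and (2.7); the function names and the sample data are assumptions for illustration and do not come from the examples in this text.

```python
import math


def standard_error(xs, ys, f):
    """Square root of the mean squared residual, eqn. (2.6) (divides by N)."""
    n = len(xs)
    return math.sqrt(sum((y - f(x)) ** 2 for x, y in zip(xs, ys)) / n)


def correlation(xs, ys):
    """Correlation coefficient of eqn. (2.7)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    num = sxy - sx * sy / n
    den = math.sqrt((sxx - sx ** 2 / n) * (syy - sy ** 2 / n))
    return num / den


# Hypothetical data lying exactly on y = 2x.
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]
print(standard_error(xs, ys, lambda x: 2 * x))  # 0.0
print(correlation(xs, ys))                      # 1.0
```

Note that `correlation` depends only on the data pairs, while `standard_error` also depends on the fitted function `f`.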
Example (1)
Fit a second degree polynomial to the following data
Solution:
\[f(x)=\alpha+\beta x +\gamma x^2\]
Calculating required values to solve the system (2.5)
\(N = 6\)
\[ \sum\limits_{i = 1}^6 {x_i} = 12, \sum\limits_{i = 1}^6 {y_i} =32.2, \sum\limits_{i = 1}^6 {x_i^2} = 35.2, \sum\limits_{i = 1}^6 {x_i^3} =115.2 , \sum\limits_{i = 1}^6 {x_i^4} =400.9984, \sum\limits_{i = 1}^6 {x_iy_i} = 109.2, \sum\limits_{i = 1}^6 {x_i^2y_i} = 383.3984 \]
Substituting into system (2.5)
\[\left( \begin{array}{ccc}
6 & 12 & 35.2 \\
12 & 35.2 & 115.2 \\
35.2 & 115.2 & 400.9984
\end{array} \right)\left( \begin{array}{l}
\alpha \\
\beta \\
\gamma
\end{array} \right) = \left( \begin{array}{l}
32.2 \\
109.2 \\
383.3984
\end{array} \right)\]
Solving the resulting system
\[\Rightarrow \alpha =-\frac{1}{2} , \beta = 0 , \gamma = 1 \]
Thus the second degree polynomial becomes
\[f(x) = x^2 - \frac{1}{2} \]
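A quick way to check a solution of this kind is to substitute it back into the system and confirm that every equation is satisfied. The Python sketch below does this for the coefficients found above, using the matrix and right-hand side of this example; the variable names and the tolerance are assumptions for illustration.

```python
A = [[6, 12, 35.2],
     [12, 35.2, 115.2],
     [35.2, 115.2, 400.9984]]
b = [32.2, 109.2, 383.3984]
coeffs = [-0.5, 0.0, 1.0]   # alpha, beta, gamma from the solution above

# Residual of each equation: (row of A) . coeffs - right-hand side
residuals = [sum(a * c for a, c in zip(row, coeffs)) - rhs
             for row, rhs in zip(A, b)]
print(all(abs(r) < 1e-9 for r in residuals))  # True
```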
Example (2)
Fit a second order polynomial to the following data
and then find standard error and correlation coefficient.
Solution:
Let the second order polynomial be given as
\[f(x) =\alpha+\beta x +\gamma x^2\]
where \(\alpha, \beta , \gamma \) are to be determined.
To solve the system \((2.5)\), we need to calculate the required sums in the following table
Here \(N = 5\)
Substituting into the system \((2.5)\)
\[\left( \begin{array}{ccc}
5 & 5 & 7.5 \\
5 & 7.5 & 12.5 \\
7.5 & 12.5 & 22.125
\end{array} \right)\left( \begin{array}{l}
\alpha \\
\beta \\
\gamma
\end{array} \right) = \left( \begin{array}{l}
8.75 \\
11.25 \\
18.5625
\end{array} \right)\]
Solving this system we get \(\alpha = 1, \beta = 0 , \gamma = 0.5 \)
thus \(f(x) =0.5 x^2+1 = \frac{x^2}{2}+1\)
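For a \(3 \times 3\) system such as this one, Cramer's rule is a compact way to obtain the solution by hand or in code. The sketch below applies it to the matrix and right-hand side of this example; the helper `det3` and the variable names are assumptions for illustration, and Gaussian elimination would be an equally valid choice.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


A = [[5, 5, 7.5],
     [5, 7.5, 12.5],
     [7.5, 12.5, 22.125]]
rhs = [8.75, 11.25, 18.5625]

d = det3(A)
solution = []
for col in range(3):
    # Cramer's rule: replace one column of A by the right-hand side.
    m = [row[:] for row in A]
    for r in range(3):
        m[r][col] = rhs[r]
    solution.append(det3(m) / d)

alpha, beta, gamma = solution
print(round(alpha, 6), round(beta, 6), round(gamma, 6))  # 1.0 0.0 0.5
```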
Standard error
\[ SE =\sqrt {\frac{\sum\limits_{i = 1}^N {\left( y_i - f(x_i) \right)^2}}{N}} \]
Calculating the residuals \(y_i - f(x_i)\) for the data, every one of them is zero, hence
\[ SE = 0\]
Correlation coefficient
\[r = \frac{\sum\limits_{i = 1}^N {x_i y_i} - \frac{\sum\limits_{i = 1}^N {x_i} \sum\limits_{i = 1}^N {y_i}}{N}}{\sqrt {\left( \sum\limits_{i = 1}^N {x_i^2} - \frac{\left( \sum\limits_{i = 1}^N {x_i} \right)^2}{N} \right)\left( \sum\limits_{i = 1}^N {y_i^2} - \frac{\left( \sum\limits_{i = 1}^N {y_i} \right)^2}{N} \right)} }\]
\[ =\frac{11.25-\frac{5 \times 8.75}{5}}{\sqrt{\left(7.5-\frac{5^2}{5}\right)\left(18.03125-\frac{8.75^2}{5}\right)}} \]
\[ r = 0.95892\]
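The value of \(r\) can be checked from the five sums that appear in the substitution above. This is a short Python sketch; the variable names are assumptions for illustration.

```python
import math

n = 5
sum_x, sum_y = 5, 8.75
sum_xx, sum_yy, sum_xy = 7.5, 18.03125, 11.25

num = sum_xy - sum_x * sum_y / n
den = math.sqrt((sum_xx - sum_x ** 2 / n) * (sum_yy - sum_y ** 2 / n))
r = num / den
print(round(r, 4))  # 0.9589
```

Here \(r\) measures the linear association between \(x\) and \(y\), so it can be less than \(1\) even though the quadratic fit has zero standard error.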