## Sunday, January 25, 2009

### Matlab: Simple linear regression analysis

Simple linear regression analysis is used to predict the values of one variable (dependent variable - X) based on the values of one other variable (independent variable - Y). In simple linear regression analysis, relationship between two variables is represented by the straight line (prediction line) fitted (Y = aX + b) to the scatter plot of those variables.

First, lets start by defining our working example. Lets assume that we want to examine relationship between the store size (in thousands of square feet) (X variable) and its annual sales (in million of dollars) (Y variable). We have sample of 14 stores, and data about them:
X= [1.7 1.6 2.8 5.6 1.3 2.2 1.3 1.1 3.2 1.5 5.2 4.6 5.8 3.0];
Y= [3.7 3.9 6.7 9.5 3.4 5.6 3.7 2.7 5.5 2.9 10.7 7.6 11.8 4.1];

Determining prediction line (Y=aX+b)

Prediction line (Y = aX + b) where: a is slope of the fitted line, and b is intercept of the line with Y axis. To determine those parameters we can use Matlab/Octave code:
`p=polyfit(X,Y,1);a=p(1); %slopb=p(2); %icept`
In our case a=1.67 and b=0.9645; hence the line is: Y = 1.67X + 0.9645.
What does it mean? The slop means that for each increase of 1 unit in X, the mean value of Y is estimated to increase by 1.67 units. In our case it means that if size of store increases by 1 thousands square feet, the mean annual sales is estimated to increase by 1.67 millions of dollars. On the other hand, the intercept represents the mean value of Y when X equals 0. In our case intercept has no practical meaning because size of store cannot be 0.

Scatter diagram and fitted regression line are show below:

Standard error of the estimate - Syx

Prediction using simple regression line is not perfect; hence, statistic that measure the variability of the actual Y values from the predicted Y values must be developed, in the same way as standard deviation was developed to measure the variability of value around the mean. The standard deviation around the prediction line is called standard error of the estimate - Syx.
Calculation of Syx in Matlab/Octave is:
`x=X';y=Y'; %from row vector to column vector.Sxx=sum((x-mean(x)).^2);Syy=sum((y-mean(y)).^2);Sxy=sum((x-mean(x)).*(y-mean(y)));SSE=Syy-a*Sxy;S2yx=SSE/(n-2);%Standard error of estimate:Syx=sqrt(SSE/(n-2));`
In our case we get Syx = 0.9664. The standard error of the estimate is measured in the same units as the dependent variable (Y). In our case defendant variable Y is Annual sales, and our standard error of the estimate is equal to 0.9664 millions of dollars. We can express this value on the scatter plot as below:

Test whether there is a significant linear relationship

In this section we want to test if there exist a significant (with confidence of 95%; hence, alpha=0.05) linear relationship between X and Y. To do this we can perform t-test for the slop. We do this by stating null and alternative hypothesis as fallows:`H0 : A = 0 (There is no linear relationship)`
`H1 : A = 0 (There is a linear relationship)` where A is the slop of the relationship in the population of X and Y. I remained that we have only sample of size 14 from the population. We perform regression analysis on the sample trying to infer relationship in the populations X and Y.
The t statistics equals:
.
We calculate t and P-value and check whether P-value>alpha=0.05; hence, whether we can reject or not our null hypothesis.
`%we use date from the earlier code aboveSb=sqrt(S2yx/Sxx);A=0;t=(a-A)/Sb;Pvalue=2*(1-tcdf(abs(t),n-2));% in older Octave (2.1) use t_cdf!!!` hence t=10.64 and P-value=1.8227e-07 lower than alpha=0.05 so we reject null hypothesis and conclude that for 95% there is significant linear relationship between mean annual sales and the size of the store.

The confidence interval estimate

Regression analysis can be used to predict values of Y based on X. In our case we want to be able to predict the annual sales based on the size of the store. The simplest way to do it, just just point estimate using our line equation Yy = a*Xx + b = 1.67Xx + 0.9645, where Yy is our prediction and Xx is the value of X that we make prediction for. For example, if one asks what is estimated mean annual sales for the store of size 4000 square feet ( Xx=4.0), we calculate
Yy = 1.67*4.0 + 0.9645 = 7.644
Hence, the predicted mean annual sales for the store of size of 4000 square fit
is 7.644 million dollars.
This kind of prediction is called point rediction, and we know that our regression line is not perfect due to standard error of the estimate (Syx). We would prefer to be more statistically correct; hence, we should create confidence interval for our predictions.
Confidence interval estimate for the mean of Y:

`Xx=4.0; %predict Y for X=4.Yy=b+a.*(Xx); %Y predicted from regression line.SSX=sum((X-mean(X)).^2);h=1/n + ((Xx-mean(X)).^2)/SSX;tn2= tinv(1-alpha/2,n-2); %tn-2 - t criticalinterval=tn2.*Syx.*sqrt(h);` hence the interval is 0.673.
For example if we want to know the confidence interval of the mean annual sales for the store of size 4.000 square feet ( Xx=4.0), after calculations we get: 7.644 ± 0.673. In other words the mean annual sales are between 6.971 and 8.317 (million of
dollars) for the population of stores with 4000 square feet. If we calculate interval for all the X values in our sample, we can show this interval graphically:

For now that is everything what I have to say about this topic; however, this topic is much more longer and complex. But for most situations, it is a good start.

1. Anonymous3:48 PM

How were these figures generated? Thanks in advance!

2. Anonymous11:49 AM

This was useful to me, thanks!

3. Anonymous3:56 PM

It would be more helpful to see all the relevant code, including the plotting code. Statistics toolbox used?

4. very important knowledge towards my exams.u r great

5. Anonymous4:07 AM

Thanks!

6. Anonymous9:41 PM

Hi there mates, itѕ imρгessivе paгаgraрh гegarding teаchinganԁ completely explаіned, keеp it uр all the tіme.
my web page: flex belt complaints

7. Anonymous8:47 PM

I was reсommended this web sitе by my cousin.
I аm not sure whеtheг thіѕ post іs wrіtten by him as
no one elѕe know such detailed about my pгoblem.

You aгe wondеrful! Thanκs!

my web site; silk�n laser hair removal

8. Anonymous8:47 PM

I waѕ rеcommendeԁ this web sitе by my сousin.
I am not surе whether thiѕ post іs ωritten by him as no
one elsе knοw such detaileԁ аbout my prοblem.
You are wonԁerful! Thankѕ!

Also visit mу website ... silk�n laser hair removal
My website: http://www.horrorbid.com

9. Anonymous8:55 PM

With havin so much content do уou eνer run intо any iѕsues οf plagorism or copyrіght іnfгingеment?
Μy site has a lot of eхсlusіve content I've either created myself or outsourced but it seems a lot of it is popping it up all over the internet without my permission. Do you know any ways to help prevent content from being stolen? I'd truly aρpгеciate it.

Feel fгee to ѵisit my wеbsitе; v2 cigs Review

10. Anonymous1:24 AM

Basіcally the flex belt is a belt yοu strap on аbout your
waist anԁ thгee especially plaсed pаdѕ агe what assіstancе

My ρаgе ... http://www.prnewswire.com/news-releases/flex-belt-review-and-latest-coupon-code-savings-now-featured-at-awesomealldaycom-190317331.html
My website - the Flex belt

11. Anonymous12:27 PM

Hі therе! Ѕomeonе in mу
Myspace group shaгеԁ this site with us so I came to take a lоok.
I'm definitely loving the information. I'm book-mагking and will be tweeting this to my followeгs!
Exceptiοnal blog and аmazіng ѕtyle
and deѕign.

Also visit my wеb blog; [url=http://Www.Sfgate.com]http://Www.Sfgate.com
Also see my website: www.equinoxmgt.com

12. Anonymous3:58 AM

Нello, I enjοy rеading all οf уour article post.
I wаntеd tο write a little comment to suρport уоu.

Feel free to visіt my ѕite :: Silk N Reviews

13. Anonymous3:29 PM

A fascinating diѕcuѕsion is defіnіtely worth commеnt.

There's no doubt that that you should publish more on this subject matter, it may not be a taboo subject but usually people don't
sрeak abοut such iѕѕueѕ. To the nеxt!
Best wishes!!

14. Anonymous8:42 PM

What's up i am kavin, its my first time to commenting anyplace, when i read this piece of writing i thought i could also create comment due to this good paragraph.

my web blog :: Silk N Sensepil

15. Anonymous6:49 PM

Can I simplу say what а relief to
find an individual who truly understands what they aгe talκing abοut οn thе
web. You definitely understand how to bгing an
iѕsue tо lіght and maκe it important.
Mоre people ought tо looκ at this and understanԁ this side of your story.
It's surprising you are not more popular given that you certainly have the gift.

Here is my homepage; http://Sporty-u.com

16. Anonymous4:18 PM

Hеy very іnteгestіng blοg!

Also νiѕit my web blog ... Http://173.201.188.162/

17. Anonymous1:37 AM

I νisit day-tο-day a feω web pages anԁ blоgs to read pоsts, exсерt
thіs webρage giѵes quаlity based aгticlеs.

Reѵiew mу web site - dietary fiber

18. Anonymous11:51 PM

Hеllo to еveгу one,
it's genuinely a nice for me to pay a quick visit this web site, it includes valuable Information.

Also visit my blog post ... http://www.rq75.com/

19. Anonymous11:52 PM

Hello tо evеry οne, іt's genuinely a nice for me to pay a quick visit this web site, it includes valuable Information.

my blog http://www.rq75.com/

20. Anonymous10:06 PM

Hi it's me, I am also visiting this site on a regular basis, this website is really fastidious and the visitors are actually sharing fastidious thoughts.

Feel free to surf to my page http://Www.prweb.com/releases/silkn/sensepilreview/prweb10193901.htm

21. Anonymous10:06 PM

I enjoy, lead to Ι founԁ exaсtly what Ι waѕ looking for.

You've ended my four day long hunt! God Bless you man. Have a nice day. Bye

Feel free to surf to my web page; http://www.prweb.com/releases/Silkn/sensepilreview/Prweb10193901.Htm

22. Anonymous1:20 PM

Quick question: what is that h formally defined as?