Regression Analysis (explain a phenomenon or history)
Explain the known using the unknown
Estimate coefficients
If past history is accurate, you can predict the future
Use independent variables to get information on the dependent variable
Correlation is the relationship between two variables
Process:
Theory, correlation, scatter diagrams for each variable
Look for relationship (linear or nonlinear)
% of variance in the dependent variable explained by the model
Overall significance (Y = B0 + B1X1 + B2X2 + ... + E)
Y = the explained variable (its variance is what the model explains)
B0 = intercept; B1, B2, ... = slopes
(B0 + B1X1 + B2X2 + ...) is the explained part, or predicted Y
E = unexplained error
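As a minimal sketch of the model terms above, the following fits a one-variable version, Y = B0 + B1X1 + E, by ordinary least squares. The data values and the names x, y, b0, b1 are made up for illustration and are not from these notes.

```python
# Hypothetical data (not from the notes): one independent variable x,
# one dependent variable y.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares estimates for Y = B0 + B1*X1 + E
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx          # slope
b0 = y_bar - b1 * x_bar   # intercept

y_pred = [b0 + b1 * xi for xi in x]              # explained part (predicted Y)
errors = [yi - pi for yi, pi in zip(y, y_pred)]  # unexplained error E

print(f"B0 (intercept) = {b0:.3f}, B1 (slope) = {b1:.3f}")
print("errors:", [round(e, 3) for e in errors])
```

With an intercept in the model, the least-squares errors sum to zero, which is a quick check that the fit was computed correctly.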
Null hypothesis (H0): none of the independent variables is significant
H0: all Bs = 0; H1: at least one B ≠ 0 (predictors of Y)
Test statistic
F is the distribution of a ratio of variances
P value = observed probability of error if H0 is rejected (less than 5% means the model is significant)
Compute the p value and either reject or fail to reject H0
Type I error: reject a true H0 (use a model that is not actually significant)
Type II error: fail to reject a false H0 (miss a model that could have been used)
The significance level (α) is the probability of making a Type I error
Individual Significance
Low p value, low probability of making an error
High confidence that the coefficient is not zero
Compute p value and decide on H0
H1: B < 0 or B > 0 (select only one based on the relationship)
Stop when you have all significant variables
The F statistic is equal to the regression mean square (MSR) divided by the error mean square (MSE), where P = number of explanatory variables in the regression model and n = number of observations.
F = test statistic from an F distribution with P and n-P-1 degrees of freedom.
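The decision rule is to reject H0 at the α level of significance if F exceeds the upper-tail critical value of the F distribution; otherwise, do not reject H0.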
Test for overall significance, Excel output:
ANOVA
           | df | SS          | MS        | F         | Significance F
Regression | 6  | 2228586.427 | 371431.07 | 1.0736532 | 0.42042919
Residual   | 15 | 5189260.346 | 345950.69 |           |
Total      | 21 | 7417846.773 |           |           |
df Regression = P, the number of explanatory variables
df Total = n of observations - 1
F = MSR/MSE
Significance F = p value. The p value is the probability of obtaining a test statistic equal to or more extreme than the result obtained from the sample data. The p value is often referred to as the observed level of significance, the smallest level at which H0 can be rejected for a given data set.
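The F and Significance F entries in the Excel output can be reproduced from the SS and df columns. The sketch below uses only the Python standard library and integrates the F density numerically with Simpson's rule; the helper names f_pdf and f_upper_tail are illustrative, and the 5% cutoff is the level mentioned earlier in these notes. In practice a statistics library (for example scipy.stats.f.sf) would normally be used instead of hand integration.

```python
import math

# Values taken from the ANOVA table in these notes.
ssr, df_reg = 2228586.427, 6     # regression sum of squares, df = P
sse, df_res = 5189260.346, 15    # residual sum of squares, df = n - P - 1

msr = ssr / df_reg               # regression mean square
mse = sse / df_res               # error mean square
f_stat = msr / mse


def f_pdf(x, d1, d2):
    """Density of the F distribution with d1 and d2 degrees of freedom."""
    if x <= 0:
        return 0.0  # correct at x = 0 only when d1 > 2 (assumed here)
    log_beta = (math.lgamma(d1 / 2) + math.lgamma(d2 / 2)
                - math.lgamma((d1 + d2) / 2))
    log_pdf = ((d1 / 2) * math.log(d1 / d2)
               + (d1 / 2 - 1) * math.log(x)
               - ((d1 + d2) / 2) * math.log(1 + d1 * x / d2)
               - log_beta)
    return math.exp(log_pdf)


def f_upper_tail(f_value, d1, d2, steps=2000):
    """P(F >= f_value) via Simpson's rule on [0, f_value]; assumes d1 > 2."""
    h = f_value / steps
    total = f_pdf(0.0, d1, d2) + f_pdf(f_value, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return 1.0 - total * h / 3


p_value = f_upper_tail(f_stat, df_reg, df_res)
alpha = 0.05

print(f"F = {f_stat:.4f}, p value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

For this table the p value is well above 5%, so H0 is not rejected: the six explanatory variables taken together are not shown to be significant.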
R2 = SSR/SST. Measures the proportion of variation in Y that is explained by the independent variables X in the regression model.
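As a short worked example, R2 for the Excel output can be computed from the SS column of the ANOVA table (Regression row over Total row). The variable names are illustrative only.

```python
# Sums of squares from the ANOVA table in these notes.
ssr = 2228586.427   # regression (explained) sum of squares
sst = 7417846.773   # total sum of squares

r_squared = ssr / sst
print(f"R2 = {r_squared:.4f}")
```

The result is roughly 0.30, meaning the model explains about 30% of the variation in the dependent variable.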
Regression analysis is used primarily for the purpose of prediction. The goal in regression analysis is the development of a statistical model that can be used to predict values of a dependent variable or response variable based on the values of at least one explanatory or independent variable.
Correlation analysis, in contrast to regression, is used to measure the strength of the association between numerical variables.
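To illustrate the contrast, the sketch below computes the Pearson correlation coefficient r for two numerical variables. The notes do not name a specific correlation measure, so Pearson's r is an assumption here, and the data are made up for illustration.

```python
import math

# Hypothetical paired observations (not from the notes).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of cross-products and squared deviations
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

# Pearson r ranges from -1 to +1; values near 0 mean a weak linear association.
r = s_xy / math.sqrt(s_xx * s_yy)
print(f"r = {r:.4f}")
```

Unlike regression, r treats the two variables symmetrically: swapping x and y gives the same value, and no prediction equation is produced.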
There are four major assumptions of regression: