Regression Analysis (explain a phenomenon or its history)

Explain the unknown using the known

Estimate coefficients

If past history is a reliable guide, you can predict the future

Use independent variables to get information about the dependent variable

Correlation is the relationship between two variables

Process:

    1. Identify Variables (independent and dependent)
    2. Theory, correlation, scatter diagrams for each variable

      Look for relationship (linear or nonlinear)

    3. Collect and organize data (from multiple sources)
    4. Estimate coefficients (using the least squares method)
    5. R2 = % of variance in the dependent variable explained by the model

    6. Test model (for overall and individual significance)
    7. Overall significance (Y = B0 + B1X1 + B2X2 + … + BpXp + E)

      Y = the dependent variable (the variance to be explained)

      B0 = intercept; B1, B2, … = slopes

      (B0 + B1X1 + B2X2 + …) is the explained part, or predicted Y

      E = unexplained error

      Null hypothesis (H0) – none of the independent variables are significant

      H0: all Bs = 0; H1: at least one B ≠ 0 (i.e., at least one is a predictor of Y)

      Test statistic

      F is the ratio of explained to unexplained variance (MSR/MSE)

      P value = the probability of a result this extreme if H0 is true (if less than 5%, you can reject H0)

      Compute the p value and either reject or fail to reject H0

      Type I error: concluding the model is significant when it is not (using an incorrect model)

      Type II error: failing to reject H0 when the model could have been used (missing an opportunity)

      The significance level (α) is the probability of making a Type I error

      Individual Significance

      Low p value, low probability of making an error

      High confidence that the coefficient is not zero

      Compute p value and decide on H0

      H1: B < 0 or B > 0 (select only one, based on the expected relationship)

      Stop when all remaining variables are significant

    8. Validate and implement
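The estimation steps above (least squares fit, then R2) can be sketched in Python; the data here are hypothetical, made up purely for illustration:

```python
import numpy as np

# Hypothetical data: one independent variable X, one dependent variable Y.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least squares estimates of B0 (intercept) and B1 (slope):
# stack a column of ones so the first coefficient is the intercept.
A = np.column_stack([np.ones_like(X), X])
(b0, b1), *_ = np.linalg.lstsq(A, Y, rcond=None)

# Predicted values and R2 = SSR/SST, the % of variance explained.
y_hat = b0 + b1 * X
sst = np.sum((Y - Y.mean()) ** 2)     # total variation
ssr = np.sum((y_hat - Y.mean()) ** 2) # explained variation
r2 = ssr / sst
print(round(b0, 3), round(b1, 3), round(r2, 3))
```

With more independent variables, extra columns would be added to `A` and the same least squares call would return one coefficient per column.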

 

 

The F statistic is equal to the regression mean square (MSR) divided by the error mean square (MSE), where P = the number of explanatory variables in the regression model.

F = test statistic from an F distribution with P and n - P - 1 degrees of freedom.

The decision rule is to reject H0 at the α level of significance if F > Fα(P, n - P - 1); otherwise do not reject H0.

 

Test for overall significance Excel output:

ANOVA

              df    SS             MS           F           Significance F
Regression     6    2228586.427    371431.07    1.0736532   0.42042919
Residual      15    5189260.346    345950.69
Total         21    7417846.773

df Regression = P, the number of explanatory variables

df Total = n of observations - 1

F = MSR/MSE

Significance F = the p value. The p value is the probability of obtaining a test statistic equal to or more extreme than the result obtained from the sample data. It is often referred to as the observed level of significance: the smallest level at which H0 can be rejected for a given data set.

R2 = SSR/SST measures the proportion of variation in Y that is explained by the independent variable X in the regression model.
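As a sanity check on the ANOVA output above, F, Significance F, and R2 can all be recomputed from the sums of squares (assuming scipy for the F distribution):

```python
from scipy.stats import f

# Values taken from the ANOVA table above.
P, n = 6, 22                 # df Regression = P; df Total = n - 1 = 21
ssr = 2228586.427            # regression sum of squares
sse = 5189260.346            # residual (error) sum of squares
sst = ssr + sse              # total sum of squares (7417846.773)

msr = ssr / P                # regression mean square
mse = sse / (n - P - 1)      # error mean square
F_stat = msr / mse
p_value = f.sf(F_stat, P, n - P - 1)   # upper-tail probability of F
r2 = ssr / sst

print(round(F_stat, 4), round(p_value, 4), round(r2, 4))
```

Since the p value (about 0.42) is far above 0.05, this particular model fails the overall significance test.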

 

Regression analysis is used primarily for prediction. The goal in regression analysis is the development of a statistical model that can be used to predict values of a dependent (response) variable based on the values of at least one explanatory or independent variable.

Correlation analysis, in contrast to regression, is used to measure the strength of the association between two numerical variables.
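A minimal sketch of measuring that association, using NumPy's `corrcoef` on hypothetical data:

```python
import numpy as np

# Hypothetical data with a perfectly linear relationship (y = 2x).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

# Pearson correlation coefficient: +1 perfect positive, -1 perfect
# negative, 0 no linear association.
r = np.corrcoef(x, y)[0, 1]
print(round(r, 3))
```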

There are four major assumptions of regression:

    1. Normality: requires that the values of Y be normally distributed at each value of X.
    2. Homoscedasticity: requires that the variation around the line of regression be constant for all values of X. This means that Y varies the same amount when X is a low value as when X is a high value.
    3. Independence of errors: requires that the error (residual difference between observed and predicted values of Y) should be independent for each value of X.
    4. Linearity: states that the relationship among variables is linear.
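A rough residual check for some of these assumptions, on hypothetical data (real diagnostics would rely on residual plots, not single numbers):

```python
import numpy as np

# Hypothetical data for a simple linear fit.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.0, 4.1, 5.9, 8.2, 9.8, 12.1])

A = np.column_stack([np.ones_like(X), X])
(b0, b1), *_ = np.linalg.lstsq(A, Y, rcond=None)
residuals = Y - (b0 + b1 * X)

# With an intercept in the model, least squares forces the residuals
# to sum to (numerically) zero.
print(round(residuals.sum(), 10))

# Homoscedasticity (rough check): residual spread in the low-X half
# versus the high-X half should be of similar magnitude.
low, high = residuals[:3], residuals[3:]
print(round(low.std(), 3), round(high.std(), 3))
```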