Chapter 3 Linear Regression, Q13
set.seed(1)
x = rnorm(100,mean=0,sd=1)
er = rnorm(100,mean=0,sd=sqrt(0.25))   # noise with variance 0.25
y = -1+0.5*x+er                        # true model: Beta0 = -1, Beta1 = 0.5
length(y)
The length of y is 100. In this model, Beta0 = -1 and Beta1 = 0.5.
plot(x,y)
The scatterplot shows a positive, linear relationship between x and y, with scatter around the line introduced by the error term.
lm.model = lm(y~x)
summary(lm.model)
The estimated coefficients are close to the true values Beta0 = -1 and Beta1 = 0.5.
plot(x,y)
abline(lm.model,col="red")
abline(-1,0.5,col="blue")
legend(-2,0.5, legend = c("least squares fit", "pop. regression"), col=c("red","blue"), lwd=2)
lm.model_quad = lm(y~x+I(x^2))
summary(lm.model_quad)
anova(lm.model,lm.model_quad)
The p-value of the F-statistic in the anova test is above 5%, so there is no evidence that the quadratic term improves the fit over the linear model.
set.seed(1)
x2 = rnorm(100,mean=0,sd=1)
er2 = rnorm(100,mean=0,sd=sqrt(0.1))   # smaller error variance: 0.1
y2 = -1+0.5*x2+er2
plot(x2,y2)
lm.model2 = lm(y2~x2)
summary(lm.model2)
With the smaller error variance, the residual standard error has decreased and the R-squared value has increased: the points lie closer to the fitted line.
plot(x2,y2)
abline(lm.model2, col="red")
abline(-1,0.5, col="blue")
legend(-2,0.0, legend=c("least squares fit","pop. regression"), col=c("red","blue"),lwd=2)
lm.model2_quad = lm(y2~x2+I(x2^2))
summary(lm.model2_quad)
anova(lm.model2,lm.model2_quad)
According to the anova test the p-value of the F-statistic is larger than 5%, so we fail to reject the null hypothesis that both models fit the data equally well.
set.seed(1)
x3 = rnorm(100,mean=0,sd=1)
er3 = rnorm(100,mean=0,sd=sqrt(0.6))   # larger error variance: 0.6
y3 = -1+0.5*x3+er3
lm.model3 = lm(y3~x3)
summary(lm.model3)
plot(x3,y3)
abline(lm.model3, col="red")
abline(-1,0.5, col="blue")
legend(-2,0.0, legend=c("least squares fit","pop. regression"), col=c("red","blue"),lwd=2)
With the larger error variance, the residual standard error has increased to 0.75 and the R-squared value has fallen to 0.27.
lm.model3_quad = lm(y3~x3+I(x3^2))
summary(lm.model3_quad)
anova(lm.model3,lm.model3_quad)
As before, the p-value of the F-statistic is above 5%, so the quadratic term does not significantly improve the fit.
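The effect of the noise level on fit quality can also be checked side by side. The sketch below is self-contained and refits the simple linear model at each of the three error variances; the helper fit_stats is my own, not part of the exercise.

```r
# Sketch: refit y = -1 + 0.5*x + noise at each error variance and
# report the residual standard error and R-squared.
# fit_stats is a hypothetical helper, not from the original exercise.
fit_stats <- function(noise_var) {
  set.seed(1)                                   # same seed as in the exercise
  x <- rnorm(100, mean = 0, sd = 1)
  y <- -1 + 0.5 * x + rnorm(100, mean = 0, sd = sqrt(noise_var))
  s <- summary(lm(y ~ x))
  c(RSE = s$sigma, R2 = s$r.squared)
}
sapply(c(original = 0.25, less.noisy = 0.1, noisier = 0.6), fit_stats)
```

The resulting two-row table should show RSE rising and R-squared falling as the noise variance grows.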
#Original dataset
confint(lm.model)
#Less noisy dataset
confint(lm.model2)
#Noisier dataset
confint(lm.model3)
The confidence intervals for Beta0 and Beta1 are narrowest for the less noisy dataset and widest for the noisier dataset, mirroring the error variance of each simulation.
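The same point can be made numerically by comparing the widths of the intervals for the slope. A minimal sketch, assuming the fitted models lm.model, lm.model2, and lm.model3 from above are still in the workspace:

```r
# Width of the 95% confidence interval for the slope in each dataset.
# Row 2 of confint() corresponds to the slope coefficient.
sapply(list(original   = lm.model,
            less.noisy = lm.model2,
            noisier    = lm.model3),
       function(m) diff(confint(m)[2, ]))
```

The less noisy dataset should give the smallest width and the noisier dataset the largest.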