Chapter 3 - Linear Regression, Question 8

In [2]:
# Read the Auto data, treating "?" as NA, then drop rows with missing values
Auto = read.csv("../../datasets/Auto.csv", header = T, na.strings = "?")
Auto = na.omit(Auto)
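
As a quick sanity check (a sketch, not part of the original answer), the size of the cleaned data frame can be inspected; it should contain 392 complete rows, which matches the 390 residual degrees of freedom (392 observations minus 2 estimated coefficients) reported in the model summary below.

dim(Auto)   # expect 392 rows after dropping observations with missing horsepower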

(a)

In [3]:
# Fit a simple linear regression of mpg on horsepower
lm.model = lm(mpg ~ horsepower, data = Auto)
summary(lm.model)
Call:
lm(formula = mpg ~ horsepower, data = Auto)

Residuals:
     Min       1Q   Median       3Q      Max 
-13.5710  -3.2592  -0.3435   2.7630  16.9240 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 39.935861   0.717499   55.66   <2e-16 ***
horsepower  -0.157845   0.006446  -24.49   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.906 on 390 degrees of freedom
Multiple R-squared:  0.6059,	Adjusted R-squared:  0.6049 
F-statistic: 599.7 on 1 and 390 DF,  p-value: < 2.2e-16

(i)

The large F-statistic (599.7) and its very small p-value (< 2.2e-16) show that there is a strong relationship between the predictor and the response: we can reject the null hypothesis that the coefficient of horsepower is zero.
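
For reference, a minimal sketch (assumed code, not part of the original answer) of how the F-statistic and the relevant p-values can be pulled out of the fitted model object:

s <- summary(lm.model)
s$fstatistic                                 # F value with numerator and denominator df
pf(s$fstatistic[1], s$fstatistic[2], s$fstatistic[3],
   lower.tail = FALSE)                       # p-value of the overall F-test
s$coefficients["horsepower", "Pr(>|t|)"]     # t-test p-value for the horsepower slope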

(ii)

In [14]:
mean(Auto$mpg)
23.4459183673469

The residual standard error is 4.906 and the mean response is 23.446, giving a percentage error of about 20.9% (4.906 / 23.446 ≈ 0.209). The R-squared value is 0.6059, so about 60.6% of the variability in mpg is explained by the model.
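
A short sketch of how those two figures can be reproduced from the fitted model (assuming R >= 3.3 for sigma(); summary(lm.model)$sigma gives the same value on older versions):

sigma(lm.model) / mean(Auto$mpg)   # relative error: 4.906 / 23.446, roughly 0.209
summary(lm.model)$r.squared        # roughly 0.6059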

(iii)

The coefficient of horsepower is negative (-0.158), so mpg decreases as horsepower increases. The relationship between the predictor and the response is therefore negative.
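
The slope can also be read directly off the coefficient vector, as in this small sketch:

coef(lm.model)["horsepower"]   # roughly -0.158: mpg drops by about 0.16 per unit of horsepower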

(iv)

In [8]:
# Predicted mpg for horsepower = 98
predict(lm.model,data.frame(horsepower=98))
1: 24.4670771525124
In [10]:
#95% confidence interval
predict(lm.model,data.frame(horsepower=98),interval="confidence")
       fit      lwr      upr
1 24.46708 23.97308 24.96108
In [11]:
#95% prediction interval
predict(lm.model,data.frame(horsepower=98),interval="prediction")
       fit     lwr      upr
1 24.46708 14.8094 34.12476
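
Both intervals are centred on the same fitted value of about 24.47 mpg, but the prediction interval is much wider because it accounts for the irreducible error of a single new observation as well as the uncertainty in the estimated line. A small sketch comparing the widths (the object names are assumptions; the widths come from the output above):

conf.int <- predict(lm.model, data.frame(horsepower = 98), interval = "confidence")
pred.int <- predict(lm.model, data.frame(horsepower = 98), interval = "prediction")
conf.int[, "upr"] - conf.int[, "lwr"]   # about 0.99 mpg wide
pred.int[, "upr"] - pred.int[, "lwr"]   # about 19.3 mpg wide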

(b)

In [12]:
# Scatter plot of mpg against horsepower, with the least squares line overlaid
plot(Auto$horsepower, Auto$mpg)
abline(lm.model)

(c)

In [13]:
# 2x2 grid of the standard lm diagnostic plots
par(mfrow=c(2,2))
plot(lm.model)

The Residuals vs Fitted plot shows a clear pattern (curvature) rather than a random scatter, which indicates that the relationship between the response and the predictor is non-linear.
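
One way to examine that pattern more closely (a sketch, not part of the original answer) is to plot the residuals against the fitted values directly and overlay a smoothed trend line:

plot(fitted(lm.model), residuals(lm.model),
     xlab = "Fitted values", ylab = "Residuals")
lines(lowess(fitted(lm.model), residuals(lm.model)), col = "red")   # smoothed trend
abline(h = 0, lty = 2)   # residuals should scatter evenly around zero if the fit is adequate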
