Chapter 3 Linear Regression, Q9
Auto = read.csv("../../datasets/Auto.csv", header=T, na.strings="?")
Auto = na.omit(Auto)                      # drop rows with missing values
pairs(Auto)                               # scatterplot matrix of all variables
auto_subset = subset(Auto, select=-name)  # exclude the qualitative name variable
cor(auto_subset)                          # correlation matrix of the quantitative variables
lm.model = lm(mpg~.-name, data=Auto)
summary(lm.model)
The F-statistic is much larger than 1 and its p-value is very small, so we reject the null hypothesis that all the regression coefficients are zero. This supports the alternative hypothesis that at least one of the predictors has a relationship with the response.
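As a quick sanity check, here is a minimal sketch (using the lm.model fitted above) that pulls the overall F-statistic out of the summary and recomputes its p-value with pf:
fstat = summary(lm.model)$fstatistic          # value, numerator df, denominator df
pf(fstat[1], fstat[2], fstat[3], lower.tail=FALSE)  # p-value of the overall F-test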
weight, year, origin and displacement have a statistically significant relationship with the response, because their p-values are very small.
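One way to list those predictors programmatically (a sketch assuming the usual 0.05 cutoff) is to filter the coefficient table returned by summary:
coefs = coef(summary(lm.model))      # Estimate, Std. Error, t value, Pr(>|t|)
coefs[coefs[, "Pr(>|t|)"] < 0.05, ]  # keep rows whose p-value is below 0.05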
The coefficient of the year variable is 0.750773. Holding the other predictors fixed, fuel efficiency therefore increases by about 0.75 mpg per year on average, i.e. roughly 7.5 mpg per decade.
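To put a rough uncertainty on that statement, a short sketch using confint on the year coefficient:
confint(lm.model, "year")      # 95% confidence interval for the year coefficient
10 * coef(lm.model)["year"]    # implied change in mpg over ten model years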
par(mfrow=c(2,2))
plot(lm.model)
lm.model_interaction = lm(mpg~.+cylinders*displacement+cylinders:year-name,data=Auto)
summary(lm.model_interaction)
Both interaction terms, cylinders:displacement and cylinders:year, appear to be statistically significant, as their p-values are very small.
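Since the additive model is nested in the interaction model, a sketch comparing the two fits with a partial F-test (via anova) gives a joint test of the added interaction terms:
anova(lm.model, lm.model_interaction)  # F-test for the two interaction terms together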
lm.model_logY_transform = lm(log(mpg)~.+cylinders*displacement-name,data=Auto)
summary(lm.model_logY_transform)
par(mfrow=c(2,2))
plot(lm.model_logY_transform)
The Residuals vs Fitted plot indicates that log-transforming the response (mpg) has almost removed the nonlinearity between the predictors and the response.
lm.model_sqrtY_transform = lm(sqrt(mpg)~.+cylinders*displacement-name,data=Auto)
summary(lm.model_sqrtY_transform)
par(mfrow=c(2,2))
plot(lm.model_sqrtY_transform)
Neither the log transformation nor the square root of the response (mpg) removes the outliers, but both transformations reduce the nonlinearity between the predictors and the response.
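One way to check the outlier claim numerically (a sketch using a conventional cutoff of |studentized residual| > 3 on the log-response fit):
rstud = rstudent(lm.model_logY_transform)  # studentized residuals
which(abs(rstud) > 3)                      # observations that remain potential outliers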
lm.model_Xsq = lm(mpg~horsepower+I(horsepower^2),data=Auto)
summary(lm.model_Xsq)
Both the horsepower and horsepower^2 terms are statistically significant, as their p-values are very small.
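As a final check, a sketch comparing the purely linear fit with the quadratic one via a partial F-test; lm.model_X is a new name introduced only for this comparison:
lm.model_X = lm(mpg~horsepower, data=Auto)  # linear-only fit, for comparison
anova(lm.model_X, lm.model_Xsq)             # tests whether horsepower^2 adds explanatory power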