Chapter 8: Tree-Based Methods - Question 11

In [58]:
library(ISLR)
In [59]:
caravan_dataset = Caravan
p = rep(0,nrow(caravan_dataset))
p[caravan_dataset$Purchase=="Yes"]=1
caravan_dataset$Purchase = p

(a)

In [60]:
set.seed(1)
train = 1:1000

(b)

In [61]:
library(gbm)
boost.model = gbm(Purchase~.,data=caravan_dataset[train,],shrinkage=0.01,n.trees=1000,distribution="bernoulli")
Warning message in gbm.fit(x, y, offset = offset, distribution = distribution, w = w, :
“variable 50: PVRAAUT has no variation.”
Warning message in gbm.fit(x, y, offset = offset, distribution = distribution, w = w, :
“variable 71: AVRAAUT has no variation.”
In [62]:
summary(boost.model)
var       rel.inf
PPERSAUT  14.6350478
MKOOPKLA   9.4709165
MOPLHOOG   7.3145742
MBERMIDD   6.0865197
PBRAND     4.6676612
MGODGE     4.4946326
ABRAND     4.3242776
MINK3045   4.1759062
MOSTYPE    2.8640258
PWAPART    2.7819107
MAUT1      2.6192915
MBERARBG   2.1048051
MSKA       2.1018515
MAUT2      2.0217251
MSKC       1.9868434
MINKGEM    1.9212271
MGODPR     1.9177754
MBERHOOG   1.8071062
MGODOV     1.7869391
PBYSTAND   1.5727959
MSKB1      1.4355140
MFWEKIND   1.3726426
MRELGE     1.2080518
MOPLMIDD   0.9379197
MINK7512   0.9259072
MINK4575   0.9174599
MGODRK     0.9076554
MFGEKIND   0.8574537
MZPART     0.8253107
MRELOV     0.8073125
PAANHANG   0
PTRACTOR   0
PWERKT     0
PBROM      0
PPERSONG   0
PGEZONG    0
PWAOREG    0
PZEILPL    0
PPLEZIER   0
PFIETS     0
PINBOED    0
AWAPART    0
AWABEDR    0
AWALAND    0
ABESAUT    0
AMOTSCO    0
AVRAAUT    0
AAANHANG   0
ATRACTOR   0
AWERKT     0
ABROM      0
ALEVEN     0
APERSONG   0
AGEZONG    0
AWAOREG    0
AZEILPL    0
APLEZIER   0
AFIETS     0
AINBOED    0
ABYSTAND   0

The most important predictors are PPERSAUT, MKOOPKLA, and MOPLHOOG.

(c)

Boosting

In [74]:
yhat = predict(boost.model,newdata=caravan_dataset[-train,],n.trees=1000,type="response")
purchase.pred = rep("No",length(yhat))
purchase.pred[yhat>0.2]="Yes"
table(purchase.pred,Caravan$Purchase[-train])
             
purchase.pred   No  Yes
          No  4410  256
          Yes  123   33
In [102]:
#Fraction of people predicted to make a purchase who in fact make one (precision).
33/(123+33)
0.211538461538462
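
The same fraction can be read directly from the confusion matrix object, so the cell counts need not be typed in by hand. A small sketch, assuming the `purchase.pred` vector from the cell above (rows of the table are predictions, columns the observed classes):

```r
# Rebuild the confusion matrix and compute precision programmatically.
conf = table(purchase.pred, Caravan$Purchase[-train])
# True positives divided by all predicted positives: 33 / (123 + 33)
conf["Yes", "Yes"] / sum(conf["Yes", ])
```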

KNN

In [118]:
#knn
library(class)
std.x = scale(Caravan[,-86])
train.x = std.x[train,]
train.y = Caravan[train,86]
test.x = std.x[-train,]
In [119]:
set.seed(1)
knn.pred = knn(train.x,test.x,train.y,k=5)
table(knn.pred,Caravan$Purchase[-train])
        
knn.pred   No  Yes
     No  4506  279
     Yes   27   10
In [120]:
#Fraction of people predicted to make a purchase who in fact make one (precision).
10/(27+10)
0.27027027027027

Logistic Regression

In [125]:
#Logistic Regression
glm.model = glm(Purchase~.,data=Caravan,family=binomial,subset=train)
Warning message:
“glm.fit: fitted probabilities numerically 0 or 1 occurred”
In [133]:
yhat = predict(glm.model,newdata=Caravan[-train,],type="response")
glm.pred = rep("No",length(yhat))
glm.pred[yhat>=0.5] = "Yes"
table(glm.pred,Caravan$Purchase[-train])
Warning message in predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type == :
“prediction from a rank-deficient fit may be misleading”
        
glm.pred   No  Yes
     No  4446  274
     Yes   87   15
In [135]:
#Fraction of people predicted to make a purchase who in fact make one (precision).
15/(15+87)
0.147058823529412

Results

Boosting: 21.2% of predicted purchasers actually purchase.

KNN: 27.0% of predicted purchasers actually purchase.

Logistic regression: 14.7% of predicted purchasers actually purchase.

KNN gives the highest precision here, though it flags far fewer customers as likely purchasers (37) than boosting does (156).
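
The boosting result above uses the 20% probability cutoff specified in the exercise. As a possible follow-up sketch (assuming `boost.model`, `caravan_dataset`, and `train` from the cells above), precision can be examined across several cutoffs to see the trade-off between how many customers are flagged and how often the flag is correct:

```r
# Held-out predicted purchase probabilities from the boosted model.
probs = predict(boost.model, newdata = caravan_dataset[-train, ],
                n.trees = 1000, type = "response")
actual = Caravan$Purchase[-train]
for (cutoff in c(0.1, 0.2, 0.3, 0.5)) {
  pred = ifelse(probs > cutoff, "Yes", "No")
  tp = sum(pred == "Yes" & actual == "Yes")  # true positives
  fp = sum(pred == "Yes" & actual == "No")   # false positives
  cat(sprintf("cutoff %.1f: precision %.3f (%d flagged)\n",
              cutoff, tp / (tp + fp), tp + fp))
}
```

Raising the cutoff generally flags fewer customers; whether precision improves depends on how well calibrated the predicted probabilities are.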