Chapter 8 Tree Based Methods - Question 9

In [31]:
library(ISLR)
In [32]:
summary(OJ)
 Purchase WeekofPurchase     StoreID        PriceCH         PriceMM     
 CH:653   Min.   :227.0   Min.   :1.00   Min.   :1.690   Min.   :1.690  
 MM:417   1st Qu.:240.0   1st Qu.:2.00   1st Qu.:1.790   1st Qu.:1.990  
          Median :257.0   Median :3.00   Median :1.860   Median :2.090  
          Mean   :254.4   Mean   :3.96   Mean   :1.867   Mean   :2.085  
          3rd Qu.:268.0   3rd Qu.:7.00   3rd Qu.:1.990   3rd Qu.:2.180  
          Max.   :278.0   Max.   :7.00   Max.   :2.090   Max.   :2.290  
     DiscCH            DiscMM         SpecialCH        SpecialMM     
 Min.   :0.00000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
 1st Qu.:0.00000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
 Median :0.00000   Median :0.0000   Median :0.0000   Median :0.0000  
 Mean   :0.05186   Mean   :0.1234   Mean   :0.1477   Mean   :0.1617  
 3rd Qu.:0.00000   3rd Qu.:0.2300   3rd Qu.:0.0000   3rd Qu.:0.0000  
 Max.   :0.50000   Max.   :0.8000   Max.   :1.0000   Max.   :1.0000  
    LoyalCH          SalePriceMM     SalePriceCH      PriceDiff       Store7   
 Min.   :0.000011   Min.   :1.190   Min.   :1.390   Min.   :-0.6700   No :714  
 1st Qu.:0.325257   1st Qu.:1.690   1st Qu.:1.750   1st Qu.: 0.0000   Yes:356  
 Median :0.600000   Median :2.090   Median :1.860   Median : 0.2300            
 Mean   :0.565782   Mean   :1.962   Mean   :1.816   Mean   : 0.1465            
 3rd Qu.:0.850873   3rd Qu.:2.130   3rd Qu.:1.890   3rd Qu.: 0.3200            
 Max.   :0.999947   Max.   :2.290   Max.   :2.090   Max.   : 0.6400            
   PctDiscMM        PctDiscCH       ListPriceDiff       STORE      
 Min.   :0.0000   Min.   :0.00000   Min.   :0.000   Min.   :0.000  
 1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.140   1st Qu.:0.000  
 Median :0.0000   Median :0.00000   Median :0.240   Median :2.000  
 Mean   :0.0593   Mean   :0.02731   Mean   :0.218   Mean   :1.631  
 3rd Qu.:0.1127   3rd Qu.:0.00000   3rd Qu.:0.300   3rd Qu.:3.000  
 Max.   :0.4020   Max.   :0.25269   Max.   :0.440   Max.   :4.000  

a

In [33]:
set.seed(1)
train = sample(1:nrow(OJ),800)

b

In [34]:
library(tree)
oj.model = tree(Purchase~.,data=OJ,subset=train)
summary(oj.model)
Classification tree:
tree(formula = Purchase ~ ., data = OJ, subset = train)
Variables actually used in tree construction:
[1] "LoyalCH"       "PriceDiff"     "SpecialCH"     "ListPriceDiff"
Number of terminal nodes:  8 
Residual mean deviance:  0.7305 = 578.6 / 792 
Misclassification error rate: 0.165 = 132 / 800 

The tree uses the predictors LoyalCH, PriceDiff, SpecialCH and ListPriceDiff. It has 8 terminal nodes, and its training error rate is 16.5%.

c

In [35]:
oj.model
node), split, n, deviance, yval, (yprob)
      * denotes terminal node

 1) root 800 1064.00 CH ( 0.61750 0.38250 )  
   2) LoyalCH < 0.508643 350  409.30 MM ( 0.27143 0.72857 )  
     4) LoyalCH < 0.264232 166  122.10 MM ( 0.12048 0.87952 )  
       8) LoyalCH < 0.0356415 57   10.07 MM ( 0.01754 0.98246 ) *
       9) LoyalCH > 0.0356415 109  100.90 MM ( 0.17431 0.82569 ) *
     5) LoyalCH > 0.264232 184  248.80 MM ( 0.40761 0.59239 )  
      10) PriceDiff < 0.195 83   91.66 MM ( 0.24096 0.75904 )  
        20) SpecialCH < 0.5 70   60.89 MM ( 0.15714 0.84286 ) *
        21) SpecialCH > 0.5 13   16.05 CH ( 0.69231 0.30769 ) *
      11) PriceDiff > 0.195 101  139.20 CH ( 0.54455 0.45545 ) *
   3) LoyalCH > 0.508643 450  318.10 CH ( 0.88667 0.11333 )  
     6) LoyalCH < 0.764572 172  188.90 CH ( 0.76163 0.23837 )  
      12) ListPriceDiff < 0.235 70   95.61 CH ( 0.57143 0.42857 ) *
      13) ListPriceDiff > 0.235 102   69.76 CH ( 0.89216 0.10784 ) *
     7) LoyalCH > 0.764572 278   86.14 CH ( 0.96403 0.03597 ) *

Node 20 is a terminal node, as indicated by the asterisk (*). It corresponds to the part of its parent's region where SpecialCH < 0.5 and contains 70 samples. Its deviance is 60.89, and the tree predicts MM for every sample in this region. The proportion of samples in this region that actually belong to MM is about 0.84, and the proportion belonging to CH is about 0.16.
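
These quantities can also be read off the fitted object directly; as a sketch, the `$frame` component of a `tree` object holds one row per node, with row names equal to the node numbers printed above:

```r
# Row "20" of the frame corresponds to node 20 in the printed tree.
node20 <- oj.model$frame["20", ]
node20$n      # number of observations in this region (70)
node20$dev    # deviance of the node (60.89)
node20$yval   # predicted class for the region (MM)
node20$yprob  # matrix row of class proportions (CH, MM)
```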

d

In [36]:
plot(oj.model)
text(oj.model)

At the root node the variable LoyalCH is used to split the predictor space into two regions, which indicates that LoyalCH is the most important variable in the model. Its splitting value is approximately 0.51 (0.508643), meaning that samples with a LoyalCH value below this threshold follow the left branch of the tree and the rest follow the right branch. These two regions are further divided into smaller parts at the splitting values shown in the figure above. At the bottom of the tree there are 8 terminal nodes, which give the predicted class for the samples that fall into them.

e

In [37]:
yhat = predict(oj.model,newdata=OJ[-train,],type="class")
table(yhat,OJ$Purchase[-train])
    
yhat  CH  MM
  CH 147  49
  MM  12  62
In [41]:
mean((yhat!=OJ$Purchase[-train]))*100
22.5925925925926

The test error rate is approximately 22.6%.
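
Equivalently, the error rate follows from the confusion matrix above: the off-diagonal entries are the misclassified observations, 49 + 12 = 61 out of 270 test samples.

```r
# Recompute the test error rate from the confusion matrix:
# off-diagonal cells count the misclassified test observations.
conf <- table(yhat, OJ$Purchase[-train])
(conf["CH", "MM"] + conf["MM", "CH"]) / sum(conf)   # 61/270, about 0.226
```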

f

In [45]:
cv.oj.model = cv.tree(oj.model)
cv.oj.model
$size
[1] 8 7 6 5 4 3 2 1

$dev
[1]  682.0334  672.8953  669.5341  673.7752  677.2774  745.9670  747.9226
[8] 1068.4603

$k
[1]      -Inf  11.20965  14.72877  17.88334  23.55203  38.37537  43.02529
[8] 337.08200

$method
[1] "deviance"

attr(,"class")
[1] "prune"         "tree.sequence"

The tree with 6 terminal nodes gives the lowest cross-validated deviance and hence is the best model in this case.
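
Note that `cv.tree` defaults to deviance as its pruning guide. Since this is a classification problem, one could instead cross-validate on the misclassification rate by passing `FUN = prune.misclass`; a sketch (the `set.seed(2)` below is an arbitrary choice, since the CV folds are random):

```r
# Cross-validate using classification error instead of deviance.
set.seed(2)
cv.misclass <- cv.tree(oj.model, FUN = prune.misclass)
cv.misclass$size
cv.misclass$dev   # with prune.misclass, $dev holds misclassification counts
```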

g

In [49]:
plot(cv.oj.model$size,cv.oj.model$dev,xlab="size",ylab="deviance",type="b")

h

According to the graph, the tree with 6 terminal nodes gives the lowest cross-validated deviance, so that is the size selected here.
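
Rather than reading the plot by eye, the optimal size can be extracted directly from the `cv.tree` output:

```r
# Pick the tree size with the lowest cross-validated deviance.
best.size <- cv.oj.model$size[which.min(cv.oj.model$dev)]
best.size   # 6 for the results above
```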

i

In [51]:
pruned.oj.model = prune.tree(oj.model,best=6)
plot(pruned.oj.model)
text(pruned.oj.model)

j

In [52]:
summary(pruned.oj.model)
Classification tree:
snip.tree(tree = oj.model, nodes = c(4L, 10L))
Variables actually used in tree construction:
[1] "LoyalCH"       "PriceDiff"     "ListPriceDiff"
Number of terminal nodes:  6 
Residual mean deviance:  0.7614 = 604.5 / 794 
Misclassification error rate: 0.1713 = 137 / 800 
In [53]:
summary(oj.model)
Classification tree:
tree(formula = Purchase ~ ., data = OJ, subset = train)
Variables actually used in tree construction:
[1] "LoyalCH"       "PriceDiff"     "SpecialCH"     "ListPriceDiff"
Number of terminal nodes:  8 
Residual mean deviance:  0.7305 = 578.6 / 792 
Misclassification error rate: 0.165 = 132 / 800 

The training error rate of the pruned tree is 17.1%, which is slightly higher than the 16.5% of the unpruned tree.

k

In [56]:
#test error rate of unpruned tree
yhat = predict(oj.model,newdata=OJ[-train,],type="class")
mean(yhat!=OJ$Purchase[-train])*100
22.5925925925926
In [55]:
#test error rate of pruned tree
yhat = predict(pruned.oj.model,newdata=OJ[-train,],type="class")
mean(yhat!=OJ$Purchase[-train])*100
21.8518518518519

The pruned tree gives a slightly lower test error rate (about 21.9% versus 22.6% for the unpruned tree).
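
The two-cell comparison above can be condensed into a single helper; `test.error` is a name introduced here for illustration, not from the exercise:

```r
# Compute the test error rate (in %) for any classification tree
# fit on the training subset of OJ.
test.error <- function(model) {
  pred <- predict(model, newdata = OJ[-train, ], type = "class")
  mean(pred != OJ$Purchase[-train]) * 100
}
sapply(list(unpruned = oj.model, pruned = pruned.oj.model), test.error)
```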