Behroz Ahmad Ali
bhrz.ali@gmail.com
Compressive strength is the resistance of a material or structure to breaking under compressive forces. The compressive strength of concrete determines how it performs in service, so studying it is of great practical importance.
The characteristics of concrete depend on the types of ingredients used and their proportions. The main constituents of concrete are cement, water and aggregates in varying proportions, but other materials are usually added to the mix to obtain the required compressive strength and properties.
As concrete cures and hardens over time, its compressive strength increases. The required compressive strength of concrete can vary from 17 MPa for residential purposes up to 70 MPa for some commercial applications.
The dataset used here contains the concentrations of the constituents of concrete and its compressive strength after a given number of days. The original owner of the dataset is Prof. I-Cheng Yeh, Department of Information Management, Chung-Hua University, Taiwan. The dataset is freely available in the UCI repository.
The following information has been taken from the UCI repository: https://archive.ics.uci.edu/ml/datasets/Concrete+Compressive+Strength
The information given in the dataset:
Cement (component 1) -- quantitative -- kg in a m3 mixture -- Input Variable
Blast Furnace Slag (component 2) -- quantitative -- kg in a m3 mixture -- Input Variable
Fly Ash (component 3) -- quantitative -- kg in a m3 mixture -- Input Variable
Water (component 4) -- quantitative -- kg in a m3 mixture -- Input Variable
Superplasticizer (component 5) -- quantitative -- kg in a m3 mixture -- Input Variable
Coarse Aggregate (component 6) -- quantitative -- kg in a m3 mixture -- Input Variable
Fine Aggregate (component 7) -- quantitative -- kg in a m3 mixture -- Input Variable
Age -- quantitative -- Day (1~365) -- Input Variable
Concrete compressive strength -- quantitative -- MPa -- Output Variable
Here we will create a linear regression model that can predict the compressive strength of concrete based on its ingredients and age.
#Read the data
concrete1 = read.csv("Concrete_Data.csv",header=TRUE,na.strings="?")
#Remove rows with missing values if any.
concrete1 = na.omit(concrete1)
#Make the names of the headings shorter
concrete = concrete1
names(concrete) = c('cement_component','blast_furnace_slag','fly_ash','water','superplasticizer',
'coarse_aggregate','fine_aggregate','age','compressive_strength')
#Creating the test set: 200 randomly sampled row indices (the remaining rows are used for training)
set.seed(5)
test = sample(nrow(concrete),200)
#Creating a Linear Regression model using all the attributes (Input Variable).
lm.model = lm(compressive_strength~.,data=concrete,subset=-test)
summary(lm.model)
#Mean Squared Error of the lm.model on the test data
predict_strength = predict(lm.model,concrete[test,])
mean((concrete[test,]$compressive_strength-predict_strength)^2)
As we can see, the p-values of the t-statistics of all the coefficients are close to zero except for those of coarse_aggregate and fine_aggregate. This indicates that all the coefficients of the model other than those of coarse_aggregate and fine_aggregate are statistically significant.
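The p-values discussed above can also be pulled out of the model summary programmatically instead of being read off the printed table. The following is a minimal sketch on simulated data: the data frame `sim`, its columns, and the 0.05 threshold are assumptions made for illustration and are not part of the concrete dataset.

```r
# Simulated stand-in data: y depends on x1 and x2, while x3 is pure noise
set.seed(1)
sim = data.frame(x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200))
sim$y = 1 + 2*sim$x1 - 1.5*sim$x2 + rnorm(200)

fit = lm(y ~ ., data = sim)

# Coefficient table: estimate, standard error, t value and p-value per term
coef_table = coef(summary(fit))
p_values = coef_table[, "Pr(>|t|)"]

# Terms whose p-value exceeds the (assumed) 0.05 threshold
names(p_values)[p_values > 0.05]
```

The same two lines that build `coef_table` and `p_values` should work unchanged on lm.model.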
#Removing the coarse_aggregate and fine_aggregate attributes from the model
lm.model2 = lm(compressive_strength~.-coarse_aggregate-fine_aggregate,data=concrete,subset=-test)
summary(lm.model2)
#Mean Squared Error of the lm.model2 on test data
predict_strength2 = predict(lm.model2,concrete[test,])
mean((concrete[test,]$compressive_strength-predict_strength2)^2)
Removing the coarse_aggregate and fine_aggregate attributes does not improve the R-squared value of the model, and the mean squared error on the test data does not decrease. We will therefore go ahead with the first model, lm.model.
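Dropping predictors from a least-squares fit can never raise the plain R-squared on the training data, so the adjusted R-squared and a partial F-test are often more informative when comparing nested models. A minimal sketch of such a comparison follows, assuming simulated data (`sim`, `full` and `reduced` are hypothetical names, not objects from this analysis).

```r
# Simulated stand-in data: x3 has no real effect on y
set.seed(2)
sim = data.frame(x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200))
sim$y = 1 + 2*sim$x1 - 1.5*sim$x2 + rnorm(200)

full = lm(y ~ ., data = sim)
reduced = lm(y ~ . - x3, data = sim)

# Plain R-squared of the reduced model cannot exceed that of the full model
r2_full = summary(full)$r.squared
r2_reduced = summary(reduced)$r.squared

# Adjusted R-squared penalises the extra term
adj_full = summary(full)$adj.r.squared
adj_reduced = summary(reduced)$adj.r.squared

# Partial F-test: does the dropped term explain significant extra variance?
f_test = anova(reduced, full)
f_test
```

Applied to this analysis, the equivalent call would be anova(lm.model2, lm.model).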
#Residual Standard Error of lm.model
sqrt(deviance(lm.model)/lm.model$df.residual)
#Mean response of the dataset
mean(concrete[-test,]$compressive_strength)
The residual standard error of lm.model is around 10.62 and the mean response is about 36.00. The error rate, i.e. the residual standard error as a percentage of the mean response, is as follows.
#Error rate
10.62/36*100
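The error rate above is typed in from rounded values. It can instead be computed from the fitted object so that it stays in sync with the model. The sketch below uses simulated data (`sim` and `fit` are hypothetical stand-ins) and also shows that `sigma()` from the stats package returns the same residual standard error for an lm object.

```r
# Simulated stand-in data
set.seed(3)
sim = data.frame(x = runif(150, 0, 10))
sim$y = 20 + 3*sim$x + rnorm(150, sd = 4)
fit = lm(y ~ x, data = sim)

# Residual standard error: sqrt(RSS / residual degrees of freedom)
rse = sqrt(deviance(fit)/fit$df.residual)

# sigma() returns the same quantity for lm objects
rse_builtin = sigma(fit)

# RSE as a percentage of the mean response of the data used for fitting
relative_error = rse/mean(sim$y)*100
relative_error
```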
summary(lm.model)$r.squared
The R-squared value of lm.model is 0.61. This indicates that the model explains 61% of the variability of the response in the training data.
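To make the meaning of R-squared concrete, it can be recomputed by hand as one minus the ratio of the residual sum of squares to the total sum of squares. This is a minimal sketch on simulated data (`sim` and `fit` are hypothetical names), not a recomputation on the concrete dataset.

```r
# Simulated stand-in data
set.seed(4)
sim = data.frame(x = rnorm(120))
sim$y = 5 + 2*sim$x + rnorm(120)
fit = lm(y ~ x, data = sim)

# Residual sum of squares: variability left unexplained by the model
rss = sum(residuals(fit)^2)

# Total sum of squares: variability of the response around its mean
tss = sum((sim$y - mean(sim$y))^2)

# Share of the variability explained by the model
r2_manual = 1 - rss/tss
r2_manual
```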
cor(concrete[test,]$compressive_strength,predict(lm.model,concrete[test,]))
There is a correlation of approximately 0.8 between the response in the test data and the output of lm.model. This is a reasonably strong correlation, which suggests that the model generalizes reasonably well to the test data.
#Mean squared error of lm.model on training data
mean((concrete[-test,]$compressive_strength-predict(lm.model,concrete[-test,]))^2)
#Mean squared error of lm.model on test data
mean((concrete[test,]$compressive_strength-predict(lm.model,concrete[test,]))^2)
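Because the mean squared error is in squared units (MPa squared for this dataset), taking its square root gives an error in the same units as the response, which is easier to compare between the training and test sets. The sketch below wraps this in a small helper; the function name `rmse` and the simulated data frame `sim` are hypothetical and only illustrate the pattern.

```r
# Hypothetical helper: root mean squared error of a model on a data frame
rmse = function(model, data, response) {
  sqrt(mean((data[[response]] - predict(model, data))^2))
}

# Simulated stand-in data (not the concrete dataset)
set.seed(5)
sim = data.frame(x1 = runif(300), x2 = runif(300))
sim$y = 30 + 20*sim$x1 - 10*sim$x2 + rnorm(300, sd = 5)

# Hold out 60 rows as test data and fit on the rest
test_rows = sample(nrow(sim), 60)
fit = lm(y ~ ., data = sim, subset = -test_rows)

train_rmse = rmse(fit, sim[-test_rows,], "y")
test_rmse = rmse(fit, sim[test_rows,], "y")
c(train = train_rmse, test = test_rmse)
```

A test error that is much larger than the training error would point to overfitting; similar values suggest the model generalizes about as well as it fits.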
new_data = data.frame(cement_component=310,
blast_furnace_slag=0,
fly_ash=0,
water=150,
superplasticizer=5,
coarse_aggregate=1047,
fine_aggregate=676,
age=28)
predict(lm.model,new_data,interval="confidence")
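Note that interval="confidence" describes the uncertainty in the mean response for the given inputs. For the strength of a single new batch, a prediction interval (interval="prediction") is usually the more relevant quantity, and it is always wider. The following sketch contrasts the two on simulated data; `sim`, `fit` and `new_obs` are hypothetical names.

```r
# Simulated stand-in data
set.seed(6)
sim = data.frame(x = runif(100, 0, 10))
sim$y = 10 + 2*sim$x + rnorm(100, sd = 3)
fit = lm(y ~ x, data = sim)

new_obs = data.frame(x = 5)

# Interval for the mean response at x = 5
conf_int = predict(fit, new_obs, interval = "confidence")

# Interval for a single new observation at x = 5 (adds the residual variance)
pred_int = predict(fit, new_obs, interval = "prediction")

rbind(confidence = conf_int[1,], prediction = pred_int[1,])
```

Both calls return the same point estimate in the fit column; only the lwr and upr bounds differ.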