Predicting Thermal Performance of Different Roof Systems by Using Decision Tree Method

A. U. Weerasuriya Abstract: This paper describes the use of the decision tree method to predict the thermal performance of several roof systems under different climate conditions. The decision tree method is a data mining technique with competitive advantages over other methods: the procedure is simple and clear, and the results are easy to understand without rigorous mathematical or computational knowledge. Results of 80 energy simulation cases were used to demonstrate the applicability of this method in building energy simulation. These 80 simulation cases are based on five locations in five different climate zones, eight different roof systems, and two extreme climate conditions: the warmest and coldest months of the year at a particular location. The modelled decision tree has a prediction accuracy of 84% on the training data and 100% on the test data. In addition, the decision tree automatically ranked the best selection of roof system under the prevailing climate conditions. The predicted values shown in each classified data subset can be used as a reference, with an accuracy of 6%, to predict the indoor room temperature obtained with a particular roof system. Finally, decision rules and simplified guidelines derived from the constructed decision tree are provided in tabular format for non-engineer users.


Introduction
A significant percentage of the total energy consumption of a building is used to maintain an acceptable occupant thermal comfort level [1,2]. Direct and indirect heat transfer through building components is the main factor affecting occupant thermal comfort by increasing the indoor temperature. Among common building components, the roof itself generates a significant heat load due to its vast surface area and its orientation directly facing the sky. A designer can therefore maintain an acceptable indoor thermal environment for occupants by selecting a suitable roof system and carefully controlling its thermal properties. Energy simulation techniques have been widely used to assess the thermal performance of the whole or part of a building [3,4]. However, their accuracy in predicting the energy demand of an occupied building is lower than that for an unoccupied building, due to the uncertainty of latent heat generation from human bodies and electrical appliances. Energy simulation methods also have several drawbacks: a steep learning curve to operate the software, the need to perform a separate simulation for every case study, better suitability for evaluating designed buildings than those in the early design stage, the use of simplified methods, and the limited number of factors considered in the analysis. Therefore, other techniques capable of overcoming these shortcomings have been adopted to model building energy demand. The traditional regression analysis method and the Artificial Neural Network (ANN) method are two of the most popular techniques successfully used by researchers in the past [3]. The simple and efficient regression analysis method is based on statistical analysis and a regression equation that combines the effects of various climate variables with building physics in order to predict building energy demand [5,6,7]. However, the complicated nature of the regression equation demands that the user has a good mathematical background.
The structure of an ANN is similar to a biological neural network. It is able to build complex relationships between different factors in the building energy simulation process and thus produce more accurate output [8,9,10,11,12]. Nevertheless, an ANN model cannot be understood and interpreted easily, as it operates as a "black box" within the analysis process. The decision tree method is a data mining technique that has been used in scientific and medical fields [13,14,15,16,17] to make decisions based on the simultaneous consideration of several inputs. Its flow-chart-like structure allows the analysis to be understood and interpreted easily, even by a user without specific mathematical knowledge.
However, the use of the decision tree method in building energy simulation is very sparse. Yu et al. [18] demonstrated the use of the decision tree method in a detailed building energy simulation analysis by predicting the energy use intensity of houses in Japan. Tso and Yau [19] compared the accuracy of the decision tree method with regression analysis and the ANN method and found that its accuracy was almost the same as that of the other two methods.

Climate data and building physical data
Both input data sets, climate data and building physical properties, should cover a wide range of values in order to construct a successful decision tree. The following methodology was adopted to collect the necessary input data for the decision tree analysis in the absence of field measurement data. A two-storey house with a total floor area of 99 m2 was selected for the energy simulation modelling. This building is similar to the modelled building used by Halwathura and Jayasinghe [20]. Sloped and flat roof shapes were used for three basic roof types: concrete flat roof, calicut tile roof, and asbestos sheet roof. Some modifications were introduced to these roofs, such as a sloped ceiling for the asbestos sheet roof and the calicut tile roof, and an insulated concrete roof, a green roof, and a concrete roof with parapet walls for the concrete roof slab. The insulated slab system is similar to the model proposed by Halwathura and Jayasinghe [21,22]. The green roof has a 10 cm grass layer on top of the roof, as proposed by Dareeju et al. [23]. More details about the roof systems used for this study are shown in Table 2. DEROB-LTH was used as the energy simulation software for this study; it has been used by previous researchers [20,21,23] and its accuracy has been evaluated against field measurements [20]. Two types of data input are needed for a DEROB simulation: one data set for the climate data and the other for the building model. The building is oriented in the north-south direction, and all windows face north and south only. The 225 mm thick cement-plastered brick walls are at the perimeter, and 115 mm plastered brick walls are used as internal walls. The floor is made of 75 mm thick concrete with a tile-paved surface. The first floor slab is 125 mm thick, with 15 mm thick soffit plaster on the bottom side and ceramic tiles on the upper side. There is a balcony at first floor level, which is protected by a shading device. All windows are wooden-framed single-glazed windows, and the doors are of the timber-panelled type. Altogether, 80 simulation cases (8 roof types x 5 locations x 2 months) were used to build the decision tree.
In every case, the indoor temperatures of the upper floor volumes were extracted, because those volumes are directly under the influence of the roof system, unlike the spaces at ground floor level.

Analysis of monthly average temperature
Outdoor air temperature is one of the main factors influencing occupant comfort level. The amount of variation of the outdoor temperature from the neutral temperature is a better measure for determining the required level of thermal performance of a roof system. Figure 1 shows the boxplot of monthly average outdoor air temperature for the selected locations. According to Figure 1, Colombo has the minimum temperature variation; its minimum and maximum monthly temperature values are close to the annual average temperature. For the other four locations, larger deviations can be observed between the average and the highest and lowest temperatures. Thus the selection of two extreme temperature cases for this study is justified. The annual average temperature is above 9 °C for all five locations, and that value is close to 20 °C except for Chicago. Only Chicago has its lowest temperature below the freezing point.

Selection of attributes for the decision tree
There are several climate and building physical factors affecting the thermal performance of a roof. Some of these factors are numerical attributes, such as temperature and humidity, and some are categorical attributes, such as the shape of the roof, the roof covering material, etc. It is necessary to convert the numerical attributes into categorical attributes to obtain a more accurate decision tree. For simplicity, only binary categorical attributes were used in this study; for example, temperature is simplified into two categorical values, "HIGH" temperature and "LOW" temperature. The annual average values of the numerical attributes were used for the binary separation of those attributes. The attributes used for constructing the decision tree are listed in Table 3. It is necessary to have an even distribution of categorical variables at each location to build an unbiased decision tree model.
According to Figure 2, the categorical distribution at each location is fairly even, with the percentage varying between 25% and 47%. In order to demonstrate the thermal performance of a roof system, the normalised average indoor temperature was used as the prediction attribute in the decision tree. This value is calculated as in Equation (5). A 'high difference' is defined as a normalised average indoor temperature value exceeding 1.04 or falling below 0.96. The advantage of this parameter is that it is directly combined with the outdoor temperature and is thus easy to understand, even for a non-engineering user. It is also more convenient to use in heating/cooling load calculations, as it enables the determination of the indoor temperature implicitly.
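As a minimal sketch, the classification of the normalised average indoor temperature (NIT) into 'high' and 'low' difference can be written as follows. The 1.04/0.96 thresholds are those given above; the exact form of Equation (5) is not reproduced here, so expressing NIT as the ratio of average indoor to average outdoor temperature is an assumption.

```python
def classify_nit(avg_indoor_temp, avg_outdoor_temp):
    """Classify the normalised average indoor temperature (NIT) using the
    1.04 / 0.96 thresholds given in the text.

    The ratio form below is an assumed reading of Equation (5), which is
    not reproduced in this excerpt.
    """
    nit = avg_indoor_temp / avg_outdoor_temp  # assumed form of Equation (5)
    label = "HIGH" if nit > 1.04 or nit < 0.96 else "LOW"
    return nit, label
```

For example, an average indoor temperature of 31.5 °C against an outdoor average of 30.0 °C gives NIT = 1.05, which is classified as a 'high difference'.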

Table 3 - Attributes used for the decision tree

The decision tree

Generation of decision tree
The steps of constructing a decision tree are shown in Figure 3. The procedure has two stages: learning and classification. In the learning stage, the whole data set is first divided into two subsets, called the training and test data sets. In this study, the 80 data sets were divided into 75 training data sets and 5 test data sets. The decision tree is generated, and its accuracy is calculated by analysing the training data set. In the classification stage, if the accuracy of the decision tree is acceptable, it can be used for future projects. If the accuracy is not adequate, it is necessary to identify and fix the reasons and generate a new decision tree.
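The 75/5 hold-out in the learning stage can be sketched as a random split; the study only reports the counts, so the random selection below is illustrative.

```python
import random

def split_train_test(cases, n_test=5, seed=0):
    """Randomly hold out n_test cases for evaluation, mirroring the
    75/5 split of the 80 simulation cases described in the text."""
    rng = random.Random(seed)
    shuffled = list(cases)
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]  # (training, test)

# Stand-ins for the 80 simulation cases (8 roofs x 5 locations x 2 months).
cases = list(range(80))
train, test = split_train_test(cases)
```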

Figure 3 - Flow chart of making a decision tree (Yu et al. [18])
At each node it is necessary to calculate the entropy of the parent and child data sets, the information gain, the split information, and the gain ratio in order to select the split attribute. The same calculation procedure is repeated at each node until one of the following criteria is met:
1. All records in a partition share the same target class value.
2. There are no remaining predictor attributes that can be used to further split a partition.
3. There are no more records for a particular value of a predictor variable.
This is a time-consuming, repetitive process. Therefore, the open source data mining software WEKA was used for this study. WEKA was originally developed by the University of Waikato, New Zealand, and was previously used by Yu et al. [18] for a similar study. There are different decision tree algorithms within WEKA; the J48 algorithm was selected for this study by trial and error, as it gave the highest accuracy on the training data set. The generated decision tree is shown in Figure 4. It has four levels and 15 nodes. Each node represents either a split test or a decision rule. The root node and the internal nodes show the details of the split test, such as the number of data sets and the split attribute. Leaf nodes express the decision rules. Leaf nodes with an entropy value of 0 are labelled LEAF; the others are labelled STOP. STOP nodes result when no significant effect on the information gain ratio can be observed in further candidate splitting tests. Both LEAF and STOP nodes contain information about the number of data sets, the classification result, the predicted normalised indoor temperature (NIT), and the label LEAF or STOP. More details about the nodes are shown in Figure 5. The WEKA analysis report gives information on the accuracy of the constructed decision tree; according to the report, that accuracy is 84%.
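The quantities named above can be computed as in the following sketch of the C4.5-style split selection that J48 implements; the record fields ('humidity', 'nit') are illustrative, not the study's exact attribute names.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(records, attribute, target):
    """C4.5-style gain ratio for splitting `records` (a list of dicts)
    on a categorical `attribute` when predicting `target`."""
    n = len(records)
    parent = entropy([r[target] for r in records])
    # Partition the records by the candidate attribute's values.
    parts = {}
    for r in records:
        parts.setdefault(r[attribute], []).append(r[target])
    # Information gain: parent entropy minus weighted child entropy.
    gain = parent - sum(len(p) / n * entropy(p) for p in parts.values())
    # Split information penalises attributes with many distinct values.
    split_info = entropy([r[attribute] for r in records])
    return gain / split_info if split_info > 0 else 0.0
```

At each node, the candidate attribute with the highest gain ratio is chosen as the split attribute.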


Though this accuracy is not very high, it is acceptable considering the low number of data sets used as the training set. Information on the misclassifications can be found in the confusion matrix shown below.
The matrix above shows that, of the 40 LOW NIT cases, 34 were correctly classified and 6 were misclassified. Among the 35 HIGH NIT cases, there were 6 misclassifications.
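The 84% training accuracy follows directly from these counts; the matrix layout in this small sketch is reconstructed from the counts stated in the text.

```python
def accuracy_from_confusion(matrix):
    """Overall accuracy from a confusion matrix given as a list of rows,
    where matrix[i][j] counts cases of true class i predicted as class j."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Counts reported above: 34 of 40 LOW and 29 of 35 HIGH NIT cases correct.
cm = [[34, 6],   # true LOW:  34 correct, 6 misclassified as HIGH
      [6, 29]]   # true HIGH: 6 misclassified as LOW, 29 correct
```

This gives 63/75 = 0.84, matching the reported training accuracy.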

Evaluation of decision tree
Before using the constructed decision tree to predict the thermal performance of roof systems in future projects, it is necessary to assess its accuracy using the test data sets. In this study, only five data sets were randomly selected as test data sets, due to the limited number of available data. The five selected test data sets are shown in Table 4 with their properties. The predictions of the decision tree are listed in Table 5, with the classification result and the predicted NIT value.
The percentage error between the predicted and the actual NIT values is also shown. All five test cases were correctly predicted by the decision tree. This prediction accuracy of 100% is higher than the 84% accuracy of the decision tree on the training data; it is believed that this is due to the limited number of test data sets used for the evaluation. Moreover, the maximum percentage error across the test data sets is as low as 5.83%, which indicates the good prediction ability of the decision tree.
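The percentage error quoted here is assumed to be the usual relative error between the NIT value predicted at a leaf node and the simulated value:

```python
def percentage_error(predicted_nit, actual_nit):
    """Relative error (in percent) between the NIT value predicted at a
    leaf node and the NIT value obtained from the simulation.
    Assumed definition; the paper does not spell out the formula."""
    return abs(predicted_nit - actual_nit) / actual_nit * 100.0
```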

Table 5 - Summary of the evaluation results of the decision tree
Another aspect of the decision tree is that each LEAF or STOP node represents a decision rule. The constructed decision tree has 8 LEAF and STOP nodes, from which 8 different decision rules can be derived. For example, node 6 expresses that if the roof is not a green roof, the humidity level is high, and the roof has a ceiling, then the normalised indoor temperature is low. All derived decision rules are listed in Table 6. The priority order for selecting the different roof systems under high (>20 °C) and low (<20 °C) temperatures is shown in Table 7. The green roof is the first choice under both climate conditions, which suggests that it outperformed the other roof systems in every climate zone. A ceiling is also a good remedy for achieving an acceptable indoor temperature under both high and low outdoor temperature conditions. However, the insulated roof system only performs well under high outdoor temperature conditions. The shape of the roof and the use of parapet walls are only effective under low outdoor temperatures, and their significance is small compared to using a green roof or installing a ceiling.
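The node 6 rule quoted above can be read as a simple predicate. A sketch follows; the attribute names and encodings are illustrative, not the exact labels used in the tree.

```python
def rule_node6(is_green_roof, humidity, has_ceiling):
    """Decision rule at node 6, as described in the text: a non-green
    roof under HIGH humidity with a ceiling gives a LOW normalised
    indoor temperature. Returns None when the rule does not apply and
    another branch of the tree would be consulted instead."""
    if not is_green_roof and humidity == "HIGH" and has_ceiling:
        return "LOW"
    return None
```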

Conclusion
Eight roof systems were modelled, comprising three basic roof types and variants with improvements such as a sloped ceiling, a green roof, an insulated roof slab, and a roof slab protected with parapet walls. The 80 data sets were divided into two subsets: 75 data sets were used as training data and the remaining five as test data. The constructed decision tree has 15 nodes in four levels. The accuracy of the decision tree is 84% for the training data set and 100% for the test data set. Eight decision rules can be derived from the decision tree. However, the accuracy of the decision tree is limited by the number of data sets used in the study. It should also be mentioned that some other parameters that need to be considered in an energy simulation, such as the accurate ventilation rate and the internal latent heat gain from human bodies and electrical appliances, were not considered in this study. Care should be taken when interpreting numerical attributes, because the results of the splitting tests depend on the threshold values used; thus, threshold values should be selected in a fair and rational manner. Even though the decision tree method led to accurate predictions of the energy simulation results in this study, it is recommended to use field measurements to verify its validity under different prevailing site conditions in future studies. It is also necessary to test the applicability of this method to the design of more energy efficient buildings in different building categories, such as commercial, public, and apartment buildings.