Comparison of GIS Computational Methods Using Real Life Spatial Data – Building and Population Density Computation in a Semi Urban Area

Spatial data sets are developed for various purposes, with different software, and by personnel with varying skill levels. Though these data sets are often adequate for visualizing spatial variations, they pose significant problems when used for even simple GIS computations. This problem of differing data origins, faced by GIS experts when preparing base data sets for modelling, consumes significant time, which is usually not properly appreciated by managers or decision makers. When such irregular data sets are suitably identified, adjusted, and incorporated, GIS computations become a simple routine exercise. The present work is a comparative assessment of the computation of building area density and population density in a semi urban area of approximately 25 km² near the city of Colombo, Sri Lanka. Population density is a simple computation carried out by identifying the number of people living on land extents that could be occupied for dwelling. Building area density is also a straightforward computation, comparing the area occupied by buildings with the area available for building purposes. The present work describes, step by step, the methodology used for each computation, the associated problems, the constraints encountered, the steps taken to overcome the issues, and recommended techniques for carrying out similar work.


Introduction
The purpose of a Geographic Information System (GIS) is to provide a spatial framework to support decisions for the rational use of geographically distributed resources [3]. A geographic data model is an abstraction of the real world that employs a set of data objects which support map display, query, editing and analysis. With the refinement of graphics, hardware and mapping software in the 1960s and 1970s, maps were generated with CAD (Computer Aided Design) tools. The introduction of GIS software in the 1980s commenced a second generation geographic data model, which requires input data to be homogeneous collections of points, lines and polygons [12]. In order to model reality in GIS environments, suitably developed base datasets are often required. In general, a large proportion of existing data sets have been developed on CAD systems. Many data sets developed on GIS platforms also fall into various categories depending on the purpose, software used, type, etc. The purpose of a map varies from cartographic presentation to detailed analysis. Therefore, though some of these data sets are often adequate for presentation or visualization of spatial variations, they pose significant problems when used for even simple GIS computations.
These problems, which arise as a result of different data sources, consume significant time when preparing quality databases. Managers and decision makers, who anticipate quick results, usually do not appreciate the time taken to prepare base data sets for GIS computations. Most input datasets have issues pertaining to registration. Data which originate from different projections need to be projected into a common coordinate system for the data to appear in the same space [7]. Tomlinson indicates the importance of dataset units, precision, accuracy, standards adhered to, etc., in the design of a conceptual database for a relational GIS model. GIS and CAD interoperability plays an important role in database creation, since it is common in CAD not to emphasize the importance of closed polygons. Although ArcGIS reads CAD files as GIS content, enabling copying and pasting or the use of any number of tools that copy data, such as Copy Features, Merge, Append, and Feature Class to Feature Class [5], users often come across difficulties in data conversion. It is commonly known that when developing a GIS data model, the most time consuming effort is the preprocessing of data. Jeong, Liang and Liang [4] repeatedly indicate, supported by an extensive literature survey, that data are usually maintained in different formats and in disparate systems, and hence require significant resources to convert into formats which are useful for scientific purposes. Though there are many instances in the literature where such problems and issues have been mentioned in general terms, a comparative, process based study of such difficulties with an actual dataset is required to critically address the constraints and then evaluate the available options.
As such, the present work has carried out a comparative study of commonly known computational methods for calculating the spatial distribution of population density and building area density, while assessing data issues, methods and time consumption. The study describes the step by step methodology used for each process and the key actions related to base data layer preparation, map registration, and land use and other feature identification, through a combination of topographic map data of different scales and satellite imagery, while arriving at the appropriate building, land cover and administrative boundaries for the computations.

Study Area
The study location of approximately 2369 ha (25 km²) in a semi urban area near the city of Colombo comprises 22 Grama Niladari administrative units (GND) falling within the two Divisional Secretary administrative units (DSD) called Sri Jayewardenepura Kotte and Kaduwela. The area is bounded by 6° 51' 37" and 6° 54' 27" North latitudes and 79° 54' 6" and 79° 58' 20" East longitudes. Selection of the study area was based on the need to capture various types of land cover, a significant coverage of buildings and the availability of roads. As the study objective was to compare methodologies, an approximate area of 25 km² was selected for simplicity. Administrative boundaries were selected so that readers can easily follow the results. The study area is approximately 10 km south east of Colombo (Figure 1). The land cover distribution shows that water bodies and roads make up approximately 10% of the selected area and that the other 90% consists of gardens, paddy lands, rubber, coconut, other plantations, grass lands and marsh. The National State Assembly of the Government of Sri Lanka is also located in the study area.
The population of the entire study area is approximately 92,500 [8]. The study extent is a semi-urbanized area in the vicinity of the commercial capital city of Colombo, with an average population density of about 4 persons per 1000 m². Road coverage consists of about 2.8 km of main roads of classes A and B. Roads of classes C and D total 26.5 km. The study area has approximately 19,500 buildings, averaging about 4.75 persons per building.
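The averages quoted above can be checked with simple arithmetic. The following is a minimal sketch that uses only the figures stated in this section; the variable names are illustrative, and the gross study area is used as the base because buildable extents are only introduced later.

```python
# Figures quoted in the Study Area section.
population = 92_500      # approximate population of the study area
area_ha = 2369           # approximate study area in hectares
buildings = 19_500       # approximate number of buildings

area_m2 = area_ha * 10_000   # 1 ha = 10,000 m^2

# Average density over the gross study area, in persons per 1000 m^2.
persons_per_1000_m2 = population / area_m2 * 1000

# Average occupancy, in persons per building.
persons_per_building = population / buildings

print(f"{persons_per_1000_m2:.2f} persons per 1000 m^2")
print(f"{persons_per_building:.2f} persons per building")
```

Both results land close to the rounded values given in the text (about 4 persons per 1000 m² and a little under 4.75 persons per building).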

Objective
The objective of the study is to collect spatial data from available sources, to carry out computation of the spatial variation of population density and building area density for each Grama Niladari administrative unit in a selected semi urban study area of about 25 km², and to critically evaluate the issues in vector GIS data preparation and the associated computation methods with respect to time and effort.

Methodology
In ecology, population density is defined as the number of individuals of a population per unit area of living space [1]. Wikipedia [10] indicates that population density is a measure referring to the number of people per unit area of land. It has several other detailed definitions such as arithmetic, physiological, agricultural, residential, urban, ecological optimum, etc. [9]. Similarly, building density could be defined as the number of units in a given area, but there is confusion over how the base land area should be calculated.
GIS maps with sufficient detail enable the base land area to be computed with ease. This has led to different types of density computations for effective spatial planning in both urban and rural areas. A detailed comparative presentation is given in DCAUL (2003) [2]. Having identified buildable and non buildable lands in the study area, the computations of the spatial distribution of the area covered by buildings and of the spread of population assumed that population density is the number of people living on buildable lands and that building area density is the footprint of buildings on buildable land extents.
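The two working definitions adopted here can be written compactly. The symbols below are notation introduced for illustration only and do not appear in the source: for a given GND, $P$ is the population, $A_f$ the total building footprint area, $A_g$ the gross area, $A_{nb}$ the non buildable area, and $A_b$ the buildable area.

```latex
\[
A_{b} = A_{g} - A_{nb}, \qquad
D_{P} = \frac{P}{A_{b}}, \qquad
D_{B} = \frac{A_{f}}{A_{b}}
\]
```

Here $D_P$ is the population density and $D_B$ the building area density. Replacing $A_b$ with $A_g$ gives the corresponding gross-area values.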
A GIS user survey was conducted to identify the common methods used to carry out the above or similar computations on the ArcGIS platform with commonly available spatial data sets. These methods were then used for computations with detailed identification of processes and computational steps. Figure 2 shows the overall workflow, depicting key aspects such as methods identification, base data layer preparation, comparative method usage and evaluation. Upon user inquiry, it was revealed that there were three common GIS based methods for building area density (Table 01). The recommended methods were compared in order to separate the common set of computation methods into (i) preprocessing and base data preparation and (ii) task execution required for the GIS computations. The process flowchart for preprocessing and GIS data set preparation is shown in Figure 3. The flowcharts for Building Area Density and Population Density are shown in Figure 4 and Figure 5 respectively. The user identified method of preprocessing for GIS layer preparation in the case of the GND shape file was significantly different from that of the buildings and buildable area maps. Case study computations incorporated user identified methodologies along with manual hard copy based methods for both Building Area Density and Population Density.

Steps of a GIS based building area density computation, as tabulated:
1. Separate land use and buildings as two layers.
2. Prepare the GND administrative boundary layer.
3. Define the water and road spatial extents as non-buildable area and create a separate shape file.
4. Select each GND from the GND layer and its corresponding buildings; assign the GND name to the buildings using Select by Location.
5. Compute the building area of each GND with the Dissolve tool.
6. Identify the buildable land in each GND.
7. Export the attribute file of GND-wise building area (from step 5) to join with the GND-wise buildable area, using the GND name as the common attribute.
8. Carry out an attribute table operation to compute Building Area Density as building area / GND buildable area.

Computational accuracies from each method were summarized to represent the differences observed during GIS analysis.
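The attribute table stage of the building area density steps (summing building area per GND, joining on the GND name, and dividing by the buildable area) can be sketched in plain Python. This is an illustration under stated assumptions, not the ArcGIS workflow used in the study: the GND names, footprint areas and land areas are hypothetical, and the spatial steps (layer separation, Select by Location, Dissolve) are assumed to have already produced the inputs.

```python
from collections import defaultdict

# Hypothetical building footprints after GND names have been assigned:
# (GND name, footprint area in m^2).
buildings = [
    ("GND_A", 120.0),
    ("GND_A", 80.0),
    ("GND_B", 200.0),
    ("GND_B", 150.0),
    ("GND_B", 50.0),
]

# Hypothetical gross and non-buildable (water, roads) areas per GND, in m^2.
gross_area = {"GND_A": 2500.0, "GND_B": 4000.0}
non_buildable_area = {"GND_A": 500.0, "GND_B": 800.0}


def building_area_by_gnd(rows):
    """Sum footprint areas per GND (the tabular effect of a dissolve)."""
    totals = defaultdict(float)
    for gnd, area in rows:
        totals[gnd] += area
    return dict(totals)


def building_area_density(rows, gross, non_buildable):
    """Join on GND name and divide building area by buildable area."""
    built = building_area_by_gnd(rows)
    density = {}
    for gnd, total_area in gross.items():
        buildable = total_area - non_buildable.get(gnd, 0.0)
        if buildable <= 0:
            continue  # no buildable land, density is undefined
        density[gnd] = built.get(gnd, 0.0) / buildable
    return density


density = building_area_density(buildings, gross_area, non_buildable_area)
for gnd, value in sorted(density.items()):
    print(f"{gnd}: {value:.3f}")
```

Keeping the per-GND sums and the buildable areas as separate tables mirrors the export and join described in the steps, and makes it easy to swap in the gross area as the base for comparison.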

Data
Data layers collected and used for the study, their descriptions, types and sources are shown in Table 03.

(a) Comparative assessment and relative indicator values for each sub-activity area are shown in Table 02. In event 6-8, line features of roads were buffered to convert them into polygons. A similar operation in event 7-9 consumed 120 minutes because missing water body data had to be digitized, in addition to the buffering required for single line representations.
(b) Manual computation results of Building area density and Population density for three selected GND were compared with those computed using GIS. Results are shown in Table 04. Percentage errors computed are graphically shown in Figure 6 and Figure 7. Results are summarized and tabulated in Table 05 and 06.
(c) Building area density and Population density computations for GND in the study area are compared and shown in Table 07. Relative error computations are in Table 08 and Figure 9. This table also compares the same values computed using the gross area of each GND as the base area, which is the area without reductions for non-buildable extents.
2.0 Computational accuracy indicated that there was a considerable difference between the manual results and the rest of the methods. This is acceptable since the manual area computations included visual approximations and averaging. The order of magnitude of the results therefore indicates that the GIS based results are acceptable. There were also differences between the results of the GIS based methods. Method 03 shows a considerable difference in Building area density when compared with Method 02, though there is only a marginal difference when compared with the Standard method (Table 05, Table 07 and Figure 06). A detailed scrutiny of the database revealed that the source data used to extract building polygons contained multiple identical polygons representing the same feature. This may have occurred during cartographic data layer preparation, where feature attribute accounting is not included as an objective. There were also instances of polygons encompassing smaller polygons, which is a common feature in cartography. These created a multiplication of error in polygon area computations after overlay operations. Each overlay operation significantly multiplied the number of polygons and hence the area. Though not quite similar and not mentioned in detail, merging errors and cartographic errors have been cited [6] as errors that compound due to inherent problems of cartography.

Figure 7: Computational Population Density Error in Comparison with Manual Computation
In this study it was noted that such data issues are extremely difficult to trace, especially when working with large datasets. Therefore, for reasons which are common when using CAD datasets, Method 02 produced results that differed. Method 03, which used the Symmetrical Difference technique of ArcGIS, did not encounter this problem since it dealt with computing the inverse of the polygon area. Two examples are shown in Figure 8, where the graphic interface and the attribute table extracts illustrate the dataset concerns described above.
During the sample GND computations, errors of different magnitudes were observed. This is due to the varying number of feature details encompassed by each GND.
3.0 Comparison with manual values also indicates that the comparative errors are in the range of 0-12%, which shows that manual computations also provide reasonable results for planning, though they consume significant time.
4.0 The comparative computations and the step by step documentation of each activity attempt to give users an indication of the needs and precautions that should be observed during database preparation, checking and, most importantly, planning. The study reveals the need to establish quality guidelines and also the need to ensure quality flagging. Even though there are many users, communities and organizations in the country who prepare base data, the lack of quality flagging of spatial data creates a colossal loss to the nation as a result of time loss in the process of repeated r
7.0 The study also indicates that a wide variety of methodologies is available even for carrying out simple computations that are not expected to be of very high accuracy. The attempts made here are to present the availability of various options and the need for a critical evaluation of objectives prior to selecting a methodology. The study also indicates the need to perform intermittent evaluation of the methodology, the process followed thus far and the achievement of results that would satisfy the objectives. The present work showed that even for a small area of approximately 25 km², the calculation accuracies changed from method to method for different reasons. GIS users and decision makers need to carefully understand the reasons presented above, so that a GIS can facilitate their resource planning and managerial requirements.
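The effect of repeated identical polygons on an area total can be illustrated with a small, self-contained sketch. It is not the procedure used in the study: the coordinates are hypothetical, and the duplicate check shown here only catches polygons with exactly the same vertices (in any starting position or direction). It does not detect polygons nested inside larger ones.

```python
def polygon_area(vertices):
    """Area of a simple polygon from (x, y) vertices (shoelace formula)."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def canonical_ring(vertices):
    """Return a key that is equal for rings with the same vertex cycle,
    regardless of the starting vertex or the drawing direction."""
    ring = list(vertices)
    if len(ring) > 1 and ring[0] == ring[-1]:
        ring = ring[:-1]  # drop the repeated closing vertex
    candidates = []
    for seq in (ring, ring[::-1]):
        start = seq.index(min(seq))
        candidates.append(tuple(seq[start:] + seq[:start]))
    return min(candidates)


# Hypothetical building footprints; the second is the first one drawn
# again from a different start vertex and in the opposite direction.
footprints = [
    [(0, 0), (10, 0), (10, 10), (0, 10)],
    [(10, 10), (10, 0), (0, 0), (0, 10)],
    [(20, 0), (25, 0), (25, 4), (20, 4)],
]

raw_total = sum(polygon_area(p) for p in footprints)

unique = {}
for p in footprints:
    unique.setdefault(canonical_ring(p), p)
unique_total = sum(polygon_area(p) for p in unique.values())

print(f"area with duplicates: {raw_total}")
print(f"area after removing exact duplicates: {unique_total}")
```

In this toy case the duplicated square inflates the building area total from 120 to 220 square units, which is the kind of error that grows further when overlay operations multiply the polygons.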
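The comparative errors between GIS and manual results discussed above can be expressed as a percentage of the manual value. The sketch below assumes that the manual result is the reference, which is an assumption based on the figure captions rather than a stated formula, and the density values are hypothetical.

```python
def percent_error(gis_value, manual_value):
    """Absolute difference as a percentage of the manual (reference) value.

    Assumes the manual result is the reference; the source does not
    state the exact formula it used.
    """
    if manual_value == 0:
        raise ValueError("manual reference value must be non-zero")
    return abs(gis_value - manual_value) / abs(manual_value) * 100.0


# Hypothetical building area densities for three GND: (manual, GIS).
samples = {
    "GND_A": (0.200, 0.220),
    "GND_B": (0.150, 0.150),
    "GND_C": (0.100, 0.094),
}

for gnd, (manual, gis) in samples.items():
    print(f"{gnd}: {percent_error(gis, manual):.1f}% error")
```

Reporting the error per GND, rather than as one overall figure, keeps visible the variation between units that the text attributes to differing amounts of feature detail.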

Conclusions
1.0 GIS database preparation activities should be carefully planned according to the objectives of the intended work and the available data formats to ensure satisfactory results.
2.0 Computational options within GIS environments should be evaluated and selected to suit project objectives and accuracies.
3.0 The indicator used to compare the methods, based on Process, Time and Complexity of Operation, gave representative values. Therefore, this indicator can be used for similar work.
4.0 Data imports to GIS environments should be carried out in an informed manner, ensuring that suitable checks are performed.
5.0 GIS database preparation and checking prior to computations consume significant time and should therefore be carried out carefully, while ensuring that the same data can be reused for repetitive work.
6.0 There is a great advantage in ensuring the quality of spatial data and in facilitating its repeated use. Therefore, in the national interest, it is necessary to have an apex body to address spatial data policy and its implementation for the development of the nation. This can be carried out in a manner similar to National Spatial Data Infrastructure arrangements practised elsewhere in the world.