We have proposed a novel way to take a sample when there is limited information on the population under study. It uses technology and data (Global Positioning Systems and satellite and aerial photographs) that are now widely available, and overcomes problems with other approaches.
The method has several strengths: it reduces the work for interviewers, minimizes their discretion in choosing buildings and is safer for them. It allows random selection with known probabilities, and minimizes ‘pocketing’ within clusters by spreading out the sample within the cluster. Unlike many previous techniques, it incorporates population (household) density, which permits calculation of correct sampling probabilities. Enumeration of buildings is needed for only very small areas, a task that can be done before going into the field; and interviewers only need to enumerate households for multi-residential buildings.
Given our experience, we raise several other issues. The latest satellite or aerial photographs available to researchers for the GPS locations can be out of date, so interviewers should confirm the number of buildings when they visit the location. In our survey, we deliberately used older photos (taken before July 2006), since we wanted to learn about people who had left the area or whose homes had been destroyed. Most surveys will require recent photos of sufficient resolution to distinguish between buildings in dense areas. We used two separate geocoded sources of recent aerial photos of the areas covered: Google Earth, and maps (in ArcGIS format) obtained from a local aerial mapping firm that had surveyed the region less than a year before the conflict. The Google Earth photos had been taken on May 31, 2006, less than 1.5 months before the onset of hostilities. Where resolution was poor, as in some rural areas (an issue only for Google Earth), we cross-referenced the two sources to confirm details. The resolution of the privately purchased maps was often significantly better than that of Google Earth’s photos, in which case we used the former.
There is also the question of when a building counts as ‘in’ the circle surrounding the GPS point: what part or proportion of it should be inside the circle? We recommend basing this decision on the ‘centre’ of the building; irregular shapes might cause error, though this is likely negligible.
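The centre-based rule can be illustrated with a short Python sketch. It is only an illustration, not the procedure used in our survey: the function names, the use of the haversine formula, and the assumption that building centres have been read off the map as decimal-degree coordinates are all ours.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def centre_in_circle(building_centre, gps_point, radius_m=20.0):
    """True if the building's centre lies within radius_m of the GPS point.

    Both arguments are (latitude, longitude) tuples. The 'centre' is
    whatever point the analyst marks on the map; this is an assumption,
    since irregular buildings have no single obvious centre.
    """
    return haversine_m(*building_centre, *gps_point) <= radius_m


# Hypothetical coordinates, chosen only to exercise the function.
sampled_point = (33.1000, 35.4000)
print(centre_in_circle((33.1001, 35.4000), sampled_point))
print(centre_in_circle((33.1010, 35.4000), sampled_point))
```

Applying one fixed rule of this kind to every enumerated building keeps the in-or-out decision out of the interviewers' hands.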
We used circles with radius 20 m. In practice, the length of the radius may depend on the density of buildings in the areas under study. The circles surrounding two (or more) GPS points might overlap. Strictly, adjustments need to be made in computing both the probability of selecting buildings in the overlap and the fraction of the town area covered by the circles surrounding the points. Some preliminary simulations suggest any biases from failing to do this are minimal. This does depend partly on the area of the town and the number and radius of the circles, since they determine the likelihood of overlap. An alternative that can prevent this problem is to adapt the grid approach others have used [e.g., Grais et al., 2007]. On a map of the area under study, one could superimpose a grid of non-overlapping squares. Then a defined number of squares could be randomly sampled, and as with circles, buildings can be enumerated and one randomly chosen. (The question of whether a building is truly inside a square still applies.)
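As a rough sketch of the two ideas in this paragraph, the Python below flags pairs of sampled points whose circles overlap and, alternatively, draws a random sample of non-overlapping grid squares. It assumes the points have already been projected to planar coordinates in metres and that the study area is approximated by a rectangle; the function names and parameters are illustrative and are not taken from our survey or from Grais et al.

```python
import math
import random
from itertools import combinations


def overlapping_pairs(points_m, radius_m=20.0):
    """Index pairs of circle centres closer than two radii apart.

    points_m is a list of (x, y) positions in metres. Two circles of
    equal radius overlap when their centres are less than 2 * radius apart.
    """
    return [(i, j)
            for (i, p), (j, q) in combinations(enumerate(points_m), 2)
            if math.dist(p, q) < 2 * radius_m]


def sample_grid_squares(width_m, height_m, side_m, k, rng):
    """Randomly choose k distinct squares from a grid laid over a rectangle.

    Returns (row, column) indices. Squares cannot overlap by construction,
    so no overlap adjustment is needed. Squares on the edge may extend
    beyond the rectangle when side_m does not divide its dimensions.
    """
    n_cols = math.ceil(width_m / side_m)
    n_rows = math.ceil(height_m / side_m)
    cells = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    return rng.sample(cells, k)


# Hypothetical inputs, for illustration only.
points = [(0.0, 0.0), (30.0, 0.0), (100.0, 0.0)]
print(overlapping_pairs(points))
print(sample_grid_squares(1000.0, 800.0, 40.0, 5, random.Random(2006)))
```

Flagged pairs could be redrawn or handled with an adjusted selection probability; which of these is appropriate depends on the design.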
Another possible amendment to our method deals with the question of what to do if the building chosen is non-residential. Rather than ask interviewers to identify residential buildings within 20 m and randomly choose one, before the interviews begin one could designate 2nd or 3rd choice buildings within the circle. This would reduce the work of interviewers and limit their discretionary decisions.
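One way to pre-designate the backup choices is to draw a random ordering of the buildings enumerated in each circle before fieldwork begins. The Python sketch below shows the idea; the function name and the building labels are hypothetical.

```python
import random


def ranked_choices(building_ids, n_choices=3, rng=None):
    """Return up to n_choices distinct buildings in random order.

    The first element is the primary selection; later elements are the
    2nd and 3rd choices, to be used only if the earlier ones turn out
    to be non-residential.
    """
    rng = rng or random.Random()
    k = min(n_choices, len(building_ids))
    return rng.sample(list(building_ids), k)


# Hypothetical labels for the buildings enumerated inside one circle.
buildings = ["A", "B", "C", "D", "E"]
print(ranked_choices(buildings, rng=random.Random(1)))
```

Because the order is fixed in advance, interviewers only need to move down the list, with no further random draw in the field.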
The safety and security of interviewers need to be maintained, even at the expense of efficiency of the design or complete adherence to protocol. This was done in a survey in Iraq. We too were concerned that outside interviewers might be at some risk, for example that they might be seen as spies, and a priori we excluded two Palestinian refugee camps. We also deemed it imprudent for interviewers to map out the boundaries of the selected towns on site. Indeed, because of intervention by Hizbollah security personnel, we were not allowed to conduct the survey in Bint Jbeil, where we had anticipated surveying 200 households, or in Khiam. Since the region of these towns was one of the hardest hit during the conflict, we likely underestimated numbers of casualties and rights violations. Interviewers could also find locations with GPS units rather than satellite photographs. We did not do this, as we were worried about the safety of interviewers if they were known to be using GPS technology. The more recent availability of ‘smart’ phones with GPS capability may circumvent this concern, as observers might simply assume the interviewers were using their phones.
As well, talking with local leadership about the study, in particular the nature of the maps and the random choice of locations, before conducting interviews decreased the amount of suspicion and increased acceptance of the survey teams by local residents. Even though we did this through our local interviewing firm to great effect, we were not allowed into two of the strongholds of Hizbollah, whose local leaderships’ biggest concern was the nature of the questions on ownership of and attitudes towards small arms.
It may be feasible to adjust estimates using alternative sources. For example, the Iraq Body Count has collected data on the numbers killed, as reported in newspapers or other sources. By using data on the relative proportions of people reported killed in different cities, the Iraq Family Health Survey Study Group estimated the undercount in their survey. We did not have relevant information, so had to accept the limitations in our data from failure to cover the whole area.
We recognize that many agencies planning surveys have limited expertise and resources. Google Earth is free and readily available with internet access, and we believe agencies will find the tool very attractive for this reason. In addition, random selection of GPS coordinates can easily be conducted in almost any statistical package or spreadsheet application, including Excel. Those points can then be imported into Google Earth easily with free, open-access software (e.g., GPS Visualizer: http://www.gpsvisualizer.com/map_input?form=googleearth).
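For agencies that prefer a script to a spreadsheet, the random draw of GPS coordinates can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the town is approximated by a rectangular bounding box (points falling outside the true boundary would have to be discarded and redrawn), sampling uniformly in latitude and longitude is treated as uniform in area (reasonable only for small areas), and the name, latitude and longitude column headers are our guess at a layout that conversion tools would accept, to be checked against the tool's own documentation.

```python
import csv
import io
import random


def random_gps_points(south, west, north, east, n, rng):
    """Draw n (latitude, longitude) points uniformly within a bounding box."""
    return [(rng.uniform(south, north), rng.uniform(west, east))
            for _ in range(n)]


def to_csv(points):
    """Format points as CSV text with name, latitude and longitude columns."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["name", "latitude", "longitude"])
    for i, (lat, lon) in enumerate(points, start=1):
        writer.writerow([f"P{i}", f"{lat:.6f}", f"{lon:.6f}"])
    return buf.getvalue()


# Hypothetical bounding box; a fixed seed makes the draw reproducible.
rng = random.Random(2006)
points = random_gps_points(33.10, 35.40, 33.12, 35.43, 10, rng)
print(to_csv(points))
```

Recording the seed alongside the output lets the selection be reproduced and audited later.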
Though the technical expertise necessary to carry out these processes may seem daunting for some agencies, we believe that with a limited amount of training most program officials will be able to use this process easily and quickly in emergency and/or difficult settings. Using Google Earth is intuitive and can be learned quickly. Additionally, training in randomly selecting GPS coordinates and mapping them to the software should be relatively brief. Once that is done, Google Earth tools can be used to delineate the 20 m radius around each point, demarcate the buildings, and randomly select one. The maps are then printed directly from the program and given to the interviewers. In an Appendix, we show the calculations needed to compute sample weights, and the syntax for doing this in SPSS, a widely used statistical package.