Hello, we are doing a baseline study for the evaluation of a programme. The programme is going to be implemented in 6 provinces and 18 counties. We want to have information by each county. How does this influence our sampling design?

I was thinking…

1º- Randomly select x number of project communities/microareas (all have the same size, around 250 people, and there are 30 by each county) from each of the 18 counties.

2º- Randomly select x number of households from each microarea.

But would this two-stage sampling design give us a sample size sufficient to produce accurate indicators at county level? Or would I need to treat the counties as domains/sub-samples and use a stratified random sampling design?

Many thanks in advance for any help you can give us.
Jordana

Dear Jordana:

A complete answer to your seemingly simple question would involve an explanation of statistics and sampling methodology. I would recommend consulting a textbook. However, I will try to address some of the major points below.

First of all, you need to determine what type of random sampling you will do. It seems from your description that the basic sampling unit is the household. If there is a list of all the households in your 6 provinces, you may be able to do one-stage simple or systematic random sampling. However, the distance between selected households may be prohibitively large. If so, you may need to do 2-stage cluster sampling for purely logistical reasons.

You would then calculate a sample size for each stratum, which in your case is the county. The sample size calculation requires that you make some assumptions about the prevalence of your outcomes, how much precision you need, and what design effect you expect. Then the sample size would be multiplied by 18 to determine the sample size for the entire study. So you can easily see that increasing the number of strata can greatly increase your sample size.
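As a rough sketch of this arithmetic, assuming the usual normal-approximation formula for a proportion and ignoring the finite population correction and any allowance for non-response, the per-stratum size and the study total could be computed as follows. The inputs (11% prevalence, +/-5 percentage points, design effect 2, 18 counties) are illustrative only.

```python
import math

def srs_sample_size(p, d, z=1.96):
    """Sample size to estimate a proportion p within +/- d under simple random sampling."""
    return z**2 * p * (1 - p) / d**2

def stratified_total(p, d, deff, n_strata, z=1.96):
    """Apply the same design-effect-inflated sample size to every stratum and sum."""
    per_stratum = math.ceil(srs_sample_size(p, d, z) * deff)
    return per_stratum, per_stratum * n_strata

# Illustrative assumptions: 11% prevalence, +/-5 points, design effect 2, 18 counties
per_county, total = stratified_total(p=0.11, d=0.05, deff=2, n_strata=18)
print(per_county, total)  # 301 5418
```

Every extra stratum that needs its own precision adds another full per-stratum sample to the total.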

If you do cluster sampling, you want to select as many clusters in each county as is feasible to minimize the size of each cluster. The larger the cluster, the higher the design effect and the poorer your precision.
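A common approximation for this relationship, assuming clusters of roughly equal size, is Kish's formula deff = 1 + (m - 1) x ICC, where m is the number of respondents per cluster and ICC is the intra-cluster correlation of the outcome. The ICC of 0.05 below is a hypothetical value; the real one depends on the indicator and the population.

```python
def design_effect(cluster_size, icc):
    """Kish approximation of the design effect: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

ICC = 0.05  # hypothetical intra-cluster correlation
for m in (5, 10, 20, 40):
    # Larger clusters give a larger design effect, i.e. poorer precision
    print(m, round(design_effect(m, ICC), 2))
```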

Regarding your question, sample size has nothing to do with accuracy. Accuracy is whether the point estimate from your survey is close to the actual value in the population because bias is absent. This is determined by whether the sampling and measurements were done correctly. In contrast, sample size influences precision, which reflects the degree of sampling error. So whether your results are useful depends both on a lack of bias (and therefore accurate results) and on an acceptable degree of precision (and therefore being relatively certain that the difference between the survey estimate and the true population value is not large because of random sampling error). For a better explanation of these concepts, see http://conflict.lshtm.ac.uk/page_39.htm, http://www.unscn.org/en/resource_portal/index.php?&themes=201&resource=602, or any basic text on surveys and sampling.
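One way to see the distinction is the standard decomposition of total error: mean squared error = bias squared + variance. In the sketch below, the true prevalence and the bias of 4 percentage points are hypothetical. Increasing n shrinks the sampling error toward zero, but the total error never drops below the bias.

```python
import math

def standard_error(p, n):
    """Sampling error of a proportion from a simple random sample of size n."""
    return math.sqrt(p * (1 - p) / n)

def total_error(bias, p, n):
    """Root mean squared error: sqrt(bias^2 + SE^2)."""
    return math.sqrt(bias**2 + standard_error(p, n) ** 2)

p = 0.11     # hypothetical true prevalence
bias = 0.04  # hypothetical systematic error, e.g. from faulty measurement
for n in (100, 400, 1600, 6400):
    print(n, round(standard_error(p, n), 4), round(total_error(bias, p, n), 4))
```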

Regardless, you should account for the stratification by county during data analysis because, if the outcome differs between counties, you will get better precision than if you did not account for it. Most of the larger statistical software packages, such as SAS, SPSS, Stata, and R, allow you to account for cluster and stratified sampling during analysis.
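A minimal sketch of why this helps, assuming proportional allocation and ignoring clustering and the finite population correction: the stratified variance equals the unstratified one minus a between-stratum term, so it is smaller whenever county prevalences differ. The county weights and prevalences below are hypothetical; a real analysis would use the survey procedures of the packages above.

```python
import math

# Hypothetical county population shares (weights) and prevalences
weights = [0.5, 0.3, 0.2]
prevalences = [0.05, 0.12, 0.25]
n = 600  # total sample, allocated proportionally to the weights

overall_p = sum(w * p for w, p in zip(weights, prevalences))

# Variance if the stratification is ignored (analysed as one simple random sample)
var_unstratified = overall_p * (1 - overall_p) / n

# Variance of the stratified estimator: sum of W_h^2 * p_h(1 - p_h) / n_h, with n_h = W_h * n
var_stratified = sum(w**2 * p * (1 - p) / (w * n) for w, p in zip(weights, prevalences))

print(round(overall_p, 3), round(math.sqrt(var_unstratified), 4), round(math.sqrt(var_stratified), 4))
```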

Bradley A. Woodruff
Technical Expert

Answered:

8 years ago

Dear Jordana,
It seems that your interest is county-based information. If so, you need 18 independent surveys, one per county, assuming that each county is homogeneous. You need to calculate the sample size for each county; it could differ between counties because it depends on the expected prevalence and other parameters.
Regarding the sampling methodology, given the context of your area, I am also assuming two-stage cluster sampling. Stage I: select clusters from the list of villages (in your case, communities/microareas). Stage II: randomly select households from the list of households in each selected village.
If your stratification is based on other factors, such as livelihood, agro-ecology, or province, you cannot analyze or disaggregate the results by county. For example, if you do six SMART surveys (one per province), you cannot analyze the results by county, but you can report on each province. Such provincial-level analysis might underestimate or overestimate the findings, especially if the counties are very heterogeneous.
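The two-stage procedure for one county can be sketched as below. The frame layout and household IDs are hypothetical. Selecting clusters with equal probability is reasonable here only because the microareas are described as roughly equal in size; with unequal clusters, selection with probability proportional to size is the usual choice.

```python
import random

def two_stage_sample(frame, n_clusters, households_per_cluster, seed=None):
    """Stage I: simple random sample of clusters.
    Stage II: simple random sample of households within each selected cluster.
    frame maps cluster id -> list of household ids."""
    rng = random.Random(seed)
    clusters = rng.sample(sorted(frame), n_clusters)
    return {c: rng.sample(frame[c], households_per_cluster) for c in clusters}

# Hypothetical frame for one county: 30 microareas with 40 listed households each
county_frame = {
    f"MA{i:02d}": [f"MA{i:02d}-HH{j:03d}" for j in range(40)] for i in range(1, 31)
}
selection = two_stage_sample(county_frame, n_clusters=7, households_per_cluster=20, seed=1)
print(len(selection), sum(len(h) for h in selection.values()))  # 7 140
```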

Anonymous

Dear Jordana,
I agree with Kiross on the above approach; however, it is quite laborious and resource-intensive. The approach will be guided by the objectives of your study. In some cases you could take into account the livelihood zones of the counties and conduct your survey over a slightly larger area than a county, e.g. livelihood zones such as pastoral or mixed farming, and proceed with the 2-stage cluster approach.

Kennedy Musumba

Regarding this discussion, the sampling stratification scheme is determined by a) what results are needed to make program decisions, and b) the resources available. If county-specific results are needed, then you need to apply the calculated sample size to each of the 18 counties if resources are available to do this. Such fine stratification is often expensive because it greatly increases the overall required sample size, but if you need to do it, you must do it.

Regarding the suggested stratification by livelihood zone or other criteria, keep in mind that to do stratified sampling, you must be able to define the value of the stratification variable for each and every sampling unit. This means that you would have to be able to determine the livelihood zone for each primary sampling unit, in this case each "microarea". Usually, it is impossible to determine which livelihood zone each primary sampling unit belongs to. Livelihood zone maps supply only general geographic locations; they cannot define the livelihood zone for each primary sampling unit. Moreover, populations move, and in few primary sampling units does every household derive its livelihood from the same activity or category of activities. In fact, there are very few criteria that can be used to stratify the first stage of cluster sampling. If the primary sampling unit is the census enumeration area, census data may provide, for each such unit, socio-economic data, racial or ethnic distribution, linguistic distribution, or other demographic information; however, for other primary sampling units, such as villages or subdistricts, similar information is not available. So although it's easy to say "stratify by livelihood zone", it's usually impossible to do it correctly.

Bradley A. Woodruff
Technical Expert

Dear Brad, Kiross and Kennedy,

Many, many thanks for your help with this matter. The programme is for community health workers (CHWs) in Angola, and we don't have sampling experts readily available to consult here.

I thought it was important to have data at county level because the prevalence of the indicators probably varies between counties (although I do not know these values at county or province level; I only have data for the regional and national levels), and I thought this would be needed for monitoring and evaluation of the programme.

For the sample size calculation, I used the OpenEpi calculator for proportions, with the indicator "proportion of children under 6 months exclusively breastfed", estimated at 11% for rural areas:

http://www.openepi.com/SampleSize/SSPropor.htm

For a finite population of 2,700 children under 6 months (calculated as 2% of the study population of 135,000 people), 5% absolute precision, and a 95% confidence level, this gave me a sample size of 143, which I then multiplied by a design effect of 2 and increased by 10% to account for non-response, giving a sample size of 315 children under 6 months. Converting this to people and families gives me around 2,250 families. We have the list of households for each community/microarea, so we can do two-stage cluster sampling. With a cluster size of 20 households, I was planning on randomly selecting 7 communities/microareas from each county, which would give 140 households per county.
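The figures above can be reproduced with the finite-population formula for a proportion, n = N p (1 - p) / ((d^2 / z^2)(N - 1) + p (1 - p)), which, as far as I know, is the formula the OpenEpi proportion calculator uses. This sketch then applies the design effect and the non-response allowance afterwards, as described in the text.

```python
import math

def finite_pop_sample_size(N, p, d, z=1.96):
    """Sample size for a proportion p with absolute precision d in a population of N."""
    return N * p * (1 - p) / ((d**2 / z**2) * (N - 1) + p * (1 - p))

base = math.ceil(finite_pop_sample_size(N=2700, p=0.11, d=0.05))  # before design effect
with_deff = base * 2                      # design effect of 2
final = math.ceil(with_deff * 1.10)       # plus 10% for non-response
print(base, with_deff, final)  # 143 286 315
```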

So would this not be a sufficient sample to give county-level data? Would I really need 18 "independent surveys", one for each county, each with a sample of 315 children under 6 months (2,250 households)? Or is this not crucial at baseline, since we will then have monthly monitoring data at county level?

We also do not have the resources for a total sample of 40,500 households (2,250 per county). As for the livelihood approach to stratification, we don't have that information at county level.

Again, my many thanks.
Jordana

Anonymous

Dear Jordana,
I worked in Angola on community health and nutrition programmes with CHWs, and I used the LQAS methodology for coverage assessments. This methodology allows comparisons between counties and requires only small samples. I have some slides in Portuguese that I can share with you. Please contact me at my email: elisadmuriel@yahoo.es

Elisa Dominguez

Dear Jordana:

First of all, it appears that you are planning quite a complex survey, with complex sampling and multiple outcomes. For this reason, I would highly recommend that you find someone locally or within your organization with experience in statistics, epidemiology, and survey methodology with whom you can discuss the essential details as they come up during survey planning and implementation, before you spend a lot of time and money collecting data that may not be useful because of methodological problems. There are many pitfalls to avoid during planning and implementation, any one of which may threaten the validity of the survey results.

The most important thing when calculating sample size is formulating the appropriate assumptions and deciding on the level of precision you need in your final results. If you calculate a sample size to achieve a certain level of precision which you need to make program decisions, then this sample size must be applied to each stratum for which you need that level of precision. You have calculated a sample size of 315 children based on a desired precision of +/- 5 percentage points around an estimate of 11%. If you want this precision in each county, then this sample size must be selected in each county.
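To illustrate what happens if the 315 children are instead spread across all 18 counties, this sketch computes the approximate 95% confidence interval half-width for an 11% prevalence with a design effect of 2, using the normal approximation and ignoring the finite population correction and non-response. The normal approximation is itself poor at samples as small as one county's share, so treat the county figure as indicative only.

```python
import math

def half_width(p, n, deff=1.0, z=1.96):
    """Approximate 95% CI half-width for a proportion, inflated by the design effect."""
    return z * math.sqrt(deff * p * (1 - p) / n)

p, deff = 0.11, 2.0
total_children, counties = 315, 18
per_county = total_children / counties  # about 17.5 children per county

whole_area = half_width(p, total_children, deff)
one_county = half_width(p, per_county, deff)
print(round(whole_area, 3), round(one_county, 3))
```

The whole-area estimate stays within roughly +/-5 percentage points, while a single county's estimate would be about four times less precise.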

Regarding sampling, it seems that your target group is children less than 6 months of age. According to the Angola 2011 DHS, children less than 5 years of age represent 21.3% of the population. Because of infant and under-5 mortality, children less than 6 months of age probably make up somewhat more than 1/10th of children less than 5 years of age, so let's say they make up 2.5% of the population. The World Bank estimates an average household size of 8.63, so on average there are only 0.22 children less than 6 months of age per household. This means that if teams selected households without regard to household members, they would have to visit about 5 households to find one child less than 6 months of age. This would mean selecting 1,432 households in each county, for a total of 25,773 in all 18 counties. However, if you are interested only in children less than 6 months of age and do not wish to collect data from households without a child of this age, then at 4 out of 5 households you would only need to ask whether there are any eligible children in the household. This would be relatively quick.
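The household arithmetic above can be checked as follows, using the rounded figure of 0.22 children under 6 months per household (2.5% x 8.63 people is about 0.216; the unrounded value would require slightly more households).

```python
import math

children_per_household = 0.22  # rounded figure from the text (2.5% x 8.63)
children_needed = 315          # required per county
counties = 18

households_per_county = math.ceil(children_needed / children_per_household)
households_total = round(children_needed * counties / children_per_household)
print(households_per_county, households_total)  # 1432 25773
```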

Finally, the size of the design effect, and therefore the effect of cluster sampling on your precision, is determined in part by the average size of each cluster. You will achieve a lower design effect and higher precision if you split the sample size of 315 children into more clusters and thereby decrease the size of each cluster.
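A sketch of this trade-off for a fixed total of 315 children, using Kish's approximation deff = 1 + (m - 1) x ICC with a hypothetical ICC of 0.05: spreading the same children over more clusters lowers the average cluster size m, lowers the design effect, and raises the effective sample size (the equivalent simple random sample).

```python
def design_effect(cluster_size, icc):
    """Kish approximation: 1 + (m - 1) * ICC, assuming roughly equal cluster sizes."""
    return 1 + (cluster_size - 1) * icc

TOTAL_CHILDREN = 315
ICC = 0.05  # hypothetical intra-cluster correlation

results = []
for n_clusters in (7, 15, 30, 45):
    m = TOTAL_CHILDREN / n_clusters      # average children per cluster
    deff = design_effect(m, ICC)
    effective_n = TOTAL_CHILDREN / deff  # equivalent simple random sample size
    results.append((n_clusters, round(m, 1), round(deff, 2), round(effective_n)))

for row in results:
    print(row)
```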

I hope this helps. Let me reiterate the recommendation to find a knowledgeable person locally or within your organization with whom to discuss details and questions as they come up.

Bradley A. Woodruff
Technical Expert

Dear Brad,

Thank you very much for your help with this. I tried to contract an expert in sampling, but the process did not move forward and the start of the programme is now imminent. I don't think this will be done in time since the process here is too bureaucratic, so I decided to review the design myself and see if I can carry it out. I studied statistics, but it has been some time, and my field experience has been with qualitative studies. I thought this was the step to take, and I have been reading as much as I can on sampling, study design, and implementation issues. Do you think I should not attempt this?

For the study we will need to include children under 6 months, children under 5 years, and women 15-49 years of age who were recently pregnant. I calculated the sample sizes required for the indicators for each of these groups, and the largest was for children under 6 months. For the sampling frame, we will have the listing of all households in each community/microarea, as well as the sex and age of household members.

Best
Jordana

Anonymous

Dear Jordana:

Please contact me at bradleyawoodruff@gmail.com. I would like to provide whatever assistance I can, but the technical details may be of less interest to other en-net participants.

Bradley A. Woodruff
Technical Expert