There is a project that aims to determine the change in knowledge-attitude-practice (KAP) among the general public after a certain intervention and after a 12-month period. The design of the project is intervention vs non-intervention (two districts).
In a real example, a previous survey showed that among the general public, 90% had shame associated with a disease. Our target is that by educating, this would be reduced by 11% (a relative reduction, NOT 11 percentage points), i.e. from the 90% level to 80%. So, I need to calculate the sample size (this takes into account two proportions, 90% and 80%):-
Assuming 80% power and 5% alpha risk, 219 per district is the required sample size. I take 30 clusters on the assumption that this is the number recommended by the WHO (the one intervention district has 200 villages, i.e. clusters). So, 219/30 = 7.3 households per cluster (i.e. per village). The primary sampling unit is the household, and the proposal is to take three elements from each household (members of the family) to respond individually to the questionnaire. So, in total, 219 households and 219 × 3 = 657 individuals.
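For reference, the 219 figure can be reproduced with the standard normal-approximation formula for comparing two independent proportions, provided Fleiss's continuity correction is applied (that the original 219 came from the continuity-corrected formula is my assumption; the function name is mine). A minimal sketch, stdlib only:

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_two_proportions(p1, p2, alpha=0.05, power=0.80,
                      two_sided=True, continuity=False):
    """Per-group sample size for a two-sample comparison of proportions
    (normal approximation; optional Fleiss continuity correction)."""
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - alpha / 2) if two_sided else nd.inv_cdf(1 - alpha)
    z_b = nd.inv_cdf(power)
    p_bar = (p1 + p2) / 2  # pooled proportion under H0
    n = (z_a * sqrt(2 * p_bar * (1 - p_bar))
         + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / (p1 - p2) ** 2
    if continuity:
        n = (n / 4) * (1 + sqrt(1 + 4 / (n * abs(p1 - p2)))) ** 2
    return ceil(n)

print(n_two_proportions(0.9, 0.8))                    # 199 per group
print(n_two_proportions(0.9, 0.8, continuity=True))   # 219 per group
print(n_two_proportions(0.9, 0.8, two_sided=False))   # 157 per group
```

The uncorrected two-sided result (199) and the one-sided result (157) match the sampsize outputs quoted later in this thread; the continuity-corrected result matches the 219 above.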
Questions:
1. Is it OK to use this assumed number of 30 clusters?
2. How could the required number of clusters be estimated, i.e. to take xx (?) clusters out of the 200 clusters (villages) of this intervention district, as we don't want to reach all 200 villages?
SECOND
The above approach has a number of disadvantages: the change (i.e. effect size) is an assumption; there is no source establishing that the change in KAP due to the intervention should be 11% rather than 5%, 20% or 30%. Secondly, after the study is done, the "actual" effect size obtained could be <11%, meaning that the calculated sample size was in fact lower than it should have been (sample size is inversely proportional to effect size) for the "real" effect of the intervention. This may make the results biased and less reliable.
An approach that may counter the above problems is:-
To conduct two KAP prevalence surveys independently, before and after the intervention, i.e. without assuming in advance that our intervention would bring a particular amount (xx%) of improvement in KAP (the 11% improvement above). After the two surveys have been conducted, we then estimate how much change has actually occurred. The sample size is determined using "one proportion" and the formula below:-
The primary sampling unit will be the household, not individuals. Within each household, three individuals will be invited to respond to the questionnaire: one person aged <18 years, one adult male, one adult female.
The formula shown below gives a maximum possible sample size of 211 households:
N = 2 × 1.96² × p(1 − p) / d²
With p = 0.5, 1 − p = 0.5, d = 0.1, this totals 192, PLUS 10%, i.e. 211.
Thus, dividing 211 by the required number of clusters (villages) according to the standard, i.e. 30, gives 7 households per village. From each household (n = 211), three respondents (one aged <18 years, one adult male, one adult female), as mentioned above, will be invited (no random selection) to respond to the KAP questionnaire (7 houses × 3 per house = 21 per village). This means a total of 630 respondents (21 × 30 = 630) in each district.
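The arithmetic above can be checked with a short script. This is a sketch; the interpretation of the leading 2 in the formula as a design-effect multiplier for cluster sampling is my assumption (without it, the classic one-proportion formula gives n ≈ 96), and the function name is mine:

```python
# Poster's formula: N = 2 x 1.96^2 x p(1 - p) / d^2
def n_one_proportion(p, d=0.1, z=1.96, deff=2.0, attrition=0.10):
    """One-proportion sample size with a design-effect multiplier
    and a 10% allowance for non-response."""
    n = deff * z ** 2 * p * (1 - p) / d ** 2
    return round(n), round(n * (1 + attrition))

print(n_one_proportion(0.5))  # (192, 211): p = 0.5 maximises p(1 - p)
```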
This approach has the following advantages:-
No particular effect size of the intervention is pre-assumed before doing the survey, while the maximum sample size that can be obtained is taken (refer to the formula above).
Possible disadvantages:
The formula used in the second approach doesn't take "statistical power" into account, but does take the confidence interval into account. Power is a safeguard against false negatives and the confidence interval is a safeguard against false positives. Power is an important aspect because it indicates the probability of detecting an effect that exists in reality.
Questions:
1. Is this 2nd approach correct even if the "power" aspect is not taken into account, given our objective (set before doing the 1st baseline survey), i.e. to determine the change in a parameter due to an intervention? Can this objective be studied by doing two "independent" surveys before and after the intervention (i.e. using "one proportion", 90%, in the sample size calculation, with the formula from the second approach above)?
N = 2 × 1.96² × p(1 − p) / d²
p = 0.9 (because of 90%); d = 0.1
*This formula doesn't take into account two proportions, i.e. 90% before the intervention and 80% after.
2. Are there any ways in which the lack of "power" in the sample size calculation of the 2nd approach can be compensated for? Is the absence of "power" (which is usually 80%) from the sample size formula a problem?
Power 80%
Statistical Power: Statistical power is inversely related to beta, the probability of making a Type II error. In short, power = 1 − β.
In plain English, statistical power is the likelihood that a study will detect an effect when there is an effect there to be detected. If statistical power is high, the probability of making a Type II error, or concluding there is no effect when, in fact, there is one, goes down.
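Power for a given sample size can also be computed directly rather than fixed in advance. A minimal sketch (simple normal approximation with unpooled variance; the function name and the n = 199 example are illustrative):

```python
from math import sqrt
from statistics import NormalDist

def power_two_proportions(p1, p2, n, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test for proportions
    with n respondents per group (simple normal approximation)."""
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - alpha / 2)
    se = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n)
    return nd.cdf(abs(p1 - p2) / se - z_a)

# With 199 respondents per group, the power to detect a change
# from 90% to 80% comes out at about 0.80, as designed
print(round(power_two_proportions(0.9, 0.8, 199), 2))
```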
A lot of questions ... let us start by addressing a few of them and see where that takes us ...
Sample size calculations usually assume a simple random sample. Cluster samples usually have a lower effective sample size than a simple random sample. This means that you should take a larger sample size than calculated. It is usual to take a sample size twice that calculated for a simple random sample. Using GNU sampsize I get:
Estimated sample size for two-sample comparison of percentages
Test H: p1 = p2, where p1 is the percentage in population 1
and p2 is the percentage in population 2
Assumptions:
alpha = 5% (two-sided)
power = 80%
p1 = 90%
p2 = 80%
Estimated sample size:
n1 = 199
n2 = 199
So your sample size in each district should be about 398.
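The rule-of-thumb doubling corresponds to a design effect (DEFF) of 2. Where an estimate of the intra-cluster correlation (ICC) is available, DEFF can be computed instead of assumed; a sketch (the ICC value of 0.09 is purely illustrative, not taken from this thread):

```python
def design_effect(cluster_size, icc):
    """DEFF = 1 + (m - 1) * rho for single-stage cluster samples,
    with m respondents per cluster and intra-cluster correlation rho."""
    return 1 + (cluster_size - 1) * icc

# e.g. about 12 respondents per cluster with an assumed ICC of 0.09
# gives a DEFF close to the rule-of-thumb value of 2
print(round(design_effect(12, 0.09), 2))  # 1.99
```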
You might consider using a single-tailed hypothesis test as you expect, and are interested in, a difference in one direction. Using GNU sampsize I get:
Estimated sample size for two-sample comparison of percentages
Test H: p1 = p2, where p1 is the percentage in population 1
and p2 is the percentage in population 2
Assumptions:
alpha = 5% (one-sided)
power = 80%
p1 = 90%
p2 = 80%
Estimated sample size:
n1 = 157
n2 = 157
Giving a sample size of about 314 in each group.
If you have baseline data you may want to use a one-sample test (i.e. for 80% rather than 90%). This would need a sample size of about 140 in one group only (i.e. the intervention group).
I would probably go for a one-sample test in one sample from the intervention district with 90% or 95% power. Sample sizes would be 204 and 266 respectively. This would be safe if you think that a secular trend in reduction of stigma is not operating and so will not need a "control" district.
A general rule is to prefer many small clusters over a few large clusters. Using m = 30 clusters is usually a safe choice but you could go for fewer. This would be cheaper. With n = 266 you might (e.g.) go for 24 clusters of 12 (that comes to 264 ... close enough).
Picking clusters can be done using the PPS sampling approach as used in SMART surveys or the spatially stratified approach as used in RAM type surveys. Households within each cluster could be selected as described in the SMART survey manual. You do not need to sample all 200 villages. See above ... 24 would probably do you well, 30 might be better.
Data analysis would require specifying the sample design. This can be done in packages such as STATA, SPSS, EpiInfo, SUDAAN, SAS, &c.
WRT the effect size. If you knew this in advance then you would not need to do a survey. Select the level of effect that you deem to be usefully or substantively significant. If you think a drop from 90% to 80% to be a success then use that.
I would avoid a before-after paired study as this often proves to be a lot of work (best to reserve these for interventions in (e.g.) schools where follow-up is simple).
In summary ... I think you can do a more powerful and cheaper one-sample study with a single-tailed test (or a 95% CI approach).
I hope this helps.
Please do not hesitate to ask follow-up questions.
Mark Myatt
Technical Expert
Answered:
10 years ago

Dear Friend
Thank you so much, first of all; I must say that you are doing an excellent job in making sincere efforts to share your knowledge with those who are unknown to you! Please accept my sincere thanks for this!
I answer within your replies (paragraphs with ### are my answers):
A lot of questions ... let us start by addressing a few of them and see where that takes us ...
Sample size calculations usually assume a simple random sample. Cluster samples usually have a lower effective sample size than a simple random sample. This means that you should take a larger sample size than calculated. It is usual to take a sample size twice that calculated for a simple random sample. Using GNU sampsize I get:
Estimated sample size for two-sample comparison of percentages
Test H: p1 = p2, where p1 is the percentage in population 1
and p2 is the percentage in population 2
Assumptions:
alpha = 5% (two-sided)
power = 80%
p1 = 90%
p2 = 80%
Estimated sample size:
n1 = 199
n2 = 199
So your sample size in each district should be about 398.
###I agree with the design effect. But do we add a design effect even when there is a two-sample comparison? Should it not be applied only in one-sample cases, e.g. prevalence surveys where you don't aim to measure a change from time 1 to time 2?
You might consider using a single-tailed hypothesis test as you expect, and are interested in, a difference in one direction. Using GNU sampsize I get:
Estimated sample size for two-sample comparison of percentages
Test H: p1 = p2, where p1 is the percentage in population 1
and p2 is the percentage in population 2
Assumptions:
alpha = 5% (one-sided)
power = 80%
p1 = 90%
p2 = 80%
Estimated sample size:
n1 = 157
n2 = 157
Giving a sample size of about 314 in each group.
###This is also a strong argument. But is it still OK to use a single-tailed hypothesis test even when it is possible that the mass media may increase the knowledge of the population, may reduce it (strange!), or may have no effect at all from the baseline level?
If you have baseline data you may want to use a one-sample test (i.e. for 80% rather than 90%). This would need a sample size of about 140 in one group only (i.e. the intervention group).
I would probably go for a one-sample test in one sample from the intervention district with 90% or 95% power. Sample sizes would be 204 and 266 respectively. This would be safe if you think that a secular trend in reduction of stigma is not operating and so will not need a "control" district.
###Thank you again for this good argument as well. But we are testing not just knowledge; we are also testing a strategy for treatment coverage, so I guess two districts would be needed. We also need to know what level of knowledge the control district has.
A general rule is to prefer many small clusters over a few large clusters. Using m = 30 clusters is usually a safe choice but you could go for fewer. This would be cheaper. With n = 266 you might (e.g.) go for 24 clusters of 12 (that comes to 264 ... close enough).
###As far as I understand, according to you there is no specific mathematical formula to calculate the number of clusters; it just depends on convenience and the desired cluster size? I have seen surveys that have taken >40 or >50 clusters; do they also use no particular mathematical formula and decide on the number of clusters by convenience and desired cluster size? 30 clusters is generally for immunization programmes, and our programme is on neurology; would that be an issue?
24 × 12 = 288, and not 264
Picking clusters can be done using the PPS sampling approach as used in SMART surveys or the spatially stratified approach as used in RAM type surveys. Households within each cluster could be selected as described in the SMART survey manual. You do not need to sample all 200 villages. See above ... 24 would probably do you well, 30 might be better.
###Yes, I would use PPS, and plan to sample households by simple random sampling; what do you think? I will read the SMART survey manual as well. Thank you for this link.
Data analysis would require specifying the sample design. This can be done in packages such as STATA, SPSS, EpiInfo, SUDAAN, SAS, &c.
WRT the effect size. If you knew this in advance then you would not need to do a survey. Select the level of effect that you deem to be usefully or substantively significant. If you think a drop from 90% to 80% to be a success then use that.
###Sorry, but I didn't understand "If you knew this in advance then you would not need to do a survey."
I would avoid a before-after paired study as this often proves to be a lot of work (best to reserve these for interventions in (e.g.) schools where follow-up is simple).
###Thank you. Since we employ radio for the mass media, don't you think that we wouldn't need to go to the same participants as in the baseline survey for the final survey? Meaning a random sample of the public for the baseline and then another random sample for the final survey; participants may or may not be the same?
In summary ... I think you can do a more powerful and cheaper one-sample study with a single-tailed test (or a 95% CI approach).
I hope this helps.
Please do not hesitate to ask follow-up questions.
###Thank you so much for your time and patience. Also, I would be glad if you could share your insights on the second approach I mentioned in my original message:-
Assuming a change from 90% to 80% is only an assumption and may not be borne out in reality. Could these limitations of the two-sample comparison of proportions be overcome by doing two independent surveys, using the formula below, which gives the maximum sample size and assumes nothing about how much improvement our mass-media intervention would bring? The formula takes into account the 95% CI but not statistical power (80% or 90/95% power). Could this also be one of the alternative methods in your view? Project cost is not a limiting factor.
Power is an important component for showing that a difference existed in reality, and the formulas below and the second approach above do not take this power aspect into account. Power could be estimated after the two surveys (baseline and final) have been conducted, but is not pre-assumed.
N = 2 × 1.96² × p(1 − p) / d² (for the baseline survey)
p = 0.9 (because of 90%); d = 0.1
N = 2 × 1.96² × p(1 − p) / d² (for the final survey)
p = 0.8 (because of 80%, or whatever is obtained in the baseline survey); d = 0.1
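For what it is worth, this formula is quite sensitive to the plugged-in prevalence: p = 0.5 maximises p(1 − p), which is why it yields the largest ("maximum") sample size. A quick check (a sketch; the helper name is mine):

```python
# Evaluating N = 2 x 1.96^2 x p(1 - p) / d^2 at different prevalences
def n_kap(p, d=0.1, z=1.96, deff=2.0):
    return deff * z ** 2 * p * (1 - p) / d ** 2

print(round(n_kap(0.9)))  # 69  : baseline, p = 0.9
print(round(n_kap(0.8)))  # 123 : final, p = 0.8
print(round(n_kap(0.5)))  # 192 : conservative maximum at p = 0.5
```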
Best regards!
Anonymous
Answered:
10 years ago
Mark Myatt
Technical Expert
Answered:
10 years ago

Thank you so much again for all your help and timely answers! I have no pending queries now; I have got the answers I was looking for. Thank you so much!
Just to be sure on the last para, "Statistical tests vs. CIs": as far as I understand, you do not support the second approach I mentioned (two prevalence surveys without assuming that a particular effect size will be obtained, 10% in our example), but prefer to pre-assume that an intervention will bring x% of change (10% in our example) and then calculate the sample size according to this assumption. Have I understood correctly?
Nonetheless, thank you so much for all your help! Best regards.
Anonymous
Answered:
10 years ago

The point I was trying to make is that the two approaches are equivalent to each other. The only difference between them is the mechanics of the testing. With two surveys you will still want to ask whether the two prevalences differ from each other, and you will then fall back on a significance test, which is where we started.
Best, IMO, to decide what is the smallest effect worth detecting and then calculate the sample size sufficient to detect that with acceptable levels of error. The sample size calculation (and the sample size required) for either approach will be the same.
Mark Myatt
Technical Expert
Answered:
10 years ago

Thank you so much, sir. Sorry, I initially misunderstood your point. Best regards!!
Anonymous
Answered:
10 years ago