I have a problem reconciling my manual calculation with ENA for SMART 2011 (version of November 16th, 2013). Here are the assumptions: GAM 17.5%, precision 4%, DEFF 1.3, HH size 7, under-five population 14%, and contingency 3%. Based on the software, the sample size is 491 children. The issue arises when this is converted to households: according to the software, 573 HHs are needed, whereas my calculation gives 516.051, which rounds up to 517.
Dear Kiross,
I do not know what formula you used for the conversion.
Bear in mind that SMART surveys work with children aged 6-59 months, not 0-59 months, so you need to correct your number of households by dividing by 0.9:
516.051 / 0.9 = 573.39, i.e. 573
I hope this helps!
ALE FRANCK
Answered: 10 years ago
Hi Kiross,
What formula did you use for your manual calculation?
Blessing Mureverwi
Answered: 10 years ago
I used the following steps: if the HH size is 7 and the under-five population is 14%, then the number of under-fives per household is 0.98 (7 × 14%). To convert the sample size into households, divide the sample size by the number of children per household: 491 / 0.98 = 501.02. Finally, add the 3% contingency: 501.02 × 1.03 = 516.05, which rounds up to 517.
Anonymous
Answered: 10 years ago
The formula used by ENA is as follows (and this differs slightly from your calculation):
No. of HH = No. of children / (HH size × % of children under 5 × 0.9) + contingency
The proportion of children under 5 is expressed as a decimal, i.e. 0.14 in this case.
The 0.9 represents the proportion of children under 5 who are aged 6-59 months.
In this case your calculation would be 491 / (7 × 0.14 × 0.9) = 556.7, then 556.7 × 1.03 = 573.
I hope it's clear. Let me know if not.
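Purely as an editorial illustration (not part of the original reply), here is a minimal Python sketch of the two conversions, assuming the figures quoted above; the variable names are illustrative:

```python
n_children = 491    # sample size from ENA
hh_size = 7         # mean household size
under5 = 0.14       # proportion of the population aged 0-59 months
contingency = 1.03  # 3% contingency, applied multiplicatively

# Manual calculation: treats 14% as if it were the 6-59 month proportion
manual = n_children / (hh_size * under5) * contingency

# ENA calculation: multiplies by 0.9 to keep only children aged 6-59 months
ena = n_children / (hh_size * under5 * 0.9) * contingency

print(round(manual, 3))  # 516.051, rounded up to 517 households
print(round(ena))        # 573
```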
Blessing Mureverwi
Answered: 10 years ago
I would do this:
(1) Calculate the sample size. A common formula is:
n = (p * (1 - p)) / (precision / 1.96)^2
n = (0.175 * (1 - 0.175)) / (0.04 / 1.96)^2 = 347
This is for a simple random sample so I'd multiply this by the expected design effect. Your 1.3 seems a little low to me but we will go with that here:
n = DEFF * (p * (1 - p)) / (precision / 1.96)^2
n = 1.3 * (0.175 * (1 - 0.175)) / (0.04 / 1.96)^2 = 451
I would then decide on the number of PSUs (clusters) to take. This is typically m = 30. Trying:
451 / 30 ≈ 15
So I would go for 30 clusters of 15 children, which gives n = 450 (close enough to 451, and we usually oversample a little ... when (e.g.) you have already found 15 children you might next sample a HH with 2 children, making the within-PSU sample size 16 rather than 15).
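As an editorial sketch of this step in Python (using the assumptions above, not code from the original answer):

```python
import math

p = 0.175         # expected GAM prevalence
precision = 0.04  # desired precision (+/- 4 percentage points)
deff = 1.3        # expected design effect
z = 1.96          # z-score for a 95% confidence interval

# Simple random sample size, then inflated by the design effect
n_srs = (p * (1 - p)) / (precision / z) ** 2
n = deff * n_srs

print(math.ceil(n_srs))  # 347
print(math.ceil(n))      # 451
print(round(n / 30))     # 15 children per cluster over 30 clusters
```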
(2) Calculate the number of HHs needed to get n = 450.
NOTE : YOU DO NOT USUALLY NEED TO DO THIS. SMART type surveys typically use a quota sample collected using the EPI proximity sampling method and you just keep sampling until the quota is met. The only time you need to know the number of HHs needed is if you can and intend to take a random or systematic sample of HHs.
Anyway ... you have 14% aged 0-59 months. This means you will have:
0.14 * 0.9 = 0.126 = 12.6%
children aged 6 - 59 months. If the mean HH size is 7 then you would expect each HH to hold about:
0.126 * 7 = 0.882
children aged 6 - 59 months. In order to find 450 children you would need to sample about:
450 / 0.882 = 510
households.
I don't know what you mean by "contingency" as SMART type surveys have a quota sample so refusals and absences are noted and replaced by other children from the PSU. The 3% looks very small to me if it is to account for refusals and absences (I'd use 10% or more). I think you would multiply the sample size calculated above by 1.03 giving:
n = 510 * 1.03 = 525
We'd probably need a per-cluster figure, so we divide this by the number of clusters:
525 / 30 = 17.5
We'd probably then plan to sample 18 HHs per cluster.
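A quick editorial sketch of this household conversion (same assumptions as above, illustrative only):

```python
import math

n = 450        # target number of children aged 6-59 months
hh_size = 7    # mean household size
under5 = 0.14  # proportion of the population aged 0-59 months

# Proportion aged 6-59 months, then expected eligible children per HH
children_per_hh = under5 * 0.9 * hh_size  # 0.882

households = round(n / children_per_hh)   # 510
with_contingency = households * 1.03      # 525.3, i.e. about 525

print(households)                         # 510
print(round(with_contingency))            # 525
print(math.ceil(with_contingency / 30))   # 18 households per cluster
```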
I hope this is of some help.
Mark Myatt
Technical Expert
Answered: 10 years ago
The figure I used (14%) was for children aged 6-59 months, not 0-59 months: I had already excluded children under six months (10% of the U5 population), so there was no need to include 0.9 in the calculation. I now understand that I have to enter the percentage for 0-59 months. When I use 15.5%, the result is equivalent to my manual calculation for 6-59 months.
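For completeness, a quick editorial check of that equivalence (illustrative only):

```python
# Entering 15.5% as the 0-59 month proportion, ENA's internal 0.9
# adjustment recovers roughly the 14% figure used for 6-59 months:
print(0.155 * 0.9)                     # 0.1395, i.e. about 14%
print(491 / (7 * 0.155 * 0.9) * 1.03)  # about 518, close to the manual 517
```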
Anonymous
Answered: 10 years ago
That's great.
Blessing Mureverwi
Answered: 10 years ago
A quick clarification on the SMART methods described above. The SMART methodology recommends using the fixed household method and not quota sampling: only a certain number of households, selected randomly in the field, will be visited; refusals and absences are not replaced by other children in the cluster. At the end of the survey, some clusters will have more children than others, and the total should not differ significantly from what was planned.
Therefore, survey teams should be given only the target number of households per cluster (and not the number of children per cluster) in order to avoid confusion and unnecessary errors. When survey teams have a target number of children to reach, they may have a tendency to skip households that don't have children. Other indicators collected during the same survey (such as mortality, water and sanitation, food security, etc.) are measured at the household level. Measuring these indicators only in households with small children and excluding all other households will create a serious bias.
With regards to household selection techniques, SMART recommends using simple or systematic random sampling methods to choose households within the clusters (in the 2nd stage sampling) since they are better than modified EPI in terms of representativeness of the sample and introduce less bias. These 2 methods are based on the selection of households either from a list (simple) or with a sampling interval (systematic). Therefore, when using one of these 2 sampling techniques, it is more logical to have a fixed number of households as a target to reach for each cluster. In other words, since it is only possible to estimate the approximate number of eligible children per household prior to data collection (which might not reflect the actual number found in selected houses), it will be impossible to know in advance the number of HH to select that will contain the exact number of children under 5.
And lastly, as sample size calculations may be done not only for anthropometry but also for mortality or other indicators measured at the household level, it is easier to compare and reconcile these sample size requirements (for example, the anthropometry and mortality sample sizes) if they are expressed in the same units (i.e., households).
Best of luck with your survey, Tefera. For further information on the sampling methods recommended by SMART, please refer to the Sampling Paper on the SMART methodology website: www.smartmethodology.org
Thank you,
Victoria
Victoria Sauveplane
Answered: 10 years ago
Dear Victoria,
Thanks for your reply. My question was about something else.
Anonymous
Answered: 10 years ago
I think we need to be clear about terminology.
I also think we should be clear about what SMART does and does not include.
If you sample a fixed number of anything (children, households, &c.) from a primary sampling unit then you have a QUOTA SAMPLE.
SMART uses a quota sample of households. It is, however, not that straightforward, as SMART (the anthropometry module) takes a quota sample of households in the expectation of meeting a quota sample of children. If it does not do this then the PPS sample is compromised. The household quota is calculated in order to collect a child quota.
SMART uses a quota sample of household so as to allow the use of systematic and random sampling of households. This can be problematic as it requires some estimate of the number of eligible individuals in each household in advance of data collection. This data will not always be available (especially in emergencies where there has been displacement with some or all households sheltering displaced persons ... in this case the PPS procedure is also likely to be problematic). Another problem with the quota of households approach is that it makes a strong assumption that all households contain roughly the same numbers of eligibles. If (e.g.) the pattern of acceptance of family planning is patchy, or the pattern of polygyny is patchy, or the ratio of extended to nuclear households is patchy then you may end up with some types of community overshooting their sample size and others undershooting their sample size. When this happens the PPS sample is compromised and the sample is biased towards (e.g.) those not using family planning and those practising polygyny or towards extended households.
I am not sure about the "water and sanitation" and other indicators, as SMART concentrates on GAM estimation and mortality (and food security using a semi-quantitative method). I do think that you can collect such indicators within the SMART sampling framework (it is just a modified EPI method) but that is not really SMART. The last time I looked, the ENA for SMART software did not support a wide set of indicators.
WRT "Measuring these indicators only in households with small children and excluding all other households will create a serious bias" ... It is not usually a problem for a child survival program to "optimally bias" a sample to include only households with young children. This is usually an intended and desired "selection bias". Any survey with eligibility criteria will have this sort of "bias". I would not consider this to be a "serious bias problem". Some selection biases are unintended. In SMART (and many other household surveys) pastoralists, itinerant traders, the homeless, &c. tend to have zero selection probability, so these types of survey are EPSeM (equal probability selection methods) in name only. This may be a "serious bias".
WRT household selection methods ... in some parts of the SMART literature EPI proximity sampling is recommended. In other parts the EPI proximity sampling method is deprecated in favour of systematic or random sampling. The literature does not explicitly ban the use of EPI proximity sampling. Just the opposite: it encourages the use of EPI proximity sampling. The 2012 SMART sampling guide (e.g.) has the "Modified EPI Method" at two (out of seven) end nodes of the sampling method choice algorithm on page 29. The issue with EPI proximity sampling is loss of variation due to the sample design rather than bias. This matter has been well studied, as EPI is a very important child survival program. The original EPI proximity method tended towards a "centre of community" bias. The method outlined in the 2012 sampling guide on page 30 should correct this.
I hope this clarifies some misconceptions with terminology and with the SMART method as described in the SMART project's own literature.
Mark Myatt
Technical Expert
Answered: 10 years ago
I have a comment concerning Mark's response.
First, what is a "quota sample"? While definitions may vary, my interpretation of a quota sample is that a number of elements (households or individuals) are approached and asked to participate until a fixed number of elements provide a response. In the classical EPI survey of immunization status, survey teams would go door to door until they found seven eligible children whose parents agreed to participate. This is a non-probability approach to sampling that survey statisticians generally discourage or disparage (see Kish, "Survey Sampling").
This differs from the situation where a cluster survey is performed (e.g., 30 clusters) and a fixed number of households is to be visited in each cluster (e.g., 10 households). When a survey team visits the cluster, they usually attempt to randomly or systematically select 10 households; they visit these households and request participation in the survey. If any households refuse to participate or are not present at the time of the survey, there are no replacement households. Therefore, some clusters may have 10 participating households, some 9, and so on. This is an approach that survey statisticians usually find acceptable; however, if the response varies dramatically from cluster to cluster there may be a need to weight for non-response in the analysis. For survey planning purposes, an estimate of the household response is needed, as well as the individual response, for sample size calculations.
So, I would disagree with Mark that SMART has quota sampling of households: there is a fixed number of households to be approached, but there are no replacement households and therefore no quota. I agree with Victoria's use of these terms.
Kevin
Kevin Sullivan
Answered: 10 years ago
Does this procedure not compromise PPS (even a little)? I usually think of PPS and quota sampling going hand-in-hand.
Mark Myatt
Technical Expert
Answered: 10 years ago
Quota sampling just means that survey teams continue to collect data until data have been successfully obtained from a target number of units of analysis, such as households or children. PPS is a sampling method. There is no inherent or statistical connection between the two.
Using quota sampling can help avoid the necessity of later statistical weighting to account for non-response or for incorrect assumptions about the number of respondents per household. However, quota sampling has the potential to introduce sampling bias if data collection is stopped before each sampled unit is recruited for data collection. The alternative method (selecting a determined number of sampling units, then collecting data from as many as possible without replacement) can also result in a form of sampling bias due to non-response. However, if non-respondents are similar to respondents, non-response bias can be corrected with statistical weighting during analysis. The sampling bias from quota sampling cannot be corrected after the data are collected; it is therefore much more dangerous and should be avoided.
Bradley A. Woodruff
Technical Expert
Answered: 10 years ago
Woody,
Maybe I'm parading my ignorance (again).
I remember a presentation that you (or maybe it was Paul Spiegel) used to give showing how SMART type surveys were EPSeM (in terms of individuals from a population) when PPS and quota sampling were used together. I think that the SMART type survey is only EPSeM if a quota sample is taken from each PSU. They must come together for the sample to be EPSeM. This is why (e.g.) EPI uses PPS and quota sampling (noting that the EPI method has faced far more scrutiny and validation by competent survey scientists than any of the range of SMART versions ever has). Please explain why this is not so.
I think that not using quota sampling within PSUs means that you will (with what seems to me to be inevitable non-response) need to use posterior weighting (which is not needed with a quota sample). In that case ... why bother with PPS in the first place? That is, why weight prior to sampling when you then have to weight posterior to sampling anyway? Choosing complicated over simple, expensive over cheap, and slow over rapid makes no sense to me.
WRT avoiding statistical weighting ... PPS is a prior weighting scheme which assumes (I believe) a quota sample. If you do not use a quota sample then the sample ceases to be EPSeM and you would have little choice but to use posterior weighting. This is not, as far as I know, commonly done with SMART survey data. I think that (as I write above) the SMART method will get close to a quota (if the assumptions are correct) and this will not be a serious problem for most surveys. My main concern is that the PPS sample can be seriously biased in situations where there has been (e.g.) considerable displacement. In that case, I think, we have little option but to collect weighting (population) data as we go and use posterior weighting during data analysis.
WRT "However, quota sampling has the potential for introducing sampling bias if data collection is stopped before each sampled unit is recruited for data collection". I am confused as to how that would be a quota sample. It is, by your own definition, not a quota sample. This can happen when very small PSUs are selected. It is rare with PPS as the sample is concentrated in larger communities. It can happen with spatially stratified sampled but these types of sample usually employ posterior weighting.
I wonder how you would ever know "if non-respondents are similar to respondents". This seems to me to be a BIG assumption as we already know they are qualitatively different from each other (i.e. one set responds (or is present) and the other set does not (or is absent)). It seems that you are suggesting that we make the similarity assumption against the only available evidence (they do not respond so you probably have no other evidence) and go ahead and adjust anyway. This does not appear to be much more than a theoretical advantage.
I am also confused by your and Kevin's use of with-replacement and without-replacement. These terms are usually used to mean that we can or cannot include the same individual in the sample more than once rather than (as I intended) taking the next and nearest sampling opportunity.
Please explain where I am going wrong.
Mark Myatt
Technical Expert
Answered: 10 years ago
Kevin,
I think we are facing the common problem of the same term meaning different things in different contexts.
Kish (1965) uses the term "quota sampling" to refer to a sampling technique commonly used in opinion polls and market research in which the researcher attempts to represent the main (usually demographic or socio-economic) characteristics of a population by sampling a proportional number of individuals with each characteristic or combination of characteristics. For example, if you wanted a proportional quota sample of 100 people based on sex, you would first need to know the sex ratio in the population. Let us imagine that this is 57/43 (males/females). In this case you would aim for a sample of 57 men and 43 women for a total of 100 respondents. You would start sampling and continue until you got those numbers, and then you would stop sampling. If you had already got 43 women in the sample but not 57 men, then you would continue to select men and discard any further eligible women respondents that fell into your sample (you don't need them because you have already "met the quota" for women; see the sketch after the list below). The problems with this approach are that:
(1) You must decide in advance the specific characteristics on which to base the quota(s). Will it be by sex, age, education, ethnic group, religion, political affiliation, social class, car ownership, and so on? Which is relevant? The sample can get very complicated to collect as (e.g.) each quota may be defined by a complex set of characteristics. You may end up with very small and unrepresentative samples in each quota. To avoid this problem you might have to use a very large overall sample size. A big problem with complicated samples is that it becomes difficult to identify and quantify sources of error.
(2) The sampling proportions used to make up the various quotas must be accurate. Often these are not available. Census data (e.g.) is often published long after data was collected or many years before your survey and may be out of date (particularly in an emergency when people move and livelihoods are disrupted).
(3) Even if the proportions / quotas are correct the selection of individuals is often found to be biased as surveyors may tend to favour some respondents over others in order to ease their work.
(4) This is a non-probability sample so the sampling distribution of a variable is difficult to model meaning that confidence limits cannot be calculated (methods that use empirical sampling distributions such as the bootstrap may, however, help here).
This is, I agree, a method fraught with problems and is best avoided.
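Purely to illustrate the quota-filling mechanics described above (an editorial sketch, not an endorsement of the method, and not code from the thread):

```python
import random

# Hypothetical proportional quota sampling on sex (57 males / 43 females):
# keep recruiting until each quota is met, discarding surplus respondents.
quotas = {"male": 57, "female": 43}
counts = {"male": 0, "female": 0}

while any(counts[g] < quotas[g] for g in quotas):
    # Stand-in for approaching the next available respondent
    respondent = random.choice(["male", "female"])
    if counts[respondent] < quotas[respondent]:
        counts[respondent] += 1
    # else: quota already met, so this respondent is discarded

print(counts)  # {'male': 57, 'female': 43}
```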
This is, however, NOT what is meant by epidemiologists when we use the term. A "quota sample" in the epidemiological sense will use some form of probability sampling method to collect the quota. This means we can model the sampling distribution (e.g. EPI and earlier versions of SMART provide confidence intervals around point estimates using model-based techniques and quota samples). Selection bias is minimised (although not eliminated) by employing a strict set of sampling rules (as is done in EPI and all versions of SMART). Simple eligibility rules are usually applied (e.g. children aged 6-59 months in SMART). The sample is kept as simple as possible.
The two methods share the same name but are NOT the same thing.
I now fear that SMART may have rejected quota sampling based on a confusion of terms and/or a misreading of the literature. Can someone please assure me that this is not the case?
Mark Myatt
Technical Expert
Answered: 10 years ago
Hello colleagues, I would like to know the definition of the actual and theoretical coverage of a nutrition program and how they are measured. Thank you.
Bradley A. Woodruff
Technical Expert
Answered: 10 years ago
Woody,
WRT your (1): The "quota sampling" described and decried by Kish (1965) is the quota sampling described by Last (although I have never seen or heard of anyone use that type of quota sample in an epidemiological study ... would it ever get published?), not the SWTRDC type of quota sample. So calling on Kish (1965) to justify the removal of SWTRDC from SMART is (IMO) somewhat spurious. I think SWTRDC is an ugly acronym and I also hope it does not catch on.
WRT your (2): OK. I think the method outlined above, in which we aim for a quota but complete the sample with non-responders replaced by the nearest sampling opportunity, gets over this issue. It is, in effect, what SMART does now but with a different non-response rule. I do not worry about oversampling in my own surveys as I tend to use a spatially stratified sample and weight after data collection.
WRT your (3): Maybe I am confused (again) by terminology. PPS as prior weighting to select PSUs does (I think) assume a quota sample. PPS as a way of specifying within-strata sample sizes, so that a stratum with a population of 10,000 has a sample twice the size of a stratum with a population of 5,000, does not assume a quota sample (but you would aim to get the sample). I don't think SMART does the latter. I think SMART uses a pseudo-quota method that aims to get much the same size of sample from each PSU regardless of the PSU population.
WRT your (4): This is a shocking statement. Who on earth do you think does SMART surveys now? Mid-level program managers. Certainly not many epidemiologists.
WRT your (5): Let me get this right ... you weight the sample prior to data collection so it is placed preferentially in the most populous communities, and then you weight again in favour (again) of the most populous communities. This double-weights the sample and reduces the contribution of smaller communities to almost zero (why collect that data in this case?). I think this to be bad practice. If you expect to use posterior weighting then why not take a spatially stratified first stage and weight once the data are in?
WRT your (6): How often is this done in SMART surveys?
WRT your (7): The permanently empty dwelling is not a real dwelling but just a building, and should never have been selected in the first place. This is an obvious case for taking the next dwelling. It is not a replacement since nothing is being replaced. So here is a case where replacing is absolutely the correct thing to do and not replacing is absolutely the wrong thing to do.
WRT your (8): I disagree. First, we need to know about the difference between these two sampling modes in order to apply finite population corrections. We also need to know when we would prefer the hypergeometric to the binomial model in data analysis (particularly with small samples from small areas). Second ... I think one thing that is clear from this discussion thread is that complacency in terminology is an issue. And really ... there is nothing wrong with being a geek. Geek is chic! The geek shall inherit the Earth. Thank you for the compliment.
So ... where does that get us? Is it that SMART is slow, difficult, and expensive, and getting slower, more difficult, and more expensive with every revision? It may even be using dodgy procedures such as failure to quota, inappropriate double weighting, and weird and wonderful sampling rules. Do I have that right? Is it time for reform that is more thoroughgoing than adding more and more cost and complication?
Mark Myatt
Technical Expert
Answered: 10 years ago
Dear Woody, Mark and Kevin,
Many thanks for your contributions to this discussion. However, it seems it has now strayed quite far from the original question into a more detailed debate over epidemiological methodology and terms.
Can I suggest that either the discussion is opened to all in a new thread or that you take any further discussion offline. Perhaps if you come to an agreed position you could then post a summary final response on en-net.
If there are specific detailed issues to be raised concerning the SMART methodology, it might be preferable to post these on the SMART methodology website where a direct response can be provided by the SMART team.
Many thanks,
Tamsin
Tamsin Walters
Forum Moderator
Answered: 10 years ago
I do not understand or agree with this position.
Have you visited the SMART methodology website? The last message was posted 13 months ago. There seems to be no way to login. I can register but the site does not accept logins (this is a common experience). The site appears to be broken and moribund. There are more SMART related posts on this forum than on the SMART forum. I'm afraid (actually a little proud) that EN-NET is now the only game in town.
I would like more people to enter into the discussion. They are free to do that if they want to. If they do not want to then neither you nor I should worry too much about that. This discussion IS open to all. I guess that, with 202 views in three weeks, this thread is of interest to at least a few people. That indicates to me that the discussion is being followed and argues against a premature termination of the discussion. I am happy for you to move this discussion to a new thread on the EN-NET 'Assessment' forum but I think you should let it run its course. No-one is being forced to follow the discussion. If we are talking among ourselves then what harm is there in that?
I think methodological and practical issues have been revealed in SMART and that these need to be aired and resolved. If not here then where? Certainly not the SMART website which, despite being moribund, is not "neutral territory".
I am one amongst many. We might ask our colleagues if they would like us to get over ourselves and shut up.
BTW ... I am still pleased and proud to be called a geek even though I suspect it was meant as an insult.
Mark Myatt
Technical Expert
Answered: 10 years ago
Just a reminder that the original question was on the ENA for SMART formula for calculating sample size, and the contributor was satisfied with the feedback.
After this, there was a contribution from the SMART team on "fixed household" vs "quota method" as they are referred to in SMART surveys.
This became the subject of a prolonged debate, which I have to say was becoming a little difficult to follow.
I guess the decision we have to make is whether to accept the guidance given through SMART (which we should, if we are doing SMART surveys) when deciding whether to apply the "fixed household" or "quota" method to complete the required sample.
Other than this, I am sure those with specific technical interest beyond this may communicate offline. This, of course, is just my opinion.
Blessing Mureverwi
Answered: 10 years ago
OK. I'll shut up.
Mark Myatt
Technical Expert
Answered: 10 years ago
Sorry for the delay in getting back on this. I have had limited access to the internet this week.
I would just like to clarify that it is not our intention to stifle debate or suppress valid concerns, but it seems that the issues raised are unlikely to get resolved on en-net, especially without input from the SMART designers/epidemiologists. If others have significant issues with the methodology, perhaps the way forward is for the expert epidemiologists to set up a consultation and review the SMART methodology?
In the meantime, we will endeavour to alert the SMART team to these concerns and find out about the status of the SMART website and discussion forum.
Thanks again to everyone for their contributions above.
Tamsin Walters
Forum Moderator
Answered: 10 years ago