Hi all,
I am asking this question out of ignorance. Maybe it is a bit obvious, or maybe not.
I was checking the ENA plausibility report and noticed something odd in the age ratio check (6-29 vs 30-59 months).
First of all, I am basing this question on the assumption that the p value in this check is obtained with a Chi-square (χ²) test.
If so, because there are only 2 groups, the expected values calculated for 6-29 and 30-59 are the same. However, this is odd: the 6-29 group covers 6 months fewer than the 30-59 group, so the expected values should not be the same.
I do not know whether some correction is applied or whether I am wrong about something.
I would be grateful if someone could clarify.
Thank you very much in advance.
Regards,
Julian
Dear Julian,
The test used to analyze the age ratio in the ENA plausibility check report is a Chi².
You will notice that the expected ratio is not 1 (as for the sex ratio) but 0.8, since 24 months (6-29 months) / 30 months (30-59 months) = 0.8.
So the Chi² is testing if the sample's age ratio is near 0.8 as expected.
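A minimal sketch of a two-group test of this kind, assuming a standard Pearson chi-square goodness-of-fit test with 1 degree of freedom and no continuity correction; ENA's exact implementation is not shown in this thread. The function name `age_ratio_chi2` and the sample counts are hypothetical.

```python
import math


def age_ratio_chi2(n_young, n_old, expected_ratio):
    """Pearson chi-square goodness-of-fit test for a two-group age ratio.

    n_young: children aged 6-29 months in the sample
    n_old: children aged 30-59 months in the sample
    expected_ratio: expected (6-29)/(30-59) ratio under the null hypothesis
    Returns (chi2, p_value) for 1 degree of freedom, no continuity correction.
    """
    total = n_young + n_old
    # Split the total so that expected_young / expected_old == expected_ratio.
    expected_young = total * expected_ratio / (1 + expected_ratio)
    expected_old = total - expected_young
    chi2 = ((n_young - expected_young) ** 2 / expected_young
            + (n_old - expected_old) ** 2 / expected_old)
    # With 1 df, chi2 is a squared standard normal, so P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical samples of 900 children, tested against 24/30 = 0.8.
for young, old in [(400, 500), (450, 450)]:
    chi2, p = age_ratio_chi2(young, old, expected_ratio=24 / 30)
    print(f"{young}/{old}: ratio={young / old:.2f}, chi2={chi2:.2f}, p={p:.4f}")
```

Passing `expected_ratio=1` gives the equal-expected-values version of the test described in the original question.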
The original idea of the SMART methodology is to allow nutritionists and health workers with basic knowledge of epidemiology and statistics to perform a robust cross-sectional survey (with results meeting international standards). So you don't necessarily need to understand all the statistical fundamentals underlying the plausibility check report.
For better understanding of data quality process, see:
ANNEX 7.1: SMART: Ensuring data quality – is the survey result usable? by M. Golden
From training toolkit on smartmethodology.org
Hope this is helpful
Answered:
8 years ago
Hi Damien,
Thank you for your answer and clarification. The document ANNEX 7.4, Understanding Age and Sex Ratios, was very helpful.
Following that document, the expected values are indeed calculated from fixed proportions. Therefore, if we calculate the proportions corresponding to the 6-29 and 30-59 month age groups, we can use them to calculate the expected values:
6 – 17 months: 12000 / 51720 ≈ 23.2% (expressed as a simple proportion, it is 23.2/100 = 0.232)
18 – 29 months: 11700 / 51720 ≈ 22.6% (0.226)
30 – 41 months: 11340 / 51720 ≈ 21.9% (0.219)
42 – 53 months: 11160 / 51720 ≈ 21.6% (0.216)
54 – 59 months: 5520 / 51720 ≈ 10.7% (0.107)
*Source: SMART PM training manual
Therefore:
Group 6-29:
(12000 + 11700) / 51720 = 0.458236659
Expected value for this group = N * 0.458236659
Group 30-59:
(11340 + 11160 + 5520) / 51720 = 0.541763341
Expected value for this age group = N * 0.541763341
where N is the total number of children aged 6-59 months in the sample.
With these proportions, the expected values can be calculated with a correction for the 6-29 month group spanning 6 months fewer than the 30-59 month group, giving a corrected χ² and its associated p value.
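A sketch of the calculation above in Python, using the per-month figures from the SMART training material quoted in this thread. The names (`BANDS`, `band_weights`, `group_proportions`) and the sample size of 900 are hypothetical; the expected count for each group is the total sample size times that group's proportion.

```python
# Children expected per month of age in each band (values quoted from the SMART
# training material in this thread) and the width of each band in months.
BANDS = {
    "6-17": (1000, 12),
    "18-29": (975, 12),
    "30-41": (945, 12),
    "42-53": (930, 12),
    "54-59": (920, 6),
}


def band_weights(bands):
    """Expected children in each band = children per month x months in the band."""
    return {name: per_month * months for name, (per_month, months) in bands.items()}


def group_proportions(bands):
    """Expected shares of the 6-29 and 30-59 month groups."""
    weights = band_weights(bands)
    total = sum(weights.values())  # 51,720 with the figures above
    p_young = (weights["6-17"] + weights["18-29"]) / total
    return p_young, 1 - p_young


p_young, p_old = group_proportions(BANDS)
n = 900  # hypothetical total number of children aged 6-59 months in the sample
print(f"p(6-29) = {p_young:.4f}, p(30-59) = {p_old:.4f}")
print(f"expected counts for n = {n}: {n * p_young:.1f} and {n * p_old:.1f}")
```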
If I am wrong about this, please let me know.
Thank you very much.
Regards,
Julian Ibarguen
Answered:
8 years ago
Hi Julian
The expected proportion of children for the Plausibility Check were compiled by Michael Golden based on extensive age distribution research as noted by Damien.
ENA computes the expected number of children for each month of the total population based on these expected proportions.
This is done by taking the expected proportions of children, multiplying them by the months at risk (12 months in each of the first four age categories and 6 months in the last one), and then dividing them according to the age ratio grouping (6-29/30-59 months).
In the table below, 1,000 children are expected in the 6-17 month age group; because of infant and child mortality, 975 children are expected in the 18-29 month age group.
Age Group Options in ENA
Expected proportion of children for plausibility check of sampling
Age group (months)   Male   Female
6-17                 1000   1000
18-29                975    975
30-41                945    945
42-53                930    930
54-59                920    920
Thus, for the different age ranges we get the following:
6-17 = 12 months. 1,000*12=12,000.
18-29 = 12 months. 975*12=11,700.
30-41 = 12 months. 945*12=11,340.
42-53 = 12 months. 930*12=11,160.
54-59 = 6 months. 920*6=5,520.
Based on the above, we expect an age ratio of 6-29 months / 30-59 months = (12,000 + 11,700) / (11,340 + 11,160 + 5,520) = 23,700 / 28,020 = 0.8458 ≈ 0.85.
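The same arithmetic as a short Python sketch. It also shows where the 0.8 quoted earlier in the thread comes from: if every month of age held the same number of children, the ratio reduces to 24/30. The function name `expected_age_ratio` is hypothetical.

```python
# Band widths in months for 6-17, 18-29, 30-41, 42-53 and 54-59.
MONTHS = [12, 12, 12, 12, 6]


def expected_age_ratio(per_month):
    """Expected (6-29 months)/(30-59 months) ratio from children-per-month figures."""
    weights = [count * months for count, months in zip(per_month, MONTHS)]
    return sum(weights[:2]) / sum(weights[2:])


ena_defaults = [1000, 975, 945, 930, 920]
flat = [1000] * 5  # no mortality: every month of age equally populated

print(f"{expected_age_ratio(ena_defaults):.4f}")  # 23,700 / 28,020 -> 0.8458
print(f"{expected_age_ratio(flat):.4f}")          # 24 / 30 -> 0.8000
```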
Where do the expected numbers of children expressed in the statistical test for sex and age ratios come from?
Adding the figures obtained for all the age ranges above, we obtain:
12000 + 11700 + 11340 + 11160 + 5520 = 51720, or 100% of an “ideal” sample.
For each age range, we expect to obtain the following proportions:
6 – 17 months: 12000 / 51720 ≈ 23.2% (expressed as a simple proportion, it is 23.2/100 = 0.232)
18 – 29 months: 11700 / 51720 ≈ 22.6% (0.226)
30 – 41 months: 11340 / 51720 ≈ 21.9% (0.219)
42 – 53 months: 11160 / 51720 ≈ 21.6% (0.216)
54 – 59 months: 5520 / 51720 ≈ 10.7% (0.107)
Check: 23.2% + 22.6% + 21.9% + 21.6% + 10.7% = 100%.
Hope this is helpful.
Thanks
Answered:
8 years ago
A comment and a question ...
Comment : I think you have to be very careful when working with expected numbers generated from a model that is external to the data (as you have above). If the model is wrong then you will reject the (correct) data rather than the (wrong) model. Odd things happen in emergencies and we often use SMART in emergencies. Epidemics can knock chunks out of a population that the proposed model cannot include. Again, you would wrongly reject reality in favour of an abstract model. There are other reasons why the model could be wrong ... a moment's thought adds a sex ratio other than 1:1 (this is almost always the case), preferential abortion of females, female infanticide, preferential treatment in terms of diet and medical care to one sex, improving public health, failing public health, changing patterns of fertility .... I'm sure there are many more.
Question : This is the same question you pose above and (I think) fail to answer. It is "Where do the expected numbers of children expressed in the statistical test for sex and age ratios come from?". What are the details of the model? I can see that the sex ratio at birth is assumed to be 1:1 and that mortality is assumed not to differ between the sexes. How are the numbers 1000, 975, 945, 930, 920 found? Can we see the details of this model and how it is calculated? Without this we cannot assess if it is reasonable. Can you please show this in a sort of spreadsheet table so we can see how things are calculated?
Answered:
8 years ago
Hi All,
The Overall Sex Ratio and Overall Age Distribution are the two key Plausibility Check criteria useful in assessing the representativeness of a survey sample (detection of any selection bias) in relation to the expected age/sex distribution of the U5 population.
The values in ENA for SMART Options Tab have been set on the assumption of having equal number of boys and girls (aged 6-59 months) in a stable population size with similar birth rate and a death rate of about 1/10,000/day. Following extensive analyses of mortality trends across various countries, these figures in the Options Tab portray an expected exponential decrease in mortality for children from 0 to 59 months. For sake of comparison between surveys within and across countries, these figures should not be adjusted when generating the Plausibility Check.
A significant difference in the Overall Sex Ratio and Overall Age Distribution (the chi-square test shows a p-value < 0.05) may be indicative of either:
1) Sampling problem, or
2) Significant deviation from normal demographics for this population (birth, death and differential migration rates)
These tests shouldn’t be relied upon alone to invalidate a survey; rather, they should be used intuitively to draw attention to possible causes of deviation from what is expected, given the survey context at hand. Further details on the SMART Plausibility Check chapter can be found here: http://smartmethodology.org/survey-planning-tools/smart-methodology/
Thanks
Answered:
8 years ago
Kennedy,
I think we are getting closer.
You write:
"The values in ENA for SMART Options Tab have been set on the assumption of having equal number of boys and girls (aged 6-59 months) in a stable population size with similar birth rate and a death rate of about 1/10,000/day. Following extensive analyses of mortality trends across various countries, these figures in the Options Tab portray an expected exponential decrease in mortality for children from 0 to 59 months. For sake of comparison between surveys within and across countries, these figures should not be adjusted when generating the Plausibility Check."
I still find this a little vague.
I am interested in how the numbers (i.e. 1000, 975, 945, 930, 920) were arrived at. Can you show the working? I am a little stuck ...
Mortality at 1/10,000/day is the same as 36.525 / 1000 / year so we'd expect the second number to be 1000 - 36.525 = 963 (rounded) not 975.
If we apply 1/10,000/day as a daily rate we get 1000 * (1 - 1/10000)^365.25 = 964 (rounded) not 975.
If the 1000 is the mid-term population then we'd start with 1000 * (1 + 1/10000)^(365.25/2) = 1018 children and end with 1018 * (1 - 1/10000)^365.25 = 981 (rounded) not 975.
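The three calculations in this post, re-run as a Python sketch. This only reproduces the arithmetic in the post; it does not represent the model ENA uses.

```python
RATE = 1 / 10_000  # deaths per person per day
DAYS = 365.25      # days per year

# 1) Linear: 1/10,000/day = 36.525 deaths per 1,000 per year.
linear = 1000 - 1000 * RATE * DAYS
# 2) The same rate compounded daily over one year.
compounded = 1000 * (1 - RATE) ** DAYS
# 3) Treat 1,000 as the mid-year population: back-project half a year, then apply a full year.
start = round(1000 * (1 + RATE) ** (DAYS / 2))
mid_term = start * (1 - RATE) ** DAYS

print(round(linear), round(compounded), start, round(mid_term))  # 963 964 1018 981
```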
I am also not sure that "expected exponential decrease in mortality" can give rise to the numbers (i.e. 1000, 975, 945, 930, 920). Simple differencing gives:
(1000 - 975) / 1000 = 2.5% mortality (c. 0.68 / 10,000 / day)
(975 - 945) / 975 = 3.1% mortality (c. 0.84 / 10,000 / day)
(945 - 930) / 945 = 1.6% mortality (c. 0.43 / 10,000 / day)
(930 - 920) / 930 = 1.1% mortality (c. 0.29 / 10,000 / day)
Mortality is not decreasing exponentially (there is an increase in the second year). The average mortality rate is about 0.56 / 10,000 / day, not "about 1/10,000/day".
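The differencing above as a Python sketch. Like the post, it treats each step between consecutive bands as one year (365.25 days); the variable names are illustrative only.

```python
survivors = [1000, 975, 945, 930, 920]  # ENA Options defaults quoted in this thread

daily_rates = []
for earlier, later in zip(survivors, survivors[1:]):
    # Proportion lost between consecutive bands, treated as one year apart (as in the post).
    loss = (earlier - later) / earlier
    per_10k_per_day = loss / 365.25 * 10_000
    daily_rates.append(per_10k_per_day)
    print(f"({earlier} - {later}) / {earlier} = {loss:.1%} (c. {per_10k_per_day:.2f} / 10,000 / day)")

mean_rate = sum(daily_rates) / len(daily_rates)
print(f"average: {mean_rate:.2f} / 10,000 / day")
```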
I think that something may be wrong with the SMART numbers or (more likely) something wrong with my thinking or my arithmetic. Please help clear my confusion.
Thanks.
Answered:
8 years ago
Yes, it looks like the assumed risk of mortality used to perform the SMART plausibility check is actually higher in the 18-29 month age group than in the 6-17 month age group. I know of no population in which this is actually true. Also, I agree with Mark's calculations; the assumed mortality rate is substantially lower than 1 death per 10,000 population per day. I also agree with Mr. Musumba's statement that we cannot make a reflexive decision based solely on the numbers produced by ENA.
In fact, in this spirit, I would question the usefulness of any such mechanical check on data quality. Because each population is subjected to different mortality and different factors affecting the sex and age ratio, no model can be universally applicable. Automatically rejecting as flawed those survey data which do not adhere to some artificial standard would be a grave mistake. Instead of trying to produce some simplified single summary statistic, I would instead plot a frequency histogram by age group to determine the relative numbers of children of different ages in the survey sample. If something looks unexpected or dodgy, it may be advisable to conduct further investigation of the population from which the sample was drawn or the sampling methods used. Moreover, simply identifying possible sampling bias is insufficient; one must carefully consider whether this sampling bias has altered the apparent results. A sampling bias without effect on the survey results has no impact on one's confidence in any conclusions derived from those results.
Answered:
7 years ago
Thanks for this.
I am concerned that the SMART team allowed this to pass. It seems that they are very strong on quality control when it comes to survey data but less so when it comes to quality assurance of their own work. I assume that this material will be reviewed and revised (including in the ENA software) in the near future. I hope this will be done openly. Can SMART confirm this?
I agree with your and Ken's comments regarding unthinking application of quality checks. In this case the model used to derive expected numbers seems to have been defective. It is possible that many SMART surveys were rejected (SMART documents would consider such surveys as not "usable") or classed as "poor quality" based on non-compliance with a defective model.
I do think that such tests may have value for detecting potential problems with the representativeness of survey data but, as you say, this should prompt further investigation rather than an automatic rejection of survey data. I think your graphical approach is good. I would use a population pyramid (age by sex) for this.
Answered:
7 years ago
The comments about SMART in this thread are incorrect, as in nearly all previous threads concerning SMART. Again, there are key issues which are wrong or misrepresented.
The zero-to-five mortality rate is, of course, counted from BIRTH and not from 6 months of age. This means that 5q0 and 5q0.5 will be completely different. Because the posts above neglect the deaths that occur in the first 6 months of life, their calculations are simply wrong. If the data from www.childmortality.org and Wang et al. (Age-specific and sex-specific mortality in 187 countries, 1970-2010: a systematic analysis for the Global Burden of Disease Study 2010. The Lancet 380, 2071-94, 2012) are consulted, you will find that globally about 50% of deaths are neonatal and a further 30% are post-neonatal (1 to 12 months). In sub-Saharan Africa, because 5q1 mortality is higher, the proportion of deaths that are neonatal falls to about 30%. Naturally, mortality from 6 to 59 months will be substantially lower than mortality from birth to 59 months, but the difference is smaller in developing than in developed countries. To do the calculations correctly, it is necessary to start at birth, extract the part of the curve that covers children aged 6 to 59 months, and then average it over the age ranges of interest.
As many of the surveys are conducted in areas under stress, we have used the following equation to calculate the rate of loss of children: 10,000 − 364.4 × ln(day). The data are adjusted to have 1000 children in the first age range. The mortality rate from 6 months to 59.9 months using this equation is 0.616/10,000/day.
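A sketch of how band averages could be derived from the equation above, under assumptions not stated in this thread: `day` is age in days, a month is 365.25/12 days, "6-17" means ages 6.0 up to 18.0 months, and each band's value is the exact mean of the curve over the band. With these choices the results come out close to, but not identical to, the ENA figures; SMART's exact conventions may differ. The names `band_mean` and `BANDS` are hypothetical.

```python
import math

DAYS_PER_MONTH = 365.25 / 12  # assumption: average month length in days

# Assumption: "6-17" covers ages from 6.0 up to 18.0 months, and so on.
BANDS = [(6, 18), (18, 30), (30, 42), (42, 54), (54, 60)]


def band_mean(a_months, b_months):
    """Exact mean of 10,000 - 364.4 * ln(day) between two ages (day = age in days, assumed).

    Uses the antiderivative of ln(d), which is d*ln(d) - d.
    """
    a, b = a_months * DAYS_PER_MONTH, b_months * DAYS_PER_MONTH
    mean_log = ((b * math.log(b) - b) - (a * math.log(a) - a)) / (b - a)
    return 10_000 - 364.4 * mean_log


raw = [band_mean(a, b) for a, b in BANDS]
scaled = [1000 * value / raw[0] for value in raw]  # rescale so the first band holds 1,000
print([round(value) for value in scaled])

# Implied expected age ratio: weight each band by its width in months.
months = [b - a for a, b in BANDS]
young = sum(s * m for s, m in zip(scaled[:2], months[:2]))
old = sum(s * m for s, m in zip(scaled[2:], months[2:]))
print(f"implied 6-29 / 30-59 age ratio: {young / old:.3f}")
```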
We do NOT recommend REJECTION of a survey on the basis of any one criterion as repeatedly asserted. There needs to be several parameters where a survey’s results deviate from what is normally found with a well conducted survey before it accumulates sufficient points to be “problematic”. Problematic surveys then require that the report justifies the deviation, before acceptance.
Although there are more males than females at birth (51-52:48-49), the mortality rate of males is higher than females from birth (globally 5q0 mortality ratio male:female is about 56:44), progressively evening up the sex ratio. Thus the assumption of equal numbers of boys and girls during the period 6-59months is justified. Well conducted surveys nearly all have about equal numbers of boys and girls providing empirical confirmation. Your theoretical reasons for a “legitimate” change in the sex ratio sufficient to alter analysis are not justified empirically. Again, in the very few countries where sex ratio in children 6-59 months deviates meaningfully from 1:1 the surveys would not be penalized for age distribution in the sample if it’s in the expected direction and is justified by reliable country demographic data.
There is one typo in the default age distribution numbers on the ENA Options page: the second figure (for the age group 18-29 months) should be 965 and not 975. It will be corrected in the next updated version of ENA. Correspondingly, the expected age ratio of 6-29 months to 30-59 months in the Plausibility Check report will be adjusted from 0.85 to 0.84. These changes will have a trivial effect on which surveys would be flagged for justification because of the age distribution in the sample.
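A quick check of the corrected figure, as a Python sketch: recompute the expected age ratio and the expected number of 6-29 month olds with 975 and with 965 for the 18-29 month band. The sample size of 900 and the name `expected_shares` are hypothetical.

```python
MONTHS = [12, 12, 12, 12, 6]  # widths of the 6-17, 18-29, 30-41, 42-53, 54-59 bands


def expected_shares(per_month):
    """Return (expected 6-29/30-59 age ratio, expected share of 6-29 month olds)."""
    weights = [count * months for count, months in zip(per_month, MONTHS)]
    young, old = sum(weights[:2]), sum(weights[2:])
    return young / old, young / (young + old)


n = 900  # hypothetical sample size
for label, second in [("975 (old default)", 975), ("965 (corrected)", 965)]:
    ratio, share = expected_shares([1000, second, 945, 930, 920])
    print(f"{label}: age ratio {ratio:.2f}, expected 6-29 count for n={n}: {n * share:.1f}")
```

For a sample of this size the expected 6-29 month count moves by only about one child, which is in line with the "trivial effect" described above.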
This will conclude SMART input on the topic.
Answered:
7 years ago
Results of the 2007 population census of Ethiopia show the following proportion of the child population in each age band:
Ethiopia
Age band (months)   Number       %
0-11.9              1,775,454    16%
12-23.9             1,964,606    18%
24-35.9             2,294,205    21%
36-47.9             2,263,614    21%
48-59.9             2,499,143    23%
Total               10,797,022
Tigray Region
Age band (months)   Number       %
0-11.9              119,603      19%
12-23.9             113,812      18%
24-35.9             128,436      20%
36-47.9             136,531      22%
48-59.9             132,480      21%
Total               630,862
Afar Region
Age band (months)   Number       %
0-11.9              14,291       10%
12-23.9             21,581       15%
24-35.9             32,306       23%
36-47.9             31,872       22%
48-59.9             42,327       30%
Total               142,377
Amhara Region
Age band (months)   Number       %
0-11.9              428,329      18%
12-23.9             446,651      19%
24-35.9             476,654      20%
36-47.9             484,432      21%
48-59.9             501,857      21%
Total               2,337,923
Oromia Region
Age band (months)   Number       %
0-11.9              732,201      16%
12-23.9             825,025      19%
24-35.9             978,883      22%
36-47.9             911,247      20%
48-59.9             1,003,961    23%
Total               4,451,317
Somali Region
Age band (months)   Number       %
0-11.9              31,626       7%
12-23.9             62,901       14%
24-35.9             116,666      26%
36-47.9             85,482       19%
48-59.9             153,024      34%
Total               449,699
SNNP Region
Age band (months)   Number       %
0-11.9              370,781      16%
12-23.9             413,107      18%
24-35.9             467,578      20%
36-47.9             522,512      22%
48-59.9             567,277      24%
Total               2,341,255
Answered:
7 years ago