Answered:
15 years ago
A query from a client about digit preference in some MUAC screening data got me thinking about the digit preference issue a bit more.
It occurred to me that the PROBIT estimator used in RAM, RAM-OP, S3M, and the LP surveillance system might be more resistant to digit preference than the classical estimator used in SMART, MICS, DHS, and the rest.
I decided to test this with a simple R language simulation which I repeat here.
Here we generate a population:
pop <- rnorm(n = 1000, mean = 140, sd = 11)
summary(pop)
The summary is:
Min. 1st Qu. Median Mean 3rd Qu. Max.
104.0 131.7 139.2 139.4 146.6 173.8
See the figure at the end of this post.
GAM is:
table(pop < 125)
which gives:
FALSE TRUE
902 98
which is 9.8%.
SAM is:
table(pop < 115)
which gives:
FALSE TRUE
985 15
which is 1.5%.
MAM is:
9.8% - 1.5% = 8.3%
These we consider to be the true prevalences.
We can "pollute" this data by forcing every value to end in "0" or "5":
popDP <- round(pop / 5) * 5
The first 15 values in popDP:
popDP[1:15]
shows:
135 145 160 125 150 135 150 145 120 155 155 145 135 135 135
This is extreme digit preference. The digit preference score (from SMART) for this data is 66.68. We'd probably dismiss this data as junk.
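For readers who want to check a score like this, here is a minimal sketch. It assumes the score is the chi-square based statistic 100 * sqrt(chi-square / (df * N)) computed on the final digit, which gives about 66.7 when all values end in 0 or 5 in roughly equal numbers. The helper name dps and the seed are hypothetical, so the result will differ slightly from the 66.68 quoted above.

```r
# Hypothetical helper: digit preference score for whole-number measurements.
# Assumed formula: 100 * sqrt(chi-square / (df * N)) on final-digit counts.
dps <- function(x) {
  last_digit <- factor(round(x) %% 10, levels = 0:9)
  counts <- as.vector(table(last_digit))    # counts for digits 0..9, zeros kept
  expected <- sum(counts) / 10              # uniform final digits expected
  chi_sq <- sum((counts - expected)^2 / expected)
  100 * sqrt(chi_sq / ((length(counts) - 1) * sum(counts)))
}

set.seed(1)  # assumption: arbitrary seed, not the one behind the post's numbers
pop <- rnorm(n = 1000, mean = 140, sd = 11)
popDP <- round(pop / 5) * 5

dps(popDP)
```

Data with perfectly uniform final digits would score 0 under this formula.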
The "pollution" does not change the summary statistics much:
summary(popDP)
gives:
Min. 1st Qu. Median Mean 3rd Qu. Max.
105.0 130.0 140.0 139.4 145.0 175.0
Also see the figure at the end of this post.
GAM is now:
table(popDP < 125)
which gives:
FALSE TRUE
935 65
which is 6.5% (down from the true 9.8%).
SAM is:
table(popDP < 115)
which gives:
FALSE TRUE
991 9
which is 0.9% (down from the true 1.5%).
MAM is:
6.5% - 0.9% = 5.6%
down from the true 8.3%.
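Extreme digit preference in this simulation causes large underestimates, because values from 122.5 up to 125 are rounded to 125 (and values from 112.5 up to 115 are rounded to 115) and so no longer fall below the case-defining threshold. Not good.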
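The size of the shortfall can be traced to one band of values. Rounding to the nearest 5 moves every measurement from 122.5 up to (but not including) 125 onto 125 itself, where the test popDP < 125 no longer catches it. Here is a short check, using an arbitrary seed (an assumption, not part of the original run), so the counts will differ from those above:

```r
set.seed(1)  # assumption: arbitrary seed
pop <- rnorm(n = 1000, mean = 140, sd = 11)
popDP <- round(pop / 5) * 5

# Cases counted as GAM before and after rounding
gam_true <- sum(pop < 125)
gam_dp <- sum(popDP < 125)

# Cases in the band that rounds up onto the 125 threshold
moved <- sum(pop >= 122.5 & pop < 125)

c(true = gam_true, polluted = gam_dp, moved = moved)

# The shortfall in the classical count equals the number of moved cases
gam_true - gam_dp == moved
```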
If we use PROBIT estimators … for GAM:
pnorm(125, mean = mean(popDP), sd = sd(popDP)) * 100
we get:
10.01426
That is 10.0% (close to the true 9.8%).
For SAM:
pnorm(115, mean = mean(popDP), sd = sd(popDP)) * 100
we get:
1.499806
that is 1.5% (exactly the same as the true 1.5%).
MAM is:
10.0% - 1.5% = 8.5% (close to the true 8.3%).
This is a simple model which assumes normality and rounding to nearest 0 or 5. In real-life data we might have some deviation from normality. In that case we can transform the data towards normality and then apply the PROBIT estimator.
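As an illustration of that last step, here is a hedged sketch using a log transformation on right-skewed simulated data. The choice of log (rather than, say, a power transformation), the lognormal population, and the seed are all assumptions made for the example. The cut-off has to be transformed in the same way as the data.

```r
set.seed(1)  # assumption: arbitrary seed

# Hypothetical right-skewed MUAC-like population (mm), then rounded to 0 or 5
skewed <- rlnorm(n = 1000, meanlog = log(140), sdlog = 0.08)
skewedDP <- round(skewed / 5) * 5

# "True" GAM prevalence in this simulated sample
gam_true <- mean(skewed < 125) * 100

# PROBIT on the log scale: transform the data and the cut-off alike
gam_probit_log <- pnorm(log(125),
                        mean = mean(log(skewedDP)),
                        sd = sd(log(skewedDP))) * 100

c(true = gam_true, probit_log = gam_probit_log)
```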
PROBIT appears to be robust to problems of digit preference. It seems to me that, if true, this is an important finding.
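One way to probe the "if true" is to repeat the simulation many times rather than rely on a single draw. The sketch below does that under the same assumptions as above (normal population, rounding to the nearest 5). The helper names, the seed, and the choice of 200 replicates are illustrative choices, not part of the original analysis.

```r
# Hypothetical helpers for the two estimators (prevalence in percent)
classic_prev <- function(x, cutoff) mean(x < cutoff) * 100
probit_prev <- function(x, cutoff) {
  pnorm(cutoff, mean = mean(x), sd = sd(x)) * 100
}

set.seed(1)  # assumption: arbitrary seed
errors <- replicate(200, {
  pop <- rnorm(n = 1000, mean = 140, sd = 11)
  popDP <- round(pop / 5) * 5
  truth <- classic_prev(pop, 125)  # prevalence in the undistorted sample
  c(classic = classic_prev(popDP, 125) - truth,
    probit = probit_prev(popDP, 125) - truth)
})

rowMeans(errors)       # average error (percentage points) for each estimator
rowMeans(abs(errors))  # average absolute error for each estimator
```

A negative average error means that estimator underestimates GAM relative to the undistorted sample.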
Here are the true (pop) and "polluted" (popDP) distributions:
Answered:
8 years ago