Hello,
I'm currently working on my Public Health master's thesis, using data collected from a community-based cluster randomized controlled trial in Angola. The study implemented different interventions to reduce child malnutrition over a period of 2 years. I am now looking at the data collected at baseline and follow-ups, aiming to assess the effectiveness of the interventions over time and the differences between them. My current challenge in cleaning the data is that I found children with HAZ of -7, -6, -5, etc., and of 8, 7, 6, etc. As I was not part of the team collecting the data, I can't be confident that these z-scores result from accurate measurement and recording. I'm therefore looking for references/guidelines that can help me distinguish outliers from impossible or extremely unlikely measurements. I understand that length/height can be tricky to measure and that there can indeed be extreme cases, but I suspect that I have too many "outliers", some of which I consider quite extreme.
Has anyone come across the same challenge before? How would you "solve" it? What can I use as a reference to guide me and justify excluding these cases?
Thank you in advance and have a great weekend.


Hello
According to WHO, an extreme (i.e. biologically implausible) length/height-for-age z-score is > 6 or < -6.
You can find more information at http://www.who.int/childgrowth/en/

FRANCK ALE

Answered:

6 years ago

"Biologically implausible" does not mean impossible. A case in point: During one round of fieldwork in Zimbabwe, we made a point of analyzing the anthropometric data before anything else. This allowed us to pick up the extreme outliers automatically flagged by the software we were using (Anthro) and then return to the field the following week to assess whether the measurements were accurate. At just one village site, we had 23 outliers. Remeasurement showed that, allowing for a week's growth, 21 of the 23 were valid measurements. That's 91%. Excluding those flagged cases would have seriously biased our results and led to underestimates of the extent of undernutrition. Put differently, those children should not have been alive...but they were.
But you don't have the option of reassessing the children. Two suggestions. One, since measuring the length of very young infants is known to be difficult, you might stratify your sample by age to see if there is a pattern of outliers falling mainly among the very young. If such a pattern exists, it points toward measurement errors. Two, you should have at least two observations for each child -- baseline & follow-up. Check and see if the same child is recorded as an outlier, or near-outlier, at both points in time. (If > 2 points, even better.) If there is a pattern of the same children being recorded as outliers across time, this suggests that inaccurate measurements are unlikely to be the primary problem.
Finally, just check the statistical distribution of the wild z-scores -- across space and time...and across enumerators if you have that information. This analysis should help you narrow down the source of extreme values.
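The two checks suggested above (stratifying by age, and looking for the same child flagged at more than one time point) could be sketched roughly as follows. This is only an illustration: the records, the field layout, the age bands, and the helper names are all hypothetical, and the +/-6 limit is simply the one quoted earlier in this thread.

```python
from collections import defaultdict

# Hypothetical records: (child_id, survey_round, age_months, haz)
records = [
    ("c1", 0, 4, -6.8),
    ("c1", 1, 10, -6.5),
    ("c2", 0, 3, 7.2),
    ("c2", 1, 9, -1.1),
    ("c3", 0, 30, -2.0),
    ("c3", 1, 36, -1.8),
    ("c4", 0, 40, -0.5),
    ("c4", 1, 46, -0.7),
]


def is_flagged(haz, limit=6.0):
    """True when HAZ lies outside the +/-limit range."""
    return abs(haz) > limit


def age_band(age_months):
    """Assign an age band; the band edges are an arbitrary illustration."""
    if age_months < 6:
        return "0-5"
    if age_months < 24:
        return "6-23"
    return "24-59"


def flag_rate_by_age(rows):
    """Share of flagged measurements within each age band."""
    counts = defaultdict(lambda: [0, 0])  # band -> [flagged, total]
    for _, _, age, haz in rows:
        band_counts = counts[age_band(age)]
        band_counts[0] += int(is_flagged(haz))
        band_counts[1] += 1
    return {band: flagged / total for band, (flagged, total) in counts.items()}


def flagged_rounds_per_child(rows):
    """For each child with any flagged value, list the rounds where it occurred."""
    out = defaultdict(list)
    for child, survey_round, _, haz in rows:
        if is_flagged(haz):
            out[child].append(survey_round)
    return dict(out)


rates = flag_rate_by_age(records)
repeats = flagged_rounds_per_child(records)
print(rates)
print(repeats)
```

In this made-up data, flags concentrated in the youngest band would point toward measurement error, while a child such as "c1", flagged at both rounds, is more likely to be a genuinely extreme case.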
Hope this helps.

Bill Kinsey

Answered:

6 years ago

Dear Diana, this is one of the challenges I faced in Ethiopia. Here, age estimation by caretakers is based on recall and is prone to error, which reduces the validity of survey data for calculating age-related indicators. Implausible values like yours can also come from this. I am not sure whether getting an accurate age is a challenge in Angola.

Yared Ab.

Answered:

6 years ago

Hi Diana

We have done some work with survey datasets, including DHS, and commonly see such values. It is a real problem because whilst we do genuinely see children down to a HAZ of around -8Z (and have re-measured these children to be sure) there should normally be very few above HAZ +3Z, actually most LMICs should not have many HAZ above +1Z. When we looked at the DHS datasets the outlying values are clustered by dates and locations suggesting individual measurers are making major mistakes. I suspect that is what may be happening in your data. For your MSc, you might want to first do scatter plots of HAZ by date, and if you have the information, by measurer. Then you could also look for outliers on a plot of HAZ against WAZ. Measurement errors are more likely in infants, so you could also plot by age. We previously found that whilst reliability evaluation of individual weight and length/height didn’t look too bad, when they are converted to Z scores, errors (most likely in length/height) were massively magnified. See: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4282477/
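Before plotting, the same idea can be checked numerically. The sketch below is only an illustration with hypothetical rows and field names: it computes the share of extreme HAZ values per measurer and per date, and lists children whose HAZ and WAZ disagree strongly. The gap of 4 z-scores used for the HAZ/WAZ comparison is an arbitrary assumption, not a published cut-off.

```python
from collections import Counter

# Hypothetical rows: (measurer_id, visit_date, haz, waz)
rows = [
    ("m1", "2015-03-02", -1.9, -1.5),
    ("m1", "2015-03-02", -2.4, -2.0),
    ("m1", "2015-03-03", -0.8, -1.0),
    ("m2", "2015-03-02", 6.9, -1.2),
    ("m2", "2015-03-02", -7.1, -0.9),
    ("m2", "2015-03-03", -1.5, -1.3),
]


def extreme_share(data, key_index, limit=6.0):
    """Share of rows with |HAZ| > limit, grouped by the column at key_index."""
    total = Counter()
    extreme = Counter()
    for row in data:
        key = row[key_index]
        total[key] += 1
        if abs(row[2]) > limit:
            extreme[key] += 1
    return {key: extreme[key] / total[key] for key in total}


def discordant(data, gap=4.0):
    """Rows where HAZ and WAZ differ by at least `gap` (illustrative threshold)."""
    return [row for row in data if abs(row[2] - row[3]) >= gap]


by_measurer = extreme_share(rows, 0)
by_date = extreme_share(rows, 1)
suspect = discordant(rows)
print(by_measurer)
print(by_date)
print(suspect)
```

If extreme values pile up under one measurer or on a few dates, as they do for "m2" in this made-up example, that supports the measurement-error explanation.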

All the best

Jay

Jay Berkley
Technical Expert

Answered:

6 years ago

Dear Franck, Bill, and Jay,
Thank you so much for your very helpful insights and knowledge sharing.
After considering your suggestions and looking again at the data I have, I decided to exclude all children with a HAZ > 6 SD or < -6 SD. I also noticed cases where one measurement stood out during the follow-ups, so I excluded from the analysis all children whose z-score differed by 4 or more between 2 consecutive measurements (6-month intervals); often this was a measurement sitting between 2 very different values.
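For anyone wanting to reproduce this kind of rule, here is a rough sketch. The data and the function name are hypothetical, and it takes one possible reading of the rule: a child is dropped if any measurement is beyond +/-6, or if any two consecutive measurements differ by 4 or more.

```python
# Hypothetical data: child_id -> HAZ values ordered by visit (6-month intervals)
haz_by_child = {
    "c1": [-2.0, -2.5, -1.5],
    "c2": [-1.0, -5.5, -1.5],  # middle visit stands out from its neighbours
    "c3": [-6.5, -6.0, -5.5],  # beyond -6 at baseline
    "c4": [0.5, 0.5, -3.5],    # change of exactly 4 between the last two visits
}


def exclusion_reason(series, limit=6.0, max_jump=4.0):
    """Return why a child would be excluded, or None if the child is kept."""
    if any(abs(z) > limit for z in series):
        return "extreme"
    consecutive_pairs = zip(series, series[1:])
    if any(abs(later - earlier) >= max_jump for earlier, later in consecutive_pairs):
        return "jump"
    return None


reasons = {child: exclusion_reason(series) for child, series in haz_by_child.items()}
kept = [child for child, reason in reasons.items() if reason is None]
print(reasons)
print(kept)
```

Keeping the reason alongside each excluded child makes it easy to report how many children were dropped under each criterion.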

Once again, thank you for the advice.
Kind regards,
Diana

Diana

Answered:

6 years ago

Hello
I really appreciate Bill's advice (sometimes an outlier is not bad data). I also want to let you know that you need to run the analysis with and without the outliers before dropping them, and that you need to take into account how many people you have left after excluding outliers.
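A with/without comparison could look roughly like the sketch below. The HAZ values are made up, the function name is hypothetical, and stunting is taken as HAZ below -2.

```python
from statistics import mean

# Hypothetical HAZ values for one survey round
haz = [-2.5, -1.0, -3.0, 0.5, -7.0, 6.5, -1.5, -2.0]


def summarize(values):
    """Sample size, mean HAZ, and share stunted (HAZ < -2) for a list of z-scores."""
    return {
        "n": len(values),
        "mean_haz": mean(values),
        "stunted": sum(z < -2 for z in values) / len(values),
    }


all_cases = summarize(haz)
trimmed = summarize([z for z in haz if abs(z) <= 6])
dropped = all_cases["n"] - trimmed["n"]

print("with outliers:   ", all_cases)
print("without outliers:", trimmed)
print("children dropped:", dropped)
```

Reporting both sets of results, together with the number dropped, shows how much the conclusions depend on the exclusion rule.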
Thanks and good luck

FRANCK ALE

Answered:

6 years ago