Combining multiple independent surveys through weighting

Is there a simple guideline or tool for combining multiple independent surveys into one survey that represents a wider geographic area? In a given province, five surveys (each 30 by 30) were conducted. During the analysis, at provincial level, we want to combine them by weighting to account for differences in district population size.

You have to be careful doing this, as you may end up hiding variation behind a rather meaningless average. It is almost always more useful to present per-district results if you can (a map is best, as there may be some clear spatial pattern that will not be so clear in a table) rather than a single wider-area average. I think you could do both (i.e. present per-district estimates and then a per-province summary estimate).

The first thing to do is to check that it makes sense to combine all the surveys in order to give a single result. This is only the case when the estimates from each survey are similar to each other. This can be as simple as a visual check using a "forest plot" of estimates and 95% CIs. Here is an example of similarity:



  Survey 1       |-----*-----------|
  Survey 2          |-----*----------|
  Survey 3     |------*------------|
  ...
  Survey N        |-----*-----------|
            +--+--+--+--+--+--+--+--+--+--+--+--+--+
            8  9  10 11 12 13 14 15 16 17 18 19 20 21

                          Prevalence (%)

Note that the point estimates (marked by the "*") are close to each other and there is a lot of overlap of the 95% CIs. In this case an average will have meaning. Here is an example of dissimilarity:



  Survey 1     |-----*--------|
  Survey 2                |-----*----------|
  Survey 3                    |------*---------|
  ...
  Survey N   |---*------|
            +--+--+--+--+--+--+--+--+--+--+--+--+--+
            8  9  10 11 12 13 14 15 16 17 18 19 20 21

                          Prevalence (%)

Note that the point estimates are widely spread and some of the CIs do not overlap much or at all. In this case an average will hide variation and is best avoided.

If the estimates are similar enough to combine, the pooled proportion is a population-weighted average of the proportions found by each survey:



                      p1 * w1 + p2 * w2 + ... + pn * wn
  Pooled proportion = ---------------------------------
                             w1 + w2 + ... + wn

where:



  p1 = proportion from survey 1
  p2 = proportion from survey 2
  .
  . and so-on
  .
  w1 = population in area for survey 1
  w2 = population in area for survey 2
  .
  . and so-on
  .
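
As a minimal sketch of the weighted average above (in Python; the function name `pooled_proportion` is my own choice for illustration and does not come from any survey package):

```python
def pooled_proportion(proportions, populations):
    """Population-weighted average of per-survey proportions."""
    if len(proportions) != len(populations):
        raise ValueError("need one population weight per survey")
    total = sum(populations)
    if total <= 0:
        raise ValueError("population weights must sum to a positive number")
    return sum(p * w for p, w in zip(proportions, populations)) / total


# Two hypothetical districts: the larger one pulls the average towards itself.
print(round(pooled_proportion([0.10, 0.20], [3000, 1000]), 3))  # prints 0.125
```

The proportions are passed as fractions (0.10, not 10%), and the weights can be any population counts on a common scale.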

Complications arise when trying to pool variances. This is because the survey samples are complex and the variance is influenced by the proportion, the sample size, and the survey design effect. One way to approach this problem is to calculate the standard error (SE) from the estimates and 95% CIs reported from each survey:



       Upper Confidence Limit - Lower Confidence Limit
  SE = -----------------------------------------------
                         2 * 1.96

The pooled SE is:



                  ( SE1^2 * w1 + SE2^2 * w2 + ... + SEn^2 * wn )
  Pooled SE = sqrt( ------------------------------------------ )
                  (             w1 + w2 + ... + wn             )

where:



  SE1 = SE for survey 1
  SE2 = SE for survey 2
  .
  . and so-on
  .
  w1  = population in area for survey 1
  w2  = population in area for survey 2
  .
  . and so-on
  .
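
The two SE steps above can be sketched in Python as follows. The function names `se_from_ci` and `pooled_se` are hypothetical, and the back-calculation assumes each survey reported a symmetric normal-approximation interval (estimate +/- 1.96 * SE), which may hold only approximately for intervals calculated in other ways:

```python
import math

Z_95 = 1.96  # normal multiplier for a 95% confidence interval


def se_from_ci(lcl, ucl, z=Z_95):
    """Back-calculate the SE from a reported CI, assuming a symmetric
    normal-approximation interval (estimate +/- z * SE)."""
    if ucl < lcl:
        raise ValueError("upper limit must not be below lower limit")
    return (ucl - lcl) / (2 * z)


def pooled_se(ses, populations):
    """Pooled SE as written in this answer: the square root of the
    population-weighted average of the squared SEs."""
    if len(ses) != len(populations):
        raise ValueError("need one population weight per survey")
    total = sum(populations)
    return math.sqrt(sum(se ** 2 * w for se, w in zip(ses, populations)) / total)


# SE for a survey reporting a 95% CI of 9.7% to 16.1%
print(round(se_from_ci(0.097, 0.161), 4))
```

This implements the formula exactly as given here, which is a weighted average of the variances. For independent surveys, the usual variance of a weighted mean would instead be sqrt(w1^2 * SE1^2 + ... + wn^2 * SEn^2) / (w1 + ... + wn), which is never larger, so the version used here errs towards a wider interval.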

The pooled estimate is:



  Pooled estimate = Pooled proportion +/- 1.96 * Pooled SE

Here is an example with only three surveys. The survey results are:



  Survey   Population     p   LCL   UCL
  -------- ---------- ----- ----- -----
  Survey 1     23,670 12.7%  9.7% 16.1%
  Survey 2     16,546  9.9%  6.3% 13.2%
  Survey 3     19,201 13.5%  9.8% 18.0%
  -------- ---------- ----- ----- -----

The pooled proportion is:



  Survey        w     p p * w
  -------- ------ ----- -----
  Survey 1 23,670 0.127 3,006
  Survey 2 16,546 0.099 1,638
  Survey 3 19,201 0.135 2,592
  -------- ------ ----- -----
       Sum 59,417       7,236

  Pooled proportion = 7236 / 59417 = 0.122

The pooled SE is:



  Survey        w   LCL   UCL    SE     SE^2  SE^2 * w
  -------- ------ ----- ----- ----- -------- ---------
  Survey 1 23,670 0.097 0.161 0.016 0.000256  6.059520
  Survey 2 16,546 0.063 0.132 0.018 0.000324  5.360904
  Survey 3 19,201 0.098 0.180 0.021 0.000441  8.467641
  -------- ------ ----- ----- ----- -------- ---------
       Sum 59,417                            19.888065

  Pooled SE = sqrt(19.888065 / 59417) = 0.0183

The pooled estimate is:



  Point estimate = 0.122
  95% LCL = 0.122 - 1.96 * 0.0183 = 0.086
  95% UCL = 0.122 + 1.96 * 0.0183 = 0.158

or 12.2% (95% CI = 8.6% - 15.8%).

Important:

(1) Someone should check my thinking and my arithmetic.

(2) When you do these sorts of calculations you should do them to full precision throughout and only round at the end. I did not do this above, so there will be some accumulated rounding error in the final result.

I hope this is of some use.
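
On the point about rounding, here is a minimal Python sketch that repeats the three-survey example while keeping full precision until the final line. It uses the p, LCL, and UCL values from the pooled-proportion and pooled-SE tables, and the same pooled SE formula as above; the variable names are my own.

```python
import math

# (population, p, LCL, UCL) as used in the pooled-proportion and
# pooled-SE tables of the worked example.
surveys = [
    (23670, 0.127, 0.097, 0.161),
    (16546, 0.099, 0.063, 0.132),
    (19201, 0.135, 0.098, 0.180),
]
Z = 1.96  # normal multiplier for a 95% confidence interval

total_w = sum(w for w, _, _, _ in surveys)

# Population-weighted average of the survey proportions.
pooled_p = sum(w * p for w, p, _, _ in surveys) / total_w

# Each survey's SE comes from its CI width and is squared without rounding.
pooled_var = sum(
    w * ((ucl - lcl) / (2 * Z)) ** 2 for w, _, lcl, ucl in surveys
) / total_w
pooled_se = math.sqrt(pooled_var)

pooled_lcl = pooled_p - Z * pooled_se
pooled_ucl = pooled_p + Z * pooled_se

# Round only when reporting.
print(f"{100 * pooled_p:.1f}% (95% CI = {100 * pooled_lcl:.1f}% - {100 * pooled_ucl:.1f}%)")
```

With these inputs the unrounded pooled SE is slightly smaller than 0.0183, but the result reported to one decimal place is still 12.2% (95% CI = 8.6% - 15.8%).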

Anonymous

Answered:

10 years ago

Dear Mark, thank you very much for the detailed response.

Anonymous

Answered:

10 years ago