Hello,
I am seeking help in determining the sample size needed to establish the usefulness of diagnosing a disease with a questionnaire, compared with regular diagnosis by a specialist.
How can I calculate the sample size? Prevalence is 5.5/1000, with a 95% CI, desired sensitivity 80%, desired specificity 90% and precision 5%. Does it include a power aspect as well? If so, how? I have seen formulas that estimate sample sizes for sensitivity and specificity separately, but can both be integrated into a single sample size calculation?
I wish to establish that this questionnaire diagnostic method is as good as diagnosis by a specialist, with high sensitivity and specificity.
I noted a survey that quoted sensitivity 85%, specificity 50%, precision 8%, prevalence 15% and alpha risk 5%, and gave a sample size of 77 diseased and 151 not diseased subjects. How is this calculation made?
Any help will be highly appreciated.
I'll take a swing at this ...
Given the 2-by-2 table:
                   Specialist +   Specialist -
                   ------------   ------------
Questionnaire +         a              b
Questionnaire -         c              d
                   ------------   ------------
                      a + c          b + d
                   ------------   ------------
Sensitivity is:
a / (a + c)
Specificity is:
d / (b + d)
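A minimal Python sketch of the two definitions, assuming the cell layout of the table above (a = positive by both methods, d = negative by both methods). The counts in the usage line are made up purely for illustration. Note that specificity is computed as d / (b + d), the proportion of specialist-negative subjects that the questionnaire also calls negative.

```python
def sensitivity_specificity(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Return (sensitivity, specificity) for a 2-by-2 table laid out as:

                         Specialist +   Specialist -
        Questionnaire +       a              b
        Questionnaire -       c              d
    """
    sensitivity = a / (a + c)  # questionnaire-positive among specialist-positive
    specificity = d / (b + d)  # questionnaire-negative among specialist-negative
    return sensitivity, specificity


# Hypothetical counts, for illustration only.
se, sp = sensitivity_specificity(a=40, b=15, c=10, d=135)
print(f"sensitivity = {se:.2f}, specificity = {sp:.2f}")  # sensitivity = 0.80, specificity = 0.90
```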
You want to estimate sensitivity and specificity with reasonable precision.
Both sensitivity and specificity are proportions:
                numerator   denominator
                ---------   -----------
Sensitivity         a          a + c
Specificity         d          b + d
                ---------   -----------
There are, therefore, two sample sizes to consider (i.e. "a + c" and "b + d").
The required sample sizes will depend on the proportion to be estimated and the precision required. It is usual to assume that both sensitivity and specificity will be considerably above 50% (since we hope the instrument will perform better than just tossing a coin). In this example I use 80% sensitivity and 90% specificity both to be estimated with a precision of +/- 5% :
              Target   Precision
              ------   ---------
Sensitivity    80%      +/- 5%
Specificity    90%      +/- 5%
              ------   ---------
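A standard sample size formula for estimating a proportion p with precision d at 95% confidence is used, n = p * (1 - p) / (d / 1.96)^2, rounded up: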
n(sensitivity) = (0.8 * (1 - 0.8)) / (0.05/ 1.96)^2 = 246
n(specificity) = (0.9 * (1 - 0.9)) / (0.05/ 1.96)^2 = 139
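The same arithmetic as a small Python function (the function name is my own, for illustration). It rounds up, since you cannot recruit a fraction of a subject. As far as I can tell, plugging the survey values from the question (85% and 50%, each at +/- 8%) into the same formula reproduces the 77 and 151 quoted there, which suggests that is how those figures were obtained, although the survey itself does not say so.

```python
import math


def sample_size_for_proportion(p: float, precision: float, z: float = 1.96) -> int:
    """Subjects needed to estimate a proportion p to within +/- precision.

    Uses n = p * (1 - p) / (precision / z)^2, rounded up to a whole subject.
    z = 1.96 corresponds to a 95% confidence interval.
    """
    return math.ceil(p * (1 - p) / (precision / z) ** 2)


# Figures from the worked example above.
n_se = sample_size_for_proportion(0.80, 0.05)  # specialist-positive subjects needed
n_sp = sample_size_for_proportion(0.90, 0.05)  # specialist-negative subjects needed
print(n_se, n_sp)  # 246 139

# Survey figures quoted in the question (85% and 50%, each +/- 8%).
print(sample_size_for_proportion(0.85, 0.08), sample_size_for_proportion(0.50, 0.08))  # 77 151
```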
To obtain the overall sample size in a single sample the following conditions must be satisfied:
(a + c) >= 246
(b + d) >= 139
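If you do want a single population sample, one common way to combine the two conditions is to divide each requirement by the expected proportion of positives or negatives and take the larger result. A rough Python sketch, assuming the prevalence guess is correct and working with expected counts only (the function name is hypothetical):

```python
import math


def total_single_sample(n_positive: int, n_negative: int, prevalence: float) -> int:
    """Total subjects to enrol so that, at the expected prevalence, the
    specialist finds at least n_positive positives AND n_negative negatives.

    Expected counts only: random variation means a real sample may fall
    short, which is one more reason to inflate the result a little.
    """
    need_for_positives = math.ceil(n_positive / prevalence)
    need_for_negatives = math.ceil(n_negative / (1 - prevalence))
    return max(need_for_positives, need_for_negatives)


# Prevalence of 5.5 per 1000 from the question; 246 and 139 from the formula above.
print(total_single_sample(246, 139, 0.0055))  # 44728

# Assumption: the survey's 15% prevalence applied the same way (its total is not quoted).
print(total_single_sample(77, 151, 0.15))  # 514
```

At 0.55% prevalence the requirement for positives dominates completely, which is why the result matches the 44728 worked out for the population-based approach.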
Note that (a + c) is the number of subjects who are truly POSITIVE and (b + d) is the number who are truly NEGATIVE. In your problem, "truth" is decided by the specialist.
The cheapest approach to achieving the sample is to select the first 246 patients diagnosed as POSITIVE by the specialist(s) and the first 139 diagnosed as NEGATIVE by the specialist(s). Prevalence is often high in specialist clinics, so you may have to wait some time for the 139 negatives. If this is the case, then an expedient measure is to take a sample of (assumed) negatives from another clinic, which you will probably want to match on potential confounders and known risk factors (e.g. age, sex, socio-economic status, ethnicity).
A simple population-based sampling approach will be expensive for a low-prevalence condition. You quote a prevalence of 5.5 / 1000 which, expressed as a proportion, is 0.0055 (0.55%). You would need a sample of about:
(1 / 0.0055) * 246 = 44728
to find 246 POSITIVE cases. This is impossible within reasonable resource constraints.
NOTES :
(1) The sample size calculation is based on guesses of values which are unknown at this time. It is prudent, therefore, to increase the calculated sample sizes slightly.
(2) All study subjects must be screened by both the questionnaire and the specialist(s), and each assessment should be blind to the result of the other (i.e. the specialist should not know the questionnaire result and vice versa).
(3) You will need to consider ethical aspects of the trial. All cases found must be treated.
As usual ... check my arithmetic!
I hope this is of help.
Mark Myatt
Technical Expert
Answered: 9 years ago