There are two primary audiences for QA. The first is local site implementers (SDMs) who want confirmation that their implementations pass at least minimal muster, or, failing that, to know exactly where their implementation has gone wrong. The second audience is potential users of the data, who are similarly looking for assurance that the data at the sites will meet minimal expectations and generally be usable for a given study. In addition, the second audience will often want to see basic descriptives on the various fields, in order to see, e.g., just how much gender identity data you have. These statistics are also good for forming opinions on how homogeneous/harmonious the implementations seem to be across sites. If one site shows an average of 7 ambulatory visits per member per month, and everybody else is around 0.4, then that's maybe worth looking into.
Generally, you will want to create two separate reports in a QA process, one for each audience.
The presumption is that QA failures will either be fixed or documented as not fixable. So new fails should be actionable by the implementing site (unless they are known issues that have been explained on the issue tracker). If an issue isn't worth following up on (e.g., 3 stray records out of 8 million), don't put a fail in the report.
Best Practices/Pet Peeves
- QA programs go hand-in-hand with the specs to define a given data area. Expect to hear from SDMs running your program about differences of opinion/interpretation of the specs. It may take some dialogue & iteration to settle on checks and tolerances. You should list the objective checks & those tolerances on your collated report, along with (at least) a summary of the results of those checks at the sites.
- Don't make me hunt for the table listing checks & pass/fail results; that should be the first thing I see, right at the top of the output (a sketch of such a table follows this list).
- I should be able to ctrl-f for the word 'fail' and instantly know if I have any fails &, if so, what they are.
- Be thoughtful about what reporting output you create for the local implementer/user. In particular, thou shalt not spit proc contents output into a report. If you want a full CONTENTS on my dset, run that out to an output dataset and consume that in your collated report (see the proc contents sketch below the list). Don't forget that you can produce most anything you want once you get results back from the sites.
- Make your collation process and collated report creation process as streamlined as possible, and favor publishing the results immediately after you get updated results.
- You should be able to receive updated results at any time, and have only to unzip contents into a folder, run a single program, and then upload the resulting collated report (see the collation sketch below).
- Give sites immediate credit for having fixed fails.
- While it would be wonderful to always have the time to pore over results, compare site implementations & follow up with individual sites, it is also not crazy to basically crowd-source this work.
- Produce the smallest number of output datasets that will hold the data you need. Each one has to be checked for PHI before it can be returned. Consider concatenating multiple similar datasets together (a concatenation sketch follows the list).
- Save off close-to-raw records that violate a check, and point the user to that dset in the part of the report that lists the fail, so they don't have to write their own code to (dis)confirm the issue & can see some sample data to start figuring out how it got into the dset (see the violating-records sketch below).
- Graphics are way better than tables (see the plot sketch below).
- Format values to make them interpretable to people who have not memorized the spec (see the proc format sketch at the end of the list).
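
Here is a minimal sketch of the pass/fail table idea. It assumes a collated dataset named qa_results with check, description, and result variables; all of those names are made up. The point is just that the table prints first and contains the literal word 'fail' for anything that failed.

    * Print the check/result table before anything else in the report. ;
    * Sorting by result floats 'fail' rows to the top since 'fail' sorts before 'pass'. ;
    proc sort data = qa_results ;
      by result check ;
    run ;

    title "QA check results" ;
    proc print data = qa_results noobs label ;
      var check description result ;
    run ;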
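
The proc contents point, sketched out. The library and dataset names (vdw.demographic, qa_out) are placeholders for whatever your spec and program actually use.

    * Capture the CONTENTS metadata as a dataset rather than printed output. ;
    * The collated report can then render whatever subset of it is useful. ;
    proc contents data = vdw.demographic noprint
      out = qa_out.demog_contents (keep = name type length format label) ;
    run ;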
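
One way to keep collation to a single program, assuming (hypothetically) that each site's submission unzips to a dataset whose name starts with qa_results_ and that the folder path below is replaced with your own:

    * Point a libname at the folder of unzipped submissions and stack ;
    * every dataset matching the common prefix into one dataset. ;
    libname subs "path-to-unzipped-submissions" ;

    data all_results ;
      length site $ 41 ;
      set subs.qa_results_: indsname = from_dset ;
      site = scan(from_dset, 2, '.') ;  * tag each row with its source dataset ;
    run ;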
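
A sketch of the concatenation suggestion. The per-check datasets and the check descriptions are invented; the idea is that one combined dataset means one PHI review instead of several.

    data qa_out.bad_records ;
      length check $ 50 ;
      set bad_enrollment_dates (in = in_enr)
          bad_birth_dates      (in = in_dob) ;
      if in_enr then check = 'enrollment end precedes enrollment start' ;
      else if in_dob then check = 'birth date is in the future' ;
    run ;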
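
Saving off close-to-raw violators might look like the following; the enrollment dataset, variable names, and the check itself are purely illustrative.

    * Keep a small sample of offending records so the site can inspect ;
    * them directly rather than re-deriving the problem from scratch. ;
    data qa_out.bad_enrollment_dates ;
      set vdw.enrollment ;
      where enr_end lt enr_start ;  * the check being violated ;
      if _n_ gt 50 then stop ;      * cap the sample to limit PHI exposure ;
    run ;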
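
On graphics: a quick across-site bar chart of a rate makes an outlier (like the 7-visits-per-member-per-month site from the intro) jump out far faster than a table. The dataset and variable names here are hypothetical.

    proc sgplot data = rates_by_site ;
      vbar site / response = visits_per_member_month stat = mean ;
      yaxis label = "Avg ambulatory visits per member per month" ;
    run ;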
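
And the formatting point, sketched with a made-up gender code list; use whatever values the spec actually defines.

    * Show readable values rather than bare codes in report output. ;
    proc format ;
      value $gender
        'F' = 'Female'
        'M' = 'Male'
        'O' = 'Other'
        'U' = 'Unknown'
      ;
    run ;

    proc freq data = vdw.demographic ;
      tables gender / missing ;
      format gender $gender. ;
    run ;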