Monday, March 6, 2023

Thoughts On Multi-Site QA: Best Practices & Pet Peeves

There are 2 primary audiences for QA. The first is local site implementers (SDMs) who want confirmation/proof that their implementations pass at least minimal muster, or else know exactly where their implementation has gone wrong. The second audience is potential users of the data, who are similarly looking for assurance that the data at the sites will meet minimal expectations and generally be usable for a given study. In addition, the second audience will often want to see basic descriptives on the various fields, in order to see e.g., just how much gender identity data you have. These statistics are also good for forming opinions on how homogenous/harmonious the implementations seem to be across sites. If one site shows an average of 7 ambulatory visits per member per month, and everybody else is around .4, then that’s maybe worth looking into.

Generally, you will want to create 2 separate reports in a QA process, one for each audience.

The presumption is that QA failures will either be fixed or documented as not fixable. So new fails should be actionable by the implementing site (unless they are known issues that have been explained on the issue tracker. If an issue isn’t worth following up (e.g., 3 stray records out of 8 million) don’t put a fail in the report.

Best Practices/Pet Peeves

QA Programs go hand-in-hand with the specs to define a given data area. Expect to hear from SDMs running your program about differences of opinion/interpretation of the specs. It may take some dialogue & iteration to settle on checks and tolerances. You should list the objective checks & those tolerances on your collated report along with (at least) a summary of the results of those checks at the sites.
Don't make me hunt for the table listing checks & pass/fail results--that should be the first thing I see, right at the top of the output.
I should be able to ctrl-f for the word 'fail' and instantly know if I have any fails & if so what they are.
Be thoughtful about what reporting output you create for the local implementer/user. In particular, thou shalt not spit proc contents output into a report. If you want a full CONTENTS on my dset, run that out to an output dataset and consume that in your collated report. Don’t forget that you can produced most anything you want once you get results back from the sites.
Make your collation process and collated report creation process as streamlined as possible, and favor publishing the results immediately after you get updated results.

You should be able to receive updated results at any time, and have only to unzip contents into a folder, run a single program, and then upload the resulting collated report.
Give Sites immediate credit for having fixed fails
While it would be wonderful to always have the time to pore over results and compare site implementations & follow-up with individual sites, it is also not crazy to basically crowd-source this work.

Produce the smallest number of output datasets as will hold the data you need. Each one has to be checked for PHI before it can be returned. Consider concatenating multiple similar datasets together.
Save off close-to-raw records that violate a check and point the user to that dset in the part of the report that lists the fail, so they don't have to write their own code to (dis)confirm the issue & see some samples data to start figuring out how it got into the dset.
Graphics are way better than tables.
Format values to make them interpretable to ppl who have not memorized the spec.

Thursday, October 27, 2022

Provider specialty data--wrangling and using

In medical research data harvested from insurance/care data it's often important to know something about the providers that appear in the data. In particular we frequently want to know what the type of provider is--were they a:

Nurse Practitioner
Physician's Assistant
Social Worker
Technician

and so forth. It's also frequently important to know what that person's specialty is--for example:

Transplant surgery
Pediatrics
Oncology

Trying to make this data available for research--particularly in hybrid organizations like HMOs, can be a serious challenge. In general, we'd like to have a lookup table that lists every provider's type and specialty as it existed at the time of an encounter, say. Unfortunately, our sources don't often provide that information, and so we wind up with a pile of types and specialties for each provider that can be confusing to use.

Reasons a single provider might have more than one record in a provider type/specialty table

They are (or were) truly qualified in multiple specialties

simultaneously (say, internal medicine and oncology), or
they re-trained over the course of their career (e.g., started out as a registered nurse, became a nurse-practitioner)

We drew data for the same individual from multiple systems, and the information is inconsistent because:

one or all of the systems are wrong about the type/specialty
one system has a bad identifier & they're not really the same individual

Some of the data is just bad (e.g., indicates additional specialties not actually merited).

Unfortunately our sources (National Provider Identifier Registry, our various Electronic Medical Record systems, our claims processing systems) don't often (ever?) give guidance as to the time progression of various types and specialties. Lacking that information, we are powerless to convey it to users. So what can we do?

Document the scope of the problem

Looking just at our EMR data, a little more than 94% of providers have just a single listed specialty. If that's typical, while probably not negligible, it's not a show-stopper either.

Provide Users All The Data We Have, warn them of the issue, and give whatever guidance we can

So if you happen to know from your analysis that the NPI registry data is particularly dodgy, say, let them know that in cases of ambiguities, that they should prefer records coming from sources other than NPI.

Similarly, it's often useful to look at other contextual clues on particular encounters--what procedures/diagnoses bear that provider's identifier? Social workers don't perform Radial Keratotamy surgery, right--so if that's what looks to be happening, one of those fields is lying.

Department (when it's available and not imputed from provider specialty) can also be an important clue.

For Heavy-Hitters, Analyze and Special-Case Particular Providers' Data

In our provider table I like to include off-spec fields that give the number of encounter & rx fill records that bear that provider identifier. That lets me optimize my bang-for-buck in terms of my precious time inspecting/analyzing/researching seeming ambiguities. Providers that only appear on a handful of records do not merit the same attention that the ones with higher numbers do.

When we do find bad data, I am absolutely not above including lines like:

if provider = 'some_dude' and specialty = 'wrong specialty' then delete ;

in my code.

Maybe Someday--Do something AI-ish to flag inconsistencies or (dare I hope) try to construct time-periods

This should be possible to do in a centrally created Quality Assurance type program, but it would probably be a ton of work and is not feasible at the moment.

Wednesday, October 27, 2021

"Semantic" Version numbering for VDW Specifications

Software Engineers Have A Good Version Numbering System

Software projects large and small will often adopt what's known as "semantic versioning", which you can read about in-depth here, but let me reproduce the summary real quick so you don't have to stop reading this:

Given a version number MAJOR.MINOR.PATCH, increment the:

MAJOR version when you make incompatible [interface] changes,
MINOR version when you add functionality in a backwards compatible manner, and
PATCH version when you make backwards compatible bug fixes.

While the VDW specs are not in fact a software product they do share a lot of similarities. Specifically,

they change over time, and
programmers rely on them to ensure compatibility of the code they write.

VDW Should Adapt This System

It seems to me this is a model we should adapt for our own uses. Specifically, and in the light of yesterday's Implementation Group meeting decisions:

The MAJOR version applies only to the set of specs in the aggregate. Individual table specs inherit the MAJOR version (which means they will sometimes change even when particular table specs have not changed).

A Proposed Adaptation

Reasons to increment the MAJOR version include:

We have a large number of individual spec changes come through in a very short period of time + a concerted effort to implement them (a la the version 2 -> 3 changes).
The number of un-coordinated spec changes have accumulated sufficiently in the judgment of the VDW Operations Committee Lead's opinion that the new major version is warranted.

Since MINOR and PATCH levels only apply at the individual table level, I would propose we increment the MINOR level for changes that will break existing code, or are otherwise important (again in the judgment of the VOC lead). For example:

Change in the primary key.
Addition or removal of one or more fields, or changing the name of a field.
Significant expansion of the permissible values allowed for a field (for example the Virtual Care additions to the utilization specs)
Change in a reference spec to which a field is tied (for example if we were to decide to update the language table to tie values from ISO 639-2 to ISO 639-3)

By contrast PATCH levels are incremented for less-important changes like:

Edits to descriptions, either at the field or whole-table level
Correcting obvious typos.

This will of course entail adding PATCH numbers to the existing table specs, but doing so will give us the freedom to note changes while at the same time distinguishing between more-important and less-important ones.

VOC Lead Makes The Call

Change proposal authors should feel free to suggest incrementing/not incrementing MINOR and PATCH levels if the proposal is adopted, but the ultimate decision should be made by the VOC lead.

What do you think--can this work? How would you decide to handle particular types of changes?

Friday, March 16, 2018

Elements of a Distributed Research Data Network

As preparation for a panel discussion at Health Datapalooza 2018 on distributed data models, I wanted to jot down some thoughts on what I think you need to have a successful distributed research network. (Possibly these things are beside the point of that panel, but I've been wanting to write this for a little while. So here it is.)

Research Networks

What I mean by a 'research network' is a group of data-having organizations who intend to offer data access to remote users via some interface, in order to answer research questions. For me the paradigmatic example is of course the HCSRN. That's the organization I'm most familiar with, having been in on its VDW effort from pretty much the beginning in 2002.

So what do you need for a distributed research data network?

A Data Model

A data model is generally a collection of specifications for the tables that are available in the interface you're offering to remote users. It communicates to your users:

What data they will find in your instance (e.g., lab results; insurance periods, pharmacy fills)
How to refer to the data—e.g., table and field names.
How the data is structured—what relationships the tables bear to one another.

At least as important, the data model should communicate to your implementers just what data has a place in the data model, what it needs to look like, and where in the scheme it should go.

To that end, good specifications are more than just those table and field names. They should also feature some verbiage that sets out what data does and does not belong in each table (e.g., the Race fields should hold the patient's self-reported race values only—do not use non-self-report sources for these fields). Optimally, they will also specify some natural key for the table—some combination of substantive fields that should uniquely identify a record (e.g., there should be one record per (MRN, RXDATE, NDC, and RXPROVIDER)). At some point this meta-description verges into full-on implementation guidelines, discussed below, but there should certainly be some general description to help users and implementers alike know what should be in the tables.

Important Adjuncts

Table specifications are the bare minimum you need for a successful DRDN. In addition to these, it is optimal to have:

Objective QA tests
Implementation guidelines
An authority to construe the specifications and answer user and implementer questions.

That last element is potentially very expensive of course—it may involve paying knowlegable humans to respond to questions and make recommendations/answer questions. If your network is very successful and can gain many implementers, then it's possible that an active community will spring up around it sufficient to keep it going without money changing hands (in much the same way that happens with some open-source software). But it would be unwise to count on that happening very early—much better to seed the ground by starting out with a paid group.

As the VDW effort matured I came to really appreciate the existence of objective QA tests as a supplemental source of data specification. Just the exercise of agreeing on what checks you should have (apart from 'all fields should exist & be the proper type') is incredibly valuable for surfacing the frequently diverse understandings and assumptions of your implementers.

Extra Credit

In addition to the above, it would be lovely to also include a sample implementation from widely available reference data (e.g., Optum, Medicare Claims). The OMOP folks have put together this implementation of an ETL process that creates OMOP data from CMS' synthetic public use file, for example.

Likewise, since the healthcare and health research fields are both ever-changing, it's good to have a process by which the DM can be amended.

Finally, in addition to a human-readable specification of your data model, it's always nice to include one or more flavors of SQL DDL scripts that create an empty set of the tables that constitute your data model.

A Query Processing Engine

This is what your users will use to manipulate the data from afar. Because they are doing so "from afar" it is crucial to be as specific as possible as to what that environment is like. Examples include:

python 3.6 or later, with pyodbc 4+ and pandas v02+
R 3.4 or later
Oracle PL/SQL version 9i or later
SAS 9.3 or later
::centrally created, possibly open-sourced, web app query tool with corresponding execution nodes at implementing sites::

For actual programming environments, where queries consist of human-generated, small batch free range artisinal code, you want to specify that environment in a lot of detail. Generally, you want to enable your users to create a similar environment locally (either with their local implementation of your data model, or from synthetic data) so they can formulate/test their queries before inflicting them on the network.

Opting for a custom web-based GUI application for specifying queries takes out a lot of the guesswork in terms of "will it run at the sites?" but of course that's at the cost of having to create said GUI application, and the sophistication of what users will be able to do.

A Registry of Implementations

Having some central clearinghouse where implementers can register that they have implemented your data model, are open for business, and set expectations in terms of what queries they're willing to entertain is crucial. The popularity of a data model comes from how many organizations have implemented it. This is also a great place for implementers to set expectations in terms of:

The extents-in-time of their data holdings.
The date they last ran the QA tests, and their results.
What QA issues are currently known, and whether they are fixable/in process of being fixed.
What specific versions of the query engine software they have available.
What hoops there are to jump through in order to get a response—e.g., data use agreements, fees, approvals, applications, etc.

On the HCSRN we have a portal website that we use to document these things.

An Implementing Community

If nobody implements your data model, you don't have a network. And as soon as more than one org implements, you have the problem of ensuring that different implementers are interpreting those specifications in something like the same way. In a lot of ways, specifications are like legislated laws—they usually sound logical, coherent, and reasonable when they are formulated, in the abstract, with particular use-cases in mind. But just like we need judges to construe those laws in order to apply them to specific situations (often disturbingly different from those anticipated when the law/specification was first written) we need at a minimum, a way to arrive at consensus regarding how specific decisions should be made.

Arriving at consensus means having a conversation, be it live at meetings, or on conference calls, or asynchronously in e-mail for on forum threads. The means for having those conversations are what I mean by Community.

Users!

No need to belabor this one. Without people interested in using your data model, again, you don't have a network.

What Else?

What have I forgotten? Tell me in the comments.

Monday, June 20, 2016

SAS Programming package for Sublime Text is now available via Package Control

Just a quick note to convey the information in the title there. Thanks to some lovely help from the community (especially @friedegg, @seemack and @bobfourie) the SAS package for Sublime is now more functional than ever, and is finally installable via the wonderful Package Control.

So if the previous, janky installation procedure has thus far kept you from trying out Sublime for SAS programming, do please give it a whirl.

Friday, February 26, 2016

Sublime Text, Intel Graphics and CTRL-ALT-ARROW hotkeys

Just in case this saves someone else (or future me) some head-scratching.

Work issued me a new laptop, onto which I promptly installed ST3. When I first hit ctrl-alt-down arrow to go into column select mode my display turned upside down. I've seen this dozens of timesby now--this is the graphics card add-on program making it easy to rotate my display (though why 180 degrees is a useful rotation that they want to serve is unclear). I hit ctrl-alt-up arrow to put things back to rights, and then go into the system area to look for the (in my case Intel) icon to turn off its hot-keys.

But now I go back into ST3 and those key combos don't work at all. Display stays properly oriented, but my sublime cursor moves noplace.

Is ST3 even receiving those keypresses? I open its console with ctrl-` and type "sublime.log_input(True)" to turn on key event logging. Nada, zip, nothing--something is completely eating those keypreses. I google for a bit (mostly finding posts from people who have inadvertently rotated their displays and don't know how to un-rotate them) and didn't find anything useful.

So I pull up the full Intel HD Graphics Control Panel app and go into the hotkeys section:

For a goof, I tried re-defining the rotate-y hotkey combinations from ctrl-alt-::something:: to shift-alt-::something::.

I enabled hot-keys & tried sublime again. That worked. Then I disabled hot-keys and it still worked. So I'm calling it a fix. This seems like a bug in Intel's utility though--it seems to be eating keystrokes that it's been told to ignore.

Tuesday, November 10, 2015

How we do Quality Assurance for the VDW

Last weekend I attended the most excellent PCORI Data Quality Code-a-thon, hosted by Michael Kahn and his colleagues over at University of Colorado, at which I met some really interesting and smart people doing really interesting work. A couple of them evinced an interest in VDW QA work and I said I'd share the substantive checks that we are doing.

Some Context

This is volunteer work

Like most everything VDW, QA work is largely unfunded and distributed across implementing sites. Volunteers from the data area workgroups (e.g., Utilization, Pharmacy, Enrollment) put together lists of checks pertaining mostly to their data areas, write VDW programs that implement the checks & periodically (generally annually, but sometimes more frequently) make a formal request that implementing sites run the code & submit their results to the program author(s) for collation & reporting out to the VDW Implementation Group.

One big implication of this is that our approach is not nearly as coordinated as an outside user might expect. I'd like to say that we are evolving toward a common approach, and we do have a new(ish) "Cross-File QA" group that's taking on meta-standards for QA work, but there is definitely a long way to go before this is uniform enough to be coherent to anyone not familiar with the history.

QA Is Multi-purpose

We generally try to kill 2 birds with our QA stones. Primarily we want to characterize the quality of our implementations for ourselves, each other, and our user community. But we also love it when our reports are useful to Investigators writing grant applications, who sometimes need to brag about e.g., how many person/years worth of data we have across the network for people with a drug benefit.

This can be a slippery slope, on occasion leading individual sites to declare that a given measure has strayed from QA (which is generally exempt from IRB approval) into substantive research territory, or else exposes what should be proprietary information. One example that comes to mind on Enrollment was a measure of churn--e.g., in a typical month, how many enrollees does a site tend to lose to disenrollment, and how many do they pick up? It's a constant dance/negotiation.

Roy's QA Prejudices

To my way of thinking the best QA:

Enables implementers to find (and fix) their own errors first, before exposing them to any larger audience. This is a matter of professional courtesy.
Includes as many objective checks as are practical to implement in code, and presents the running user with:

A clear list of what the checks are
What the tolerance is for those checks (e.g., 3% of your patient language records can have nonstandard values, but any more than 5% and we're going to say you failed the check).
Whether the file passed or failed each check.

More general descriptives characterizing the amount and predominant values in the data. These are often most useful when viewed as part of collated output so you can compare sites.
Collated quality/descriptive reports

should be readily available to the user community (we have them up on the HCSRN web portal, behind the password-protected area).
should be easily updated (completely automatically if possible) so that implementers are incentivized to fix whatever issues they can as soon as possible (and get credit for doing so).

Following the lead of the Utilization (encounters) workgroup, we generally refer to the objective checks as "Tier 1" checks and the descriptives as "Tier 2". Like most things, the checks are a matter of negotiation within the workgroup. I've come to think of them as crucial adjuncts to the specs themselves because they sometimes reveal reasonable disagreements on how to interpret the specs.

The Checks

Demographics

Tier 1

All variables listed in the spec exist and are the proper type (character or numeric). I don't personally like to check length and so don't, though there is diversity of opinion on that & so some QA programs do. There is no tolerance on these checks--any missing variable is a fail.
For those variables that have an enumerated domain of permissible values (pretty much everything but MRN and birth_date) that those are the only values found. If > 2% of the values found are off-spec we issue a warning. At 5% or greater we fail the check.
MRN is unique. Zero tolerance here.

Tier 2

Counts of records (not just current enrollees) by

Gender
Race
Ethnicity
Age group
Need for an interpreter

Counts of enrollees over time by those same variables.

Enrollment

Tier 1

All variables listed in the spec exist and are the proper type (character or numeric). Here again zero tolerance.
For those variables that have an enumerated domain of permissible values (all but MRN, the start/stop dates, PCP and PCC) that those are the only values found. Same tolerance here as with demog-- > 2% is a warn, > 5% a fail.
Enr_end must fall after enr_start. Zero tolerance.
Enr_end must not be in the future. Warn at 1%, fail at 3%.
At least one of the plan type flags (which say whether the person/period was enrolled in a PPO, HMO, etc.) must be set. Warn at 2% and fail at 4%.
Ditto for the insurance type flags (e.g., commercial, medicare/caid, etc.).
If any of the Medicare part insurance flags are set, the general Medicare insurance flag must be set. Zero tolerance.
No period prior to 2006 should have the Medicare Part D flag set. 1% warn, 2% fail.
If the Part D flag is set, the drugcov flag must also be set.
If the high-deductible health plan flag is set, either the commercial or the private-pay flag must also be set.
If any of the incomplete_* variable values is 'X' (not implemented) then all values must be 'X'.

(That last check refers to six variables too new to be listed in the publicly available specs. They let implementers surface known problems with data capture, if there are any (for example at Group Health we only have tumor registry information on people who live in one of the seventeen WA State SEER counties).

Tier 2

Counts and percents of enrollees over time by all substantive enrollment variables--plan types, insurance types, drug coverage, etc.
Counts & percents over time by several demographic variables (listed above under demographics).
Counts & percents of enrollees over time by whether the values in primary care clinic (PCC) and primary care physician (PCP) appear to be valid (that is, contain at least one non-space and non-zero character).

More Later

As I'm the primary author of the E/D checks, these were the easiest to hand. I'll be back with at least some of the other checks the other groups have implemented.