Implementing the Virtual Data Warehouse: October 2022

In medical research data harvested from insurance/care data it's often important to know something about the providers that appear in the data. In particular we frequently want to know what the type of provider is--were they a:

Nurse Practitioner
Physician's Assistant
Social Worker
Technician

and so forth. It's also frequently important to know what that person's specialty is--for example:

Transplant surgery
Pediatrics
Oncology

Trying to make this data available for research--particularly in hybrid organizations like HMOs, can be a serious challenge. In general, we'd like to have a lookup table that lists every provider's type and specialty as it existed at the time of an encounter, say. Unfortunately, our sources don't often provide that information, and so we wind up with a pile of types and specialties for each provider that can be confusing to use.

Reasons a single provider might have more than one record in a provider type/specialty table

They are (or were) truly qualified in multiple specialties

simultaneously (say, internal medicine and oncology), or
they re-trained over the course of their career (e.g., started out as a registered nurse, became a nurse-practitioner)

We drew data for the same individual from multiple systems, and the information is inconsistent because:

one or all of the systems are wrong about the type/specialty
one system has a bad identifier & they're not really the same individual

Some of the data is just bad (e.g., indicates additional specialties not actually merited).

Unfortunately our sources (National Provider Identifier Registry, our various Electronic Medical Record systems, our claims processing systems) don't often (ever?) give guidance as to the time progression of various types and specialties. Lacking that information, we are powerless to convey it to users. So what can we do?

Document the scope of the problem

Looking just at our EMR data, a little more than 94% of providers have just a single listed specialty. If that's typical, while probably not negligible, it's not a show-stopper either.

Provide Users All The Data We Have, warn them of the issue, and give whatever guidance we can

So if you happen to know from your analysis that the NPI registry data is particularly dodgy, say, let them know that in cases of ambiguities, that they should prefer records coming from sources other than NPI.

Similarly, it's often useful to look at other contextual clues on particular encounters--what procedures/diagnoses bear that provider's identifier? Social workers don't perform Radial Keratotamy surgery, right--so if that's what looks to be happening, one of those fields is lying.

Department (when it's available and not imputed from provider specialty) can also be an important clue.

For Heavy-Hitters, Analyze and Special-Case Particular Providers' Data

In our provider table I like to include off-spec fields that give the number of encounter & rx fill records that bear that provider identifier. That lets me optimize my bang-for-buck in terms of my precious time inspecting/analyzing/researching seeming ambiguities. Providers that only appear on a handful of records do not merit the same attention that the ones with higher numbers do.

When we do find bad data, I am absolutely not above including lines like:

if provider = 'some_dude' and specialty = 'wrong specialty' then delete ;

in my code.

Maybe Someday--Do something AI-ish to flag inconsistencies or (dare I hope) try to construct time-periods

This should be possible to do in a centrally created Quality Assurance type program, but it would probably be a ton of work and is not feasible at the moment.

Implementing the Virtual Data Warehouse

Thursday, October 27, 2022

Provider specialty data--wrangling and using

Reasons a single provider might have more than one record in a provider type/specialty table

Document the scope of the problem

Provide Users All The Data We Have, warn them of the issue, and give whatever guidance we can

For Heavy-Hitters, Analyze and Special-Case Particular Providers' Data

Maybe Someday--Do something AI-ish to flag inconsistencies or (dare I hope) try to construct time-periods

Followers

Blog Archive