Use of healthcare data in analytics is growing rapidly. We have developed tools that keep protected health information (PHI) anonymous but still useful by maintaining personhood, even when data comes from multiple sources.


A False Choice

In the last ten years, use of advanced analytics has grown rapidly, and for good reason. The truly exciting thing about healthcare analytics is that it can uncover opportunities to reduce costs while leaving patients healthier and happier.


The HIPAA rules allow healthcare data to be used to gain these insights. Plans effectively have two choices. The first option, called "Safe Harbor," details how data can be redacted and obscured so that, if there were a breach, the data would already have been cleansed to the point where tying it back to any individual would be practically impossible. The bad news is that many of the insights that could be gained through analytics are lost as well.


The second option is simply to use PHI and sign business associate agreements (BAAs) with vendors and anyone else who has access to the information. We think this is a false choice, one that leads to more damaging breaches when they happen and can discourage the search for the most valuable insights. Data can instead be anonymized or de-identified to a degree where each individual remains distinguishable within the data but cannot be traced back to the actual patient.

Anonymous data can lose personhood and therefore be less useful in analytics

The image to the left is an example of a typical "Safe Harbor" scenario. The data has been cleansed and, as far as anyone can tell, the records can never be traced back to the patient who was originally seen. This might be acceptable for broad studies or where only a general view of treatments or diagnoses is needed.


However, if one wanted to study readmission rates, how would one determine that the first "patient" was readmitted once the data has been cleansed? Further, the "Safe Harbor" rules all but eliminate the ability to study demographics such as location or age, or even patient history, in any meaningful way.
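To make the readmission question concrete, here is a minimal sketch of how such a study could work once each record carries a stable pseudonym in place of the patient's identity. The field names, sample values, and the 30-day window are assumptions chosen for illustration only, and full dates are kept purely to keep the example short.

```python
from collections import defaultdict
from datetime import date

# Hypothetical de-identified encounters. "token" is a stable pseudonym,
# not a real patient identifier; all values are made up for illustration.
encounters = [
    {"token": "a91f", "admit": date(2023, 1, 3), "discharge": date(2023, 1, 7)},
    {"token": "c07b", "admit": date(2023, 1, 10), "discharge": date(2023, 1, 12)},
    {"token": "a91f", "admit": date(2023, 1, 20), "discharge": date(2023, 1, 22)},
    {"token": "c07b", "admit": date(2023, 4, 1), "discharge": date(2023, 4, 2)},
]


def readmissions(rows, window_days=30):
    """Return (token, prior discharge, next admit) for stays that begin
    within window_days of the same token's previous discharge."""
    by_patient = defaultdict(list)
    for row in rows:
        by_patient[row["token"]].append(row)

    found = []
    for token, stays in by_patient.items():
        stays.sort(key=lambda r: r["admit"])
        # Compare each stay with the one that follows it for the same token.
        for prev, nxt in zip(stays, stays[1:]):
            if (nxt["admit"] - prev["discharge"]).days <= window_days:
                found.append((token, prev["discharge"], nxt["admit"]))
    return found


for token, discharged, readmitted in readmissions(encounters):
    print(token, discharged, readmitted)
```

Without the shared token, the four rows above would look like four unrelated patients, and the readmission could not be detected at all.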

To study cancer incidence, diabetes management, long-term outcomes, or any number of other examples, analysts need to see treatments and diagnoses over time as they apply to a particular patient. Within the "Safe Harbor" rules, this is basically impossible, so usually PHI is simply used. The good news is that personhood is maintained. The bad news is that every use of PHI, no matter how careful everyone is, represents another opportunity for a breach.


HIPAA and the HITECH Act have substantial teeth to make sure data is kept secure and private. Even so, every few months a story appears in the news about a breach of this data.



A Typical Approach to Anonymizing or De-Identifying Data

[Figure: Data Source 1 (encounters) and Data Source 2 (enrollment), each anonymized separately]

Manual masking, de-identification, or anonymization can be done in a variety of ways, but it is typically done with custom code written specifically for each data source. So if someone wanted to do a detailed study to help predict who is most likely to develop diabetes, encounter data alone would not be enough; enrollment data containing more detailed demographic information would also be required.


In the above example, "personhood" was maintained when bringing over data from Data Source 1 (encounters). It was also maintained when bringing over data from Data Source 2 (enrollment). However, the method for encrypting the data from one source differed from the other, making it impossible to match the demographic data from the enrollment file in Data Source 2 to the anonymized encounter data from Data Source 1.
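The mismatch described above can be reproduced in a few lines. The sketch below uses keyed hashing (HMAC-SHA256 from Python's standard library) as a stand-in for whatever per-source masking code a plan might have written; it is not a description of how any particular product works. The keys, member IDs, and function names are all hypothetical.

```python
import hashlib
import hmac


def pseudonym(member_id: str, key: bytes) -> str:
    """Derive a token from a member ID with a keyed hash.
    IDs are normalized first so formatting differences do not change the token."""
    normalized = member_id.strip().upper()
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def overlap(ids_a, key_a, ids_b, key_b) -> int:
    """Count how many tokens the two tokenized sources have in common."""
    tokens_a = {pseudonym(i, key_a) for i in ids_a}
    tokens_b = {pseudonym(i, key_b) for i in ids_b}
    return len(tokens_a & tokens_b)


# Hypothetical keys: one per custom script, plus one shared across sources.
ENCOUNTER_KEY = b"key-used-by-encounter-script"
ENROLLMENT_KEY = b"key-used-by-enrollment-script"
SHARED_KEY = b"one-key-for-all-sources"

# Made-up member IDs; M1001 appears in both sources with different formatting.
encounter_ids = ["M1001", "M1002"]
enrollment_ids = ["m1001 ", "M1003"]

# Each source masked its own way: the common member can no longer be matched.
mismatched = overlap(encounter_ids, ENCOUNTER_KEY, enrollment_ids, ENROLLMENT_KEY)

# Same method and key for both sources: the common member lines up again.
consistent = overlap(encounter_ids, SHARED_KEY, enrollment_ids, SHARED_KEY)

print(mismatched, consistent)
```

Each source still has internally consistent tokens in the first case, which is why the problem tends to surface only when someone tries to join the two files. How the key is stored and who can access it would matter a great deal in practice and is outside the scope of this sketch.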


Writing and maintaining this code, and performing quality control checks on it, is notoriously difficult in these types of environments. That is yet another reason healthcare professionals default to using PHI for these types of studies, even where there is a willingness to do better. After facing this same problem over the years and across customers, it became apparent that an affordable, standard, supportable, and repeatable approach was needed.


Don't Redact!™

[Figure: Data Source 1 and Data Source 2]

The EDI Project™ has created a product called "Don't Redact!" that anonymizes data automatically. Personhood is easily maintained even when data comes from different data sources. If studies are updated on an ongoing basis with data that not only arrives over time but may also come from varying sources, Don't Redact! is designed to ensure that the data is nearly useless to anyone trying to tie a patient found in the data back to an actual patient in the real world. New data sets are irreversibly anonymized. The product is quick to deploy and works with almost any kind of data input. Site licenses start at $25,000, which makes it hundreds of thousands of dollars less than competitive products. Please contact us to learn more.

The EDI Project™

Reliable Integration

5510 El Arbol Dr.

Carlsbad, CA 92008