NB: this series is still a work in progress.

This post builds off of our previous discussions on healthcare AI infrastructure. If you are unfamiliar with that infrastructure, it may be helpful to review the posts that cover the general lay of healthcare IT land first.

Overview

In this post we’ll discuss the technical side of a project called the Michigan Critical Care Utilization and Risk Evaluation System (M-CURES). We developed M-CURES as an in-hospital deterioration prediction system for patients admitted to the hospital for acute respiratory failure during the initial onset of the COVID-19 pandemic.

In the early days of the pandemic, everyone was concerned about being able to quickly triage patients between different levels of care. We expected to see a massive influx of patients and wanted to be able to place them in the correct care setting (e.g. home, field hospital, regular hospital, ICU). To meet this anticipated need, the Michigan Medicine leadership asked us to get a predictive model developed and implemented as fast as possible.

To speed things up as much as possible we started talking about implementation extremely early in the process. Within the first week we had settled on doing internal integration with Epic’s tools. Although it was our first time using Epic’s tooling, we felt it would give us the best shot at the fastest integration process. Additionally, to speed up the process, we started doing technical integration in parallel with model development.

The development of the project, as well as its external validation, is discussed in a paper published in The BMJ.

Epic’s Internal Integration Approaches

As mentioned in the development infrastructure post, Epic provides tooling to facilitate the internal technical integration.

At the time we started the M-CURES project Epic offered two options for internal integration:

  • Epic Cognitive Computing Platform (ECCP) and
  • Predictive Model Markup Language (PMML).

Epic’s PMML approach is interesting because you essentially specify the model via configuration (using the PMML standard) and Epic builds a copy of the model based on their implementations of different model architectures. I have not built anything using this approach; however, my research at the time seemed to indicate that it was the more limited option, as only a handful of simple model architectures were supported.

Because of the model architecture limitations of PMML, we decided to go with ECCP for M-CURES. ECCP enables you to host custom Python models using Epic’s model serving infrastructure. This model serving infrastructure is essentially a sandboxed Python AI environment hosted on Microsoft Azure. At a high level, data are passed from Chronicles to this special Azure instance, the model produces predictions, and those predictions are passed back to Chronicles. ECCP takes care of the data transitions, so AI developers primarily need to worry about their AI code.

Overview of ECCP

Figure: Epic’s ECCP implementation architecture. Data flow out of Chronicles via a Reporting Workbench report to the model hosted in the sandboxed Azure environment; the model generates predictions, which are passed back into Chronicles.

This infrastructure tightly integrates Epic’s various systems so that data can flow from Chronicles to the model and then on to the end user in a fairly seamless way.

Model input data is passed out of Chronicles using Reporting Workbench. Reporting Workbench is designed for different types of EMR reporting. Analysts can configure special versions of these reports to pull patient data that can be fed to AI models. Data are in a tabular structure1, where rows represent patients or encounters, and columns represent attributes like age, current heart rate, etc.
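
To make that shape concrete, here is a minimal sketch (using pandas, with hypothetical column names rather than Epic’s actual report columns) of what such an extract looks like once parsed:

```python
import pandas as pd

# Hypothetical parsed Reporting Workbench extract: one row per encounter,
# one column per pre-aggregated attribute. Column names are illustrative.
report = pd.DataFrame(
    {
        "encounter_id": [1001, 1002, 1003],
        "age": [67, 54, 71],
        "last_heart_rate": [88, 112, 95],   # most recent vital sign value
        "last_resp_rate": [18, 26, 22],
        "on_supplemental_o2": [0, 1, 1],    # binary flag
    }
).set_index("encounter_id")

print(report)
```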

These data are then passed in a secure manner to the model, which is running on the special Azure instance. The model developer can then include a variety of code and outside data to produce model outputs and meta-data (like explainability scores).

This information is then passed back to Chronicles and ends up in a special part of the database that’s designed to store predictive model results.2

Once the data are back in Chronicles you can serve them to users in many different ways. For example, you could use them to fire best practice alerts, or display them as an additional column in a patient list so that patients can be stratified by risk score. This is all fairly easy to do because you’ve already been working with your analysts to get the data directly into Chronicles, and they can then work with their colleagues to set up the best practice alert or column display.

The entire flow of data from Chronicles to ECCP and back to Chronicles is tightly integrated and controlled, so you have pretty good guarantees about safety and reliability.

Chronicles Not Clarity

What’s crucial about the workflow depicted above is that there’s a data report that comes directly out of Chronicles (not Clarity) that you use as a part of this packaging workflow. This report is often a small extract representing the current patients in the health system. Thus, despite being small, it is a very good representation of what the data will look like prospectively, as it is generated by the prospective infrastructure.

Sandboxing

ECCP provides a Python environment with a load of standard Python AI/ML libraries (Numpy, Pandas, SKLearn, etc.). They also provide a couple of custom Python functions that help you interact with the data interfaces:

  • Receiving inputs: They provide function calls to receive input data exported from Chronicles and parse it into a tabular structure.
  • Returning outputs: After you have model predictions you can use their function calls to package up results and send them back to Chronicles.

These functions bookend your model code and help developers automate the data flow.
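
Conceptually, the calls wrap around the model code roughly like the sketch below. The functions get_input_data and return_predictions are hypothetical stand-ins for Epic’s actual helpers, whose real names and signatures differ.

```python
import pandas as pd

def get_input_data() -> pd.DataFrame:
    """Stand-in for the Epic-provided call that receives the Reporting
    Workbench export and parses it into a tabular structure."""
    raise NotImplementedError  # supplied by the ECCP environment

def return_predictions(results: pd.DataFrame) -> None:
    """Stand-in for the Epic-provided call that packages results and sends
    them back to Chronicles."""
    raise NotImplementedError  # supplied by the ECCP environment

def run(model) -> None:
    features = get_input_data()                   # 1. receive inputs
    scores = model.predict_proba(features)[:, 1]  # 2. generate predictions
    results = pd.DataFrame({"risk_score": scores}, index=features.index)
    return_predictions(results)                   # 3. return outputs
```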

Although the ECCP environment is sandboxed, developers are not overly constrained in terms of the code they can include, as they can bundle additional Python and data files in the package. Additionally, developers can call external APIs from within this environment (if they are whitelisted by the health system’s IT teams). This means that you could include information from other sources or do data processing via another service, as sketched below. Thus, it’s fairly easy to take an existing Python model and convert it for use with ECCP.
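
For example, packaged code might read a bundled lookup file or call out to another service. The file name, URL, and whitelisting details in this sketch are hypothetical:

```python
import json
from pathlib import Path

import requests

# Data files shipped inside the model package can be read with ordinary
# Python; the path here is illustrative.
PACKAGE_DIR = Path(__file__).parent
med_groupings = json.loads((PACKAGE_DIR / "med_groupings.json").read_text())

def fetch_external_score(patient_id: str) -> float:
    """Call another service during scoring. This only works if the health
    system's IT team has whitelisted the endpoint; the URL is made up."""
    resp = requests.get(f"https://scores.example.org/api/{patient_id}", timeout=5)
    resp.raise_for_status()
    return resp.json()["score"]
```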

Model Development

We will now discuss how we developed M-CURES using ECCP. You can find a lot of the model development and validation details in our BMJ paper. The short version is that model development primarily used Michigan Medicine’s research infrastructure. Although we got the majority of training and internal validation data from Michigan Medicine’s Research Data Warehouse (RDW), Epic’s implementation infrastructure ended up reshaping our approach to model development.

Figure: Architecture for developing models capable of running on ECCP. A key part of model development and implementation using ECCP is setting up a Reporting Workbench report. This report can be used for model development, and it should definitely be used for validation and packaging.

Reporting Workbench Report

Differences in data pipelines led to a shift in how we built the model. The research data pipeline we were familiar with for model development gave us a lot of control in terms of pulling a wide array of features per patient. However, this control came at the cost of accessing very low level data, which meant that we had to put a good deal of development effort into getting the data in the right representational state. For example, we could easily pull all of the meds and vitals for a given patient encounter. But then it was up to us to figure out the right way to filter and aggregate these data prior to feeding it into the model.
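
As a small illustration of that preprocessing burden, here is the kind of filtering and aggregation we had to write ourselves on the research side, sketched with made-up long-format vitals data:

```python
import pandas as pd

# Made-up long-format vitals extract: one row per measurement.
vitals = pd.DataFrame(
    {
        "encounter_id": [1001, 1001, 1001, 1002, 1002],
        "vital_name": ["heart_rate", "heart_rate", "resp_rate",
                       "heart_rate", "resp_rate"],
        "value": [84, 91, 17, 110, 25],
        "recorded_at": pd.to_datetime(
            ["2020-04-01 08:00", "2020-04-01 12:00", "2020-04-01 12:05",
             "2020-04-01 09:30", "2020-04-01 09:35"]
        ),
    }
)

# Keep only the most recent value of each vital per encounter, then pivot
# to the one-row-per-encounter shape the model expects.
last_vitals = (
    vitals.sort_values("recorded_at")
    .groupby(["encounter_id", "vital_name"])
    .tail(1)
    .pivot(index="encounter_id", columns="vital_name", values="value")
)
print(last_vitals)
```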

The reporting infrastructure that Epic uses for ECCP can be thought of as “higher level,” where the choice-versus-preparation trade-off is flipped. The choice of data available in Reporting Workbench reports is slightly restricted. This limitation is counterbalanced by the benefit of that data being automatically filtered and aggregated. So we could easily specify that we would like the last set of vitals, or whether the patient had received a beta-blocker medication. What’s great is that these data elements are shared across the health system’s Epic reporting infrastructure, so analysts only need to build a column/feature once.

On the whole, this is a great benefit. However, it does limit the choices available to developers. Initially, we chafed at this a little. But this was due to the fact that we were so used to “rolling our own”. Having standard data components that can be re-used and are maintained by the health system is absolutely the way of the future. We just weren’t used to it.

For the M-CURES project we were assigned a small amount of analyst time to help build out the Reporting Workbench report that we would be using. Because this time was so limited, we chose to include only a very small set of features in the model. We ran a bunch of experiments with the training data (from RDW) to do feature selection, and routinely checked with our analyst colleague to ensure that we could get the selected features in the report. Through this iterative process we ended up with a logistic regression model that we wanted to use.
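
The tail end of that process looked roughly like the sketch below: fit a logistic regression on a handful of features that the analyst confirmed could be built as report columns, then save the weights for packaging. The feature names, file paths, and outcome column are illustrative, not the actual M-CURES variables.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical final feature set; each feature corresponds to a Reporting
# Workbench column our analyst colleague confirmed was buildable.
FEATURES = ["age", "last_heart_rate", "last_resp_rate", "last_spo2",
            "on_supplemental_o2"]

train = pd.read_csv("rdw_training_extract.csv")  # RDW extract (path illustrative)
model = LogisticRegression(max_iter=1000)
model.fit(train[FEATURES], train["deteriorated_within_24h"])

# Persist the weights so they can be shipped alongside the scoring code.
weights = pd.Series(model.coef_.ravel(), index=FEATURES)
weights["intercept"] = model.intercept_[0]
weights.to_csv("mcures_weights.csv")
```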

Epic Model Development Environment

At this point we had model weights and Python code. In order to actually run the model in ECCP you need to package these into a format that the sandboxed Azure instance can handle. This is where Epic’s model development environment, known as Slate, comes in.

The Slate tooling enables model developers to test and package their Python code. It’s an Epic-developed Docker container that replicates the hosting environment available on Azure. This environment has a battery of Python libraries that are commonly used for AI, like Numpy, Pandas, and SKLearn. Additionally, it has custom Epic functions that enable you to test the model and also package it.

After setting up Slate on our development servers we ported our logistic regression model into it. Alongside the code we also brought in an example report produced by our analyst. This enabled us to use Epic’s tools to do test runs, where the model was passed this example report data, representative of what it would see in production. These testing tools enabled us to better understand the way that ECCP worked and to debug our model and preprocessing code. I’ll describe one of the most useful tests we conducted in the technical integration testing section below.
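
One simple check of this kind, sketched below under the assumption that the example report is available as a CSV (file names and columns are illustrative), is to score the example report “by hand” outside the package and compare against what the packaged code produces inside Slate:

```python
import numpy as np
import pandas as pd

FEATURES = ["age", "last_heart_rate", "last_resp_rate", "last_spo2",
            "on_supplemental_o2"]

report = pd.read_csv("example_reporting_workbench_extract.csv")
weights = pd.read_csv("mcures_weights.csv", index_col=0).squeeze("columns")

def score_manually(df: pd.DataFrame) -> pd.Series:
    """Plain logistic-regression scoring, independent of the packaged code."""
    logits = df[FEATURES] @ weights[FEATURES] + weights["intercept"]
    return 1.0 / (1.0 + np.exp(-logits))

manual_scores = score_manually(report)

# The scores produced by running the same report through the packaged entry
# point inside the Slate container should agree with manual_scores up to
# floating point noise; a mismatch usually points to a preprocessing
# discrepancy between development and packaging.
```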

Once we were happy with the way the model worked in the Slate testing environment we used Epic’s tools to package the model and all its associated code.

Epic Implementation Environment

We then shared the packaged model with our Epic analyst colleague.

In addition to the model package itself, some configuration is needed to run the model in real time. We need a couple of items:

  • input data report, and
  • model run information.

We need to explicitly connect the Reporting Workbench report discussed above to our configuration. Additionally, we need to instantiate the logic that controls how frequently the model runs. For this, one creates a special Epic batch job that runs at a specified frequency. This job runs the Reporting Workbench report and passes the data to the model process, which then calculates predictions.

The predictions computed by the model are then passed back to Chronicles.

Despite a couple of technical limitations, the entire flow of data from Chronicles to ECCP and back to Chronicles is tightly controlled, so you have pretty good guarantees about safety and reliability.

Technical integration testing

- slate vs. manually running the model
- production debugging

Limitations

One major limitation of this integration approach is that a significant amount of the model run configuration is controlled by health system analysts as opposed to model developers. This is fine if there is really good communication between the two parties, but there’s often a big disconnect, because analysts sit in a somewhat siloed place inside health system IT, and developers tend to sit outside the direct health IT structure. Usually this devolves into a big game of telephone between parties that don’t normally talk to one another or have established relationships. So, as always, we need to work on this part of our sociotechnical system.

This decision to do technical integration simultaneously with model development turned out to be fairly important. The learnings from technical integration directly impacted our choices for model development. For example, we realized that building the Reporting Workbench report was a relatively laborious process. Each column in the report took a good amount of time to build and validate, and each column corresponded to a variable (also known as a feature) that the model took as input. So the integration effort scaled linearly with the number of features we wanted to include in the model.

During early parts of development we were exploring models with thousands of features, as we had access to those features from RDW and had code to easily manage them. However, once we learned more about the integration effort we decided to cap the number of features at a fairly small number (around 10). We felt comfortable with this decision because we felt we had hit a good balance between performance and implementation time. Our early experiments indicated that we wouldn’t lose a ton of performance going from thousands of features to ten (something on the order of a less than 10% relative decrease in AUROC), and we were fairly sure that we could implement and test the report with the allocated Epic analyst build time.
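
The flavor of those experiments can be sketched with synthetic data: sweep the number of retained features and compare cross-validated AUROC. The data and numbers below are synthetic stand-ins, not the M-CURES results.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the RDW training extract.
X, y = make_classification(n_samples=2000, n_features=200, n_informative=15,
                           random_state=0)

# Compare cross-validated AUROC as the feature budget shrinks.
for k in (200, 50, 10):
    model = make_pipeline(SelectKBest(f_classif, k=k),
                          LogisticRegression(max_iter=1000))
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{k:>3} features: mean AUROC = {auc:.3f}")
```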

Cheers,
Erkin
Go ÖN Home

Generally, for this setup you need a model package and some additional configuration. The model package contains your model and the code necessary to run it on the hosting service, and the additional configuration determines the data that gets passed to the model.

  1. This is the place where the non-tabular structure of healthcare data can cause issues for healthcare AI novices. Because the underlying data in Chronicles aren’t actually tabular, and the best representation of longitudinal health data is often not tabular either, there’s a lot of engineering that needs to be done in order to get a good representation of the patients. 

  2. The kinds of information that you can pass back are a little limited, because the database only expects certain types of information (e.g., integer or float).