>> Adam Wilcox: Hello. My name is Adam Wilcox. I'm going to be talking today about data entry and storage and computing systems. In this lecture I'll be talking about data entry and storage. I'll first give some background about why some of the considerations I'll be discussing are important and then I'm going to go through some examples of data storage representations to compare and contrast them. So in order to understand some of the issues around data entry and data storage it's important first to understand kind of the paradigm shift that happens as we move from paper to electronic data. When we think about paper record it has certain advantages in that it's familiar, it's easy to read, it's easy to hold and it's accessible. But when we really think about it it also has some disadvantages as well. First it gets lost a lot. It can only be stored as paper in one place and we have to go where it is to see it. Unless we're actually looking at it it's really very passive. There's no information that transmits from it out unless you're reading it. One of the other disadvantages that -- is that it's so easy to produce paper that we produce much more than we ever use. There have been studies that have looked on average how many pieces of paper documentation are created. So for example you have an observer who will go into a clinical setting and watch in a -- in that setting how many pieces of paper are produced versus how man pieces of paper are accessed from previous visits. And these have shown that in some of these clinical offices the clinical process produces up to 20 pieces of paper for every one that is accessed from these previous visits. Now, that's a lot. That means 20 pieces of paper are produced just for one -- and just one of them is going to be accessed later. But this isn't just because of the paper itself. In other studies with computers where computers were storing information passively it was shown to be accessed of about the same rate. So in 2007 George Rucksack and others did a study where they looked at an emergency department and an electronic system that showed information available to the emergency doctors. And they found out this was all information from previous visits and they looked at what percent of this available data was even reviewed during the clinical care episode. They found that it was only accessed about 5 to 30 percent of the time or just that 5 to 30 percent of the information was accessed. And that's when all it would take is just clicking a mouse to check that data. As we move to electronic systems there are a lot of different advantages that are claimed and the wake of the systems are used as related to how these advantages are seen. So some advantages are that they decrease the cost of storage since the storage cost for paper can be really high. And when you have whole rooms just devoted to medical records that keep growing because the records can't just get thrown away after a couple years for financial or tax reports. It can also be easier to manage electronic records than it is to manage paper records. And then sometimes you can apply computing calculations to make the electronic documentation smarter. Such as for billing or information retrieval or decision support. So why is this discussion important in a lecture about data entry and storage? Because this paradigm shift is related to these data entry and storage issues. It's sometimes easy to think about all we're really doing is making a paper system electronic so it's just easier to store and manage the data. But these paper systems are optimized for putting data in. Now that makes sense because as we've shown a lot more data is put in than taken out, but in an electronic system if it really uses the benefits of information retrieval and decision support it needs to be optimized for pulling the data out. Because the data can be used many times. In fact researchers that have studied this have modeled how often a piece of data stored would be retrieved for secondary use for decision support or information retrieval in an electronic system. And they found that information would be retrieved 30 times over the lifetime of that piece of data. So in other words, 30 times for every -- once it was restored. So think about that. On one hand we have paper which was 20 to 1 for storage and retrieval and on the other side we have electronic health records which it's 1 to 30 for storage and retrieval. That's an amazing difference. More than two orders of magnitude. This is important to be thinking about through this lecture because I'm going to be talking about how data can be collected and stored in different ways that effect storage and retrieval efficiency. And it's important for you to know what the effects of different approaches to what would seem are really arbitrary decisions about how we store and how we access the data. At this point an important term for you to know regarding clinical data storage is a repository. You know, a repository often means the same thing as a database, but really it's a place where all of the information is gathered. Steve Johnson in 1996 described in some detail and he said a repository has to meet the following two criteria, it has to establish clear understanding of data that are relevant to the health care process and store it together. And it's not just storing it, it actually has to represent the data effectively. And in a repository you want to implement a database that performs efficiently for the patient care tasks. So it's representing and then storing and being able to retrieve efficiently. So often as you'll see, you know, repository is data come from multiple sources that are gathered together in one place to be useful for clinical care. Now, in this lecture I'll often be using the terms database and repository somewhat interchangeably because I'm talking specifically about a clinical database that's being used a repository. But it's important for you to understand when people talk about a repository usually they're talking about a clinical database where you gather data from multiple different sources. Now, an important part of repository is how the data are actually represented. Usually they're represented in a relational database that has different relational structures or tables that represent different clinical concepts. And there's a couple of ways to do it. One way is that you have -- it's fully relational where you have lots of tables that represent the different clinical concepts. And the meaning is in both in the column and the actual table. So you could have a table specifically for patient visits and that table would have different columns that were related to different visits -- that were related to visits like column for the patient, a column for the type of visit it was or a column for the clinician and a column for different observations that would be relevant to the visit. Another example is a radiology exam where you could have columns again about the patient and the radiology tech or you could also have the location of where the exam was taken and the type of exam. Even columns for the specific findings. I've had some experience in looking at the back end of some of these electronic health records and many do follow this relational design. For example the GE logician system has data stored directly as observations that are documented from data entry forms. GE logician is the small office electronic health record from GE. Now at New York Presbyterian we use the eclipsis system and we've looked at the back end there and the tables and there's over 1,400 tables. Now, not all of these tables represent patient information. Some of them represent codes that are used in the storing of the data. So codes such as for different locations that can be added in so they wouldn't all represent individual patient data, however, when you've got 1,400 tables a lot of -- ^M00:08:05 [ Silence ] ^M00:08:12 >> Adam Wilcox: Now, here's a -- when I talked about data entry forms I just want to give an example. Here's a screen shock from an EHR data entry form. So along the top you got the different tabs that represent the different components of the visit now such as the history, the vital signs, the physical exam and the impression. Now, looking just at the vital signs tab we see the concepts of height and weight, temperature, vision, smoking status, et cetera and these are all concepts that would be important to collect at a visit. There's also little information to the side just that's helpful about, you know, what the appropriate values or expected values might be for different areas of the form, but for the most part it's just for storing information. Now there are a lot of concepts here as the next slide here shows where we've got a list of all the concepts that would be part of this -- all of the concepts or observations that would be part of this well adolescent visit form. I'm not going to go through them all individually, but as you look at this list you might think about how this data should be stored in a database. One way is to just store it as a single table with a different column for each concept and a row that represents an instance of that form with the actual values then being elements in this form. Now that sounds pretty simple and it would be a good way to start, but then you might ask what happens if someone wants to change that form in some way. What happens if someone wants to add a new vital sign such as a screening for ADHD or for depression to that form. Then you have to change that big table and move everything over to add another column. Then -- so to make that easier or to fix that problem another approach may be to break it up into a lot of smaller tables according to the natural concepts grouping -- concept groupings such as the tabs that we saw for history or vital signs, physical exam and impression. So you could create a separate table, one for history, one for vital signs, one for physical exam and then one for impression. But then this can have its own disadvantage because you start to gather too many tables. Now, these tables that you'd created wouldn't just be for each data entry form in the electronic health records, but they'd be used for other data -- other methods of data collection in the EHR. For example there's the data from all the clinical entry forms for the physicians, for the nurses, for the other clinical staff and these forms actually may be different depending on who's entering them. And then there's also the data that could be coming from ancillary systems like the laboratory and pharmacy systems. And finally there could be data from different devices that are loaded into the repository. All these different sources can influence the data representation of the tables that are in this repository. So let me go through an example, here we have an example of a clinical scenario that shows the different types of data that can be stored on a patient. So just for background this is from the 2007 certification commission for health information technology which is an organization that builds test scripts or criteria for EHR's. So just kind of an example of what the EHR should be able to store and an example that was used to test whether EHR's could be effective. The exact example is in the small text and in the table. So the text says involves an inpatient admission for Elizabeth Benson a six month old who weighs 13 pounds and 14 ounces and her height is 26 inches. She was directly admitted by her pediatrician Dr. John Miller to the inpatient pediatric unit with a diagnosis of pneumonia and possible sepsis. And then in the table we have a bunch of laboratory and radiology test results such as the blood tests for white blood count differential bounds and then radiology chest x-ray report. Now, that's a pretty brief description, but there's a lot of data elements there. And when we try to represent that data using the relational tables we can see how really complicated it can get. So here I've taken all of that information and tried to store it in eight tables that describe that. You have the table for the patient that describes Elizabeth Benson and her age, you have the table that describes the inpatient admission with the admitting physician being where she was admitted. The initial -- the admitting diagnosis of possible sepsis and then various vital sign measurements that were included. And then for the lab test I've broken those up into different batteries or groups of tests. The white blood cell count and the differential [inaudible] and other things would be part of what's called an ABC with differential. And then the potassium was part of the BMP or basil metabolic panel. So in this example, you know, we've got these eight tables, but I actually was trying to minimize the number of tables by making the columns really long so for example in the diagnosis rather than having one row for each diagnosis I've got diagnosis one, diagnosis two and then the vital signs rather than having one for each -- one row for each vital sign I've tried to put it all into one table like height and weight together. But you still even just doing that get eight different tables. So -- and when you look at the lab tests I made a different table for each kind of lab battery or group of tests and you imagine how that would expand as I add all the other possible batteries or groups of lab tests that would be done. Here we see actually a more real world example with some samples from the eclipsis tables. Now real world in the sense that I -- that it shows that how much the columns can actually expand. So the list of terms that are represents all of the columns that would be in a table. In my fake example I use dot, dot, dot so I didn't have to show how many columns the tables would really have, but you can easily end up with a lot of columns when you get down to the information that needs to be stored on the data. And for these two tables they have about forty columns each. And those are just two tables. Here's a part of the database schema for eclipsis and the tables are really multiplying. And as I mentioned before there's over 1,400 tables in an eclipsis implementation. So how does a system end up with so many tables and then with so many columns in the table? Well, some of that is that you really aren't trying to limit tables. When you have a model where you're adding tables as you add forms of data entry where you have lots of tables it's really easy to add more. I mean, what's the big deal about 1,401 tables versus just 1,400. As a result these systems are really efficient at building new forms for data entry because all you have to do to create a new form is create your new table. And then it's really easy to just say okay for this form I'm going to store it in these tables. And they can be really efficient that way. When they're efficient in that way they're really just like electronic paper where they optimize around data collection rather than retrieval. So think about it, once you've stored all of these tables in these different places imagine that you want to try and find one concept that may be stored in multiple different forms. Maybe you're storing blood pressure and in some cases that might be taking -- or documented by a physician or in some cases it might be documented by a nurse or even a medical assistant. If each of those individuals has a different form and each of those forms is represented by different tables you're going to have to go through all of these different areas just to retrieve it. You know, this report of these multiple tables and optimize each with storage is really common with systems that are initially intended to replace paper based system. And are -- where the benefit is really just reducing storage space. The problem as I'm alluding to here is getting the data out of the systems. These quires that have to go across these tables. Not only can they become very inefficient, but you also have problems when the terms mean the same thing like the blood pressure example. It's also difficult to gather data from the form when it's represented by multiple tables in some type of summary because it has to be pulled from so many different places. Now, that was a relational based model. There is another model that's called the event based model which does things a little differently. Now, you can see this image that shows the event based model and you've still got a bunch of tables. I've got the patient and -- table, I've got information about alerts and orders. And it looks like I haven't really fixed the problem because before I showed you eight tables and now here I've got at least 12 that are going on. But I'm going to get to the difference. First though before I do that oftentimes when people store an event rather than -- with an event model what makes it different from -- what -- how it gets implemented and how it's a little different from relational model where you just have the different tables representing different chunks is that you have events and components. So you have an event of a lab test or event of a lab battery or a group of tests and then that has different components that can be represented in rows such as the actual values of the different tests. In this case we have an event that's representing an admission and then you have different components which represent data around that admission. A way to diagram this like what was done by then Ginican and Stam [assumed spelling] in a paper in 1995 is they showed the event is some component and then there's this one too many relationships of the different actions that are part of it. And this is kind of the core representation of an event model. So you have a single event and then tied to that would be a whole long list of the components that are represented to that. You know, this is just a different way of storing the data. And you may ask, well, why wouldn't -- why would you go to this way versus the other? Or actually before we get to that, you might ask how does this actually work together? Now, let's go back to that same example of the inpatient admission for the six month old Elizabeth Benson and see what it looks like. Here we saw it before in these relational tables, but in an event model it will look a little different. So we have the patient table describing again we have the visit table and that's similar. And then I've got a bunch of clinical events so I've got a clinical event of the diagnosis, I've got a clinical event of vital signs of event of lab tests and a couple of lab tests actually and a radiology report. And then under each one I've got the components and the components are linked back to the events through the event ID in the clinical component table. Now, this doesn't show actually all of the tables and all of the primary and secondary keys that would be part of it, but conceptually it's really showing how you have these things linked back together. So you've -- I've reduced what was eight tables down to four, but more importantly I've reduced most of the tables just down to these two event and component tables where they can all be stored like this. So I showed you these different tables and then you'd ask, well, okay so how does it really make things different? For one thing, the focus of an event model is different than a relational model in terms of where it's optimized. A relational model maybe optimized for storage, an event model is optimized for retrieval. So with the event model it's actually a little trickier to store the data. You have to keep -- make sure that the events and the components are matched all together and keep those relationships of all the components together. And that sometimes can be a little more complex than it seems, but it's not impossible and it's not that difficult. The benefit of it though is that when you're quiring the data you are able to query just these two tables. You don't have to merge across multiple tables and go found out where all of the data are. That's so much better than the relational structure where it's difficult to query both data and context. Now, quiring the data is gathering multiple elements requiring these multiple tables joins. You know, as I mentioned, it can be very good for data storage and it can also actually be good for getting data across patients. So if you wanted to find -- if you wanted to compare data where it ended on a form across multiple patients that relational model can be very efficient. But that's usually not what people are doing when they're providing clinical care. In the next lecture I'll talk about how that could be generally used in a data warehouse when it's still being done for analysis. But in terms of providing clinical care with decision support and making -- with building support and making the system smarter it's usually done a little differently. In the event model however it's really different quiring across the data types. The data that are organized and use event and component tables are usually physically organized according to patient. So they can be very fast once you've selected a patient and gathering all of the data. It's not so good at quiring across patients. But there is different ways to mitigate that. Now, so this is the last slide of this part of the lecture so I just want to recap kind of what we've gone over. I've talked about the paradigm shift that happens when you move from paper to electronic. The paradigm shift doesn't happen if all you're doing is building a system that reproduces the exact benefits of paper and the exact disadvantages of paper. In this case if you follow just a purely relational model and have a lot of tables that are spread across where the forms are spread across multiple tables. But if you want to make it smarter, if you want to have the systems be really using the data for secondary use to make it smarter and help kind of creating thinking system you're going to be retrieving a lot more than you're going to be storing. And in this case it may be better to have the data model more in an event based model where it's optimized more for retrieval. So paradigm shift you can optimize over storage versus retrieval which one you do will depend on what you -- what your real goals with the system are and what you want to have as the advantages. And not surprisingly I'm going to advocate that because you want the systems to be smarter that we're not just trying to create a version of electronic paper there are some important advantages to the base model.