>>  Adam Wilcox:  Hello.  My name is Adam Wilcox.  I'm going

               to be talking today about data entry and storage and

               computing systems.  In this lecture I'll be talking about

               data entry and storage.  I'll first give some background

               about why some of the considerations I'll be discussing are

               important and then I'm going to go through some examples of

               data storage representations to compare and contrast them.

               So in order to understand some of the issues around data

               entry and data storage it's important first to understand

               kind of the paradigm shift that happens as we move from

               paper to electronic data.  When we think about paper record

               it has certain advantages in that it's familiar, it's easy

               to read, it's easy to hold and it's accessible.  But when we

               really think about it it also has some disadvantages as

               well.  First it gets lost a lot.  It can only be stored as

               paper in one place and we have to go where it is to see it.

               Unless we're actually looking at it it's really very

               passive.  There's no information that transmits from it out

               unless you're reading it.  One of the other disadvantages

               that -- is that it's so easy to produce paper that we

               produce much more than we ever use.  There have been studies

               that have looked on average how many pieces of paper

               documentation are created.  So for example you have an

               observer who will go into a clinical setting and watch in a

               -- in that setting how many pieces of paper are produced





               versus how man pieces of paper are accessed from previous

               visits.  And these have shown that in some of these clinical

               offices the clinical process produces up to 20 pieces of

               paper for every one that is accessed from these previous

               visits.  Now, that's a lot.  That means 20 pieces of paper

               are produced just for one -- and just one of them is going

               to be accessed later.  But this isn't just because of the

               paper itself.  In other studies with computers where

               computers were storing information passively it was shown to

               be accessed of about the same rate.  So in 2007 George

               Rucksack and others did a study where they looked at an

               emergency department and an electronic system that showed

               information available to the emergency doctors.  And they

               found out this was all information from previous visits and

               they looked at what percent of this available data was even

               reviewed during the clinical care episode.  They found that

               it was only accessed about 5 to 30 percent of the time or

               just that 5 to 30 percent of the information was accessed.

               And that's when all it would take is just clicking a mouse

               to check that data.  As we move to electronic systems there

               are a lot of different advantages that are claimed and the

               wake of the systems are used as related to how these

               advantages are seen.  So some advantages are that they

               decrease the cost of storage since the storage cost for

               paper can be really high.  And when you have whole rooms





               just devoted to medical records that keep growing because

               the records can't just get thrown away after a couple years

               for financial or tax reports.  It can also be easier to

               manage electronic records than it is to manage paper

               records.  And then sometimes you can apply computing

               calculations to make the electronic documentation smarter.

               Such as for billing or information retrieval or decision

               support.  So why is this discussion important in a lecture

               about data entry and storage?  Because this paradigm shift

               is related to these data entry and storage issues.  It's

               sometimes easy to think about all we're really doing is

               making a paper system electronic so it's just easier to

               store and manage the data.  But these paper systems are

               optimized for putting data in.  Now that makes sense because

               as we've shown a lot more data is put in than taken out, but

               in an electronic system if it really uses the benefits of

               information retrieval and decision support it needs to be

               optimized for pulling the data out.  Because the data can be

               used many times.  In fact researchers that have studied this

               have modeled how often a piece of data stored would be

               retrieved for secondary use for decision support or

               information retrieval in an electronic system.  And they

               found that information would be retrieved 30 times over the

               lifetime of that piece of data.  So in other words, 30 times

               for every -- once it was restored.  So think about that.  On





               one hand we have paper which was 20 to 1 for storage and

               retrieval and on the other side we have electronic health

               records which it's 1 to 30 for storage and retrieval.

               That's an amazing difference.  More than two orders of

               magnitude.  This is important to be thinking about through

               this lecture because I'm going to be talking about how data

               can be collected and stored in different ways that effect

               storage and retrieval efficiency.  And it's important for

               you to know what the effects of different approaches to what

               would seem are really arbitrary decisions about how we store

               and how we access the data.  At this point an important term

               for you to know regarding clinical data storage is a

               repository.  You know, a repository often means the same

               thing as a database, but really it's a place where all of

               the information is gathered.  Steve Johnson in 1996

               described in some detail and he said a repository has to

               meet the following two criteria, it has to establish clear

               understanding of data that are relevant to the health care

               process and store it together.  And it's not just storing

               it, it actually has to represent the data effectively.  And

               in a repository you want to implement a database that

               performs efficiently for the patient care tasks.  So it's

               representing and then storing and being able to retrieve

               efficiently.  So often as you'll see, you know, repository

               is data come from multiple sources that are gathered





               together in one place to be useful for clinical care.  Now,

               in this lecture I'll often be using the terms database and

               repository somewhat interchangeably because I'm talking

               specifically about a clinical database that's being used a

               repository.  But it's important for you to understand when

               people talk about a repository usually they're talking about

               a clinical database where you gather data from multiple

               different sources.  Now, an important part of repository is

               how the data are actually represented.  Usually they're

               represented in a relational database that has different

               relational structures or tables that represent different

               clinical concepts.  And there's a couple of ways to do it.

               One way is that you have -- it's fully relational where you

               have lots of tables that represent the different clinical

               concepts.  And the meaning is in both in the column and the

               actual table.  So you could have a table specifically for

               patient visits and that table would have different columns

               that were related to different visits -- that were related

               to visits like column for the patient, a column for the type

               of visit it was or a column for the clinician and a column

               for different observations that would be relevant to the

               visit.  Another example is a radiology exam where you could

               have columns again about the patient and the radiology tech

               or you could also have the location of where the exam was

               taken and the type of exam.  Even columns for the specific





               findings.  I've had some experience in looking at the back

               end of some of these electronic health records and many do

               follow this relational design.  For example the GE logician

               system has data stored directly as observations that are

               documented from data entry forms.  GE logician is the small

               office electronic health record from GE.  Now at New York

               Presbyterian we use the eclipsis system and we've looked at

               the back end there and the tables and there's over 1,400

               tables.  Now, not all of these tables represent patient

               information.  Some of them represent codes that are used in

               the storing of the data.  So codes such as for different

               locations that can be added in so they wouldn't all

               represent individual patient data, however, when you've got

               1,400 tables a lot of --

               ^M00:08:05

               [ Silence ]

               ^M00:08:12

               >>  Adam Wilcox:  Now, here's a -- when I talked about data

               entry forms I just want to give an example.  Here's a screen

               shock from an EHR data entry form.  So along the top you got

               the different tabs that represent the different components

               of the visit now such as the history, the vital signs, the

               physical exam and the impression.  Now, looking just at the

               vital signs tab we see the concepts of height and weight,

               temperature, vision, smoking status, et cetera and these are





               all concepts that would be important to collect at a visit.

               There's also little information to the side just that's

               helpful about, you know, what the appropriate values or

               expected values might be for different areas of the form,

               but for the most part it's just for storing information.

               Now there are a lot of concepts here as the next slide here

               shows where we've got a list of all the concepts that would

               be part of this -- all of the concepts or observations that

               would be part of this well adolescent visit form.  I'm not

               going to go through them all individually, but as you look

               at this list you might think about how this data should be

               stored in a database.  One way is to just store it as a

               single table with a different column for each concept and a

               row that represents an instance of that form with the actual

               values then being elements in this form.  Now that sounds

               pretty simple and it would be a good way to start, but then

               you might ask what happens if someone wants to change that

               form in some way.  What happens if someone wants to add a

               new vital sign such as a screening for ADHD or for

               depression to that form.  Then you have to change that big

               table and move everything over to add another column.  Then

               -- so to make that easier or to fix that problem another

               approach may be to break it up into a lot of smaller tables

               according to the natural concepts grouping -- concept

               groupings such as the tabs that we saw for history or vital





               signs, physical exam and impression.  So you could create a

               separate table, one for history, one for vital signs, one

               for physical exam and then one for impression.  But then

               this can have its own disadvantage because you start to

               gather too many tables.  Now, these tables that you'd

               created wouldn't just be for each data entry form in the

               electronic health records, but they'd be used for other data

               -- other methods of data collection in the EHR.  For example

               there's the data from all the clinical entry forms for the

               physicians, for the nurses, for the other clinical staff and

               these forms actually may be different depending on who's

               entering them.  And then there's also the data that could be

               coming from ancillary systems like the laboratory and

               pharmacy systems.  And finally there could be data from

               different devices that are loaded into the repository.  All

               these different sources can influence the data

               representation of the tables that are in this repository.

               So let me go through an example, here we have an example of

               a clinical scenario that shows the different types of data

               that can be stored on a patient.  So just for background

               this is from the 2007 certification commission for health

               information technology which is an organization that builds

               test scripts or criteria for EHR's.  So just kind of an

               example of what the EHR should be able to store and an

               example that was used to test whether EHR's could be





               effective.  The exact example is in the small text and in

               the table.  So the text says involves an inpatient admission

               for Elizabeth Benson a six month old who weighs 13 pounds

               and 14 ounces and her height is 26 inches.  She was directly

               admitted by her pediatrician Dr. John Miller to the

               inpatient pediatric unit with a diagnosis of pneumonia and

               possible sepsis.  And then in the table we have a bunch of

               laboratory and radiology test results such as the blood

               tests for white blood count differential bounds and then

               radiology chest x-ray report.  Now, that's a pretty brief

               description, but there's a lot of data elements there.  And

               when we try to represent that data using the relational

               tables we can see how really complicated it can get.  So

               here I've taken all of that information and tried to store

               it in eight tables that describe that.  You have the table

               for the patient that describes Elizabeth Benson and her age,

               you have the table that describes the inpatient admission

               with the admitting physician being where she was admitted.

               The initial -- the admitting diagnosis of possible sepsis

               and then various vital sign measurements that were included.

               And then for the lab test I've broken those up into

               different batteries or groups of tests.  The white blood

               cell count and the differential [inaudible] and other things

               would be part of what's called an ABC with differential.

               And then the potassium was part of the BMP or basil





               metabolic panel.  So in this example, you know, we've got

               these eight tables, but I actually was trying to minimize

               the number of tables by making the columns really long so

               for example in the diagnosis rather than having one row for

               each diagnosis I've got diagnosis one, diagnosis two and

               then the vital signs rather than having one for each -- one

               row for each vital sign I've tried to put it all into one

               table like height and weight together.  But you still even

               just doing that get eight different tables.  So -- and when

               you look at the lab tests I made a different table for each

               kind of lab battery or group of tests and you imagine how

               that would expand as I add all the other possible batteries

               or groups of lab tests that would be done.  Here we see

               actually a more real world example with some samples from

               the eclipsis tables.  Now real world in the sense that I --

               that it shows that how much the columns can actually expand.

               So the list of terms that are represents all of the columns

               that would be in a table.  In my fake example I use dot,

               dot, dot so I didn't have to show how many columns the

               tables would really have, but you can easily end up with a

               lot of columns when you get down to the information that

               needs to be stored on the data.  And for these two tables

               they have about forty columns each.  And those are just two

               tables.  Here's a part of the database schema for eclipsis

               and the tables are really multiplying.  And as I mentioned





               before there's over 1,400 tables in an eclipsis

               implementation.  So how does a system end up with so many

               tables and then with so many columns in the table?  Well,

               some of that is that you really aren't trying to limit

               tables.  When you have a model where you're adding tables as

               you add forms of data entry where you have lots of tables

               it's really easy to add more.  I mean, what's the big deal

               about 1,401 tables versus just 1,400.  As a result these

               systems are really efficient at building new forms for data

               entry because all you have to do to create a new form is

               create your new table.  And then it's really easy to just

               say okay for this form I'm going to store it in these

               tables.  And they can be really efficient that way.  When

               they're efficient in that way they're really just like

               electronic paper where they optimize around data collection

               rather than retrieval.  So think about it, once you've

               stored all of these tables in these different places imagine

               that you want to try and find one concept that may be stored

               in multiple different forms.  Maybe you're storing blood

               pressure and in some cases that might be taking -- or

               documented by a physician or in some cases it might be

               documented by a nurse or even a medical assistant.  If each

               of those individuals has a different form and each of those

               forms is represented by different tables you're going to

               have to go through all of these different areas just to





               retrieve it.  You know, this report of these multiple tables

               and optimize each with storage is really common with systems

               that are initially intended to replace paper based system.

               And are -- where the benefit is really just reducing storage

               space.  The problem as I'm alluding to here is getting the

               data out of the systems.  These quires that have to go

               across these tables.  Not only can they become very

               inefficient, but you also have problems when the terms mean

               the same thing like the blood pressure example.  It's also

               difficult to gather data from the form when it's represented

               by multiple tables in some type of summary because it has to

               be pulled from so many different places.  Now, that was a

               relational based model.  There is another model that's

               called the event based model which does things a little

               differently.  Now, you can see this image that shows the

               event based model and you've still got a bunch of tables.

               I've got the patient and -- table, I've got information

               about alerts and orders.  And it looks like I haven't really

               fixed the problem because before I showed you eight tables

               and now here I've got at least 12 that are going on.  But

               I'm going to get to the difference.  First though before I

               do that oftentimes when people store an event rather than --

               with an event model what makes it different from -- what --

               how it gets implemented and how it's a little different from

               relational model where you just have the different tables





               representing different chunks is that you have events and

               components.  So you have an event of a lab test or event of

               a lab battery or a group of tests and then that has

               different components that can be represented in rows such as

               the actual values of the different tests.  In this case we

               have an event that's representing an admission and then you

               have different components which represent data around that

               admission.  A way to diagram this like what was done by then

               Ginican and Stam [assumed spelling] in a paper in 1995 is

               they showed the event is some component and then there's

               this one too many relationships of the different actions

               that are part of it.  And this is kind of the core

               representation of an event model.  So you have a single

               event and then tied to that would be a whole long list of

               the components that are represented to that.  You know, this

               is just a different way of storing the data.  And you may

               ask, well, why wouldn't -- why would you go to this way

               versus the other?  Or actually before we get to that, you

               might ask how does this actually work together?  Now, let's

               go back to that same example of the inpatient admission for

               the six month old Elizabeth Benson and see what it looks

               like.  Here we saw it before in these relational tables, but

               in an event model it will look a little different.  So we

               have the patient table describing again we have the visit

               table and that's similar.  And then I've got a bunch of





               clinical events so I've got a clinical event of the

               diagnosis, I've got a clinical event of vital signs of event

               of lab tests and a couple of lab tests actually and a

               radiology report.  And then under each one I've got the

               components and the components are linked back to the events

               through the event ID in the clinical component table.  Now,

               this doesn't show actually all of the tables and all of the

               primary and secondary keys that would be part of it, but

               conceptually it's really showing how you have these things

               linked back together.  So you've -- I've reduced what was

               eight tables down to four, but more importantly I've reduced

               most of the tables just down to these two event and

               component tables where they can all be stored like this.  So

               I showed you these different tables and then you'd ask,

               well, okay so how does it really make things different?  For

               one thing, the focus of an event model is different than a

               relational model in terms of where it's optimized.  A

               relational model maybe optimized for storage, an event model

               is optimized for retrieval.  So with the event model it's

               actually a little trickier to store the data.  You have to

               keep -- make sure that the events and the components are

               matched all together and keep those relationships of all the

               components together.  And that sometimes can be a little

               more complex than it seems, but it's not impossible and it's

               not that difficult.  The benefit of it though is that when





               you're quiring the data you are able to query just these two

               tables.  You don't have to merge across multiple tables and

               go found out where all of the data are.  That's so much

               better than the relational structure where it's difficult to

               query both data and context.  Now, quiring the data is

               gathering multiple elements requiring these multiple tables

               joins.  You know, as I mentioned, it can be very good for

               data storage and it can also actually be good for getting

               data across patients.  So if you wanted to find -- if you

               wanted to compare data where it ended on a form across

               multiple patients that relational model can be very

               efficient.  But that's usually not what people are doing

               when they're providing clinical care.  In the next lecture

               I'll talk about how that could be generally used in a data

               warehouse when it's still being done for analysis.  But in

               terms of providing clinical care with decision support and

               making -- with building support and making the system

               smarter it's usually done a little differently.  In the

               event model however it's really different quiring across the

               data types.  The data that are organized and use event and

               component tables are usually physically organized according

               to patient.  So they can be very fast once you've selected a

               patient and gathering all of the data.  It's not so good at

               quiring across patients.  But there is different ways to

               mitigate that.  Now, so this is the last slide of this part





               of the lecture so I just want to recap kind of what we've

               gone over.  I've talked about the paradigm shift that

               happens when you move from paper to electronic.  The

               paradigm shift doesn't happen if all you're doing is

               building a system that reproduces the exact benefits of

               paper and the exact disadvantages of paper.  In this case if

               you follow just a purely relational model and have a lot of

               tables that are spread across where the forms are spread

               across multiple tables.  But if you want to make it smarter,

               if you want to have the systems be really using the data for

               secondary use to make it smarter and help kind of creating

               thinking system you're going to be retrieving a lot more

               than you're going to be storing.  And in this case it may be

               better to have the data model more in an event based model

               where it's optimized more for retrieval.  So paradigm shift

               you can optimize over storage versus retrieval which one you

               do will depend on what you -- what your real goals with the

               system are and what you want to have as the advantages.  And

               not surprisingly I'm going to advocate that because you want

               the systems to be smarter that we're not just trying to

               create a version of electronic paper there are some

               important advantages to the base model.