Data Migration





A general discussion of Data Cleansing can be found in the pages About Data Migration & Storage.

When you need it

You need Data Cleansing when the quality of your historical data is not sufficient for your future needs. This may be because new workflow tools increase the benefits of well-organized data, because several systems are being combined in the new archive and patient folders need to be merged, or because old practices or equipment compromised data quality. Just as often, it is because moving to a new system is deemed an appropriate time to clear out accumulated detritus.

Discarding obvious junk is relatively easy and fits within the scope of normal data migration. Source archives may contain studies without valid images, test images created by maintenance personnel and inadvertently stored in the archive, derived images that were discarded but never purged, images of phantoms or lab animals, imported demonstration images with names of dead Presidents, and a host of other exceptions that for the most part should never have been in the archive in the first place. These cases typically have demographic attributes that are obviously defective and easily spotted.

Archived data may also contain violations of the DICOM standard. For example, the DICOM Attribute (0018,0015) Body Part Examined has the CS (Code String) value representation, which allows only uppercase letters A-Z, digits 0-9, the space character and the underscore, and is limited to 16 bytes in length, yet some archived data may have a whole line of free text in this Attribute. In general, such fields cannot be truncated to valid form without first determining that the lost data is not clinically significant. A limited amount of such exception handling is normally included in the standard scope of Migratek migration projects.
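As a rough illustration of the kind of check involved, the following Python sketch tests a Body Part Examined value against the CS constraints and sorts it into one of three categories. The function names and the category labels are our own assumptions for this example and are not part of any Migratek tool; note that the sketch deliberately never truncates a value.

```python
import re

# DICOM CS (Code String): uppercase letters, digits, space and underscore,
# at most 16 characters (one byte each in the default character repertoire).
CS_PATTERN = re.compile(r"[A-Z0-9_ ]{0,16}")


def is_valid_cs(value: str) -> bool:
    """Return True if the whole value conforms to the CS constraints."""
    return CS_PATTERN.fullmatch(value) is not None


def check_body_part(value: str) -> str:
    """Classify a Body Part Examined value (hypothetical categories)."""
    if is_valid_cs(value):
        return "ok"
    # Assumption: changing letter case loses no clinical information.
    if is_valid_cs(value.upper()):
        return "fixable_by_uppercasing"
    # Free text or over-long values are left for a human decision.
    return "needs_review"
```

A value such as "chest" can be repaired mechanically, whereas a sentence of free text is only flagged, because deciding what may be dropped is a clinical question rather than a formatting one.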

Patient-level cleanup may be required when multiple facilities are combined into the new system, as when institutional mergers have resulted in the adoption of a single patient index. Such projects usually start with a "Gold Standard" master patient list from a hospital information system that contains the patient name, patient ID, birthdate, gender, and any other IDs by which the patient is known. For each study, the cleanup process finds the best match in the "Gold Standard" list and imports the master demographic attributes into the image data.
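The idea of finding the best match in a master list can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`patient_id`, `known_ids`, `birthdate`, `sex`, `name`), the scoring weights and the acceptance threshold are all hypothetical, and the name comparison uses the standard library's `difflib` rather than any particular production matching method.

```python
from difflib import SequenceMatcher


def normalize_name(name: str) -> str:
    """Uppercase the name and treat '^' and ',' as plain separators."""
    return " ".join(name.replace("^", " ").replace(",", " ").upper().split())


def score(study: dict, master: dict) -> float:
    """Score one study against one master record (hypothetical weights)."""
    total = 0.0
    if study["patient_id"] in master["known_ids"]:
        total += 0.40
    if study["birthdate"] == master["birthdate"]:
        total += 0.25
    if study["sex"] == master["sex"]:
        total += 0.05
    similarity = SequenceMatcher(
        None, normalize_name(study["name"]), normalize_name(master["name"])
    ).ratio()
    return total + 0.30 * similarity


def best_match(study: dict, master_list: list, threshold: float = 0.8):
    """Return the highest-scoring master record, or None if below threshold."""
    best = max(master_list, key=lambda m: score(study, m), default=None)
    if best is None or score(study, best) < threshold:
        return None
    return best
```

Studies for which `best_match` returns `None` would be treated as exceptions rather than forced onto the nearest record.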

Study-level matching is the most reliable form of data cleansing, and results in image sets that are linked to their respective reports. This linkage is now routinely provided by Modality Worklist support, but the PACS archives currently being replaced may contain many Studies acquired before Modality Worklist was implemented. The streamlined user interfaces offered by some current PACS benefit greatly from linkage of historical reports to their respective image sets, a factor that argues for a study-level matching project. Study-level matching also provides more reliable demographic identification: for example, "John Smith, DOB 8/8/80, who had a CT exam on 9/6/2001" is obviously a more solid identification than "John Smith, DOB 8/8/80".
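The gain in selectivity can be shown with a small Python sketch. The record layout (`name`, `birthdate`, `study_date`, `exam_date`, `order`) and the sample records are invented for this illustration; the point is only that adding the exam date to the key narrows the candidate list.

```python
def patient_key(row: dict) -> tuple:
    """Patient-level key: name and birthdate only."""
    return (row["name"].upper(), row["birthdate"])


def candidates(study: dict, exams: list, use_exam_date: bool) -> list:
    """Return the exams that could belong to the given study."""
    found = []
    for exam in exams:
        if patient_key(exam) != patient_key(study):
            continue
        if use_exam_date and exam["exam_date"] != study["study_date"]:
            continue
        found.append(exam)
    return found


# Two different patients who share a name and a birthdate (invented data).
exams = [
    {"name": "John Smith", "birthdate": "1980-08-08",
     "exam_date": "2001-09-06", "order": "R1001"},
    {"name": "John Smith", "birthdate": "1980-08-08",
     "exam_date": "2003-02-11", "order": "R2417"},
]
study = {"name": "JOHN SMITH", "birthdate": "1980-08-08",
         "study_date": "2001-09-06"}

patient_level = candidates(study, exams, use_exam_date=False)
study_level = candidates(study, exams, use_exam_date=True)
```

Here the patient-level key leaves two candidate exams, while the study-level key leaves exactly one.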

How we do it

Migratek performs matching-based cleanup activities before any of the images are moved. We obtain a Study inventory from the source archive, and an exam list furnished by hospital HIS/RIS personnel. A typical PACS Study listing would include the columns:

Patient ID 
Patient Name 
Patient Sex 
Patient Birthdate 
Accession Number 
Requested Procedure ID 
Study Date 
Study Time 
Study UID 

A typical exam listing might include the columns:

Patient Name 
Birth Date 
Placer Order Number 
RIS Order Number 
Exam Code 
Exam Description 
Exam Start Time 
Exam End Time 

There is much variation in the names and the meaning of the HIS/RIS data, which we discuss and clarify with the customer for each job. The meaning of DICOM study attributes must also be understood in the context of how they are used in each local setting.

Working with the customer, we determine:

We then run the matching rules against the customer's data and assess the result. Both the count of matches yielded by each rule and the individual matches made are visually inspected. The Migratek Migration Server hosts a Web-based user interface enabling customer inspection and approval of matching results on a rule-by-rule and study-by-study basis. Rule weights may be adjusted and the process repeated as necessary, but often this process is completed on the first pass. The end result of the matching process is a Stream Processor Instruction File (SPIF), which defines the set of operations to be performed on each DICOM information object when it is migrated.
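The rule-and-weight process described above might be sketched as follows. Everything specific in this Python example is an assumption made for illustration: the three rules, their integer weights, the threshold and the field names are hypothetical, and the result is a plain dictionary rather than an actual SPIF, whose format is not described here.

```python
from collections import Counter

# Hypothetical rules: name -> (weight, predicate over a study and an exam).
RULES = {
    "accession_equals_order": (
        60, lambda s, e: s["accession"] != "" and s["accession"] == e["ris_order"]),
    "same_birthdate": (20, lambda s, e: s["birthdate"] == e["birthdate"]),
    "same_day": (20, lambda s, e: s["study_date"] == e["exam_start"][:10]),
}


def match_studies(studies, exams, rules=RULES, threshold=70):
    """Match each study to its best-scoring exam and tally the rules that fired."""
    matches, rule_counts = {}, Counter()
    for study in studies:
        best, best_score, best_fired = None, 0, []
        for exam in exams:
            fired = [name for name, (_, test) in rules.items() if test(study, exam)]
            total = sum(rules[name][0] for name in fired)
            if total > best_score:
                best, best_score, best_fired = exam, total, fired
        if best is not None and best_score >= threshold:
            matches[study["study_uid"]] = best
            rule_counts.update(best_fired)  # per-rule counts for inspection
    return matches, rule_counts
```

The per-rule counts returned alongside the matches correspond to the figures that would be reviewed before adjusting weights and repeating the run; studies absent from `matches` remain exceptions.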

How much is enough?

Like many endeavors, PACS data cleansing is subject to a Pareto 80-20 rule, and most of the benefit can be obtained by modest efforts. The exceptions encountered in historical clinical data include many that can be detected by a simple algorithm, a number that require carefully crafted rules, some that yield only to individual investigation that looks at the images, and a few that can be resolved only by a physician. Given that a majority of historical data is never accessed again, appropriate discretion is required to decide how much effort to expend at the time of migration. For most cases requiring individual investigation, and for all cases requiring a physician's help, it is better to send the exception cases to the destination PACS, flagged in some way appropriate to the new system, and defer the individual investigations to the time and context where that exam is sought for a clinical purpose.

Our approach is designed to identify and make clear the point of diminishing returns. Doing so has always led us to an easy consensus with the customer on the appropriate level of data cleansing effort.