Big Data Analytics
Ongoing 9 Years
A major American medical research institution reached out to us to help automate their medical outcomes research and to enable them to quickly construct analytical processes aimed at detecting statistically significant correlations in symptom clustering or adverse drug interactions. As part of this process, the platform’s ETL subsystem aggregates medical data from multiple channels and transforms it into the OMOP Common Data Model, normalizing terminology, vocabulary, and coding schemes.
Envion Software developed an advanced ETL/ELT platform able to efficiently process terabytes of EMR and EMH data, with a web front end that enables researchers to construct sophisticated studies using an easy-to-use GUI that marshals, on demand, the elastic AWS resources needed for rapid execution.
A leading North American medical research institution was looking for a solution that would allow them to automate their data processing and analysis.
This institution is engaged in medical outcome research, and the ability to comprehensively analyze very large datasets to identify intricate correlations between data parameters is crucial to all their research activities. In particular, they needed to be able to process terabytes of EMR/EMH data with a view to establishing any statistically significant correlations between different symptoms, determining whether a medication is prone to producing side effects, and performing a wide variety of other advanced tasks associated with bulk processing of medical data.
Beyond the sheer volume of clinical data, covering hundreds of millions of patients, the ETL subsystem’s main challenge was the disparate, error-prone data representations used across different channels and even within a single channel. This required us to implement an efficient human-in-the-loop (HITL) workflow on very large data stores, one that supports intelligent responses to processing errors, flags unrecoverable malformed data items and schedules them for human review, and augments the ETL to automate as much of the processing as possible.
Thus, Envion Software needed to develop advanced ETL functionality that would enable the client to automatically collect differently formatted data from multiple sources and process it based on a unified data model. In addition, we were asked to create a web application providing an easy-to-use GUI for constructing medical studies, automating analytical processing, and visualizing the results.
To effectively address the ETL challenge, Envion Software put together a team of database developers (one of whom also acted as Project Manager) and backend and frontend developers.
A key ingredient in the successful execution of any project is effective communication. The team’s daily, direct communication with the client’s senior stakeholders ensured efficient development and allowed functionality to evolve on the go. This communication was facilitated by Envion’s corporate documentation and project management infrastructure.
The technology-related challenges included the need to create a separate ETL mapping for each of the multiple data sources, as well as functionality for the regular addition of new data sources and incorporation of data set updates.
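One way to picture the per-source mapping requirement is a small registry in which each data source contributes its own mapping function. The following is only a minimal sketch under stated assumptions: the source names (`clinic_a`, `registry_b`), the raw field names, and the simplified target record are hypothetical and do not reflect the project's actual mappings or the full OMOP schema.

```python
from typing import Callable, Dict

# A mapping turns one raw source record into a record in the unified model.
Mapping = Callable[[dict], dict]

# Registry of mappings keyed by a (hypothetical) source name.
MAPPINGS: Dict[str, Mapping] = {}


def register_mapping(source: str):
    """Decorator that registers a mapping function for one data source."""
    def decorator(fn: Mapping) -> Mapping:
        MAPPINGS[source] = fn
        return fn
    return decorator


@register_mapping("clinic_a")
def map_clinic_a(raw: dict) -> dict:
    # Hypothetical source that spells gender out as free text.
    return {
        "person_id": int(raw["patient_id"]),
        "gender": raw["sex"].strip().upper()[:1],
    }


@register_mapping("registry_b")
def map_registry_b(raw: dict) -> dict:
    # Hypothetical source that uses numeric gender codes.
    gender_codes = {"1": "M", "2": "F"}
    return {
        "person_id": int(raw["pid"]),
        "gender": gender_codes[raw["gender_code"]],
    }


def transform(source: str, raw: dict) -> dict:
    """Look up the mapping registered for this source and apply it."""
    return MAPPINGS[source](raw)
```

In a layout like this, adding a new data source means registering one more mapping function, and the rest of the pipeline stays unchanged.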
Due to the poor quality and sheer volume of the source data, and its highly specialized nature, it was not possible to apply a regular QA process. Thus, to ensure the proper quality of the deliverables, a workflow process was implemented to log errors, schedule unrecoverable errors for human review, and augment the ETL process to allow for as much automation as possible.
The performance of the system is always critical when dealing with multibillion-record datasets. Thus, the system evolved a hybrid cooperative ETL/ELT process, which enabled it to apply the most efficient approach to each transformation.
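The workflow described here (log the failure, try to recover automatically, and queue whatever cannot be recovered for human review) can be sketched roughly as follows. This is an illustrative assumption rather than the project's implementation: the `Pipeline` class, the `strip_year_suffix` repair rule, and the record fields are all hypothetical names chosen for the example.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, List, Optional

logger = logging.getLogger("etl")

# Errors that indicate a malformed record rather than a programming bug.
RECORD_ERRORS = (KeyError, ValueError, TypeError)


@dataclass
class ReviewItem:
    record: dict
    reason: str


@dataclass
class Pipeline:
    # Repair rules are tried in order when the plain transform fails.
    # New rules can be appended once a reviewer spots a recurring pattern.
    repairs: List[Callable[[dict], Optional[dict]]] = field(default_factory=list)
    review_queue: List[ReviewItem] = field(default_factory=list)

    def transform(self, record: dict) -> dict:
        # Hypothetical target shape; raises on malformed input.
        return {
            "person_id": int(record["person_id"]),
            "year_of_birth": int(record["year_of_birth"]),
        }

    def process(self, record: dict) -> Optional[dict]:
        try:
            return self.transform(record)
        except RECORD_ERRORS as exc:
            logger.warning("transform failed for %r: %s", record, exc)
            for repair in self.repairs:
                fixed = repair(record)
                if fixed is None:
                    continue  # this rule does not apply to the record
                try:
                    return self.transform(fixed)
                except RECORD_ERRORS:
                    continue  # repaired record is still malformed
            # Nothing worked: schedule the record for human review.
            self.review_queue.append(ReviewItem(record, repr(exc)))
            return None


def strip_year_suffix(record: dict) -> Optional[dict]:
    """Hypothetical repair rule: turn a year like '1975?' into '1975'."""
    year = str(record.get("year_of_birth", ""))
    digits = year.rstrip("?").strip()
    if digits.isdigit() and digits != year:
        return {**record, "year_of_birth": digits}
    return None
```

The point of the design is that each repair rule added after a human review removes one class of records from future review queues, so the share of automated processing grows over time.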
Implementing the project has allowed the research institution to dramatically improve the quality of their research by adding whole new dimensions to it. They are now able to analyze medical data at a much greater depth, and in much larger volumes.
Following the project’s delivery, the client decided to continue their cooperation with Envion Software and asked us to develop a web portal that allows their researchers to process large volumes of data in the cloud.
The project is still ongoing, and the Envion Software team is now engaged in further optimizing the system’s performance, adding support for new data sources, implementing support for one more common schema, and extending support to new data warehouse systems.