Pre-Clinical Biopharmaceutical R&D:

Information Modeling Amid Life Science Complexity

Organizational Goal: 

Define a core enterprise information master plan, specifically a ‘to be’ logical enterprise information model, for a Boston-based BioPharma R&D startup.  Together with venture partners, the startup is developing a messenger ribonucleic acid (mRNA) platform to deliver transformative drugs that leverage the body’s innate ability, via mRNA, to produce nearly any protein needed to effect therapy within and among our cells.  This cutting-edge science is intended to combat cancers and other medical challenges that resist traditional therapies.


As an R&D startup with solid venture funding with global pharma leaders, the company had reached the point where it needed to work on data management.  To date, most data collection, analysis and reporting had been number-crunched in spreadsheets and then disseminated somewhat informally, but this approach could no longer scale.  Projects were approved for new specialty applications to organize, enter data, and capture results from scientific workflows, yet no common information model existed with which to standardize the data beyond the scope of a few processes within one department.  A group of subject matter experts was organized to provide the needed domain context, first for a conceptual information model and then for a full-blown logical data model, including specific data relationships and detailed attributes.  Initial findings confirmed that the in vitro studies (projects) and the assay (experiment) protocols available within them held a dizzying array of complex, overlapping classifications that defy shoe-horning into just a few neat hierarchies for analytic reporting purposes.  Similarly, individuals and teams of internal staff, external experts, and even external organizations play a variety of roles at various levels within and across the studies.  The logical data model would therefore need to support this complexity and, just as important, be flexible enough to absorb changes in these areas quickly, without frequent structural changes to any database using our prescribed information design.


  • Facilitated an agile Model Storming process with SMEs, quickly evolving the initial information model artifacts, validating their logic with product owners, and refining them with must-have details.  Researched and gathered definitions based on the Standard for Exchange of Nonclinical Data (SEND) from the Clinical Data Interchange Standards Consortium (CDISC); see www.cdisc.org/standards/foundational/send.  Attributed entities ranged from high-level (Study Request, Protocol, Animal, Animal Group, Treatment Plan) to detailed (Actual Treatment, Agent, Animal Model, Assay Protocol, Assay Execution Step, Assay Result, Sample, Sampling Schedule, Discovery Construct and mRNA Construct).
  • Designed a set of universal data model entities to support the complex, unpredictable categorization (techies: many-to-many) of in vitro studies by their various types and objectives.
  • Similarly, designed entities for assay protocols (experimental methods) to support the evolving classifications of their various scientific-method types.
  • Delivered a flexible ‘Party’ entity scheme that neatly organizes data for individuals, teams, external organizations and their roles in and around these studies.  Instantiated a set of entities specifying the known roles for parties; each party’s position, if applicable, within the customer’s organization or an external group; and each party’s role in the myriad parts of a study.  Finally, defined a re-use pattern for extending the set of party entities to support as-yet unknowable roles without the need for database re-engineering.
  • Wrote a data glossary defining all delivered entities and attributes.
  • Briefed product owners on the delivered model and glossary.
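
The many-to-many classification and ‘Party’ role ideas in the list above can be illustrated with a minimal sketch.  This is not the delivered model: every table name, column name and sample value below (party, role_type, study_party_role, category, study_category, the example roles and categories) is a hypothetical stand-in, and SQLite is used only so the sketch runs anywhere.  The point it tries to show is that a new role or classification arrives as a row of data, not as a structural change.

```python
import sqlite3

# Hypothetical, simplified schema; names are illustrative assumptions only.
SCHEMA = """
CREATE TABLE party (
    party_id   INTEGER PRIMARY KEY,
    party_type TEXT NOT NULL
        CHECK (party_type IN ('PERSON', 'TEAM', 'ORGANIZATION')),
    name       TEXT NOT NULL
);
CREATE TABLE role_type (
    role_type_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL UNIQUE
);
CREATE TABLE study (
    study_id INTEGER PRIMARY KEY,
    title    TEXT NOT NULL
);
-- Many-to-many: any party may hold any number of roles on any study.
CREATE TABLE study_party_role (
    study_id     INTEGER NOT NULL REFERENCES study(study_id),
    party_id     INTEGER NOT NULL REFERENCES party(party_id),
    role_type_id INTEGER NOT NULL REFERENCES role_type(role_type_id),
    PRIMARY KEY (study_id, party_id, role_type_id)
);
-- Overlapping classifications: a study may sit in many categories
-- drawn from many independent schemes (type, objective, ...).
CREATE TABLE category (
    category_id INTEGER PRIMARY KEY,
    scheme      TEXT NOT NULL,
    name        TEXT NOT NULL,
    UNIQUE (scheme, name)
);
CREATE TABLE study_category (
    study_id    INTEGER NOT NULL REFERENCES study(study_id),
    category_id INTEGER NOT NULL REFERENCES category(category_id),
    PRIMARY KEY (study_id, category_id)
);
"""


def build_demo():
    """Create an in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO party VALUES (?, ?, ?)", [
        (1, "PERSON", "Researcher A"),
        (2, "ORGANIZATION", "Example CRO"),
    ])
    conn.execute("INSERT INTO study VALUES (1, 'Example in vitro study')")
    conn.executemany("INSERT INTO role_type VALUES (?, ?)", [
        (1, "Study Director"),
        (2, "Assay Performer"),
    ])
    conn.executemany("INSERT INTO study_party_role VALUES (?, ?, ?)", [
        (1, 1, 1),
        (1, 2, 2),
    ])
    # A previously unknown role is added as data; no table is altered.
    conn.execute("INSERT INTO role_type VALUES (3, 'Data Reviewer')")
    conn.execute("INSERT INTO study_party_role VALUES (1, 1, 3)")
    conn.executemany("INSERT INTO category VALUES (?, ?, ?)", [
        (1, "objective", "Potency"),
        (2, "study type", "Dose response"),
    ])
    conn.executemany("INSERT INTO study_category VALUES (?, ?)", [(1, 1), (1, 2)])
    return conn


def roles_for_party(conn, party_id):
    """Return the role names a party holds across all studies."""
    rows = conn.execute(
        """
        SELECT DISTINCT rt.name
        FROM study_party_role AS spr
        JOIN role_type AS rt ON rt.role_type_id = spr.role_type_id
        WHERE spr.party_id = ?
        ORDER BY rt.name
        """,
        (party_id,),
    ).fetchall()
    return [r[0] for r in rows]


def categories_for_study(conn, study_id):
    """Return (scheme, name) pairs classifying one study."""
    return conn.execute(
        """
        SELECT c.scheme, c.name
        FROM study_category AS sc
        JOIN category AS c ON c.category_id = sc.category_id
        WHERE sc.study_id = ?
        ORDER BY c.scheme, c.name
        """,
        (study_id,),
    ).fetchall()
```

Calling `roles_for_party(build_demo(), 1)` lists both the original role and the one added later, which is the flexibility the bullets describe: the structure stays fixed while the vocabulary of roles and categories grows.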


Future contracts with venture partners and Clinical Research Organizations, and future software development projects (build or buy), no longer start with open questions about the organization’s complex information (data) model, including its definitions, relationships, and details.  Instead, the product owners now use our published model to bring venture partners, vendors and internal colleagues together around a common lexicon and structure of organizational information, thus improving quality and reducing risk and costs across the organization.


Credit Scores and Identity Protection Services: Direct to Consumer

Organizational Goal: 

Establish and implement short-term tactics, while defining a long-term strategy, for mission-critical reporting and diverse analytics on ECS’s new direct-to-consumer credit report monitoring and subscription service.


An established leader in direct-to-consumer credit monitoring subscriptions, ECS was nearly ready to release its second-generation core platform.  The system’s NoSQL transactional infrastructure is built to scale out to any number of cloud-based commodity servers with near-linear performance gains.  However, given the loose coupling between this system’s published data services and the resulting eventual data consistency (vs. the up-front, ACID-compliant transaction consistency of an RDBMS), the organization preferred to perform mission-critical reporting and data analytics on this data within an ANSI-SQL-compliant database environment.

Solutions:  Tactical and Strategic

A cross-departmental team of Super-User business analysts, all knowledgeable about core business processes, was identified.  I (Daniel U.) was engaged as a Tableau and data architecture consultant to train Super Users on the use of Tableau Desktop and Tableau Server to analyze the new data, to identify any broken business processes, and to create, validate and distribute operational metrics within visual-analytic dashboards.  Additionally, my charter was to help these analysts distinguish emergency tactical responses to daily hot-button issues from the strategic need for common data semantics and a data infrastructure that will reliably scale up to their expected large data volumes, and from there to plan a long-term solution for their operational reporting and data analytics infrastructure.


As a short-term solution, the NoSQL transactional data was replicated into a cloud RDBMS, and, sourcing from it, the Tableau-based training and analytics-creation process began without delay.  Analysts validated and published approved SQL queries with common business rules.  Taught trainees how to quickly create ad-hoc reports and visualizations based on these rule-containing queries, performing limited data blending across source queries, while also showing them the limited standardization and drill-downs available when sourcing from these pre-built queries.  For publish-ready Tableau dashboards, configured, secured and administered Tableau Server while mentoring new Tableau Server administrators and dashboard-publishing analysts on its features to validate, distribute and collaborate on discovered insights.
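
As a rough illustration of the short-term approach, the sketch below flattens nested, document-style records into a relational table and exposes one shared business rule as a SQL view that reporting tools could source from.  It is a sketch under stated assumptions, not the ECS implementation: the document shape, the `subscription` table, the `active_subscription` view, the status values and the last-write-wins rule for late-arriving updates are all hypothetical, and SQLite stands in for the cloud RDBMS.

```python
import sqlite3

# Hypothetical target table and published view (names are assumptions).
SCHEMA = """
CREATE TABLE subscription (
    subscription_id TEXT PRIMARY KEY,
    customer_id     TEXT NOT NULL,
    status          TEXT NOT NULL,
    updated_at      TEXT NOT NULL  -- ISO-8601 UTC: string order is time order
);
-- One validated definition of an active subscription, shared by all reports.
CREATE VIEW active_subscription AS
    SELECT subscription_id, customer_id, updated_at
    FROM subscription
    WHERE status = 'ACTIVE';
"""


def replicate(conn, documents):
    """Copy documents into the relational table, newest version wins.

    Under eventual consistency an older version of a document can arrive
    after a newer one, so stale versions are skipped (an assumed rule).
    Returns the number of documents actually applied.
    """
    applied = 0
    for doc in documents:
        current = conn.execute(
            "SELECT updated_at FROM subscription WHERE subscription_id = ?",
            (doc["id"],),
        ).fetchone()
        if current is not None and current[0] >= doc["updated_at"]:
            continue  # stale or duplicate version; keep what we have
        conn.execute(
            "INSERT OR REPLACE INTO subscription "
            "(subscription_id, customer_id, status, updated_at) "
            "VALUES (?, ?, ?, ?)",
            (doc["id"], doc["customer"]["id"], doc["status"], doc["updated_at"]),
        )
        applied += 1
    conn.commit()
    return applied


def demo():
    """Load made-up documents and count rows in the published view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    documents = [
        {"id": "s1", "customer": {"id": "c1"}, "status": "ACTIVE",
         "updated_at": "2020-01-02T00:00:00Z"},
        # Older version of s1 arriving late; it should be ignored.
        {"id": "s1", "customer": {"id": "c1"}, "status": "TRIAL",
         "updated_at": "2020-01-01T00:00:00Z"},
        {"id": "s2", "customer": {"id": "c2"}, "status": "CANCELLED",
         "updated_at": "2020-01-03T00:00:00Z"},
    ]
    applied = replicate(conn, documents)
    active = conn.execute("SELECT COUNT(*) FROM active_subscription").fetchone()[0]
    return applied, active
```

Keeping the rule inside a view means every ad-hoc report reads the same definition, which is also why drill-downs beyond the view's columns are limited, as noted above.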


Brought in formal classroom training in Data Vault Modeling for the BI-DW team.  Evangelized the strategic data reporting and analytics infrastructure, and led its platform selection, data modeling and team initiation.  Currently preparing project kickoff with internal staff and platform vendors; this is an ongoing initiative as of this writing.


Business stakeholders are receiving critical metrics in support of the continued ramp-up of the new core business platform and are ready for cut-over of existing customers.  Short-fuse operational and analytic requirements are satisfied.  The BI-DW team is confident in its new ability to build scalable data reporting infrastructure well before reporting and analytic details are defined, a dependency that had otherwise blocked their progress.


National Oncology Clinical Data Exchange

Organizational Goal: 

Acquire and integrate clinical data from diverse systems for electronic health records, drug dispensaries, laboratories, and claims management across twenty-eight independent oncology clinics in the U.S., in order to support any form of reporting and analytics that customers may conceive of now or later.


Reporting and analytics requirements were mostly undefined when the integration was required, so the firm chose, and had already begun implementing, the Data Vault modeling methodology for its data warehouse in order to avoid erroneous data transformation.  However, although many separate Data Vault subject areas had been designed and were being loaded in isolation, neither the design nor the loading logic for the integration structures that would actually join them together had been implemented or even well understood, due to the newness of the Data Vault method to the team.  Once these integrating links were completed, subsequent design and loading would be needed to add data on claims, which held the potential for substantially increased integration between the subject areas of providers, patients, diagnoses, therapeutic drugs, treatments, outcomes, costs and reimbursement.


Performed a technical assessment of the in-progress Data Vault EDW and recommended key improvements for logic and data integrity.  Led two ETL engineers in the subsequent implementation.  Modeled and defined loading logic for just enough new link structures to join all existing subject areas without unnecessary table proliferation.  Designed an extension of the operational data store to integrate snapshots of claims data from disparate customer claims management systems for data de-duplication.  Modeled data and defined load logic for the de-duplicated claims subject area in the EDW.  Guided QA testing.  Modeled a star-schema downstream data presentation area for dashboards.
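
To make the idea of a link structure concrete, here is a minimal sketch of insert-only, repeatable link loading between two hubs.  It is an illustration under assumptions, not the loading logic used on this project: the table names (`hub_patient`, `hub_provider`, `link_patient_provider`), the staged row shape, and the use of hashed business keys (a common Data Vault 2.0 convention; the text does not say whether hash or sequence keys were used) are all hypothetical.

```python
import hashlib
import sqlite3


def hash_key(*business_keys):
    """Derive a deterministic key from one or more business keys.

    Keys are trimmed and upper-cased first so that formatting differences
    between source systems do not create duplicate hubs or links.
    """
    normalized = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical hub and link tables; names are illustrative assumptions.
SCHEMA = """
CREATE TABLE hub_patient (
    patient_hk    TEXT PRIMARY KEY,
    patient_bk    TEXT NOT NULL,
    load_ts       TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE hub_provider (
    provider_hk   TEXT PRIMARY KEY,
    provider_bk   TEXT NOT NULL,
    load_ts       TEXT NOT NULL,
    record_source TEXT NOT NULL
);
-- The link records only that a relationship exists between two hubs.
CREATE TABLE link_patient_provider (
    link_hk       TEXT PRIMARY KEY,
    patient_hk    TEXT NOT NULL REFERENCES hub_patient(patient_hk),
    provider_hk   TEXT NOT NULL REFERENCES hub_provider(provider_hk),
    load_ts       TEXT NOT NULL,
    record_source TEXT NOT NULL
);
"""


def demo_connection():
    """Create an empty in-memory vault with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def load_link(conn, staged_rows, load_ts, record_source):
    """Insert-only load: existing hubs and links are left untouched."""
    for row in staged_rows:
        patient_bk = row["patient_id"].strip().upper()
        provider_bk = row["provider_id"].strip().upper()
        patient_hk = hash_key(patient_bk)
        provider_hk = hash_key(provider_bk)
        link_hk = hash_key(patient_bk, provider_bk)
        # INSERT OR IGNORE makes the load safe to re-run on the same data.
        conn.execute(
            "INSERT OR IGNORE INTO hub_patient VALUES (?, ?, ?, ?)",
            (patient_hk, patient_bk, load_ts, record_source),
        )
        conn.execute(
            "INSERT OR IGNORE INTO hub_provider VALUES (?, ?, ?, ?)",
            (provider_hk, provider_bk, load_ts, record_source),
        )
        conn.execute(
            "INSERT OR IGNORE INTO link_patient_provider VALUES (?, ?, ?, ?, ?)",
            (link_hk, patient_hk, provider_hk, load_ts, record_source),
        )
    conn.commit()


def row_count(conn, table):
    """Count rows in one of the sketch tables (table names are fixed above)."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

Because the load never updates or deletes, running it twice over the same staged rows leaves the vault unchanged, and a new source system only adds rows to the existing hubs and links.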


Organization and customers now enjoy integrated reporting and analytics limited only by the data itself.  The new claims management EDW subject area is acknowledged as the most accurate EDW subject area.  Since the EDW captures, loosely integrates, and historizes all source system data, two nearly unprecedented benefits result.  First, new data from as-yet-unknown clinical systems can be integrated in the future with zero refactoring of existing EDW data structures.  Second, as reporting and analytic requirements evolve or intensify, any downstream data presentation layer can be easily re-designed and reloaded with no loss of historical data.
