Johannes Ulander: Towards Better Data

At the end of November 2017, Johannes Ulander presented at the PhUSE single day event in Beerse. His presentation, ‘Towards Better Data’, considered linked data and graph databases and the impact these have on the quality of the data. One of the key drivers for A3 Informatics is to improve data quality and to make using data standards as straightforward as possible.

Johannes started by considering the way we currently collect and view data in tabular form. This presents the data to us as a two-dimensional structure of rows and columns. It is the legacy view of data, designed to be human readable, and comes from data being collected and stored in relational databases and Excel spreadsheets. Representing the data in tabular form works well as long as we know the questions we want to ask, or how we want to query the data. However, what happens when the questions change or we want to add another layer of data? The underlying model then has to change and, if you are working with a large team, how do you manage those changes and maintain a clear understanding of them?

This is perhaps what the industry now faces when using the SDTM model. As we add more variations to SDTM, the standard starts to become muddled and the data becomes lost behind the model. It is complicated further by the regulators receiving different ‘flavours of SDTM’ and consequently placing their own rules on the model, creating conflicting standards. Unfortunately, the SDTM model is now struggling to keep pace with the industry it is designed to serve.

SDTM Mapping

Mapping and inconsistencies in data are still an issue.

Johannes then went on to talk about what it is we actually need when working with a standard and the data:

  • Control
    • Constantly updated versions
    • To maintain the rate of change of versions
  • Precision
    • Which version am I using?
  • Visibility
    • What has changed in a new version?
    • When did it change?
    • What is the impact of the change?
  • Ease of use
    • The standard needs to be easy to use
    • It must be machine readable
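
To make the "visibility" questions concrete, here is a minimal sketch of what a machine-readable comparison between two versions of a standard could look like. Everything in it is an assumption for illustration: the codes and terms are invented rather than real CDISC content, and `diff_versions` and `impacted` are hypothetical helper names, not part of any A3 Informatics tool.

```python
# Hypothetical, simplified codelist versions: code -> preferred term.
# The codes and terms are invented for illustration only.
V1 = {"X001": "TERM A", "X002": "TERM B", "X003": "TERM C"}
V2 = {"X001": "TERM A", "X002": "TERM B (REVISED)", "X004": "TERM D"}


def diff_versions(old, new):
    """Answer 'what has changed?': added, removed and changed entries."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


def impacted(diff, used_codes):
    """Answer 'what is the impact?': codes a study uses that were removed or changed."""
    touched = set(diff["removed"]) | set(diff["changed"])
    return sorted(touched & set(used_codes))


if __name__ == "__main__":
    d = diff_versions(V1, V2)
    print(d)
    # Codes a (hypothetical) study currently relies on.
    print(impacted(d, ["X001", "X002", "X003"]))
```

Because each version is held as data rather than as a document, questions such as "which version am I using?" and "what is the impact of the change?" become simple lookups instead of manual comparisons.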

Linked data and Graph databases

Of course, this leads into considering linked data and graph databases, something both Johannes and Dave Iberson-Hurst have been discussing with the industry for many years. In his presentation, Johannes uses the example of the demographics domain and the collection of race and ethnicity. The SDTM model breaks down when you start to collect more than one race for a subject. It can of course be handled in the tabular structure, but the model needs to be amended, which introduces variability, and different sponsors may handle that variability in different ways. Graph databases, on the other hand, have many advantages for the end user. If we return to the issue of race, there is no need to adjust your query, because a subject with two race entries simply has two relationships in the database. The query result then just needs to be visualised for the user.
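
The race example can be sketched in a few lines of code. This is not taken from the presentation: it is a minimal illustration that uses a plain Python list of (subject, predicate, object) triples in place of a real graph database, and the subject identifiers, the predicate names `hasRace` and `hasEthnicity`, and the helper `objects` are all assumptions made for the example.

```python
# Graph view: every fact is a (subject, predicate, object) triple.
# A subject with two races simply has two "hasRace" relationships;
# nothing about the structure has to change.
TRIPLES = [
    ("subject/001", "hasRace", "ASIAN"),
    ("subject/001", "hasEthnicity", "NOT HISPANIC OR LATINO"),
    ("subject/002", "hasRace", "WHITE"),
    ("subject/002", "hasRace", "BLACK OR AFRICAN AMERICAN"),
]


def objects(triples, subject, predicate):
    """Return every object linked to `subject` by `predicate`, in stored order."""
    return [o for s, p, o in triples if s == subject and p == predicate]


if __name__ == "__main__":
    # The same query works for one race or several.
    for subj in ("subject/001", "subject/002"):
        print(subj, objects(TRIPLES, subj, "hasRace"))
```

In a single-column tabular layout, the second race for `subject/002` would need an extra column, a placeholder value or a supplemental table, and the query would have to change to match. Here the query is identical for both subjects.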

SDTM and Race/Ethnicity Data

SDTM Demographics Domain Issues


Graph Queries

Graph Queries & Visualisation

The presentation goes on to discuss the direct impact of using tools which are built on linked data. At A3 Informatics, our team have built an MDR that is designed using graph databases and linked data. These foundations are the key to our tool development for both Glandon Study Build and Glandon Define.xml. If you want to gain control of your CDISC standards and avoid issues with version management, thereby future-proofing your clinical research, please talk to us about how we can help you.

You can download Johannes’ complete presentation here.