A3Informatics

SDTM Mapping in a Linked Data World

04/09/2018
by Kirsten Langendorf – CDISC Subject Matter Expert

 

SDTM, Tables and Linked Data

It can be tricky to convey the linked graph data approach to handling SDTM data to an audience used to producing SDTM with a table-based approach (SAS and Excel tables). Typical questions, and very valid ones, can be:

  • How do I map the data to SDTM?
  • Do I need to learn linked graph data stuff?
  • What is the benefit of storing the data in a graph?
  • Where/how do I get the SDTM table out?

In this blog, we will try to answer these questions and ‘translate’ the table-based approach to the linked-graph based approach.

Tables and the Benefits of Graphs

Most people working with SDTM data are used to viewing the data in a tabular format like those shown in Figure 1. The translation to the linked graph representation also makes sense when explained.

Figure 1 Tabular and linked-graph representation of SDTM

Are there any benefits to the graph representation? One is that it does not separate metadata from data. It expresses directly that the subject (USUBJID 01-701-1148) got the first symptoms of Alzheimer’s on 10 June 2018. In the tabular SDTM structure above it, information placed in different columns has to be interpreted and combined before a human can understand it. For example, nothing about the category variable itself (MHCAT, with the label Category for Medical History) suggests that it should contain Primary Diagnosis and First Symptoms, when the categories in other domains contain Inclusion, Haematology or the name of a questionnaire.

Another thing to note is that there is no duplication of information in the graph. In the table, the variable USUBJID is repeated for each new relationship represented in MHCAT (Primary Diagnosis, First Symptoms and Significant pre-existing condition). The graph only needs to add new relationships to an existing subject. So adding a second subject with relations to the same diseases only requires a new subject and new relationships to the already existing diseases.
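As a rough illustration of this point, the graph can be thought of as a set of subject-predicate-object triples. The sketch below is not taken from any real tool: the predicate names, the node labels and the second subject identifier are all made up for the example, and only the first subject and its first symptoms come from the text above.

```python
# A toy graph held as a set of (subject, predicate, object) triples.
# Predicate names and node labels are hypothetical, for illustration only.
triples = set()


def add(subject, predicate, obj):
    """Add one relationship; a set never stores the same triple twice."""
    triples.add((subject, predicate, obj))


def nodes():
    """Return every distinct node that appears in the graph."""
    return {n for s, _, o in triples for n in (s, o)}


# First subject (from the text): first symptoms of Alzheimer's on 10 June 2018.
add("subject:01-701-1148", "hadFirstSymptomsOf", "disease:Alzheimers")
add("subject:01-701-1148", "firstSymptomsDate", "2018-06-10")

nodes_before = nodes()

# A second, hypothetical subject related to the same disease.
add("subject:HYPOTHETICAL-02", "hadFirstSymptomsOf", "disease:Alzheimers")

# Only the new subject is a new node; the disease node is reused.
new_nodes = nodes() - nodes_before
print(new_nodes)
```

The disease node exists once however many subjects point to it, whereas a table would repeat the disease text on every row.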

Do I need to be a graph expert?

However, the next thought is most likely: “This is just a few rows. How do I get an overview of my data when it looks like the graph in Figure 2? Do I need to build up my skills in linked graphs?”

Figure 2 Many rows in SDTM in a linked graph

The answer is no: you do not need to be an expert in linked graphs, and you will not view your SDTM data in the graphical form shown in Figure 2. Applications built on top of the linked graph database should be able to present the data in the usual tabular format. Instead of double-clicking on a table icon to open the data, the user should have a similar button to view a domain, e.g. MH. Behind the button there is just a query on the graph, with the result presented in the same tabular format (see Figure 3), but the end user does not need to think about that.

Figure 3 Presenting SDTM in a tabular view – query
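To make the idea of "a query behind the button" a little more concrete, here is a minimal sketch in plain Python. It assumes a hypothetical graph layout in which each medical history observation is a node with outgoing relationships; the predicate names and the MHTERM value are invented for the example, while MHCAT, MHTERM and MHSTDTC are standard MH variables. A real application would most likely use a graph query language instead.

```python
from collections import defaultdict

# Hypothetical graph: one node per medical history observation.
triples = [
    ("obs:1", "subject", "01-701-1148"),
    ("obs:1", "category", "FIRST SYMPTOMS"),
    ("obs:1", "term", "ALZHEIMER'S DISEASE"),  # illustrative value
    ("obs:1", "startDate", "2018-06-10"),
]

# Assumed annotation: which SDTM variable each relationship is shown as.
COLUMN_FOR = {
    "subject": "USUBJID",
    "category": "MHCAT",
    "term": "MHTERM",
    "startDate": "MHSTDTC",
}


def mh_table(graph):
    """Collect the relationships of each observation node into one row."""
    rows = defaultdict(dict)
    for node, predicate, value in graph:
        if predicate in COLUMN_FOR:
            rows[node][COLUMN_FOR[predicate]] = value
    columns = list(COLUMN_FOR.values())
    body = [[rows[node].get(col, "") for col in columns] for node in sorted(rows)]
    return columns, body


columns, body = mh_table(triples)
print(columns)
for row in body:
    print(row)
```

The user only ever sees the header and the rows; the graph stays behind the scenes.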

Export functions deliver the data in various formats (Excel, CSV, SAS, XPT).

What about external sources?

The next question is then: how do I map the data I get from EDC and external sources to SDTM? The short answer is that when using linked graph data there is no mapping process. The location of the data is determined by the annotation stored in the underlying graph. When setting up a study’s schedule of assessments, a set of forms is used. These forms are made either in the ‘traditional way’, using questions and mappings, or using Biomedical Concepts (BCs); see Figure 4 for an example.

Figure 4 Form created by use of BCs and traditional questions

Biomedical Concepts provide annotations automatically, by combining the standard mapping defined in BRIDG with the user’s association of a BC with a domain. So the only thing the user must decide is which domain the collected item (e.g. systolic blood pressure) belongs to. Normally we place the same information into the same domain, but often the choice is arbitrary. Consider also the recent changes with the QS, CC and FT domains. BCs give us the flexibility to move data as ideas and needs change.

Creating SDTM using this linked-graph approach, with the mappings stored in the graph, also removes the QC task of ensuring consistency between CRF annotation and SDTM data. The metadata used for the annotation and for the SDTM dataset definitions comes from the same source, the linked graph: one source of knowledge.

What about supplemental qualifiers and derivations?

Supplemental qualifiers were introduced in SDTM to allow additional variables to be added in a consistent structure (the supplemental domains) while keeping the main domains standardised. From a linked-graph perspective this is just a matter of data presentation. In a linked graph the user just adds variables to a domain. These become supplemental qualifiers (non-standard variables) when presented in tabular form. Non-standard (SDTM) data is collected using a traditional question, see Figure 4, and the non-standard variable is used in the annotation for the question (<NONSTANDARDVAR> in supplemental<domain>).
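The "just a matter of presentation" point can be sketched as follows. This is an assumption-laden illustration, not how any particular tool works: the study identifier, the non-standard variable MHXYZ and its value are hypothetical, the list of standard MH variables is only a subset, and supplemental columns such as QLABEL and QORIG are left out for brevity.

```python
# Subset of standard MH variables, enough for this example.
STANDARD_MH_VARS = {"STUDYID", "DOMAIN", "USUBJID", "MHSEQ", "MHCAT", "MHSTDTC"}


def split_record(record, standard_vars, domain):
    """Split one stored record into a parent-domain row and supplemental rows."""
    parent = {k: v for k, v in record.items() if k in standard_vars}
    seq_var = f"{domain}SEQ"
    supp = [
        {
            "STUDYID": record["STUDYID"],
            "RDOMAIN": domain,
            "USUBJID": record["USUBJID"],
            "IDVAR": seq_var,
            "IDVARVAL": str(record[seq_var]),
            "QNAM": name,
            "QVAL": str(value),
        }
        for name, value in record.items()
        if name not in standard_vars
    ]
    return parent, supp


# In the graph, all variables simply hang off the same observation.
record = {
    "STUDYID": "HYPOTHETICAL-STUDY",
    "DOMAIN": "MH",
    "USUBJID": "01-701-1148",
    "MHSEQ": 1,
    "MHCAT": "FIRST SYMPTOMS",
    "MHSTDTC": "2018-06-10",
    "MHXYZ": "EXAMPLE VALUE",  # hypothetical non-standard variable
}

parent_row, supp_rows = split_record(record, STANDARD_MH_VARS, "MH")
print(parent_row)
print(supp_rows)
```

The stored data does not change; the split into a main domain and a supplemental domain happens only when the tabular view is produced.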

Derivations can also be defined in the linked graph. The specification of a derivation, e.g. for the --DY variables, is metadata; see Figure 5.

Figure 5 Derivations metadata in linked-graph

The method can contain simple calculations as illustrated in Figure 5, or more complex derivations like a macro.
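As an example of what a simple method might compute, the sketch below follows the usual SDTM study day convention: the reference start date is day 1, the day before it is day -1, and there is no day 0. Whether this matches the method shown in Figure 5 is an assumption, and the function and parameter names are made up for the example.

```python
from datetime import date


def study_day(event_date: date, reference_start: date) -> int:
    """Study day relative to the reference start date, with no day 0."""
    delta = (event_date - reference_start).days
    # On or after the reference date: add 1 so the reference date is day 1.
    # Before the reference date: the plain difference is already negative.
    return delta + 1 if delta >= 0 else delta


reference = date(2018, 6, 10)  # illustrative reference start date
print(study_day(date(2018, 6, 10), reference))  # same day
print(study_day(date(2018, 6, 12), reference))  # two days later
print(study_day(date(2018, 6, 9), reference))   # one day earlier
```

Because the rule is stored once as metadata, every --DY variable in every domain can point to the same definition.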

So SDTM can be produced by displaying the underlying linked-graph information in the well-known SDTM format, without having to think about graph technology. But is this enough reason to leave a well-known process based on programming and Excel? Are there any other benefits of using the linked data approach for making SDTM?

By design, linked graph data ensures traceability across terminology, BCs, forms, SDTM data and versions thereof. This means that it is straightforward to create a report displaying differences (e.g. between terminology or SDTM models), relationships (terminology, BCs, forms and SDTM domains), and impacts (of changing versions). Each is just a query on the underlying graph model.
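An impact report of this kind boils down to following relationships outward from the item that changed. The sketch below uses a plain breadth-first traversal over hypothetical "is used by" links; the node labels are invented for the example, and a real system would express the same thing as a query on its graph database.

```python
from collections import deque

# Hypothetical "is used by" links: terminology -> BC -> form -> SDTM domain.
USED_BY = {
    "terminology:SYSBP": ["bc:SystolicBloodPressure"],
    "bc:SystolicBloodPressure": ["form:VitalSigns"],
    "form:VitalSigns": ["sdtm:VS"],
}


def impacted_by(start, used_by):
    """Return everything downstream of a changed item, in discovery order."""
    seen = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for dependant in used_by.get(node, []):
            if dependant not in seen:
                seen.append(dependant)
                queue.append(dependant)
    return seen


impact = impacted_by("terminology:SYSBP", USED_BY)
print(impact)
```

Starting from a changed terminology item, the traversal lists the BC, the form and the SDTM domain that would need review.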

Take home message?

Linked data allows us to manage our mapping issues. Mapping is a process of creating relationships. Linked data allows us to build the relationships from the start, saving us from mapping pain.
