One of the three main tasks of data management is to translate individual subject data into logically grouped datasets ready for analysis: study data captured in a structured format that the statistician can work with. But with what datasets can the statistician work?
In fact, with anything, because he or she can transform your datasets into suitable ones with the statistical software. So a better question is: with what datasets can the statistician work comfortably, without restructuring the data delivered?
Well, first it is handy to know a bit more about the products statisticians deliver.
1. Tables with descriptive statistics describing the subject group under study. Overall descriptive stats for all subjects together or descriptive stats per treatment group or per gender.
N=208
Gender        Male 84 (40,4%)    Female 124 (59,6%)
Age (yrs)     Mean = 56,7        Min = 34 - Max = 82
Weight (kg)   Mean = 78,8        Min = 51 - Max = 111
Tables for safety outcomes. Numbers and percentages of adverse events that occurred. Overall and per treatment group.
Adverse events                     Medication A          Medication B
                                   (n=1205)              (n=1200)
                                   No. (%) of patients   No. (%) of patients
Gastrointestinal disorders         101 (8,4%)            113 (9,4%)
  Diarrhea                         67 (5,6%)             66 (5,5%)
  Nausea                           61 (5,1%)             57 (4,8%)
Musculoskeletal and connective
tissue disorders                   98 (8,1%)             89 (7,4%)
  Pain in extremity                45 (3,7%)             59 (4,9%)
  Back pain                        72 (6,0%)             62 (5,2%)
2. Graphs to visually compare the different intervention groups under study. E.g. survival rates, pharmacokinetics.
3. Statistical tests to compare the efficacy objectives between the different intervention groups, on which the conclusions of your clinical study report will be based.
“Subjects receiving the new medicine were significantly more likely to respond well on overall quality of life than were those who received the placebo (P < 0,05), whereas those walking within 24 hours after surgery or those with weight loss were no more likely to respond well than those without these features.”
4. And last but not least, raw data listings if not already created by data management.
(The advantage of creating raw data listings for a study is that you get to know the individual study data. You are busy with all individual data records, instead of grouping them into a table, graph or analysis. It helps to get to know the individual drop-outs, the outliers and the missing measurements.)
Subject number   Visit date   Diastolic blood pressure   Heart rate
                              (mm Hg)                    (bpm)
1209             13AUG2011    127                        78
1210             15AUG2011    116                        89
1301             16JUN2011    104                        91
So much for the products statisticians deliver for a clinical study report. Secondly, here are some examples of datasets and the reasons for structuring them this way:
1. A demography dataset, DEMO, is delivered with all demography data for all subjects, like gender, date of birth, but also subject number and date of screening. Only this demography dataset is needed to program a descriptive stats table for all subjects.
SUBJID   DSCREEN     GENDER   DBIR
1209     12JUN2011   1        17OCT1945
1210     13JUN2011   2        10FEB1961
1301     07JUL2011   1        04DEC1954
Inclusion and exclusion criteria can be a separate dataset, because these are only listed and checked for deviations.
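To illustrate why the demography dataset alone is enough for an overall descriptive table, here is a minimal Python sketch using the three DEMO records shown above. It is an illustration under assumptions, not a prescribed method: age is calculated at the screening date (the article does not say which reference date to use), gender is counted per code because the meaning of codes 1 and 2 is not given here, and the function names are made up for this example.

```python
from collections import Counter
from datetime import datetime
from statistics import mean

# The three DEMO records from the example table above.
DEMO = [
    {"SUBJID": "1209", "DSCREEN": "12JUN2011", "GENDER": 1, "DBIR": "17OCT1945"},
    {"SUBJID": "1210", "DSCREEN": "13JUN2011", "GENDER": 2, "DBIR": "10FEB1961"},
    {"SUBJID": "1301", "DSCREEN": "07JUL2011", "GENDER": 1, "DBIR": "04DEC1954"},
]


def parse_date(text):
    """Parse a DDMONYYYY date such as 12JUN2011."""
    return datetime.strptime(text, "%d%b%Y").date()


def age_at_screening(record):
    """Age in completed years on the screening date (an assumed reference date)."""
    born = parse_date(record["DBIR"])
    screened = parse_date(record["DSCREEN"])
    had_birthday = (screened.month, screened.day) >= (born.month, born.day)
    return screened.year - born.year - (0 if had_birthday else 1)


def describe(demo):
    """Overall descriptive statistics for all subjects together."""
    ages = [age_at_screening(record) for record in demo]
    return {
        "n": len(demo),
        "gender_counts": dict(Counter(record["GENDER"] for record in demo)),
        "age_mean": mean(ages),
        "age_min": min(ages),
        "age_max": max(ages),
    }


summary = describe(DEMO)
print(summary)
```

No other dataset is touched: everything the table needs sits in DEMO.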
2. Datasets contain subject numbers, and most of them also have visit dates. These so-called key data fields are used to combine data from different datasets. E.g. a dataset with the actual treatments can be merged with the demography dataset using the subject number, so that a descriptive stats table of the subjects per treatment group can be programmed.
SUBJID   GROUP   TRTLABEL
1209     A       New medicine
1210     A       New medicine
1301     B       Placebo
With the exception of the key data (subject number, visit number), CRF data should exist in one dataset only: either in this one or in that one, but not in two or more datasets.
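The merge on the subject number can be sketched in a few lines of Python. This is a minimal example using the records from the tables above; the helper name merge_on_key is invented for this sketch, and it assumes the treatment dataset holds exactly one record per subject.

```python
from collections import Counter

# Demography and treatment records from the example tables above.
DEMO = [
    {"SUBJID": "1209", "GENDER": 1},
    {"SUBJID": "1210", "GENDER": 2},
    {"SUBJID": "1301", "GENDER": 1},
]
TREAT = [
    {"SUBJID": "1209", "GROUP": "A", "TRTLABEL": "New medicine"},
    {"SUBJID": "1210", "GROUP": "A", "TRTLABEL": "New medicine"},
    {"SUBJID": "1301", "GROUP": "B", "TRTLABEL": "Placebo"},
]


def merge_on_key(left, right, key="SUBJID"):
    """Combine two datasets on one key field (inner join).

    Assumes the right-hand dataset has at most one record per key value.
    """
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


merged = merge_on_key(DEMO, TREAT)

# Number of subjects per treatment group, ready for a descriptive table.
subjects_per_group = Counter(row["TRTLABEL"] for row in merged)
print(subjects_per_group)
```

Because only the key field SUBJID appears in both datasets, the merged records contain every other field exactly once.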
3. Another example: blood and urine laboratory assessments for all visits are combined in one dataset, so that laboratory result shifts across visits can be checked.
SUBJID   DVISIT      LABP         LABR   UNIT     OUT   CS
1210     13JUN2011   ASAT         68     U/L      2     2
1210     20JUN2011   ASAT         123    U/L      1     1
1210     08AUG2011   ASAT         72     U/L      2     2
1210     15AUG2011   ASAT         52     U/L      2     2
1210     13JUN2011   Creatinine   69     umol/L   2     2
Not all measurements collected in one visit have to be present in one dataset. On the contrary, it is more logical to have different measurements in separate datasets, for example a measurements dataset for small repeating measurements.
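Having all visits in one laboratory dataset makes a shift check straightforward. The Python sketch below uses the laboratory records from the table above and reports every change of the OUT flag between consecutive visits for the same subject and test. The article does not define the OUT codes, so the sketch only detects that the flag changed and does not interpret the direction; the function names are made up for this example.

```python
from datetime import datetime
from itertools import groupby

# Laboratory records from the example table above.
LAB = [
    {"SUBJID": "1210", "DVISIT": "13JUN2011", "LABP": "ASAT", "LABR": 68, "UNIT": "U/L", "OUT": 2, "CS": 2},
    {"SUBJID": "1210", "DVISIT": "20JUN2011", "LABP": "ASAT", "LABR": 123, "UNIT": "U/L", "OUT": 1, "CS": 1},
    {"SUBJID": "1210", "DVISIT": "08AUG2011", "LABP": "ASAT", "LABR": 72, "UNIT": "U/L", "OUT": 2, "CS": 2},
    {"SUBJID": "1210", "DVISIT": "15AUG2011", "LABP": "ASAT", "LABR": 52, "UNIT": "U/L", "OUT": 2, "CS": 2},
    {"SUBJID": "1210", "DVISIT": "13JUN2011", "LABP": "Creatinine", "LABR": 69, "UNIT": "umol/L", "OUT": 2, "CS": 2},
]


def visit_date(row):
    """Parse the DDMONYYYY visit date so visits sort chronologically."""
    return datetime.strptime(row["DVISIT"], "%d%b%Y").date()


def series_key(row):
    """One series per subject and laboratory test."""
    return (row["SUBJID"], row["LABP"])


def find_shifts(lab):
    """List (SUBJID, LABP, DVISIT, previous OUT, current OUT) for each flag change."""
    ordered = sorted(lab, key=lambda row: (series_key(row), visit_date(row)))
    shifts = []
    for (subjid, test), rows in groupby(ordered, key=series_key):
        rows = list(rows)
        # Compare each visit with the visit directly before it.
        for previous, current in zip(rows, rows[1:]):
            if previous["OUT"] != current["OUT"]:
                shifts.append(
                    (subjid, test, current["DVISIT"], previous["OUT"], current["OUT"])
                )
    return shifts


shifts = find_shifts(LAB)
for shift in shifts:
    print(shift)
```

If the visits were spread over one dataset per visit, the same check would first require stacking those datasets.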
4. Datasets that need normalization, which is often more convenient for medical history, inclusion and exclusion criteria, and laboratory data, cannot be combined with non-normalized data in one dataset. Normalized datasets have additional key fields next to subject number and visit number, e.g. a criterion number for an inclusion and exclusion criteria dataset, or a specimen (blood/urine) field and a laboratory test field for a laboratory dataset.
SUBJID   DVISIT      CRITNO   CRIT                       INEX
1210     13JUN2011   4        BMI < 25                   Yes
1210     13JUN2011   5        Is the subject pregnant?   No
Thus the single outcome of the one-time pregnancy test at screening is often added to the demography dataset instead of to the inclusion and exclusion criteria dataset.
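As an illustration of what normalization means here, the Python sketch below turns one wide record (one column per criterion) into one row per criterion, with the criterion number as an additional key field. The wide layout and the column names CRIT4 and CRIT5 are hypothetical, as are the function names; only the normalized layout comes from the table above.

```python
# Hypothetical wide record: one column per criterion (column names are illustrative).
WIDE_RECORD = {"SUBJID": "1210", "DVISIT": "13JUN2011", "CRIT4": "Yes", "CRIT5": "No"}

# Criterion texts as shown in the normalized example table above.
CRITERIA_TEXT = {4: "BMI < 25", 5: "Is the subject pregnant?"}


def normalize(record, criteria_text):
    """Return one row per criterion, with CRITNO as an additional key field."""
    return [
        {
            "SUBJID": record["SUBJID"],
            "DVISIT": record["DVISIT"],
            "CRITNO": critno,
            "CRIT": text,
            "INEX": record[f"CRIT{critno}"],
        }
        for critno, text in sorted(criteria_text.items())
    ]


def has_unique_keys(rows, key_fields=("SUBJID", "DVISIT", "CRITNO")):
    """Check that the key fields identify each row exactly once."""
    keys = [tuple(row[field] for field in key_fields) for row in rows]
    return len(keys) == len(set(keys))


normalized = normalize(WIDE_RECORD, CRITERIA_TEXT)
for row in normalized:
    print(row)
print(has_unique_keys(normalized))
```

The uniqueness check shows why the extra key field matters: subject number and visit date alone no longer identify a single row once the data are normalized.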
5. For identification and search reasons, adverse event and concomitant medication datasets contain adverse event numbers and concomitant medication numbers, respectively.
SUBJID   CONM No.   Medication             Reason given    AE No.
1301     23         Atenolol               Prophylaxis
1301     24         Prednisone             Adverse event   3
1301     25         Acetylsalicylic acid   Adverse event   12
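The search use of these numbers can be sketched as follows. This Python example uses the three concomitant medication records from the table above; the field names CONMNO, MEDICATION, REASON and AENO are hypothetical stand-ins for the column headers, and the function name is made up for this example.

```python
# Concomitant medication records from the example table above.
# Field names are hypothetical; AENO is None when no adverse event is referenced.
CONMED = [
    {"SUBJID": "1301", "CONMNO": 23, "MEDICATION": "Atenolol", "REASON": "Prophylaxis", "AENO": None},
    {"SUBJID": "1301", "CONMNO": 24, "MEDICATION": "Prednisone", "REASON": "Adverse event", "AENO": 3},
    {"SUBJID": "1301", "CONMNO": 25, "MEDICATION": "Acetylsalicylic acid", "REASON": "Adverse event", "AENO": 12},
]


def medications_for_event(conmed, subjid, aeno):
    """Find the medications a subject received for one numbered adverse event."""
    return [
        row["MEDICATION"]
        for row in conmed
        if row["SUBJID"] == subjid and row["AENO"] == aeno
    ]


print(medications_for_event(CONMED, "1301", 3))
```

The adverse event number is only unique within a subject, so the search uses the subject number together with the adverse event number.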
Do you get an idea of how to structure your CRF data in logically grouped datasets?
In practice, get the blank CRF and sit down with the statistician or statistical programmer to logically group all CRF data in datasets. The total number of datasets for a regular clinical study is around 20 to 30. Estimated time to group the CRF data into datasets: 30 minutes. And you will discover which structured format the statistician works with comfortably.
Kind regards, Maritza
© 2011, Maritza Witteveen, ProCDM
You’re welcome to re-publish this newsletter if you add the following text to it. This is an article from Maritza Witteveen of ProCDM. Data management for clinical research. Receive tips and the free e-book ‘Five strategies to get reliable, quality clinical data’ by subscribing via http://www.procdm.nl/pages/knowledgebase.asp.