Publications


Abstract

Introduction: LIFE is a large epidemiological study aimed at identifying the causes of common civilization diseases including adiposity, dementia, and depression. Participants of the study are probands and patients. Probands are randomly selected and invited from the set of Leipzig (Germany) inhabitants, while patients with known diseases are recruited from several local hospitals. Managing these participants, inviting them, contacting them after successful attendance, and supporting nearly all ambulance processes requires complex ambulance management. Each participant is examined with a set of investigation instruments including interviews, questionnaires, device-specific investigations, specimen extractions and analyses. This necessitates complex management of the participant-specific examination program as well as specific input forms and systems that capture administrative data (measurement and process environment or specific set-ups) and scientific data. Additionally, the specimens taken and prepared must be labeled and registered, recording which participant they stem from and in which fridge or bio-tank they are stored. Finally, all captured data from ambulance management, investigation instruments and laboratory analyses need to be integrated before they can be analyzed. These complex processes and requirements necessitate a comprehensive IT infrastructure.

Methods: Our IT infrastructure is modular and consists of several software applications. A main application is responsible for the complex participant and ambulance management. Participant management handles selected participant data and contact information. To protect participants' privacy, a participant identifier (PID) is created for each participant and associated with all data that is subsequently managed and captured. In ambulance management, each participant is associated with a predefined investigation program. This investigation program is represented in our systems by a tracking card that is available both as a print-out and electronically. The electronic version of the tracking card is used by two software applications, the Assessment Battery and the CryoLab. The former coordinates the input of scientific data into online input forms, which are designed in the open source system LimeSurvey. Moreover, the Assessment Battery is used to monitor the input process, i.e., it shows which investigations are already completed and which are still pending. The CryoLab system registers and tracks all specimens taken and is used to annotate extraction and specific preparation processes, e.g., for DNA isolation. It also tracks specimen storage in fridges and bio-tanks. A central component is the metadata repository, which collects metadata from the ambulance management and data input systems. It is the basis for integrating relevant scientific data into a central research database. The integration follows a mapping-based approach. The research database makes raw data and special pre-computations called derivatives available for later data analysis.

Results & Discussion: We designed and implemented a complex and comprehensive IT infrastructure for epidemiological research in LIFE. This infrastructure consists of several software applications which are loosely coupled via specified interfaces. Most of the software applications are new implementations; external software applications are used only for capturing scientific data.
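
The abstract does not say how the mapping-based integration is realized. The following Python sketch only illustrates the general idea of metadata-driven mappings from input form fields to research database columns, keyed by the PID. All names in it (`FieldMapping`, `integrate`, the form, table and column names) are hypothetical and are not taken from the LIFE systems.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class FieldMapping:
    """Hypothetical metadata entry: maps one input form field to one target column."""
    source_form: str
    source_field: str
    target_table: str
    target_column: str
    convert: Callable[[str], object]  # converts the raw text to the target type


def integrate(records: List[dict], mappings: List[FieldMapping]) -> Dict[str, List[dict]]:
    """Apply the mappings to raw records and return rows grouped by target table."""
    index = {(m.source_form, m.source_field): m for m in mappings}
    tables: Dict[str, List[dict]] = {}
    for rec in records:
        rows: Dict[str, dict] = {}
        for field, raw in rec["values"].items():
            mapping = index.get((rec["form"], field))
            if mapping is None:
                continue  # fields without a mapping are not integrated
            # Every target row carries the PID, never the contact data.
            row = rows.setdefault(mapping.target_table, {"pid": rec["pid"]})
            row[mapping.target_column] = mapping.convert(raw)
        for table, row in rows.items():
            tables.setdefault(table, []).append(row)
    return tables


# Illustrative metadata and one raw record (all values invented for the example).
mappings = [
    FieldMapping("anthropometry_form", "q1", "anthropometry", "height_cm", float),
    FieldMapping("anthropometry_form", "q2", "anthropometry", "weight_kg", float),
]
records = [
    {"pid": "PID0001", "form": "anthropometry_form",
     "values": {"q1": "172.5", "q2": "70", "comment": "none"}},
]
research_db = integrate(records, mappings)
```

In this sketch the mappings are plain data, so adding a new instrument means adding metadata entries rather than changing the integration code. That is one plausible reason for choosing a mapping-based design, but the abstract does not state it.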

Authors: Toralf Kirsten, A. Kiel, M. Kleinert, R. Speer, M. Rühle, Hans Binder, Markus Löffler

Date Published: 30th Sep 2013

Publication Type: Not specified

Abstract

Introduction: LIFE is an epidemiological study aimed at discovering causes of common disorders as well as therapy and diagnostic possibilities. It applies a large set (currently more than 400) of complex instruments, including different kinds of interviews, questionnaires, and technical examinations, to thousands of Leipzig inhabitants. Finding correlations in the data, e.g., between diseases on the one hand and a combination of life conditions on the other, requires high-quality data. Data errors reduce this quality, yet avoiding every error is nearly impossible. Therefore, the captured data needs to be validated routinely and revised (curated) when errors are found.

Methods: From the data perspective, we differentiate between two main types of data errors: syntax (format) errors and semantic errors. Syntax errors mostly occur when data needs to be converted to a different data type, e.g., from text to number or from text to date/time fields. This is often the case when data is captured as text by the data input system but should be centrally managed and analyzed in a different format. Hence, the conversion only succeeds when the input value is in the right format. Data conversion is applied when data is transferred from the data input systems to the central research database, which collects all captured data in an integrated and harmonized form. Corrupted data that cannot be converted to the target data type is replaced by a missing value (also called null value, nil, etc.). Defining a default value instead is not sufficient, since the default usually depends on the corresponding question or measurement input field and can distort analysis results if it is not taken into account. Moreover, defining a default for every question or input field would be too time-consuming. Semantic errors are much harder to detect than syntax errors. Typically, they are semantically implausible outliers or are part of other artefacts, e.g., when the data of two input fields is mixed up. Currently, we leave the detection of semantic errors to an epidemiological quality analysis performed by several statisticians. Syntax errors, in contrast, are easy to detect technically; they are logged when they occur while data is transferred from the data input systems to the central research database. To handle both types of errors, we designed and implemented a software application called Curation-DB that allows users to curate (adapt and change) data. In particular, the system lists the syntax errors logged during the data conversion step, which runs nightly. For a listed syntax problem, a user can adapt the current input value by specifying a new (target) value. The corresponding input value is then replaced by the specified value before the next conversion step starts. If the syntax problem is not solved by the current specification, this process can be repeated for the same input value. Semantic errors first need to be detected separately. Then, a user can specify value changes that replace an existing value with the newly specified one.

Results & Discussion: The Curation-DB application is already in use. Currently, selected quality managers routinely check the listed syntax errors. More than 2000 such errors have been curated so far. In the near future, we will extend this software to manage rules that validate research data semantically in order to detect obvious semantic errors automatically.
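
The conversion and curation cycle described in the Methods can be sketched roughly as follows. This is a minimal illustration, not the Curation-DB implementation: the function names, the `(PID, field)` keys, the date format and the sample values are all assumptions made for the example.

```python
from datetime import datetime
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str]  # (participant identifier, input field); hypothetical key layout


def to_number(text: str) -> float:
    """Text to number; raises ValueError on a syntax error."""
    return float(text.strip())


def to_date(text: str):
    """Text to date; the DD.MM.YYYY format is an assumption for this sketch."""
    return datetime.strptime(text.strip(), "%d.%m.%Y").date()


def convert_all(raw: Dict[Key, str],
                converters: Dict[str, Callable[[str], object]],
                curated: Dict[Key, str]):
    """One conversion run: apply curated replacements, convert, log syntax errors."""
    converted: Dict[Key, object] = {}
    error_log: List[Tuple[str, str, str]] = []
    for (pid, field), value in raw.items():
        # A curated value replaces the input value before conversion starts.
        value = curated.get((pid, field), value)
        try:
            converted[(pid, field)] = converters[field](value)
        except ValueError:
            converted[(pid, field)] = None  # missing value, not a default
            error_log.append((pid, field, value))
    return converted, error_log


converters = {"height_cm": to_number, "visit_date": to_date}
raw = {
    ("PID0001", "height_cm"): "172,5",       # decimal comma: not parseable here
    ("PID0001", "visit_date"): "31.02.2013",  # impossible calendar date
    ("PID0002", "height_cm"): "168",
}

# First run: nothing curated yet, so two syntax errors are logged.
first, first_log = convert_all(raw, converters, curated={})

# A user specifies target values for the logged problems; the next run applies them.
curated = {("PID0001", "height_cm"): "172.5", ("PID0001", "visit_date"): "28.02.2013"}
second, second_log = convert_all(raw, converters, curated)
```

If a curated value still cannot be converted, it appears in the log of the next run, which corresponds to the iterative specification process the abstract describes.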

Authors: J. Wagner, A. Kiel, Toralf Kirsten

Date Published: 30th Sep 2013

Publication Type: Not specified

Institute for Medical Informatics, Statistics and Epidemiology, University of Leipzig
