Publications

Abstract

Accurately estimating the length of stay (LOS) of patients admitted to the intensive care unit (ICU) in relation to their health status helps healthcare management allocate appropriate resources and better plan for the future. This paper presents predictive models for the LOS of ICU patients from the MIMIC-IV database based on typical demographic and administrative data, as well as early vital signs and laboratory measurements collected on the first day of ICU stay. The goal of this study was to demonstrate a practical, stepwise approach to predicting a patient’s LOS in the ICU using machine learning and early available typical clinical data. The results show that this approach significantly improves the performance of models for predicting actual LOS in a pragmatic framework that includes only data with short stays predetermined by a prior classification.

Authors: Lars Hempel, Sina Sadeghi, Toralf Kirsten

Date Published: 1st Jun 2023

Publication Type: Journal article

Abstract

BACKGROUND: Clinical trials, epidemiological studies, clinical registries, and other prospective research projects, together with patient care services, are main sources of data in the medical research domain. They often serve as a basis for secondary research in evidence-based medicine, prediction models for disease, and its progression. These data are often neither sufficiently described nor accessible. Related models are often not accessible as a functional program tool for interested users from the health care and biomedical domains. OBJECTIVE: The interdisciplinary project Leipzig Health Atlas (LHA) was developed to close this gap. LHA is an online platform that serves as a sustainable archive providing medical data, metadata, models, and novel phenotypes from clinical trials, epidemiological studies, and other medical research projects. METHODS: Data, models, and phenotypes are described by semantically rich metadata. The platform prefers to share data and models presented in original publications but is also open for nonpublished data. LHA provides and associates unique permanent identifiers for each dataset and model. Hence, the platform can be used to share prepared, quality-assured datasets and models while they are referenced in publications. All managed data, models, and phenotypes in LHA follow the FAIR principles, with public availability or restricted access for specific user groups. RESULTS: The LHA platform is in productive mode (https://www.health-atlas.de/). It is already used by a variety of clinical trial and research groups and is becoming increasingly popular also in the biomedical community. LHA is an integral part of the forthcoming initiative building a national research data infrastructure for health in Germany.

Authors: T. Kirsten, F. A. Meineke, H. Loeffler-Wirth, C. Beger, A. Uciteli, S. Staubert, M. Lobe, R. Hansel, F. G. Rauscher, J. Schuster, T. Peschel, H. Herre, J. Wagner, S. Zachariae, C. Engel, M. Scholz, E. Rahm, H. Binder, M. Loeffler

Date Published: 3rd Aug 2022

Publication Type: Journal article

Abstract

Modern machine and deep learning methods require large datasets to achieve reliable and robust results. This requirement is often difficult to meet in the medical field, due to data sharing limitations imposed by privacy regulations or the presence of a small number of patients (e.g., rare diseases). To address this data scarcity and to improve the situation, novel generative models such as Generative Adversarial Networks (GANs) have been widely used to generate synthetic data that mimic real data by representing features that reflect health-related information without reference to real patients. In this paper, we consider several GAN models to generate synthetic data used for training binary (malignant/benign) classifiers, and compare their performances in terms of classification accuracy with cases where only real data are considered. We aim to investigate how synthetic data can improve classification accuracy, especially when a small amount of data is available. To this end, we have developed and implemented an evaluation framework where binary classifiers are trained on extended datasets containing both real and synthetic data. The results show improved accuracy for classifiers trained with generated data from more advanced GAN models, even when limited amounts of original data are available.

Authors: Masoud Abedi, Lars Hempel, Sina Sadeghi, Toralf Kirsten

Date Published: 1st Jul 2022

Publication Type: Journal article

Abstract

BACKGROUND: In recent years, data-driven medicine has gained increasing importance in terms of diagnosis, treatment, and research due to the exponential growth of health care data. However, data protection regulations prohibit data centralisation for analysis purposes because of potential privacy risks like the accidental disclosure of data to third parties. Therefore, alternative data usage policies, which comply with present privacy guidelines, are of particular interest. OBJECTIVE: We aim to enable analyses on sensitive patient data by simultaneously complying with local data protection regulations using an approach called the Personal Health Train (PHT), which is a paradigm that utilises distributed analytics (DA) methods. The main principle of the PHT is that the analytical task is brought to the data provider and the data instances remain in their original location. METHODS: In this work, we present our implementation of the PHT paradigm, which preserves the sovereignty and autonomy of the data providers and operates with a limited number of communication channels. We further conduct a DA use case on data stored in three different and distributed data providers. RESULTS: We show that our infrastructure enables the training of data models based on distributed data sources. CONCLUSION: Our work presents the capabilities of DA infrastructures in the health care sector, which lower the regulatory obstacles of sharing patient data. We further demonstrate its ability to fuel medical science by making distributed data sets available for scientists or health care practitioners.

Authors: Sascha Welten, Yongli Mou, Laurenz Neumann, Mehrshad Jaberansary, Yeliz Yediel Ucer, Toralf Kirsten, Stefan Decker, Oya Beyan

Date Published: 1st Jun 2022

Publication Type: Journal article

Abstract

Clinical research based on data from patient or study data management systems plays an important role in transferring basic findings into the daily practices of physicians. To support study recruitment, diagnostic processes, and risk factor evaluation, search queries for such management systems can be used. Typically, the query syntax as well as the underlying data structure vary greatly between different data management systems. This makes it difficult for domain experts (e.g., clinicians) to build and execute search queries. In this work, the Core Ontology of Phenotypes is used as a general model for phenotypic knowledge. This knowledge is required to create search queries that determine and classify individuals (e.g., patients or study participants) whose morphology, function, behaviour, or biochemical and physiological properties meet specific phenotype classes. A specific model describing a set of particular phenotype classes is called a Phenotype Specification Ontology. Such an ontology can be automatically converted to search queries on data management systems. The methods described have already been used successfully in several projects. Using ontologies to model phenotypic knowledge on patient or study data management systems is a viable approach. It allows clinicians to model from a domain perspective without knowing the actual data structure or query language.

Authors: Christoph Beger, Franz Matthies, Ralph Schäfermeier, Toralf Kirsten, Heinrich Herre, Alexandr Uciteli

Date Published: 1st May 2022

Publication Type: Journal article

Abstract

The constant upward movement of data-driven medicine as a valuable option to enhance daily clinical practice has brought new challenges for data analysts to get access to valuable but sensitive data due to privacy considerations. One solution for most of these challenges is Distributed Analytics (DA) infrastructures, which are technologies fostering collaborations between healthcare institutions by establishing a privacy-preserving network for data sharing. However, in order to participate in such a network, many technical and administrative prerequisites have to be met, which could pose bottlenecks and new obstacles for non-technical personnel during their deployment. We have identified three major problems in the current state of the art: the missing compliance with FAIR data principles, the automation of processes, and the installation. In this work, we present a seamless on-boarding workflow based on a DA reference architecture for data sharing institutions to address these problems. The on-boarding service manages all technical configurations and necessities to reduce the deployment time. Our aim is to use well-established and conventional technologies to gain acceptance through enhanced ease of use. We evaluate our development with six institutions across Germany by conducting a DA study with open-source breast cancer data, which represents the second contribution of this work. We find that our on-boarding solution lowers technical barriers and efficiently deploys all necessary components and is, therefore, indeed an enabler for collaborative data sharing.

Authors: Sascha Welten, Lars Hempel, Masoud Abedi, Yongli Mou, Mehrshad Jaberansary, Laurenz Neumann, Sven Weber, Kais Tahar, Yeliz Ucer Yediel, Matthias Löbe, Stefan Decker, Oya Beyan, Toralf Kirsten

Date Published: 1st Apr 2022

Publication Type: Journal article

Abstract

Sharing data is of great importance for research in medical sciences. It is the basis for reproducibility and reuse of already generated outcomes in new projects and in new contexts. FAIR data principles are the basics for sharing data. The Leipzig Health Atlas (LHA) platform follows these principles and provides data, describing metadata, and models that have been implemented in novel software tools and are available as demonstrators. LHA reuses and extends three different major components that have been previously developed by other projects. The SEEK management platform is the foundation providing a repository for archiving, presenting, and securely sharing a wide range of publication results, such as published reports, (bio)medical data as well as interactive models and tools. The LHA Data Portal manages study metadata and data, allowing users to search for data of interest. Finally, PhenoMan is an ontological framework for phenotype modelling. This paper describes the interrelation of these three components. In particular, we use the PhenoMan to, firstly, model and represent phenotypes within the LHA platform. Then, secondly, the ontological phenotype representation can be used to generate search queries that are executed by the LHA Data Portal. The PhenoMan generates the queries in a novel domain-specific query language (SDQL), which is specific for data management systems based on the CDISC ODM standard, such as the LHA Data Portal. Our approach was successfully applied to represent phenotypes in the Leipzig Health Atlas with the possibility to execute corresponding queries within the LHA Data Portal.

Authors: A. Uciteli, C. Beger, J. Wagner, A. Kiel, F. A. Meineke, S. Staubert, M. Lobe, R. Hansel, J. Schuster, T. Kirsten, H. Herre

Date Published: 24th May 2021

Publication Type: Journal article

Abstract

Sharing data is of great importance for research in medical sciences. It is the basis for reproducibility and reuse of already generated outcomes in new projects and in new contexts. FAIR data principles are the basics for sharing data. The Leipzig Health Atlas (LHA) platform follows these principles and provides data, describing metadata, and models that have been implemented in novel software tools and are available as demonstrators. LHA reuses and extends three different major components that have been previously developed by other projects. The SEEK management platform is the foundation providing a repository for archiving, presenting, and securely sharing a wide range of publication results, such as published reports, (bio)medical data as well as interactive models and tools. The LHA Data Portal manages study metadata and data, allowing users to search for data of interest. Finally, PhenoMan is an ontological framework for phenotype modelling. This paper describes the interrelation of these three components. In particular, we use the PhenoMan to, firstly, model and represent phenotypes within the LHA platform. Then, secondly, the ontological phenotype representation can be used to generate search queries that are executed by the LHA Data Portal. The PhenoMan generates the queries in a novel domain-specific query language (SDQL), which is specific for data management systems based on the CDISC ODM standard, such as the LHA Data Portal. Our approach was successfully applied to represent phenotypes in the Leipzig Health Atlas with the possibility to execute corresponding queries within the LHA Data Portal.

Authors: Alexandr Uciteli, Christoph Beger, Jonas Wagner, Alexander Kiel, Frank A Meineke, Sebastian Stäubert, Matthias Löbe, René Hänsel, Judith Schuster, Toralf Kirsten, Heinrich Herre

Date Published: 1st May 2021

Publication Type: InCollection

Abstract

Planning clinical studies to check medical hypotheses requires the specification of eligibility criteria in order to identify potential study participants. Electronically available patient data can support the recruitment of patients for studies. The Smart Medical Information Technology for Healthcare (SMITH) consortium aims to establish data integration centres to enable the innovative use of available healthcare data for research and treatment optimization. The data from the electronic health record of patients in the participating hospitals is integrated into a Health Data Storage based on the Fast Healthcare Interoperability Resources standard (FHIR), developed by HL7. In SMITH, FHIR Search is used to query the integrated data. An investigation has shown the advantages and disadvantages of using FHIR Search for specifying eligibility criteria. This paper presents an approach for modelling eligibility criteria as well as for generating and executing FHIR Search queries. Our solution is based on the Phenotype Manager, a general ontological phenotyping framework to model and calculate phenotypes using the Core Ontology of Phenotypes.

Authors: A. Uciteli, C. Beger, J. Wagner, T. Kirsten, F. A. Meineke, S. Staubert, M. Lobe, H. Herre

Date Published: 26th Apr 2021

Publication Type: Journal article

Abstract

BACKGROUND: The successful determination and analysis of phenotypes plays a key role in the diagnostic process, the evaluation of risk factors and the recruitment of participants for clinical and epidemiological studies. The development of computable phenotype algorithms to solve these tasks is a challenging problem, caused by various reasons. Firstly, the term 'phenotype' has no generally agreed definition and its meaning depends on context. Secondly, the phenotypes are most commonly specified as non-computable descriptive documents. Recent attempts have shown that ontologies are a suitable way to handle phenotypes and that they can support clinical research and decision making. The SMITH Consortium is dedicated to rapidly establish an integrative medical informatics framework to provide physicians with the best available data and knowledge and enable innovative use of healthcare data for research and treatment optimisation. In the context of a methodological use case 'phenotype pipeline' (PheP), a technology to automatically generate phenotype classifications and annotations based on electronic health records (EHR) is developed. A large series of phenotype algorithms will be implemented. This implies that for each algorithm a classification scheme and its input variables have to be defined. Furthermore, a phenotype engine is required to evaluate and execute developed algorithms. RESULTS: In this article, we present a Core Ontology of Phenotypes (COP) and the software Phenotype Manager (PhenoMan), which implements a novel ontology-based method to model, classify and compute phenotypes from already available data. Our solution includes an enhanced iterative reasoning process combining classification tasks with mathematical calculations at runtime. The ontology as well as the reasoning method were successfully evaluated with selected phenotypes including SOFA score, socio-economic status, body surface area and WHO BMI classification based on available medical data. 
CONCLUSIONS: We developed a novel ontology-based method to model phenotypes of living beings with the aim of automated phenotype reasoning based on available data. This new approach can be used in clinical context, e.g., for supporting the diagnostic process, evaluating risk factors, and recruiting appropriate participants for clinical and epidemiological studies.

Authors: A. Uciteli, C. Beger, T. Kirsten, F. A. Meineke, H. Herre

Date Published: 21st Dec 2020

Publication Type: Journal article

Abstract

BACKGROUND: The successful determination and analysis of phenotypes plays a key role in the diagnostic process, the evaluation of risk factors and the recruitment of participants for clinical and epidemiological studies. The development of computable phenotype algorithms to solve these tasks is a challenging problem, caused by various reasons. Firstly, the term ’phenotype’ has no generally agreed definition and its meaning depends on context. Secondly, the phenotypes are most commonly specified as non-computable descriptive documents. Recent attempts have shown that ontologies are a suitable way to handle phenotypes and that they can support clinical research and decision making. The SMITH Consortium is dedicated to rapidly establish an integrative medical informatics framework to provide physicians with the best available data and knowledge and enable innovative use of healthcare data for research and treatment optimisation. In the context of a methodological use case ’phenotype pipeline’ (PheP), a technology to automatically generate phenotype classifications and annotations based on electronic health records (EHR) is developed. A large series of phenotype algorithms will be implemented. This implies that for each algorithm a classification scheme and its input variables have to be defined. Furthermore, a phenotype engine is required to evaluate and execute developed algorithms. RESULTS: In this article, we present a Core Ontology of Phenotypes (COP) and the software Phenotype Manager (PhenoMan), which implements a novel ontology-based method to model, classify and compute phenotypes from already available data. Our solution includes an enhanced iterative reasoning process combining classification tasks with mathematical calculations at runtime. The ontology as well as the reasoning method were successfully evaluated with selected phenotypes including SOFA score, socio-economic status, body surface area and WHO BMI classification based on available medical data. 
CONCLUSIONS: We developed a novel ontology-based method to model phenotypes of living beings with the aim of automated phenotype reasoning based on available data. This new approach can be used in clinical context, e.g., for supporting the diagnostic process, evaluating risk factors, and recruiting appropriate participants for clinical and epidemiological studies.

Authors: Alexandr Uciteli, Christoph Beger, Toralf Kirsten, Frank A Meineke, Heinrich Herre

Date Published: 1st Dec 2020

Publication Type: Journal article

Abstract

Purpose: The onset and progression of optic neuropathies like glaucoma often occurs asymmetrically between the two eyes of a patient. Interocular circumpapillary retinal nerve fiber layer thickness (cpRNFLT) differences could detect disease earlier. To apply such differences diagnostically, detailed location specific norms are necessary. Methods: Spectral-domain optical coherence tomography cpRNFLT circle scans from the population-based Leipzig Research Centre for Civilization Diseases–Adult study were selected. At each of the 768 radial scanning locations, normative interocular cpRNFLT difference distributions were calculated based on age and interocular radius difference. Results: A total of 8966 cpRNFLT scans of healthy eyes (4483 patients; 55% female; age range, 20–79 years) were selected. Global cpRNFLT average was 1.53 µm thicker in right eyes (P < 2.2 × 10⁻¹⁶). On 96% of the 768 locations, left minus right eye differences were significant (P < 0.05), varying between +11.6 µm (superonasal location) and −11.8 µm (nasal location). Increased age and difference in interocular scanning radii were associated with an increased mean and variance of interocular cpRNFLT difference at most retinal locations, apart from the area temporal to the inferior RNF bundle where cpRNFLT becomes more similar between eyes with age. Conclusions: We provide pointwise normative distributions of interocular cpRNFLT differences at an unprecedentedly high spatial resolution of 768 A-scans and reveal considerable location specific asymmetries as well as their associations with age and scanning radius differences between eyes. Translational Relevance: To facilitate clinical application, we implement these age- and radius-specific norms across all 768 locations in an open-source software to generate patient-specific normative color plots.

Authors: Neda Baniasadi, Franziska G. Rauscher, Dian Li, Mengyu Wang, Eun Young Choi, Hui Wang, Thomas Peschel, Kerstin Wirkner, Toralf Kirsten, Joachim Thiery, Christoph Engel, Markus Loeffler, Tobias Elze

Date Published: 3rd Aug 2020

Publication Type: Journal article

Abstract

The successful determination and analysis of phenotypes plays a key role in the diagnostic process, the evaluation of risk factors and the recruitment of participants for clinical and epidemiological studies. The development of computable phenotype algorithms to solve these tasks is a challenging problem, caused by various reasons. Firstly, the term ‘phenotype’ has no generally agreed definition and its meaning depends on context. Secondly, the phenotypes are most commonly specified as non-computable descriptive documents. Recent attempts have shown that ontologies are a suitable way to handle phenotypes and that they can support clinical research and decision making. The SMITH Consortium is dedicated to rapidly establish an integrative medical informatics framework to provide physicians with the best available data and knowledge and enable innovative use of healthcare data for research and treatment optimization. In the context of a methodological use case “phenotype pipeline” (PheP), a technology to automatically generate phenotype classifications and annotations based on electronic health records (EHR) is developed. A large series of phenotype algorithms will be implemented. This implies that for each algorithm a classification scheme and its input variables have to be defined. Furthermore, a phenotype engine is required to evaluate and execute developed algorithms. In this article we present a Core Ontology of Phenotypes (COP) and a software Phenotype Manager (PhenoMan), which implements a novel ontology-based method to model and calculate phenotypes. Our solution includes an enhanced iterative reasoning process combining classification tasks with mathematical calculations at runtime. The ontology as well as the reasoning method were successfully evaluated based on different phenotypes (including SOFA score, socioeconomic status, body surface area and WHO BMI classification) and several data sets.

Authors: Alexandr Uciteli, Christoph Beger, Toralf Kirsten, Frank A. Meineke, Heinrich Herre

Date Published: 20th Dec 2019

Publication Type: InProceedings

Abstract

PURPOSE: To investigate the role of sex on retinal nerve fiber layer (RNFL) thickness at 768 circumpapillary locations based on OCT findings. DESIGN: Population-based cross-sectional study. PARTICIPANTS: We investigated 5646 eyes of 5646 healthy participants from the Leipzig Research Centre for Civilization Diseases (LIFE)-Adult Study of a predominantly white population. METHODS: All participants underwent standardized systemic assessments and ocular imaging. Circumpapillary RNFL (cRNFL) thickness was measured at 768 points equidistant from the optic nerve head using spectral-domain OCT (Spectralis; Heidelberg Engineering, Heidelberg, Germany). To control ocular magnification effects, the true scanning radius was estimated by scanning focus. Student t test was used to evaluate sex differences in cRNFL thickness globally and at each of the 768 locations. Multivariable linear regression and analysis of variance were used to evaluate individual contributions of various factors to cRNFL thickness variance. MAIN OUTCOME MEASURES: Difference in cRNFL thickness between males and females. RESULTS: Our population consisted of 54.8% females. The global cRNFL thickness was 1 µm thicker in females (P < 0.001). However, detailed analysis at each of the 768 locations revealed substantial location specificity of the sex effects, with RNFL thickness difference ranging from -9.98 to +8.00 µm. Females showed significantly thicker RNFLs in the temporal, superotemporal, nasal, inferonasal, and inferotemporal regions (43.6% of 768 locations), whereas males showed significantly thicker RNFLs in the superior region (13.2%). The results were similar after adjusting for age, body height, and scanning radius. The superotemporal and inferotemporal RNFL peaks shifted temporally in females by 2.4° and 1.9°, respectively.
On regions with significant sex effects, sex explained more RNFL thickness variance than age, whereas the major peak locations and interpeak angle explained most of the RNFL thickness variance unexplained by sex. CONCLUSIONS: Substantial sex effects on cRNFL thickness were found at 56.8% of all 768 circumpapillary locations, with specific patterns for different sectors. Over large regions, sex was at least as important in explaining the cRNFL thickness variance as was age, which is well established to have a substantial impact on cRNFL thickness. Including sex in the cRNFL thickness norm could therefore improve glaucoma diagnosis and monitoring.

Authors: D. Li, F. G. Rauscher, E. Y. Choi, M. Wang, N. Baniasadi, K. Wirkner, T. Kirsten, J. Thiery, C. Engel, M. Loeffler, T. Elze

Date Published: 17th Nov 2019

Publication Type: Journal article

Abstract

The need for research data management has been recognized by the research community: sponsors, legislators, and publishers expect and promote adherence to good scientific practice, which covers not only archiving but also the availability of research data and results in line with the FAIR principles. The Leipzig Health Atlas (LHA) is a project for presenting and exchanging a broad spectrum of publications, (bio)medical data (e.g., clinical, epidemiological, molecular), models, and tools, e.g., for risk calculation in health research. The consortium partners cover a wide range of scientific disciplines, from medical systems biology through clinical and epidemiological research to ontological and dynamic modelling. Currently, 18 research consortia are involved (including in the domains of lymphomas, gliomas, sepsis, and hereditary colorectal and breast cancer), which collect data from clinical trials, patient cohorts, and epidemiological cohorts, some with extensive molecular and genetic profiles. The modelling comprises algorithmic phenotype classification, risk prediction, and disease dynamics. In a first development phase, we were able to show that our web-based platform is suitable for (1) providing methods to make individual patient data from publications accessible for further use, (2) presenting algorithmic tools for phenotyping and risk profiling, (3) making tools for running dynamic disease and therapy models interactively available, and (4) providing structured metadata on quantitative and qualitative characteristics. Semantic data integration supplies the technologies (ontologies and data mining tools) for (semantic) data integration and knowledge enrichment.
In addition, it provides tools for linking one's own data, analysis results, and publicly accessible data and metadata repositories, as well as for condensing complex data. A working group on application development and validation develops innovative paradigmatic applications for (1) clinical decision making for cancer studies, genetic counselling, risk prediction models, and tissue and disease models, and (2) applications (so-called apps) that focus on characterizing new phenotypes (e.g., 'omics' features, body types, reference values) from epidemiological studies. These applications are specified jointly with clinical experts, geneticists, systems biologists, biometricians, and bioinformaticians. The LHA provides integration technology and implements the applications for the user communities using various presentation tools and technologies (e.g., R-Shiny, i2b2, Kubernetes, SEEK). This requires curating the data and metadata before upload, obtaining permissions from the data owners, taking the required data protection criteria into account, and checking semantic annotations. In addition, the supplied model algorithms are prepared in a quality-assured manner and, where applicable, made available interactively online. The LHA is aimed in particular at clinicians, epidemiologists, molecular geneticists, human geneticists, pathologists, biostatisticians, and modellers, but is publicly accessible at www.healthatlas.de; for legal reasons, access to certain applications and datasets requires additional authorization. The project is funded through the BMBF programme i:DSem (Integrative Datensemantik für die Systemmedizin, funding code 031L0026).

Authors: F. A. Meineke, Sebastian Stäubert, Matthias Löbe, C. Beger, René Hänsel, A. Uciteli, H. Binder, T. Kirsten, M. Scholz, H. Herre, C. Engel, Markus Löffler

Date Published: 19th Sep 2019

Publication Type: Misc

Abstract

Secondary use of electronic health record (EHR) data requires a detailed description of metadata, especially when data collection and data re-use are organizationally and technically far apart. This paper describes the concept of the SMITH consortium that includes conventions, processes, and tools for describing and managing metadata using common standards for semantic interoperability. It deals in particular with the chain of processing steps of data from existing information systems and provides an overview of the planned use of metadata, medical terminologies, and semantic services in the consortium.

Authors: M. Lobe, O. Beyan, S. Staubert, F. Meineke, D. Ammon, A. Winter, S. Decker, M. Loffler, T. Kirsten

Date Published: 21st Aug 2019

Publication Type: Journal article

Abstract

Not specified

Authors: Matthias Löbe, O. Beyan, Sebastian Stäubert, Frank A. Meineke, D. Ammon, Alfred Winter, S. Decker, Markus Löffler, Toralf Kirsten

Date Published: 2019

Publication Type: InProceedings

Abstract

3D-body scanning anthropometry is a suitable method for characterization of physiological development of children and adolescents, and for understanding onset and progression of disorders like overweight and obesity. Here we present a novel body typing approach to describe and to interpret longitudinal 3D-body scanning data of more than 800 children and adolescents measured in up to four follow-ups in intervals of 1 year, referring to an age range between 6 and 18 years. We analyzed transitions between body types assigned to lower-, normal- and overweight participants upon development of children and adolescents. We found a virtually parallel development of the body types with only a few transitions between them. Body types of children and adolescents tend to conserve their weight category. 3D body scanning anthropometry in combination with body typing constitutes a novel option to investigate onset and progression of obesity in children.

Authors: H. Loeffler-Wirth, M. Vogel, T. Kirsten, F. Glock, T. Poulain, A. Korner, M. Loeffler, W. Kiess, H. Binder

Date Published: 14th Sep 2018

Publication Type: Not specified

Human Diseases: obesity

Abstract

INTRODUCTION: This article is part of the Focus Theme of Methods of Information in Medicine on the German Medical Informatics Initiative. "Smart Medical Information Technology for Healthcare (SMITH)" is one of four consortia funded by the German Medical Informatics Initiative (MI-I) to create an alliance of universities, university hospitals, research institutions and IT companies. SMITH's goals are to establish Data Integration Centers (DICs) at each SMITH partner hospital and to implement use cases which demonstrate the usefulness of the approach. OBJECTIVES: To give insight into architectural design issues underlying SMITH data integration and to introduce the use cases to be implemented. GOVERNANCE AND POLICIES: SMITH implements a federated approach for both its governance structure and its information system architecture. SMITH has designed a generic concept for its data integration centers. They share identical services and functionalities to take best advantage of the interoperability architectures and of the planned data use and access process. The DICs provide access to the local hospitals' Electronic Medical Records (EMR). This is based on data trustee and privacy management services. DIC staff will curate and amend EMR data in the Health Data Storage. METHODOLOGY AND ARCHITECTURAL FRAMEWORK: To share medical and research data, SMITH's information system is based on communication and storage standards. We use the Reference Model of the Open Archival Information System and will consistently implement profiles of Integrating the Healthcare Enterprise (IHE) and Health Level Seven (HL7) standards. Standard terminologies will be applied. The SMITH Market Place will be used for devising agreements on data access and distribution.
3LGM² for enterprise architecture modeling supports a consistent development process. The DIC reference architecture determines the services, applications and the standards-based communication links needed for efficiently supporting the ingesting, data nourishing, trustee, privacy management and data transfer tasks of the SMITH DICs. The reference architecture is adopted at the local sites. Data sharing services and the market place enable interoperability. USE CASES: The methodological use case "Phenotype Pipeline" (PheP) constructs algorithms for annotations and analyses of patient-related phenotypes according to classification rules or statistical models based on structured data. Unstructured textual data will be subject to natural language processing to permit integration into the phenotyping algorithms. The clinical use case "Algorithmic Surveillance of ICU Patients" (ASIC) focuses on patients in Intensive Care Units (ICU) with acute respiratory distress syndrome (ARDS). A model-based decision-support system will give advice for mechanical ventilation. The clinical use case HELP develops a "hospital-wide electronic medical record-based computerized decision support system to improve outcomes of patients with blood-stream infections". ASIC and HELP use the PheP. The clinical benefit of the use cases ASIC and HELP will be demonstrated in a change-of-care clinical trial based on a stepped-wedge design. DISCUSSION: SMITH's strength is the modular, reusable IT architecture based on interoperability standards, the integration of the hospitals' information management departments and the public-private partnership. The project aims at sustainability beyond the first 4-year funding period.

Authors: A. Winter, S. Staubert, D. Ammon, S. Aiche, O. Beyan, V. Bischoff, P. Daumke, S. Decker, G. Funkat, J. E. Gewehr, A. de Greiff, S. Haferkamp, U. Hahn, A. Henkel, T. Kirsten, T. Kloss, J. Lippert, M. Lobe, V. Lowitsch, O. Maassen, J. Maschmann, S. Meister, R. Mikolajczyk, M. Nuchter, M. W. Pletz, E. Rahm, M. Riedel, K. Saleh, A. Schuppert, S. Smers, A. Stollenwerk, S. Uhlig, T. Wendt, S. Zenker, W. Fleig, G. Marx, A. Scherag, M. Loffler

Date Published: 18th Jul 2018

Publication Type: Journal article

Abstract

Not specified

Authors: R. Karim Md, B-Ph. Nguyen, L. Zimmermann, Toralf Kirsten, Matthias Löbe, Frank A. Meineke, H. Stenzhorn, O. Kohlbacker, S. Decker, O. Beyan

Date Published: 2018

Publication Type: InProceedings

Abstract

Not specified

Authors: Alfred Winter, Sebastian Stäubert, Danny Ammon, Stephan Aiche, Oya Beyan, Verena Bischoff, Philipp Daumke, Stefan Decker, Gert Funkat, Jan Erik Gewehr, Armin de Greiff, Silke Haferkamp, Udo Hahn, Andreas Henkel, Toralf Kirsten, Thomas Klöss, Jörg Lippert, Matthias Löbe, Volker Lowitsch, Oliver Maassen, Jens Maschmann, Sven Meister, Rafael Mikolajczyk, Matthias Nüchter, Mathias W. Pletz, Erhard Rahm, Morris Riedel, Kutaiba Saleh, Andreas Schuppert, Stefan Smers, André Stollenwerk, Stefan Uhlig, Thomas Wendt, Sven Zenker, Wolfgang Fleig, Gernot Marx, André Scherag, Markus Löffler

Date Published: 2018

Publication Type: Journal article

Abstract

Optical coherence tomography (OCT) manufacturers graphically present circumpapillary retinal nerve fiber layer thickness (cpRNFLT) together with normative limits to support clinicians in diagnosing ophthalmic diseases. The impact of age on cpRNFLT is typically implemented by linear models. cpRNFLT is strongly location-specific, whereas previously published norms are typically restricted to coarse sectors and based on small populations. Furthermore, OCT devices neglect impacts of lens or eye size on the diameter of the cpRNFLT scan circle so that the diameter substantially varies over different eyes. We investigate the impact of age and scan diameter reported by Spectralis spectral-domain OCT on cpRNFLT in 5646 subjects with healthy eyes. We provide cpRNFLT by age and diameter at 768 angular locations. Age/diameter were significantly related to cpRNFLT on 89%/92% of the circle, respectively (pointwise linear regression), and to shifts in cpRNFLT peak locations. For subjects from age 42.1 onward but not below, increasing age significantly decreased scan diameter (r=-0.28, p<0.001), which suggests that pathological cpRNFLT thinning over time may be underestimated in elderly compared to younger subjects, as scan diameter decrease correlated with cpRNFLT increase. Our detailed numerical results may help to generate various correction models to improve diagnosing and monitoring optic neuropathies.

Authors: M. Wang, T. Elze, D. Li, N. Baniasadi, K. Wirkner, T. Kirsten, J. Thiery, M. Loeffler, C. Engel, F. G. Rauscher

Date Published: 25th Dec 2017

Publication Type: Journal article

Abstract

Three-dimensional (3D-) body scanning of children and adolescents allows the detailed study of physiological development in terms of anthropometrical alterations which potentially provide early onset markers for obesity. Here, we present a systematic analysis of body scanning data of 2,700 urban children and adolescents in the age range between 5 and 18 years with the special aim to stratify the participants into distinct body shape types and to describe their change upon development. In a first step, we extracted a set of eight representative meta-measures from the data. Each of them collects a related group of anthropometrical features and changes specifically upon aging. In a second step we defined seven body types by clustering the meta-measures of all participants. These body types describe the body shapes in terms of three weight (lower, normal and overweight) and three age (young, medium and older) categories. For younger children (age of 5-10 years) we found a common 'early childhood body shape' which splits into three weight-dependent types for older children, with one or two years delay for boys. Our study shows that the concept of body types provides a reliable option for the anthropometric characterization of developing and aging populations.

Authors: H. Loeffler-Wirth, M. Vogel, T. Kirsten, F. Glock, T. Poulain, A. Korner, M. Loeffler, W. Kiess, H. Binder

Date Published: 21st Oct 2017

Publication Type: Not specified

Human Diseases: obesity

Abstract

BACKGROUND Conventional anthropometric measurements are time consuming and require well trained medical staff. To use three-dimensional whole body laser scanning in daily clinical work, validity and reliability have to be confirmed. METHODS We compared a whole body laser scanner with conventional anthropometry in a group of 473 children and adolescents from the Leipzig Research Centre for Civilization Diseases (LIFE-Child). Concordance correlation coefficients (CCC) were calculated separately for sex, weight, and age to assess validity. Overall CCC (OCCC) was used to analyze intraobserver reliability. RESULTS Body height and the circumferences of waist, hip, upper arm, and calf had an "excellent" (CCC ≥0.9); neck and thigh circumference, a "good" (CCC ≥0.7); and head circumference, a "low" (CCC <0.5) degree of concordance over the complete study population. We observed dependencies of validity on sex, weight, and age. Intraobserver reliability of both techniques is "excellent" (OCCC ≥0.9). CONCLUSION Scanning is faster, requires less intensive staff training and provides more information. It can be used in an epidemiologic setting with children and adolescents but some measurements should be considered with caution due to reduced agreement with conventional anthropometry.

Authors: Fabian Glock, Mandy Vogel, Stephanie Naumann, Andreas Kuehnapfel, Markus Scholz, Andreas Hiemisch, Toralf Kirsten, Kristin Rieger, Antje Koerner, Markus Loeffler, Wieland Kiess

Date Published: 1st May 2017

Publication Type: Journal article
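
The concordance correlation coefficient (CCC) reported in the scanner validation abstract above can be illustrated with a short sketch. It assumes Lin's standard definition, CCC = 2·s_xy / (s_x² + s_y² + (mean_x − mean_y)²), computed with population (1/n) moments; the abstract does not state which estimator or software the authors used. The function name and the sample values are hypothetical and serve only as an illustration.

```python
from statistics import fmean
from typing import Sequence


def concordance_cc(x: Sequence[float], y: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient for paired measurements.

    Assumption: population (1/n) variances and covariance are used.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0:
        raise ValueError("CCC is undefined when both series are constant and equal")
    return 2 * cov_xy / denominator


# Hypothetical waist circumferences in cm: tape measure vs. body scanner.
tape = [60.0, 65.5, 71.0, 58.2, 80.4]
scanner = [61.1, 66.0, 72.3, 59.0, 81.9]
ccc = concordance_cc(tape, scanner)
```

Unlike the Pearson correlation, the CCC also penalizes a systematic offset between the two methods: two series that differ by a constant shift are perfectly correlated but have a CCC below 1.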

Abstract

Clinical and epidemiological studies are commonly used in medical sciences. They typically collect data by using different input forms and information systems. Metadata describing input forms, database schemas and input systems are used for data integration but are typically distributed over different software tools; each uses portions of metadata, such as for loading (ETL), data presentation and analysis. In this paper, we describe an approach managing metadata centrally and consistently in a dedicated Metadata Repository (MDR). Metadata can be provided to different tools. Moreover, the MDR includes a matching component creating schema mappings as a prerequisite to integrate captured medical data. We describe the approach, the MDR infrastructure and provide algorithms for creating schema mappings. Finally, we show selected evaluation results. The MDR is fully operational and used to integrate data from a multitude of input forms and systems in the epidemiological study LIFE.

Authors: Toralf Kirsten, A. Kiel, M. Rühle, J. Wagner

Date Published: 2nd Mar 2017

Publication Type: Not specified

Abstract

Prognostically relevant pathways of leukocyte involvement in human myocardial ischemic-reperfusion injury are largely unknown. We enrolled 136 patients with ST-elevation myocardial infarction (STEMI) after primary angioplasty within 12 h after onset of symptoms. Following reperfusion, whole blood was collected within a median time interval of 20 h (interquartile range: 15-25 h) for genome-wide gene expression analysis. Subsequent CMR scans were performed using a standard protocol to determine infarct size (IS), area at risk (AAR), myocardial salvage index (MSI) and the extent of late microvascular obstruction (lateMO). We found 398 genes associated with lateMO and two genes with IS. Neither AAR nor MSI showed significant correlations with gene expression. Genes correlating with lateMO were strongly related to several canonical pathways, including positive regulation of T-cell activation (p = 3.44 × 10⁻⁵) and regulation of inflammatory response (p = 1.86 × 10⁻³). Network analysis of multiple gene expression alterations associated with larger lateMO identified the following functional consequences: facilitated utilisation and decreased concentration of free fatty acid, repressed cell differentiation, enhanced phagocyte movement, increased cell death, vascular disease and compensatory vasculogenesis. In conclusion, the extent of lateMO after acute, reperfused STEMI correlated with altered activation of multiple genes related to fatty acid utilisation, lymphocyte differentiation, phagocyte mobilisation, cell survival, and vascular dysfunction.

Authors: A. Teren, H. Kirsten, F. Beutner, M. Scholz, L. M. Holdt, D. Teupser, M. Gutberlet, J. Thiery, G. Schuler, I. Eitel

Date Published: 3rd Feb 2017

Publication Type: Journal article

Human Diseases: myocardial infarction

Abstract

The LIFE Child study is a large population-based longitudinal childhood cohort study conducted in the city of Leipzig, Germany. As a part of LIFE, a research project conducted at the Leipzig Research Center for Civilization Diseases, it aims to monitor healthy child development from birth to adulthood and to understand the development of lifestyle diseases such as obesity. The study consists of three interrelated cohorts: the birth cohort, the health cohort, and the obesity cohort. Depending on age and cohort, the comprehensive study program comprises different medical, psychological, and sociodemographic assessments as well as the collection of biological samples. Optimal data acquisition, process management, and data analysis are guaranteed by a professional team of physicians, certified study assistants, quality managers, scientists and statisticians. Due to the high popularity of the study, more than 3000 children had participated by the end of 2015, and two-thirds of them participate continuously. The large quantity of acquired data allows LIFE Child to gain profound knowledge on the development of children growing up in the twenty-first century. This article reports the amount of available and analyzable data and demonstrates the high relevance and potential of the study.

Authors: T. Poulain, R. Baber, M. Vogel, D. Pietzner, T. Kirsten, A. Jurkutat, A. Hiemisch, A. Hilbert, J. Kratzsch, J. Thiery, M. Fuchs, C. Hirsch, F. G. Rauscher, M. Loeffler, A. Korner, M. Nuchter, W. Kiess

Date Published: 2nd Feb 2017

Publication Type: Journal article

Abstract

Epidemiological studies analyze and monitor the health state of a population. They typically use different methods and techniques to capture and to integrate data of interest before they can be analyzed. As new technologies and, thus, devices are available for data capturing, such as wearables, new requirements arise for current data integration approaches. In this paper, we review current techniques and approaches as well as new trends in data capturing and the resulting requirement for its integration.

Authors: T. Kirsten, J. Bumberger, G. Ivanova, P. Dietrich, C. Engel, M. Loeffler, W. Kiess

Date Published: 2017

Publication Type: InBook

Abstract

Not specified

Authors: U. Sax, C. Bauer, T. Ganslandt, T. Kirsten

Date Published: 2016

Publication Type: Not specified

Abstract

BACKGROUND: The LIFE-Adult-Study is a population-based cohort study, which has recently completed the baseline examination of 10,000 randomly selected participants from Leipzig, a major city with 550,000 inhabitants in the east of Germany. It is the first study of this kind and size in an urban population in the eastern part of Germany. The study is conducted by the Leipzig Research Centre for Civilization Diseases (LIFE). Our objective is to investigate prevalences, early onset markers, genetic predispositions, and the role of lifestyle factors of major civilization diseases, with primary focus on metabolic and vascular diseases, heart function, cognitive impairment, brain function, depression, sleep disorders and vigilance dysregulation, retinal and optic nerve degeneration, and allergies. METHODS/DESIGN: The study covers a main age range from 40-79 years with particularly deep phenotyping in elderly participants above the age of 60. The baseline examination was conducted from August 2011 to November 2014. All participants underwent an extensive core assessment programme (5-6 h) including structured interviews, questionnaires, physical examinations, and biospecimen collection. Participants over 60 underwent two additional assessment programmes (3-4 h each) on two separate visits including deeper cognitive testing, brain magnetic resonance imaging, diagnostic interviews for depression, and electroencephalography. DISCUSSION: The participation rate was 33%. The assessment programme was well accepted and completed in full by almost all participants. Biomarker analyses have already been performed in all participants. Genotype, transcriptome and metabolome analyses have been conducted in subgroups. The first follow-up examination will commence in 2016.

Authors: M. Loeffler, C. Engel, P. Ahnert, D. Alfermann, K. Arelin, R. Baber, F. Beutner, H. Binder, E. Brahler, R. Burkhardt, U. Ceglarek, C. Enzenbach, M. Fuchs, H. Glaesmer, F. Girlich, A. Hagendorff, M. Hantzsch, U. Hegerl, S. Henger, T. Hensch, A. Hinz, V. Holzendorf, D. Husser, A. Kersting, A. Kiel, T. Kirsten, J. Kratzsch, K. Krohn, T. Luck, S. Melzer, J. Netto, M. Nuchter, M. Raschpichler, F. G. Rauscher, S. G. Riedel-Heller, C. Sander, M. Scholz, P. Schonknecht, M. L. Schroeter, J. C. Simon, R. Speer, J. Staker, R. Stein, Y. Stobel-Richter, M. Stumvoll, A. Tarnok, A. Teren, D. Teupser, F. S. Then, A. Tonjes, R. Treudler, A. Villringer, A. Weissgerber, P. Wiedemann, S. Zachariae, K. Wirkner, J. Thiery

Date Published: 22nd Jul 2015

Publication Type: Not specified

Human Diseases: disease of mental health, mental depression, vascular disease, allergic hypersensitivity disease, sleep disorder, retinal degeneration

Abstract

LIFE is an epidemiological study examining thousands of Leipzig inhabitants with a wide spectrum of interviews, questionnaires, and medical investigations. The heterogeneous data are centrally integrated into a research database and are analyzed by specific analysis projects. To semantically describe the large set of data, we have developed an ontological framework. Applicants of analysis projects and other interested people can use the LIFE Investigation Ontology (LIO), the central part of the framework, to gain insight into which kinds of data are collected in LIFE. Moreover, we use the framework to generate queries over the collected scientific data in order to retrieve data as requested by each analysis project. A query generator transforms the ontological specifications using LIO into database queries, which are implemented as project-specific database views. Since the requested data is typically complex, manual query specification would be very time-consuming and error-prone and is therefore unsuitable in this large project. We present the approach, give an overview of LIO, and show query formulation and transformation. Our approach has been running in production mode for two years in LIFE.

Authors: Toralf Kirsten, A. Uciteli

Date Published: 2015

Publication Type: Not specified

Abstract

LIFE Child is an epidemiological study running at the LIFE Research Center for Civilization Diseases (University of Leipzig) since 2011. It aims at monitoring the development of children and adolescents by examining thousands of children in and around Leipzig. Of particular interest in this study are the motor skills and physical activities of children between 6 and 18 years. There are multiple examinations, including interviews, self-completed questionnaires and physical examinations (e.g., sport tests), to generate data describing the examined child as well as her lifestyle and environment. The goal is to find causes for low or no physical activity and poorly developed motor skills and capabilities, since they are commonly associated with diseases such as obesity and diabetes. As a first step in this direction, we analyzed data of specific sport tests, such as pushups, side steps and long jumps, according to the body mass index (BMI) of participants. We found that participants with high BMI achieve a similar number of pushups in early years to the normal BMI group, while in later years the pushup number of participants with normal BMI exceeds that of the high BMI group. Surprisingly, the number of side steps does not differ between both groups across age categories (6-18, yearly). Conversely, the normal BMI group achieves greater distances throughout all age categories than the high BMI group. In the future, we will associate these results with socio-economic and lifestyle indicators, e.g., interest in sport and physical activities of child and parents.

Authors: J. Lang, C. Warnatsch, M. Vogel, Toralf Kirsten, W. Kiess

Date Published: 17th Dec 2014

Publication Type: Not specified

Abstract

Sociodemographic characteristics are regarded in the human sciences as global influencing factors in almost all fields of research and form the basic element of cohort studies. Moreover, a direct association between social factors and health has been demonstrated in many areas. In childhood and adolescence in particular, a family's socioeconomic status has a decisive influence on physical and mental development. These and further research questions are the focus of the LIFE Child project.

Authors: L. Meißner, C. Bucher, M. Vogel, Toralf Kirsten, S. Nerlich, U. Igel, W. Kiess, A. Hiemisch

Date Published: 17th Dec 2014

Publication Type: Not specified

Abstract

LIFE Child is an epidemiological cohort study at the Leipzig Research Center for Civilization Diseases (University of Leipzig). A main goal of LIFE Child is to study the influence of environment and lifestyle factors on the development of children and adolescents in and near Leipzig. In particular, we search for predominant aspects in the development of children with obesity. Typically, data is analyzed by different statistical methods and approaches to find (perhaps multivariate) predominant markers. Additionally, on the one hand, we map selected data to geographical maps to study their spatial distribution over urban districts of Leipzig. This allows us to comparatively analyze anthropometric measurements, such as age- and gender-corrected height, weight, and body mass index, together with further participant-related data including social indicators (e.g., income, education, socio-economic indexes) and lifestyle data, to distinguish city districts with a high correlation from those with low or no correlation. On the other hand, we associate anthropometric measurements with publicly available data, such as official statistics including district-specific unemployment rates and inhabitant densities, by taking the participant's place of living into account. We developed a spatial analysis pipeline of anthropometric and lifestyle data according to Leipzig city districts. While cohort and publicly available data is managed by a database system, the analysis pipeline is implemented by dedicated R scripts. With more than 2,500 children, the sample is large enough for first analyses. … Our first results show that unemployment of parents could be a factor for obesity of children, especially in districts with low social index.

Authors: M. Vogel, A. Kiel, M. Rühle, Toralf Kirsten, M. Geserick, R. Gausche, G. Grande, D. Molis, U. Igel, S. Alvanides, W. Kiess

Date Published: 1st Nov 2014

Publication Type: Not specified

Abstract

Introduction LIFE Child, as a part of the 'Leipzig Research Centre for Civilization Diseases', is a longitudinal cohort study aiming, inter alia, at monitoring normal development in children and adolescents from fetal life to adulthood. As an important part of the study, anthropometric dimensions are measured via classic methods, e.g. stadiometer or tape measure (ca. 15 items), but also via 3D body scanner technology (ca. 150 items). Because standards are missing, data quality control and analysis of the latter is a particular challenge. Methods We address the problem of absent reference values by using the data itself as a reference sample. Applying the LMS method using the VGAM/GAMLSS packages [XXX] to a sufficiently large reference sample yields age- and gender-corrected standard deviation scores (SDS) and percentile curves, respectively. A combination of variable clustering and clustering of values using these SDS is applied to detect groups of dependent variables and peculiar cases, respectively. Results In LIFE Child the current reference sample consists of around 4000 scans of 1700 children. The age-dependent l, m, and s values for each item are generated by dedicated R routines and stored in a relational database system. The transformation algorithm by Cole is implemented as a database function and dynamically applied to all associated raw data. Conspicuous values can be detected using the SDS itself, or the SDS in comparison with the corresponding variable cluster, and/or by taking into account the follow-up data of the respective participant. These values can be reported and visualized using automated routines.

Authors: M. Vogel, A.L. Fischer, C. Bucher, W. Kiess, Toralf Kirsten

Date Published: 1st Nov 2014

Publication Type: Not specified
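
The SDS transformation mentioned in the abstract above can be sketched as follows. This assumes the standard LMS formula attributed to Cole, z = ((y/M)^L − 1) / (L·S) for L ≠ 0 and z = ln(y/M) / S for L = 0. The abstract states that the authors implemented the transformation as a database function; the Python below is only an illustrative stand-in, and the function names and parameter values are hypothetical.

```python
import math


def lms_sds(y: float, l: float, m: float, s: float) -> float:
    """Standard deviation score of measurement y for given LMS parameters.

    l: Box-Cox power, m: median, s: coefficient of variation,
    all taken at the participant's age and sex.
    """
    if y <= 0 or m <= 0 or s <= 0:
        raise ValueError("y, m and s must be positive")
    if abs(l) < 1e-12:  # limiting case L = 0
        return math.log(y / m) / s
    return ((y / m) ** l - 1.0) / (l * s)


def lms_value(z: float, l: float, m: float, s: float) -> float:
    """Inverse transformation: the measurement at SDS z (percentile curves)."""
    if abs(l) < 1e-12:
        return m * math.exp(s * z)
    return m * (1.0 + l * s * z) ** (1.0 / l)


# Hypothetical LMS parameters for one scanner item at one age/sex stratum.
L_PARAM, M_PARAM, S_PARAM = -0.8, 62.0, 0.07
sds = lms_sds(70.0, L_PARAM, M_PARAM, S_PARAM)
```

A measurement equal to the median M always maps to an SDS of 0, and `lms_value` with z = ±1.881 would give the approximate 3rd and 97th percentile curves for the same parameters.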

Abstract

Introduction LIFE is a large epidemiological study aiming at causes of common civilization diseases including adiposity, dementia, and depression. Participants of the study are probands and patients. Probands are randomly selected and invited from the set of Leipzig (Germany) inhabitants, while patients with known diseases are recruited from several local hospitals. The management of these participants, their invitation and contact after successful attendance, as well as the support of nearly all ambulance processes, requires a complex ambulance management. Each participant is examined by a set of investigation instruments including interviews, questionnaires, device-specific investigations, specimen extractions and analyses. This necessitates a complex management of the participant-specific examination program but also specific input forms and systems that capture administrative data (measurement and process environment or specific set-ups) and scientific data. Additionally, the taken and prepared specimens need to be labeled and registered with the participant they stem from and the fridge or bio-tank in which they are stored. In the end, all captured data from ambulance management, investigation instruments and laboratory analyses need to be integrated before they can be analyzed. These complex processes and requirements necessitate a comprehensive IT infrastructure. Methods Our IT infrastructure is modular and consists of several software applications. A main application is responsible for the complex participant and ambulance management. The participant management handles selected participant data and contact information. To protect participants' privacy, a participant identifier (PID) is created for each participant and is associated with all data that is subsequently managed and captured. In ambulance management, each participant is associated with a predefined investigation program.
This investigation program is represented in our systems by a tracking card that is available as a print-out and electronically. The electronic version of tracking cards is utilized by two software applications, the Assessment Battery and the CryoLab. The former system coordinates the input of scientific data into online input forms. The input forms are designed in the open source system LimeSurvey. Moreover, the Assessment Battery is used to monitor the input process, i.e., it shows which investigations are already completed and which are still outstanding. The CryoLab system registers and tracks all taken specimens and is used to annotate extraction and specific preparation processes, e.g., for DNA isolation. Moreover, it tracks specimen storage in fridges and bio-tanks. A central component is the metadata repository collecting metadata from ambulance management and data input systems. It is the base for the integration of relevant scientific data into a central research database. The integration follows a mapping-based approach. The research database makes raw data and special pre-computations called derivatives available for later data analysis. Results & Discussion We designed and implemented a complex and comprehensive IT infrastructure for the epidemiological research in LIFE. This infrastructure consists of several software applications which are loosely coupled over specified interfaces. Most of the software applications are new implementations; only for capturing scientific data are external software applications used.

Authors: Toralf Kirsten, A. Kiel, M. Kleinert, R. Speer, M. Rühle, Hans Binder, Markus Löffler

Date Published: 30th Sep 2013

Publication Type: Not specified

Abstract

Introduction LIFE is an epidemiological study aiming at discovering causes of common disorders as well as therapy and diagnostic possibilities. It applies a huge set (currently more than 400) of complex instruments including different kinds of interviews, questionnaires, and technically founded investigations on thousands of Leipzig inhabitants. Finding correlations in data, e.g., between diseases on the one hand and a combination of life conditions on the other, requires high-quality data. Data errors affect this data quality. However, avoiding every error is nearly impossible. Therefore, the captured data routinely needs to be validated and revised (curated) in case of error. Methods From the data perspective, we differentiate between two main types of data errors: syntax or format errors and semantic errors. Syntax errors mostly occur when the data needs to be converted to change its data type, e.g., from text to number or from text to date/time fields. This is often the case when data is captured as text by the data input system but should be centrally managed and analyzed in a different format. Hence, the data conversion is only successful when the input data is in the right format. Data conversion is applied when the data is transferred from data input systems to the central research database collecting all captured data in an integrated and harmonized form. Corrupted data that cannot be converted to the target data type is replaced by a missing value (also called null value, nil etc.). The definition of a default value is not sufficient since the default usually depends on the corresponding question or measurement input field and can distort analysis results when it is not taken into account. Moreover, the definition process for every question/input field would be too time-consuming. Semantic errors are much harder to detect than syntax errors.
Typically, they are semantically implausible outliers or are part of other artefacts, e.g.,when data of two input fields is mixed up. Currently, we let the detection of semantic errors to a epidemiological quality analysis that is performed by several statisticians. Conversely, syntax errors can be easily technically detected; they are logged when they occur in the process of transferring data from data input systems to the central research database. With respect to both types of errors, syntax and semantic errors, we designed and implemented a software application called Curation-DB allowing to curate (adapt and change) data. In particular, the system lists the logged syntax errors occurring during the data conversion step daily at night. A user can adapt the current input value by specifying a new (target) value for a listed syntax problem. With this specification, the corresponding input value is replaced by the specified value before the next conversion step is started. This specification process can be iteratively applied for a corresponding input value when the syntax problem is not solved by the current specification. The semantic errors need to be first detected separately. Then, a user can specify value changes replacing an existing value with the new specified one. Results & Discussion The Curation-DB application is already in use. Currently, selected quality managers routinely check the listed syntax errors. There are currently more than 2000 of such errors curated. In near future, we will extend this software to manage rules validating research data semantically to automatically detect obvious semantic errors.
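
The conversion and curation loop described in this abstract can be illustrated with a minimal Python sketch. It is not the Curation-DB implementation: the function name `convert_records`, the `CONVERTERS` table, the record layout, and the ISO date format are all assumptions made for illustration. The sketch only shows the general idea that a value failing conversion becomes a missing value and is logged, and that a user-specified replacement is applied before the next conversion run.

```python
from datetime import datetime

# Hypothetical converters from captured text to the target data type.
CONVERTERS = {
    "number": float,
    "date": lambda text: datetime.strptime(text, "%Y-%m-%d").date(),
}


def convert_records(records, target_types, curations=None):
    """Convert text fields to their target types.

    records:      list of dicts holding an "id" and raw text values
    target_types: field name -> key in CONVERTERS
    curations:    (record id, field name) -> replacement text given by a user
    Returns the converted records and a log of syntax errors.
    """
    curations = curations or {}
    converted, error_log = [], []
    for record in records:
        row = {"id": record["id"]}
        for field, type_name in target_types.items():
            # A curated value replaces the captured input before conversion.
            raw = curations.get((record["id"], field), record.get(field))
            try:
                row[field] = CONVERTERS[type_name](raw)
            except (TypeError, ValueError):
                # Unconvertible input becomes a missing value and is logged.
                row[field] = None
                error_log.append((record["id"], field, raw))
        converted.append(row)
    return converted, error_log


# Example run: the first pass logs two syntax errors, the second pass
# applies one curation so that only the uncurated error remains.
records = [
    {"id": 1, "height": "1,80", "visit": "2013-09-30"},
    {"id": 2, "height": "1.75", "visit": "30.09.2013"},
]
target_types = {"height": "number", "visit": "date"}

first_pass, first_log = convert_records(records, target_types)
second_pass, second_log = convert_records(
    records, target_types, curations={(1, "height"): "1.80"}
)
```

Keeping the curations separate from the captured input, as assumed here, means the original value is never overwritten and a specification can be revised in a later run if it still fails to convert.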

Authors: J. Wagner, A. Kiel, Toralf Kirsten

Date Published: 30th Sep 2013

Publication Type: Not specified

Institute for Medical Informatics, Statistics and Epidemiology, University of Leipzig
