Workflow Characterization of a Big Data System Model for Healthcare Through Multiformalism

Tancredi Covioli, Tommaso Dolci, Fabio Azzalini, Davide Piantella, Enrico Barbierato*, Marco Gribaudo

*Corresponding author for this work

Research output: Contribution to journal, Journal article

Abstract

The development of technologies such as cloud computing, IoT, and social networks has caused the amount of data generated daily to grow at a very high rate, giving rise to the trend of Big Data. Big Data has also emerged in the healthcare field, thanks to the introduction of new tools that produce massive amounts of structured and unstructured data. For this reason, medical institutions are moving towards data-based healthcare, with the goal of leveraging this data to support clinical decision-making through suitable information systems. This comes with the need to evaluate the performance of these systems. One commonly used technique is modeling, which consists of evaluating a model of the system under analysis without actually implementing the system. However, an adequate performance assessment of Big Data systems requires data with a diversity of volumes and speeds that, due to the sensitivity of healthcare data, is not available. While in other fields this problem is usually solved through synthetic data generators, in healthcare these are few and not specialized in performance evaluation. Therefore, this work focuses on the creation of a synthetic data generator for evaluating the performance of a Big Data system model for healthcare. The dataset used as a reference for creating the generator is MIMIC-III, which contains the digital health records of thousands of patients collected over a time span of multiple years. First, we analyze the dataset, adopting multiple distribution fitting techniques (e.g., phase-type fitting) to model the temporal distribution of the data. Then, we develop a generator structured as a multi-module library to allow the customization of each component; specifically, we propose a multiformalism model to reproduce patient behavior inside the hospital. Finally, we test the generator by evaluating the performance in different scenarios. Through these experiments, we show the granular control that the generator offers over the synthetic data produced, and the simplicity with which it can be adapted to different uses.
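
To make the idea of phase-type fitting more concrete, the following is a minimal sketch of a classical two-moment fit. It is not the authors' procedure or library, and it does not use MIMIC-III. It assumes a list of positive inter-event times and matches their mean and squared coefficient of variation with either an Erlang distribution or a two-phase hyperexponential with balanced means, both of which are simple phase-type distributions. The function names `fit_two_moment` and `sample` and the cap of 50 Erlang stages are illustrative assumptions.

```python
import math
import random
import statistics


def fit_two_moment(samples):
    """Fit a simple phase-type model to the mean and squared CV of samples.

    Returns ("erlang", k, rate) when the squared coefficient of variation
    is at most 1, otherwise ("hyperexp", p1, rate1, rate2) using the
    balanced-means two-phase hyperexponential. Hypothetical helper.
    """
    mean = statistics.fmean(samples)
    scv = statistics.pvariance(samples) / mean ** 2
    if scv <= 1.0:
        # Erlang-k has squared CV 1/k; cap k (assumption) for near-constant data.
        k = max(1, min(50, round(1.0 / max(scv, 1e-9))))
        return ("erlang", k, k / mean)
    # Balanced means: p1 / rate1 == p2 / rate2 == mean / 2.
    s = math.sqrt((scv - 1.0) / (scv + 1.0))
    p1 = 0.5 * (1.0 + s)
    return ("hyperexp", p1, 2.0 * p1 / mean, 2.0 * (1.0 - p1) / mean)


def sample(model, rng):
    """Draw one synthetic inter-event time from a fitted model."""
    if model[0] == "erlang":
        _, k, rate = model
        # An Erlang-k variate is the sum of k exponential stages.
        return sum(rng.expovariate(rate) for _ in range(k))
    _, p1, rate1, rate2 = model
    # A hyperexponential variate picks one exponential branch at random.
    return rng.expovariate(rate1 if rng.random() < p1 else rate2)


if __name__ == "__main__":
    observed = [0.1, 0.1, 0.1, 10.0]  # made-up inter-event times
    model = fit_two_moment(observed)
    rng = random.Random(0)
    synthetic = [sample(model, rng) for _ in range(5)]
    print(model[0], synthetic)
```

A two-moment fit like this only captures the mean and variability of the observed times. Fitting general phase-type distributions to the full shape of the data typically relies on more elaborate methods, such as expectation-maximization.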
Original language: English
Pages (from-to): 279-293
Number of pages: 15
Journal: Lecture Notes in Computer Science
Volume: 14231
DOI
Publication status: Published - 2023

Keywords

  • Big Data
  • synthetic data generation
  • performance evaluation
  • healthcare data
