
Table 1. The Healthy Obese Project data harmonization and database federation step-by-step process

From: Data harmonization and federated analysis of population-based studies: the BioSHaRE project

| Step | Description |
| --- | --- |
| Study recruitment and documentation | Studies are recruited to participate in the Healthy Obese Project (HOP), and their key characteristics (e.g. design, sampling frame) are catalogued on the BioSHaRE website ( |
| Harmonized variable selection and definition | A set of ‘target’ variables required to answer obesity-related research questions is identified at workshops that bring together BioSHaRE investigators. |
| Study variable identification and harmonization potential assessment | Each participating study's questionnaires, standard operating procedures and data dictionaries are analysed to determine whether the study can generate the set of target variables. The study-specific variables required to generate each target variable are identified. |
| Data processing | Secure servers are set up at each study's host institution, and the subsets of data required to generate the target variables are loaded onto them. Wherever harmonization is deemed possible, processing algorithms that transform study data into the target (i.e. harmonized) format are developed and implemented for each study. |
| Harmonized data federation, dissemination and analysis | A password-protected web portal federates the servers hosted at the study institutions across Europe and allows remote retrieval of data summaries, descriptive statistics (frequencies, minimum, maximum, mean, standard deviation) and contingency tables. More complex federated analyses (e.g. linear regression) use the DataSHIELD method [28] in the R software environment [36]. |
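The harmonization potential assessment step in the table above can be pictured as a lookup from each target variable to the study variables that would produce it. The following is a minimal sketch under stated assumptions: the target names, the alternative "recipes" and the two studies are all hypothetical, and the table does not describe how the assessment was actually recorded. A target is marked possible for a study when at least one recipe is fully covered by the variables that study collected.

```python
from __future__ import annotations

# Hypothetical target variables, each with one or more alternative "recipes":
# sets of study-level variables from which the target could be derived.
TARGET_RECIPES: dict[str, list[frozenset[str]]] = {
    "bmi": [frozenset({"bmi"}), frozenset({"height_cm", "weight_kg"})],
    "waist_cm": [frozenset({"waist_cm"}), frozenset({"waist_in"})],
    "current_smoker": [frozenset({"smoking_status"})],
}


def assess_study(study_variables: set[str]) -> dict[str, frozenset[str] | None]:
    """Map each target variable to the study variables needed to generate it,
    or to None when harmonization is not possible for this study."""
    result: dict[str, frozenset[str] | None] = {}
    for target, recipes in TARGET_RECIPES.items():
        # Pick the first recipe whose variables are all present in the study.
        result[target] = next((r for r in recipes if r <= study_variables), None)
    return result


# Hypothetical studies and the variables each one collected.
studies = {
    "study_A": {"height_cm", "weight_kg", "smoking_status"},
    "study_B": {"bmi", "waist_in"},
}
for name, variables in studies.items():
    for target, recipe in assess_study(variables).items():
        print(name, target, sorted(recipe) if recipe else "not possible")
```

Listing alternative recipes reflects the point made in the table that the study-specific variables needed for a given target can differ from one study to the next.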
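In the data processing step, each study is given its own processing algorithm that maps its local format onto the target format. The sketch below is hypothetical throughout: the study names, variable names and conversions (BMI from measured height and weight, and waist circumference from inches to centimetres) are illustrative assumptions and do not describe the project's actual algorithms.

```python
from __future__ import annotations


def harmonize_study_a(record: dict) -> dict:
    """Hypothetical study A: height in cm and weight in kg -> BMI in kg/m^2."""
    height_m = record["height_cm"] / 100
    return {"bmi": round(record["weight_kg"] / height_m ** 2, 1)}


def harmonize_study_b(record: dict) -> dict:
    """Hypothetical study B: waist circumference in inches -> centimetres."""
    return {"waist_cm": round(record["waist_in"] * 2.54, 1)}


# Hypothetical registry: one processing algorithm per study.
ALGORITHMS = {"study_A": harmonize_study_a, "study_B": harmonize_study_b}


def harmonize(study: str, records: list[dict]) -> list[dict]:
    """Apply the study's own algorithm to every participant record."""
    algorithm = ALGORITHMS[study]
    return [algorithm(r) for r in records]


print(harmonize("study_A", [{"height_cm": 180, "weight_kg": 81}]))  # → [{'bmi': 25.0}]
print(harmonize("study_B", [{"waist_in": 34}]))  # → [{'waist_cm': 86.4}]
```

Keeping one function per study mirrors the table's description that algorithms are developed and implemented study by study. The harmonized output is the same regardless of which study it came from.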
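The table does not specify how the portal combines descriptive statistics across servers. The following is a minimal sketch under one assumption: each server releases only a count, mean, sum of squared deviations, minimum and maximum for a variable. Under that assumption, the pooled mean and standard deviation can be computed exactly with the parallel variance update of Chan et al., and no individual-level records leave any server. The `Summary` type and the BMI values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Summary:
    """Aggregates a single server might release for one variable."""
    n: int
    mean: float
    m2: float  # sum of squared deviations from this server's mean
    minimum: float
    maximum: float


def summarize(values: list) -> Summary:
    """Runs on a study server: reduce individual-level values to aggregates."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)
    return Summary(n, mean, m2, min(values), max(values))


def pool(a: Summary, b: Summary) -> Summary:
    """Runs on the portal: merge two summaries (Chan et al. parallel update)."""
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta ** 2 * a.n * b.n / n
    return Summary(n, mean, m2, min(a.minimum, b.minimum), max(a.maximum, b.maximum))


# Hypothetical BMI values held on two separate servers.
server_a = summarize([22.0, 25.0, 28.0])
server_b = summarize([30.0, 24.0])
total = pool(server_a, server_b)
sd = math.sqrt(total.m2 / (total.n - 1))  # sample standard deviation
print(total.n, round(total.mean, 2), round(sd, 2), total.minimum, total.maximum)
```

Pooling means and squared deviations avoids the cancellation error that can arise from pooling raw sums of squares. The result matches what a single computation on the combined values would give.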
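The table cites DataSHIELD [28] for federated regression. The sketch below is a simplified Python illustration of the core idea and is not DataSHIELD's R implementation, which is more general: it fits generalized linear models iteratively and applies disclosure controls. For ordinary least squares, each server returns only its local cross-products X'X and X'y. Summing these across servers and solving the normal equations gives the same coefficients as a fit on the pooled individual-level data. The function names and the example data are hypothetical.

```python
def local_crossproducts(rows):
    """Runs on a study server: return X'X and X'y for local data only.
    Each row is (x_values, y); an intercept column of 1s is prepended."""
    p = len(rows[0][0]) + 1
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for x_values, y in rows:
        x = [1.0, *x_values]
        for i in range(p):
            xty[i] += x[i] * y
            for j in range(p):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def solve(a, b):
    """Solve a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (m[i][n] - tail) / m[i][i]
    return beta


def federated_ols(servers):
    """Runs on the analyst's client: sum the servers' cross-products, then solve."""
    p = len(servers[0][0])
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for s_xtx, s_xty in servers:
        for i in range(p):
            xty[i] += s_xty[i]
            for j in range(p):
                xtx[i][j] += s_xtx[i][j]
    return solve(xtx, xty)


# Hypothetical data on two servers, generated from y = 1 + 2x.
server_a = local_crossproducts([([0.0], 1.0), ([1.0], 3.0), ([2.0], 5.0)])
server_b = local_crossproducts([([3.0], 7.0), ([4.0], 9.0)])
print(federated_ols([server_a, server_b]))  # intercept and slope → [1.0, 2.0]
```

Only p × p and p-length aggregates leave each server, and their size is independent of the number of participants.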