Genome-Centric Multimodal Data Integration in Personalised Cardiovascular Medicine

The NextGen Tools: Genomic data curation

Genomic data curation is a particularly important component of the NextGen toolbox. Data curation ensures the accuracy, consistency, and usability of genomic data by addressing structural and content-related issues such as missing, invalid, or inconsistent entries.
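To make the three issue types concrete, the following is a minimal Python sketch of a record-level check that flags missing, invalid, and inconsistent entries in a single variant record. It is an illustration only, not the NextGen tool: the field names (chrom, pos, ref, alt, p_value), the accepted chromosome labels, and the specific rules are all assumptions chosen for the example.

```python
# Illustrative sketch only: the schema and rules below are assumptions,
# not the NextGen curation tool's actual checks.
REQUIRED_FIELDS = ("chrom", "pos", "ref", "alt", "p_value")  # hypothetical schema
VALID_BASES = set("ACGT")
VALID_CHROMS = {str(n) for n in range(1, 23)} | {"X", "Y", "MT"}


def find_issues(record):
    """Return a list of issue labels for one variant record (a dict)."""
    issues = []

    # Missing: a required field is absent or empty.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            issues.append("missing:" + field)

    # Invalid: the value is present but outside the allowed range or alphabet.
    chrom = str(record.get("chrom") or "")
    if chrom.startswith("chr"):
        chrom = chrom[3:]  # accept both "chr1" and "1"
    if chrom and chrom not in VALID_CHROMS:
        issues.append("invalid:chrom")

    pos = record.get("pos")
    if pos not in (None, "") and not (isinstance(pos, int) and pos > 0):
        issues.append("invalid:pos")

    for field in ("ref", "alt"):
        allele = record.get(field)
        if allele and not set(str(allele).upper()) <= VALID_BASES:
            issues.append("invalid:" + field)

    p_value = record.get("p_value")
    if p_value not in (None, "") and not (
        isinstance(p_value, (int, float)) and 0 < p_value <= 1
    ):
        issues.append("invalid:p_value")

    # Inconsistent: fields are individually valid but contradict each other.
    ref, alt = record.get("ref"), record.get("alt")
    if ref and alt and str(ref).upper() == str(alt).upper():
        issues.append("inconsistent:ref_equals_alt")

    return issues


if __name__ == "__main__":
    example = {"chrom": "23", "pos": -5, "ref": "A", "alt": "A", "p_value": None}
    print(find_issues(example))
```

Returning labels rather than silently dropping records keeps a human reviewer in the loop: flagged entries can be inspected, corrected, or excluded downstream.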

Manual curation is time-consuming and difficult to scale, because large genetic association studies can include millions of variants. The NextGen genomic data curation tool aims to address this by developing and deploying a flexible, AI-guided framework and pipelines that will improve the accuracy, completeness, and usability of genomic information used to establish gene-disease relationships. The tool will also support the integration of relevant genomic data into multimodal datasets, a step that can enhance both algorithmic precision and predictive performance in cardiovascular disease. Its purpose is to replace manual, non-scalable methods with automated, human-centred pipelines that integrate quality control and enable efficient downstream analysis across the distributed datasets on which NextGen will base its dataspace for cardiovascular disease.

The methodologies developed for genomic data curation in cardiovascular research can be adapted to other domains, such as oncology and neurology.