Genome-Centric Multimodal Data Integration in Personalised Cardiovascular Medicine

Federated genomics: data stays local, insight travels far and wide

Federated genomics is a distributed approach for managing and analysing human genomic data without consolidating it in a single location. Large-scale discovery needs cohorts spanning institutions and countries, yet genomes and linked clinical data are sensitive, governed by privacy rules, and tied to national or organisational ownership. A federated model supports cross-border collaboration while keeping data under local control, aligning practice with GDPR and related governance constraints.

At the technical layer, federated learning trains models across multiple sites by exchanging model updates rather than raw genomes. This enables joint risk prediction and other genomic analytics across hospitals and biobanks while keeping data on site. Because model updates can themselves leak information about individuals, strong deployments add privacy-by-design controls, including secure aggregation and quantitative limits on leakage such as differential privacy. For fixed statistical tasks, secure multi-party computation provides a cryptographic route: multiple parties jointly compute genome-wide association statistics, and no party learns anything about the others' inputs beyond what the result itself implies. Homomorphic encryption extends confidentiality by enabling computation on encrypted genomes, at a performance cost that teams manage through careful workload design. Trusted execution environments (secure enclaves) complement these methods by running approved code close to the data inside hardware-protected execution zones.
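As a rough illustration of how secure aggregation can hide individual model updates, the sketch below simulates three sites that add pairwise random masks to their updates before sending them to an aggregator. The masks cancel in the sum, so the aggregator recovers only the average. Everything here is an assumption made for illustration: the function names, the fixed-point scale, the modulus, and the example updates are hypothetical, and the masks are generated in one place for brevity, whereas a real protocol would derive them from keys agreed between each pair of sites and would handle site dropout.

```python
import random

MODULUS = 2**32   # all masked arithmetic is done modulo this value
SCALE = 10**4     # fixed-point scale: keeps four decimal places

def encode(x):
    """Map a float to a fixed-point integer in [0, MODULUS)."""
    return round(x * SCALE) % MODULUS

def decode(v):
    """Map a fixed-point integer back to a signed float."""
    if v >= MODULUS // 2:
        v -= MODULUS
    return v / SCALE

def pairwise_masks(n_sites, dim, seed=0):
    """Build one mask per site such that all masks sum to zero (mod MODULUS).

    Simplification: masks are generated centrally here. In practice each
    pair of sites would derive its shared random values from a key exchange.
    """
    rng = random.Random(seed)
    masks = [[0] * dim for _ in range(n_sites)]
    for i in range(n_sites):
        for j in range(i + 1, n_sites):
            for k in range(dim):
                r = rng.randrange(MODULUS)
                masks[i][k] = (masks[i][k] + r) % MODULUS  # site i adds r
                masks[j][k] = (masks[j][k] - r) % MODULUS  # site j subtracts r
    return masks

def mask_update(update, mask):
    """What a site sends: its encoded update plus its mask."""
    return [(encode(u) + m) % MODULUS for u, m in zip(update, mask)]

def secure_mean(masked_updates):
    """Aggregator side: sum the masked vectors; the masks cancel out."""
    n = len(masked_updates)
    totals = [sum(column) % MODULUS for column in zip(*masked_updates)]
    return [decode(t) / n for t in totals]

# Hypothetical local model updates from three sites (two parameters each).
site_updates = [[0.10, -0.20], [0.30, 0.40], [-0.10, 0.10]]
masks = pairwise_masks(len(site_updates), dim=2)
masked = [mask_update(u, m) for u, m in zip(site_updates, masks)]
global_update = secure_mean(masked)
print(global_update)
```

Integer arithmetic modulo a fixed value is used instead of floating point so that the masks cancel exactly. The same additive-masking idea underlies simple secret-sharing schemes for fixed statistics such as allele counts.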

In Europe, federated genomics increasingly depends on shared infrastructure. ELIXIR’s Federated European Genome–phenome Archive (Federated EGA, or FEGA) shows the model in production: national nodes store sensitive datasets under national jurisdiction, and shared metadata enables transnational discovery. Researchers search through a portal, while access decisions remain local. Interoperability is achieved through standard protocols such as the GA4GH Beacon API for federated discovery, ELIXIR AAI and GA4GH Passports for cross-node identity and permissions, and Crypt4GH for secure exchange into authorised environments.
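The discovery pattern described above can be sketched as a toy model in which each node answers a yes/no question about a variant and nothing else leaves the node. This is not the GA4GH Beacon API itself: the class names, node names, and variant coordinates are hypothetical, and a real deployment would add authentication, network calls, and local access decisions.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Variant:
    """A variant identified by chromosome, position, and alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str

@dataclass
class Node:
    """A national node holding its own variants under local control."""
    name: str
    variants: set = field(default_factory=set)

    def has_variant(self, query):
        # Only a boolean leaves the node; no genotypes or sample data.
        return query in self.variants

def federated_lookup(nodes, query):
    """Ask every node the same question and collect the yes/no answers."""
    return {node.name: node.has_variant(query) for node in nodes}

# Hypothetical nodes and variants, for illustration only.
nodes = [
    Node("node-A", {Variant("1", 12345, "A", "G")}),
    Node("node-B", {Variant("2", 67890, "C", "T")}),
    Node("node-C", {Variant("1", 12345, "A", "G"), Variant("7", 555, "G", "A")}),
]

hits = federated_lookup(nodes, Variant("1", 12345, "A", "G"))
# The researcher would then apply for access only at the nodes that answered yes.
nodes_to_contact = sorted(name for name, found in hits.items() if found)
print(nodes_to_contact)
```

The design point is that the query travels to the data: the portal learns which nodes hold relevant records, and each access request is then decided locally.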

Standards provide the connective tissue. The Global Alliance for Genomics and Health (GA4GH) develops APIs, data models, and policy guidance that allow platforms to interoperate responsibly. The Data Use Ontology (DUO) makes consent and data-use terms machine-readable across systems; htsget and refget support remote retrieval of reads, variants, and reference sequences; Phenopackets encode phenotype and clinical context so that multi-site data remains semantically comparable. When sites adopt these standards, analyses and workflows become portable across jurisdictions and platforms.
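To make the idea of machine-readable data-use terms concrete, the following sketch checks a research request against dataset permissions labelled with four common DUO shorthand terms (NRES, GRU, HMB, DS). The matching rules are a deliberately simplified assumption, not the GA4GH matching logic, and the `DataUse` and `Request` types, the purpose labels, and the cohort names are hypothetical. Real DUO annotations use ontology identifiers and can carry additional modifiers that this sketch ignores.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class DataUse:
    """Permission attached to a dataset, using DUO shorthand labels."""
    permission: str                 # "NRES", "GRU", "HMB", or "DS"
    disease: Optional[str] = None   # only meaningful for "DS"

@dataclass(frozen=True)
class Request:
    """A researcher's stated purpose (hypothetical vocabulary)."""
    purpose: str                    # "general", "biomedical", or "disease"
    disease: Optional[str] = None   # only meaningful for "disease"

def is_permitted(use, request):
    """Simplified consent check; unknown terms are refused by default."""
    if use.permission in ("NRES", "GRU"):
        # No restriction / general research use: any research purpose.
        return True
    if use.permission == "HMB":
        # Health, medical, or biomedical research only.
        return request.purpose in ("biomedical", "disease")
    if use.permission == "DS":
        # Disease-specific research: the disease must match.
        return request.purpose == "disease" and request.disease == use.disease
    return False

# Hypothetical catalogue of datasets and their data-use terms.
catalogue = {
    "cohort-1": DataUse("GRU"),
    "cohort-2": DataUse("HMB"),
    "cohort-3": DataUse("DS", disease="cardiomyopathy"),
}

def usable_datasets(request):
    """Return the datasets whose terms allow the request."""
    return sorted(name for name, use in catalogue.items() if is_permitted(use, request))

print(usable_datasets(Request("disease", disease="cardiomyopathy")))
print(usable_datasets(Request("general")))
```

Because the terms are structured rather than free text, the same check can run identically at every site, which is what lets consent be applied consistently across systems.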

These building blocks align with the EU’s 1+ Million Genomes (1+MG) initiative and the European Genomic Data Infrastructure (GDI), designed to enable federated and secure access to a “virtual cohort” across member states. The European Open Science Cloud (EOSC) complements this by federating compute and virtual research environments close to where data sits. Within this landscape, NextGen aims to extend secure federated analytics to genomic computation and to validate tools through real-world pilots in a multi-site Pathfinder network. It couples federated analytics with multimodal data integration, catalogue-based discovery, and governance processes aligned with the European Health Data Space and 1+MG, supporting auditable, accountable access across jurisdictions.