
Archive of human genome mutations in 141,456 subjects (Nature).

Giuseppe Sandro Mela.

2020-05-30.


«The largest catalogue of human genetic variants is ready: based on the analysis of the DNA of more than 140,000 people from around the world, it is like a Rosetta Stone that will help interpret the genome and discover the function of genes, identifying those responsible for diseases that can be targeted with drugs. The result is published in seven studies in the journals Nature, Nature Communications and Nature Medicine by the international research consortium gnomAD (Genome Aggregation Database), led by the Broad Institute and Massachusetts General Hospital. Italy also takes part, with the cardiologist Diego Ardissino of the Azienda Ospedaliero-Universitaria di Parma.

The catalogue, the result of eight years of work, contains more than 443,000 variants that cause loss of function of the gene and therefore prevent production of the correct form of the corresponding protein. The researchers, led by Konrad Karczewski, tried to establish whether the variants could affect health, and in doing so identified particularly sensitive genes that could be linked to severe conditions such as intellectual disabilities.

The main study is accompanied by two others that enrich the catalogue with as many as 433,000 structural genetic variants, namely deletions, duplications or inversions of gene orientation, which are among the main 'drivers' of human evolution as well as of disease.

The other studies published by the consortium show how loss-of-function genetic variants can be used to diagnose diseases and to recognise new genetic targets for drugs. This is the case of the LRRK2 gene, associated with Parkinson's disease: by studying its variants, the researchers understood that the gene can be targeted with drugs that reduce its activity without causing serious side effects.» [Source]

*

«In this paper and accompanying publications, we present the largest, to our knowledge, catalogue of harmonized variant data from any species so far, incorporating exome or genome sequence data from more than 140,000 humans. The gnomAD dataset of over 270 million variants is publicly available»

«Although the gnomAD dataset is of unprecedented scale, it has important limitations. At this sample size, we remain far from saturating all possible pLoF variants in the human exome»

«Examples such as PCSK9 demonstrate the value of human pLoF variants for identifying and validating targets for therapeutic intervention across a wide range of human diseases»

*

«Each dataset, totalling more than 1.3 and 1.6 petabytes of raw sequencing data»

Peta is the prefix that denotes 10^15, that is, a million billion.
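To give a sense of scale, the two figures quoted above can be converted to bytes. This is a minimal sketch that assumes the decimal (SI) reading of the prefix, 1 petabyte = 10^15 bytes; the variable names are illustrative.

```python
from decimal import Decimal

# Assumption: "peta" is read in the SI sense, 10**15 (a million billion).
PETA = 10**15

# Raw sequencing data volumes quoted from the paper, in petabytes.
# Decimal avoids binary floating-point rounding when adding 1.3 and 1.6.
exome_pb = Decimal("1.3")
genome_pb = Decimal("1.6")

total_bytes = int((exome_pb + genome_pb) * PETA)
print(f"{total_bytes:,} bytes of raw sequencing data")
```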

To call this work mammoth would be an understatement.

Genetics is decidedly a very complex science.

*


Konrad J. Karczewski, Laurent C. Francioli, […] Daniel G. MacArthur.

The mutational constraint spectrum quantified from variation in 141,456 humans.

Nature volume 581, pages 434–443 (2020).

«Abstract.

Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes¹. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.»

«The physiological function of most genes in the human genome remains unknown. In biology, as in many engineering and scientific fields, breaking the individual components of a complex system can provide valuable insight into the structure and behaviour of that system. For the discovery of gene function, a common approach is to introduce disruptive mutations into genes and determine their effects on cellular and physiological phenotypes in mutant organisms or cell lines»

«However, recent exome and genome sequencing projects have revealed a surprisingly high burden of natural pLoF variation in the human population, including stop-gained, essential splice, and frameshift variants, which can serve as natural models for inactivation of human genes»

«Here, we describe the detection of pLoF variants in a cohort of 125,748 individuals with whole-exome sequence data and 15,708 individuals with whole-genome sequence data, as part of the Genome Aggregation Database (gnomAD; https://gnomad.broadinstitute.org), the successor to the Exome Aggregation Consortium (ExAC)»

«We aggregated whole-exome sequencing data from 199,558 individuals and whole-genome sequencing data from 20,314 individuals. These data were obtained primarily from case–control studies of common adult-onset diseases, including cardiovascular disease, type 2 diabetes and psychiatric disorders. Each dataset, totalling more than 1.3 and 1.6 petabytes of raw sequencing data, respectively, was uniformly processed, joint variant calling was performed on each dataset using a standardized BWA-Picard-GATK pipeline»
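Setting these aggregated totals against the 125,748 exomes and 15,708 genomes reported for the final cohort shows roughly how many samples made it into the release. The sketch below only does that arithmetic. The excerpts quoted here do not say why samples were excluded, and the variable names are illustrative.

```python
# Sample counts quoted from the paper excerpts.
aggregated = {"exomes": 199_558, "genomes": 20_314}
released = {"exomes": 125_748, "genomes": 15_708}

# Fraction of aggregated samples that appear in the final cohort.
kept_fraction = {kind: released[kind] / aggregated[kind] for kind in aggregated}

for kind, fraction in kept_fraction.items():
    print(f"{kind}: {released[kind]:,} of {aggregated[kind]:,} kept ({fraction:.1%})")

total_released = sum(released.values())
print(f"final cohort: {total_released:,} individuals")
```

The final total matches the 141,456 humans in the paper's title.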

«Among these individuals, we discovered 17.2 million and 261.9 million variants in the exome and genome datasets, respectively; these variants were filtered using a custom random forest process (Supplementary Information) to 14.9 million and 229.9 million high-quality variants»
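The effect of the quality filter can be read directly from these four numbers. The following sketch computes the share of discovered variants that pass. It assumes nothing about the random forest itself, which is described only in the paper's Supplementary Information, and the names are illustrative.

```python
# Variant counts in millions, quoted from the paper excerpt.
discovered = {"exome": 17.2, "genome": 261.9}
high_quality = {"exome": 14.9, "genome": 229.9}

# Share of discovered variants that survive the quality filter.
pass_rate = {name: high_quality[name] / discovered[name] for name in discovered}

for name, rate in pass_rate.items():
    removed = discovered[name] - high_quality[name]
    print(f"{name}: {rate:.1%} pass, about {removed:.1f} million filtered out")
```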

«Some LoF variants will result in embryonic lethality in humans in a heterozygous state, whereas others are benign even at homozygosity, with a wide spectrum of effects in between»

«we developed the loss-of-function transcript effect estimator (LOFTEE) package, which applies stringent filtering criteria from first principles (such as removing terminal truncation variants, as well as rescued splice variants, that are predicted to escape nonsense-mediated decay) to pLoF variants annotated by the variant effect predictor»

«Applying LOFTEE v1.0, we discover 443,769 high-confidence pLoF variants, of which 413,097 fall on the canonical transcripts of 16,694 genes. The number of pLoF variants per individual is consistent with previous reports»
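Two quick ratios follow from these counts: the share of high-confidence pLoF variants that fall on canonical transcripts, and the average number per gene. The sketch is plain arithmetic on the quoted figures. The mean says nothing about how unevenly the variants are spread across genes, which the excerpt does not report.

```python
# Counts quoted from the paper excerpt.
high_confidence_plof = 443_769
on_canonical_transcripts = 413_097
genes_with_plof = 16_694

canonical_share = on_canonical_transcripts / high_confidence_plof
mean_plof_per_gene = on_canonical_transcripts / genes_with_plof

print(f"on canonical transcripts: {canonical_share:.1%}")
print(f"mean pLoF variants per gene: {mean_plof_per_gene:.1f}")
```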

«The LOEUF metric can be applied to improve molecular diagnosis and advance our understanding of disease mechanisms. Disease-associated genes, discovered by different technologies over the course of many years across all categories of inheritance and effects, span the entire spectrum of LoF tolerance»
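In the paper, LOEUF stands for "loss-of-function observed/expected upper bound fraction". The excerpts quoted here do not describe how it is computed, so the following is only a sketch under stated assumptions: the observed pLoF count in a gene is modelled as Poisson-distributed around the expected count multiplied by an unknown ratio, and the upper bound is taken as the 95th percentile of the normalised likelihood over a grid of ratios from 0 to 2. The function name, the grid and the example counts are hypothetical and are not taken from the gnomAD code.

```python
import math


def oe_upper_bound(observed: int, expected: float, level: float = 0.95,
                   step: float = 0.001, max_ratio: float = 2.0) -> float:
    """Illustrative upper bound on the observed/expected ratio (Poisson model)."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    n_steps = int(round(max_ratio / step))
    ratios = [i * step for i in range(n_steps + 1)]
    log_factorial = math.lgamma(observed + 1)

    # Poisson likelihood of the observed count for each candidate ratio.
    weights = []
    for ratio in ratios:
        lam = ratio * expected
        if lam == 0.0:
            weights.append(1.0 if observed == 0 else 0.0)
        else:
            log_p = observed * math.log(lam) - lam - log_factorial
            weights.append(math.exp(log_p))

    # Normalise, accumulate, and return the first ratio reaching the level.
    total = sum(weights)
    cumulative = 0.0
    for ratio, weight in zip(ratios, weights):
        cumulative += weight / total
        if cumulative >= level:
            return ratio
    return max_ratio


# Hypothetical genes: one with no pLoF variants seen where 30 were expected,
# and one where the observed count equals the expectation.
constrained = oe_upper_bound(observed=0, expected=30.0)
tolerant = oe_upper_bound(observed=30, expected=30.0)
print(f"constrained gene: {constrained:.3f}")
print(f"tolerant gene: {tolerant:.3f}")
```

Using an upper bound rather than the raw ratio is the conservative choice: a small gene with few expected variants gets a high bound even when none are observed, so only genes with strong evidence of depletion end up with low values.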

«In an independent cohort of 5,305 individuals with intellectual disability or developmental disorders and 2,179 controls, the rate of pLoF de novo variation in cases is 15-fold higher in genes belonging to the most constrained LOEUF decile, compared with controls»

«Schizophrenia and educational attainment are the most enriched traits (Fig. 5c), consistent with previous observations in associations between rare pLoF variants and these phenotypes»
