2023

Reduced ambiguity and improved interpretability of bacterial genome-wide associations using gene-cluster-centric k-mers

Sommer H, Djamalova D, Galardini M

Published in

Microbial Genomics: Volume 9, Issue 11, Article 001129

Abstract

The wide adoption of bacterial genome sequencing, together with the use of k-mers to encode both core and accessory genome variation, has allowed bacterial genome-wide association studies (GWAS) to identify genetic variants associated with relevant phenotypes, such as those linked to infection. Significant limitations remain, however: k-mers can be duplicated across gene clusters, and association results are difficult to interpret. Both issues hinder the wider adoption of GWAS methods on microbial data sets. We have developed a simple computational method (panfeed) that explicitly links each k-mer to its gene cluster at base resolution. This allows us to avoid the biases introduced by a global de Bruijn graph and to map and annotate associated variants more easily. We tested panfeed on two independent data sets, where it correctly identified previously characterized causal variants, demonstrating both the precision of the method and its scalable performance. panfeed is a command line tool written in Python and is available at https://github.com/microbial-pangenomes-lab/panfeed.
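The core idea of keying each k-mer by its gene cluster, rather than pooling all k-mers into one global set, can be illustrated with a minimal sketch. This is not panfeed's implementation or API: the input format (a dict mapping a cluster ID to per-genome nucleotide sequences) and the function names `cluster_kmers` and `presence_patterns` are hypothetical. "Position" here is simply the 0-based offset within the supplied sequence.

```python
from collections import defaultdict


def cluster_kmers(clusters, k):
    """Index k-mers per gene cluster (illustrative sketch, not panfeed).

    clusters: hypothetical input, dict cluster_id -> {genome_id: sequence}.
    Returns dict (cluster_id, kmer) -> {"positions": set, "genomes": set},
    so an identical k-mer found in two clusters stays two separate features.
    """
    table = defaultdict(lambda: {"positions": set(), "genomes": set()})
    for cluster_id, members in clusters.items():
        for genome_id, seq in members.items():
            # Slide a window of length k along this genome's copy of the gene.
            for start in range(len(seq) - k + 1):
                entry = table[(cluster_id, seq[start:start + k])]
                entry["positions"].add(start)  # offset within the gene sequence
                entry["genomes"].add(genome_id)
    return dict(table)


def presence_patterns(table, genomes):
    """Convert each (cluster, k-mer) feature into a 0/1 vector over genomes."""
    return {
        key: tuple(int(g in entry["genomes"]) for g in genomes)
        for key, entry in table.items()
    }


clusters = {
    "geneA": {"g1": "ATGAAACCC", "g2": "ATGAAACCG"},
    "geneB": {"g1": "TTATGAAA", "g3": "TTATGAAA"},
}
patterns = presence_patterns(cluster_kmers(clusters, 5), ["g1", "g2", "g3"])
print(patterns[("geneA", "ATGAA")])  # (1, 1, 0)
print(patterns[("geneB", "ATGAA")])  # (1, 0, 1)
```

In this toy example the k-mer `ATGAA` occurs in both clusters. A single global k-mer set would merge the two occurrences into one feature present in all three genomes. Keying by cluster instead keeps two distinct presence/absence patterns, each of which can be traced back to a specific gene and offset.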

DOI: 10.1099/mgen.0.001129

Authors

Hannes Sommer

PhD student (ZIB program)

Dilfuza Djamalova

PhD student

Prof. Dr. Marco Galardini

Research group leader