microGWAS: a computational pipeline to perform large-scale bacterial genome-wide association studies
Burgaya J, Damaris B, Fiebig J, Galardini M
Published in
Microbial Genomics: Volume 11, Issue 2, DOI 10.1099/mgen.0.001349
Abstract
Identifying genetic variants associated with bacterial phenotypes, such as virulence, host preference and antimicrobial resistance, has great potential to improve our understanding of the mechanisms underlying these traits. The availability of large collections of bacterial genomes has made genome-wide association studies (GWAS) a common approach for this purpose. However, the need to employ multiple software tools for data pre- and postprocessing restricts the application of these methods to experienced bioinformaticians. To address this issue, we have developed a pipeline to perform bacterial GWAS from a set of assemblies and annotations, with multiple phenotypes as targets. The associations are run using five sets of genetic variants: unitigs, gene presence/absence, rare variants (i.e. gene burden test), gene-cluster-specific k-mers and all unitigs jointly. All variants passing the association threshold are further annotated to identify overrepresented biological processes and pathways. The results can be further augmented by generating a phylogenetic tree and predicting the presence of antimicrobial resistance and virulence-associated genes. We tested the microGWAS pipeline on a previously reported dataset on Escherichia coli virulence, successfully identifying the causal variants and providing further interpretation of the association results. The microGWAS pipeline integrates state-of-the-art tools for bacterial GWAS into a single, user-friendly and reproducible pipeline, making these analyses accessible to a wider range of researchers. The pipeline, together with its documentation, can be accessed at https://github.com/microbial-pangenomes-lab/microGWAS.
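As an illustration of what "overrepresented biological processes and pathways" can mean in practice, the sketch below computes a one-sided hypergeometric enrichment p-value for a single pathway. It is a generic textbook test, not necessarily the method microGWAS uses; the function name `hypergeom_sf` and all gene counts are hypothetical and chosen only for the example.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """Return P(X >= k) for a hypergeometric draw.

    M: total annotated genes in the genome (background)
    n: background genes that belong to the pathway
    N: genes hit by variants passing the association threshold
    k: associated genes that belong to the pathway
    """
    upper = min(n, N)
    total = comb(M, N)
    # Sum the probability mass for every outcome at least as extreme as k.
    tail = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical numbers: 4000 genes, 40 in the pathway,
# 50 associated genes, 5 of which fall in the pathway.
p_value = hypergeom_sf(k=5, M=4000, n=40, N=50)
print(f"enrichment p-value: {p_value:.2e}")
```

With these made-up counts only about 0.5 pathway genes would be expected by chance, so observing 5 yields a small p-value. In a real analysis the test would be repeated for every pathway and the resulting p-values corrected for multiple testing.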