Pangenome-wide prediction of gene function
About this project
The advent of high-throughput sequencing has made it possible to obtain the genome sequences of hundreds of bacterial isolates at limited cost. We now know that for species such as E. coli, individual strains can differ by up to 60% in their gene content. Genes with low conservation, also called accessory genes, are known to contribute to survival in specialized niches; however, many of them lack even a broad functional characterization, and the outlook is worse still for members of the human microbiome. Chemical genomics approaches can be used to reconstruct the functions of these genes, but they are limited to a few tens of species because of cost and labour constraints. We are using computational approaches, such as machine learning models that are trained on the wealth of data available for model organisms and that use features extracted from nucleotide sequences, to improve current function prediction methods.
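As a rough illustration of what "features extracted from nucleotide sequences" could look like, the sketch below turns a gene sequence into a fixed-length vector of k-mer frequencies, which is one common sequence-derived feature. The project description does not say which features or models are actually used, so the function name `kmer_frequencies`, the choice of k, and the example sequence are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=3):
    """Return normalised k-mer frequencies for a nucleotide sequence.

    Hypothetical helper for illustration only; not the project's method.
    """
    sequence = sequence.upper()
    # Fixed feature order: every possible k-mer over A, C, G, T.
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Windows containing ambiguous bases (e.g. N) are left out of the total.
    total = sum(counts[kmer] for kmer in all_kmers)
    if total == 0:
        return [0.0] * len(all_kmers)
    return [counts[kmer] / total for kmer in all_kmers]


# Made-up example sequence; k=2 gives 4**2 = 16 features.
features = kmer_frequencies("ATGGCGTACGTTAGC", k=2)
```

Vectors like this have the same length for every gene, so in principle they could be fed to a standard classifier trained on genes of known function from model organisms.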