Evolution, structure and population allele frequency accurately predict mutational effects in proteins
Natural protein sequences observed today are the result of evolutionary processes selecting for function. They can tell us which sequence variations affect proteins’ biological functions, and how, a central question in biology, bioengineering and medicine. The increasing wealth of genomic data has enabled the accurate prediction of complete mutational landscapes. State-of-the-art methods addressing this problem explicitly or implicitly model inter-dependencies between all positions in the sequence of interest to predict the effect of a particular mutation at a particular position. These methods infer hundreds of thousands of parameters from very large multiple sequence alignments, require high variability in the input data, and remain time-consuming. Here, we present PRESCOTT (prescott.lcqb.upmc.fr), a fast, scalable and interpretable method to predict mutational outcomes. PRESCOTT considers the evolutionary history that relates natural sequences, together with structural information and allele frequency in human populations, when available. I will present the problem, the model, its impact on genomic medicine and potential improvements to the current model.