Modelling the Articulation of Selective and Neutral Mechanisms in the Evolution of Protein-Coding DNA Sequences
Abstract
Molecular evolution aims to characterize the mechanisms at work in the evolution of sequences, which is governed by a stochastic process whose main components are mutation, selection and genetic drift. In the long term, this stochastic process results in a history of substitution events along species trees, inducing complex patterns of molecular divergence between species. By analysing these patterns, phylogenetic codon models aim to capture the intrinsic parameters of evolution. In this context, this thesis focuses on phylogenetic codon models, and on modelling the interplay between mutation, selection and drift that shapes protein-coding DNA sequences. Because the composition of protein-coding DNA sequences does not reflect the underlying mutational process itself, but rather its filtering by selection at the level of amino acids, careful modelling is necessary to tease apart mutation and selection. Therefore, I first developed a phylogenetic codon model of inference in which different rates of evolution give an accurate representation of how mutation and selection oppose each other at equilibrium. Between the opposing forces of mutation and selection, the balance is arbitrated by genetic drift, which in turn is modulated by effective population size. As a consequence, variation of effective population size along a phylogeny can theoretically be inferred from the trails of substitutions along the lineages. I thus developed a second model of inference, which jointly reconstructs site-specific fitness landscapes and long-term trends in effective population size and in mutation rate along the phylogeny. This Bayesian framework was tested against simulated data and then applied to empirical data. Estimates of the variation of effective population size correlate with life-history traits or ecological variables in the expected direction. However, the magnitude of the inferred variation is narrower than expected based on independent estimates.
To understand this narrow variation in the estimated effective population size, I finally developed a theoretical model describing how changes in either effective population size or protein expression level translate into a change in substitution rate. This response of the substitution rate is derived under the assumption that proteins are under directional selection to maximize their conformational stability, and is expressed in terms of the molecular parameters of protein biophysics. The results imply a weak response of the substitution rate to changes in expression level or effective population size, two variables that play interchangeable roles. This thesis demonstrates that the assumptions made about the structure of the fitness landscape critically determine how sensitive substitution rates are to changes in ecological or molecular variables. Conversely, empirical observations of the patterns of substitutions in response to changes in molecular or ecological variables inform us about the underlying structure of the fitness landscape. Being based on the mutation-selection balance and explicitly integrating effective population size, this work also presents a conceptual framework that relates phylogenetics and population genetics, and outlines possible paths toward their unification.
Type