Transition matrices

Next: Nucleotide substitution models implemented Up: Nucleotide substitution models Previous: A Markov model of Contents

Transition matrices

The mathematical expression of a DNA Markov model uses a matrix

of substitution rates in which each element $r_{ij}$ represents the rate of substitution from nucleotide

to nucleotide

. The diagonal elements of the instantaneous rate matrix must satisfy the equation

$\begin{displaymath} r_{ii} = - \sum_{j \neq i} r_{ij} \end{displaymath}$

(1)

so that each row of

sums to zero. The process must be homogeneous and stationary; if $\pi_A$ , $\pi_C$ , $\pi_G$ and $\pi_T$ are the four equilibrium bases frequencies then the rates must obey the following constraint:

$\begin{displaymath} \pi_i r_{ij} = \pi_j r_{ji} \quad \forall i,j \end{displaymath}$

(2)

also known as the time-reversibility constraint. To enforce this constraint we define ${\alpha_{ij}}$ so that

$\begin{displaymath} Q_{ij} = r_{ij} = m_r \times \pi_j \alpha_{ij} \quad \forall i, \forall j\neq i \end{displaymath}$

(3)

where

is a constant factor described later. The time-reversibility condition is satisfied with a symmetric choice of $\alpha_{ij}$ . In practice, PHASE uses one of these $\alpha_{ij}$ parameters as a reference and sets its value to

. Depending on the model, other parameters (we call them rate ratios) are fixed or inferred during an analysis.

With we can compute the transition probability matrix over time .

$\begin{eqnarray*} \frac{dP(t)}{dt} & = & P(t) \times Q\\ P(t) & = & \exp(Qt)\\ & = & \exp(\{\pi_j \alpha_{ij}\} \times m_r \times t) \end{eqnarray*}$

The transition probability matrix $P(t)=\{p_{ij}(t)\}$ is used to compute the probability that nucleotide

will be nucleotide

after time

(

can be equal to

). The ``rate ratios'' matrix in PHASE refers to the matrix ${\alpha_{ij}}$ and the ``transition rates'' matrix refers to

Inference methods used do not permit the separation of , a factor proportional to the average substitution rate of the model, and , branch lengths of the evolutionary tree which reflect an amount of change. The longer the branch, the bigger the evolutionary distance between its two incident nodes. We have to impose a scaling on the branch length. In practice, we fix the average rate of substitutions of our model to be one per ``unit of time''. This is done by adding a constraint for the factor .

$\begin{displaymath} m_r \times \sum_{i=1}^{nb_{states}} \sum_{j \neq i} \pi_i r_{ij} = 1.0 \end{displaymath}$

(4)

This last constraint does not hold when multiple substitution models are used simultaneously in the MIXED model. The average substitution rate of the first model is still fixed equal to 1.0 but the average substitution rate of other models is now a free parameter.

Next: Nucleotide substitution models implemented Up: Nucleotide substitution models Previous: A Markov model of Contents

Gowri-Shankar Vivek 2003-04-24