Tuesday, March 14, 2017

Detecting introgression versus hybridization

There has been considerable interest in recent years in developing methods that can detect hybridization in the presence of incomplete lineage sorting (ILS), which would allow the construction of a realistic hybridization network. Clearly, both ILS and hybridization create conflicting gene trees, and displaying all of that conflict leads to a very complex data-display network. However, if the ILS signals in the data can be used to construct a small collection of gene-tree groups, in which the gene trees within each group are congruent with a single species tree (under the ILS model), then the incongruence between groups can be used to construct a hybridization network. This network will then be a hypothesis for a realistic evolutionary network.

Recently, a paper has appeared that uses simulations to evaluate several of these methods:
Olga K. Kamneva and Noah A. Rosenberg (2017) Simulation-based evaluation of hybridization network reconstruction methods in the presence of incomplete lineage sorting. Evolutionary Bioinformatics 2017:13.
I am not a great fan of simulations, because they exist under very restricted and usually unrealistic mathematical conditions. They are, however, useful for exploring the mathematical properties of various methods, even if they are hard to connect to the biological properties.

My interpretations of the results from the particular scenarios explored by Kamneva and Rosenberg are:
  1. Most of the methods improve as the internal network edges increase in length.
  2. Most of the methods improve as the number of gene trees increases.
  3. Under good conditions the maximum-likelihood methods do better than the parsimony and consensus methods.
  4. The maximum-likelihood methods are more affected by gene-tree error than are the other methods.
  5. There are conditions under which none of the methods work well.
I doubt that any of this is controversial, in the sense that model-based methods usually work well when their models apply, but not necessarily otherwise. Reality is more complex than the models, and so the methods are likely to fail for real data.

For me, the most interesting part of the paper is the examination of balanced versus skewed parental contributions to the hybrid taxon. A balanced genetic contribution in the simulations is analogous to homoploid or polyploid hybridization, whereas a skewed contribution is analogous to introgression or horizontal gene transfer (HGT). The simulations seem to show that the methods examined do not deal very well with skewed contributions.
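To make the balanced-versus-skewed distinction concrete, here is a minimal sketch in Python. It is not taken from Kamneva and Rosenberg's simulations: the function name, the parameter gamma (the assumed probability that a locus in the hybrid taxon traces back to parent A), and the values 0.5 and 0.05 are all illustrative assumptions. It also ignores ILS and gene-tree error entirely, so it shows only how few gene trees carry the minor parent's signal when the contribution is skewed.

```python
import random


def simulate_parental_counts(n_gene_trees, gamma, seed=0):
    """Count how many gene trees trace the hybrid lineage to each parent.

    gamma is the assumed probability that a locus is inherited from
    parent A; 1 - gamma is the probability that it comes from parent B.
    ILS and gene-tree estimation error are deliberately left out.
    """
    rng = random.Random(seed)  # seeded so that repeated runs agree
    from_a = sum(1 for _ in range(n_gene_trees) if rng.random() < gamma)
    return from_a, n_gene_trees - from_a


# Balanced contribution (hybridization-like): both parents well represented.
balanced = simulate_parental_counts(100, gamma=0.5)

# Skewed contribution (introgression- or HGT-like): parent A is rare.
skewed = simulate_parental_counts(100, gamma=0.05)

print("balanced (from A, from B):", balanced)
print("skewed   (from A, from B):", skewed)
```

With a skewed contribution, only a handful of gene trees support the minor parent, which is roughly the same number that ILS or gene-tree error alone might produce. That is one plausible reason why the methods would struggle to distinguish the two.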

So, these methods may literally be hybridization-network methods only, with separate network methods needed for detecting introgression or HGT. One example is the admixture methods used for genomes (see the recent post on Producing admixture graphs).

This would mean that we cannot first produce networks with reticulations and then explore what is causing the reticulations. Instead, we will need to decide on the possible biological mechanisms of reticulation before the analysis, and then mathematically explore possible networks that reflect those mechanisms.

This is not an issue for constructing trees, of course, since the only recognized mechanisms are speciation and extinction, both of which are explored post hoc rather than a priori. This is an important difference between networks and trees.
