Taxonomic concept identification, reconciliation, and the Open Tree of Life – Part 1
“Part 1” allows for subsequent parts (e.g., for the blue/non-blue use case, now added at the end of this post). The relevant OpenTree background discussion is here:
There are several interesting and generally closely overlapping issues and views here. I will pick just (or mainly) the one introduced by Jonathan Rees (November 02, 2014):
“Here’s another example I’m struggling with: there are currently a couple of species in OTT that are misclassified as crustaceans instead of molluscs. When we fix this problem, there will be an incompatible ‘change’ in the membership of Arthropoda. Does this mean that the new group should get a new identifier? – after all its identity in some sense has changed. If so, annotations and OTU mappings linked to the old id have no home in the tree. It doesn’t get a new id with the current taxonomy generator, which assumes that names are tied uniquely to taxon concepts (with some exceptions), but with a more principled system where groups are defined by membership or phylogenetic hypotheses, it might. This would have an impact on OTU mappings and annotation carryover. I don’t have a good answer to this one, but am working on ways to anchor the semantics of ids.”
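To make the contrast in the quote concrete, here is a minimal Python sketch of the two identifier policies it mentions: one tied to the name, one derived from membership. Everything in it is an assumption made for illustration: the function names, the species names, and the use of a SHA-1 digest are hypothetical, and none of it reflects how the OTT taxonomy generator actually assigns ids.

```python
import hashlib

def name_based_id(name):
    # Hypothetical policy: the identifier depends only on the name.
    return "name:" + name

def membership_based_id(members):
    # Hypothetical policy: the identifier is a digest of the sorted member set,
    # so any change in membership yields a different identifier.
    joined = "|".join(sorted(members))
    return "set:" + hashlib.sha1(joined.encode("utf-8")).hexdigest()[:12]

# Toy membership of "Arthropoda" before and after the correction
# (species names are invented placeholders).
arthropoda_t0 = {"SpeciesX", "SpeciesY", "MisplacedSp1", "MisplacedSp2"}
arthropoda_t1 = arthropoda_t0 - {"MisplacedSp1", "MisplacedSp2"}

# The name-based id is unaffected by the correction.
same_name_id = name_based_id("Arthropoda") == name_based_id("Arthropoda")

# The membership-based id changes once the two species are moved out.
same_member_id = (
    membership_based_id(arthropoda_t0) == membership_based_id(arthropoda_t1)
)

print(same_name_id, same_member_id)
```

Under the name-based policy, annotations attached to the id survive the fix but the id no longer denotes the same group; under the membership-based policy the id changes, which is exactly the carryover problem raised in the quote.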
I am attempting to reproduce this under varying scenarios in the Euler/X toolkit. Starting simple, then expanding. I have created an initial scenario with two phylogenetic perspectives of the Ecdysozoa (molting animals): Ecdysozoa sec. ott1 (= OpenTree Topology at time = 1; the later version) versus Ecdysozoa sec. ott0 (OTT at time = 0; the earlier version). Initially, this yields an alignment with complete taxonomic congruence.
1A. Complete congruence, input visualization.
1B. Complete congruence, alignment visualization.
Now added to this are two species-level concepts, “ProblemSpeciesA” and “ProblemSpeciesB”, each sec. ott1 and sec. ott0. These concepts are “erroneously” assigned at time = 0 to a crustacean genus-level concept in ott0 (i.e., as children of ott0.Crustacea_Genus17) but are subsequently placed at time = 1 as children of the molluscan genus-level concept ott1.Mollusca_Genus3. Shown is the toolkit’s default alignment (-e mnpw --rcgo).
2A. With Problem Species, input visualization.
2B. With Problem Species, alignment visualization.
Some insights emerge. First, and evidently, the combination of the default reasoner setting and default input annotation, plus the “radically” different placement of the two low-level concepts, causes a good amount of taxonomic incongruence at superordinate levels. Second, the “noise ends” at the level of Ecdysozoa sec. ott1/ott0. In this particular case, because we presumably have (1) differential taxonomic placements and (2) differential phenotypic property assignments to the four problematic concepts in question, I am not sure (at the moment) that property-centric individuation of concepts will change things.
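The way the incongruence propagates upward and then stops can be sketched outside the toolkit with plain set comparisons. The Python below is only an illustration, not Euler/X input or output: the toy hierarchy (including the intermediate "Crustacea" and "Mollusca" nodes and the extra species "CrustSp1" and "MollSp1") is invented, and it assumes that same-named species-level concepts are congruent across ott0 and ott1, so that each higher concept can be compared by the set of species it contains.

```python
def leaf_sets(parent_of):
    """Map every internal node to the set of leaves (species) below it."""
    parents = set(parent_of.values())
    leaves = [node for node in parent_of if node not in parents]
    members = {}
    for leaf in leaves:
        node = leaf
        while node in parent_of:          # walk up to the root
            node = parent_of[node]
            members.setdefault(node, set()).add(leaf)
    return members

def rcc5(a, b):
    """Relation between two member sets, using the five RCC-5 relations."""
    if a == b:
        return "=="   # congruent
    if a < b:
        return "<"    # a is properly included in b
    if a > b:
        return ">"    # a properly includes b
    if a & b:
        return "><"   # overlapping
    return "!"        # disjoint

# Hypothetical child -> parent maps for the two versions.
shared = {
    "Crustacea": "Ecdysozoa", "Mollusca": "Ecdysozoa",
    "Crustacea_Genus17": "Crustacea", "Mollusca_Genus3": "Mollusca",
    "CrustSp1": "Crustacea_Genus17", "MollSp1": "Mollusca_Genus3",
}
ott0 = dict(shared, ProblemSpeciesA="Crustacea_Genus17",
            ProblemSpeciesB="Crustacea_Genus17")
ott1 = dict(shared, ProblemSpeciesA="Mollusca_Genus3",
            ProblemSpeciesB="Mollusca_Genus3")

m0, m1 = leaf_sets(ott0), leaf_sets(ott1)

# Compare each ott1 concept with its same-named ott0 counterpart.
relations = {name: rcc5(m1[name], m0[name]) for name in m1}
# Cross-comparison of the two genera that exchange the problem species.
cross = rcc5(m1["Mollusca_Genus3"], m0["Crustacea_Genus17"])

for name, rel in sorted(relations.items()):
    print(f"ott1.{name} {rel} ott0.{name}")
print("ott1.Mollusca_Genus3", cross, "ott0.Crustacea_Genus17")
```

In this toy version, the two genera and their parents come out as non-congruent, while the two Ecdysozoa concepts contain the same species and are therefore congruent, which is the point at which the “noise ends”.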
And here is a simplified version of Arlin Stoltzfus’ example, reduced to just represent blue and non-blue as character state(s) (concepts). These get assigned incongruent sets of taxon concepts at time = 1 versus time = 0. Nevertheless they retain congruence.
3A. Blue/non-blue example, input visualization.
3B. Blue/non-blue example, alignment visualization.
Under this particular representation, the taxon/character duality is apparent – two views of “the same phenomenon”.