Thoughts: Why stability in nomenclature, and at what cost?
Another post on nomenclature, related to this previous post on the strong, and possibly thankfully so, influence of nomenclatural principles on taxonomic practice.
Many taxonomists, certainly including myself, continue to wonder about and explore why exactly nomenclature is the way it is. The aim is first and foremost to obtain a sound explanatory account. Whether one likes the explanations, or the practice as illuminated in part by the explanations, is initially another subject.
In 1997, and in response to the then ascending PhyloCode, Dominguez & Wheeler published an article entitled:
The purpose there was to contrast the PhyloCode with the Linnaean system of nomenclature, as represented (e.g.) in the ICZN.
However, both systems are rather obviously somewhere in the middle between completely stable and completely unstable. I am assuming (not being an expert in phylogenetic nomenclature and how it tracks change in phylogenetic knowledge) that, if we first establish a fairly fine-tuned PhyloCode-based naming system informed largely by contemporary phylogenetic insights, and then at another turn discover that some “protozoans” (as initially named and classified) are instead reductive descendants of some lineage of more complex multi-cellular animals, then there will have to be some nomenclatural adjustments. The example could be made more, or less, extreme, and more specific. But I hope it is not wrong to say that phylogenetic nomenclature is responsive to realignments of some, possibly very dramatic (deep), assessments of phylogenetic identity.
The Linnaean naming system is rather universally acknowledged to respond to some, yet not all, kinds of taxonomic changes. We know how binomial naming works, and that many higher-level, non-Priority names can be transferred from one parent to another (“new placement”) with no name adjustments mandated.
I am interested in taking the contrast further. A naming system that is completely stable is, I suppose, somewhat akin to this. That is easy to learn! However, we need more than that to manage getting through life.
The other extreme is more interesting. In particular, we can ask what a taxonomic naming system might look like that is nearly completely unstable. Here are some suggested qualities of that system.
- Globally unique identifiers for every specimen (one or many per specimen, likely no large matter as long as they are linked [not a given]).
- Minimally, complete verbatim representation of all phenotypic and genotypic information associated with each specimen at the time of generation of that information. And “information” here stands for theory-contingent observations made and weighed by the current and corresponding systematic analysis. Ideally that information is semantically annotated and parsed (which opens up another arc for discussion that I will not follow here).
- Linkages of all specimens, and hence all associated information, that are inferred to constitute a particular taxonomic concept.
- Assignment of a globally unique identifier to the particular, one-time published taxonomic concept (say, at time = 1).
- Reuse of that taxonomic concept identifier only in those cases where reference to exactly the then (time = 1) entire set of information serves the intended reference purposes.
- Use of another, different taxonomic concept identifier in all other cases (time = x) that assert non-identical sets of information in comparison to “the package” asserted at time = 1. Even if the differences appear negligible to anyone except an algorithm that compares bit by bit.
- Complex semantics services that can establish granular specimen-, phenotype-, and genotype-anchored similarities and differences across multiple uniquely identified taxonomic concepts.
- If this sounds outrageous, let’s add another component. Every string identifying a taxonomic concept has a length of 1028 bits.
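To make the identity rule in this list a little more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the `Specimen` class, the `concept_identifier` function, and the example identifiers are hypothetical, and SHA-256 is merely a stand-in for whatever long identifier string such a system would actually use. The idea is that the concept identifier is derived from the entire information package, so any change to the package yields a different identifier.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Specimen:
    guid: str            # globally unique specimen identifier (hypothetical format)
    observations: tuple  # verbatim (trait, value) pairs as recorded at publication time


def concept_identifier(specimens):
    """Derive an identifier from the full information package of a concept.

    The package is serialized in a canonical order, so the same set of
    specimens and observations always yields the same identifier.
    """
    package = sorted((s.guid, sorted(s.observations)) for s in specimens)
    payload = json.dumps(package).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


s1 = Specimen("spec-001", (("elytra", "striate"),))
s2 = Specimen("spec-002", (("elytra", "striate"),))
s3 = Specimen("spec-003", (("elytra", "striate"),))
s1_reinterpreted = Specimen("spec-001", (("elytra", "punctate-striate"),))

concept_t1 = concept_identifier([s1, s2])
same_package = concept_identifier([s2, s1])            # same content, different order
added_specimen = concept_identifier([s1, s2, s3])      # add a specimen
changed_trait = concept_identifier([s1_reinterpreted, s2])  # reinterpret a trait
```

Here `concept_t1` and `same_package` are equal, whereas `added_specimen` and `changed_trait` each differ from `concept_t1`: the time = 1 identifier can only be reused when exactly the same package is meant.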
I am almost sure this system is somehow deeply flawed. That is not the main point. The initial point is that a more sensible version of this system might actually produce some services that we – human biological taxonomists – are interested in. We can query it for specimens and perceived pheno- and genotypic traits (all at time = 1, naturally). We can possibly extend those queries over multiple taxonomic concepts authored at different times. “How many concepts reference this specimen, trait, etc.?” We can possibly identify specimens to these taxonomic concepts. And we can potentially compute taxonomic provenance across concepts in light of their variously shared specimens and traits. I think this is all good.
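A rough sketch of two of these queries, again in Python and again under assumptions of my own: the `concepts` mapping, its identifiers, and the two function names are hypothetical, and real provenance services would reason over traits as well as specimens.

```python
# Hypothetical store: taxonomic concept identifier -> set of specimen identifiers.
concepts = {
    "concept-A-t1": {"spec-001", "spec-002"},
    "concept-A-t2": {"spec-001", "spec-002", "spec-003"},
    "concept-B-t2": {"spec-004"},
}


def concepts_referencing(specimen_id, concepts):
    """Answer: which concepts reference this specimen?"""
    return sorted(cid for cid, members in concepts.items() if specimen_id in members)


def specimen_overlap(cid_a, cid_b, concepts):
    """Compare two concepts by the specimens they share or do not share."""
    a, b = concepts[cid_a], concepts[cid_b]
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


referencing = concepts_referencing("spec-001", concepts)
overlap = specimen_overlap("concept-A-t1", "concept-A-t2", concepts)
```

In this toy data, `referencing` lists the two "A" concepts, and `overlap` shows that the later concept contains everything in the earlier one plus `spec-003`, which is the kind of transparent provenance statement I have in mind.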
However, the system would also have exactly as many “names” (more appropriately called taxonomic concept identifiers) as it has taxonomic concepts, and the “boundaries” of identity of the latter are as narrow as defined above. Add a specimen – new identifier. Change the interpretation of a trait – new identifier. Each of these identifier strings is something of a tongue twister too.
In short, for humans the system would be utterly unusable. Impossible to learn. Millions of hard-to-remember data points and possibly trillions of linkages among them, none in a syntax that “speaks” to us.
My question is whether computers would look at this system differently. I think they would. And that they would do quite well with it, in terms of responding to queries where the provenance of the response is perfectly transparent and... logical.
Which leads me to the sudden end. I think that the Linnaean naming system is, to a large degree, designed by and for our human minds and ways of communication. Scott Atran has written as much about this as any author I am familiar with. He speaks of “cognitive universals”.
To a computer, a Linnaean name functions logically as a taxonomic concept lineage identifier, where the lineage is potentially infinite in its temporal extension (many starting in the 1750s and still, or even more rapidly, evolving today), in the number of non-/congruent taxonomic concept instances connected along the reference chain, and in bifurcations and merges with regard to other (heterotypically synonymous) lineages. The strongest identifier of the lineage is the type (for those names and ranks where typification is required). That identity of the identifiers (the Linnaean names) effectively reduces the number of identifiers one (human) has to recall, among other features that align types with our cognitive and communicative strengths. Cognitive universals also seem to have influenced the number of ranks that humans typically prefer to work with. In that sense at least, ranks are not “arbitrary” (and indeed can provide valuable cognitive services).
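The lineage reading of a name can be sketched as follows. This is a deliberately simplified Python illustration with hypothetical data: the placeholder name "Aus bus", the concept and type identifiers, and the `lineages_by_type` function are all invented for the example, and it ignores the bifurcations and merges mentioned above.

```python
from collections import defaultdict

# Hypothetical usages: (concept identifier, year, name as used, type specimen identifier).
usages = [
    ("concept-0003", 1998, "Aus bus", "type-001"),
    ("concept-0001", 1758, "Aus bus", "type-001"),
    ("concept-0002", 1905, "Aus bus", "type-001"),
    ("concept-0004", 1998, "Aus cus", "type-002"),
]


def lineages_by_type(usages):
    """Group concept instances into lineages anchored by their type specimen."""
    lineages = defaultdict(list)
    for concept_id, year, name, type_id in usages:
        lineages[type_id].append((year, concept_id, name))
    # Order each lineage chronologically.
    return {type_id: sorted(chain) for type_id, chain in lineages.items()}


lineages = lineages_by_type(usages)
names_in_first_lineage = {name for _, _, name in lineages["type-001"]}
```

Three distinct concept identifiers end up in the `type-001` lineage, yet `names_in_first_lineage` contains a single string. That is the reduction in things to remember that the type buys a human reader.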
None of our immediate ancestors had to deal with biodiversity at the scale that our databases now do. Stability may ultimately be ignorance. It is also an evolutionary constraint on our human minds and communication. However, it is not a necessary constraint on how we build logically empowered biodiversity knowledge environments. Cue Avibase.