
What might we (systematists) want out of phenotype ontologies

Quick note ahead of the main entry: New paper by István Mikó et al. 2015. Generating semantic phenotypes. Worth a careful read.

The innovative paper by Ramírez & Michalik (2014) made for (another) lively discussion last week. The paper is rich with ideas and densely presented, which motivated an attempt by us to enumerate the sequence of data production and analytical steps. Another interesting question is to what extent (and why!) the authors’ approach moves away from the prevalent multi-taxon phenotype ontology approach. For instance, statements like the following (page 642) depart from the prevalent OBO language:

“As the Spider Ontology arose to manage the morphological concepts used in phylogenetic datasets, it is natural that it incorporated much of the pre-processed homology correspondences on its structure and definitions, to make room for the variety of form and function that the same organ may have in different organisms. In this way, the ontology accommodates the vast majority of homology statements currently accepted in spider systematics.”

The authors make no attempt to be taxon- or homology-neutral; indeed the term “neutral” is not mentioned in their paper (much in contrast to this new paper on “comparative plant phenomics”). I get caught up in this, but will state here once more my (and not just my) view that “neutrality” with regard to comparative anatomical terms is not what systematics should stand for. Instead I am with Mary Winsor (2000), who remarks (in something of an epic takedown of Gilmour’s (1940) influential “Taxonomy and Philosophy” paper, pages 461-474 in Huxley’s The New Systematics):

“History does not support Gilmour’s conviction that science would work better if it used “neutral” words. T. H. Morgan at first avoided using the recently-coined “gene” because W. Johannsen had specified its lack of material reference; the word lived on when geneticists ignored its creator and linked it to the chromosome theory (Allen 1978, pp. 209–210). As pope of a church sceptical of natural kinds, Gilmour had few converts, not enough certainly to sustain a language, though his influential sympathizers would later include, in addition to Walters and Heslop-Harrison, Arthur Cain, Robert Sokal, Peter Sneath, and Colin Patterson. The idea of ridding science of confusion and contention by sharpening the link between words and facts remains attractive.”

Ramírez & Michalik (2014) are firmly and refreshingly in the cladistic, phylogenetic systematics realm. The core of their analytical approach is a conventional cladistic analysis which provides optimizations of character state transformations along the internal and terminal edges of their phylogeny. The phenotype ontology is used ‘on top of’ this analysis, mainly to represent transformations in structural complexity (pragmatically realized) by evaluating the presence/absence of ontological classes and relationships as optimized in matrices coded for each ancestral and terminal node. The concept of “node ontologies” is new and perhaps necessary to actually attain hierarchical structure (transformation) in this context. The details of the analysis are maybe best unraveled by working through the many data files and scripts deposited in Dryad. We are not finished digesting this important paper.
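To make the “node ontology” idea concrete for our own discussion, here is a minimal sketch in Python. It is not the authors' pipeline (their scripts are in Dryad); it only assumes that each ancestral and terminal node has already been assigned a set of ontology classes optimized as present, and then reports the classes gained and lost along each edge. The tree, node names, and class names are all made up for illustration.

```python
# Toy tree as parent -> children (hypothetical node names).
TREE = {"root": ["anc1", "sp3"], "anc1": ["sp1", "sp2"]}

# Hypothetical ontology classes optimized as present at each node.
PRESENT = {
    "root": {"sclerite", "duct"},
    "anc1": {"sclerite", "duct", "gland"},
    "sp1": {"sclerite", "duct", "gland", "spine"},
    "sp2": {"sclerite", "gland"},
    "sp3": {"sclerite", "duct"},
}


def edge_changes(tree, present):
    """Return {(parent, child): (gained, lost)} for every edge of the tree."""
    changes = {}
    for parent, children in tree.items():
        for child in children:
            gained = sorted(present[child] - present[parent])
            lost = sorted(present[parent] - present[child])
            changes[(parent, child)] = (gained, lost)
    return changes


def complexity(present):
    """Crude complexity proxy: number of classes present at each node."""
    return {node: len(classes) for node, classes in present.items()}


if __name__ == "__main__":
    for edge, (gained, lost) in edge_changes(TREE, PRESENT).items():
        print(edge, "gained:", gained, "lost:", lost)
    print(complexity(PRESENT))
```

Counting classes is only one possible proxy for structural complexity; relationships between classes (part_of and so on) would need their own presence/absence bookkeeping.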


Moving forward, I am collecting here our various ideas to employ phenotype ontologies for research questions close to home. In no particular order, and somewhat complexity-heavy (see above).

SA. 1. I am currently trying to make sense of conoderine generic relationships and tribal classification using internal characters, so far with genitalia and genitalic musculature. A number of papers exist describing the structures, many for one species, and a few synonymizing terms with those described in some previous publications. A controlled vocabulary of these terms would help in a fairly standard ontologies-are-useful way, such as forcing a deeper understanding of the structures and standardizing the terminology, which in turn would facilitate the generation of hypotheses of homology for their use in cladistic analysis.

2. & 3. (kind of). Correlation of character systems related to oviposition. Howden (1995) delimited 11 (non-phylogenetic) categories based on female oviposition structures and known oviposition behaviors. Looking at a group with well known hosts and oviposition behaviors, can the character systems directly related to oviposition be correlated to other character systems linked to host/oviposition, like mouthpart morphology?

4. And then in a group where hosts and oviposition behaviors are poorly known, how well can morphological complexity of these systems predict host type or even oviposition behavior? Or perhaps species diversity in a similar way to the “adaptive zone” hypothesis explaining the overall diversity of Curculionidae. In other words, if the acquisition of the rostrum and thus the ability to oviposit inside angiosperms can partly explain the radiation of Curculionidae, can similar adaptations in character systems tied to oviposition explain diversity in certain clades? For example, a specific feature of the mouthparts/degree of complexity of the mouthparts that allowed the ancestor of a diverse clade to radiate utilizing, for example, a specific plant family or type of oviposition (e.g. oviposition in stems). The use of ontologies here could be similar to the ancestral ontology used in Ramírez & Michalik.

References: Howden, A.T. 1995. Structures related to oviposition in Curculionoidea. Memoirs of the Entomological Society of Washington 14: 53-100.

NF. 1. How do particular character systems differentially contribute to the support of deeper or shallower clades in an inferred phylogeny?

2. Can phenotype ontologies be used to characterize and quantify structural variation and “innovation” at (time-) hierarchical levels ranging from specimens to higher-level taxa (clades)?

3. Which character systems are (as revealed by an ontological-phylogenetic representation) particularly homoplasious and therefore potentially require reexamination or more narrowly scoped codings (homology assessments, terms); and in turn which character systems appear to reflect homology (qua synapomorphy) at the most granular levels?

4. What are the implications for reasoning of tying phenotype ontology E-Q statements to specimens versus terminal taxa versus higher-level taxa (nodes)? Which choice is ideal for phylogenetic revisions?

5. Should there be two parallel phenotype ontologies that differentially meet the needs of the diagnostic versus phylogenetic components entailed in phylogenetic revisions? If so, how would these differ in their representational goals and execution?

MG. 1. Evolution of adhesive setae in Carabidae. A talk was given at the 2006 Coleopterists Society meeting in Indianapolis. The speaker sought to track the evolution of arboreality in carabids by examining particular character systems, and singled out the presence/absence of adhesive tarsal setae. Some in the audience claimed that the adhesive setae character states the author had combined into one actually consisted of multiple distinct setal types, with different structure and function. An ontological specification of these setal types would remove ambiguity from this attempt to correlate structure with behavior. Stork (1980), in a richly illustrated paper, already outlined many of these setal types across Coleoptera, so the pickings are ripe for development of such specification.

2. Globularity of Phalacridae among flat bark beetle ancestors. Phalacridae have been shown in molecular studies to be sister to Laemophloeidae, and this group in turn sister to Passandridae; both latter families are (largely) flat, subcortical beetles as adults. However, Phalacridae are about as globular as beetles get. This implies a massive, possibly rapid evolutionary change involving virtually the entire adult body. This causes some problems for morphological phylogenetic inference, since it is often difficult to separate truly independent characters from those that are a syndrome of globularity. It would be interesting to investigate the magnitude of phenotype change along the ancestral phalacrid branch, whether “complexity” decreased or increased, and just how long it all took. I suspect there would be a “simplification” of anatomy, but the results could be surprising. Additionally, it would be interesting to do a comparison of phylogenetically contextual phenotype changes with other similarly shaped groups of Coleoptera (Hydrophilidae, Leiodidae, Hydraenidae: Limnebius, Coccinellidae, Endomychidae, Archeocrypticidae, Chrysomelinae) to see what structural trends are inherent in becoming a round, shiny beetle and perhaps get to the bottom of what might drive beetles to become this way. Some changes I can think of are as follows: position of the antennal insertion (and articulation of the radicle), orientation of the elytral epipleura, disposition of the metendosternite, and reduction of the prothorax.

3. Correlated evolution of mouthparts and feeding strategy in Cleroidea. The various groups of Cleroidea have diverse feeding strategies (predatory, fungivores, pollen-feeders) and this seems to be roughly correlated with mandible morphology. Presence of a mola usually corresponds with fungivory, and absence of a mola with predatory or nectar-feeding behavior. However, there is an ambiguous “pseudomola” state whose homology with a true mola is unclear. Presence or absence and type of prostheca is also a prominent mandibular feature. The various ornamentations of the mandible suffer from a lack of consistent definition and positional classification, and this impedes comparative studies involving mandibular morphology, and whether certain structures “preadapt” certain diet shifts (fungivory to pollen-feeding, for example). A phenotype ontology would go a long way to rectify this situation.

AJanz. 1. Current work with Curculio rostrum. A phenotype ontology could be useful for addressing questions about the evolution of the rostrum in a functional/mechanical context through the development of a controlled vocabulary describing both structures of the rostrum (especially on the fine scale) and the structure/layering of the cuticle. Currently, there are multiple vocabularies used to describe structures according to context. The first is biological (e.g., endocuticle), the next is structural (inner fibrous laminae), the third is functional/mechanical (inner compliant layer of composite shell). A controlled vocabulary that links these terms would allow me to tie my finite element analyses and structural inferences to the biological entities more concretely/concisely, thus permitting more robust analysis of these biological/structural/mechanical entities and their behavior in a phylogenetic context.

2. Proposed work with mouthparts (especially mandibles) of Entiminae. Similar to the above, where there are 3 (or more) concurrent sets of vocabularies used to describe the entities in question and how they function. Because the functioning of the mandibles is inherently more complex (more moving parts, more layers, more terrible equations for describing mechanical fracture of various substrata), there are likely even more sets of language that would be necessary for describing/analyzing this system. Moreover, the biological vocabulary for parts of the mandible is quite scattered and underdeveloped for this kind of application. A robust ontology linking these biological entities to their functional, structural (in the mechanical sense), and what I will call “spatial/relational” vocabularies and models would be invaluable for making sense of and better describing/analyzing these traits in a broader phylogenetic context. As an example of the spatial context, the way in which the mandibles move relative to each other to achieve fracture of a leaf is an emergent property that cannot be described from the motion, proposed function, or structure of a single mandible, and so requires another layer of terms and complexity (mathematically speaking) to fully describe.

3. Entimine mandibles – complexity trade-offs…

4. Modeling insect cuticle across taxa more broadly. This has been done in lobster cuticle sans ontologies, at least in the sense that we use the term (using a controlled vocabulary). Mathematically, the lobster has the most completely described cuticular structure of any arthropod. The cuticle of lobsters is described by several sets of nested, hierarchical equations (very ab initio, starting from the molecular structure of chitin) and models based on known physics, biology and chemistry in the cuticle, covering how it is structured on every level, right up to the macro-structure. Honestly it is really an amazing paper. What is lacking, however, is a set of terms that are semantically/logically linked to describe (non-mathematically) this hierarchical composite structure. If this could be done for insects, it would allow us to use a very complex set of discrete biological entities for phylogenetic analysis. There are a ton of really good characters that could come out of a structure like cuticle, if only we knew more about its structure in insects. Also, DARPA loves the idea of a lightweight composite that could stop projectiles (look up shrilk); what they lack is the hierarchical complexity of cuticle.

AJohn. Phenotypic Plasticity in Trogloderus (or others). The slight variation found in Trogloderus from one population to another is noticeable and perhaps tied to environment, but difficult to quantify. Having a robust phenotype ontology to work with would help illustrate, discuss, and evaluate boundaries between intra- and inter-population variation, as well as subspecies/species boundaries.

Eco-Phenotypy and Niche Conservatism. Similar to the above idea, but looking beyond geographic populations: what do varied ecological settings do to phenotype? We see minor and major males, huge size variations, etc. Comparing these within-species differences would benefit from an ontology. Similarly, Ecological Niche Conservatism is a buzzword I have heard frequently, and a theory that is used for explaining species richness/distributions. Having an ontology to more quantitatively demonstrate true similarity across phylogeny would be a huge asset there.

Evaluating Morphology Character Matrices (either alone or in conjunction with molecular data). If your matrix is already tied to an ontology, you could (more easily) filter and explore whether character systems are more/less represented in your data set. Similarly, you could compare how much ontologically linked morphological character systems (genitalic vs. head vs. thoracic) differ from one another versus how much gene fragments differ. This could help support whether your morphological matrix is “robust” compared to a given molecular dataset.
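A rough sketch of the filtering idea, in Python. Everything here is an assumption for illustration: the toy parent links stand in for a real anatomy ontology, and the character-to-term mapping stands in for a matrix whose characters have been annotated with ontology terms.

```python
from collections import Counter

# Hypothetical parent links (is_a / part_of collapsed) in a toy anatomy ontology.
PARENT = {
    "aedeagus": "genitalia",
    "spermatheca": "genitalia",
    "mandible": "head",
    "rostrum": "head",
    "metendosternite": "thorax",
}

# Hypothetical annotation: character id -> anatomy term the character refers to.
CHARACTERS = {
    "c1": "aedeagus",
    "c2": "mandible",
    "c3": "rostrum",
    "c4": "metendosternite",
    "c5": "spermatheca",
    "c6": "mandible",
}


def system_of(term):
    """Follow parent links up to the top-level term (the character system)."""
    while term in PARENT:
        term = PARENT[term]
    return term


def characters_by_system(characters):
    """Count how many characters fall under each character system."""
    return dict(Counter(system_of(term) for term in characters.values()))


def filter_by_system(characters, system):
    """List the character ids that belong to one character system."""
    return sorted(c for c, term in characters.items() if system_of(term) == system)


if __name__ == "__main__":
    print(characters_by_system(CHARACTERS))
    print(filter_by_system(CHARACTERS, "head"))
```

The character subsets returned by `filter_by_system` could then be analyzed separately (for example, one partition per character system) and compared against partitions of the molecular data.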

GZ. 1. Is this species new? This would require querying a description of a potentially new species to an ontology-based description database, which has to contain all described species.

2. Show me all the reduviids that have elongated fore coxae and a spine on mesonotum.

3. Compare two taxonomic descriptions of closely related species (e.g., congenerics) and provide a list of variable characters.

4. What characters are convergent (uniquely shared) among aquatic insects?

5. Does the ovipositor in Hymenoptera comprise the same set of anatomical structures as that of Orthoptera?

6. What structures are different between Reduviidae and Belostomatidae and what genes or developmental pathways result in these differences?
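Queries 2 and 3 in the list above can be sketched in Python, assuming descriptions are stored as simple entity-to-quality (E-Q) mappings. The taxon names, entities, and qualities below are invented placeholders, and a real implementation would query an ontology-backed database rather than plain dictionaries.

```python
# Hypothetical E-Q descriptions: taxon -> {entity: quality}.
DESCRIPTIONS = {
    "Taxon A": {"fore coxa": "elongated", "mesonotum spine": "present", "body color": "brown"},
    "Taxon B": {"fore coxa": "elongated", "mesonotum spine": "absent", "body color": "brown"},
    "Taxon C": {"fore coxa": "short", "mesonotum spine": "present", "body color": "black"},
}


def find_taxa(descriptions, wanted):
    """Return taxa whose description matches every wanted entity-quality pair."""
    return sorted(
        taxon
        for taxon, eq in descriptions.items()
        if all(eq.get(entity) == quality for entity, quality in wanted.items())
    )


def variable_characters(desc_a, desc_b):
    """Return entities whose quality differs (or is missing) between two descriptions."""
    entities = set(desc_a) | set(desc_b)
    return sorted(e for e in entities if desc_a.get(e) != desc_b.get(e))


if __name__ == "__main__":
    query = {"fore coxa": "elongated", "mesonotum spine": "present"}
    print(find_taxa(DESCRIPTIONS, query))
    print(variable_characters(DESCRIPTIONS["Taxon A"], DESCRIPTIONS["Taxon B"]))
```

Exact string matching is the weak point of this sketch: an ontology would let “elongated” match its subclasses and let “fore coxa” be found under “fore leg”, which is exactly what plain dictionaries cannot do.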


Lots of neat ideas! Will add mine, and we will look for shared underlying structures and possible approaches for realization in the upcoming meeting.
