Thoughts: How many concepts are we talking about

Fourth post in this sequence (here are posts 1, 2, 3, respectively). Changing gears a little. The motivation for this post is to explore the interactions of explicitly and implicitly communicated taxonomic concepts in conversations among (living, meeting) humans with comparable levels of taxonomic expertise. How many identifiers are we talking about?

The exploration has two parts. The first part simulates a brief conversation of the kind that two human speakers may engage in while meeting in the hallways at a taxonomically oriented conference. The speakers know of each other, either through prior personal interactions or (minimally) by having read several of each other’s taxonomic publications. The conversation is hypothetical, and even though certain real persons are mentioned, the sole purpose of this is to add some realism, not to pass my judgment on any taxonomic particulars. The post is about exploring how the issue of taxonomic name/concept identifier resolution relates to this kind of communication, generally.

The second part examines the conversation from the perspective of representing taxonomic reference – “logically”. By that I mean framing the taxonomic content identifiers communicated explicitly or implicitly by the human speakers in such a way that a computational, logic-based application can adequately represent them. Ok, so here goes (in part, as it will turn out).

SPNHC 2015 presentation – Taxonomic concept resolution for voucher-based biodiversity information platforms

Slides are up for our SPNHC 2015 concept taxonomy presentation.

Franz 2015 SPNHC Taxonomic concept resolution for voucher-based biodiversity information platforms from taxonbytes

PathwayMatrix visualization software shows Euler/X taxonomy alignment products and ambiguities

This post serves as an update on a new Euler/X compatible visualization software called PathwayMatrix, and also as a mini-review of the Exploring Taxonomic Concepts (ETC) Information Visualization Workshop, held on May 11-13, 2015, at the National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign. The workshop was organized by Bertram Ludäscher of the Euler/X Project and ETC lead information scientist Hong Cui.

Thoughts: Humans, computers, and identifier granularity

Third post in this sequence. In the first post, I reviewed how biological nomenclature promotes (even requires) fairly deep taxonomic semantics, due to semantically forceful principles such as Typification, Priority, Coordination, and Binomial Names. In the second post, I suggested (again, nothing very new here) that the Linnaean system has many features which, given the task at hand (reliably identifying nature’s hierarchy), are nearly optimally aligned with evolutionarily constrained human cognitive universals.

Both posts are ultimately about advancing biodiversity informatics infrastructure design. That motivation points to finding sound models of knowledge communication in the taxonomic domain. Lessons from the two preceding posts may be as follows. (1) If the goal is to build data environments that largely continue to reflect the strengths and weaknesses of human cognitive universals, then the particular balance struck by Linnaean names and name relationships acting as identifiers of evolving human taxonomy making is adequate. (2) There may be better solutions out there, particularly solutions that more effectively utilize the reasoning and scalability strengths of computational logic.

New publication: Provenance for explaining taxonomy alignments

A short paper related to the Euler/X toolkit and concept taxonomy alignment project has been published. It deals with the issue of diagnosing inconsistent input constraints in an attempted pairwise taxonomy alignment, analyzing and visualizing their logical provenance so that the user can localize the inconsistencies and proceed towards repairing them. These logic services are already implemented in the toolkit.

Abstract. Derivations and proofs are a form of provenance in automated deduction that can assist users in understanding how reasoners derive logical consequences from premises. However, system-generated proofs are often overly complex or detailed, and making sense of them is non-trivial. Conversely, without any form of provenance, it is just as hard to know why a certain fact was derived. We study provenance in the application of Euler/X, a logic-based toolkit for aligning multiple biological taxonomies. We propose a combination of approaches to explain both, logical inconsistencies in the input alignment, and the derivation of new facts in the output taxonomies.

Chen, M., S. Yu, P. Kianmajd, N. Franz, S. Bowers & B. Ludäscher. 2015. Provenance for explaining taxonomy alignments. In: Ludäscher, B. & B. Plale (Editors), Provenance and Annotation of Data and Processes. Revised Selected Papers of the 5th International Provenance and Annotation Workshop, IPAW 2014, Cologne, Germany, June 9-13, 2014. Lecture Notes in Computer Science 8628: 258-260. Available on-line here.
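To make the idea of localizing inconsistent input constraints a little more concrete, here is a minimal toy sketch in Python. It is not the Euler/X implementation and does not reproduce the paper's provenance methods. It assumes that the input articulations are RCC-5 relations (congruence, proper inclusion, inverse proper inclusion, overlap, exclusion) between concepts, and it checks consistency by brute force over small sets. All names (`is_consistent`, `minimal_inconsistent_subsets`, the concept labels, the `universe_size` parameter) are hypothetical and chosen for illustration only.

```python
from itertools import combinations, product

# RCC-5 relations between two non-empty sets, written as small predicates.
RCC5 = {
    "==": lambda a, b: a == b,                                        # congruent
    "<": lambda a, b: a < b,                                          # properly included in
    ">": lambda a, b: a > b,                                          # properly includes
    "><": lambda a, b: bool(a & b) and bool(a - b) and bool(b - a),   # overlaps
    "!": lambda a, b: not (a & b),                                    # excludes
}


def nonempty_subsets(universe):
    """All non-empty subsets of a small universe, as frozensets."""
    items = list(universe)
    return [frozenset(c)
            for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


def is_consistent(articulations, universe_size=4):
    """Brute-force check: can every concept be assigned a non-empty set
    so that all articulations hold at once? Assumption: a universe of
    `universe_size` elements is large enough for this toy input."""
    concepts = sorted({c for a, _, b in articulations for c in (a, b)})
    candidates = nonempty_subsets(range(universe_size))
    for assignment in product(candidates, repeat=len(concepts)):
        region = dict(zip(concepts, assignment))
        if all(RCC5[rel](region[a], region[b]) for a, rel, b in articulations):
            return True
    return False


def minimal_inconsistent_subsets(articulations):
    """Smallest groups of articulations that cannot hold together.
    Subsets are visited by increasing size, and supersets of an
    already reported group are skipped."""
    found = []
    for size in range(1, len(articulations) + 1):
        for subset in combinations(articulations, size):
            if any(set(f) <= set(subset) for f in found):
                continue
            if not is_consistent(list(subset)):
                found.append(subset)
    return found


# Hypothetical input: concepts from two taxonomies, labeled "1.*" and "2.*".
articulations = [
    ("1.A", "<", "2.B"),    # 1.A is properly included in 2.B
    ("2.B", "!", "1.C"),    # 2.B excludes 1.C
    ("1.A", "><", "1.C"),   # 1.A overlaps 1.C (clashes with the two above)
    ("1.A", "==", "2.A"),   # unrelated to the conflict
]

conflicts = minimal_inconsistent_subsets(articulations)
for group in conflicts:
    print("Jointly inconsistent:", group)
```

In this toy input the first three articulations are reported as one conflicting group, while the fourth is left out because it plays no part in the conflict. That is the sense in which a user can "localize" a problem and repair it. A real reasoner would replace the brute-force search, which only scales to a handful of concepts.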

Weekly reading: Ramirez et al. on structural complexity in ancestral ontologies (again)

Last week we read and appreciated Seltmann et al.’s (2012) effort to carefully describe the benefits, use, and user community roll-out of the spectacularly well annotated Hymenoptera Anatomy Ontology Portal. We clearly need and want something like this for Coleoptera. That said, we continue to explore options to maybe do things a little differently. Looking for inspiration, we are reading once more what is to my mind one of the best demonstrations of how phenotype ontologies can be used to address research questions – by phylogenetic systematists, for phylogenetic systematists.

Ramírez, M.J. & P. Michalik. 2014. Calculating structural complexity in phylogenies using ancestral ontologies. Cladistics (Early View). Available here.

We are also starting, based on this semester’s cumulative readings, to formulate some interests of our own. Hence the following homework for all; due by next Wednesday’s discussion.

Formulate three research themes or questions that are comparative/phylogenetic in nature and could possibly make use of phenotype ontologies. Be very specific, ideally starting with the taxonomic group and character system that you are most intimately acquainted with (in my case, e.g., that might be acalyptine weevil mouthparts). Best to work outward from the current core of your taxonomic expertise. Research ideas might take into account (yet are clearly not limited to):

  • Evolution of phenotype complexity, reduction.
  • Correlations across character systems.
  • Presence/absence of traits across larger phylogenetic groups and within/among subgroups.
  • Relationships of traits to non-organismal variables (e.g., environment).
  • Annotations and inferences targeting the specimen level versus higher taxon entities.
  • Evolutionary rates, timing.
  • Associations, coevolutionary themes.
  • Information availability, completeness, suitability for analysis.
  • … [insert your favored domain of phenomena or inquiry here]

The idea is to engage in a bit of a reverse engineering exercise. We know that the earliest phenotype ontologies came out of the model organism community – what Nelson & Platnick (1981) might refer to as “general biology” (pages 4-5). Yet systematists tend to ask comparative questions. What (if any) general structures, entities, and relationships do these comparative/phylogenetic questions entail? Which kinds of inferences are we (most) interested in? How would the components needed to accommodate the inferences be fruitfully translated into a logic framework?

In other words, let’s pretend we are well advised to engage in some conceptual modeling for the future design of a Coleoptera Anatomy Ontology (which may not carry such a name in the end). Start with nailing down our most highly domain-specific questions. Abstract overarching design needs from these. Pretend that solutions will follow.

Weekly reading: Balhoff et al. on a semantic model for wasp species description

Following Dahdul et al. and Vogt et al., our third paper in the phenotype ontologies Weekly Discussion series will dive into an applied example by Balhoff and co-authors (mainly of the Deans Lab) with a clear taxonomic emphasis. Already we have seen that different scientific orientations draw on phenotype ontologies with the expectation of reframing and solving specific problem complexes.

Dahdul et al.’s focus was firmly within the bounds of evolutionary and phylogenetic analyses of phenotypes across broader and deeper taxonomic scales. Implementation challenges notwithstanding, there was an underlying agreement that the legacy of phenotype-centric systematic work could be appropriated towards the outlined representation and inference goals.

Vogt et al., in turn, emphasized a need for consistent, machine-processable standards with regard to phenotype syntactics, semantics, etc., including a separation of descriptive and evolutionary/explanatory elements in our morphological terminology. This has the makings of a potentially divergent paradigm in relation to Dahdul et al.’s program and perspective.

Another interesting development is the Phenoscape team’s exploration of homology relations in ontologies, outlined here:

In light of these different lines of research, we set ourselves two immediate questions to address:

1. What are actual applications that utilize phenotype ontologies and (optionally) reasoning for (a) multi-taxon studies with (b) an evolutionary/systematic orientation?

2. Suppose we had the “awesome ontology & reasoning” infrastructure on hand, where current technological limits no longer apply. What kinds of questions would we ask this infrastructure to solve for us (that cannot be addressed otherwise)?

The paper for next week applies directly to these questions.

Balhoff, J.P., I. Mikó, M.J. Yoder, P.L. Mullins & A.R. Deans. 2013. A semantic model for species description applied to the ensign wasps (Hymenoptera: Evaniidae) of New Caledonia. Systematic Biology 62: 639–659. Available on-line here.

Conference presentation: Explaining taxonomy’s legacy to computers – how and why?

I will give an updated presentation on the Euler/X project and concept taxonomy at the conference “The Meaning of Names: Naming Diversity in the 21st Century”, held at the Museum of Natural History, University of Colorado – Boulder, from September 29 to October 1, 2014. Slides are posted on Slideshare, and linked here. Thanks to Rob Guralnick for the invitation!

Franz. 2014. Explaining taxonomy’s legacy to computers – how and why? from taxonbytes