
Using the Euler/X toolkit to align taxonomies – introductory notes

Most recent update: July 15, 2014 (minor editing updates; link to video).

Our group is involved in promoting concept taxonomy. Here are some preliminary, and evolving, step-by-step instructions on how to employ the Euler/X toolkit to align two taxonomies or phylogenies. A short video is also available here.

  • The toolkit is available for free download here. It depends on Python, logic reasoners (Prover9/Mace4, DLV, Potassco), and GraphViz.
  • Notes for the desktop-based installation of Euler/X are provided on the toolkit website. Mercurial is used to keep up with newer code versions provided via Bitbucket. For further details on installation (easiest on Mac OS) contact Mingmin Chen or Shizhuo Yu.
  • At present Euler/X can align two taxonomies of approximately 100-250 concepts each. Larger inputs are likely feasible, depending on the complexity of the input, but we have not yet tested this with real-life use cases. For this tutorial we will use the following two higher-level phylogenies of weevils.
    • Kuschel, G. 1995. A phylogenetic classification of Curculionoidea to families and subfamilies. Memoirs of the Entomological Society of Washington 14: 5-33.
    • Marvaldi, A.E. & J.J. Morrone. 2000. Phylogenetic systematics of weevils (Coleoptera: Curculionoidea): a reappraisal based on larval and adult morphology. Insect Systematics & Evolution 31: 43-58.
  • Initially it is useful to represent each classification or phylogeny independently, and to give each taxon concept or clade (concept) a unique numerical identifier for the purpose of comparison. A screenshot of the two annotated input trees is shown here. Note the abundant re-use of names, different levels of resolution (family-/subfamily-level), and non-congruent phylogenetic arrangements that characterize these alternative perspectives.

Input trees showing the Kuschel (1995) and Marvaldi & Morrone (2000) higher-level weevil phylogenies. Each concept has a unique identifier.

  • In the next step, this input information can be translated into spreadsheet format with three basic tables. A suitable Excel template can be downloaded here.
    • (1) A table that just lists the concepts and identifiers (ID | Name_Simple | According_To).

Screenshot of concept table, with identifiers used in subsequent tables.

    • (2) A table that lists the parent-child relationships that are needed to assemble each input tree (From_TC | Relationship | To_TC). 

Screenshot of parent-child relations (is_a) table.

    • (3) A table that lists the input articulations (From_TC | Relationship_Symbol | To_TC). It is possible to add two or more concepts on one side of the assessment (1 == 2 + 3), and to express ambiguity with “or” (1 == or > 2).

Screenshot of articulations (==, >, <, ><, |) table.
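The articulation rows described above can be read into a structured form before any further processing. The following is a minimal sketch, not part of the toolkit: the function name parse_row is hypothetical, and it assumes that each spreadsheet cell is written exactly as in the examples above (concepts joined with "+", alternative symbols joined with "or").

```python
# Relationship symbols used in the articulations table.
SYMBOLS = {"==", ">", "<", "><", "|"}


def parse_row(from_tc, symbol, to_tc):
    """Parse one articulation row (From_TC | Relationship_Symbol | To_TC).

    Returns (left_ids, symbols, right_ids). The cell layout is an
    assumption based on the examples "1 == 2 + 3" and "1 == or > 2".
    """
    left = [int(part) for part in from_tc.split("+")]
    right = [int(part) for part in to_tc.split("+")]
    symbols = [part.strip() for part in symbol.split("or")]
    for sym in symbols:
        if sym not in SYMBOLS:
            raise ValueError(f"unknown relationship symbol: {sym!r}")
    return left, symbols, right


# "1 == 2 + 3": two concepts on the right-hand side.
print(parse_row("1", "==", "2 + 3"))
# "1 == or > 2": an ambiguous articulation.
print(parse_row("1", "== or >", "2"))
```

Keeping the symbols as a list means that a certain articulation (one symbol) and an ambiguous one (several symbols) can be handled by the same code later on.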

  • In the subsequent step (not yet automated), the Euler/X toolkit reads in this spreadsheet and translates it into a .txt file of the following type (download template here).
    • In this format, the first taxonomy is the more recent one (Marvaldi & Morrone 2000).
    • taxonomy (line 1) is a keyword for the toolkit, m is the reasoner-compatible name for this taxonomy and its concepts, and MaMo00 is a human-readable nickname.
    • (348 349 352) means: “348 is a parent of [all subsequent child concepts of that parent, here only 349 and 352]”.
    • articulation is a keyword for the toolkit, mk is the reasoner-compatible name for the articulations, and MaMo00Ku95 is a human-readable nickname.
    • [m.348 equals k.117] means: 348 == 117. Note the addition of the square brackets and m./k. prefixes, in the same m-k sequence established earlier in the input.
    • The available articulations are equals (==), includes (>), is_included_in (<), overlaps (><), and disjoint (|).
    • Ambiguity is expressed with [m.348 {equals includes} k.117].
    • For additional examples and practices related to concept mapping, see Franz & Peet (2009).
    • (More here, eventually, on implied children, non-coverage, concept addition).

Screenshot of an Euler/X compatible input file (.txt format).
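Because this translation step is not yet automated, the format described above can be sketched in a few lines of Python. This is an illustration only and not the toolkit's own code: the function names are hypothetical, the blank line between blocks is an assumption, and only the concepts mentioned in the notes above (348, 349, 352, 117) are used. The block for the second taxonomy (k) would be built the same way from its parent-child table.

```python
# Symbol-to-keyword mapping, taken from the list of available articulations.
KEYWORDS = {
    "==": "equals",
    ">": "includes",
    "<": "is_included_in",
    "><": "overlaps",
    "|": "disjoint",
}


def taxonomy_lines(prefix, nickname, children_by_parent):
    """Build the 'taxonomy' header plus one '(parent child ...)' line per parent."""
    lines = [f"taxonomy {prefix} {nickname}"]
    for parent, children in children_by_parent.items():
        ids = [str(parent)] + [str(child) for child in children]
        lines.append("(" + " ".join(ids) + ")")
    return lines


def articulation_lines(prefixes, nickname, rows):
    """Build the 'articulation' header plus one '[x.ID relation y.ID]' line per row.

    Each row is (left_id, symbols, right_id); several symbols express ambiguity.
    """
    left, right = prefixes
    lines = [f"articulation {left}{right} {nickname}"]
    for left_id, symbols, right_id in rows:
        words = [KEYWORDS[sym] for sym in symbols]
        relation = words[0] if len(words) == 1 else "{" + " ".join(words) + "}"
        lines.append(f"[{left}.{left_id} {relation} {right}.{right_id}]")
    return lines


m_block = taxonomy_lines("m", "MaMo00", {348: [349, 352]})
mk_block = articulation_lines(("m", "k"), "MaMo00Ku95", [(348, ["=="], 117)])
# Blank line between blocks is an assumption about the file layout.
print("\n".join(m_block + [""] + mk_block))
```

The prefixes are passed in the same m-k order as the taxonomies appear in the file, so the articulation lines follow the sequence established earlier in the input.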

  • In what follows we will illustrate how the Kuschel 1995/Marvaldi & Morrone 2000 input file can be analyzed with the toolkit.
    • Open a Terminal on the desktop.
    • Type cd euler-project [changes to the directory that holds the toolkit].
    • Type hg pull [fetches the latest changes from the online Mercurial repository].
    • Type hg update [updates the local Euler/X code to the most recent version].
    • Type cd km [shifts to the directory, shown on the right, in which the input and output files (will) appear].

Screenshot of initial commands for analyzing an input alignment with Euler/X.

  • Input visualization. Type: euler -i [input filename] --iv   => This generates (inter alia) a PDF with the input visualization, shown here.

Euler/X toolkit command: input visualization.

  • Merge alignment, with overlap. Type: euler -i [input filename] -e mnpw --rcgo   => This generates (inter alia) a .csv file with all maximally informative relations (MIR), both given and newly inferred, and a PDF with the merge tree, including the visualization of overlapping articulations, shown here. The overlapping articulations will not be shown if the --rcgo option is omitted.

Euler/X toolkit command: merge taxonomy, with overlap.

  • Merge alignment, showing combined (merge) concepts. Type: euler -i [input filename] -e mncb   => This command creates a PDF with the merge tree, including the visualization of new merge concepts that are the resulting partitions of overlapping input concepts, shown here.

Euler/X command: merge taxonomy, with combined concepts.
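One way to picture the combined (merge) concepts is to treat each input concept as a set of members. The sketch below is only an illustration under that assumption: the toolkit itself works with logic reasoners rather than explicit member sets, and the function names, partition labels, and member names are all hypothetical.

```python
def articulation(a, b):
    """Return the articulation between two concepts given as member sets."""
    a, b = set(a), set(b)
    if a == b:
        return "equals"
    if a > b:  # a is a proper superset of b
        return "includes"
    if a < b:  # a is a proper subset of b
        return "is_included_in"
    if a & b:  # shared members, but neither contains the other
        return "overlaps"
    return "disjoint"


def merge_partitions(a, b):
    """Split two overlapping concepts into the regions they jointly define."""
    a, b = set(a), set(b)
    return {
        "A_and_B": a & b,  # members shared by both concepts
        "A_not_B": a - b,  # members only in the first concept
        "B_not_A": b - a,  # members only in the second concept
    }


# Hypothetical members, for illustration only.
concept_a = {"genus1", "genus2"}
concept_b = {"genus2", "genus3"}
print(articulation(concept_a, concept_b))
print(merge_partitions(concept_a, concept_b))
```

Two overlapping input concepts thus give rise to up to three new regions, which is why overlap cannot be expressed as simple lumping or splitting of the original concepts.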

  • Below is a screenshot of the terminal and output folder (most important files highlighted) following the --iv and -e mnpw --rcgo commands.

Screenshot of Euler/X terminal and output files following the --iv and -e mnpw --rcgo commands.
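For repeated runs, the three commands used in this tutorial can be assembled in a small script. This is a sketch under stated assumptions: the helper euler_command and the file name input.txt are hypothetical, and the long options are assumed to be written with a double hyphen (--iv, --rcgo), which web typography may render as a single dash. The script only prints the commands and does not run the toolkit.

```python
def euler_command(input_file, *options):
    """Return the argument list for one euler call on the given input file."""
    return ["euler", "-i", input_file, *options]


INPUT_FILE = "input.txt"  # hypothetical file name

commands = {
    "input visualization": euler_command(INPUT_FILE, "--iv"),
    "merge with overlap": euler_command(INPUT_FILE, "-e", "mnpw", "--rcgo"),
    "merge with combined concepts": euler_command(INPUT_FILE, "-e", "mncb"),
}

for label, cmd in commands.items():
    # To actually run a command, pass the list to subprocess.run(cmd, check=True)
    # from within the directory that holds the input file.
    print(f"{label}: {' '.join(cmd)}")
```

Keeping each command as a list of arguments avoids quoting problems if the input file name contains spaces.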

  • Below is a legend that is helpful for understanding the current merge taxonomy displays.

Generic legend that allows interpretation of the Euler/X merge taxonomies. Overlap reflects articulations that cannot be accounted for by simple “lumping and splitting”.

  • More at a later stage…
