TDWG 2013: Crowd sourcing and community management capabilities with Symbiota

Presentation sequence in development for the following TDWG 2013 presentation: Franz, N., C. Gries, T. Nash III & E. Gilbert. 2013. Crowd Sourcing and Community Management Capabilities Available within Symbiota Data Portals.

  1. Introduction to the Lichen, Bryophytes and Climate Change Portal
  2. Project Goals
    1. Digitize 2.3 million lichen and bryophyte specimens.
    2. Image all specimen labels at 16 digitization centers located across the United States.
    3. Symbiota is the management platform used to coordinate the digitization workflows.
  3. Key Project URLs
    1. http://lichenportal.org
    2. http://bryophyteportal.org
  4. Data Entry Process – based on specimen labels and using a combination of several methods:
    1. General data entry through Symbiota by the respective herbarium personnel.
    2. OCR and NLP of imaged voucher labels through manual and batched processes.
    3. Identification of duplicate specimens already entered by other institutions, and ability to simply copy over data.
    4. Creation of a crowd sourcing application and workflow for data entry by the general public.
  5. Symbiota Crowd Sourcing (CS) App and Workflow
    1. New records – skeletal and with an OCR/NLP processed image – are submitted to the CS queue by respective collection managers.
    2. Data entry of records submitted to the queue can be carried out by any user with a valid (self-created) login; no additional permissions are needed.
    3. Certain fields are locked from being edited: catalog number, scientific name (generally supplied as skeletal data collected at the time of imaging).
    4. Data entry form verifies data within certain fields: scientific name, date, coordinates against state/country.
    5. When a record is saved, the CS status for that record moves to “pending”.
    6. A collection manager reviews records periodically (nightly, weekly, etc.). If needed, the manager can edit the record, change its point assignment (the default is 2 points), and add comments. The manager then changes the record status to “approved”, which closes the record and again limits edits to those who have editing rights for that specific collection.
    7. Collection managers can then periodically download closed records for import into their local database system, although managing data directly within the portal is considered most efficient and accurate.
  6. Drupal Front End Pages to the LBCC Portal
    1. A set of custom front end pages was built specifically for the LBCC project to provide custom access to LBCC records, as well as instructions specifically relevant to lichen and bryophyte specimens.
    2. The main purpose of this portal entry environment is to create a personalized and engaging user experience rather than conveying a sense of just dry data entry.
    3. An introduction for prospective crowdsourcing community members for the LBCC is available here: http://lbcc1.acis.ufl.edu/?q=volunteer
    4. Special expeditions are a subset of the records queue for CS data entry and are identified as being part of a “special group/theme” of specimens. Such groups are identified and assembled through wildcard searches of OCR text blocks. For example, the Harriman Alaska Expedition looks for queued records that have “Harriman” somewhere within the OCR text block.
    5. Expeditions are meant to educate those who are performing the data entry about a specific event. They also aid data entry because the user generally deals with a homogeneous type of label format, as opposed to shifting regularly between numerous layout types.
    6. At present (October 2013) only one special expedition is available: the Harriman Alaska Expedition. There are plans to offer numerous special expedition “packages” to prospective crowdsourcing members. Exsiccati (dried fungi) will likely be part of the specimen batches that make up some of the expeditions to be posted in the near future.
    7. Available lichen expeditions are at: http://lbcc1.acis.ufl.edu/?q=node/21
    8. Available bryophyte expeditions are at: http://lbcc1.acis.ufl.edu/?q=bryophyte_expeditions
    9. Custom Expeditions are located at the bottom of the expedition page.
    10. A user can query the CS queue by skeletal data that were collected by the imaging team. Skeletal data tend to entail information that is easily recorded in bulk because collections are grouped and filed according to certain criteria (e.g. country, state, scientific name, family). A user might want to process all specimens from their home region or their preferred taxonomic groups, thereby personalizing the experience.
    11. A user can also search raw OCR blocks to return subsets that are related to a certain exsiccati name, collector, label title, etc. This feature is useful but less reliable than a skeletal data query. For example, a search for “Nash” will return specimens collected or determined by lichen specialist Thomas Nash III, and also records where Dr. Nash is the author of the scientific name, or specimens from Nash County, NC.
  7. New Crowd Sourcing Central module for Symbiota
    1. Available at http://lichenportal.org/portal/collections/editor/crowdsource/central.php
    2. Shows scores and collections participating in the crowdsourcing, along with their statistics.
    3. Available to all viewers of the site, irrespective of whether they are logged in.
    4. If users are logged in, then their scores will be displayed in a separate information “box”.
    5. The link above will generally – on most portal sites – be added to the left menu, or made available from another crowdsourcing page that is custom generated for a project. For instance, the LBCC project will likely link to this page from their Drupal front page.
    6. Clicking on “review records” within the Current User’s Standing box will take the user to the review page (see below).
    7. Clicking on numbers within the collection table will take the user to a list of specimens queued up for data entry (and open specimens within the CS queue).
  8. Collection Manager’s Crowdsourcing Control Panel
    1. Available at http://lichenportal.org/portal/collections/editor/crowdsource/controlpanel.php?collid=22
    2. Available only to collection managers.
    3. Shows statistics only for a given collection.
    4. Available also from the collections control panel (not yet implemented in the public site) and in Crowdsource Central (via the editing symbol to the right of the collection names).
    5. Allows managers to edit crowdsourcing instructions or link to a training URL.
    6. A link to the right of “Available to Add” is where a collection manager would add their records to the Crowd Sourcing Queue.
  9. Review Page
    1. Available from a collection manager’s perspective or a user’s perspective, and behaves slightly differently depending on the perspective.
    2. Collection manager perspective:
      1. Main purpose is to enable a quick review of specimen records that are pending (or re-review of closed records).
      2. Available from the Collection Manager’s Crowd Sourcing Control Panel by clicking on the “review” link to the right of the numbers.
      3. A collection manager can assign points to an annotated record (2 points is the default value), comment, and change the CS status to closed (approved).
      4. Managers can edit all records, whether they are pending or closed.
    3. User perspective:
      1. Available from Crowd Sourcing Central by clicking on “review records” within the Current User’s Standing box.
      2. Allows user to review and access records with pending status.
      3. Allows user to review points and provide comments for closed records.
      4. Users can edit all pending records.
      5. Users can review yet not edit records that have been closed by a collection manager.
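
The crowd sourcing workflow and review rules outlined above can be summarized as a small state model. The following Python sketch is illustrative only and is not taken from the Symbiota code base: the names Status, Record, volunteer_save and manager_approve, the field names, and the example catalog number are all assumptions made for this example. It shows a record moving from the queue to “pending” when a volunteer saves it, and to closed (approved) when a collection manager reviews it, with the locked fields and the 2-point default described in the outline.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    QUEUED = "queued"    # submitted to the CS queue by a collection manager
    PENDING = "pending"  # saved by a volunteer, awaiting manager review
    CLOSED = "closed"    # approved by a collection manager


DEFAULT_POINTS = 2  # default point value named in the outline
# Skeletal fields that volunteers may not change (hypothetical field names).
LOCKED_FIELDS = {"catalog_number", "scientific_name"}


@dataclass
class Record:
    catalog_number: str
    scientific_name: str
    status: Status = Status.QUEUED
    points: int = 0
    comment: str = ""
    fields: dict = field(default_factory=dict)


def volunteer_save(record: Record, edits: dict) -> None:
    """Apply a volunteer's edits and move the record to 'pending'."""
    if record.status is Status.CLOSED:
        raise PermissionError("closed records are editable only by collection editors")
    locked = LOCKED_FIELDS.intersection(edits)
    if locked:
        raise PermissionError(f"locked fields cannot be edited: {sorted(locked)}")
    record.fields.update(edits)
    record.status = Status.PENDING


def manager_approve(record: Record, points: int = DEFAULT_POINTS, comment: str = "") -> None:
    """Assign points, optionally comment, and close (approve) the record."""
    record.points = points
    record.comment = comment
    record.status = Status.CLOSED


# Hypothetical record used only to exercise the sketch.
record = Record(catalog_number="EX-0001", scientific_name="Usnea hirta")
volunteer_save(record, {"country": "USA", "state": "Alaska"})
manager_approve(record, comment="Checked against label image")
```

In this sketch a volunteer can keep editing a record while it is pending, but any further volunteer_save call on a closed record raises PermissionError, which mirrors the rule that users can review but not edit closed records.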

Crowd Sourcing and Community Management Capabilities Available within Symbiota Data Portals

Nico Franz, Corinna Gries, Edward Gilbert

Building: Grand Hotel Mediterraneo; Room: Sala dei Continenti; Date: 2013-10-29, 12:00-12:10 pm

Abstract

Symbiota (http://symbiota.org/tiki/tiki-index.php) is open source software designed to promote and facilitate collaboration among those working to document biodiversity. Symbiota has become increasingly popular in recent years in North America, due in part to its suitability to support large herbarium networks and NSF-sponsored Thematic Collections Networks (TCNs; see https://www.idigbio.org/content/thematic-collections-networks). The specimen-based Content Management System (CMS) provides a shared platform allowing researchers to manage biological resources as an integrated network. Data management through a community-based system has allowed for the development of several features and workflows that have made data entry more efficient while improving overall data integrity and quality. On-line data entry directly from an image of the specimen label allows for label transcription and error resolution that can call upon a global user community. A novel crowd sourcing feature in Symbiota offers collection managers the ability to submit specimen label images to a queue for group data entry by a volunteer task force. To improve efficiency and quality, the user interface incorporates Optical Character Recognition (OCR) and Natural Language Processing (NLP) capabilities, as well as duplicate and exsiccati record harvesting and real-time data validation. The duplicate clustering module groups duplicate specimen records across institutions, thereby obviating the need to re-enter a previously processed specimen and enhancing the task of locating and resolving misidentified specimens, viz. by highlighting the most recent annotation events within a cluster. As an additional review step, collections can opt to allow registered users to fix basic errors if and when they encounter them. Collection managers have the ability to review, approve, or revert such edits.
Several other novel community features are available through Symbiota, including an integrated loan management module and pre-accessioned data entry by the original collector. We will demonstrate and discuss these features, their underlying concepts, implementation, utility, and future steps to further augment the community of contributing users.
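
To make the duplicate clustering idea in the abstract concrete, here is a minimal Python sketch. It is not Symbiota's implementation. It assumes that duplicates are records sharing the same collector, collector number and collection date, which is a common working rule for herbarium duplicates but is an assumption here. The function names, dictionary keys and sample records are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def duplicate_key(rec):
    # Assumed matching rule: same collector, collector number and collection date.
    return (rec["collector"].strip().lower(), rec["number"].strip(), rec["collected"])


def cluster_duplicates(records):
    """Group records by duplicate key; keep only groups with more than one record."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[duplicate_key(rec)].append(rec)
    return [group for group in clusters.values() if len(group) > 1]


def latest_annotation(cluster):
    """Return the record carrying the most recent determination in a cluster."""
    return max(cluster, key=lambda rec: rec["determined"])


# Hypothetical records from three institutions; the first two are duplicates.
records = [
    {"institution": "A", "collector": "J. Doe", "number": "1234",
     "collected": date(1975, 6, 1), "name": "Taxon one",
     "determined": date(1976, 1, 10)},
    {"institution": "B", "collector": "j. doe ", "number": "1234",
     "collected": date(1975, 6, 1), "name": "Taxon two",
     "determined": date(2009, 3, 5)},
    {"institution": "C", "collector": "J. Doe", "number": "1300",
     "collected": date(1975, 7, 2), "name": "Taxon three",
     "determined": date(1980, 5, 5)},
]

for cluster in cluster_duplicates(records):
    newest = latest_annotation(cluster)
    print(newest["institution"], newest["name"], newest["determined"])
```

Surfacing the newest determination in each cluster is what would let a data entry user or collection manager spot records in the same cluster that still carry an older, possibly outdated identification.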
