The Taxonomic Data Refinement Workflow provides an environment for preparing observational and specimen data sets for use in scientific analyses such as species distribution analysis, species richness and diversity studies, species occurrence studies, historical analysis, and other spatio-temporal analyses.
Name of the workflow and its myExperiment identifier
Name: Taxonomic Data Refinement (Integrated) Workflow
The workflow can be downloaded from myExperiment (workflow 2874, version 17).
Date, version and licensing
Last updated: 17th December 2014
Version: Data Refinement Workflow v17
How to cite this workflow
To report work that has made use of this workflow, please add the following credit acknowledgement to your research publication:
The results reported in this publication come from processing data (<personal source or others--cite which, e.g. GBIF>) through BioVeL workflows and services (www.biovel.eu). The taxonomic data refinement workflows were run on <date of the workflow run>. BioVeL is funded by the EU’s Seventh Framework Program, grant no. 283359.
Mathew, C., Güntsch, A., Obst, M., Vicario, S., Haines, R., Williams, A. R., de Jong, Y. & Goble, C.: A semi-automated workflow for biodiversity data retrieval, cleaning, and quality control. Biodiversity Data Journal 2: e4221 (11 Dec 2014), doi: 10.3897/BDJ.2.e4221
Taxonomy, Species richness and diversity, Species occurrence, Species distribution analysis, Taxonomic data cleaning and refinement, Historical analysis, Taxonomic Name Resolution, Synonym expansion, Geo-temporal data selection and filtering, spatio-temporal analysis, Occurrence retrieval, Data quality and filtering
Scientific workflow description
The workflow accepts input data in a recognized format, and these data can be combined from various sources (e.g., occurrence retrieval services, local user data sets). The workflow includes a number of graphical user interfaces to view and interact with the data, and the output of each part of the workflow is compatible with the input of every other part. The user is therefore free to choose any sequence of actions and to repeat steps as needed. The construction of the workflow also allows custom-built as well as third-party tools and applications to be integrated easily. Currently, the data refinement workflow is composed of three distinct parts:
1. Taxonomic Name Resolution / Occurrence retrieval. Here users can resolve a list of scientific names using taxonomic checklists. This process results in the retrieval of taxonomic information related to each scientific name, including related synonyms as well as other concept information like rank, classification, etc. The resulting information can then be used to retrieve occurrence data or saved as a list of expanded names. The synonym expansion and occurrence retrieval are built on generic frameworks allowing the inclusion of multiple sources.
2. Geo-temporal data selection. Here users select, filter, and refine data records according to spatial and temporal criteria. Geographical selection can be done by drawing polygons, circles, rectangles, etc., on a map as well as by filtering data based on geo-markers (e.g. country, latitude/longitude). Records relating to specific time periods can also be isolated using time-based filtering. The web-based ‘BioSTIF’ client provides these functionalities.
3. Data quality checks / filtering. Here the user can apply a set of data quality and data integrity rules to the data matrix. This allows users to perform data-specific cleaning and filtering. The tool is built on ‘Google Refine’, which serves as the central interface for accessing the various local and external functionalities.
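The selection and filtering logic of parts 2 and 3 can be illustrated with a minimal Python sketch. In the workflow itself these steps are carried out interactively in BioSTIF and Google Refine, so the code below is not taken from the workflow. The `Occurrence` record, the function names, the bounding box, and the sample records are all hypothetical and serve only to show how an integrity rule, a spatial criterion, and a temporal criterion might be combined.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Occurrence:
    """Hypothetical occurrence record; not the workflow's actual data format."""
    name: str
    lat: Optional[float]
    lon: Optional[float]
    observed: Optional[date]


def passes_quality(rec: Occurrence) -> bool:
    """Integrity rule: coordinates and date are present and coordinates are in range."""
    if rec.lat is None or rec.lon is None or rec.observed is None:
        return False
    return -90.0 <= rec.lat <= 90.0 and -180.0 <= rec.lon <= 180.0


def in_bbox(rec: Occurrence, south: float, west: float, north: float, east: float) -> bool:
    """Spatial criterion: the record lies inside a rectangle (no antimeridian handling)."""
    return south <= rec.lat <= north and west <= rec.lon <= east


def in_period(rec: Occurrence, start: date, end: date) -> bool:
    """Temporal criterion: the observation date falls within [start, end]."""
    return start <= rec.observed <= end


def refine(records: List[Occurrence],
           bbox: Tuple[float, float, float, float],
           period: Tuple[date, date]) -> List[Occurrence]:
    """Keep records that pass the integrity rule and both selection criteria."""
    # passes_quality runs first so the later checks never see missing values.
    return [r for r in records
            if passes_quality(r) and in_bbox(r, *bbox) and in_period(r, *period)]


# Made-up sample data for illustration only.
records = [
    Occurrence("Mytilus edulis", 58.2, 11.4, date(2001, 6, 1)),   # kept
    Occurrence("Mytilus edulis", 95.0, 11.4, date(2001, 6, 1)),   # invalid latitude
    Occurrence("Mytilus edulis", 58.2, 11.4, date(1950, 1, 1)),   # outside period
    Occurrence("Mytilus edulis", None, None, date(2005, 3, 9)),   # missing coordinates
    Occurrence("Mytilus edulis", 40.0, 11.4, date(2005, 3, 9)),   # outside rectangle
]
kept = refine(records, (55.0, 5.0, 60.0, 15.0), (date(2000, 1, 1), date(2010, 12, 31)))
print(len(kept))
```

Because each function takes and returns plain records, the checks can be applied in any order or repeated, which mirrors the way the parts of the workflow can be freely sequenced.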
The workflow has been developed to run in the Taverna automated workflow environment. In its current form (version 14), the workflow file (with the .t2flow extension) can be loaded and executed in the BioVeL portal as well as in the Taverna Workbench. When running it in the Taverna Workbench, it is necessary to follow the instructions on the page How to install and run DR workflows on Taverna Workbench, because the workflow depends on external libraries (written in Java) as well as on an instance of Google Refine.
Note: Running the Google Refine server locally will become optional in future releases, since a remote version of the server will be provided by default.