This Wiki is the documentation for the BioVeL Biodiversity Virtual Laboratory. If you discover problems or have suggestions for improving this documentation, please send an email to

Please visit for background information about the BioVeL project itself.



The Taxonomic Data Refinement Workflow provides an environment for preparing observational and specimen data sets for use in scientific analyses such as species distribution analysis, species richness and diversity studies, species occurrence studies, historical analysis, and other spatio-temporal analyses.


Name of the workflow and its myExperiment identifier

Name: Taxonomic Data Refinement (Integrated) Workflow

The workflow can be downloaded from myExperiment workflow 2874 version 17

Date, version and licensing

Last updated: 17th December 2014

Version: Data Refinement Workflow v17

Licensing: Creative Commons Attribution-ShareAlike (CC BY-SA)

How to cite this workflow

To report work that has made use of this workflow, please add the following credit acknowledgement to your research publication:

The results reported in this publication come from processing data (<personal source or others--cite which, e.g. GBIF>) through BioVeL workflows and services ( The taxonomic data refinement workflows were run on <date of the workflow run>. BioVeL is funded by the EU’s Seventh Framework Program, grant no. 283359.

Literature reference

Mathew, C., Güntsch, A., Obst, M., Vicario, S., Haines, R., Williams, A. R., de Jong, Y. & Goble, C.: A semi-automated workflow for biodiversity data retrieval, cleaning, and quality control. Biodiversity Data Journal 2: e4221 (11 Dec 2014), doi: 10.3897/BDJ.2.e4221

Scientific specifications


Taxonomy, Species richness and diversity, Species occurrence, Species distribution analysis, Taxonomic data cleaning and refinement, Historical analysis, Taxonomic Name Resolution, Synonym expansion, Geo-temporal data selection and filtering, Spatio-temporal analysis, Occurrence retrieval, Data quality and filtering

Scientific workflow description

The workflow accepts input data in a recognized format, and these data can be combined from various sources (e.g., occurrence retrieval services, local user data sets). The workflow includes a number of graphical user interfaces for viewing and interacting with the data. The output of each part of the workflow is compatible with the input of every other part, so the user is free to choose any sequence of actions and to repeat steps as needed. The construction of the workflow also allows custom-built as well as third-party tools and applications to be integrated easily. Currently, the data refinement workflow is composed of three distinct parts:

1. Taxonomic Name Resolution / Occurrence retrieval. Here users can resolve a list of scientific names using taxonomic checklists. This process retrieves the taxonomic information related to each scientific name, including synonyms and other concept information such as rank and classification. The resulting information can then be used to retrieve occurrence data or be saved as a list of expanded names. The synonym expansion and occurrence retrieval are built on generic frameworks that allow multiple sources to be included.

2. Geo-temporal data selection. Here users select, filter, and refine data records according to spatial and temporal criteria. Geographical selection can be done by drawing polygons, circles, rectangles, etc., on a map as well as by filtering data based on geo-markers (e.g. country, latitude/longitude). Records relating to specific time periods can also be isolated using time-based filtering. The web-based ‘BioSTIF’ client provides these functionalities.

3. Data quality checks / filtering. Here the user can apply a set of data quality and data integrity rules to the data matrix. This allows users to perform data-specific cleaning and filtering. The tool is based on ‘Google Refine’, which serves as the central interface for accessing the various local and external functionalities.
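
The three parts above can be pictured as interchangeable steps that each take a list of records and return a list of records. The following Python sketch illustrates that idea only; it is not the BioVeL implementation, which runs in Taverna with BioSTIF and Google Refine. The record fields, the function names, the in-memory checklist, and the placeholder names "Aus bus", "Aus cus", and "Xus yus" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional


@dataclass(frozen=True)
class Occurrence:
    """One occurrence record (hypothetical, simplified fields)."""
    scientific_name: str
    latitude: Optional[float]
    longitude: Optional[float]
    event_date: Optional[date]


# Every step maps a list of records to a list of records, so steps can be
# chained in any order and repeated.
Step = Callable[[list], list]


def select_names(names, checklist) -> Step:
    """Part 1: keep records whose name is an input name or a listed synonym."""
    wanted = set(names)
    for name in names:
        wanted.update(checklist.get(name, []))
    return lambda records: [r for r in records if r.scientific_name in wanted]


def select_area_and_period(south, west, north, east, start, end) -> Step:
    """Part 2: keep records inside a rectangle and a date range."""
    def step(records):
        return [
            r for r in records
            if r.latitude is not None
            and r.longitude is not None
            and r.event_date is not None
            and south <= r.latitude <= north
            and west <= r.longitude <= east
            and start <= r.event_date <= end
        ]
    return step


def require_valid_coordinates(records):
    """Part 3 (one example rule): coordinates present and within range."""
    return [
        r for r in records
        if r.latitude is not None
        and r.longitude is not None
        and -90 <= r.latitude <= 90
        and -180 <= r.longitude <= 180
    ]


def run(records, steps):
    """Apply the chosen steps one after another."""
    for step in steps:
        records = step(records)
    return records


# Hypothetical checklist (name -> synonyms) and sample records.
CHECKLIST = {"Aus bus": ["Aus cus"]}
RECORDS = [
    Occurrence("Aus bus", 52.5, 13.4, date(2010, 6, 1)),
    Occurrence("Aus cus", 48.1, 11.6, date(1995, 3, 15)),  # outside the period
    Occurrence("Aus cus", 95.0, 13.0, date(2011, 1, 1)),   # invalid latitude
    Occurrence("Xus yus", 52.0, 13.0, date(2012, 1, 1)),   # unrelated name
]

steps = [
    select_names(["Aus bus"], CHECKLIST),
    require_valid_coordinates,
    select_area_and_period(45.0, 5.0, 55.0, 15.0,
                           date(2000, 1, 1), date(2014, 12, 31)),
]
refined = run(RECORDS, steps)
print([r.scientific_name for r in refined])
```

Because every step has the same input and output type, the list of steps can be reordered or extended with further rules without changing `run`.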

Technical specifications

The workflow has been developed to run in the Taverna automated workflow environment. In its current form (version 14), the workflow file (with the .t2flow extension) can be loaded and executed in the BioVeL portal as well as in the Taverna Workbench. When running it in the Taverna Workbench, because the workflow depends on external libraries (written in Java) as well as on an instance of Google Refine, it is necessary to follow the instructions on the page How to install and run DR workflows on Taverna Workbench.

Note: Running the Google Refine server locally will become optional in future releases, since a remote version of the server will be provided by default.

