Best Practice Guidelines
The following principles are taken from “Best practices for workflow design: how to prevent workflow decay” (Hettne et al. 2012), with our own additions and refinements.
- Make an abstract workflow: A workflow sketch provides a reference to the main task(s) of the workflow throughout its implementation. A workflow can be compared to a scientific protocol, so sketching out the method helps when designing the workflow. A scientific sketch of the workflow also helps in communicating with, for example, supervisors and colleagues, and because it is not tied to an explicit implementation it promotes sharing between computer-generated and human-generated systems.
- Use modules: One of the main strengths of workflows is that parts can be plugged in and re-used, broken parts can be swapped out, and different methods can be plugged in and compared. Implementing all the executable components of a workflow so that they can be run as separate sub-workflows facilitates the understanding, maintenance, re-use and separate testing and validation of the workflow.
Create a nested workflow for the core analysis part of your workflow. As far as possible, keep interaction pages and file reading and writing outside of this nested workflow. This makes it simpler to reuse the nested analysis workflow as a component within a larger workflow (where the data provided by input files and interaction pages may already have been provided by the user).
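The separation described above can be sketched in Python. This is only an illustration under stated assumptions, not a workflow-system implementation: the function names (`core_analysis`, `read_values`, `run_from_file`), the CSV layout and the sample data are all hypothetical. The point is that the nested analysis step takes plain values and never touches files or prompts, so it can be called both from a file-based run and from a larger workflow that already holds the data.

```python
import csv
import io
from statistics import mean


def core_analysis(values):
    """Nested 'core analysis' step: pure computation, no file access or prompts."""
    if not values:
        raise ValueError("core_analysis needs at least one value")
    return {"n": len(values), "mean": mean(values)}


def read_values(handle, column):
    """Outer step: read one numeric column from CSV text (hypothetical layout)."""
    return [float(row[column]) for row in csv.DictReader(handle)]


def run_from_file(handle, column):
    """Outer workflow: handles the reading, then delegates to the nested analysis."""
    return core_analysis(read_values(handle, column))


# The same nested step serves a file-based run and an in-memory caller.
sample = io.StringIO("site,abundance\nA,2\nB,4\nC,6\n")
from_file = run_from_file(sample, "abundance")
in_memory = core_analysis([2.0, 4.0, 6.0])
```

Because `core_analysis` has no knowledge of where its input came from, it can also be tested on its own with small hand-made inputs.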
- Think about the output: What is the output intended for? Is it supposed to be used as input to another workflow, stored in a database, or be presented to the end user? Should it be a graph, a table or text? And at what size and resolution? Thinking about the desired output of the workflow at the design stage is easier than trying to adjust a finished workflow. It will drive the design of the next steps. Also, a workflow has the potential to produce masses of data that need to be visualised and managed properly. Think also about the preparation of the input that may be needed.
- Provide input and output examples: Input and output examples are crucial for understanding, validating and maintaining the workflow, and they can also be used in training and tutorials.
- Annotate: Careful annotation of a workflow helps to record all the steps and assumptions hidden in the workflow, which is not only needed for a later publication but is also crucial for the scientific method. It also facilitates use and re-use of workflows. There is no accepted standard for annotating a workflow. We propose choosing meaningful names for the workflow title, inputs, outputs, and for the processes that constitute the workflow as well as for the interconnections between the components, so that annotations are not only a collection of static tags but capture the dynamics of the workflow. A high-level functional annotation should be included (for example, similar to the functional units suggested in ), as well as a description of the resource, keeping in mind a scenario where it may disappear or change at some time in the future. Don’t forget to include a version number and to increment it every time a new version of the workflow is made.
Annotate all the parts of your workflow. Especially, provide annotations about the inputs that are needed and the outputs that are produced by the workflow. Provide a comprehensive description of what the workflow does from the scientific perspective, including reference(s) to published work, if applicable.
- Make it executable from outside the local environment: This best practice makes the workflow portable, which potentially increases its reproducibility and re-use. It can be realised, for example, by using remote Web services or platform-independent code/plugins. However, if local services, libraries or tools are needed, the workflow should be annotated to define its dependencies, i.e. which local tool, version or operating system is required, where to find it, whether it is licensed, and any other particular restriction (e.g. the application has to be called with a particular name).
- Choose services carefully: Volatile third-party services are one of the major reasons that workflows break. The status, reliability and stability of a Web Service as well as the reputation of the Service Provider are often the deciding factors for choosing a service. Choose services based on known reputation or from a well-founded catalogue, like the Biodiversity Catalogue (www.biodiversitycatalogue.org), where some measure of curation has been applied.
- Re-use existing workflows: Re-use is important for many reasons. It fights redundancy, and perpetuates “tried and tested” and published methods conveying good scientific practice. It will also help the workflow developer get ideas on methods and workflow patterns. It is also beneficial when repairing workflows: repairing a given workflow may entail repairing the workflows in which it is used as a sub-workflow.
- Test and validate: Defining test cases and implementing validation mechanisms facilitates maintenance and the identification of decay, and helps to guarantee the correctness of the results, for example by including components in the workflow whose function is to check assertions that must be true.
- Advertise and maintain: It is a duty of science to share your results. It also helps progress by letting others build on your work without reinventing it. Workflow maintenance is expected to increase the longevity of the workflows. Frequent testing, monitoring of the services used, and communication with other users are all ways to maintain a workflow. Workflows can be advertised by depositing them in public repositories (like www.myexperiment.org, for example) and referring to them from your publications.
To share workflows, upload them to . When providing links to others, ensure you provide a link containing the version number if you need to refer to a particular version (e.g. the workflow run during a study).
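As an illustration of the "Test and validate" guideline, the following minimal Python sketch shows one way a component that checks assertions could sit between two workflow steps. Everything named here is an assumption made for the example: `check_assertions`, `run_workflow`, the stand-in service `fake_service` and its sample records are hypothetical and are not part of any workflow system mentioned in this document.

```python
def check_assertions(records, required_fields):
    """Validation component: return the records unchanged, or raise if a check fails."""
    if not records:
        raise AssertionError("upstream step produced no records")
    for index, record in enumerate(records):
        missing = [field for field in required_fields if field not in record]
        if missing:
            raise AssertionError(f"record {index} is missing fields: {missing}")
    return records


def run_workflow(fetch, analyse, required_fields):
    """Run fetch -> check -> analyse so that a bad response stops the run early."""
    return analyse(check_assertions(fetch(), required_fields))


def fake_service():
    """Stand-in for a third-party service, returning hypothetical sample data."""
    return [
        {"species": "Parus major", "count": 3},
        {"species": "Erithacus rubecula", "count": 5},
    ]


def total_count(records):
    """Toy analysis step: sum the counts of all records."""
    return sum(record["count"] for record in records)


# Test case: a known input paired with the result the workflow should produce.
result = run_workflow(fake_service, total_count, ["species", "count"])
```

Keeping the check as its own component means the same test case can be re-run whenever a service changes, which gives an early signal of decay instead of a silently wrong result.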