Flexible and Transparent Data Processing Pipelines using Common Workflow Language

Digital History research often involves combining functionality from different computational tools. Currently, this is mostly done by writing custom data processing scripts. These scripts generally duplicate at least some data processing tasks, and they need to be adapted whenever they are applied to new datasets or run in other software or hardware environments. This hampers both research reproducibility and the reuse of existing software. Workflow standards might help to reduce these problems. We propose adopting the Common Workflow Language (CWL), a recent specification for describing data analysis workflows and tools.

During the workshop, I will present scriptcwl, a Python package for combining processing steps into CWL workflows, and nlppln, a Python package that helps create Natural Language Processing (NLP) tools.
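
As a rough illustration of what combining processing steps into a CWL workflow amounts to, the sketch below builds a two-step workflow description by hand, using only Python's standard library. It does not use scriptcwl or nlppln and does not reflect their APIs; the helper functions make_step and make_workflow, the step names, and the tool files tokenize.cwl and count-words.cwl are all hypothetical. The result is written as JSON, which CWL accepts alongside YAML.

```python
import json


def make_step(run, inputs, outputs):
    """Describe one workflow step: the tool to run, how its inputs are
    wired to other values, and which outputs it exposes."""
    return {"run": run, "in": inputs, "out": outputs}


def make_workflow(inputs, steps, outputs):
    """Assemble a CWL v1.0 Workflow document as a plain dictionary."""
    return {
        "cwlVersion": "v1.0",
        "class": "Workflow",
        "inputs": inputs,
        "outputs": outputs,
        "steps": steps,
    }


# Hypothetical pipeline: tokenize a directory of texts, then count words.
# The .cwl tool descriptions are assumed to exist; they are not part of this sketch.
workflow = make_workflow(
    inputs={"in_dir": "Directory"},
    steps={
        "tokenize": make_step(
            "tokenize.cwl", {"in_dir": "in_dir"}, ["tokens"]
        ),
        # The second step consumes the first step's output ("step/output").
        "count": make_step(
            "count-words.cwl", {"tokens": "tokenize/tokens"}, ["counts"]
        ),
    },
    outputs={"counts": {"type": "File", "outputSource": "count/counts"}},
)

workflow_json = json.dumps(workflow, indent=2)
print(workflow_json)
```

Because each step only names a tool description and the values it consumes, the same step can be reused in another workflow without copying any processing code, which is the kind of reuse the abstract argues for.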

Presentation