On 2015-12-01 I gave a talk at Erlang Factory Lite.
Cuneiform: A Functional Workflow Language Implementation in Erlang
The need to analyze massive scientific data sets on the one hand and the availability of distributed compute resources with an increasing number of CPU cores on the other hand have promoted the development of a variety of languages and systems for parallel, distributed data analysis. Among them are data-parallel query languages such as Pig Latin or Spark as well as scientific workflow languages such as Swift or Pegasus DAX. While data-parallel query languages focus on the exploitation of data parallelism, scientific workflow languages focus on the integration of external tools and libraries. Cuneiform is a novel language for large-scale scientific data analysis that combines easy integration of arbitrary tools, treated as black boxes, with the ability to fully exploit data parallelism. We introduce a use case from next generation sequencing to discuss the way Cuneiform facilitates the reuse of existing software tools and the exploitation of data parallelism. Additionally, we discuss the way, this language was specified in Erlang and compare this specification to previous approaches. Finally, we discuss Cuneiform’s architecture and the way it is implemented in terms of Erlang services.