Slides and more info: erlang-factory.com/berlin2015/jorgen-brandt
The need to analyze massive scientific data sets on the one hand, and the availability of distributed compute resources with an increasing number of CPU cores on the other, have promoted the development of a variety of languages and systems for parallel, distributed data analysis. Among them are data-parallel query languages such as Pig Latin or Spark, as well as scientific workflow languages such as Swift or Pegasus DAX. While data-parallel query languages focus on exploiting data parallelism, scientific workflow languages focus on integrating external tools and libraries. Cuneiform is a novel language for large-scale scientific data analysis that combines easy integration of arbitrary tools, treated as black boxes, with the ability to fully exploit data parallelism. We introduce a use case from next-generation sequencing to discuss how Cuneiform facilitates the reuse of existing software tools and the exploitation of data parallelism. Additionally, we discuss how the language was specified in Erlang and compare this specification to previous approaches. Finally, we discuss Cuneiform's architecture and how it is implemented in terms of Erlang services.
In this talk, we aim to introduce the functional workflow language Cuneiform. Its merits are illustrated with applications from bioinformatics. In addition, we aim to highlight the role Erlang played in determining the language's semantics and to share experiences from implementing its execution environment in Erlang.
Bioinformaticians, data scientists, language enthusiasts.
Jörgen Brandt is a PhD student at the Humboldt-Universität in Berlin. His research interests include next-generation sequencing, scientific workflows and functional programming languages. He graduated in Computer Science with a specialization in intelligent systems from the Technische Universität Berlin in 2011, and in Information Technology and Networked Systems from the Hochschule für Technik und Wirtschaft in 2008.