Galaxy Team and Penn State University
Many Galaxy tools rely on built-in data (e.g. genomic sequences, aligner indexes, etc.) being available. Although most tools can instead use data from a user’s history (e.g. a FASTA dataset), doing so often reduces performance, because one-off indexes, for example, must be built by the Galaxy tool each time the dataset is used as a source. Until now, generating new built-in data and informing a Galaxy server of its availability has been an error-prone manual process. Here, we demonstrate new Galaxy features that simplify and automate this process.
A new class of Galaxy Utilities, known as Data Managers, has been developed. Data Managers allow an administrator to use the familiar Galaxy tool interface to download or generate the underlying data and automatically populate Galaxy’s internal built-in data registries (i.e. data tables / *.loc files). When a Data Manager finishes processing, the new entries are added to these registries (and persisted) in real time, without requiring a server restart. By using a Data Manager, an administrator not only avoids the common pitfalls associated with manual curation of built-in data, but also gains the same reproducibility and transparency associated with Galaxy tools.
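To make the manual process concrete, the following is a minimal sketch of the kind of bookkeeping a Data Manager takes over: reading a tab-separated *.loc file and appending a new entry without duplicating an existing one. It is an illustration only, not Galaxy's implementation. The column layout (`value`, `dbkey`, `name`, `path`), the function names `parse_loc` and `append_entry`, and the sample paths are all assumptions made for this example; in a real instance the columns of each data table are defined by the server's configuration.

```python
# Hypothetical column layout for an illustrative .loc file; real data tables
# declare their own columns in the Galaxy server configuration.
COLUMNS = ["value", "dbkey", "name", "path"]


def parse_loc(text, columns=COLUMNS):
    """Parse tab-separated .loc content into a list of dicts.

    Blank lines, comment lines starting with '#', and lines with too few
    fields are skipped.
    """
    entries = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < len(columns):
            continue  # malformed line: a common result of hand editing
        entries.append(dict(zip(columns, fields)))
    return entries


def append_entry(text, entry, columns=COLUMNS):
    """Return .loc content with `entry` appended, unless its value already exists."""
    existing = {e["value"] for e in parse_loc(text, columns)}
    if entry["value"] in existing:
        return text
    row = "\t".join(entry[c] for c in columns)
    separator = "" if (not text or text.endswith("\n")) else "\n"
    return text + separator + row + "\n"


# Illustrative content only; the paths are placeholders.
sample = "# value\tdbkey\tname\tpath\nhg19\thg19\tHuman (hg19)\t/data/hg19.fa\n"
new_entry = {
    "value": "mm10",
    "dbkey": "mm10",
    "name": "Mouse (mm10)",
    "path": "/data/mm10.fa",
}
updated = append_entry(sample, new_entry)
```

Doing this by hand means keeping the column order, the tab separators, and the file paths consistent for every entry, which is where manual curation typically goes wrong.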
Data Managers can be defined locally or installed automatically from a Tool Shed; the framework is flexible and is not restricted to genomic data. Administrators can access them interactively, within Workflows, and via the API. Just as with Galaxy tools, Data Manager jobs can be dispatched across existing compute resources.