Slides can be found here: slideshare.net/PyData/how-web-apis-and-data-centric-tools-power-the-materials-project
Python has long been an important tool for analyzing and manipulating scientific data. Traditionally, this work has involved large datasets on disk or in local databases, processed by sophisticated numerical and scientific libraries (SciPy and friends). Increasingly, science is becoming a collaborative enterprise in which "big data" is generated in multiple locations and analyzed by multiple research groups.
In this talk we discuss how Python data analysis can help scientists work more collaboratively by integrating Web APIs to access remote data. We will discuss the details of this approach as applied to the Materials Project (see materialsproject.org), a Department of Energy project that aims to remove the guesswork from materials design using an open database of computed properties for all known materials. Using the Python Materials Genomics (pymatgen) analysis package (see packages.python.org/pymatgen), Materials Project data can be seamlessly analyzed alongside local computed and experimental data. We will describe how we make this data available as a web API (through Django) and how we provide access to both data and analysis under a single library. The talk will go over the technology stack and demonstrate the potential power of these tools within an IPython notebook. We will finish by describing plans to extend this work to address key challenges for distributed scientific data.
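As a rough illustration of the pattern described above (pulling remote computed data through a web API and comparing it with local measurements), here is a minimal sketch using only the standard library. The endpoint `api.example.org`, the URL layout, the `"value"` field in the JSON response, and the helper names `build_url`, `fetch_property`, and `compare` are all assumptions made for illustration; they are not the actual Materials Project API or the pymatgen interface.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical endpoint, for illustration only; NOT the real Materials Project URL scheme.
BASE_URL = "https://api.example.org/materials"


def build_url(formula, prop):
    """Build the (assumed) REST URL for one property of one material."""
    return f"{BASE_URL}/{quote(formula)}/{quote(prop)}"


def fetch_property(formula, prop, opener=urlopen):
    """Fetch one computed property from the remote API.

    Assumes the server answers with JSON of the form {"value": <number>}.
    `opener` is injectable so the function can be exercised without a network.
    """
    with opener(build_url(formula, prop)) as resp:
        payload = json.load(resp)
    return payload["value"]


def compare(local_measurements, prop, opener=urlopen):
    """Pair each local experimental value with the remote computed one.

    Returns a list of (formula, measured, computed, measured - computed) tuples.
    """
    rows = []
    for formula, measured in local_measurements.items():
        computed = fetch_property(formula, prop, opener)
        rows.append((formula, measured, computed, measured - computed))
    return rows
```

Passing the opener as a parameter keeps the remote call replaceable, so the same analysis code can run against a live service, a cached response, or a test stub.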