1. HDF5 is for Lovers


    from PyData

    Slides can be found here: http://www.slideshare.net/PyData/hdf5-isforlovers

    HDF5 is a hierarchical, binary database format that has become a de facto standard for scientific computing. While the specification may be used in a relatively simple way (persistence of static arrays), it also supports several high-level features that prove invaluable. These include chunking, ragged data, extensible data, parallel I/O, compression, complex selection, and in-core calculations. Moreover, HDF5 bindings exist for almost every language, including two Python libraries (PyTables and h5py).

    This tutorial will discuss tools, strategies, and hacks for squeezing every ounce of performance out of HDF5 in new or existing projects. It will also go over fundamental limitations in the specification and provide creative and subtle strategies for getting around them. Overall, this tutorial will show how HDF5 plays nicely with all parts of an application, making the code and data both faster and smaller. With such powerful features at the developer's disposal, what is not to love?!

    This tutorial is targeted at a more advanced audience with prior knowledge of Python and NumPy. Knowledge of C or C++ and basic HDF5 is recommended but not required. This tutorial will require Python 2.7, IPython 0.12+, NumPy 1.5+, and PyTables 2.3+. ViTables and Matplotlib are also recommended. These may all be found in Linux package managers. They are also available through EPD or easy_install. ViTables may need to be installed independently.
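As a rough illustration of three of the features named in the abstract (chunking, compression, and extensible data), here is a minimal sketch using PyTables. It is not taken from the tutorial itself. It assumes the PyTables 3.x naming (`open_file`, `create_earray`), whereas the 2.x releases listed in the requirements used camelCase names such as `openFile`. The node name `samples`, the function names, and all sizes are hypothetical.

```python
import os
import tempfile

import numpy as np
import tables  # assumption: PyTables 3.x (snake_case API)


def write_extensible(path, n_blocks=4, block_rows=250, n_cols=8):
    """Append blocks of rows to a chunked, compressed, extensible array."""
    # zlib ships with every HDF5 build, so it is a safe default compressor.
    filters = tables.Filters(complevel=5, complib="zlib")
    with tables.open_file(path, mode="w") as h5:
        earr = h5.create_earray(
            h5.root,
            "samples",                        # hypothetical node name
            atom=tables.Float64Atom(),
            shape=(0, n_cols),                # the 0 marks the extensible axis
            filters=filters,
            chunkshape=(block_rows, n_cols),  # one chunk per appended block
        )
        for i in range(n_blocks):
            # Block i is filled with the value i so slices are easy to check.
            earr.append(np.full((block_rows, n_cols), float(i)))
        return tuple(int(n) for n in earr.shape)


def read_slice(path, start, stop):
    """Read only the requested rows; the rest of the file stays on disk."""
    with tables.open_file(path, mode="r") as h5:
        return h5.root.samples[start:stop, :]


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")
    print(write_extensible(demo_path))
    print(read_slice(demo_path, 250, 500).shape)
```

Matching the chunk shape to the appended block size is only one possible choice; the best chunk shape depends on how the data will be read back.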

    • PyTables - Francesc Alted


      from PyData

      HDF5 is a de facto standard binary file format specification. However, what makes HDF5 great is the number of libraries available for interacting with files of this type and their extremely rich feature set. HDF5 has bindings for many languages, such as C, C++, Fortran, Java, Perl and, of course, Python.

      During my tutorial I'm going to explain the basics of using HDF5 through PyTables, one of the Python bindings for HDF5, and how PyTables leverages (and enhances) HDF5 capabilities so as to cope with extremely large datasets, especially in tabular format. I'll start by describing the basic capabilities that PyTables exposes from HDF5, like creating and accessing large multidimensional datasets, both homogeneous and heterogeneous, and how they can be annotated with user-defined metadata (attributes). Then I'll move on to specific features of PyTables, like high-performance compressors (Blosc), automatic parametrization for optimizing performance, and very fast queries (using OPSI, a query engine that allows different size/performance ratios in the indexes). I'll finish with a glimpse of how to perform out-of-core (also called out-of-memory) computations on huge datasets in a very efficient, memory-conscious way (via the high-performance numexpr library).

      This talk was presented at PyData NYC 2012: http://nyc2012.pydata.org/. If you are interested in this topic, be sure to check out PyData Silicon Valley in March of 2013: http://sv2013.pydata.org/
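The features listed in this abstract (heterogeneous tables, user-defined attributes, Blosc compression, and indexed queries) might look roughly like the sketch below. It is an illustration only, not code from the talk. It assumes the PyTables 3.x naming (`open_file`, `create_table`, `read_where`) and a PyTables build that includes Blosc. The `Reading` record layout, the `readings` node, the `units` attribute, and the function names are all hypothetical.

```python
import os
import tempfile

import numpy as np
import tables  # assumption: PyTables 3.x with Blosc available


class Reading(tables.IsDescription):
    """Hypothetical heterogeneous record: an integer id plus a float value."""
    sensor_id = tables.Int32Col(pos=0)
    value = tables.Float64Col(pos=1)


def build_table(path, n_rows=10000):
    """Create a Blosc-compressed table, annotate it, and index one column."""
    filters = tables.Filters(complevel=5, complib="blosc")
    with tables.open_file(path, mode="w") as h5:
        table = h5.create_table(
            h5.root, "readings", Reading,
            filters=filters, expectedrows=n_rows,
        )
        table.attrs.units = "volts"           # user-defined metadata
        rows = np.empty(n_rows, dtype=table.dtype)
        rows["sensor_id"] = np.arange(n_rows) % 10
        rows["value"] = np.arange(n_rows, dtype=np.float64)
        table.append(rows)
        table.flush()
        table.cols.value.create_index()       # index used by later queries


def query(path, sensor, threshold):
    """Evaluate the condition inside PyTables and return matching values."""
    with tables.open_file(path, mode="r") as h5:
        table = h5.root.readings
        hits = table.read_where(
            "(sensor_id == s) & (value > t)",
            condvars={"s": sensor, "t": threshold},
        )
        return str(table.attrs.units), hits["value"]


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "readings.h5")
    build_table(demo_path)
    units, values = query(demo_path, sensor=3, threshold=9900.0)
    print(units, len(values))
```

The condition string is evaluated by PyTables itself rather than row by row in Python, so only the matching rows are materialized as a NumPy array.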
