At AppNexus, we've experienced explosive growth over the last three years. Our data pipeline, horizontally scaled in Hadoop and HBase, now processes more than 15 terabytes every day. This has meant rapidly scaling and iterating on the optimization tools we use for big data exploration and aggregation. Unlike many more complicated programming languages, Python is versatile enough for us to use it both for offline analytical tasks and for production system development. Doing so lets us bridge the gap between prototypes and production by relying on the same code libraries and frameworks for both, thereby tightening our innovation loop.
We'd like to share our best practices and lessons learned when iterating and scaling with Python. We'll discuss rapid prototyping and the importance of tightly integrating research with production. We'll explore specific tools, including pandas, NumPy, and IPython, and how they have enabled us to quickly mine data across disparate sources, explore new algorithms, and rapidly bring new processes into production.
Van is a lawyer at Haynes and Boone, where he spends most of his time helping clients with patent defense and open source questions. For a lawyer, though, he spends an inordinate amount of time working at a Python prompt, trying to automate all the tedious parts of his job and advancing his hobby of computational linguistics.
In the rest of his time, Van works as chairman of the Python Software Foundation, where he speaks and writes on open source issues. His first book on open source software and intellectual property law was published by O'Reilly, and he is working on a second book about the economics of open source.
Since v0.8, the pandas library has greatly expanded its timeseries functionality. This tutorial gives an introduction to working with timeseries data in pandas. We'll cover creating date ranges, converting between point (Timestamp) and interval (Period) representations, convenient indexing and time shifting, changing frequencies, resampling, filtering, and working with timezones. Attendees should be familiar with the basics of Python, NumPy, and pandas.
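As a rough illustration of the topics listed above, here is a minimal sketch that touches each one. It is not taken from the tutorial materials: the dates, values, and variable names are made up for the example, and it assumes a recent pandas release (the method-style `resample(...).mean()` call postdates v0.8).

```python
import numpy as np
import pandas as pd

# Ten daily observations starting on an arbitrary, made-up date.
idx = pd.date_range("2012-01-01", periods=10, freq="D")
ts = pd.Series(np.arange(10.0), index=idx)

# Point (Timestamp) -> interval (Period) representation, and back again.
monthly_periods = ts.to_period("M")
back_to_stamps = ts.to_period("D").to_timestamp()

# Convenient indexing: string slices on a DatetimeIndex include both endpoints.
first_three = ts["2012-01-01":"2012-01-03"]

# Time shifting: move the values one step later; the first slot becomes NaN.
shifted = ts.shift(1)

# Changing frequency / resampling: average the daily values in 5-day bins.
five_day_mean = ts.resample("5D").mean()

# Filtering: keep only the observations that fall on a weekend.
weekends = ts[ts.index.dayofweek >= 5]

# Timezones: attach UTC to the naive index, then convert to another zone.
utc = ts.tz_localize("UTC")
eastern = utc.tz_convert("America/New_York")
```

Each step returns a new object rather than modifying `ts`, so the pieces can be tried independently at an interactive prompt.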