1. Python is quickly becoming the glue language that holds together data science and related fields like quantitative finance. Zipline is a new, BSD-licensed quantitative trading system that allows easy backtesting of investment algorithms on historical data. The system is fundamentally event-driven and a close approximation of how live-trading systems operate. Moreover, Zipline comes "batteries included": many common statistics, such as moving averages and linear regression, can be readily accessed from within a user-written algorithm. Input of historical data and output of performance statistics are based on Pandas DataFrames to integrate nicely into the existing Python ecosystem. Furthermore, plotting, statistics, and machine learning libraries like matplotlib, SciPy, statsmodels, and scikit-learn integrate nicely to support development, analysis, and visualization of state-of-the-art trading systems.
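
    By way of illustration, the sketch below emulates the event-driven style described above with a dual moving-average crossover over a synthetic pandas price series. It is a rough, self-contained approximation in plain pandas/NumPy; the handle_data name, window lengths, and synthetic prices are assumptions rather than Zipline's actual API.

    ```python
    # Minimal sketch of an event-driven dual moving-average backtest in plain
    # pandas/NumPy -- illustrative only, not Zipline's actual API.
    import numpy as np
    import pandas as pd

    np.random.seed(42)  # reproducible toy data
    prices = pd.Series(
        100 + np.random.randn(250).cumsum(),
        index=pd.date_range("2012-01-02", periods=250, freq="B"),
    )

    position, cash, history = 0, 10_000.0, []

    def handle_data(day, price, window=(20, 50)):
        """Called once per bar, mimicking an event-driven trading loop."""
        global position, cash
        past = prices.loc[:day]
        if len(past) < max(window):
            history.append(cash)
            return
        short_ma = past.rolling(window[0]).mean().iloc[-1]
        long_ma = past.rolling(window[1]).mean().iloc[-1]
        if short_ma > long_ma and position == 0:    # golden cross: buy 10 shares
            position, cash = 10, cash - 10 * price
        elif short_ma < long_ma and position > 0:   # death cross: liquidate
            cash, position = cash + position * price, 0
        history.append(cash + position * price)     # mark the portfolio to market

    for day, price in prices.items():
        handle_data(day, price)

    print("final portfolio value: %.2f" % history[-1])
    ```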

    Zipline is currently used in production as the backtesting engine powering Quantopian.com -- a free, community-centered platform that allows development and real-time backtesting of trading algorithms in the web browser. Zipline will be released in time for PyData NYC'12.

    The talk will be a hands-on IPython-notebook-style tutorial ranging from the development of simple algorithms and their analysis to more advanced topics like portfolio and parameter optimization. While geared towards quantitative finance, the talk is a case study of how modern, general-purpose PyData tools support application-specific usage scenarios including statistical simulation, data analysis, optimization, and visualization. We believe the talk to be of general interest to the diverse PyData community.
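
    As a taste of the parameter-optimization topic, the hypothetical sketch below grid-searches moving-average window pairs by Sharpe ratio on synthetic data; the window ranges and scoring choice are illustrative assumptions, not material from the tutorial.

    ```python
    # Hypothetical parameter sweep over moving-average window pairs -- a sketch
    # of the kind of parameter optimization the tutorial covers, not its code.
    from itertools import product

    import numpy as np
    import pandas as pd

    np.random.seed(0)
    prices = pd.Series(100 + np.random.randn(500).cumsum(),
                       index=pd.date_range("2011-01-03", periods=500, freq="B"))
    returns = prices.pct_change().fillna(0.0)

    def sharpe(fast, slow):
        """Annualized Sharpe ratio of a long-only moving-average crossover."""
        signal = prices.rolling(fast).mean() > prices.rolling(slow).mean()
        strat = returns * signal.shift(1, fill_value=False).astype(float)
        return np.sqrt(252) * strat.mean() / strat.std()

    grid = {(f, s): sharpe(f, s) for f, s in product([10, 20, 30], [50, 100, 150])}
    best = max(grid, key=grid.get)
    print("best (fast, slow) windows:", best, "Sharpe: %.2f" % grid[best])
    ```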

    This talk was presented at PyData NYC 2012: nyc2012.pydata.org/. If you are interested in this topic, be sure to check out PyData Silicon Valley in March of 2013: sv2013.pydata.org/

    Video: vimeo.com/53064082
  2. Have a data science problem in Python? Need to do some ML or NLP, but find the options daunting? In this whirlwind tour, we'll go over some common use-cases, and explain where to start. More importantly, you'll learn what to avoid, and what WON'T be a valuable use of your time.

    This talk was presented at PyData NYC 2012: nyc2012.pydata.org/. If you are interested in this topic, be sure to check out PyData Silicon Valley in March of 2013: sv2013.pydata.org/

    Video: vimeo.com/53058140
  3. Python's Natural Language Toolkit is one of the most widely used and actively developed natural language processing libraries in the open source community. This workshop will introduce the audience to NLTK -- what problems it aims to solve, how it differs from other natural language libraries in approach, and how it can be used for large-scale text analysis tasks. Concrete examples will be taken from Parse.ly's work on news article analysis, covering areas such as entity extraction, keyword collocations, and corpus-wide analysis.
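
    A minimal example of the kind of collocation analysis mentioned above, using NLTK's collocation finders on a made-up snippet; the sample text and frequency threshold are placeholders, not Parse.ly's pipeline.

    ```python
    # Illustrative NLTK collocation finding on a toy corpus.
    from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

    text = ("the federal reserve raised interest rates today as the federal "
            "reserve chairman warned that interest rates may rise again")
    tokens = text.split()                      # simple whitespace tokenization

    finder = BigramCollocationFinder.from_words(tokens)
    finder.apply_freq_filter(2)                # keep bigrams seen at least twice
    print(finder.nbest(BigramAssocMeasures.pmi, 5))
    # -> bigrams like ('federal', 'reserve') and ('interest', 'rates')
    ```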

    This talk was presented at PyData NYC 2012: nyc2012.pydata.org/. If you are interested in this topic, be sure to check out PyData Silicon Valley in March of 2013: sv2013.pydata.org/

    Video: vimeo.com/53062324
  4. Are you interested in working with social data to map out communities and connections between friends, fans, and followers? In this session I'll show ways in which we use the Python NetworkX library along with the open-source Gephi visualization tool to make sense of social network data. We'll take a few examples from Twitter, look at how a hashtag spreads through the network, and then analyze the connections between users posting to the hashtag. We'll be constructing graphs, running statistics on them, and then visualizing the output.
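
    A small sketch of the workflow described above, under assumed data: build a NetworkX graph from made-up user-mention edges, compute a centrality statistic, and export a GEXF file for Gephi (the edge list and filename are placeholders).

    ```python
    # Toy sketch of the NetworkX-to-Gephi workflow: build a directed graph of
    # (user -> mentioned user) edges, compute a centrality statistic, and
    # export to GEXF for styling in Gephi.
    import networkx as nx

    mentions = [("alice", "bob"), ("alice", "carol"), ("bob", "carol"),
                ("dave", "carol"), ("carol", "alice")]

    G = nx.DiGraph()
    G.add_edges_from(mentions)

    centrality = nx.in_degree_centrality(G)    # who gets mentioned the most?
    for user, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
        print("%-6s %.2f" % (user, score))

    nx.write_gexf(G, "hashtag_mentions.gexf")  # open this file in Gephi
    ```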

    This talk was presented at PyData NYC 2012: nyc2012.pydata.org/. If you are interested in this topic, be sure to check out PyData Silicon Valley in March of 2013: sv2013.pydata.org/

    Video: vimeo.com/53061411
  5. The Message Passing Interface (MPI) has been called the assembly language of distributed parallel computing. It is the de facto message passing standard for effectively and portably utilizing the world's largest (and smallest) supercomputers. In this workshop, we will discuss how MPI can be utilized via several Python implementations, e.g., mpi4py and pupyMPI, as the messaging strategy between your parallel programs.
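
    A minimal mpi4py sketch of the message-passing pattern described above: each rank sums a slice of the work and rank 0 collects the total via a reduction (the script name and process count in the comments are placeholders).

    ```python
    # Each rank computes a partial sum and rank 0 gathers the total with a
    # reduction. Run with e.g. `mpiexec -n 4 python this_script.py`.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # Split the work 0..99 across ranks and sum each slice locally.
    local = sum(range(rank, 100, size))

    total = comm.reduce(local, op=MPI.SUM, root=0)
    if rank == 0:
        print("sum of 0..99 computed by %d ranks: %d" % (size, total))
    ```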

    This talk was presented at PyData NYC 2012: nyc2012.pydata.org/. If you are interested in this topic, be sure to check out PyData Silicon Valley in March of 2013: sv2013.pydata.org/

    Video: vimeo.com/53060517

PyData

Videos from PyData Conferences and related to PyData tools and topics
