1. There was a time when the go-to machine learning library was Weka, a behemoth of a Java library. Recently, Scikit-Learn has chipped away at the functionality provided by Weka and given the Python community a comparable all-in-one machine learning library. In this talk Brian will discuss how Scikit-Learn is used to tackle the problem of organic versus inorganic decodes at bitly. An organic decode is one in which a user makes an explicit decision to click on a link; an inorganic decode is one in which a link gets triggered without the user's explicit knowledge. An example of an inorganic link is a bitly-encoded link that wraps a small GIF embedded in a web page. Such links often get overinflated decode counts, which makes them naively appear popular. Brian will show how Scikit-Learn is used to decide on discriminative features, build the classifier, and test the classifier.

    This talk was presented at PyData NYC 2012: nyc2012.pydata.org/. If you are interested in this topic, be sure to check out PyData Silicon Valley in March of 2013: sv2013.pydata.org/

    # vimeo.com/53112972
  2. Scikit-learn is a popular Python machine learning library. In this tutorial, I'll give an introduction to the core concepts of machine learning, using scikit-learn to demonstrate applications of these concepts on real-world datasets. We'll cover some of the most powerful and popular supervised and unsupervised learning techniques, including classification and regression models like Support Vector Machines and Random Forests, clustering models like K Means and Gaussian Mixtures, and dimensionality reduction models like PCA and manifold learning. Throughout, I'll emphasize the key features of the scikit-learn API, so that participants will be well-poised to begin exploring their own datasets using the wide array of algorithms implemented in scikit-learn.

    Jake Vanderplas

    Jake Vanderplas is an NSF Postdoctoral fellow working jointly in the Computer Science and Astronomy departments at the University of Washington. His research involves large-scale machine learning applications within astronomy and astrophysics. He is a maintainer of the Python packages Scikit-learn and Scipy, and regularly contributes to several of the other packages within the numpy/scipy ecosystem. He occasionally blogs about Python-related topics at Pythonic Perambulations - jakevdp.github.com.

    What is PyData?
    PyData.org is the home for all things related to the use of Python in data management and analysis. This site aims to make open source data science tools easily accessible by listing the links in one location. If you would like to submit a download link or any items to be listed in PyData News, please let us know at: admin@pydata.org

    Conferences
    PyData conferences are a gathering of users and developers of data analysis tools in Python. The goals are to provide Python enthusiasts a place to share ideas and learn from each other about how best to apply the language and tools to ever-evolving challenges in the vast realm of data management, processing, analytics, and visualization.

    We aim to be an accessible, community-driven conference, with tutorials for novices, advanced topical workshops for practitioners, and opportunities for package developers and users to meet in person.

    A major goal of PyData events and conferences is to provide a venue for users across all the various domains of data analysis to share their experiences and their techniques, as well as highlight the triumphs and potential pitfalls of using Python for certain kinds of problems.

    PyData is organized by NumFOCUS with the generous help and support of our sponsors. Proceeds from PyData are donated to NumFOCUS and used for the continued development of the open-source tools used by data scientists. If you would like to volunteer to be a part of the PyData team, contact us at: admin@pydata.org

    # vimeo.com/80093925
  3. I'll walk you through Python's best tools for getting your hands dirty with a new dataset: IPython Notebook and pandas. I'll show you how to read in data, clean it up, graph it, and draw some conclusions, using some open data about the number of cyclists on Montréal's bike paths as an example.

    Julia Evans

    Julia Evans is a programmer & data scientist based in Montréal, Quebec. She loves coding, math, playing with datasets, teaching programming, open source communities, and late night discussions on how to dismantle oppression. She co-organizes PyLadies Montréal and Montréal All-Girl Hack Night. Right now she is attending Hacker School in New York City.

    # vimeo.com/79835526
  4. Computing, and thus software, is one of the foundations of modern technical work across a broad range of fields. Like anything, all software has attributes: slow, fast, buggy, robust, etc. However, these attributes are not passive and neutral. In this talk I will describe how the attributes of software have a profound effect on human behavior, attitudes and thought patterns. These attributes, for better or worse, infect all of the work that is done using the software. To explore these ideas, I will provide an attribute-based tour of the IPython Notebook. This tour will elucidate the overall vision for the project and cover our recent work on interactive widgets and converting notebooks to different formats.

    Brian Granger

    Brian Granger is an Assistant Professor of Physics at Cal Poly State University in San Luis Obispo, CA. He has a background in theoretical atomic, molecular and optical physics, with a Ph.D. from the University of Colorado. His current research interests include quantum computing, parallel and distributed computing, and interactive computing environments for scientific and technical computing. He is a core developer of the IPython project, the creator of PyZMQ and a contributor to SymPy. Contact him at ellisonbg@gmail.com or @ellisonbg (Twitter, GitHub).

    # vimeo.com/79832657
  5. Jake Vanderplas

    Jake Vanderplas is an NSF post-doctoral fellow at University of Washington, working jointly between the Computer Science and Astronomy departments. His research involves applying recent advances in machine learning to large astronomical datasets, in order to learn about the Universe at the largest scales. He is co-author of "Statistics, Data Mining, and Machine Learning in Astronomy", a Python-centric textbook to be published by Princeton Press in 2013, and has presented many technical talks and papers in this subject area.

    In the Python world, Jake is active in maintaining and contributing to several core Python scientific computing packages, including Scikit-learn, Scipy, Matplotlib, and others. He occasionally blogs on Python-related topics at jakevdp.github.com.

    # vimeo.com/79820956

Data_analysis

Giovanni Mazzocco

Channel reserved for videos about data analysis (theory and implementation)
