Python is a great language for developing software quickly, and by combining several large libraries, we can obtain a system for scientific computing that has every bit as much functionality as Matlab or R, but also has a programming language that is not completely crazy.
As you'll see from the list below, there are a large number of libraries that you need to know to do data analysis in Python:
First, Dive Into Python is a book about the programming language. Here is a quick reference guide for Python.
iPython provides the interactive environment and notebook functionality that we will use.
This iPython notebook provides a reference about the iPython notebook functionality. The short version is: If you are in command mode, i.e., not currently entering text, then type "h" for a list of keyboard shortcuts.
sklearn is a machine learning toolkit for Python.
scipy is a collection of a large number of libraries for scientific computing, which covers a lot of Matlab functionality, including...
numpy (part of SciPy but can be used separately) provides matrix operations. If you know Matlab, here is a NumPy cheat sheet for Matlab users.
matplotlib is a plotting library.
The SciPy Cookbook is a great resource for the above.
pandas provides Python with the goodness of R data frames. We won't use it in this lab because it would repeat ideas that you've seen for R, but if you use Python long-term, then it is very much worth checking out.
You will need to ensure that scikit-learn is installed on your machine. It might not be available on the DICE machines. You can test this by running import sklearn at a Python prompt (the package is installed as scikit-learn but imported as sklearn). If you do not receive an error, then it is installed.
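The same check can be scripted rather than typed at the prompt. The following is a minimal sketch, assuming the import name is sklearn (the name used for the toolkit in the list above); the helper is_installed is a hypothetical name introduced only for this illustration.

```python
import importlib


def is_installed(module_name):
    """Return True if module_name can be imported, False otherwise.

    is_installed is a hypothetical helper for this sketch, not part of
    scikit-learn or of the lab materials.
    """
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


# Assumption: scikit-learn is imported under the name "sklearn".
print("scikit-learn installed: %s" % is_installed("sklearn"))
```

If this reports False, the install step that follows should make the import succeed.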
To install scikit-learn into your home directory (it does not require root access), do
pip install --user -U scikit-learn
The rest of the instructions for this lab will be contained in an iPython notebook. To start the ipython notebook, first go to a command line and run
ipython notebook
This command will not terminate, but instead will start a web server at an address like http://localhost:8888/. It should open up a browser automatically and send it to that URL, but if not, you can enter it manually.
Now, in your browser, create a blank iPython notebook. Although this looks like a fancy schmancy web page, it is really just an interactive Python interpreter that uses HTML to present both your commands and their results. In the first text box, enter this Python code:
import urllib
urllib.urlretrieve('http://www.inf.ed.ac.uk/teaching/courses/irds/2014-autumn/labs/Lab3.ipynb', 'Lab3.ipynb')
This will download the iPython notebook that contains the rest of this lab. Now you should be able to find the notebook by clicking File -> Open on the iPython notebook menu.
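The download call above uses the Python 2 location of urlretrieve; in Python 3 the same function lives in urllib.request. If you are not sure which version your notebook is running, something like the following sketch should work under either. The names NOTEBOOK_URL and fetch_notebook are hypothetical, introduced only for this example, and skipping the download when the file already exists is a design choice of the sketch, not something the lab requires.

```python
import os

try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2

# URL taken from the lab instructions above.
NOTEBOOK_URL = ('http://www.inf.ed.ac.uk/teaching/courses/irds/'
                '2014-autumn/labs/Lab3.ipynb')


def fetch_notebook(url=NOTEBOOK_URL, filename='Lab3.ipynb'):
    """Download url to filename unless that file already exists.

    Returns the absolute path of the local file. fetch_notebook is a
    hypothetical helper for this sketch.
    """
    if not os.path.exists(filename):
        urlretrieve(url, filename)
    return os.path.abspath(filename)
```

Calling fetch_notebook() in a notebook cell saves Lab3.ipynb to the current working directory and returns its full path, which can help when locating the file from the notebook menu.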
Unless explicitly stated otherwise, all material is copyright © The University of Edinburgh.