
IRDS: Lab Session 3: Python, IPython, SciPy, and all that

Python is a great language for developing software quickly, and by combining several large libraries, we can obtain a system for scientific computing that has every bit as much functionality as Matlab or R, but also has a programming language that is not completely crazy.

As the installation steps below show, data analysis in Python draws on quite a few libraries: NumPy, SciPy, matplotlib, pandas, statsmodels and scikit-learn, with Jupyter as the interactive front end.

Setting up for DICE

For this lab we will be using Python along with a few open-source libraries (packages). These packages cannot be installed directly on DICE machines, so we will create a virtual environment. A virtual environment keeps package installation simple and makes sure the correct versions are retained. You can read more about virtual environments if you are curious, but this is not necessary for this tutorial.

Now open a terminal and follow these instructions. Please enter the commands one at a time and wait for each to complete before typing the next; this makes it easier to catch unexpected warnings and errors. Please read and heed any warnings and errors you encounter. We are on standby in the labs to help if required.

  1. Change directory to home and create a virtual environment

    cd
        

    virtualenv --distribute virtualenvs/irds_env # Creates a virtual environment called irds_env
        
  2. Navigate to and activate the virtual environment (you will need to activate the virtual environment every time you open a new terminal; activation puts the environment's Python, with all its installed packages, first on your shell's $PATH)

    cd virtualenvs/irds_env
        

    source ./bin/activate # Activates the environment; your shell prompt should now change to show that you are in the irds_env environment
        
  3. Install all the Python packages we need. Once the correct virtual environment is activated, pip install will install packages into that environment. If you're ever unsure which Python you are using, type which python in the terminal. WATCH FOR WARNINGS AND ERRORS HERE. We have split these commands up to encourage you to enter them one by one.

    pip install -U setuptools # The -U flag upgrades the current version
        

    pip install -U pip
        

    pip install yolk
        

    pip install jupyter
        

    pip install numpy
        

    pip install scipy
        

    pip install matplotlib
        

    pip install pandas
        

    pip install statsmodels
        

    pip install scikit-learn
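
Once the installs have finished, a quick sanity check from Python can confirm that the interpreter in use belongs to the virtual environment and that the packages import. The following is only a minimal sketch: the helper names in_virtualenv and package_versions and the REQUIRED list are illustrative and not part of the lab material. It assumes the environment was created with virtualenv or venv, and it checks the analysis libraries only (not yolk or jupyter).

```python
from __future__ import print_function

import importlib
import sys

# Import names of the analysis packages installed above (illustrative list).
# Note that scikit-learn is imported under the name "sklearn".
REQUIRED = ["numpy", "scipy", "matplotlib", "pandas", "statsmodels", "sklearn"]


def in_virtualenv():
    """Return True if this interpreter appears to run inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; newer virtualenv and venv
    # instead make sys.base_prefix differ from sys.prefix.
    base = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
    return base != sys.prefix


def package_versions(names=REQUIRED):
    """Map each import name to its version string, or None if it fails to import."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
for name, version in sorted(package_versions().items()):
    print("{:<12} {}".format(name, version or "NOT INSTALLED"))
```

If the interpreter path does not point inside virtualenvs/irds_env, or a package is reported as not installed, the most likely cause is that the environment was not activated in the current terminal.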
        

You should now have all the required modules installed. The next step is to make a new directory where you will keep all the lab notebooks. Within your terminal:

  1. Navigate back to your home directory

    cd
        
  2. Make a new directory (e.g. called irds_lab3)

    mkdir irds_lab3
        
  3. Navigate home and ensure the irds_env virtualenv is activated

    cd
        

    source virtualenvs/irds_env/bin/activate # Activates the environment
        
  4. Enter the directory you just created

    cd irds_lab3
        
  5. Start a jupyter notebook

    jupyter notebook
        
  6. In the first cell of a new notebook, enter this Python code and run it

    # urllib.urlretrieve exists in Python 2 only
    import urllib
    urllib.urlretrieve('http://www.inf.ed.ac.uk/teaching/courses/irds/2016-autumn/labs/Lab3.ipynb', 'Lab3.ipynb')
        
  7. This will download the Jupyter notebook that contains the rest of this lab. Now you should be able to find the notebook by clicking File -> Open on the Jupyter notebook menu.
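
The download call in step 6 uses the Python 2 location of urlretrieve; in Python 3 the same function lives in urllib.request. If you are unsure which Python your notebook kernel runs, a version-agnostic variant can be sketched as below. The function name fetch_notebook and the constant LAB_URL are illustrative names introduced here; the URL itself is the one given in step 6.

```python
try:
    from urllib.request import urlretrieve  # Python 3 location
except ImportError:
    from urllib import urlretrieve  # Python 2 location

# URL of the lab notebook, as given in the instructions above.
LAB_URL = "http://www.inf.ed.ac.uk/teaching/courses/irds/2016-autumn/labs/Lab3.ipynb"


def fetch_notebook(url=LAB_URL, filename="Lab3.ipynb"):
    """Download url to filename and return the path it was saved to."""
    # urlretrieve returns a (saved_path, headers) pair; only the path is needed.
    saved_path, _headers = urlretrieve(url, filename)
    return saved_path
```

Calling fetch_notebook() in a notebook cell should save Lab3.ipynb into the directory where jupyter notebook was started, which is the same effect as the two-line version in step 6.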

It is probably a good idea to delete the virtual environment once you have finished working on the lab to free up some space:

  1. Navigate back to your home directory

    cd
        
  2. Remove the virtual environment

    rm -rf virtualenvs/irds_env
        

Setting up for personal machine (Windows / OS X / Ubuntu)

If you are using a personal machine, you can either follow the steps above or use the Anaconda distribution (Python version 2.7; choose the appropriate installer for your operating system). Anaconda is a standard set of packages used in scientific computing, which the Anaconda team curates to keep them consistent. We also recommend that you set up a virtual environment for this project. This way, if you update anything in your Anaconda base install, this virtual environment will remain unchanged. To create a virtual environment called irds, open a Terminal (or a Command Prompt window if you are running Windows) and type:

conda create -n irds python=2.7 anaconda
  

Don't forget to activate the virtual environment every time you begin work from a new terminal (in a Windows Command Prompt, leave out source and type activate irds):

source activate irds
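
To confirm from within Python that the irds environment is the active one, you can inspect the CONDA_DEFAULT_ENV environment variable, which conda sets when an environment is activated. This is a minimal sketch; the helper name active_conda_env is illustrative.

```python
from __future__ import print_function

import os


def active_conda_env(environ=os.environ):
    """Return the name of the active conda environment, or None if none is active."""
    # conda exports CONDA_DEFAULT_ENV on activation; it is absent or empty otherwise.
    return environ.get("CONDA_DEFAULT_ENV") or None


print("Active conda environment:", active_conda_env())
```

If this prints None or a name other than irds, activate the environment again in the terminal before starting Jupyter.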
  

Once you have finished installing everything, open a terminal (or Command Prompt in Windows), activate the irds environment, navigate to the lab folder and type:

jupyter notebook
  

In the first cell of a new notebook, enter this Python code and run it:

# urllib.urlretrieve exists in Python 2 only
import urllib
urllib.urlretrieve('http://www.inf.ed.ac.uk/teaching/courses/irds/2016-autumn/labs/Lab3.ipynb', 'Lab3.ipynb')
  


Unless explicitly stated otherwise, all material is copyright © The University of Edinburgh