Note: This page refers to a past version of the course. You can also
consult the current Inf1-DA course web page.
Alex.Simpson @ ed.ac.uk
Areti Manataki: A.Manataki @ sms.ed.ac.uk
The goal of this module is to provide an introduction to collecting,
representing and interpreting data across the range of
Informatics. Students will learn the different perspectives from which
data are used, the different terminology used when referring to them and
a number of representation and manipulation methods. A small
number of running, illustrative examples will be presented, from the
perspectives of hypothesis testing and query formation and answering.
After completing the course successfully, students should be able to:
- Demonstrate knowledge of the terminology and paradigms used in
different areas of informatics for collecting, representing and
interpreting data, by being able to apply them to sample problems.
- Demonstrate understanding of the different types of data
(quantitative/qualitative), by being able to identify the correct type
of data for a given application.
- Demonstrate proficiency of the entity/relationship model by being
able to specify appropriate representations and queries for simple
- Show awareness of the importance of logic for the representation of
data by being able to design simple logical representation of a given
- Present data in a variety of forms (textual, graphical, quantitative), across a range of data types.
- Show awareness of the distinction between object data and
meta-data, by being able to apply it to a number of applications across
informatics (e.g., databases, corpora).
- Demonstrate knowledge of the basic algorithms for interpreting and
processing data, by being able to demonstrate how these algorithms work
for simple data sets.
You must read the Informatics 1 Semester 2 Course Guide 2009-10.
These will be made available as the course progresses.
Slides / Lecture notes
Please display these notes using Acrobat Reader. Other readers might
not work. Printed copies have been handed out in lectures. Spare copies can
be picked up from the shelves outside room 5.03 Appleton Tower.
- Introductory lecture: Overview and logistics
- Part I: Structured Data (pdf)
- Part II: Semistructured Data (pdf)
- Part III: Unstructured Data (pdf)
- Revision lecture: Answering the why-question
Corrections to slides
The following is a list of corrections to the printed slides
distributed in lectures. The on-line slides have been corrected.
Slide III: 47. Corrected mu to m (twice),
since the variance and standard deviation are estimated
using the estimated mean, not the population mean
(which is not a value we have available).
Slide III: 62. Replaced second occurrence of 0.875 with 0.834.
Slide III: 64. Replaced second occurrence of 8.288 with
Slide III: 82. Corrected "Part III" to "Part II"
Slide III: 84. Corrected "III.3" to "II.5"
Slides III: 87-88. Rewritten.
Revision Slide: 23. Removed the "not examinable" comment at Corpora,
since the material on corpora is examinable, but CQP is not.
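The correction to Slide III: 47 above can be illustrated with a minimal Python sketch. It estimates the variance and standard deviation from the estimated mean m, since the population mean mu is not available. The function names and the sample data are invented for this illustration, and the n-1 denominator is an assumption about the estimator used on the slide.

```python
import math

def sample_mean(xs):
    """Estimated mean m of the sample."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Variance estimated around the estimated mean m (not the population mean mu).

    Assumption: the n-1 denominator is used for the estimate.
    """
    m = sample_mean(xs)
    n = len(xs)
    return sum((x - m) ** 2 for x in xs) / (n - 1)

def sample_std(xs):
    """Standard deviation estimated as the square root of the estimated variance."""
    return math.sqrt(sample_variance(xs))

# Hypothetical data set, chosen only so the arithmetic is easy to check by hand.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
m = sample_mean(data)          # 40 / 8 = 5.0
var = sample_variance(data)    # squared deviations sum to 32, divided by 7
std = sample_std(data)
```

For this data the estimated mean is 5.0 and the squared deviations from it sum to 32, so the estimated variance is 32/7.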
For Part I, the main supplementary text is:
Database Management Systems
R. Ramakrishnan and J. Gehrke
Third Edition, McGraw-Hill, 2003
Chapter 2 and much of Chapter 3 are available on-line at
For Part II, the supplementary texts on XML are [DMS] and:
An Introduction to XML and Web Technologies
A. Møller and M. Schwartzbach
Addison Wesley, 2006
XPath tutorial: http://www.w3schools.com/xpath/
The supplementary text on Corpora is:
T. McEnery and A. Wilson
Second edition, Edinburgh University Press, 2001
Chapter 2: What is a corpus and what is in it?
For Part III (lecture III.3 and Tutorial 8):
Table of critical values for Pearson correlation
Lectures are held Tuesdays 11.10-12 and Fridays 2-2.50 in AT LT5, from
Tuesday 12th January 2010 to Friday 19th March 2010.
(In the lecture slot of Tuesday 26th January there is a special
careers lecture in place of the regular DA lecture.)
- Lecture 1 (12/01/10).
Covered introductory lecture. (Also collected data for use later in course.)
- Lecture on 15/01/10 cancelled.
- Lecture 2 (19/01/10).
Covered slides I:1-19.
Also introduced ER notation for
key constraints (non-bold arrows) and
participation constraints (bold lines).
- Lecture 3 (22/01/10).
Covered slides I:20-34. Also emphasised that in foreign key constraints
the field names do not need to coincide in the
referenced and referencing tables.
- Special careers lecture on 26/01/10.
- Lecture 4 (29/01/10).
Covered slides I:35-48. Also discussed the "cascade", "no action" and
"set null" options of the "on delete" clause.
- Lecture 5 (02/02/10).
Covered slides I:49-67. Also discussed avoiding naming conflicts
by disambiguation (via prefix names, and via positional names) as
well as by renaming.
- Lecture 6 (05/02/10).
Covered slides I:68-85.
- Lecture 7 (09/02/10).
Covered slides I:84-107.
- Lecture 8 (12/02/10).
Covered slides I:106-II:12.
- Lecture 9 (16/02/10).
Covered slides II:13-33.
- Lecture 10 (19/02/10).
Covered slides II:34-62.
- Lecture 11 (23/02/10).
Covered slides II:60-85.
- Lecture 12 (26/02/10).
Covered slides II:86-110.
- Lecture 13 (02/03/10).
Covered slides II:109-III:13.
(Also mentioned that there will be a staff-student
liaison meeting on Wednesday 3rd March.)
- Lecture 14 (05/03/10).
Covered slides III:14-34. (Also mentioned release of DA coursework assignment.)
- Lecture 15 (09/03/10).
Covered slides III:35-49.
- Lecture 16 (12/03/10).
Covered slides III:50-66.
- Lecture 17 (16/03/10).
Covered slides III:67-88. End of course material!
- Lecture 18 (19/03/10).
Revision Lecture by Areti Manataki. Presented information about exam
(slides under "Exam-related Material" below.)
Also gave survey of course material (slides under
"Slides/Lecture Notes" above.)
- End of lectures!
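The foreign key points noted under Lectures 3 and 4 above can be tried out with a small sketch using Python's built-in sqlite3 module. The tables Student and Takes, their columns and the sample rows are hypothetical, chosen only for illustration. The referencing field (student) deliberately has a different name from the referenced field (matric). The behaviour shown is SQLite's, which may differ in detail from other database systems.

```python
import sqlite3

def try_delete(on_delete_action):
    """Delete a referenced Student row under the given ON DELETE option.

    Returns (deleted, takes_rows): whether the delete was accepted, and the
    rows left in the referencing table afterwards.
    """
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE Student (matric TEXT PRIMARY KEY, name TEXT)")
    # The option is interpolated into the SQL only because this is a demo
    # with fixed, trusted inputs.
    conn.execute(
        "CREATE TABLE Takes ("
        " student TEXT,"
        " course TEXT,"
        " FOREIGN KEY (student) REFERENCES Student(matric)"
        f" ON DELETE {on_delete_action})"
    )
    conn.execute("INSERT INTO Student VALUES ('s001', 'Ann')")
    conn.execute("INSERT INTO Takes VALUES ('s001', 'inf1-da')")
    try:
        conn.execute("DELETE FROM Student WHERE matric = 's001'")
        deleted = True
    except sqlite3.IntegrityError:
        # The delete would leave a dangling reference, so it is rejected.
        deleted = False
    rows = conn.execute("SELECT student, course FROM Takes").fetchall()
    conn.close()
    return deleted, rows

results = {action: try_delete(action)
           for action in ("CASCADE", "SET NULL", "NO ACTION")}
```

With "cascade" the referencing row is deleted as well, with "set null" it is kept with its student field set to null, and with "no action" the delete itself is rejected because a referencing row still exists.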
Videos of the lectures given so far are available
(Sorry about the absence of sound on some videos.
Apparently, I forgot to turn the microphone on.)
- 2010 Data & Analysis coursework assignment
This is the May 2009 Data & Analysis Exam.
The hand-in deadline was noon Friday 12th March at the
Informatics Teaching Office.
Tutorial exercises will appear here.
- Tutorial exercise 1 (for Week 3 tutorials):
- Tutorial exercise 2 (for Week 4 tutorials):
- Tutorial exercise 3 (for Week 5 tutorials):
- Tutorial exercise 4 (for Week 6 tutorials):
- Tutorial exercise 5 (for Week 7 tutorials):
(pdf, restaurants.xml, restaurants.xq)
- Tutorial exercise 6 (for Week 8 tutorials):
- Tutorial exercise 7 (for Week 9 tutorials):
- Tutorial exercise 8 (for Week 10 tutorials):
- Slides with exam information presented in lecture of 19th March
- The 2008 mock exam (pdf)
For last year's course material see the
2008-9 Data and Analysis