Note: This page refers to a past version of the course. You can also
consult the current Inf1-DA course web page.
Alex.Simpson @ ed.ac.uk
s0450680 @ sms.ed.ac.uk
The goal of this module is to provide an introduction to collecting,
representing and interpreting data across the range of
Informatics. Students will learn the different perspectives from which
data is used, the different terminology used when referring to it, and
a number of representation and manipulation methods. A small
number of running, illustrative examples will be presented, from the
perspectives of hypothesis testing and query formation and answering.
After completing the course successfully, students should be able to:
- Demonstrate knowledge of the terminology and paradigms used in
different areas of informatics for collecting, representing and
interpreting data, by being able to apply them to sample problems.
- Demonstrate understanding of the different types of data
(quantitative/qualitative), by being able to identify the correct type
of data for a given application.
- Demonstrate proficiency in the entity/relationship model by being
able to specify appropriate representations and queries for simple
- Show awareness of the importance of logic for the representation of
data by being able to design simple logical representation of a given
- Present data in a variety of forms (textual, graphical, quantitative), across a range of data types.
- Show awareness of the distinction between object data and
meta-data, by being able to apply it to a number of applications across
informatics (e.g., databases, corpora).
- Demonstrate knowledge of the basic algorithms for interpreting and
processing data, by being able to demonstrate how these algorithms work
for simple data sets.
These will be made available as the course progresses.
Slides / Lecture notes
Please display these notes using Acrobat Reader. Other readers might
not work. Printed copies have been handed out in lectures. Spare copies can
be picked up from the shelves outside room 5.03 Appleton Tower.
- Introductory lecture: Overview and logistics (pdf)
- Part I: Structured Data (pdf)
- Part II: Semistructured Data (pdf)
- Part III: Corpora (pdf)
- Part IV: Data Retrieval (pdf)
- Part V: Statistical Analysis of Data
Corrections to slides
The following is a list of corrections to the printed slides
distributed in lectures. The on-line slides have been corrected.
Slide I: 15. Corrected definition of key constraint to
allow "at most one relationship instance".
Slide I: 64. Added mention of naming conflict.
Slide I: 74. Inserted "predicate" in "first-order logic".
Slide I: 90. The most recent SQL standard is 2008.
Slide II: 26. Changed (CDATA) to CDATA (the
brackets are erroneous).
Slide II: 35. Changed (CDATA) to CDATA
Slide II: 61. Inserted child::
Slide II: 63. Inserted child::
Slide III: 35. Changed "explicit in the corpus" to
"explicit in the data itself"
Slide III: 35. Changed "labels for labels for" to "labels for"
Slide IV: 10. Changed "ducuments" to "documents"
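As a rough illustration of the corrected wording for slide I:15, the sketch below checks a key constraint over a set of relationship instances: each entity on the constrained side may appear in at most one instance (appearing in none is allowed). The function name, the tuple layout and the sample data are assumptions made for this example and are not taken from the slides.

```python
from collections import Counter


def key_constraint_violations(instances, position=0):
    """Return the entities that occur in more than one relationship instance.

    `instances` is an iterable of tuples, one per relationship instance.
    `position` selects the entity set that carries the key constraint.
    An empty result means the constraint holds.
    """
    counts = Counter(instance[position] for instance in instances)
    return sorted(entity for entity, n in counts.items() if n > 1)


# Hypothetical "manages" relationship between employees and departments,
# with a key constraint on the department (position 1).
manages = [("ann", "sales"), ("bob", "hr"), ("cat", "sales")]

violations = key_constraint_violations(manages, position=1)
print(violations)
```

Here the department "sales" takes part in two instances, so it is reported as violating the constraint, while no employee does.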
For Part I, the main supplementary text is:
Database Management Systems
R. Ramakrishnan and J. Gehrke
Third Edition, McGraw-Hill, 2003
Chapter 2 and much of Chapter 3 are available on-line at
For Part II, the supplementary texts are [DMS] and:
An Introduction to XML and Web Technologies
A. Møller and M. Schwartzbach
Addison Wesley, 2006
XPath tutorial: http://www.w3schools.com/xpath/
For Part III, the supplementary text is:
Corpus Linguistics
T. McEnery and A. Wilson
Second edition, Edinburgh University Press, 2001
Chapter 2: What is a corpus and what is in it?
- Lecture 1 (12/01/09).
Handed out course guide. Covered introductory lecture.
- Lecture 2 (15/01/09).
Covered slides I:1-19.
Also introduced ER notation for
key constraints (non-bold arrows) and
participation constraints (bold lines).
- Lecture 3 (19/01/09).
Covered slides I:20-29. (Also discussed ACM Turing Award - non-examinable.)
- Lecture 4 (26/01/09).
Covered slides I:29-44. Also emphasised that in foreign key constraints
the field names do not need to coincide in the
referenced and referencing tables.
- Lecture 5 (29/01/09).
Covered slides I:44-63. Also discussed the "cascade", "no action" and
"set null" options of the "on delete" clause.
- Lecture 6 (02/02/09).
Covered slides I:64-79. Also discussed avoiding naming conflicts
by disambiguation (via prefix names, and via positional names) as
well as by renaming.
- Lecture 7 (05/02/09).
Covered slides I:80-90.
- Lecture 8 (09/02/09).
Covered slides I:91-114.
- Lecture 9 (12/02/09).
Covered slides II:1-19.
- Lecture 10 (17/02/09).
Covered slides II:20-II:36.
- Lecture 11 (19/02/09).
Covered slides II:37-II:64.
- Lecture 12 (23/02/09).
Covered slides II:62-III:24.
- Lecture 13 (26/02/09).
Covered slides III:25-41. (Also demonstrated British National Corpus.)
- Lecture 14 (02/03/09).
Covered slides III:41-62.
- Lecture 15 (05/03/09).
Covered Part IV. (Also mentioned release of coursework assignment,
and that there will be a staff-student
liaison meeting on Friday 13th March.)
- Lecture 16 (09/03/09).
Covered slides V:1-23.
- Lecture 17 (12/03/09).
Covered slides V:24-40. (Also presented example analysis of real data.)
- Lecture 18 (16/03/09).
Covered slides V:41-61. (Also collected data for Tutorials 8 and 9.)
- Lecture 19 (19/03/09).
Revision Lecture. Presented information about exam.
(The slides are on-line under exam-related material below.)
- End of lectures!
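The foreign key points noted for Lectures 4 and 5 can be tried out with a small sketch like the following, which uses Python's built-in sqlite3 module. The table and column names are invented for illustration and are not taken from the slides. SQLite is only one possible system, and it enforces foreign keys only after `PRAGMA foreign_keys = ON` has been issued.

```python
import sqlite3

# Hypothetical schema, made up for this example.
SCHEMA = """
CREATE TABLE students (matric TEXT PRIMARY KEY, name TEXT);
CREATE TABLE courses  (code TEXT PRIMARY KEY, title TEXT);

-- The referencing fields (student, course) are named differently from
-- the fields they reference (matric, code).
CREATE TABLE takes (
    student TEXT REFERENCES students(matric) ON DELETE CASCADE,
    course  TEXT REFERENCES courses(code)    ON DELETE SET NULL
);
CREATE TABLE marks (
    student TEXT REFERENCES students(matric) ON DELETE NO ACTION,
    mark    INTEGER
);

INSERT INTO students VALUES ('s1', 'Ann'), ('s2', 'Bob');
INSERT INTO courses  VALUES ('da', 'Course A'), ('op', 'Course B');
INSERT INTO takes    VALUES ('s1', 'da'), ('s2', 'da'), ('s2', 'op');
INSERT INTO marks    VALUES ('s2', 70);
"""


def run_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)

    # CASCADE: deleting student s1 also deletes s1's rows in takes.
    conn.execute("DELETE FROM students WHERE matric = 's1'")

    # SET NULL: deleting course 'op' sets takes.course to NULL where it was 'op'.
    conn.execute("DELETE FROM courses WHERE code = 'op'")

    # NO ACTION: deleting student s2 is rejected, because marks still refers to s2.
    try:
        conn.execute("DELETE FROM students WHERE matric = 's2'")
        blocked = False
    except sqlite3.IntegrityError:
        blocked = True

    rows = conn.execute(
        "SELECT student, course FROM takes ORDER BY course IS NULL, course"
    ).fetchall()
    conn.close()
    return rows, blocked


if __name__ == "__main__":
    print(run_demo())
```

Calling `run_demo()` returns the remaining rows of `takes` together with a flag saying whether the last deletion was rejected.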
- 2009 Data & Analysis coursework assignment
The hand-in deadline was noon Friday 13th March
Informatics Teaching Office
- Tutorial exercise 1 (for Week 3 tutorials):
- Tutorial exercise 2 (for Week 4 tutorials):
- Tutorial exercise 3 (for Week 5 tutorials):
- Tutorial exercise 4 (for Week 6 tutorials):
- Tutorial exercise 5 (for Week 7 tutorials):
(pdf, restaurants.xml, restaurants.xq)
- Tutorial exercise 6 (for Week 8 tutorials):
- Tutorial exercise 8 (for Week 10 tutorials):
- Tutorial exercise 9 (for Week 11 tutorials):
- Slides with exam information presented in lecture of 19th March
- The 2008 mock exam
For last year's course material see the
2007-8 Data and Analysis