Note: This page refers to a past version of the course. You can also
consult the current Inf1-DA course web pages.
Lecturer: Alex Simpson (Alex.Simpson @ ed.ac.uk)
Teaching Assistant: Laura Hutchins-Korte (L.Korte @ sms.ed.ac.uk)
The goal of this module is to provide an introduction to collecting,
representing and interpreting data across the range of
Informatics. Students will learn the different perspectives from which
data is used, the different terminology used when referring to it, and
a number of representation and manipulation methods. A small
number of running, illustrative examples will be presented, from the
perspectives of hypothesis testing and query formation and answering.
After completing the course successfully, students should be able to:
- Demonstrate knowledge of the terminology and paradigms used in
different areas of informatics for collecting, representing and
interpreting data, by being able to apply them to sample problems.
- Demonstrate understanding of the different types of data
(structured/unstructured, observational/experimental,
quantitative/qualitative), by being able to identify the correct type
of data for a given application.
- Demonstrate proficiency with the entity/relationship model by being
able to specify appropriate representations and queries for simple
examples.
- Show awareness of the importance of logic for the representation of
data by being able to design a simple logical representation of a given
data set.
- Present data in a variety of forms (textual, graphical, quantitative), across a range of data types.
- Show awareness of the distinction between object data and
meta-data, by being able to apply it to a number of applications across
informatics (e.g., databases, corpora).
- Demonstrate knowledge of the basic algorithms for interpreting and
processing data, by being able to demonstrate how these algorithms work
for simple data sets.
These will be made available as the course progresses.
Slides / Lecture notes
Please display these notes using Acrobat Reader. Other readers might
not work. Printed copies have been handed out in lectures. Spare copies can
be picked up from the shelves outside room 5.03 Appleton Tower.
Structured Data
- Note 1 - The ER data model (note1.pdf)
- Note 2 - The relational model (note2.pdf)
- Note 3 - Relational algebra (note3.pdf)
- Note 4 - Tuple-relational calculus (note4.pdf)
- Note 5 - The SQL query language (note5.pdf) (Correction 3/3/08: bracketed the optional DISTINCT correctly in
the aggregate operators on 5.25.)
Semistructured data
- Note 6 - Semistructured data and XML (note6.pdf)
- Note 7 - Querying XML documents with XQuery (note7.pdf)
- Note 8 - Introduction to corpora (note8.pdf)
- Note 9 - Data acquisition and annotation (note9.pdf)
- Note 10 - Querying a corpus (note10.pdf)
Unstructured data
- Note 11 - Unstructured data and information retrieval (note11.pdf)
- Note 12 - Statistical analysis of data I (note12.pdf)
- Note 13 - Statistical analysis of data II (note13.pdf)
Additional reading
For lecture notes 1-6, the main supplementary text is:
- [DMS]
Database Management Systems
R. Ramakrishnan and J. Gehrke
Third Edition, McGraw-Hill, 2003
Large portions of Chapters 2-4 are available on-line at
Google Books.
Copies of Chapters 4, 5 and a portion of Chapter 7 have been handed out in lectures. Spare copies of the handouts can
be picked up from the shelves outside room 5.03 Appleton Tower.
For lecture note 7, there are useful on-line references:
For lecture notes 8-10, the supplementary text is:
- [CL]
Corpus Linguistics
T. McEnery and A. Wilson
Second Edition, Edinburgh University Press, 2001
Copies of Chapter 2 are available from
the shelves outside room 5.03 Appleton Tower.
Lecture log
- Lecture 1 (10/01/08).
Discussed Inf1A and asked for
feedback. Covered 1.1-1.15 of lecture notes.
- Lecture 2 (14/01/08).
Covered 1.16-2.19 of lecture notes.
(Correction to lecture handout: remove row "char(20)," from 2.19.
Corrected on-line.)
- Lecture 3 (17/01/08).
Covered 2.18(recap)-3.4 of lecture notes.
- Lecture 4 (24/01/08).
Finished note 3 (several corrections to handout, all made on-line).
Handed out Chapter 4 of [DMS] as supplementary reading material.
- Lecture 5 (28/01/08).
Covered note 4. (The query on 4.11 has been corrected on-line to
agree with the tree drawn on the same slide.)
- Lecture 6 (31/01/08).
Covered note 5. Handed out Chapter 5 of [DMS] as supplementary reading material.
- Lecture 7 (07/02/08).
Covered note 6. Handed out Sections 7.4.1-2 of [DMS]
as supplementary reading material.
- Lecture 8 (11/02/08).
Covered note 7.
- Lecture 9 (14/02/08).
Covered note 8. Required reading for next week: Chapter 2 of [CL], from start of chapter to end of Section 2.2.1.
- Lecture 10 (25/02/08).
Covered 9.1-9.19.
- Lecture 11 (28/02/08).
Covered 9.20-9.24, and all of note 10. (Correction to note 9: the closing w tags
were omitted from 9.22. Corrected on-line.)
Coursework assignment available from today. See next section on this webpage.
- Lecture 12 (06/03/08).
Covered 11.1-11.25.
- Lecture 13 (10/03/08).
Covered 11.26-28, and 12.1-17. (Copies of note 12 will be handed out in Thursday's lecture.)
- Lecture 14 (13/03/08).
Covered 12.18-24, 13.1-4 and 13.10-23.
(Corrections to note 13:
Value 0.914 changed to -0.914 at bottom of 13.8. Negation sign moved at bottom of
13.12. Data changed in worked example to correct data from Dickens corpus.
Other minor changes made. All corrected on-line.)
- Lecture 15 (20/03/08).
Finished note 13. Discussed structure of Data & Analysis Exam; see
slides on "Exam Information" below.
End of course
Coursework assignment
Made available from Thursday 28th February:
The hand-in deadline was noon Friday 7th March at the Informatics Teaching Office.
Tutorial and lab exercises
Tutorial exercises
Additional material for drop-in labs
Exam-related material
- Slides with exam information presented in lecture of 20th March
(examinfo.pdf)
- A mock exam in approximately the format of the actual exam
(mock08.pdf)
Old material
For last year's course material see the
2006-7 Data and Analysis webpage.