ANLP 2016


Lecture 28: Discourse, coherence, cohesion

Henry S. Thompson
With input from Johanna Moore and Bonnie Webber
21 November 2016
Creative Commons Attribution-ShareAlike

1. "If we do not hang together

then surely we must hang separately" (Benjamin Franklin)

Not just any collection of sentences makes a discourse.

The difference?

Cohesion
The (linguistic) clues that sentences belong to the same discourse
Coherence
The underlying (semantic) way in which it makes sense that they belong together

2. Linking together

Cohesive discourse often uses lexical chains

Longer texts usually contain several discourse segments

Intuition: When the topic shifts, different words will be used

But, the presence of cohesion does not guarantee coherence

  • John found some firm ripe apples and dropped them in a wooden bucket filled with water
  • Newton is said to have discovered gravity when hit on the head by an apple that dropped from a tree.

There are four lexical chains in the above mini-discourse, marked in red in the original slides; two of them (apples/apple and dropped/dropped) are visible as straightforward repetitions
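A rough sketch, not from the lecture, of how chain candidates might be found automatically, treating content-word stems that recur across sentences as chain links; the tiny stop-word list and the crude suffix-stripping 'stemmer' are stand-in assumptions:

```python
# Find lexical-chain candidates as content-word stems that recur
# across sentences. STOP and stem() are crude stand-ins.
import re
from collections import defaultdict

STOP = {"a", "an", "the", "in", "with", "and", "of", "to", "is",
        "have", "that", "when", "from", "said", "by", "on", "some"}

def stem(word):
    # Crude stand-in for a real stemmer (e.g. Porter)
    for suffix in ("s", "ed", "ing"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def chain_candidates(sentences):
    """Map each stem to the sentences it occurs in; stems seen in
    more than one sentence are lexical-chain candidates."""
    occurrences = defaultdict(set)
    for i, sentence in enumerate(sentences):
        for token in re.findall(r"[a-z]+", sentence.lower()):
            if token not in STOP:
                occurrences[stem(token)].add(i)
    return {w: sorted(ix) for w, ix in occurrences.items()
            if len(ix) > 1}

sentences = [
    "John found some firm ripe apples and dropped them in a "
    "wooden bucket filled with water",
    "Newton is said to have discovered gravity when hit on the "
    "head by an apple that dropped from a tree.",
]
print(chain_candidates(sentences))
# {'apple': [0, 1], 'dropp': [0, 1]}
```

Repetition alone recovers only some of the chains; links such as found/discovered require lexical relations, which is why Morris and Hirst's original lexical-chain work used a thesaurus.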

3. Automatically identifying sub-topics/segmenting discourse

Discourse-level NLP can sometimes profit from working with coherent sub-discourses

There are several alternative approaches available, including unsupervised lexical-cohesion methods (such as TextTiling, below) and supervised machine-learning ones

Useful, for example, for information retrieval and summarisation, where returning or condensing just the relevant sub-discourse helps

4. Finding discontinuities: TextTiling

An unsupervised approach based on lexical chains

Originally developed and tested using a corpus of scientific papers

Three steps:

  1. Preprocess: tokenise, filter and partition
  2. Score: pairwise cohesion
  3. Locate: threshold discontinuities

5. TextTiling: Preprocessing

In order to focus on what is assumed to matter

Moderately aggressive preprocessing is done (sketched in code below):

  • lower-case and tokenise the text
  • filter out stop words and reduce the remaining words to their roots
  • partition the resulting token stream into fixed-length pseudo-sentences (Hearst used a length of 20 tokens)
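A minimal sketch of this preprocessing in Python; the stop-word list is a tiny stand-in and stemming is omitted, while the pseudo-sentence length of 20 tokens follows Hearst's setting:

```python
# TextTiling preprocessing sketch: lower-case, tokenise, filter,
# and partition into fixed-length pseudo-sentences.
import re

STOP = {"the", "a", "an", "and", "or", "of", "to", "in", "is",
        "that", "it", "on", "for", "with", "as", "was", "by"}

def preprocess(text, pseudo_len=20):
    """Lower-case and tokenise, drop stop words, then partition the
    surviving tokens into fixed-length pseudo-sentences."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower())
              if t not in STOP]
    return [tokens[i:i + pseudo_len]
            for i in range(0, len(tokens), pseudo_len)]
```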

6. TextTiling: Scoring

Compute a score for the gap between each adjacent pair of token sequences, as follows

  1. Merge each block of k pseudo-sentences on either side of the gap into a bag of words
    • That is, a vector of counts
    • With one position for every 'word' in the whole text
    • Hearst used k=6
  2. Compute the normalised dot product of the two vectors
    • The cosine similarity
  3. Smooth the resulting score sequence by averaging the scores in a window of width w
    • Hearst used w=3
    • That is, for a gap score $y_i$, Hearst used the smoothed score $y'_i = (y_{i-1} + y_i + y_{i+1})/3$
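A sketch of the scoring step, continuing the preprocessing sketch above; bag-of-words vectors are represented as Counters, and the defaults k = 6 and w = 3 follow Hearst's settings:

```python
# Gap scoring for TextTiling: cosine similarity between the blocks
# on either side of each gap, then smoothing over a window of w.
import math
from collections import Counter

def cosine(u, v):
    """Normalised dot product of two count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = (math.sqrt(sum(c * c for c in u.values())) *
            math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0

def gap_scores(pseudo_sents, k=6):
    """Cosine similarity across each gap between pseudo-sentences,
    using blocks of up to k pseudo-sentences on either side."""
    scores = []
    for gap in range(1, len(pseudo_sents)):
        left = Counter(w for ps in pseudo_sents[max(0, gap - k):gap]
                       for w in ps)
        right = Counter(w for ps in pseudo_sents[gap:gap + k]
                        for w in ps)
        scores.append(cosine(left, right))
    return scores

def smooth(scores, w=3):
    """Replace each score by its average with its neighbours in a
    window of width w (truncated at the ends)."""
    half = w // 2
    return [sum(scores[max(0, i - half):i + half + 1]) /
            len(scores[max(0, i - half):i + half + 1])
            for i in range(len(scores))]
```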

7. TextTiling: Locate

We're looking for discontinuities

That is, something like this:

[Figure: fragment of the score graph, showing a valley around $y_i$]

The depth score at each gap $i$ is then given by $s_i = (y_{i-1} - y_i) + (y_{i+1} - y_i)$

Larger depth scores correspond to deeper 'valleys'

Scores larger than some threshold are taken to mark topic boundaries, where the threshold is derived from the mean $\bar{s}$ and standard deviation $\sigma$ of the depth scores:

Liberal
$\bar{s} - \sigma$
Conservative
$\bar{s} - \sigma/2$
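A sketch of the locating step under the same assumptions, using the liberal cutoff by default:

```python
# Boundary location for TextTiling: depth scores at each interior
# gap, then a cutoff computed from their mean and standard
# deviation (liberal: mean - sd; conservative: mean - sd/2).
import statistics

def depth_score_list(y):
    """s_i = (y[i-1] - y[i]) + (y[i+1] - y[i]) for interior gaps."""
    return [(y[i - 1] - y[i]) + (y[i + 1] - y[i])
            for i in range(1, len(y) - 1)]

def boundaries(y, conservative=False):
    s = depth_score_list(y)
    sd = statistics.stdev(s)
    cutoff = statistics.mean(s) - (sd / 2 if conservative else sd)
    # i + 1 restores the offset lost by skipping the first gap
    return [i + 1 for i, d in enumerate(s) if d > cutoff]
```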

8. Evaluating segmentation

How well does TextTiling work?

Treating this as a two-way forced-choice classification task, and scoring every gap as correctly or incorrectly classified, doesn't work: boundaries are rare, so even a baseline that always guesses "no boundary" scores deceptively well

But counting just correctly labelled boundary gaps seems too strict: a hypothesised boundary one gap away from a true one gets no more credit than one that is nowhere near

9. Evaluation, cont'd

The WindowDiff metric, which penalises only disagreements (incorrect boundary counts) within a sliding window, attempts to address both problems

Specifically, to compare boundaries in a gold standard reference (Ref) with those in a hypothesis (Hyp):

We slide a window of size k over Hyp and Ref, counting the boundaries each contains

Then we compare the two boundary counts at each possible window position, scoring 1 whenever they differ

Sum over all possible window positions, and normalise by the number of such positions: $\frac{1}{N-k}\sum_{i=1}^{N-k}\left(|r_i - h_i| > 0\right)$, where $r_i$ and $h_i$ are the boundary counts within the window at position $i$
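A minimal sketch of WindowDiff, assuming each segmentation is represented as a 0/1 label per gap (1 = boundary):

```python
def window_diff(ref, hyp, k):
    """Fraction of window placements where the reference and the
    hypothesis disagree on the number of boundaries covered."""
    assert len(ref) == len(hyp)
    positions = len(ref) - k + 1    # number of window placements
    errors = sum(sum(ref[i:i + k]) != sum(hyp[i:i + k])
                 for i in range(positions))
    return errors / positions
```

With 22 gap labels and k = 4 this gives 19 window placements, which matches the 23 − 4 denominator in the example in the next section.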

10. Evaluation example

An example from J&M, with N = 23 and k = 4

The block of wood always guessing "no" would score $8/(23-4) = 0.42$

Note that this approach to evaluation is appropriate for any segmentation task where the ratio of candidate segmentation points to actual segments is high

11. Machine learning?

More recently, (semi-)supervised machine learning approaches to uncovering topic structure have been explored

Over-simplifying, you can think of the problem as similar to POS-tagging

So you can even use Hidden Markov Models to learn and label: hidden states correspond to topics, and the words themselves are the observations (see the toy sketch below)

But now the emission distribution governs the whole space of (substantive) lexical choice within a topic
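A toy sketch of that idea; the two topics, their unigram emission probabilities, and the transition probabilities are all invented for illustration. Viterbi decoding labels every word with a topic, so label changes mark hypothesised segment boundaries:

```python
# HMM view of topic segmentation: states are topics, each emitting
# words from its own unigram distribution; a high self-transition
# probability makes topics persist.
import math

TOPICS = ["finance", "sport"]
EMIT = {  # per-topic unigram models; unseen words get a small floor
    "finance": {"bank": 0.3, "shares": 0.3, "market": 0.3},
    "sport":   {"match": 0.3, "goal": 0.3, "team": 0.3},
}
FLOOR = 0.001   # emission probability for out-of-vocabulary words
STAY = 0.9      # P(same topic at the next word)

def trans(s, t):
    # Transition probability from topic s to topic t
    return STAY if s == t else (1 - STAY) / (len(TOPICS) - 1)

def viterbi(words):
    """Most likely topic label for each word."""
    logp = {t: math.log(1 / len(TOPICS)) +
               math.log(EMIT[t].get(words[0], FLOOR)) for t in TOPICS}
    backptrs = []
    for w in words[1:]:
        new_logp, ptrs = {}, {}
        for t in TOPICS:
            prev = max(TOPICS,
                       key=lambda s: logp[s] + math.log(trans(s, t)))
            new_logp[t] = (logp[prev] + math.log(trans(prev, t))
                           + math.log(EMIT[t].get(w, FLOOR)))
            ptrs[t] = prev
        backptrs.append(ptrs)
        logp = new_logp
    state = max(TOPICS, key=logp.get)   # best final state
    labels = [state]
    for ptrs in reversed(backptrs):     # trace the best path back
        state = ptrs[state]
        labels.append(state)
    return labels[::-1]

print(viterbi("bank shares market goal team match".split()))
# ['finance', 'finance', 'finance', 'sport', 'sport', 'sport']
```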

See Purver, M. 2011, "Topic Segmentation", in Tur, G. and de Mori, R. (eds.), Spoken Language Understanding, for a more detailed introduction

12. Topic is not the only dimension of discourse change

Topic/sub-topic is not the only structuring principle we find in discourse

Some common patterns, by genre

Expository
Topic/sub-topic
Task-oriented
Function/precondition
Narrative
Cause/effect, sequence/sub-sequence, state/event

But note that some of this is not necessarily universal

Cohesion sometimes manifests itself differently for different genres

13. Functional Segmentation

Texts within a given genre generally share a similar structure, independent of topic

That is, their structure reflects the conventional function of the genre, rather than the content of the particular text

14. Example: news stories

The conventional structure is so 'obvious' that you hardly notice it

Information is presented in decreasing order of importance: the journalist's 'inverted pyramid', with the central facts first and progressively more dispensable detail below

15. Example: Scientific journal papers

Individual disciplines typically report on experiments in highly conventionalised ways

Front matter
Title, Abstract
Body
    • Introduction (or Objective), including background
    • Methods
    • Results
    • Discussion
    • (or, mnemonically, IMRAD: Introduction, Methods, Results, And Discussion)
Back matter
Acknowledgements, References

The major divisions (IMRAD) will usually be typographically distinct and explicitly labelled

16. Richer structure

Discourse structure is not (always) just ODTAA ('one damn thing after another')

Sometimes detecting this structure really matters

  • Welcome to word processing_i
    • That_i's using a computer to type letters and reports
    • Make a typo_j?
      • No problem
      • Just back up, type over the mistake_j, and it_j's gone
      • *And, it_j eliminates retyping
    • And, it_i eliminates retyping

The final it can felicitously refer only to word processing (index i), not to the more recent typo (index j): the starred variant, uttered inside the embedded segment, is odd precisely because there it would be read as it_j. Getting such references right requires recognising the nested structure.