Sentence Augmentation: A Text-to-Text Generation Component for Summarisation
Supervisors: Dr. Robert Dale, Dr. Mark Dras, Dr. Cécile Paris
This thesis was
presented for the degree of
Doctor of Philosophy
at the
Centre for Language Technology
Department of Computing
Faculty of Science
Macquarie University
NSW 2109 Australia
Submitted December 2009; Completed June 2010
Abstract:
An examination of a corpus of manually authored executive summaries suggests a predominant strategy that human writers appear to adopt: key sentences, which form the core of the summary, are fleshed out, or augmented, using information from additional sentences in the document, referred to here as auxiliary sentences. In this thesis, we focus on developing methods that will enable a computational account of this strategy, which we describe as the Sentence Augmentation process.
We model sentence augmentation as a text-to-text generation process in which a novel sentence is produced as a result of re-organising content from a key sentence in conjunction with information from auxiliary sentences. As in related work on text-to-text revision, we characterise sentence augmentation as a Noisy Channel problem.
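In its generic form, the noisy channel view can be written as below. This is a sketch in notation assumed here, not taken from the thesis: $S$ stands for a candidate augmented sentence and $D$ for the observed input, that is, the key sentence together with its auxiliary sentences.

```latex
\hat{S} = \arg\max_{S} P(S \mid D) = \arg\max_{S} P(S)\, P(D \mid S)
```

Under this reading, $P(S)$ acts as a language model that rewards well-formed sentences, while $P(D \mid S)$ scores how well the candidate accounts for the source content.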
In particular, we concentrate on two key facets of the process for which no suitable account yet exists in the literature: first, auxiliary content must be selected to be added into the sentences being generated; and second, the key and auxiliary content must be organised such that the result is a grammatical sentence. Our investigation of these two facets leads to the following three findings:
1. A model of content selection: Information from within auxiliary sentences can be automatically chosen to support key information using schema-like patterns, represented as a statistical model that captures the prototypical juxtaposition of words. We show that the automatically derived schema-based model better predicts content selection compared with baseline vector space approaches using term frequency weights.
2. A representation for content re-organisation: Summary sentences generated using representations of dependency structures better reflect the content of the input text and are more grammatical, compared to sentences generated using just representations of the Markov context. Dependency structure thus provides a suitable representation for re-organising selected content in language modelling tasks.
3. An account of grammaticality: Spanning tree algorithms can be combined with statistical dependency models to induce an ordering of selected content, allowing issues of grammaticality in English to be handled in a statistical text-to-text generation process. The spanning tree approach, which provides a global sentence-level representation of linguistic validity, is able to generate more grammatical sentences than n-gram models.
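The spanning tree idea in the third finding can be illustrated with a minimal sketch. It is not the thesis's algorithm: it is a Prim-style greedy procedure that grows a dependency tree from a chosen root word, always attaching the highest-scoring (head, dependent) pair available. The function name, the example words, and the scores are all hypothetical, and greedy growth of this kind is not guaranteed to find the optimal tree over directed edges.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (head, dependent)


def greedy_dependency_tree(
    words: List[str], root: str, score: Dict[Edge, float]
) -> List[Edge]:
    """Grow a tree from `root`, Prim-style, using head->dependent scores.

    `score` stands in for a statistical dependency model (hypothetical
    values); pairs missing from it are treated as impossible attachments.
    """
    in_tree = {root}
    edges: List[Edge] = []
    while len(in_tree) < len(words):
        best = None  # (score, head, dependent)
        for head in words:
            if head not in in_tree:
                continue
            for dep in words:
                if dep in in_tree or (head, dep) not in score:
                    continue
                s = score[(head, dep)]
                if best is None or s > best[0]:
                    best = (s, head, dep)
        if best is None:
            raise ValueError("some words cannot be attached to the tree")
        _, head, dep = best
        edges.append((head, dep))
        in_tree.add(dep)
    return edges


# Hypothetical bag of selected words and made-up dependency scores.
WORDS = ["announced", "company", "profits", "the", "record"]
SCORES = {
    ("announced", "company"): 0.9,
    ("announced", "profits"): 0.8,
    ("company", "the"): 0.7,
    ("profits", "record"): 0.6,
    ("profits", "the"): 0.3,
    ("company", "profits"): 0.2,
    ("announced", "the"): 0.1,
    ("announced", "record"): 0.05,
}

if __name__ == "__main__":
    for head, dep in greedy_dependency_tree(WORDS, "announced", SCORES):
        print(f"{head} -> {dep}")
```

The sketch only recovers a tree over the selected words. Turning that tree into a word order, and checking that each head's arguments are satisfied, are further steps it does not attempt.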
The thesis describes our corpus-based investigation of the sentence augmentation process. Our development and evaluation of models for the core facets of sentence augmentation provide the missing pieces for the automation of the process as a whole. In doing so, we contribute to our general understanding of text-to-text generation approaches.
Thesis-related Published Papers:
Stephen Wan, Robert Dale, Mark Dras, and Cécile Paris. (2003)
Straight to the Point: Discovering Themes for Summary Generation. In the
Proceedings of the Australian Workshop on Natural Language Processing,
pages 122-129. Melbourne, Australia.
Stephen Wan, Mark Dras, Robert Dale, and Cécile
Paris. (2003) Using Thematic
Information in Statistical Headline Generation. In the Proceedings of
the Workshop on Multilingual Summarization and Question Answering at ACL
2003, pages 11-20. Sapporo, Japan.
Stephen Wan, Robert Dale, Mark Dras, and Cécile Paris. (2005)
Searching for Grammaticality and Consistency: Propagating Dependencies in
the Viterbi Algorithm. In the Proceedings of the European Workshop on
Natural Language Generation 2005, pages 211-216. Aberdeen, Scotland.
Stephen Wan, Robert Dale, Mark Dras, and Cécile Paris. (2005)
Statistically Generated Summary Sentences: A Preliminary Evaluation of
Verisimilitude using Precision of Dependency Relations. In the
Proceedings of the Workshop on Using Corpora for Natural Language Generation
(UCNLG'05) at Corpus Linguistics 2005. Birmingham, UK.
Stephen Wan, Mark Dras, Robert Dale, and Cécile Paris.
(2005) Towards Statistical Paraphrase
Generation: Preliminary Evaluations of Grammaticality. In the
Proceedings of The 3rd International Workshop on Paraphrasing (IWP2005),
pages 88-95. Jeju Island, South Korea.
Stephen Wan, Mark Dras, Robert Dale, and Cécile Paris.
(2006) Using
Dependency-based Features to take the Para-farce out of Paraphrase. In
the Proceedings of the Australasian Language Technology Workshop 2006,
pages 131-138. Sydney, Australia.
Stephen Wan, Robert Dale, Mark Dras, and Cécile Paris.
(2007)
Global Revision in Summarisation: Generating Novel Sentences with Prim’s
Algorithm. In the Proceedings of PACLING 2007 - 10th Conference of
the Pacific Association for Computational Linguistics, pages 852-860.
Melbourne, Australia.
Stephen Wan, Robert Dale, Mark Dras, and Cécile Paris. (2008)
Seed and Grow:
Augmenting Statistically Generated Summary Sentences using Schematic Word
Patterns. In the Proceedings of the 2008 Conference on Empirical
Methods in Natural Language Processing, pages 543-552. Hawaii, USA.
Stephen Wan, Mark Dras, Robert Dale, and Cécile Paris.
(2009)
Improving Grammaticality in Statistical Sentence Generation: Introducing a
Dependency Spanning Tree Algorithm with an Argument Satisfaction Model.
In the Proceedings of the 12th Conference of the European Chapter of the
ACL (EACL 2009), pages 852-860. Athens, Greece.
Stephen Wan, Mark Dras, Robert Dale, and Cécile Paris. (2010) Spanning Tree Approaches for Statistical Sentence Generation. In Krahmer, E., Theune, M., eds.: Empirical Methods in Natural Language Generation. Volume 5980 of Lecture Notes in Computer Science. Springer, Berlin/Heidelberg.
Primary Funding:
Awarded an APA but accepted the Research Award For Areas and Centres of
Excellence (RAACE) at Macquarie University
CSIRO CMIS Top-Up Scholarship