Linguistic Knowledge and Word Sense Disambiguation

Dr Tanja Gaustad van Zaanen
Computational Linguist
Appen Pty Ltd
Sydney

(A joint HAIL/SALS-SIG Seminar)

Tuesday 26 April 2005 at 11am

Abstract

In this talk I will present the findings of my PhD research. The main research question of my thesis is which linguistic knowledge sources are most useful for word sense disambiguation (WSD), specifically WSD of Dutch. Accordingly, the structure of the thesis, and of this talk, follows the various levels of linguistic information tested for WSD, including morphology, information on the syntactic class of a particular ambiguous word, and the syntactic structure of the entire sentence containing an ambiguous word.

The goal of the project was to develop a tool that can automatically determine the meaning of a particular ambiguous word in context, a so-called word sense disambiguation system. I will first introduce the experimental setup of the WSD system, including a brief presentation of the corpus, the classification algorithm used, and the "features" (or sources of linguistic knowledge) integrated in the model. Next, I will present a novel approach to building a statistical WSD system that uses morphological information, in the form of lemmas, as the key element. This will be followed by a presentation of various results, highlighting the importance of linguistic information in the WSD system presented.

In this statistical WSD system, the addition of deep linguistic knowledge in particular greatly improves disambiguation accuracy. Combining it with an approach that takes advantage of morphological information yields the best results for WSD of Dutch (on the Senseval-2 data set). My system achieves significantly higher disambiguation accuracy than any result for Dutch reported in the literature to date and is thus state-of-the-art for Dutch WSD.

Short résumé

After an MA in French Philology, General Linguistics and Computer Science at the University of Basel, Switzerland, I moved to Groningen, the Netherlands, in April 2000 to start a PhD in Computational Linguistics under the supervision of Dr. Gertjan van Noord and Prof. John Nerbonne. On 1 November 2004, I publicly defended my thesis and was awarded my doctorate. During my time in Groningen, I also worked as a lecturer and as a researcher on a joint project between the University of Groningen and industrial partners investigating automatic e-mail classification. Since coming to Australia in November 2004, I have been a guest researcher at the Language Technology Group at Macquarie University, Sydney, and I am now working as a computational linguist for Appen Pty Ltd in Sydney.
