The Text Attribution Tool: author profiling for English emails

Dr Dominique Estival and Dr Tanja Gaustad van Zaanen
Appen Pty Ltd

Tuesday 6th November 2007 at 11am

 

Abstract

The Text Attribution Tool (TAT) aims to automate the analysis of texts for author profiling and identification. It provides probabilities for the author's basic demographic traits (gender, age, geographic origin, level of education and native language) as well as for five psychometric traits. The TAT has been developed for language-independent author profiling and has now been trained on two email corpora, English and Arabic. In this talk, we will describe the email data that was collected for the project, the ways this data is processed and analysed, and the experimental setup used for classification with the TAT. We will describe the overall TAT system and the machine learning experiments that produced classifiers for the different author traits, before presenting our results for the demographic and psychometric traits on the English email data. Results are very promising for all ten traits examined.

Short resume

Tanja Gaustad van Zaanen is a Computational Linguist at Appen in Sydney. After completing an MA in French Philology, General Linguistics and Computer Science at the University of Basel, Switzerland, in July 1999, she moved to Groningen, the Netherlands, to pursue a PhD in Computational Linguistics under the supervision of Dr. Gertjan van Noord and Prof. John Nerbonne. She publicly defended her thesis on November 1st 2004 and was awarded her doctorate. During her appointment in Groningen, she also worked as a lecturer and as a researcher on a joint project between the University of Groningen and industrial partners investigating automatic email classification. Since coming to Australia in November 2004, she has been a guest researcher at the Language Technology Group at Macquarie University, Sydney. At Appen, she is currently working on a variety of R&D projects.

Dominique Estival is a Senior Manager (Projects and Research) at Appen in Sydney. Prior to that appointment in early 2006, she was a Senior Research Scientist at DSTO (2002-2006). After receiving her PhD in linguistics from the University of Pennsylvania in 1986, she started working as a computational linguist in industry: first in a machine translation company (Weidner, Chicago, USA; 1986-1988) and then at Wang Laboratories (Boston, USA; 1988-1989). She was a researcher at ISSCO (Geneva, Switzerland; 1989-1995) before coming to Australia to take up the position of lecturer in Computational Linguistics at the University of Melbourne (1995-1998). She then joined Syrinx Speech Systems in 1999 to head the Natural Language Processing group and lead the NLP R&D project to develop a natural language telephone dialogue system. Her research interests have included the computational modelling of language change, machine translation, grammar formalisms, grammar development and linguistic engineering, and spoken dialogue systems. At DSTO, she continued to work on spoken dialogue systems and machine translation tools, as well as on other applications such as document classification and multimedia interaction. At Appen, she oversees the research in language processing and has been leading the project on Text Attribution.
