Using Linguistically Motivated Features in Document Retrieval for Question Answering
Luiz Augusto Pizzato
Tuesday 10th March 2009 at 11am
Abstract
In this talk I will present the work conducted for my thesis, where I investigated the impact of using linguistic features in the Information Retrieval (IR) stage of Question Answering (QA) systems. We hypothesise that techniques that are commonly used in the final answer extraction stage can improve the overall results of a QA system when adopted in the earlier IR stage. In particular, we study the use of the following information: i) named entities in a pseudo-relevance feedback process; and ii) semantic relations between words of questions and text sentences.

The study of the use of named entities is inspired by the common practice of filtering out sentences that do not contain the expected answer type. We consequently introduce a pseudo-relevance feedback process that inserts entities of the correct answer type into the original query. Our experiments show that this technique leads to query drift, and the final results do not improve with respect to a query without additional feedback.

To study the use of relational information, we design an IR framework that is more efficient (in both speed and memory consumption) than standard approaches based on relational databases and on the concatenation of word pairs at the indexing stage. The framework allows a multi-layer index that uses an extension to the standard vector space model as a ranking strategy. By including linguistic word relations, this ranking strategy improves precision without compromising the overall recall. We present the Question Prediction Language Model (QPLM), a model of relational information that borrows concepts from Semantic Role Labelling (SRL) but is designed for the fast generation of annotation and its use for indexing and retrieval. The results are of quality comparable to SRL and indicate that linguistic information encoded in the form of semantic relations does enhance the retrieval quality of text and the final accuracy of QA systems.
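As a rough illustration of the named-entity feedback idea mentioned in the abstract, the sketch below expands a query with the most frequent entities of the expected answer type found in the top-ranked documents of an initial retrieval. It is not the implementation from the thesis: the function name `expand_query`, the document layout (a dict with an `"entities"` list of `(text, type)` pairs), the entity type labels, and the `top_k` and `n_entities` parameters are all assumptions made for this example.

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, answer_type, top_k=3, n_entities=2):
    """Hypothetical pseudo-relevance feedback step using named entities.

    ranked_docs is assumed to be the output of an initial retrieval run,
    best document first. Each document is a dict whose "entities" value is
    a list of (entity_text, entity_type) pairs produced by some named
    entity recogniser (not shown here).
    """
    # Count entities of the expected answer type in the top_k documents,
    # which are assumed (without verification) to be relevant.
    counts = Counter(
        text.lower()
        for doc in ranked_docs[:top_k]
        for text, entity_type in doc["entities"]
        if entity_type == answer_type
    )

    # Skip entities that the query already contains.
    existing = {term.lower() for term in query_terms}
    additions = [
        entity for entity, _ in counts.most_common() if entity not in existing
    ][:n_entities]

    # The expanded query is the original terms plus the selected entities.
    return list(query_terms) + additions
```

The expanded term list would then be submitted as a second query. Because the top-ranked documents are only assumed to be relevant, entities from off-topic documents can enter the query, which is one way the query drift reported in the abstract can arise.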
Short resume
Luiz received his Bachelor of Computer Science in 2000 from the Pontifical Catholic University of Porto Alegre (PUCRS), Brazil. In the same year, he joined the Hewlett Packard/PUCRS Research Centre in High Performance Computing. In 2003, Luiz received a Master of Computer Science from PUCRS for his thesis involving information retrieval (IR) and thesauri information. During his Master's degree he also developed the Folha-RIcol corpus, which has been used by different researchers to evaluate IR systems for the Brazilian Portuguese language. In 2003, Luiz integrated his Master's research with the SINO search engine to enable the online search of legal decisions made by the Portuguese Attorney General. In 2008 at Macquarie University, Luiz submitted his PhD thesis, which focused on using language information in the IR stages of the question answering task. Luiz has also produced over a dozen peer-reviewed publications.