Discourse Relation Identification in Biomedical Literature
Speaker: Rashmi Prasad, University of Wisconsin-Milwaukee, Department of Health Informatics and Administration
Host: Prof. Giuseppe Riccardi
Date: Tuesday, April 10, 2012
Ofek meeting room, Povo 1 (Polo scientifico e tecnologico Fabio Ferrari) Via Sommarive 5 – Povo-Trento, Italy
The desire to retrieve, organize, and extract knowledge from biomedical literature has boosted NLP research in biomedical text mining. For discourse-level NLP, research efforts have largely focused on sentence-wise labeling of rhetorical functions and coreference. In this talk, I will describe our recent effort towards identification of “discourse relations”, such as causal and contrastive relations between abstract objects (events, facts, situations, etc.). Discourse relations allow for deep and complex inferences about the text and are important for many biomedical text-mining applications, including question answering, information extraction, and summarization.
I will describe our work in developing the Biomedical Discourse Relation Bank (BioDRB), a corpus containing annotations of discourse relations (realized by explicit connectives or left implicit) in full-text biomedical articles. I will then talk about some experiments with the BioDRB for automatic discourse relation identification, focusing on identification of the senses of explicit discourse connectives. Since the BioDRB corpus is based on the much larger but news-domain Penn Discourse Treebank corpus, a major goal of these experiments was to explore the feasibility of cross-domain classification. Our results have shown that in-domain classifiers outperform the cross-domain classifiers, signaling a need for creating domain-specific annotated data sets for this task. Time permitting, I will also talk briefly about our work on detecting explicit connectives, since words that function as connectives can have other non-discourse functions as well.
Rashmi Prasad received her PhD in Linguistics from the University of Pennsylvania in 2003, with her dissertation focusing on computational modeling of discourse anaphora. During her PhD, she was also a consultant at AT&T Labs Research, working on dialogue act tagging and sentence plan generation for dialogue systems. Since 2003, she has worked at the University of Pennsylvania, developing the Penn Discourse Treebank Corpus, a large-scale annotated corpus of discourse relations. Since 2007, her research interests have extended to information extraction in the biomedical and clinical domain, and she has worked as a consultant in both academia (University of Wisconsin-Milwaukee) and industry (Kaiser Permanente HealthConnect) on projects focusing on part-of-speech tagging and syntactic parsing of clinical text, and discourse parsing of both biomedical literature and clinical text. She is currently an assistant professor of Health Informatics at the University of Wisconsin-Milwaukee.