Publications | Sislab


2006
Hakkani-Tur D., Bechet F., Riccardi G. and Tur G. Beyond ASR 1-Best: Using Word Confusion Network (Article). Computer Speech and Language, vol. 20, no. 4, pp. 495-514, 2006. Tags: Language Modeling, Speech Processing.

@article{D.2006b,
  title = {Beyond ASR 1-Best: Using Word Confusion Network},
  author = {Hakkani-Tur, D. and Bechet, F. and Riccardi, G. and Tur, G.},
  url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/CSL-pivot-slu.pdf},
  year = {2006},
  date = {2006-01-01},
  journal = {Computer Speech and Language},
  volume = {20},
  number = {4},
  pages = {495--514},
  abstract = {We are interested in the problem of robust understanding from noisy spontaneous speech input. With the advances in automated speech recognition (ASR), there has been increasing interest in spoken language understanding (SLU). A challenge in large vocabulary spoken language understanding is robustness to ASR errors. State of the art spoken language understanding relies on the best ASR hypotheses (ASR 1-best). In this paper, we propose methods for a tighter integration of ASR and SLU using word confusion networks (WCNs). WCNs obtained from ASR word graphs (lattices) provide a compact representation of multiple aligned ASR hypotheses along with word confidence scores, without compromising recognition accuracy. We present our work on exploiting WCNs instead of simply using ASR one-best hypotheses. In this work, we focus on the tasks of named entity detection and extraction and call classification in a spoken dialog system, although the idea is more general and applicable to other spoken language processing tasks. For named entity detection, we have improved the F-measure by using both word lattices and WCNs, 6–10% absolute. The processing of WCNs was 25 times faster than lattices, which is very important for real-life applications. For call classification, we have shown between 5% and 10% relative reduction in error rate using WCNs compared to ASR 1-best output. © 2005 Elsevier Ltd. All rights reserved.},
  keywords = {Language Modeling, Speech Processing}
}
2000
Riccardi G. and Gorin A. L. Spoken language adaptation over time and state in a natural spoken dialog system (Article). IEEE Trans. on Speech and Audio, vol. 8, pp. 3-10, 2000. Tags: Conversational and Interactive Systems, Language Modeling, Speech Processing.

@article{Riccardi2000,
  title = {Spoken language adaptation over time and state in a natural spoken dialog system},
  author = {Riccardi, G. and Gorin, A. L.},
  url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/IEEETSLP00-LMAdapt.pdf},
  year = {2000},
  date = {2000-01-01},
  journal = {IEEE Trans. on Speech and Audio},
  volume = {8},
  pages = {3--10},
  abstract = {We are interested in adaptive spoken dialog systems for automated services. People’s spoken language usage varies over time for a given task, and furthermore varies depending on the state of the dialog. Thus, it is crucial to adapt automatic speech recognition (ASR) language models to these varying conditions. We characterize and quantify these variations based on a database of 30K user-transactions with AT&T’s experimental How May I Help You? spoken dialog system. We describe a novel adaptation algorithm for language models with time and dialog-state varying parameters. Our language adaptation framework allows for recognizing and understanding unconstrained speech at each stage of the dialog, enabling context-switching and error recovery. These models have been used to train state-dependent ASR language models. We have evaluated their performance with respect to word accuracy and perplexity over time and dialog states. We have achieved a reduction of 40% in perplexity and of 8.4% in word error rate over the baseline system, averaged across all dialog states.},
  keywords = {Conversational and Interactive Systems, Language Modeling, Speech Processing}
}
1999
Arai K., Wright J. H., Riccardi G. and Gorin A. L. Grammar fragment acquisition using syntactic and semantic clustering (Article). Speech Communication, vol. 27, no. 1, Jan. 1999. Tags: Conversational and Interactive Systems, Language Modeling, Speech Processing.

@article{Arai1999,
  title = {Grammar fragment acquisition using syntactic and semantic clustering},
  author = {Arai, K. and Wright, J. H. and Riccardi, G. and Gorin, A. L.},
  url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/fragclustering-speechcomm-19981.pdf},
  year = {1999},
  date = {1999-01-01},
  journal = {Speech Communication},
  volume = {27},
  number = {1},
  abstract = {A new method for automatically acquiring Fragments for understanding fluent speech is proposed. The goal of this method is to generate a collection of Fragments, each representing a set of syntactically and semantically similar phrases. First, phrases observed frequently in the training set are selected as candidates. Each candidate phrase has three associated probability distributions: of following contexts, of preceding contexts, and of associated semantic actions. The similarity between candidate phrases is measured by applying the Kullback-Leibler distance to these three probability distributions. Candidate phrases that are close in all three distances are clustered into a Fragment. Salient sequences of these Fragments are then automatically acquired, and exploited by a spoken language understanding module to classify calls in AT&T's ``How may I help you?'' task. These Fragments allow us to generalize unobserved phrases. For instance, they detected 246 phrases in the test-set that were not present in the training-set. This result shows that unseen phrases can be automatically discovered by our new method. Experimental results show that 2.8% of the improvement in call-type classification [...]},
  keywords = {Conversational and Interactive Systems, Language Modeling, Speech Processing}
}
1996
Riccardi G., Pieraccini R. and Bocchieri E. Stochastic automata for language modeling (Article). Computer Speech and Language, vol. 10, no. 4, pp. 265-293, 1996. Tags: Conversational and Interactive Systems, Language Modeling, Speech Processing.

@article{Riccardi1996,
  title = {Stochastic automata for language modeling},
  author = {Riccardi, G. and Pieraccini, R. and Bocchieri, E.},
  url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/csl96.pdf},
  year = {1996},
  date = {1996-01-01},
  journal = {Computer Speech and Language},
  volume = {10},
  number = {4},
  pages = {265--293},
  abstract = {Stochastic language models are widely used in spoken language understanding to recognize and interpret the speech signal: the speech samples are decoded into word transcriptions by means of acoustic and syntactic models and then interpreted according to a semantic model. Both for speech recognition and understanding, search algorithms use stochastic models to extract the most likely uttered sentence and its correspondent interpretation. The design of the language models has to be effective in order to mostly constrain the search algorithms and has to be efficient to comply with the storage space limits. In this work we present the Variable N-gram Stochastic Automaton (VNSA) language model that provides a unified formalism for building a wide class of language models. First, this approach allows for the use of accurate language models for large vocabulary speech recognition by using the standard search algorithm in the one-pass Viterbi decoder. Second, the unified formalism is an effective approach to incorporate different sources of information for computing the probability of word sequences. Third, the VNSAs are well suited for those applications where speech and language decoding cascades are implemented through weighted rational transductions. The VNSAs have been compared to standard bigram and trigram language models and their reduced set of parameters does not affect by any means the performances in terms of perplexity. The design of a stochastic language model through the VNSA is described and applied to word and phrase class-based language models. The effectiveness of VNSAs has been tested within the Air Travel Information System (ATIS) task to build the language model for th[...]},
  keywords = {Conversational and Interactive Systems, Language Modeling, Speech Processing}
}