Signal & Interactive Systems Lab

Vinciarelli A., Esposito A., André E., Bonin F., Chetouani M., Cohn F. J., Cristani M., Fuhrmann F., Gilmartin E., Hammal Z., Heylen D., Kaiser R., Koutsombogera M., Potamianos A., Renals S., Riccardi G., Salah A. G.

Open Challenges in Modelling, Analysis and Synthesis of Human Behaviour in Human–Human and Human–Machine Interactions (Article)

Cognitive Computation, pp. 1-17, April 2015.

(Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing)

Hahn S., Dinarelli M., Raymond C., Lefevre F., Lehnen P., De Mori R., Moschitti A., Ney H. and Riccardi G.

Comparing Stochastic Approaches to Spoken Language Understanding in Multiple Languages (Article)

IEEE Trans. on Audio, Speech and Language Processing, vol. 19, no. 6, pp. 1569-1583, 2011.

(Tags: Signal Annotation and Interpretation, Speech Processing)

@article{S.2014,
title = {Comparing Stochastic Approaches to Spoken Language Understanding in Multiple Languages},
author = {Hahn, S. and Dinarelli, M. and Raymond, C. and Lefevre, F. and Lehnen, P. and De Mori, R. and Moschitti, A. and Ney, H. and Riccardi, G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/IEEETSLP10-MultSLU.pdf},
year = {2011},
date = {2011-01-01},
journal = {IEEE Trans. on Audio, Speech and Language Processing},
volume = {19},
number = {6},
pages = {1569--1583},
abstract = {One of the first steps in building a spoken language understanding (SLU) module for dialogue systems is the extraction of flat concepts out of a given word sequence, usually provided by an automatic speech recognition (ASR) system. In this paper, six different modeling approaches are investigated to tackle the task of concept tagging. These methods include classical, well-known generative and discriminative methods like Finite State Transducers (FSTs), Statistical Machine Translation (SMT), Maximum Entropy Markov Models (MEMMs), or Support Vector Machines (SVMs) as well as techniques recently applied to natural language processing such as Conditional Random Fields (CRFs) or Dynamic Bayesian Networks (DBNs). Following a detailed description of the models, experimental and comparative results are presented on three corpora in different languages and with different complexity. The French MEDIA corpus has already been exploited during an evaluation campaign and so a direct comparison with existing benchmarks is possible. Recently collected Italian and Polish corpora are used to test the robustness and portability of the modeling approaches. For all tasks, manual transcriptions as well as ASR inputs are considered. Additionally to single systems, methods for system combination are investigated. The best performing model on all tasks is based on conditional random fields. On the MEDIA evaluation corpus, a concept error rate of 12.6% could be achieved. Here, additionally to attribute names, attribute values have been extracted using a combination of a rule-based and a statistical approach. Applying system combination using weighted ROVER with all six systems, the concept error rate (CER) drops to 12.0%.},
keywords = {Signal Annotation and Interpretation, Speech Processing}
}

Griol D., Callejas Z., Lopez-Cozar R. and Riccardi G.

A Domain-Independent Statistical Methodology for Dialog Management in Spoken Dialog Systems (Article)

Computer Speech and Language, to be published in 2014.

(Tags: Conversational and Interactive Systems, Speech Processing)

Dinarelli M., Moschitti A. and Riccardi G.

Discriminative Reranking for Spoken Language Understanding (Article)

IEEE Trans. on Audio, Speech and Language Processing, vol. 20, no. 2, pp. 526-539, 2012.

(Tags: Signal Annotation and Interpretation, Speech Processing)

De Mori R., Bechet F., Hakkani-Tur D., McTear M., Riccardi G. and Tur G.

Spoken Language Understanding (Article)

IEEE Signal Processing Magazine, vol. 25, pp. 50-58, 2008.

(Tags: Machine Learning, Natural Language Processing, Speech Processing)

Hakkani-Tur D., Riccardi G. and Tur G.

An Active Approach to Spoken Language Processing (Article)

ACM Transactions on Speech and Language Processing, vol. 3, no. 3, pp. 1-31, 2006.

(Tags: Machine Learning, Natural Language Processing, Speech Processing)

@article{D.2006,
title = {An Active Approach to Spoken Language Processing},
author = {Hakkani-Tur, D. and Riccardi, G. and Tur, G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/acm-tslp-06.pdf},
year = {2006},
date = {2006-01-01},
journal = {ACM Transactions on Speech and Language Processing},
volume = {3},
number = {3},
pages = {1--31},
abstract = {State of the art data-driven speech and language processing systems require a large amount of human intervention ranging from data annotation to system prototyping. In the traditional supervised passive approach, the system is trained on a given number of annotated data samples and evaluated using a separate test set. Then more data is collected arbitrarily, annotated, and the whole cycle is repeated. In this article, we propose the active approach where the system itself selects its own training data, evaluates itself and re-trains when necessary. We first employ active learning which aims to automatically select the examples that are likely to be the most informative for a given task. We use active learning for both selecting the examples to label and the examples to re-label in order to correct labeling errors. Furthermore, the system automatically evaluates itself using active evaluation to keep track of the unexpected events and decides on-demand to label more examples. The active approach enables dynamic adaptation of spoken language processing systems to unseen or unexpected events for nonstationary input while reducing the manual annotation effort significantly. We have evaluated the active approach with the AT&T spoken dialog system used for customer care applications. In this article, we present our results for both automatic speech recognition and spoken language understanding. Categories and Subject Descriptors: I.2.7 [Artificial Intelligence]: Natural Language Processing—Speech recognition and synthesis; I.5.1 [Pattern Recognition]: Models—Statistical. General Terms: Algorithms, Languages, Performance. Additional Key Words and Phrases: Passive learning, active learning, adaptive learning, unsupervised learning, active evaluation, spoken language understanding, automatic speech recognition, spoken dialog systems, speech and language processing.},
keywords = {Machine Learning, Natural Language Processing, Speech Processing}
}

Gupta N., Tur G., Hakkani-Tur D., Bangalore S., Riccardi G. and Rahim M.

The AT&T Spoken Language Understanding System (Article)

IEEE Trans. on Audio, Speech and Language Processing, vol. 14, no. 1, pp. 213-222, 2006.

(Tags: Machine Learning, Natural Language Processing, Speech Processing)

@article{N.2006,
title = {The {AT\&T} Spoken Language Understanding System},
author = {Gupta, N. and Tur, G. and Hakkani-Tur, D. and Bangalore, S. and Riccardi, G. and Rahim, M.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/IEEE-SAP-2005-SLU.pdf},
year = {2006},
date = {2006-01-01},
journal = {IEEE Trans. on Audio, Speech and Language Processing},
volume = {14},
number = {1},
pages = {213--222},
abstract = {Spoken language understanding (SLU) aims at extracting meaning from natural language speech. Over the past decade, a variety of practical goal-oriented spoken dialog systems have been built for limited domains. SLU in these systems ranges from understanding predetermined phrases through fixed grammars, extracting some predefined named entities, extracting users’ intents for call classification, to combinations of users’ intents and named entities. In this paper, we present the SLU system of VoiceTone ® (a service provided by AT&T where AT&T develops, deploys and hosts spoken dialog applications for enterprise customers). The SLU system includes extracting both intents and the named entities from the users’ utterances. For intent determination, we use statistical classifiers trained from labeled data, and for named entity extraction we use rule-based fixed grammars. The focus of our work is to exploit data and to use machine learning techniques to create scalable SLU systems which can be quickly deployed for new domains with minimal human intervention. These objectives are achieved by 1) using the predicate-argument representation of semantic content of an utterance; 2) extending statistical classifiers to seamlessly integrate hand crafted classification rules with the rules learned from data; and 3) developing an active learning framework to minimize the human labeling effort for quickly building the classifier models and adapting them to changes. We present an evaluation of this system using two deployed applications of VoiceTone},
keywords = {Machine Learning, Natural Language Processing, Speech Processing}
}

Hakkani-Tur D., Bechet F., Riccardi G. and Tur G.

Beyond ASR 1-Best: Using Word Confusion Network (Article)

Computer Speech and Language, vol. 20, no. 4, pp. 495-514, 2006.

(Tags: Language Modeling, Speech Processing)

Riccardi G. and Baggia P.

Spoken Dialog Systems: From Theory to Technology (Article)

Edizione della Normale di Pisa, 2006.

(Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing)

Potamianos A., Narayanan S. and Riccardi G.

Adaptive Categorical Understanding for Spoken Dialogue Systems (Article)

2005.

(Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing)

Gorin A., Abella A., Alonso T., Riccardi G. and Wright J.

Automated Natural Spoken Dialog (Article)

IEEE Computer, vol. 35, no. 4, pp. 51-56, April 2002 (invited paper).

(Tags: Conversational and Interactive Systems, Speech Processing)

Rahim M., Riccardi G., Saul L., Wright J., Buntschuh B. and Gorin A. L.

Robust Numeric Recognition in Spoken Language Dialogue (Article)

Speech Communication, vol. 34, pp. 195-212, 2001.

(Tags: Machine Learning, Signal Annotation and Interpretation, Speech Processing)

Rose R. C., Yao H., Riccardi G. and Wright J. H.

Integration of Utterance Verification with Statistical Language Modeling and Spoken Language Understanding (Article)

Speech Communication, vol. 34, pp. 321-331, 2001.

(Tags: Machine Learning, Signal Annotation and Interpretation, Speech Processing)

Riccardi G. and Gorin A. L.

Spoken language adaptation over time and state in a natural spoken dialog system (Article)

IEEE Trans. on Speech and Audio Processing, vol. 8, pp. 3-10, 2000.

(Tags: Conversational and Interactive Systems, Language Modeling, Speech Processing)

Arai K., Wright J. H., Riccardi G. and Gorin A. L.

Grammar fragment acquisition using syntactic and semantic clustering (Article)

Speech Communication, vol. 27, no. 1, Jan. 1999.

(Tags: Conversational and Interactive Systems, Language Modeling, Speech Processing)

Gorin A. L., Riccardi G. and Wright J. H.

How may I help you? (Article)

Speech Communication, vol. 23, pp. 113-127, Oct. 1997.

(Tags: Conversational and Interactive Systems, Speech Processing)

Riccardi G., Pieraccini R. and Bocchieri E.

Stochastic automata for language modeling (Article)

Computer Speech and Language, vol. 10, no. 4, pp. 265-293, 1996.

(Tags: Conversational and Interactive Systems, Language Modeling, Speech Processing)

@article{Riccardi1996,
title = {Stochastic automata for language modeling},
author = {Riccardi, G. and Pieraccini, R. and Bocchieri, E.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/csl96.pdf},
year = {1996},
date = {1996-01-01},
journal = {Computer Speech and Language},
volume = {10},
number = {4},
pages = {265--293},
abstract = {Stochastic language models are widely used in spoken language understanding to recognize and interpret the speech signal: the speech samples are decoded into word transcriptions by means of acoustic and syntactic models and then interpreted according to a semantic model. Both for speech recognition and understanding, search algorithms use stochastic models to extract the most likely uttered sentence and its correspondent interpretation. The design of the language models has to be effective in order to mostly constrain the search algorithms and has to be efficient to comply with the storage space limits. In this work we present the Variable N-gram Stochastic Automaton (VNSA) language model that provides a unified formalism for building a wide class of language models. First, this approach allows for the use of accurate language models for large vocabulary speech recognition by using the standard search algorithm in the one-pass Viterbi decoder. Second, the unified formalism is an effective approach to incorporate different sources of information for computing the probability of word sequences. Third, the VNSAs are well suited for those applications where speech and language decoding cascades are implemented through weighted rational transductions. The VNSAs have been compared to standard bigram and trigram language models and their reduced set of parameters does not affect by any means the performances in terms of perplexity. The design of a stochastic language model through the VNSA is described and applied to word and phrase class-based language models. The effectiveness of VNSAs has been tested within the Air Travel Information System (ATIS) task to build the language model for th},
keywords = {Conversational and Interactive Systems, Language Modeling, Speech Processing}
}

Bocchieri E., Levin E., Pieraccini R. and Riccardi G.

Understanding spontaneous speech (Article)

J. of the Italian Assoc. of Artificial Intelligence, Sept. 1995.

(Tags: Machine Learning, Signal Annotation and Interpretation, Speech Processing)