Signal & Interactive Systems Lab

In: Wu, Gennari, Huang, Xie and Cao Y. (eds.), Emerging Technologies for Education, Lecture Notes in Computer Science, vol. 10108, pp. 635-644, 2017.

Tags: Interactive Systems

Stepanov A. E., Chowdhury A. S., Bayer A. O., Ghosh A., Klasinas I., Calvo M., Sanchis E. and Riccardi G.

Cross-Language Transfer of Semantic Annotation via Targeted Crowdsourcing: Task Design and Evaluation (Article)

Language Resources and Evaluation, Springer, 2017. https://doi.org/10.1007/s10579-017-9396-5

Tags: Signal Annotation and Interpretation

@article{E.2017,
title = {Cross-Language Transfer of Semantic Annotation via Targeted Crowdsourcing: Task Design and Evaluation},
author = {Stepanov, A. E. and Chowdhury, A. S. and Bayer, A. O. and Ghosh, A. and Klasinas, I. and Calvo, M. and Sanchis, E. and Riccardi, G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2017/10/10.1007s10579-017-9396-5.pdf},
year = {2017},
date = {2017-01-01},
journal = {Language Resources and Evaluation},
publisher = {Springer},
doi = {10.1007/s10579-017-9396-5},
abstract = {Modern data-driven spoken language systems (SLS) require manual semantic annotation for training spoken language understanding parsers. Multilingual porting of SLS demands significant manual effort and language resources, as this manual annotation has to be replicated. Crowdsourcing is an accessible and cost-effective alternative to traditional methods of collecting and annotating data. The application of crowdsourcing to simple tasks has been well investigated. However, complex tasks, like cross-language semantic annotation transfer, may generate low judgment agreement and/or poor performance. The most serious issue in cross-language porting is the absence of reference annotations in the target language; thus, crowd quality control and the evaluation of the collected annotations is difficult. In this paper we investigate targeted crowdsourcing for semantic annotation transfer that delegates to crowds a complex task such as segmenting and labeling of concepts taken from a domain ontology; and evaluation using source language annotation. To test the applicability and effectiveness of the crowdsourced annotation transfer we have considered the case of close and distant language pairs: Italian–Spanish and Italian–Greek. The corpora annotated via crowdsourcing are evaluated against source and target language expert annotations. We demonstrate that the two evaluation references (source and target) highly correlate with each other; thus, drastically reduce the need for the target language reference annotations.
},
keywords = {Signal Annotation and Interpretation}
}


Celli F., Ghosh A., Alam F. and Riccardi G.

In the mood for Sharing Contents: Emotions, personality and interaction styles in the diffusion of news (Article)

Information Processing and Management, Nov. 2015.

Tags: Machine Learning, Natural Language Processing, Signal Annotation and Interpretation

Mogessie M., Riccardi G. and Ronchetti M.

Predicting Students’ Final Exam Scores from their Course Activities (Article)

Proc. IEEE Frontiers in Education, El Paso (USA), 2015.

Tags: Education Analytics, Machine Learning

Vinciarelli A., Esposito A., André E., Bonin F., Chetouani M., Cohn F. J., Cristani M., Fuhrmann F., Gilmartin E., Hammal Z., Heylen D., Kaiser R., Koutsombogera M., Potamianos A., Renals S., Riccardi G. and Salah A. G.

Open Challenges in Modelling, Analysis and Synthesis of Human Behaviour in Human–Human and Human–Machine Interactions (Article)

Cognitive Computation, pp. 1-17, April 2015.

Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing

Hahn S., Dinarelli M., Raymond C., Lefevre F., Lehnen P., De Mori R., Moschitti A., Ney H. and Riccardi G.

Comparing Stochastic Approaches to Spoken Language Understanding in Multiple Languages (Article)

IEEE Trans. on Audio, Speech and Language Processing, vol. 19, no. 6, pp. 1569-1583, 2011.

Tags: Signal Annotation and Interpretation, Speech Processing

@article{S.2014,
title = {Comparing Stochastic Approaches to Spoken Language Understanding in Multiple Languages},
author = {Hahn, S. and Dinarelli, M. and Raymond, C. and Lefevre, F. and Lehnen, P. and De Mori, R. and Moschitti, A. and Ney, H. and Riccardi, G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/IEEETSLP10-MultSLU.pdf},
year = {2011},
date = {2011-01-01},
journal = {IEEE Transactions on Audio, Speech, and Language Processing},
volume = {19},
number = {6},
pages = {1569--1583},
abstract = {One of the first steps in building a spoken language understanding (SLU) module for dialogue systems is the extraction of flat concepts out of a given word sequence, usually provided by an automatic speech recognition (ASR) system. In this paper, six different modeling approaches are investigated to tackle the task of concept tagging. These methods include classical, well-known generative and discriminative methods like Finite State Transducers (FSTs), Statistical Machine Translation (SMT), Maximum Entropy Markov Models (MEMMs), or Support Vector Machines (SVMs) as well as techniques recently applied to natural language processing such as Conditional Random Fields (CRFs) or Dynamic Bayesian Networks (DBNs). Following a detailed description of the models, experimental and comparative results are presented on three corpora in different languages and with different complexity. The French MEDIA corpus has already been exploited during an evaluation campaign and so a direct comparison with existing benchmarks is possible. Recently collected Italian and Polish corpora are used to test the robustness and portability of the modeling approaches. For all tasks, manual transcriptions as well as ASR inputs are considered. Additionally to single systems, methods for system combination are investigated. The best performing model on all tasks is based on conditional random fields. On the MEDIA evaluation corpus, a concept error rate of 12.6% could be achieved. Here, additionally to attribute names, attribute values have been extracted using a combination of a rule-based and a statistical approach. Applying system combination using weighted ROVER with all six systems, the concept error rate (CER) drops to 12.0%.},
keywords = {Signal Annotation and Interpretation, Speech Processing}
}


Griol D., Callejas Z., López-Cózar R. and Riccardi G.

A Domain-Independent Statistical Methodology for Dialog Management in Spoken Dialog Systems (Article)

Computer Speech and Language, 2014 (to appear).

Tags: Conversational and Interactive Systems, Speech Processing

Dinarelli M., Moschitti A. and Riccardi G.

Discriminative Reranking for Spoken Language Understanding (Article)

IEEE Trans. on Audio, Speech and Language Processing, vol. 20, no. 2, pp. 526-539, 2012.

Tags: Signal Annotation and Interpretation, Speech Processing

De Mori R., Bechet F., Hakkani-Tur D., McTear M., Riccardi G. and Tur G.

Spoken Language Understanding (Article)

IEEE Signal Processing Magazine, vol. 25, pp. 50-58, 2008.

Tags: Machine Learning, Natural Language Processing, Speech Processing

Riccardi G. and Baggia P.

Spoken Dialog Systems: From Theory to Technology (Article)

Edizioni della Normale, Pisa, 2006.

Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing

Hakkani-Tur D., Riccardi G. and Tur G.

An Active Approach to Spoken Language Processing (Article)

ACM Transactions on Speech and Language Processing, vol. 3, no. 3, pp. 1-31, 2006.

Tags: Machine Learning, Natural Language Processing, Speech Processing

@article{D.2006,
title = {An Active Approach to Spoken Language Processing},
author = {Hakkani-Tur, D. and Riccardi, G. and Tur, G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/acm-tslp-06.pdf},
year = {2006},
date = {2006-01-01},
journal = {ACM Transactions on Speech and Language Processing},
volume = {3},
number = {3},
pages = {1--31},
abstract = {State of the art data-driven speech and language processing systems require a large amount of human intervention ranging from data annotation to system prototyping. In the traditional supervised passive approach, the system is trained on a given number of annotated data samples and evaluated using a separate test set. Then more data is collected arbitrarily, annotated, and the whole cycle is repeated. In this article, we propose the active approach where the system itself selects its own training data, evaluates itself and re-trains when necessary. We first employ active learning which aims to automatically select the examples that are likely to be the most informative for a given task. We use active learning for both selecting the examples to label and the examples to re-label in order to correct labeling errors. Furthermore, the system automatically evaluates itself using active evaluation to keep track of the unexpected events and decides on-demand to label more examples. The active approach enables dynamic adaptation of spoken language processing systems to unseen or unexpected events for nonstationary input while reducing the manual annotation effort significantly. We have evaluated the active approach with the AT&T spoken dialog system used for customer care applications. In this article, we present our results for both automatic speech recognition and spoken language understanding. Categories and Subject Descriptors: I.2.7 [Artificial Intelligence]: Natural Language Processing—Speech recognition and synthesis; I.5.1 [Pattern Recognition]: Models—Statistical General Terms: Algorithms, Languages, Performance Additional Key Words and Phrases: Passive learning, active learning, adaptive learning, unsupervised learning, active evaluation, spoken language understanding, automatic speech recognition, spoken dialog systems, speech and language processing},
keywords = {Machine Learning, Natural Language Processing, Speech Processing}
}


Hakkani-Tur D., Bechet F., Riccardi G. and Tur G.

Beyond ASR 1-Best: Using Word Confusion Networks in Spoken Language Understanding (Article)

Computer Speech and Language, vol. 20, no. 4, pp. 495-514, 2006.

Tags: Language Modeling, Speech Processing

Gupta N., Tur G., Hakkani-Tur D., Bangalore S., Riccardi G. and Rahim M.

The AT&T Spoken Language Understanding System (Article)

IEEE Trans. on Audio, Speech and Language Processing, vol. 14, no. 1, pp. 213-222, 2006.

Tags: Machine Learning, Natural Language Processing, Speech Processing

@article{N.2006,
title = {The AT\&T Spoken Language Understanding System},
author = {Gupta, N. and Tur, G. and Hakkani-Tur, D. and Bangalore, S. and Riccardi, G. and Rahim, M.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/IEEE-SAP-2005-SLU.pdf},
year = {2006},
date = {2006-01-01},
journal = {IEEE Transactions on Audio, Speech, and Language Processing},
volume = {14},
number = {1},
pages = {213--222},
abstract = {Spoken language understanding (SLU) aims at extracting meaning from natural language speech. Over the past decade, a variety of practical goal-oriented spoken dialog systems have been built for limited domains. SLU in these systems ranges from understanding predetermined phrases through fixed grammars, extracting some predefined named entities, extracting users’ intents for call classification, to combinations of users’ intents and named entities. In this paper, we present the SLU system of VoiceTone ® (a service provided by AT&T where AT&T develops, deploys and hosts spoken dialog applications for enterprise customers). The SLU system includes extracting both intents and the named entities from the users’ utterances. For intent determination, we use statistical classifiers trained from labeled data, and for named entity extraction we use rule-based fixed grammars. The focus of our work is to exploit data and to use machine learning techniques to create scalable SLU systems which can be quickly deployed for new domains with minimal human intervention. These objectives are achieved by 1) using the predicate-argument representation of semantic content of an utterance; 2) extending statistical classifiers to seamlessly integrate hand crafted classification rules with the rules learned from data; and 3) developing an active learning framework to minimize the human labeling effort for quickly building the classifier models and adapting them to changes. We present an evaluation of this system using two deployed applications of VoiceTone},
keywords = {Machine Learning, Natural Language Processing, Speech Processing}
}


Riccardi G. and Hakkani-Tur D.

Grounding Emotions in Human-Machine Conversational Systems (Article)

Lecture Notes in Computer Science, Springer-Verlag, pp. 144-154, 2005.

Tags: Affective Computing, Conversational and Interactive Systems

Riccardi G. and Hakkani-Tur D.

Active Learning: Theory and Applications to Automatic Speech Recognition (Article)

IEEE Trans. on Speech and Audio Processing, vol. 13, no. 4, pp. 504-511, 2005.

Tags: Machine Learning

Potamianos A., Narayanan S. and Riccardi G.

Adaptive Categorical Understanding for Spoken Dialogue Systems (Article)

2005.

Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing

Bangalore S. and Riccardi G.

Stochastic Finite-State Models for Spoken Language Machine Translation (Article)

Machine Translation, vol. 17, no. 3, pp. 165-184, 2002 (invited paper).

Tags: Statistical Machine Translation

Gorin A., Abella A., Alonso T., Riccardi G. and Wright J.

Automated Natural Spoken Dialog (Article)

IEEE Computer, vol. 35, no. 4, pp. 51-56, April 2002 (invited paper).

Tags: Conversational and Interactive Systems, Speech Processing

Rahim M., Riccardi G., Saul L., Wright J., Buntschuh B. and Gorin A. L.

Robust Numeric Recognition in Spoken Language Dialogue (Article)

Speech Communication, vol. 34, pp. 195-212, 2001.

Tags: Machine Learning, Signal Annotation and Interpretation, Speech Processing

Rose R. C., Yao H., Riccardi G. and Wright J. H.

Integration of Utterance Verification with Statistical Language Modeling and Spoken Language Understanding (Article)

Speech Communication, vol. 34, pp. 321-331, 2001.

Tags: Machine Learning, Signal Annotation and Interpretation, Speech Processing

Riccardi G. and Gorin A. L.

Spoken language adaptation over time and state in a natural spoken dialog system (Article)

IEEE Trans. on Speech and Audio Processing, vol. 8, pp. 3-10, 2000.

Tags: Conversational and Interactive Systems, Language Modeling, Speech Processing

Arai K., Wright J. H., Riccardi G. and Gorin A. L.

Grammar fragment acquisition using syntactic and semantic clustering (Article)

Speech Communication, vol. 27, no. 1, Jan. 1999.

Tags: Conversational and Interactive Systems, Language Modeling, Speech Processing

Gorin A. L., Riccardi G. and Wright J. H.

How may I help you? (Article)

Speech Communication, vol. 23, pp. 113-127, Oct. 1997.

Tags: Conversational and Interactive Systems, Speech Processing

Riccardi G., Pieraccini R. and Bocchieri E.

Stochastic automata for language modeling (Article)

Computer Speech and Language, vol. 10, no. 4, pp. 265-293, 1996.

Tags: Conversational and Interactive Systems, Language Modeling, Speech Processing

@article{Riccardi1996,
title = {Stochastic automata for language modeling},
author = {Riccardi, G. and Pieraccini, R. and Bocchieri, E.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/csl96.pdf},
year = {1996},
date = {1996-01-01},
journal = {Computer Speech and Language},
volume = {10},
number = {4},
pages = {265--293},
abstract = {Stochastic language models are widely used in spoken language understanding to recognize and interpret the speech signal: the speech samples are decoded into word transcriptions by means of acoustic and syntactic models and then interpreted according to a semantic model. Both for speech recognition and understanding, search algorithms use stochastic models to extract the most likely uttered sentence and its correspondent interpretation. The design of the language models has to be effective in order to mostly constrain the search algorithms and has to be efficient to comply with the storage space limits. In this work we present the Variable N-gram Stochastic Automaton (VNSA) language model that provides a unified formalism for building a wide class of language models. First, this approach allows for the use of accurate language models for large vocabulary speech recognition by using the standard search algorithm in the one-pass Viterbi decoder. Second, the unified formalism is an effective approach to incorporate different sources of information for computing the probability of word sequences. Third, the VNSAs are well suited for those applications where speech and language decoding cascades are implemented through weighted rational transductions. The VNSAs have been compared to standard bigram and trigram language models and their reduced set of parameters does not affect by any means the performances in terms of perplexity. The design of a stochastic language model through the VNSA is described and applied to word and phrase class-based language models. The effectiveness of VNSAs has been tested within the Air Travel Information System (ATIS) task to build the language model for th},
keywords = {Conversational and Interactive Systems, Language Modeling, Speech Processing}
}


Bocchieri E., Levin E., Pieraccini R. and Riccardi G.

Understanding spontaneous speech (Article)

J. of the Italian Assoc. of Artificial Intelligence, Sept. 1995.

Tags: Machine Learning, Signal Annotation and Interpretation, Speech Processing