Singla K., Stepanov A. E., Bayer A. O., Riccardi G. and Carenini G. Automatic Community Creation for Abstractive Spoken Summarization (Conference) EMNLP 2017 Workshop on New Frontiers in Summarization, Copenhagen, 2017. (Tags: Natural Language Processing, Speech Processing)
Chowdhury A. S., Stepanov E. A., Danieli M. and Riccardi G. Functions of Silences towards Information Flow in Spoken Conversation (Conference) EMNLP 2017 Workshop on Speech-Centric Natural Language Processing, Copenhagen, 2017. (Tags: Affective Computing, Speech Processing)
Bayer A. O., Stepanov A. E. and Riccardi G. Towards End-to-End Spoken Dialogue Systems (Proceeding) Proc. INTERSPEECH, Stockholm, 2017. (Tags: Interactive Systems, Speech Processing)
Chowdhury A. S. and Riccardi G. A Deep Learning Approach To Modeling Competitiveness In Spoken Conversations (Conference) Proc. ICASSP, New Orleans, 2017. (Tags: Affective Computing, Speech Processing)
Cervone A., Tortoreto G., Mezza S., Gambi E. and Riccardi G. Roving Mind: a balancing act between open-domain and engaging dialogue systems (Conference) 2017. (Tags: Conversational and Interactive Systems, Interactive Systems, Machine Learning, Natural Language Processing, Speech Processing)
Riccardi G., Stepanov A. E. and Chowdhury S. Discourse Connective Detection in Spoken Conversations (Proceeding) Proc. ICASSP, Shanghai, 2016. (Tags: Discourse, Natural Language Processing, Speech Processing)
Chowdhury S., Stepanov A. E. and Riccardi G. Predicting User Satisfaction from Turn-Taking in Spoken Conversations (Proceeding) Proc. INTERSPEECH, San Francisco, 2016. (Tags: Affective Computing, Conversational and Interactive Systems, Discourse, Interactive Systems, Signal Annotation and Interpretation, Speech Processing)
Stepanov E., Favre B., Alam F., Chowdhury S., Singla K., Trione J., Bechet F. and Riccardi G. Automatic Summarization of Call-Center Conversations (Conference) 2015. (Tags: Natural Language Processing, Speech Processing)
Danieli M., Riccardi G. and Alam F. Emotion Unfolding and Affective Scenes: A Case Study in Spoken Conversations (Conference) 2015. (Tags: Affective Computing, Speech Processing)
Bayer A. O. and Riccardi G. Deep Semantic Encodings for Language Modeling (Conference) 2015. (Tags: Language Modeling, Signal Annotation and Interpretation, Speech Processing)
Vinciarelli A., Esposito A., Andre' E., Bonin F., Chetouani M., Cohn F. J., Cristani M., Fuhrmann F., Gilmartin E., Hammal Z., Heylen D., Kaiser R., Koutsombogera M., Potamianos A., Renals S., Riccardi G. and Salah A. G. (Article) Cognitive Computation, pp. 1-17, April 2015. (Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing)
Bayer A. O. and Riccardi G. Semantic Language Models for Automatic Speech Recognition (Conference) 2014. (Tags: Language Modeling, Natural Language Processing, Speech Processing)
Han S., Dinarelli M., Raymond C., Lefevre F., Lehnen P., De Mori R., Moschitti A., Ney H. and Riccardi G. Comparing Stochastic Approaches to Spoken Language Understanding in Multiple Languages (Article) IEEE Trans. on Audio, Speech and Language Processing, vol. 19, no. 6, pp. 1569-1583, 2011. (Tags: Signal Annotation and Interpretation, Speech Processing)
Stepanov E., Riccardi G. and Bayer A. O. The Development of the Multilingual LUNA Corpus for Spoken Language System Porting (Conference) 2014. (Tags: Natural Language Processing, Speech Processing, Statistical Machine Translation)
Griol D., Callejas Z., Lopez-Cozar R. and Riccardi G. A Domain-Independent Statistical Methodology for Dialog Management in Spoken Dialog Systems (Article) Computer Speech and Language, to be published in 2014. (Tags: Conversational and Interactive Systems, Speech Processing)
Bayer A. O. and Riccardi G. On-line Adaptation of Semantic Models for Spoken Language Understanding (Conference) 2013. (Tags: Machine Learning, Speech Processing)
Bayer A. O. and Riccardi G. Instance-Based On-Line Language Model Adaptation (Conference) 2013. (Tags: Machine Learning, Speech Processing)
Stepanov E. and Riccardi G. Comparative Evaluation of Argument Extraction Algorithms in Discourse Relation Parsing (Conference) 2013. (Tags: Natural Language Processing, Speech Processing)
Alam F. and Riccardi G. Comparative Study of Speaker Personality Traits Recognition in Conversational and Broadcast News Speech (Conference) 2013. (Tags: Affective Computing, Speech Processing)
Bayer A. O. and Riccardi G. Joint Language Models for Automatic Speech Recognition and Understanding (Conference) 2012. (Tags: Machine Learning, Speech Processing)
Dinarelli M., Moschitti A. and Riccardi G. Discriminative Reranking for Spoken Language Understanding (Article) IEEE Trans. on Audio, Speech and Language Processing, vol. 20, no. 2, pp. 526-539, 2012. (Tags: Signal Annotation and Interpretation, Speech Processing)
Garcia F., Hurtado L. F., Segarra E., Sanchis E. and Riccardi G. Combining Machine Translation Systems for Spoken Language Understanding Portability (Conference) 2012. (Tags: Machine Learning, Signal Annotation and Interpretation, Speech Processing, Statistical Machine Translation)
Quarteroni S., Ivanov A. V. and Riccardi G. Simultaneous Dialog Act Segmentation and Classification from Human-Human Spoken Conversations (Conference) 2011. (Tags: Natural Language Processing, Signal Annotation and Interpretation, Speech Processing)
Quarteroni S., Gonzalez M., Riccardi G. and Varges S. Combining User Intention and Error Modeling for Statistical Dialog Simulators (Conference) 2010. (Tags: Conversational and Interactive Systems, Speech Processing)
Ivanov A. V., Riccardi G., Ghosh S., Tonelli S. and Stepanov E. Acoustic Correlates of Meaning Structure in Conversational Speech (Conference) 2010. (Tags: Natural Language Processing, Speech Processing)
Quarteroni S. and Riccardi G. Classifying Dialog Acts in Human-Human and Human-Machine Spoken Conversations (Conference) 2010. (Tags: Conversational and Interactive Systems, Speech Processing)
Varges S., Quarteroni S., Riccardi G. and Ivanov A. V. Investigating Clarification Strategies in a Hybrid POMDP Dialog Manager (Conference) 2010. (Tags: Conversational and Interactive Systems, Speech Processing)
Gonzalez M., Quarteroni S., Riccardi G. and Varges S. Cooperative User Models in Statistical Dialog Simulators (Conference) 2010. (Tags: Conversational and Interactive Systems, Speech Processing)
Dinarelli M., Stepanov E., Varges S. and Riccardi G. The LUNA Spoken Dialogue System: Beyond Utterance Classification (Conference) 2010. (Tags: Conversational and Interactive Systems, Speech Processing)
Griol D., Riccardi G. and Sanchis E. A Statistical Dialog Manager for the LUNA Project (Conference) 2009. (Tags: Conversational and Interactive Systems, Speech Processing)
Varges S., Riccardi G., Quarteroni S. and Ivanov A. V. The Exploration/Exploitation Trade-Off in Reinforcement Learning for Dialogue Management (Conference) 2009. (Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing)
Varges S., Riccardi G., Quarteroni S., Ivanov A. V. and Roberti P. Leveraging POMDPs trained with User Simulations and Rule-Based Dialog Management in a SDS (Conference) 2009. (Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing)
Griol D., Riccardi G. and Sanchis E. Learning the Structure of Human-Computer and Human-Human Spoken Conversations (Conference) 2009. (Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing)
Sporka A. J., Jakub F. and Riccardi G. Can Machines Call People? User Experience While Answering Telephone Calls Initiated by Machine (Conference) 2009. (Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing)
Varges S., Riccardi G., Quarteroni S., Ivanov A. V. and Roberti P. On-Line Strategy Computation in Spoken Dialog Systems (Conference) 2009. (Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing)
Riccardi G., Mosca N., Roberti P. and Baggia P. The Voice Multimodal Application Framework (Conference) 2009. (Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing)
Riccardi G., Baggia P. and Roberti P. Spoken Dialog Systems: From Theory to Technology (Conference) 2009. (Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing)
Quarteroni S., Dinarelli M. and Riccardi G. Ontology-Based Grounding of Spoken Language Understanding (Conference) 2009. (Tags: Machine Learning, Natural Language Processing, Speech Processing)
Quarteroni S., Riccardi G. and Dinarelli M. What's in an Ontology for Spoken Language Understanding (Conference) 2009. (Tags: Machine Learning, Natural Language Processing, Speech Processing)
Dinarelli M., Moschitti A. and Riccardi G. Concept Segmentation and Labeling for Conversational Speech (Conference) 2009. (Tags: Machine Learning, Natural Language Processing, Speech Processing)
Dinarelli M., Moschitti A. and Riccardi G. Re-Ranking Models Based on Small Training Data for Spoken Language Understanding (Conference) 2009. (Tags: Machine Learning, Natural Language Processing, Speech Processing)
Dinarelli M., Moschitti A. and Riccardi G. Re-Ranking Models For Spoken Language Understanding (Conference) 2009. (Tags: Machine Learning, Natural Language Processing, Speech Processing)
Sebastian V., Riccardi G. and Quarteroni S. Persistent Information State in a Data-Centric Architecture (Conference) 2008. (Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing)
De Mori R., Bechet F., Hakkani-Tur D., McTear M., Riccardi G. and Tur G. Spoken Language Understanding (Article) IEEE Signal Processing Magazine, vol. 25, pp. 50-58, 2008. (Tags: Machine Learning, Natural Language Processing, Speech Processing)
Dinarelli M., Moschitti A. and Riccardi G. Joint Generative And Discriminative Models For Spoken Language Understanding (Conference) 2008. (Tags: Machine Learning, Natural Language Processing, Speech Processing)
Raymond C. and Riccardi G. Learning with Noisy Supervision for Spoken Language Understanding (Conference) 2008. (Tags: Machine Learning, Natural Language Processing, Speech Processing)
Varges S. and Riccardi G. A Data-Centric Architecture for Data-Driven Spoken Dialog Systems (Conference) 2007. (Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing)
Moschitti A., Riccardi G. and Raymond C. Spoken Language Understanding with Kernels for Syntactic/Semantic Structures (Conference) 2007. (Tags: Machine Learning, Natural Language Processing, Speech Processing)
Raymond C. and Riccardi G. Generative and Discriminative Algorithms for Spoken Language Understanding (Conference) 2007. (Tags: Machine Learning, Natural Language Processing, Speech Processing)
Riccardi G. and Baggia P. Spoken Dialog Systems: From Theory to Technology (Article) Edizione della Normale di Pisa, 2006. (Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing)
Hakkani-Tur D., Riccardi G. and Tur G. An Active Approach to Spoken Language Processing (Article) ACM Transactions on Speech and Language Processing, vol. 3, no. 3, pp. 1-31, 2006. (Tags: Machine Learning, Natural Language Processing, Speech Processing)
Gupta N., Tur G., Hakkani-Tur D., Bangalore S., Riccardi G. and Rahim M. The AT&T Spoken Language Understanding System (Article) IEEE Trans. on Audio, Speech and Language Processing, vol. 14, issue 1, pp. 213-22, 2006. (Tags: Machine Learning, Natural Language Processing, Speech Processing)
Coppola B., Moschitti A. and Riccardi G. Shallow Semantic Parsing for Spoken Language Understanding (Conference) 2006. (Tags: Machine Learning, Natural Language Processing, Speech Processing)
Hakkani-Tur D., Bechet F., Riccardi G. and Tur G. Beyond ASR 1-Best: Using Word Confusion Network (Article) Computer Speech and Language, vol. 20, issue 4, pp. 495-514, 2006. (Tags: Language Modeling, Speech Processing)
Liscombe J., Riccardi G. and Hakkani-Tur D. Using Context to Improve Emotion Detection in Spoken Dialog Systems (Conference) 2005. (Tags: Affective Computing, Speech Processing)
Goffin V., Allauzen C., Bocchieri E., Hakkani-Tur D., Ljolje A., Parthasarathy S., Rahim M., Riccardi G. and Saraclar M. The AT&T WATSON Speech Recognizer (Conference) 2005. (Tags: Speech Processing)
Potamianos A., Narayanan S. and Riccardi G. Adaptive Categorical Understanding for Spoken Dialogue Systems (Article) 2005. (Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing)
Hakkani-Tur D., Tur G., Riccardi G. and Kim H. K. Error Prediction in Spoken Dialog: from Signal-to-Noise Ratio to Semantic Confidence Scores (Conference) 2005. (Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing)
Bechet F., Riccardi G. and Hakkani-Tur D. Mining Spoken Dialogue Corpora for System Evaluation and Modeling (Conference) 2005. (Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing)
Karahan M., Hakkani-Tur D., Riccardi G. and Tur G. Combining Classifiers for Spoken Language Understanding (Conference) 2005. (Tags: Machine Learning, Natural Language Processing, Speech Processing)
Hakkani-Tur D., Tur G., Rahim M. and Riccardi G. Unsupervised and Active Learning in Automatic Speech Recognition for Call Classification (Conference) 2004. (Tags: Language Modeling, Machine Learning, Speech Processing)
Bechet F., Riccardi G. and Hakkani-Tur D. Multi-channel Sentence Classification for Spoken Dialogue Modeling (Conference) 2003. (Tags: Machine Learning, Speech Processing)
Hakkani-Tur D. and Riccardi G. A General Algorithm for Word Graph Decomposition (Conference) 2003. (Tags: Speech Processing)
Falavigna D., Gretter R. and Riccardi G. Acoustic and Word Lattice Based Algorithms for Confidence Scores (Conference) 2002. (Tags: Speech Processing)
Fabbrizio G., Dutton D., Gupta N., Hollister B., Rahim M., Riccardi G., Schapire R. and Schroeter J. AT&T Help Desk (Conference) 2002. (Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing)
Gorin A., Abella A., Alonso T., Riccardi G. and Wright J. Automated Natural Spoken Dialog (Article) IEEE Computer, vol. 35, no. 4, pp. 51-56, April 2002 (invited paper). (Tags: Conversational and Interactive Systems, Speech Processing)
Gokhan T., Wright J., Gorin A., Riccardi G. and Tur H. Improving Spoken Language Understanding Using Word Confusion Networks (Conference) 2002. (Tags: Machine Learning, Signal Annotation and Interpretation, Speech Processing)
Rochery M., Schapire R., Rahim M., Gupta G., Riccardi G., Bangalore S., Alshawi H. and Douglas S. Combining Prior Knowledge and Boosting for Call Classification in Spoken Language Dialogue (Conference) 2002. (Tags: Machine Learning, Signal Annotation and Interpretation, Speech Processing)
Rahim M., Riccardi G., Saul L., Wright J., Buntschuh B. and Gorin A. L. Robust Numeric Recognition in Spoken Language Dialogue (Article) Speech Communication, 34, pp. 195-212, 2001. (Tags: Machine Learning, Signal Annotation and Interpretation, Speech Processing)
Gretter R. and Riccardi G. On-line learning of language models with word error probability distributions (Conference) 2001. (Tags: Machine Learning, Speech Processing)
Rose R. C., Yao H., Riccardi G. and Wright J. H. (Article) Speech Communication, 34, pp. 321-331, 2001. (Tags: Machine Learning, Signal Annotation and Interpretation, Speech Processing)
Rahim M., Pieraccini R., Eckert W., Levin E., Di Fabbrizio G., Riccardi G., Kamm C. and Narayanan S. A Spoken Dialog System for Conference/Workshop Services (Conference) 2000. (Tags: Conversational and Interactive Systems, Speech Processing)
Petrovska-Delacretaz D., Gorin A. L., Riccardi G. and Wright J. H. Detecting Acoustic Morphemes in Lattices for Spoken Language Understanding (Conference) 2000. (Tags: Machine Learning, Signal Annotation and Interpretation, Speech Processing)
Gorin A. L., Wright J. H., Riccardi G., Abella A. and Alonso T. Semantic information processing of spoken language (Conference) 2000. (Tags: Machine Learning, Signal Annotation and Interpretation, Speech Processing)
Riccardi G. and Gorin A. L. Spoken language adaptation over time and state in a natural spoken dialog system (Article) IEEE Trans. on Speech and Audio, vol. 8, pp. 3-10, 2000. (Tags: Conversational and Interactive Systems, Language Modeling, Speech Processing)
Rahim M., Pieraccini R., Eckert W., Levin E., Di Fabbrizio G., Riccardi G., Lin C. and Kamm C. W99 - A Spoken Dialog System for the ASRU99 Workshop (Conference) 1999. (Tags: Conversational and Interactive Systems, Speech Processing)
Riccardi G., Bangalore S. and Sarin P. Learning head-dependency relations from unannotated corpora (Conference) 1999. (Tags: Conversational and Interactive Systems, Language Modeling, Speech Processing)
Conkie A., Riccardi G. and Rose R. C. Prosody recognition from speech utterances using acoustic and linguistic based models of prosodic events (Conference) 1999. (Tags: Machine Learning, Speech Processing)
Potamianos A., Riccardi G. and Narayanan S. Categorical understanding using statistical N-gram models (Conference) 1999. (Tags: Machine Learning, Signal Annotation and Interpretation, Speech Processing)
Rose R. C. and Riccardi G. Automatic speech recognition using acoustic confidence conditioned language models (Conference) 1999. (Tags: Speech Processing)
Gorin A. L. and Riccardi G. Spoken language variation over time and state in a natural spoken dialog system (Conference) 1999. (Tags: Conversational and Interactive Systems, Language Modeling, Speech Processing)
Rose R. C. and Riccardi G. Modeling dysfluency and background events in ASR for a natural language understanding task (Conference) 1999. (Tags: Conversational and Interactive Systems, Language Modeling, Speech Processing)
Arai K., Wright J. H., Riccardi G. and Gorin A. L. Grammar fragment acquisition using syntactic and semantic clustering (Article) Speech Communication, vol. 27, no. 1, Jan. 1999. (Tags: Conversational and Interactive Systems, Language Modeling, Speech Processing)
Rahim M., Riccardi G., Wright J., Buntschuh B. and Gorin A. Robust automatic speech recognition in a natural spoken dialog (Conference) 1999. (Tags: Language Modeling, Speech Processing)
Riccardi G., Potamianos A. and Narayanan S. Language model adaptation for spoken dialog systems (Conference) 1998. (Tags: Conversational and Interactive Systems, Language Modeling, Speech Processing)
Riccardi G. and Gorin A. L. Stochastic language models for speech recognition and understanding (Conference) 1998. (Tags: Conversational and Interactive Systems, Language Modeling, Speech Processing)
Rose R. C., Yao H., Riccardi G. and Wright J. Integration of utterance verification with statistical language modeling and spoken language understanding (Conference) 1998. (Tags: Conversational and Interactive Systems, Language Modeling, Speech Processing)
Riccardi G. and Bangalore S. Automatic acquisition of phrase grammars for stochastic language modeling (Conference) 1998. (Tags: Conversational and Interactive Systems, Language Modeling, Speech Processing)
Arai K., Wright J., Riccardi G. and Gorin A. Grammar fragment acquisition using syntactic and semantic clustering (Conference) Proc. Workshop Spoken Language Understanding & Communication, 1997. (Tags: Conversational and Interactive Systems, Language Modeling, Speech Processing)
Gorin A. L., Riccardi G. and Wright J. H. How may I help you? (Article) Speech Communication, vol. 23, pp. 113-127, Oct. 1997. (Tags: Conversational and Interactive Systems, Speech Processing)
Riccardi G., Gorin A. L., Ljolje A. and Riley M. A spoken language system for automated call routing (Conference) 1997. (Tags: Conversational and Interactive Systems, Speech Processing)
Rose R. C., Yao H., Riccardi G. and Wright J. Integrating multiple knowledge sources for utterance verification in a large vocabulary speech understanding system (Conference) 1997. (Tags: Machine Learning, Signal Annotation and Interpretation, Speech Processing)
Wright J. H., Gorin A. L. and Riccardi G. Automatic acquisition of salient grammar fragments for call-type classification (Conference) 1997. (Tags: Conversational and Interactive Systems, Language Modeling, Speech Processing)
Riccardi G., Pieraccini R. and Bocchieri E. Stochastic automata for language modeling (Article) Computer Speech and Language, vol. 10(4), pp. 265-293, 1996. (Tags: Conversational and Interactive Systems, Language Modeling, Speech Processing)
Bocchieri E. and Riccardi G. State tying of triphone HMM's for the 1994 AT&T ARPA ATIS recognizer (Conference) 1995. (Tags: Speech Processing)
Bocchieri E., Levin E., Pieraccini R. and Riccardi G. Understanding spontaneous speech (Article) J. of the Italian Assoc. of Artificial Intelligence, Sept. 1995. (Tags: Machine Learning, Signal Annotation and Interpretation, Speech Processing)
Riccardi G., Bocchieri E. and Pieraccini R. Non deterministic stochastic language models for speech recognition (Conference) 1995. (Tags: Conversational and Interactive Systems, Language Modeling, Speech Processing)
Mumolo E., Rebelli A. and Riccardi G. Improved multipulse algorithm for speech coding by means of adaptive Boltzmann annealing (Article) European Transactions on Telecommunications, vol. 5, no. 6, Nov. 1994. (Tags: Speech Processing)
Menardi P., Mian G. A. and Riccardi G. Dynamic bit allocation in subband coding of wideband audio with multipulse (Conference) 1994. (Tags: Speech Processing)
Mian G. A. and Riccardi G. A localization property of line spectrum pairs (Article) IEEE Trans. on Speech and Audio Proc., vol. 2, no. 4, pp. 536-539, Oct. 1994. (Tags: Speech Processing)
Bocchieri E. and Riccardi G. The 1993 AT&T ATIS system (Conference) 1994. (Tags: Language Modeling, Signal Annotation and Interpretation, Speech Processing)
Riccardi G. and Mian G. A. Analysis-by-synthesis algorithms for low bitrate coding (Conference) 1993. (Tags: Speech Processing)
Bocchieri E. and Riccardi G. Use of the forward-backward search for large vocabulary recognition with continuous observation density HMM's (Conference) 1993. (Tags: Speech Processing)
Fratti M., Mian G. A. and Riccardi G. An approach to parameter reoptimization in multipulse based coders (Article) IEEE Trans. Speech & Audio Proc., vol. 1, no. 4, pp. 463-465, Oct. 1993. (Tags: Speech Processing)
Fratti M., Mian G. A. and Riccardi G. On the effectiveness of parameter reoptimization in multipulse based coders (Conference) 1992. (Tags: Speech Processing)

2017
@conference{Singla2017,
title = {Automatic Community Creation for Abstractive Spoken Summarization},
author = {Singla, K. and Stepanov, A. E. and Bayer, A. O. and Riccardi, G. and Carenini, G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2017/10/2017_NewSum_Singla_etal.pdf},
year = {2017},
date = {2017-01-01},
publisher = {EMNLP 2017 Workshop on New Frontiers in Summarization, Copenhagen, 2017},
abstract = {Summarization of spoken conversations is a challenging task, since it requires deep understanding of dialogs. Abstractive summarization techniques rely on linking the summary sentences to sets of original conversation sentences, i.e. communities. Unfortunately, such linking information is rarely available or requires trained annotators. We propose and experiment with automatic community creation using cosine similarity on different levels of representation: raw text, WordNet SynSet IDs, and word embeddings. We show that the abstractive summarization systems with automatic communities significantly outperform previously published results on both English and Italian corpora.},
keywords = {Natural Language Processing, Speech Processing}
}
@conference{Chowdhury2017a,
title = {Functions of Silences towards Information Flow in Spoken Conversation},
author = {Chowdhury, A. S. and Stepanov, E. A. and Danieli, M. and Riccardi, G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2017/10/2017_SCNLP_Chowdhury_etal.pdf},
year = {2017},
date = {2017-01-01},
publisher = {EMNLP 2017 Workshop on Speech-Centric Natural Language Processing, Copenhagen, 2017},
abstract = {Silence is an integral part of the most frequent turn-taking phenomena in spoken conversations. Silence is sized and placed within the conversation flow and it is coordinated by the speakers along with the other speech acts. The objective of this analytical study is twofold: to explore the functions of silences with a duration of one second and above towards information flow in a dyadic conversation, utilizing the sequences of dialog acts present in the turns surrounding the silence itself; and to design a feature space useful for clustering the silences using a hierarchical concept formation algorithm. The resulting clusters are manually grouped into functional categories based on their similarities. It is observed that silence plays an important role in response preparation and can also indicate speakers’ hesitation or indecisiveness. It is also observed that long silences can sometimes be used deliberately to elicit a forced response from the other speaker, making silence a multi-functional and important catalyst of information flow.},
keywords = {Affective Computing, Speech Processing}
}
@conference{Bayer2017,
title = {Towards End-to-End Spoken Dialogue Systems},
author = {Bayer, A. O. and Stepanov, A. E. and Riccardi, G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2017/10/2017_IS_Bayer_etal.pdf},
year = {2017},
date = {2017-01-01},
publisher = {Proc. INTERSPEECH, Stockholm, 2017},
abstract = {Training task-oriented dialogue systems requires a significant amount of manual effort and the integration of many independently built components; moreover, the pipeline is prone to error propagation. End-to-end training has been proposed to overcome these problems by training the whole system over the utterances of both dialogue parties. In this paper we present an end-to-end spoken dialogue system architecture that is based on turn embeddings. Turn embeddings encode a robust representation of user turns with a local dialogue history and they are trained using sequence-to-sequence models. Turn embeddings are trained by generating the previous and the next turns of the dialogue and by additionally performing spoken language understanding. The end-to-end spoken dialogue system is trained using the pre-trained turn embeddings in a stateful architecture that considers the whole dialogue history. We observe that the proposed spoken dialogue system architecture outperforms models based on local-only dialogue history and is robust to automatic speech recognition errors.},
keywords = {Interactive Systems, Speech Processing}
}
@conference{Chowdhury2017b,
title = {A Deep Learning Approach To Modeling Competitiveness In Spoken Conversations},
author = {Chowdhury, A. S. and Riccardi, G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2017/10/2017_ICASSP_Chowdhury_Riccardi.pdf},
year = {2017},
date = {2017-01-01},
publisher = {Proc. ICASSP, New Orleans, 2017},
abstract = {The motivation behind research on overlapping speech has always been dominated by the need to model human-machine interaction for dialog systems and conversation analysis. To gain deeper insight into the interlocutors’ intentions behind the interaction, we need to understand the type of overlaps. Overlapping speech signals the interlocutor’s intention to grab the floor. This act can be competitive or non-competitive, and it either signals a problem or indicates assistance in communication. In this paper, we present a Deep Learning approach to modeling competitiveness in overlapping speech using acoustic and lexical features and their combination. We compare a fully-connected feed-forward neural network to Support Vector Machine (SVM) models on real call-center human-human conversations. We have observed that the feature-combination DNN (significantly) outperforms the SVM models, beating both the individual feature baselines and the feature combination model, by 4% and 2% respectively.},
keywords = {Affective Computing, Speech Processing}
}
@conference{Cervone2017,
title = {Roving Mind: a balancing act between open-domain and engaging dialogue systems},
author = {Cervone, A. and Tortoreto, G. and Mezza, S. and Gambi, E. and Riccardi, G.},
publisher = {1st Alexa Prize Conference, Las Vegas},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2019/11/AMZ17Conf-RovingMIndPaper.pdf},
year = {2017},
date = {2017-01-01},
keywords = {Conversational and Interactive Systems, Interactive Systems, Machine Learning, Natural Language Processing, Speech Processing}
}
2016
@conference{Riccardi2016,
title = {Discourse Connective Detection in Spoken Conversations},
author = {Riccardi, G. and Stepanov, A. E. and Chowdhury, S.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2016/11/ICASSP16-DiscourseConnective.pdf},
year = {2016},
date = {2016-11-01},
publisher = {Proc. ICASSP, Shanghai, 2016},
abstract = {Discourse parsing is an important task in Language Understanding, with applications to human-human and human-machine communication modeling. However, most of the research has focused on written text, and parsers heavily rely on syntactic parsers that themselves have low performance on dialog data. In our work, we address the problem of analyzing the semantic relations between discourse units in human-human spoken conversations. In particular, in this paper we focus on the detection of discourse connectives, which are the predicates of such relations. The discourse relations are drawn from the Penn Discourse Treebank annotation model and adapted to domain-specific Italian human-human spoken conversations. We study the relevance of lexical and acoustic context in predicting discourse connectives. We observe that both lexical and acoustic context have a mixed effect on the prediction of specific connectives. While the oracle of lexical and acoustic contextual feature combinations yields F1 = 68.53, the lexical context alone significantly outperforms the baseline by more than 10 points with F1 = 64.93.},
keywords = {Discourse, Natural Language Processing, Speech Processing}
}
title = {Predicting User Satisfaction from Turn-Taking in Spoken Conversations},
author = {Chowdhury S., Stepanov A. E. and Riccardi G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2016/11/IS16-PredictingUserSatisfactionTurnTaking.pdf},
year = {2016},
date = {2016-11-01},
publisher = {Proc. INTERSPEECH, San Francisco, 2016.},
abstract = {User satisfaction is an important aspect of the user experience while interacting with objects, systems or people. Traditionally user satisfaction is evaluated a-posteriori via spoken or written questionnaires or interviews. In automatic behavioral analysis we aim at measuring the user’s emotional states and their descriptions as they unfold during the interaction. In our approach, user satisfaction is modeled as the final state of a sequence of emotional states and given the ternary values positive, negative, neutral. In this paper, we investigate the discriminating power of turn-taking in predicting user satisfaction in spoken conversations. Turn-taking is used for discourse organization of a conversation by means of explicit phrasing, intonation, and pausing. In this paper, we train different characterizations of turn-taking, such as competitiveness of the speech overlaps. To extract turn-taking features we design a turn segmentation and labeling system that incorporates lexical and acoustic information. Given a human-human spoken dialog, our system automatically infers any of the three values of the state of user satisfaction. We evaluate the classification system on real-life call-center human-human dialogs. The comparative performance analysis shows that the contribution of the turn-taking features outperforms both prosodic and lexical features.},
keywords = {Affective Computing, Conversational and Interactive Systems, Discourse, Interactive Systems, Signal Annotation and Interpretation, Speech Processing}
}
2015
title = {Automatic Summarization of Call-Center Conversations},
author = {Stepanov E., Favre B., Alam F., Chowdhury S., Singla K., Trione J., Bechet F. and Riccardi G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2015/11/ASRU15-SpeechSummarizationDemo.pdf},
year = {2015},
date = {2015-12-13},
journal = {IEEE ASRU, Scottsdale, 2015. (Demo)},
abstract = {This paper presents the SENSEI approach to automatic summarization which represents spoken conversation in terms of factual descriptors and abstractive synopses that are useful for quality assurance supervision in call centers. We demonstrate a browser-based graphical system that automatically produces these summary descriptors and synopses.},
keywords = {Natural Language Processing, Speech Processing}
}
title = {Emotion Unfolding and Affective Scenes: A Case Study in Spoken Conversations},
author = {Danieli M., Riccardi G. and Alam F.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2015/11/ICMI15-ERM4CT.pdf},
year = {2015},
date = {2015-11-09},
journal = {Proc. ICMI Workshop on Representations and Modelling for Companion Systems, Seattle, 2015},
abstract = {The manifestation of human emotions evolves over time and space. Most of the work on affective computing research is limited to the association of context-free signal segments, such as utterances and images, to basic emotions. In this paper, we discuss the hypothesis that interpreting emotions
requires a conceptual description of their dynamics within the context of their manifestations. We describe the unfolding of emotions through the proposed affective scene framework.
Affective scenes are defined in terms of who first expresses the variation in their emotional state in a conversation, how this affects the other speaker’s emotional appraisal and response, and which modifications occur from the initial through the final state of the scene. This conceptual framework is applied and evaluated on real human-human conversations drawn from call centers. We show that the automatic classification of affective scenes achieves more than satisfactory results and benefits from acoustic, lexical and psycholinguistic features of the speech and linguistic signals.},
keywords = {Affective Computing, Speech Processing}
}
title = {Deep Semantic Encodings for Language Modeling},
author = {Bayer A. O. and Riccardi G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2015/11/IS15-SELMAutoEncoding.pdf},
year = {2015},
date = {2015-09-06},
journal = {Proc. INTERSPEECH , Dresden, 2015},
abstract = {Word error rate (WER) is not an appropriate metric for spoken language systems (SLS) because lower WER does not necessarily yield better understanding performance. Therefore, language models (LMs) that are used in SLS should be trained to jointly optimize transcription and understanding performance. Semantic LMs (SELMs) are based on the theory of frame semantics and incorporate features of frames and meaning bearing words (target words) as semantic context when training LMs.
The performance of SELMs is affected by the errors on the ASR and the semantic parser output. In this paper we address the problem of coping with such noise in the training phase of the neural network-based architecture of LMs. We propose the use of deep autoencoders for the encoding of semantic context while accounting for ASR errors. We investigate the optimization of SELMs both for transcription and understanding by using deep semantic encodings. Deep semantic encodings suppress the noise introduced by the ASR module, and enable SELMs to be optimized adequately. We assess the understanding performance by measuring the errors made on target words and we achieve 3.7% relative improvement over recurrent neural network LMs.},
keywords = {Language Modeling, Signal Annotation and Interpretation, Speech Processing}
}
title = {Open Challenges in Modelling, Analysis and Synthesis of Human Behaviour in Human–Human and Human–Machine Interactions},
author = {Vinciarelli A., Esposito A., Andre’ E., Bonin F., Chetouani M., Cohn F. J., Cristani M., Fuhrmann F., Gilmartin E., Hammal Z., Heylen D., Kaiser R., Koutsombogera M., Potamianos A., Renals S., Riccardi G., Salah A. G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2015/11/CogniComp15-ChallengesHHHM-Review.pdf},
year = {2015},
date = {2015-04-01},
journal = {Cognitive Computation, pp. 1-17, April 2015},
abstract = {Modelling, analysis and synthesis of behaviour are the subject of major efforts in computing science,
especially when it comes to technologies that make sense of human–human and human–machine interactions. This article outlines some of the most important issues that still need to be addressed to ensure substantial progress in the field, namely (1) development and adoption of virtuous data collection and sharing practices, (2) shift in the focus of interest from individuals to dyads and groups, (3) endowment of artificial agents with internal representations of users and context, (4) modelling of cognitive and semantic processes underlying social behaviour and (5) identification of application domains and strategies for moving from the laboratory to real-world products.},
keywords = {Conversational and Interactive Systems, Machine Learning, Speech Processing}
}
2014
title = {Semantic Language Models for Automatic Speech Recognition},
author = {Bayer A. O. and Riccardi G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/SLT14-SemanticSLM.pdf},
year = {2014},
date = {2014-10-01},
journal = {IEEE/ACL Workshop on Spoken Language Technology, Lake Tahoe, 2014},
abstract = {We are interested in the problem of semantics-aware training of language models (LMs) for Automatic Speech Recognition (ASR). Traditional language modeling research has ignored semantic constraints and focused on limited-size histories of words. Semantic structures may provide information to capture lexically realized long-range dependencies as well as the linguistic scene of a speech utterance. In this paper, we present a novel semantic LM (SELM) that is based on the theory of frame semantics. Frame semantics analyzes the meaning of words by considering their role in the semantic frames they occur in and by considering their syntactic properties. We show that by integrating semantic frames and target words into recurrent neural network LMs we can gain significant improvements in perplexity and word error rates. We have evaluated the semantic LM on the publicly available ASR baselines on the Wall Street Journal (WSJ) corpus. SELMs achieve 50% and 64% relative reduction in perplexity compared to n-gram models by using frames and target words respectively. In addition, 12% and 7% relative improvements in word error rates are achieved by SELMs on the Nov’92 and},
keywords = {Language Modeling, Natural Language Processing, Speech Processing}
}
title = {Comparing Stochastic Approaches to Spoken Language Understanding in Multiple Languages},
author = {Han S., Dinarelli M., Raymond C., Lefevre F., Lehnen P., De Mori R., Moschitti A., Ney H. and Riccardi G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/IEEETSLP10-MultSLU.pdf},
year = {2014},
date = {2014-01-01},
journal = {IEEE Trans. on Audio, Speech and Language Processing, vol. 19, no. 6, pp. 1569-1583, 2011},
abstract = {One of the first steps in building a spoken language understanding (SLU) module for dialogue systems is the extraction of flat concepts out of a given word sequence, usually provided by an automatic speech recognition (ASR) system. In this paper, six different modeling approaches are investigated to tackle the task of concept tagging. These methods include classical, well-known generative and discriminative methods like Finite State Transducers (FSTs), Statistical Machine Translation (SMT), Maximum Entropy Markov Models (MEMMs), or Support Vector Machines (SVMs) as well as techniques recently applied to natural language processing such as Conditional Random Fields (CRFs) or Dynamic Bayesian Networks (DBNs). Following a detailed description of the models, experimental and comparative results are presented on three corpora in different languages and with different complexity. The French MEDIA corpus has already been exploited during an evaluation campaign and so a direct comparison with existing benchmarks is possible. Recently collected Italian and Polish corpora are used to test the robustness and portability of the modeling approaches. For all tasks, manual transcriptions as well as ASR inputs are considered. In addition to single systems, methods for system combination are investigated. The best performing model on all tasks is based on conditional random fields. On the MEDIA evaluation corpus, a concept error rate of 12.6% could be achieved. Here, in addition to attribute names, attribute values have been extracted using a combination of a rule-based and a statistical approach. Applying system combination using weighted ROVER with all six systems, the concept error rate (CER) drops to 12.0%.},
keywords = {Signal Annotation and Interpretation, Speech Processing}
}
title = {The Development of the Multilingual LUNA Corpus for Spoken Language System Porting},
author = {Stepanov E., Riccardi G. and Bayer A. O.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/LREC14-MultilingualLUNACorpusPorting.pdf},
year = {2014},
date = {2014-01-01},
journal = {LREC , Reykjavik, 2014},
abstract = {The development of annotated corpora is a critical process in the development of speech applications for multiple target languages. While the technology to develop a monolingual speech application has reached satisfactory results (in terms of performance and effort), porting an existing application from a source language to a target language is still a very expensive task. In this paper we address the problem of creating multilingual aligned corpora and its evaluation in the context of a spoken language understanding (SLU) porting task. We discuss the challenges of the manual creation of multilingual corpora, as well as present the algorithms for the creation of multilingual SLU via Statistical Machine Translation (SMT).},
keywords = {Natural Language Processing, Speech Processing, Statistical Machine Translation}
}
title = {A Domain-Independent Statistical Methodology for Dialog Management in Spoken Dialog Systems},
author = {Griol D., Callejas Z., Lopez-Cozar R. and Riccardi G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/CSL14-StatisticalDialogueManager1.pdf},
year = {2014},
date = {2014-01-01},
journal = {Computer Speech and Language, to be published in 2014},
abstract = {This paper proposes a domain-independent statistical methodology to develop dialog managers for spoken dialog systems. Our methodology employs a data-driven classification procedure to generate abstract representations of system turns taking into account the previous history of the dialog. A statistical framework is also introduced for the development and evaluation of dialog systems created using the methodology, which is based on a dialog simulation technique. The benefits and flexibility of the proposed methodology have been validated by developing statistical dialog managers for four spoken dialog systems of different complexity, designed for different languages (English, Italian, and Spanish) and application domains (from transactional to problem-solving tasks). The evaluation results show that the proposed methodology allows rapid development of new dialog managers as well as to explore new dialog strategies, which permit developing new enhanced versions of already existing systems. © 2013 Elsevier Ltd. All rights reserved.},
keywords = {Conversational and Interactive Systems, Speech Processing}
}
2013
title = {On-line Adaptation of Semantic Models for Spoken Language Understanding},
author = {Bayer A. O. and Riccardi G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/ASRU13-OnLineSLUAdapt.pdf},
year = {2013},
date = {2013-01-01},
journal = {IEEE Workshop on Automatic Speech Recognition and Understanding, Olomouc, 2013},
abstract = {Spoken language understanding (SLU) systems extract semantic information from speech signals, which is usually mapped onto concept sequences. The distribution of concepts in dialogues is usually sparse. Therefore, general models may fail to model the concept distribution for a dialogue and semantic models can benefit from adaptation. In this paper, we present an instance-based approach for on-line adaptation of semantic models. We show that we can improve the performance of an SLU system on an utterance by retrieving relevant instances from the training data and using them for on-line adaptation of the semantic models. The instance-based adaptation scheme uses two different similarity metrics, edit distance and n-gram match score, on three different tokenizations: word-concept pairs, words, and concepts. We have achieved a significant improvement (6% relative) in the understanding performance by conducting re-scoring experiments on the n-best lists that the SLU outputs. We have also applied a two-level adaptation scheme, where adaptation is first applied to the automatic speech recognizer (ASR) and then to the SLU.},
keywords = {Machine Learning, Speech Processing}
}
title = {Instance-Based On-Line Language Model Adaptation},
author = {Bayer A. O. and Riccardi G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/IS13-InstanceBaseLearning.pdf},
year = {2013},
date = {2013-01-01},
journal = {INTERSPEECH, Lyon, 2013},
abstract = {Language model (LM) adaptation is needed to improve the performance of language-based interaction systems. There are two important issues regarding LM adaptation: the selection of the target data set and the mathematical adaptation model. In the literature, usually statistics are drawn from the target data set (e.g. cache model) to augment (e.g. linearly) background statistical language models, as in the case of automatic speech recognition (ASR). Such models are relatively inexpensive to train, however they do not provide the necessary high-dimensional language context description needed for language-based interaction. Instance-based learning provides high-dimensional description of the lexical, semantic, or dialog context. In this paper, we present an instance-based approach to LM adaptation. We show that by retrieving similar instances from the training data and adapting the model with these instances, we can improve the performance of LMs. We propose two different similarity metrics for instance retrieval, edit distance and n-gram match score. We have performed instance-based adaptation on feed forward neural network LMs (NNLMs) to re-score n-best lists for ASR on the LUNA corpus, which includes conversational speech. We have achieved significant improvements in word error rate (WER) by using instance-based on-line LM adaptation on feed forward NNLMs.},
keywords = {Machine Learning, Speech Processing}
}
title = {Comparative Evaluation of Argument Extraction Algorithms in Discourse Relation Parsing},
author = {Stepanov E. and Riccardi G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/IWPT13-DiscourseParsing.pdf},
year = {2013},
date = {2013-01-01},
journal = {International Conference on Parsing Technologies, Nara, 2013},
abstract = {Discourse relation parsing is an important task with the goal of understanding text beyond the sentence boundaries. One of the subtasks of discourse parsing is the extraction of argument spans of discourse relations. A relation can be either intra-sentential – to have both arguments in the same sentence – or inter-sentential – to have arguments span over different sentences. There are two approaches to the task. In the first approach the parser decision is not conditioned on whether the relation is intra- or inter-sentential. In the second approach relations are parsed separately for each class. The paper evaluates the two approaches to argument span extraction on Penn Discourse Treebank explicit relations; the problem is cast as token-level sequence labeling. We show that processing intra- and inter-sentential relations separately reduces the task complexity and significantly outperforms the single model approach.},
keywords = {Natural Language Processing, Speech Processing}
}
title = {Comparative Study of Speaker Personality Traits Recognition in Conversational and Broadcast News Speech},
author = {Alam F. and Riccardi G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/IS13-Personality.pdf},
year = {2013},
date = {2013-01-01},
journal = {INTERSPEECH, Lyon, 2013},
abstract = {Natural human-computer interaction requires, in addition to understanding what the speaker is saying, recognition of behavioral descriptors, such as speaker’s personality traits (SPTs). The complexity of this problem depends on the high variability and dimensionality of the acoustic, lexical and situational context manifestations of the SPTs. In this paper, we present a comparative study of automatic speaker personality trait recognition from speech corpora that differ in the source speaking style (broadcast news vs. conversational) and experimental context. We evaluated different feature selection algorithms such as information gain, relief and ensemble classification methods to address the high dimensionality issues. We trained and evaluated ensemble methods to leverage base learners, using three different algorithms: SMO (Sequential Minimal Optimization for Support Vector Machine), RF (Random Forest) and Adaboost. After that, we combined them using majority voting and stacking methods. Our study shows that the performance of the system greatly benefits from feature selection and ensemble methods across corpora.},
keywords = {Affective Computing, Speech Processing}
}
2012
title = {Joint Language Models for Automatic Speech Recognition and Understanding},
author = {Bayer A. O. and Riccardi G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/SLT12-NNLMSLU.pdf},
year = {2012},
date = {2012-01-01},
journal = {IEEE/ACL Workshop on Spoken Language Technology, Miami, 2012},
abstract = {Language models (LMs) are one of the main knowledge sources used by automatic speech recognition (ASR) and Spoken Language Understanding (SLU) systems. In ASR systems they are optimized to decode words from speech for a transcription task. In SLU systems they are optimized to map words into concept constructs or interpretation representations. Performance optimization is generally designed independently for ASR and SLU models in terms of word accuracy and concept accuracy respectively. However, the best word accuracy performance does not always yield the best understanding performance. In this paper we investigate how LMs originally trained to maximize word accuracy can be parametrized to account for speech understanding constraints and maximize concept accuracy. Incremental reduction in concept error rate is observed when a LM is trained on word-to-concept mappings. We show how to optimize the joint transcription and understanding task performance in the lexical-semantic relation space.},
keywords = {Machine Learning, Speech Processing}
}
title = {Discriminative Reranking for Spoken Language Understanding},
author = {Dinarelli M., Moschitti A. and Riccardi G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/IEEETSLP11-DRMSLU1.pdf},
year = {2012},
date = {2012-01-01},
journal = {IEEE Trans. on Audio, Speech and Language Processing, vol. 20, no. 2, pp. 526-539, 2012},
abstract = {Spoken language understanding (SLU) is concerned with the extraction of meaning structures from spoken utterances. Recent computational approaches to SLU, e.g., conditional random fields (CRFs), optimize local models by encoding several features, mainly based on simple n-grams. In contrast, recent works have shown that the accuracy of CRF can be significantly improved by modeling long-distance dependency features. In this paper, we propose novel approaches to encode all possible dependencies between features and most importantly among parts of the meaning structure, e.g., concepts and their combination. We rerank hypotheses generated by local models, e.g., stochastic finite state transducers (SFSTs) or CRF, with a global model. The latter encodes a very large number of dependencies (in the form of trees or sequences) by applying kernel methods to the space of all meaning (sub) structures. We performed comparative experiments between SFST, CRF, support vector machines (SVMs), and our proposed discriminative reranking models (DRMs) on representative conversational speech corpora in three different languages: the ATIS (English), the MEDIA (French), and the LUNA (Italian) corpora. These corpora have been collected within three different domain applications of increasing complexity: informational, transactional, and problem-solving tasks, respectively. The results show that our DRMs consistently outperform the state-of-the-art models based on CRF.},
keywords = {Signal Annotation and Interpretation, Speech Processing}
}
title = {Combining Machine Translation Systems for Spoken Language Understanding Portability},
author = {Garcia F., Hurtado L. F., Segarra E., Sanchis E. and Riccardi G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/SLT12-MTPortSLU.pdf},
year = {2012},
date = {2012-01-01},
journal = {IEEE/ACL Workshop on Spoken Language Technology, Miami, 2012},
abstract = {We are interested in the problem of learning Spoken Language Understanding (SLU) models for multiple target languages. Learning such models requires annotated corpora, and porting to different languages would require corpora with parallel text translation and semantic annotations. In this paper we investigate how to learn a SLU model in a target language starting from no target text and no semantic annotation. Our proposed algorithm is based on the idea of exploiting the diversity (with regard to performance and coverage) of multiple translation systems to transfer statistically stable word-to-concept mappings in the case of the Romance language pair, French and Spanish. Each translation system performs differently at the lexical level (wrt BLEU). The best translation system performances for the semantic task are gained from their combination at different stages of the portability methodology. We have evaluated the portability algorithms on the French MEDIA corpus, using French as the source language and Spanish as the target language. The experiments show the effectiveness of the proposed methods with respect to the source language SLU baseline.},
keywords = {Machine Learning, Signal Annotation and Interpretation, Speech Processing, Statistical Machine Translation}
}
2011
title = {Simultaneous Dialog Act Segmentation and Classification from Human-Human Spoken Conversations},
author = {Quarteroni S., Ivanov A. V. and Riccardi G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/ICASSP11-DASegmcClass.pdf},
year = {2011},
date = {2011-01-01},
journal = {ICASSP, Prague, 2011},
abstract = {An accurate identification of dialog acts (DAs), which represent the illocutionary aspect of communication, is essential to support the understanding of human conversations. This requires 1) the segmentation of human-human dialogs into turns, 2) the intra-turn segmentation into DA boundaries and 3) the classification of each segment according to a DA tag. This process is particularly challenging when both segmentation and tagging are automated and utterance hypotheses derive from the erroneous results of ASR. In this paper, we use Conditional Random Fields to learn models for simultaneous segmentation and labeling of DAs from whole human-human spoken dialogs. We identify the best performing lexical feature combinations on the LUNA and SWITCHBOARD human-human dialog corpora and compare performances to those of discriminative DA classifiers based on manually segmented utterances. Additionally, we assess our models’ robustness to recognition errors, showing that DA identification is robust in the presence of high word error rates.},
keywords = {Natural Language Processing, Signal Annotation and Interpretation, Speech Processing}
}
2010
title = {Combining User Intention and Error Modeling for Statistical Dialog Simulators},
author = {Quarteroni S., Gonzalez M., Riccardi G. and Varges S.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/IS10-StatistDUS.pdf},
year = {2010},
date = {2010-02-01},
journal = {INTERSPEECH, Makuhari, 2010},
abstract = {Statistical user simulation is an efficient and effective way to train and evaluate the performance of a (spoken) dialog system. In this paper, we design and evaluate a modular data-driven dialog simulator where we decouple the “intentional” component of the User Simulator from the Error Simulator representing different types of ASR/SLU noisy channel distortion. While the former is composed by a Dialog Act Model, a Concept Model and a User Model, the latter is centered around an Error Model. We test different Dialog Act Models and Error Models against a baseline dialog manager and compare results with real dialogs obtained using the same dialog manager. On the grounds of dialog act, task and concept accuracy, our results show that 1) data-driven Dialog Act Models achieve good accuracy with respect to real user behavior and 2) data-driven Error Models make task completion times and rates closer to real data.},
keywords = {Conversational and Interactive Systems, Speech Processing}
}
title = {Acoustic Correlates of Meaning Structure in Conversational Speech},
author = {Ivanov A. V., Riccardi G., Ghosh S., Tonelli S. and Stepanov E.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/IS10-AcousticSemanticCorrelates.pdf},
year = {2010},
date = {2010-01-01},
journal = {INTERSPEECH, Makuhari, 2010},
abstract = {We are interested in the problem of extracting meaning structures from spoken utterances in human communication. In Spoken Language Understanding (SLU) systems, parsing of meaning structures is carried over the word hypotheses generated by the Automatic Speech Recognizer (ASR). This approach suffers from high word error rates and ad-hoc conceptual representations. In contrast, in this paper we aim at discovering meaning components from direct measurements of acoustic and non-verbal linguistic features. The meaning structures are taken from the frame semantics model proposed in FrameNet, a consistent and extendable semantic structure resource covering a large set of domains. We give a quantitative analysis of meaning structures in terms of speech features across human–human dialogs from the manually annotated LUNA corpus. We show that the acoustic correlations between pitch, formant trajectories, intensity and harmonicity and meaning features are statistically significant over the whole corpus as well as relevant in classifying the target words evoked by a semantic frame.},
keywords = {Natural Language Processing, Speech Processing}
}
title = {Classifying Dialog Acts in Human-Human and Human-Machine Spoken Conversations},
author = {Quarteroni S. and Riccardi G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/IS10-DAClass.pdf},
year = {2010},
date = {2010-01-01},
journal = {INTERSPEECH, Makuhari, 2010},
abstract = {Dialog acts represent the illocutionary aspect of the communication; depending on the nature of the dialog and its participants, different types of dialog act occur and an accurate classification of these is essential to support the understanding of human conversations. We learn effective discriminative dialog act classifiers by studying the most predictive classification features on Human-Human and Human-Machine corpora such as LUNA and SWITCHBOARD; additionally, we assess classifier robustness to speech errors. Our results exceed the state of the art on dialog act classification from reference transcriptions on SWITCHBOARD and allow us to reach a very satisfying performance on ASR transcriptions.},
keywords = {Conversational and Interactive Systems , Speech Processing}
}
title = {Investigating Clarification Strategies in a Hybrid POMDP Dialog Manager},
author = {Varges S., Quarteroni S., Riccardi G. and Ivanov A. V.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/SIGDial10-ClariStratPOMDP.pdf},
year = {2010},
date = {2010-01-01},
journal = {SIGDial, Tokyo, 2010},
abstract = {We investigate the clarification strategies exhibited by a hybrid POMDP dialog manager based on data obtained from a phone-based user study. The dialog manager combines task structures with a number of POMDP policies each optimized for obtaining an individual concept. We investigate the relationship between dialog length and task completion. In order to measure the effectiveness of the clarification strategies, we compute concept precisions for two different mentions of the concept in the dialog: first mentions and final values after clarifications and similar strategies, and compare this to a rule-based system on the same task. We observe an improvement in concept precision of 12.1% for the hybrid POMDP compared to 5.2% for the rule-based system.},
keywords = {Conversational and Interactive Systems , Speech Processing}
}
title = {Cooperative User Models in Statistical Dialog Simulators},
author = {Gonzalez M., Quarteroni S., Riccardi G. and Varges S.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/SIGDial10-DUSCoopUser.pdf},
year = {2010},
date = {2010-01-01},
journal = {SIGDial 2010, Tokyo 2010},
abstract = {Statistical user simulation is a promising methodology to train and evaluate the performance of (spoken) dialog systems. We work with a modular architecture for data-driven simulation where the “intentional” component of user simulation includes a User Model representing user-specific features. We train a dialog simulator that combines traits of human behavior such as cooperativeness and context with domain-related aspects via the Expectation-Maximization algorithm. We show that cooperativeness provides a finer representation of the dialog context which directly affects task completion rate.},
keywords = {Conversational and Interactive Systems , Speech Processing}
}
title = {The LUNA Spoken Dialogue System: Beyond Utterance Classification},
author = {Dinarelli M., Stepanov E., Varges S. and Riccardi G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/ICASSP10-SDSBeyondUttClass.pdf},
year = {2010},
date = {2010-01-01},
journal = {ICASSP, Dallas, 2010},
abstract = {We present a call routing application for complex problem solving tasks. To date, work on call routing has mainly dealt with call-type classification. In this paper we take call routing further: initial call classification is done in parallel with a robust statistical Spoken Language Understanding module. This is followed by a dialogue to elicit further task-relevant details from the user before passing on the call. The dialogue capability also allows us to obtain clarifications of the initial classifier guess. Based on an evaluation, we show that conducting a dialogue significantly improves upon call routing based on call classification alone. We present both subjective and objective evaluation results of the system according to standard metrics on real users.},
keywords = {Conversational and Interactive Systems , Speech Processing}
}
2009
title = {A Statistical Dialog Manager for the LUNA Project},
author = {Griol D., Riccardi G. and Sanchis E.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/IS09-DM.pdf},
year = {2009},
date = {2009-01-01},
journal = {INTERSPEECH, Brighton, 2009},
abstract = {In this paper, we present an approach for the development of a statistical dialog manager, in which the next system response is selected by means of a classification process that considers the whole previous history of the dialog. In particular, we use decision trees for its implementation. The statistical model is automatically learned from training data which are labeled in terms of different SLU features. This methodology has been applied to develop a dialog manager within the framework of the European LUNA project, whose main goal is the creation of a robust natural spoken language understanding system. We present an evaluation of this approach for both human-machine and human-human conversations acquired in this project. We demonstrate that a statistical dialog manager developed with the proposed technique and learned from a corpus of human-machine dialogs can successfully infer the task-related topics present in spontaneous human-human dialogs.},
keywords = {Conversational and Interactive Systems , Speech Processing}
}
title = {The Exploration/Exploitation Trade-Off in Reinforcement Learning for Dialogue Management},
author = {Varges S., Riccardi G., Quarteroni S., Ivanov A. V.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/ASRU09-ExploitationExplorationTradeoffSDS.pdf},
year = {2009},
date = {2009-01-01},
journal = {IEEE Workshop on Automatic Speech Recognition and Understanding, Merano, 2009.},
abstract = {Conversational systems use deterministic rules that trigger actions such as requests for confirmation or clarification. More recently, Reinforcement Learning and (Partially Observable) Markov Decision Processes have been proposed for this task. In this paper, we investigate action selection strategies for dialogue management, in particular the exploration/exploitation trade-off and its impact on final reward (i.e. the session reward after optimization has ended) and lifetime reward (i.e. the overall reward accumulated over the learner’s lifetime). We propose to use interleaved exploitation sessions as a learning methodology to assess the reward obtained from the current policy. The experiments show a statistically significant difference in final reward of exploitation-only sessions between a system that optimizes lifetime reward and one that maximizes the reward of the final policy.},
keywords = {Conversational and Interactive Systems , Machine Learning, Speech Processing}
}
title = {Leveraging POMDPs trained with User Simulations and Rule-Based Dialog Management in a SDS},
author = {Varges S., Riccardi G., Quarteroni S., Ivanov A. V. and Roberti P.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/SIGDial09-demo.pdf},
year = {2009},
date = {2009-01-01},
journal = {SIGDIAL, Demo Session, London, 2009},
abstract = {We have developed a complete spoken dialogue framework that includes rule-based and trainable dialogue managers, speech recognition, spoken language understanding and generation modules, and a comprehensive web visualization interface. We present a spoken dialogue system based on Reinforcement Learning that goes beyond standard rule-based models and computes on-line decisions of the best dialogue moves. Bridging the gap between handcrafted (e.g. rule-based) and adaptive (e.g. based on Partially Observable Markov Decision Processes - POMDP) dialogue models, this prototype is able to learn high-rewarding policies in a number of dialogue situations.},
keywords = {Conversational and Interactive Systems , Machine Learning, Speech Processing}
}
title = {Learning the Structure of Human-Computer and Human-Human Spoken Conversations},
author = {Griol D., Riccardi G. and Sanchis E.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/IS09-HHConvStructure.pdf},
year = {2009},
date = {2009-01-01},
journal = {INTERSPEECH, Brighton, 2009},
abstract = {We are interested in the problem of understanding human conversation structure in the context of human-machine and human-human interaction. We present a statistical methodology for detecting the structure of spoken dialogs based on a generative model learned using decision trees. To evaluate our approach we have used the LUNA corpora, collected from real users engaged in problem solving tasks. The results of the evaluation show that automatic segmentation of spoken dialogs is very effective not only with models built separately from human-machine or human-human dialogs, but that it is also possible to infer the task-related structure of human-human dialogs with a model learned using only human-machine dialogs.},
keywords = {Conversational and Interactive Systems , Machine Learning, Speech Processing}
}
title = {Can Machines Call People? User Experience While Answering Telephone Calls Initiated by Machine},
author = {Sporka A. J., Franc J. and Riccardi G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/IS09-RR1.pdf},
year = {2009},
date = {2009-01-01},
journal = {CHI, Boston, 2009},
keywords = {Conversational and Interactive Systems , Machine Learning, Speech Processing}
}
title = {On-Line Strategy Computation in Spoken Dialog Systems},
author = {Varges S., Riccardi G., Quarteroni S., Ivanov A. V. and Roberti P.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/ICASSP09-POMDPs.pdf},
year = {2009},
date = {2009-01-01},
journal = {ICASSP, Demo Session, Singapore, 2009. VIDEO},
keywords = {Conversational and Interactive Systems , Machine Learning, Speech Processing}
}
title = {The Voice Multimodal Application Framework},
author = {Riccardi G., Mosca N., Roberti P. and Baggia P.},
year = {2009},
date = {2009-01-01},
journal = {AVIOS, San Diego, 2009. VIDEO},
keywords = {Conversational and Interactive Systems , Machine Learning, Speech Processing}
}
title = {Spoken Dialog Systems: From Theory to Technology},
author = {Riccardi G., Baggia P. and Roberti P.},
year = {2009},
date = {2009-01-01},
journal = {Proc. Work. Toni Mian, Padua, 2007},
keywords = {Conversational and Interactive Systems , Machine Learning, Speech Processing}
}
title = {Ontology-Based Grounding of Spoken Language Understanding},
author = {Quarteroni S., Dinarelli M. and Riccardi G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/ASRU09-OntologyGrounding.pdf},
year = {2009},
date = {2009-01-01},
journal = {IEEE Workshop on Automatic Speech Recognition and Understanding, Merano, 2009},
abstract = {Current Spoken Language Understanding models rely on either hand-written semantic grammars or flat attribute-value sequence labeling. In most cases, no relations between concepts are modeled, and both concepts and relations are domain-specific, making it difficult to expand or port the domain model. In contrast, we expand our previous work on a domain model based on an ontology where concepts follow the predicate-argument semantics and domain-independent classical relations are defined on such concepts. We conduct a thorough study on a spoken dialog corpus collected within a customer care problem-solving domain, and we evaluate the coverage and impact of the ontology for the interpretation, grounding and},
keywords = {Machine Learning, Natural Language Processing, Speech Processing}
}
title = {What's in an Ontology for Spoken Language Understanding},
author = {Quarteroni S., Riccardi G. and Dinarelli M.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/IS09-Ontology.pdf},
year = {2009},
date = {2009-01-01},
journal = {INTERSPEECH, Brighton, 2009},
abstract = {Current Spoken Language Understanding systems rely either on hand-written semantic grammars or on flat attribute-value sequence labeling. In both approaches, concepts and their relations (when modeled at all) are domain-specific, thus making it difficult to expand or port the domain model. To address this issue, we introduce: 1) a domain model based on an ontology where concepts are classified into either predicative or argumentative; 2) the modeling of relations between such concept classes in terms of classical relations as defined in lexical semantics. We study and analyze our approach on a corpus of customer care data, where we evaluate the coverage and relevance of the ontology for the interpretation of speech utterances.},
keywords = {Machine Learning, Natural Language Processing, Speech Processing}
}
title = {Concept Segmentation and Labeling for Conversational Speech},
author = {Dinarelli M., Moschitti A. and Riccardi G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/IS09-RR.pdf},
year = {2009},
date = {2009-01-01},
journal = {INTERSPEECH, Brighton, 2009},
abstract = {Spoken Language Understanding performs automatic concept labeling and segmentation of speech utterances. For this task, many approaches have been proposed based on both generative and discriminative models. While all these methods have shown remarkable accuracy on manual transcriptions of spoken utterances, robustness to noisy automatic transcriptions is still an open issue. In this paper we study algorithms for Spoken Language Understanding combining complementary learning models: Stochastic Finite State Transducers produce a list of hypotheses, which are re-ranked using a discriminative algorithm based on kernel methods. Our experiments on two different spoken dialog corpora, MEDIA and LUNA, show that the combined generative-discriminative model reaches the state of the art, such as Conditional Random Fields (CRFs), on manual transcriptions, and is robust to noisy automatic transcriptions, outperforming, in some cases, the state of the art.},
keywords = {Machine Learning, Natural Language Processing, Speech Processing}
}
title = {Re-Ranking Models Based on Small Training Data for Spoken Language Understanding},
author = {Dinarelli M., Moschitti A. and Riccardi G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/IS09-RR3.pdf},
year = {2009},
date = {2009-01-01},
journal = {EMNLP, Singapore, 2009},
abstract = {Spoken Language Understanding performs automatic concept labeling and segmentation of speech utterances. For this task, many approaches have been proposed based on both generative and discriminative models. While all these methods have shown remarkable accuracy on manual transcriptions of spoken utterances, robustness to noisy automatic transcriptions is still an open issue. In this paper we study algorithms for Spoken Language Understanding combining complementary learning models: Stochastic Finite State Transducers produce a list of hypotheses, which are re-ranked using a discriminative algorithm based on kernel methods. Our experiments on two different spoken dialog corpora, MEDIA and LUNA, show that the combined generative-discriminative model reaches the state of the art, such as Conditional Random Fields (CRFs), on manual transcriptions, and is robust to noisy automatic transcriptions, outperforming, in some cases, the state of the art.},
keywords = {Machine Learning, Natural Language Processing, Speech Processing}
}
title = {Re-Ranking Models For Spoken Language Understanding},
author = {Dinarelli M., Moschitti A. and Riccardi G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/EACL09-RR.pdf},
year = {2009},
date = {2009-01-01},
journal = {EACL Conference, Athens, 2009},
abstract = {Spoken Language Understanding aims at mapping a natural language spoken sentence into a semantic representation. In the last decade two main approaches have been pursued: generative and discriminative models. The former is more robust to overfitting, whereas the latter is more robust to many irrelevant features. Additionally, the way in which these approaches encode prior knowledge is very different, and their relative performance changes based on the task. In this paper we describe a machine learning framework where both models are used: a generative model produces a list of ranked hypotheses, whereas a discriminative model based on structure kernels and Support Vector Machines re-ranks such a list. We tested our approach on the MEDIA corpus (human-machine dialogs) and on a new corpus (human-machine and human-human dialogs) produced in the European LUNA project. The results show a large improvement on the state of the art in concept segmentation and labeling.},
keywords = {Machine Learning, Natural Language Processing, Speech Processing}
}
2008
title = {Persistent Information State in a Data-Centric Architecture},
author = {Varges S., Riccardi G. and Quarteroni S.},
year = {2008},
date = {2008-01-01},
journal = {SIGdial Workshop on Discourse and Dialogue, Columbus, 2008},
keywords = {Conversational and Interactive Systems , Machine Learning, Speech Processing}
}
title = {Spoken Language Understanding},
author = {De Mori R., Bechet F., Hakkani-Tur D., McTear M., Riccardi G. and Tur G.},
year = {2008},
date = {2008-01-01},
journal = {IEEE Signal Processing Magazine vol. 25, pp.50-58 ,2008},
keywords = {Machine Learning, Natural Language Processing, Speech Processing}
}
title = {Joint Generative And Discriminative Models For Spoken Language Understanding},
author = {Dinarelli M., Moschitti A., Riccardi G.},
year = {2008},
date = {2008-01-01},
journal = {IEEE/ACL Workshop on Spoken Language Technology, Goa, 2008},
keywords = {Machine Learning, Natural Language Processing, Speech Processing}
}
title = {Learning with Noisy Supervision for Spoken Language Understanding},
author = {Raymond C. and Riccardi G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/ICASSP08-NoisySupervisionSLU.pdf},
year = {2008},
date = {2008-01-01},
journal = {Proc. IEEE ICASSP, Las Vegas,2008},
abstract = {Data-driven Spoken Language Understanding (SLU) systems need semantically annotated data, which are expensive, time-consuming and prone to human errors. Active learning has been successfully applied to automatic speech recognition and utterance classification. In general, corpus annotation for SLU involves such tasks as sentence segmentation, chunking or frame labeling and predicate-argument annotation. In such cases human annotations are subject to errors that increase with the annotation complexity. We investigate two alternative noise-robust active learning strategies that are either data-intensive or supervision-intensive. The strategies detect likely erroneous examples and significantly improve SLU performance for a given labeling cost. We apply uncertainty-based active learning with conditional random fields on the concept segmentation task for SLU. We perform annotation experiments on two databases, namely ATIS (English) and Media (French). We show that our noise-robust algorithm can improve accuracy by up to 6% (absolute) depending on the noise level and the labeling cost.},
keywords = {Machine Learning, Natural Language Processing, Speech Processing}
}
2007
title = {A Data-Centric Architecture for Data-Driven Spoken Dialog Systems},
author = {Varges S. and Riccardi G.},
year = {2007},
date = {2007-01-01},
journal = {IEEE Workshop on Automatic Speech Recognition and Understanding, Kyoto, 2007},
keywords = {Conversational and Interactive Systems , Machine Learning, Speech Processing}
}
title = {Spoken Language Understanding with Kernels for Syntactic/Semantic Structures},
author = {Moschitti A., Riccardi G. and Raymond C.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/ASRU07-SLUKernels.pdf},
year = {2007},
date = {2007-01-01},
journal = {IEEE Workshop on Automatic Speech Recognition and Understanding, Kyoto, 2007},
abstract = {Automatic concept segmentation and labeling are the fundamental problems of Spoken Language Understanding in dialog systems. Such tasks are usually approached by using generative or discriminative models based on n-grams. As the uncertainty or ambiguity of the spoken input to a dialog system increases, we expect to need dependencies beyond n-gram statistics. In this paper, a general-purpose statistical syntactic parser is used to detect syntactic/semantic dependencies between concepts in order to increase the accuracy of sentence segmentation and concept labeling. The main novelty of the approach is the use of new tree kernel functions which encode syntactic/semantic structures in discriminative learning models. We experimented with Support Vector Machines and the above kernels on the standard ATIS dataset. The proposed algorithm automatically parses natural language text with an off-the-shelf statistical parser and labels the syntactic (sub)trees with concept labels. The results show that the proposed model is very accurate and competitive with respect to state-of-the-art models when combined with n-gram based models.},
keywords = {Machine Learning, Natural Language Processing, Speech Processing}
}
title = {Generative and Discriminative Algorithms for Spoken Language Understanding},
author = {Raymond C., Riccardi G.},
year = {2007},
date = {2007-01-01},
journal = {INTERSPEECH, Antwerp, 2007},
keywords = {Machine Learning, Natural Language Processing, Speech Processing}
}
2006
title = {Spoken Dialog Systems: From Theory to Technology},
author = {Riccardi G. and Baggia P.},
year = {2006},
date = {2006-01-01},
journal = {Edizione della Normale di Pisa, 2006},
keywords = {Conversational and Interactive Systems , Machine Learning, Speech Processing}
}
title = {An Active Approach to Spoken Language Processing},
author = {Hakkani-Tur D., Riccardi G. and Tur G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/acm-tslp-06.pdf},
year = {2006},
date = {2006-01-01},
journal = {ACM Transactions on Speech and Language Processing, Vol. 3, No. 3, pp 1-31, 2006},
abstract = {State-of-the-art data-driven speech and language processing systems require a large amount of human intervention ranging from data annotation to system prototyping. In the traditional supervised passive approach, the system is trained on a given number of annotated data samples and evaluated using a separate test set. Then more data is collected arbitrarily, annotated, and the whole cycle is repeated. In this article, we propose the active approach, where the system itself selects its own training data, evaluates itself and re-trains when necessary. We first employ active learning, which aims to automatically select the examples that are likely to be the most informative for a given task. We use active learning both for selecting the examples to label and the examples to re-label in order to correct labeling errors. Furthermore, the system automatically evaluates itself using active evaluation to keep track of unexpected events and decides on demand to label more examples. The active approach enables dynamic adaptation of spoken language processing systems to unseen or unexpected events for non-stationary input while significantly reducing the manual annotation effort. We have evaluated the active approach with the AT&T spoken dialog system used for customer care applications. In this article, we present our results for both automatic speech recognition and spoken language understanding.},
keywords = {Machine Learning, Natural Language Processing, Speech Processing}
}
title = {The AT&T Spoken Language Understanding System},
author = {Gupta N., Tur G., Hakkani-Tur D., Bangalore S., Riccardi G. and Rahim M.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/IEEE-SAP-2005-SLU.pdf},
year = {2006},
date = {2006-01-01},
journal = {IEEE Trans. on Audio, Speech and Language Processing, volume 14, Issue 1, pp. 213-22, 2006},
abstract = {Spoken language understanding (SLU) aims at extracting meaning from natural language speech. Over the past decade, a variety of practical goal-oriented spoken dialog systems have been built for limited domains. SLU in these systems ranges from understanding predetermined phrases through fixed grammars, extracting some predefined named entities, extracting users’ intents for call classification, to combinations of users’ intents and named entities. In this paper, we present the SLU system of VoiceTone® (a service through which AT&T develops, deploys and hosts spoken dialog applications for enterprise customers). The SLU system includes extracting both intents and named entities from the users’ utterances. For intent determination, we use statistical classifiers trained from labeled data, and for named entity extraction we use rule-based fixed grammars. The focus of our work is to exploit data and to use machine learning techniques to create scalable SLU systems which can be quickly deployed for new domains with minimal human intervention. These objectives are achieved by 1) using the predicate-argument representation of the semantic content of an utterance; 2) extending statistical classifiers to seamlessly integrate hand-crafted classification rules with the rules learned from data; and 3) developing an active learning framework to minimize the human labeling effort for quickly building the classifier models and adapting them to changes. We present an evaluation of this system using two deployed applications of VoiceTone.},
keywords = {Machine Learning, Natural Language Processing, Speech Processing}
}
title = {Shallow Semantic Parsing for Spoken Language Understanding},
author = {Coppola B., Moschitti A., Riccardi G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/NAACL09-ShallowSemanticParsingFrameNet.pdf},
year = {2006},
date = {2006-01-01},
journal = {NAACL, Boulder, Colorado, 2009},
abstract = {Most Spoken Dialog Systems are based on speech grammars and frame/slot semantics. The semantic descriptions of input utterances are usually defined ad-hoc with no ability to generalize beyond the target application domain or to learn from annotated corpora. The approach we propose in this paper exploits machine learning of frame semantics, borrowing its theoretical model from computational linguistics. While traditional automatic Semantic Role Labeling approaches on written texts may not perform as well on spoken dialogs, we show successful experiments on such porting. Hence, we design and evaluate automatic FrameNet-based parsers both for English written texts and for Italian dialog utterances. The results show that disfluencies of dialog data do not severely hurt performance. Also, a small set of FrameNet-like manual annotations is enough for realizing accurate Semantic Role Labeling on the target domains of typical Dialog Systems.},
keywords = {Machine Learning, Natural Language Processing, Speech Processing}
}
title = {Beyond ASR 1-Best: Using Word Confusion Network},
author = {Hakkani-Tur D., Bechet F., Riccardi G. and Tur G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/CSL-pivot-slu.pdf},
year = {2006},
date = {2006-01-01},
journal = {Computer Speech and Language, volume 20, Issue 4, pp. 495-514, 2006},
abstract = {We are interested in the problem of robust understanding from noisy spontaneous speech input. With the advances in automated speech recognition (ASR), there has been increasing interest in spoken language understanding (SLU). A challenge in large vocabulary spoken language understanding is robustness to ASR errors. State-of-the-art spoken language understanding relies on the best ASR hypotheses (ASR 1-best). In this paper, we propose methods for a tighter integration of ASR and SLU using word confusion networks (WCNs). WCNs obtained from ASR word graphs (lattices) provide a compact representation of multiple aligned ASR hypotheses along with word confidence scores, without compromising recognition accuracy. We present our work on exploiting WCNs instead of simply using ASR one-best hypotheses. In this work, we focus on the tasks of named entity detection and extraction and call classification in a spoken dialog system, although the idea is more general and applicable to other spoken language processing tasks. For named entity detection, we have improved the F-measure by using both word lattices and WCNs, 6–10% absolute. The processing of WCNs was 25 times faster than lattices, which is very important for real-life applications. For call classification, we have shown between 5% and 10% relative reduction in error rate using WCNs compared to ASR 1-best output.},
keywords = {Language Modeling, Speech Processing}
}
2005
title = {Using Context to Improve Emotion Detection in Spoken Dialog Systems},
author = {Liscombe J., Riccardi G. and Hakkani-Tur D.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/IS05-EmotRecog.pdf},
year = {2005},
date = {2005-01-01},
journal = {INTERSPEECH, Lisbon, Sept. 2005},
abstract = {Most research that explores the emotional state of users of spoken dialog systems does not fully utilize the contextual nature that the dialog structure provides. This paper reports results of machine learning experiments designed to automatically classify the emotional state of user turns using a corpus of 5,690 dialogs collected with the “How May I Help You?(SM)” spoken dialog system. We show that augmenting standard lexical and prosodic features with contextual features that exploit the structure of spoken dialog and track user state increases classification accuracy by 2.6%.},
keywords = {Affective Computing, Speech Processing}
}
title = {The AT&T WATSON Speech Recognizer},
author = {Goffin V., Allauzen C., Bocchieri E., Hakkani-Tur D., Ljolje A., Parthasarathy S., Rahim M., Riccardi G. and Saraclar M.},
year = {2005},
date = {2005-01-01},
journal = {IEEE ICASSP, Philadelphia, March 2005},
keywords = {Speech Processing}
}
title = {Adaptive Categorical Understanding for Spoken Dialogue Systems},
author = {Potamianos A., Narayanan S. and Riccardi G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/ieee_adapt-categ-05.pdf},
year = {2005},
date = {2005-01-01},
journal = {IEEE Trans. on Speech and Audio, vol. 13, no. 3, pp. 321-329, 2005},
keywords = {Conversational and Interactive Systems , Machine Learning, Speech Processing}
}
title = {Error Prediction in Spoken Dialog: from Signal-to-Noise Ratio to Semantic Confidence Scores},
author = {Hakkani-Tur D., Tur G., Riccardi G. and Kim H. K.},
year = {2005},
date = {2005-01-01},
journal = {IEEE ICASSP, Philadelphia, March 2005},
keywords = {Conversational and Interactive Systems , Machine Learning, Speech Processing}
}
title = {Mining Spoken Dialogue Corpora for System Evaluation and Modeling},
author = {Bechet F., Riccardi G. and Hakkani-Tur D.},
year = {2005},
date = {2005-01-01},
journal = {EMNLP Conference, Barcelona, 2004},
keywords = {Conversational and Interactive Systems , Machine Learning, Speech Processing}
}
title = {Combining Classifiers for Spoken Language Understanding},
author = {Karahan M., Hakkani-Tur D., Riccardi G. and Tur G.},
year = {2005},
date = {2005-01-01},
journal = {IEEE ASRU, U.S. Virgin Islands, Dec. 2003},
keywords = {Machine Learning, Natural Language Processing, Speech Processing}
}
2004
title = {Unsupervised and Active Learning in Automatic Speech Recognition for Call Classification},
author = {Hakkani-Tur D., Tur G., Rahim M. and Riccardi G.},
year = {2004},
date = {2004-01-01},
journal = {ICASSP, Montreal, May 2004},
keywords = {Language Modeling, Machine Learning, Speech Processing}
}
2003
title = {Multi-channel Sentence Classification for Spoken Dialogue Modeling},
author = {Bechet F., Riccardi G. and Hakkani-Tur D.},
year = {2003},
date = {2003-01-01},
journal = {EUROSPEECH, Geneve, Switzerland, Sept. 2003},
keywords = {Machine Learning, Speech Processing}
}
title = {A General Algorithm for Word Graph Decomposition},
author = {Hakkani-Tur D. and Riccardi G.},
year = {2003},
date = {2003-01-01},
journal = {IEEE ICASSP, Hong Kong, 2003},
keywords = {Speech Processing}
}
2002
title = {Acoustic and Word Lattice Based Algorithms for Confidence Scores},
author = {Falavigna D., Gretter R. and Riccardi G.},
year = {2002},
date = {2002-01-01},
journal = {Proc. ICSLP, Denver, 2002},
keywords = {Speech Processing}
}
title = {AT&T Help Desk},
author = {Fabbrizio G., Dutton D., Gupta N., Hollister B., Rahim M., Riccardi G., Schapire R. and Schroeter J.},
year = {2002},
date = {2002-01-01},
journal = {Proc. ICSLP, Denver, 2002},
keywords = {Conversational and Interactive Systems , Machine Learning, Speech Processing}
}
title = {Automated Natural Spoken Dialog},
author = {Gorin A., Abella A., Alonso T., Riccardi G. and Wright J.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/computer_magazine_2002.pdf},
year = {2002},
date = {2002-01-01},
journal = {IEEE Computer, vol. 35, n.4, pp. 51-56, April, 2002 (invited paper)},
abstract = {Engineers have long sought to design systems that understand and act upon spoken language. Extracting meaning from natural, unconstrained speech over the telephone is technically challenging, and quantifying semantic content is crucial for engineering and evaluating such systems.},
keywords = {Conversational and Interactive Systems , Speech Processing}
}
title = {Improving Spoken Language Understanding Using Word Confusion Networks},
author = {Tur G., Wright J., Gorin A., Riccardi G. and Hakkani-Tur D.},
year = {2002},
date = {2002-01-01},
journal = {ICSLP, Denver, 2002},
keywords = {Machine Learning, Signal Annotation and Interpretation, Speech Processing}
}
title = {Combining Prior Knowledge and Boosting for Call Classification in Spoken Language Dialogue},
author = {Rochery M., Schapire R., Rahim M., Gupta G., Riccardi G., Bangalore S., Alshawi H. and Douglas S.},
year = {2002},
date = {2002-01-01},
journal = {Proc. IEEE ICASSP, Orlando, 2002},
keywords = {Machine Learning, Signal Annotation and Interpretation, Speech Processing}
}
2001
title = {Robust Numeric Recognition in Spoken Language Dialogue},
author = {Rahim M., Riccardi G., Saul L., Wright J., Buntschuh B. and Gorin A. L.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/numericlang-speechcomm-2001.pdf},
year = {2001},
date = {2001-11-01},
journal = {Speech Communication, 34, pp. 195-212, 2001},
abstract = {This paper addresses the problem of automatic numeric recognition and understanding in spoken language dialogue. We show that accurate numeric understanding in fluent unconstrained speech demands maintaining robustness at several different levels of system design, including acoustic, language, understanding and dialogue. We describe a robust system for numeric recognition and present algorithms for feature extraction, acoustic and language modeling, discriminative training, utterance verification and numeric understanding and validation. Experimental results from a field-trial of a spoken dialogue system are presented that include customers\' responses to credit card and telephone number requests. © 2001 Elsevier Science B.V. All rights reserved.},
keywords = {Machine Learning, Signal Annotation and Interpretation, Speech Processing}
}
title = {On-line learning of language models with word error probability distributions},
author = {Gretter R. and Riccardi G.},
year = {2001},
date = {2001-05-07},
journal = {Proc. IEEE ICASSP 2001, Salt Lake City, Utah, 7-11 May 2001},
keywords = {Machine Learning, Speech Processing}
}
title = {Integration of Utterance Verification with Statistical Language Modeling and Spoken Language Understanding},
author = {Rose R. C., Yao H., Riccardi G. and Wright J. H.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/uv-speechcomm-2001.pdf},
year = {2001},
date = {2001-01-01},
journal = {Speech Communication, 34, pp. 321-331, 2001},
abstract = {Methods for utterance verification (UV) and their integration into statistical language modeling and understanding formalisms for a large vocabulary spoken understanding system are presented. The paper consists of three parts. First, a set of acoustic likelihood ratio (LR) based UV techniques are described and applied to the problem of rejecting portions of a hypothesized word string that may have been incorrectly decoded by a large vocabulary continuous speech recognizer. Second, a procedure for integrating the acoustic level confidence measures with the statistical language model is described. Finally, the effect of integrating acoustic level confidence into the spoken language understanding unit (SLU) in a call-type classification task is discussed. These techniques were evaluated on utterances collected from a highly unconstrained call routing task performed over the telephone network. They have been evaluated in terms of their ability to classify utterances into a set of 15 call-types that are accepted by the application. © 2001 Elsevier Science B.V. All rights reserved.},
keywords = {Machine Learning, Signal Annotation and Interpretation, Speech Processing}
}
2000
title = {A Spoken Dialog System for Conference/Workshop Services},
author = {Rahim M., Pieraccini R., Eckert W., Levin E., Di Fabbrizio G., Riccardi G., Kamm C., Narayanan S.},
year = {2000},
date = {2000-10-01},
journal = {Proc. ICSLP, Beijing, Oct. 2000},
keywords = {Conversational and Interactive Systems , Speech Processing}
}
title = {Detecting Acoustic Morphemes in Lattices for Spoken Language Understanding},
author = {Petrovska-Delacretaz D., Gorin A. L., Riccardi G. and Wright J. H.},
year = {2000},
date = {2000-10-01},
journal = {Proc. ICSLP, Beijing, Oct. 2000},
keywords = {Machine Learning, Signal Annotation and Interpretation, Speech Processing}
}
title = {Semantic information processing of spoken language},
author = {Gorin A. L., Wright J. H., Riccardi G., Abella A. and Alonso T.},
year = {2000},
date = {2000-10-01},
journal = {ATR Workshop on Multi-Lingual Speech Communication, Oct. 2000},
keywords = {Machine Learning, Signal Annotation and Interpretation, Speech Processing}
}
title = {Spoken language adaptation over time and state in a natural spoken dialog system},
author = {Riccardi G. and Gorin A. L.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/IEEETSLP00-LMAdapt.pdf},
year = {2000},
date = {2000-01-01},
journal = {IEEE Trans. on Speech and Audio, vol. 8, pp. 3-10, 2000},
abstract = {We are interested in adaptive spoken dialog systems for automated services. Peoples’ spoken language usage varies over time for a given task, and furthermore varies depending on the state of the dialog. Thus, it is crucial to adapt automatic speech recognition (ASR) language models to these varying conditions. We characterize and quantify these variations based on a database of 30K user-transactions with AT&T’s experimental How May I Help You? spoken dialog system. We describe a novel adaptation algorithm for language models with time and dialog-state varying parameters. Our language adaptation framework allows for recognizing and understanding unconstrained speech at each stage of the dialog, enabling context-switching and error recovery. These models have been used to train state-dependent ASR language models. We have evaluated their performance with respect to word accuracy and perplexity over time and dialog states. We have achieved a reduction of 40% in perplexity and of 8.4% in word error rate over the baseline system, averaged across all dialog states.},
keywords = {Conversational and Interactive Systems , Language Modeling, Speech Processing}
}
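In its simplest form, the state-dependent language-model adaptation summarized in the abstract above amounts to interpolating a dialog-state-specific model with a global one. The vocabularies, probabilities, and the interpolation weight below are invented for illustration and are not from the paper:

```python
# Sketch of state-conditioned language-model interpolation:
#   P_state(w) = lam * P(w | dialog state) + (1 - lam) * P_global(w)
# All numbers here are illustrative toy unigram probabilities.

global_lm = {"yes": 0.2, "no": 0.2, "operator": 0.1, "billing": 0.1}
state_lms = {
    # In a confirmation state, yes/no answers dominate.
    "confirmation": {"yes": 0.5, "no": 0.4},
}

def p_word(word, state, lam=0.7):
    """Interpolate the state-specific estimate with the global model.

    Falling back to the global model (p_state = 0) keeps unconstrained
    speech recognizable in every state, enabling context switching.
    """
    p_state = state_lms.get(state, {}).get(word, 0.0)
    p_glob = global_lm.get(word, 0.0)
    return lam * p_state + (1 - lam) * p_glob
```

The global term is what lets the recognizer still accept out-of-state requests (e.g. "operator" during confirmation), which is the error-recovery property the abstract highlights.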
1999
title = {W99- A Spoken Dialog System for the ASRU99 Workshop},
author = {Rahim M., Pieraccini R., Eckert W., Levin E., Di Fabbrizio G., Riccardi G., Lin C., Kamm C.},
year = {1999},
date = {1999-12-01},
journal = {Proc. IEEE ASRU, Keystone, Colorado, Dec. 1999},
keywords = {Conversational and Interactive Systems , Speech Processing}
}
title = {Learning head-dependency relations from unannotated corpora},
author = {Riccardi G., Bangalore S. and Sarin P.},
year = {1999},
date = {1999-12-01},
journal = {Proc. IEEE ASRU, Keystone, Colorado, Dec. 1999},
keywords = {Conversational and Interactive Systems , Language Modeling, Speech Processing}
}
title = {Prosody recognition from speech utterances using acoustic and linguistic based models of prosodic events},
author = {Conkie A., Riccardi G. and Rose R. C.},
year = {1999},
date = {1999-09-01},
journal = {Proc. EUROSPEECH, Budapest, Hungary, Sept. 1999},
keywords = {Machine Learning, Speech Processing}
}
title = {Categorical understanding using statistical N-gram models},
author = {Potamianos A., Riccardi G. and Narayanan S.},
year = {1999},
date = {1999-09-01},
journal = {Proc. EUROSPEECH, Budapest, Hungary, Sept. 1999},
keywords = {Machine Learning, Signal Annotation and Interpretation, Speech Processing}
}
title = {Automatic speech recognition using acoustic confidence conditioned language models},
author = {Rose R. C. and Riccardi G.},
year = {1999},
date = {1999-09-01},
journal = {Proc. EUROSPEECH, Budapest, Hungary, Sept. 1999},
keywords = {Speech Processing}
}
title = {Spoken language variation over time and state in a natural spoken dialog system},
author = {Gorin A. L. and Riccardi G.},
year = {1999},
date = {1999-03-01},
journal = {Proc. ICASSP, Phoenix, Mar. 1999},
keywords = {Conversational and Interactive Systems , Language Modeling, Speech Processing}
}
title = {Modeling dysfluency and background events in ASR for a natural language understanding task},
author = {Rose R. C. and Riccardi G.},
year = {1999},
date = {1999-03-01},
journal = {Proc. ICASSP., Phoenix, March 1999},
keywords = {Conversational and Interactive Systems , Language Modeling, Speech Processing}
}
title = {Grammar fragment acquisition using syntactic and semantic clustering},
author = {Arai K., Wright J. H., Riccardi G. and Gorin A. L.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/fragclustering-speechcomm-19981.pdf},
year = {1999},
date = {1999-01-01},
journal = {Speech Communication, vol. 27, no. 1, Jan. 1999},
abstract = {A new method for automatically acquiring Fragments for understanding fluent speech is proposed. The goal of this method is to generate a collection of Fragments, each representing a set of syntactically and semantically similar phrases. First, phrases observed frequently in the training set are selected as candidates. Each candidate phrase has three associated probability distributions: of following contexts, of preceding contexts, and of associated semantic actions. The similarity between candidate phrases is measured by applying the Kullback–Leibler distance to these three probability distributions. Candidate phrases that are close in all three distances are clustered into a Fragment. Salient sequences of these Fragments are then automatically acquired, and exploited by a spoken language understanding module to classify calls in AT&T\'s ``How may I help you?\'\' task. These Fragments allow us to generalize unobserved phrases. For instance, they detected 246 phrases in the test-set that were not present in the training-set. This result shows that unseen phrases can be automatically discovered by our new method. Experimental results show that 2.8% of the improvement in call-type classification},
keywords = {Conversational and Interactive Systems , Language Modeling, Speech Processing}
}
title = {Robust automatic speech recognition in a natural spoken dialog},
author = {Rahim M., Riccardi G., Wright J., Buntschuh B. and Gorin A.},
year = {1999},
date = {1999-01-01},
journal = {Workshop on Robust Methods for Speech Recognition in Adverse Condition, Tampere, Finland, 1999},
keywords = {Language Modeling, Speech Processing}
}
1998
title = {Language model adaptation for spoken dialog systems},
author = {Riccardi G., Potamianos A. and Narayanan S.},
year = {1998},
date = {1998-11-01},
journal = {Proc. ICSLP, Sydney, Nov. 1998},
keywords = {Conversational and Interactive Systems , Language Modeling, Speech Processing}
}
title = {Stochastic language models for speech recognition and understanding},
author = {Riccardi G. and Gorin A. L.},
year = {1998},
date = {1998-11-01},
journal = {Proc. ICSLP, Sydney, Nov. 1998},
keywords = {Conversational and Interactive Systems , Language Modeling, Speech Processing}
}
title = {Integration of utterance verification with statistical language modeling and spoken language understanding},
author = {Rose R. C., Yao H., Riccardi G. and Wright J.},
year = {1998},
date = {1998-05-01},
journal = {Proc. ICASSP., Seattle, May 1998},
keywords = {Conversational and Interactive Systems , Language Modeling, Speech Processing}
}
title = {Automatic acquisition of phrase grammars for stochastic language modeling},
author = {Riccardi G. and Bangalore S.},
year = {1998},
date = {1998-01-01},
journal = {Proc. 6th ACL Workshop on Very Large Corpora, Montreal, 1998},
keywords = {Conversational and Interactive Systems , Language Modeling, Speech Processing}
}
1997
title = {Grammar fragment acquisition using syntactic and semantic clustering},
author = {Arai K., Wright J., Riccardi G. and Gorin A.},
year = {1997},
date = {1997-12-01},
journal = {Proc. Workshop on Spoken Language Understanding & Communication, Yokosuka, Japan, Dec. 1997},
keywords = {Conversational and Interactive Systems , Language Modeling, Speech Processing}
}
title = {How may I help you?},
author = {Gorin A. L., Riccardi G. and Wright J. H.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/specom97.pdf},
year = {1997},
date = {1997-01-01},
journal = {Speech Communication, vol. 23, Oct. 1997, pp. 113-127.},
abstract = {We are interested in providing automated services via natural spoken dialog systems. By natural, we mean that the machine understands and acts upon what people actually say, in contrast to what one would like them to say. There are many issues that arise when such systems are targeted for large populations of non-expert users. In this paper, we focus on the task of automatically routing telephone calls based on a user’s fluently spoken response to the open-ended prompt of ‘‘How may I help you?’’. We first describe a database generated from 10,000 spoken transactions between customers and human agents. We then describe methods for automatically acquiring language models for both recognition and understanding from such data. Experimental results evaluating call-classification from speech are reported for that database. These methods have been embedded within a spoken dialog system, with subsequent processing for information retrieval and form-filling. © 1997 Elsevier Science B.V.},
keywords = {Conversational and Interactive Systems , Speech Processing}
}
title = {A spoken language system for automated call routing},
author = {Riccardi G., Gorin A. L., Ljolje A. and Riley M.},
year = {1997},
date = {1997-01-01},
journal = {Proc. ICASSP '97, 1997, pp. 1143-1146},
keywords = {Conversational and Interactive Systems , Speech Processing}
}
title = {Integrating multiple knowledge sources for utterance verification in a large vocabulary speech understanding system},
author = {Rose R. C., Yao H., Riccardi G. and Wright J.},
year = {1997},
date = {1997-01-01},
journal = {Proc. IEEE ASR Workshop Proc., Santa Barbara, 1997},
keywords = {Machine Learning, Signal Annotation and Interpretation, Speech Processing}
}
title = {Automatic acquisition of salient grammar fragments for call-type classification},
author = {Wright J. H., Gorin A. L. and Riccardi G.},
year = {1997},
date = {1997-01-01},
journal = {Proc. EUROSPEECH, Rhodes, Greece, 1997, pp. 1419-1422},
keywords = {Conversational and Interactive Systems , Language Modeling, Speech Processing}
}
1996
title = {Stochastic automata for language modeling},
author = {Riccardi G., Pieraccini R. and Bocchieri E.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/csl96.pdf},
year = {1996},
date = {1996-01-01},
journal = { Computer Speech and Language, vol. 10(4), 1996, pp. 265-293},
abstract = {Stochastic language models are widely used in spoken language understanding to recognize and interpret the speech signal: the speech samples are decoded into word transcriptions by means of acoustic and syntactic models and then interpreted according to a semantic model. Both for speech recognition and understanding, search algorithms use stochastic models to extract the most likely uttered sentence and its correspondent interpretation. The design of the language models has to be effective in order to mostly constrain the search algorithms and has to be efficient to comply with the storage space limits. In this work we present the Variable N-gram Stochastic Automaton (VNSA) language model that provides a unified formalism for building a wide class of language models. First, this approach allows for the use of accurate language models for large vocabulary speech recognition by using the standard search algorithm in the one-pass Viterbi decoder. Second, the unified formalism is an effective approach to incorporate different sources of information for computing the probability of word sequences. Third, the VNSAs are well suited for those applications where speech and language decoding cascades are implemented through weighted rational transductions. The VNSAs have been compared to standard bigram and trigram language models and their reduced set of parameters does not affect by any means the performances in terms of perplexity. The design of a stochastic language model through the VNSA is described and applied to word and phrase class-based language models. The effectiveness of VNSAs has been tested within the Air Travel Information System (ATIS) task to build the language model for th},
keywords = {Conversational and Interactive Systems , Language Modeling, Speech Processing}
}
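A toy sketch in the spirit of the stochastic-automaton language models described in the abstract above: states are word histories, arcs carry word probabilities, and an unseen bigram backs off to the unigram state. All probabilities and the single backoff weight are invented, and a real VNSA would be a full weighted automaton rather than two dictionaries:

```python
import math

# Toy bigram language model with unigram backoff, viewed as an automaton:
# each history word is a state, each known bigram a weighted arc, and a
# backoff arc returns to the unigram state. Numbers are illustrative only.
unigram = {"book": 0.3, "a": 0.3, "flight": 0.3, "</s>": 0.1}
bigram = {("book", "a"): 0.8, ("a", "flight"): 0.9, ("flight", "</s>"): 0.7}
BACKOFF = 0.1  # single illustrative backoff weight (not properly normalized)

def sentence_logprob(words):
    """Log-probability of a sentence, ending in the </s> state."""
    lp, prev = 0.0, "<s>"
    for w in words + ["</s>"]:
        if (prev, w) in bigram:
            lp += math.log(bigram[(prev, w)])       # bigram arc
        else:
            lp += math.log(BACKOFF * unigram[w])    # backoff to unigram state
        prev = w
    return lp
```

Because the model is just states and weighted arcs, it composes naturally with acoustic lattices in a one-pass Viterbi decoder, which is the point the abstract makes about weighted rational transductions.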
1995
title = {State tying of triphone HMM's for the 1994 AT&T ARPA ATIS recognizer},
author = {Bocchieri E. and Riccardi G.},
year = {1995},
date = {1995-09-01},
journal = {Proc. EUROSPEECH '95, Madrid, Sept. 1995, pp. 1499-1503},
keywords = {Speech Processing}
}
title = {Understanding spontaneous speech},
author = {Bocchieri E., Levin E., Pieraccini R. and Riccardi G.},
year = {1995},
date = {1995-01-01},
journal = {J. of the Italian Assoc. of Artificial Intelligence, Sept. 1995},
keywords = {Machine Learning, Signal Annotation and Interpretation, Speech Processing}
}
title = {Non deterministic stochastic language models for speech recognition},
author = {Riccardi G., Bocchieri E. and Pieraccini R.},
year = {1995},
date = {1995-01-01},
journal = {Proc. ICASSP, Detroit, 1995, pp. 247-250},
keywords = {Conversational and Interactive Systems , Language Modeling, Speech Processing}
}
1994
title = {Improved multipulse algorithm for speech coding by means of adaptive Boltzmann annealing},
author = {Mumolo E., Rebelli A. and Riccardi G.},
year = {1994},
date = {1994-01-01},
journal = {European Transactions on Telecommunications, vol. 5, no. 6, Nov. 1994},
keywords = {Speech Processing}
}
title = {Dynamic bit allocation in subband coding of wideband audio with multipulse},
author = {Menardi P., Mian G. A. and Riccardi G.},
year = {1994},
date = {1994-01-01},
journal = {Proc. EUSIPCO, Edinburgh, 1994, pp. 1449-1452},
keywords = {Speech Processing}
}
title = {A localization property of line spectrum pairs},
author = {Mian G. A. and Riccardi G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/IEEE_LSP.pdf},
year = {1994},
date = {1994-01-01},
journal = {IEEE Trans. on Speech and Audio Proc., vol. 2, no. 4, pp. 536-539, Oct. 1994},
keywords = {Speech Processing}
}
title = {The 1993 AT&T ATIS system},
author = {Bocchieri E. and Riccardi G.},
year = {1994},
date = {1994-01-01},
journal = {Proc. 1994 ARPA Spoken Language Technology Workshop, Plainsboro, NJ, March 1994, pp. 41-42},
keywords = {Language Modeling, Signal Annotation and Interpretation, Speech Processing}
}
1993
title = {Analysis-by-synthesis algorithms for low bitrate coding},
author = {Riccardi G. and Mian G. A.},
year = {1993},
date = {1993-10-01},
journal = {IEEE Workshop on Speech Coding for Telecommunications, Montreal, Oct. 1993},
keywords = {Speech Processing}
}
title = {Use of the forward-backward search for large vocabulary recognition with continuous observation density HMM's},
author = {Bocchieri E. and Riccardi G.},
year = {1993},
date = {1993-01-01},
journal = {Proc. IEEE Workshop on Automatic Speech Recognition, pp. 85-86, Snowbird, 1993},
keywords = {Speech Processing}
}
title = {An approach to parameter reoptimization in multipulse based coders},
author = {Fratti M., Mian G. A. and Riccardi G.},
url = {https://sisl.disi.unitn.it/wp-content/uploads/2014/11/IEEE_Multipulse.pdf},
year = {1993},
date = {1993-01-01},
journal = {IEEE Trans. Speech & Audio Proc., vol. 1, no. 4, pp. 463-465, Oct. 1993},
keywords = {Speech Processing}
}
1992
title = {On the effectiveness of parameter reoptimization in multipulse based coders},
author = {Fratti M., Mian G. A. and Riccardi G.},
year = {1992},
date = {1992-11-01},
journal = {Proc. ICASSP '92, San Francisco, 1992, pp. 72-77},
keywords = {Speech Processing}
}