Mousavi M., Negro R. and Riccardi G. An Unsupervised Approach to Extract Life-Events from Personal Narratives in the Mental Health Domain (Conference) Eighth Italian Conference on Computational Linguistics, 2022. Tags: Machine Learning, Signal Annotation and Interpretation
Bayerl P. S., Tammewar A., Riedhammer K. and Riccardi G. Detecting Emotion Carriers By Combining Acoustic and Lexical Representations (Conference) IEEE Automatic Speech Recognition and Understanding Conference, 2021. Tags: Affective Computing, Machine Learning
Torres M. J., Ravanelli M., Medina-Devilliers S., Lerner D. M. and Riccardi G. Interpretable SincNet-based Deep Learning for Emotion Recognition in Individuals with Autism (Conference) IEEE Conf. Engineering in Medicine and Biology, 2021. Tags: Affective Computing, Autism, Machine Learning, Signal Annotation and Interpretation
Mousavi M., Cervone A., Danieli M. and Riccardi G. Would you like to tell me more? Generating a corpus of psychotherapy dialogues (Conference) NAACL Workshop on NLP for Medical Conversations, 2021. Tags: Signal Annotation and Interpretation
Tammewar A., Cervone A. and Riccardi G. Emotion Carrier Recognition from Personal Narratives (Conference) INTERSPEECH, 2021. Tags: Affective Computing
Danieli M., Ciulli T., Mousavi M. and Riccardi G. A Participatory Design of Conversational Artificial Intelligence Agents for Mental Healthcare (Article) Journal of Medical Internet Research (JMIR) Formative Research, 5(12), 2021. Tags: Conversational and Interactive Systems, Signal Annotation and Interpretation
Torres M. J., Clarkson T., Hauschild K., Luhmann C. C., Lerner D. M. and Riccardi G. Facial emotions are accurately encoded in the brains of those with autism: A deep learning approach (Article) Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 2021. Tags: Affective Computing, Autism, Machine Learning, Signal Annotation and Interpretation
Roccabruna G., Cervone A. and Riccardi G. Multifunctional ISO standard Dialogue Act tagging in Italian (Conference) Seventh Italian Conference on Computational Linguistics, 2021. Tags: Signal Annotation and Interpretation
Roccabruna G., Cervone A. and Riccardi G. Multifunctional ISO standard Dialogue Act tagging in Italian (Article) 2020. Tags: Discourse, Natural Language Processing
Tammewar A., Cervone A. and Riccardi G. Emotion Carrier Recognition from Personal Narratives (Article) 2020. Tags: Affective Computing, Natural Language Processing
Cervone A. and Riccardi G. Is This Dialogue Coherent? Learning From Dialogue Acts and Entities (Article) 2020. Tags: Conversational and Interactive Systems, Discourse, Natural Language Processing
Tammewar A., Cervone A., Messner E.-M. and Riccardi G. Annotation of Emotion Carriers in Personal Narratives (Proceeding) 2020. Tags: Affective Computing, Natural Language Processing, Signal Annotation and Interpretation
Chowdhury S. A., Stepanov E. A., Danieli M. and Riccardi G. Automatic Classification of Speech Overlaps: Feature Representation and Algorithms (Article) Computer Speech and Language, 55, pp. 145-167, 2019. Tags: Discourse, Speech Analytics
Mayor Torres J. M., Clarkson T., Luhmann C. C., Riccardi G. and Lerner M. D. Distinct but Effective Neural Networks for Facial Emotion Recognition in Individuals with Autism: A Deep Learning Approach, 2019. Tags: Affective Computing, Autism, Machine Learning
Tortoreto G., Stepanov E. A., Cervone A., Dubiel M. and Riccardi G. Affective Behaviour Analysis of On-line User Interactions: Are On-line Support Groups more Therapeutic than Twitter? (Conference) 2019. Tags: Discourse, Health, Language Analytics, Machine Learning
Marinelli F., Cervone A., Tortoreto G., Stepanov E. A., Di Fabbrizio G. and Riccardi G. Active Annotation: bootstrapping annotation lexicon and guidelines for supervised NLU learning (Conference) 2019. Tags: Machine Learning, Signal Annotation and Interpretation
Coman C. A., Yoshino K., Murase Y., Nakamura S. and Riccardi G. An Incremental Turn-Taking Model For Task-Oriented Dialog Systems (Conference) 2019. Tags: Conversational and Interactive Systems, Natural Language Processing
Tammewar A., Cervone A., Messner E. and Riccardi G. Modeling user context for valence prediction from narratives (Conference) 2019. Tags: Affective Computing, Natural Language Processing
Dubiel M., Cervone A. and Riccardi G. Inquisitive Mind: A Conversational News Companion (Conference) 2019. Tags: Conversational and Interactive Systems
Alam F., Danieli M. and Riccardi G. Annotating and Modeling Empathy in Spoken Conversations (Article) Computer Speech and Language, 50, pp. 40-61, 2018. Tags: Affective Computing, Discourse, Signal Annotation and Interpretation
Stepanov E. A., Lathuiliere S., Chowdhury S. A., Ghosh A., Vieriu R., Sebe N. and Riccardi G. Depression Severity Estimation from Multiple Modalities (Conference) 2018. Tags: Affective Computing, Health Analytics
Mayor Torres J. M., Libsack E. J., Clarkson T., Keifer C. M., Riccardi G. and Lerner M. D. 2018. Tags: Affective Computing, Autism
Mayor Torres J. M., Clarkson T., Stepanov E. A., Luhmann C. C., Lerner M. D. and Riccardi G. Enhanced Error Decoding from Error-Related Potentials using Convolutional Neural Networks (Conference) 2018. Tags: Affective Computing, Autism, Machine Learning
Dias R. D., Conboy H. M., Gabany J. M., Clarke L. A., Osterweil L. J., Arney D., Goldman J. M., Riccardi G., Avrunin G. S., Yule S. J. and Zenati M. A. Intelligent Interruption Management System to Enhance Safety and Performance in Complex Surgical and Robotic Procedures (Proceeding) 2018. Tags: Interactive Systems, Machine Learning
Dias R., Conboy M. H., Gabany M. J., Clarke A. L., Osterweil J. L., Avrunin S. G., Arney D., Goldman M. J., Riccardi G., Yule J. S. and Zenati A. M. 2018. Tags: Health Analytics, Interactive Systems, Machine Learning, Signal Annotation and Interpretation
Mezza S., Cervone A., Stepanov E. A., Tortoreto G. and Riccardi G. ISO-Standard Domain-Independent Dialogue Act Tagging for Conversational Agents (Conference) 2018. Tags: Conversational and Interactive Systems, Discourse
Cervone A., Stepanov E. A. and Riccardi G. Coherence Models for Dialogue (Conference) 2018. Tags: Conversational and Interactive Systems, Discourse
Stepanov E. A., Lathuiliere S., Chowdhury S. A., Ghosh A., Vieriu R. D., Sebe N. and Riccardi G. Depression Severity Estimation from Multiple Modalities (Article) 2018 (Excellent Paper Award). Tags: Health Analytics, Machine Learning
Ghosh A., Stepanov E. A., Mayor Torres J. M., Danieli M. and Riccardi G. HEAL: A Health Analytics Intelligent Agent Platform for the acquisition and analysis of physiological signals (Conference) 2018. Tags: Health Analytics
Cervone A., Gambi E., Tortoreto G., Stepanov E. A. and Riccardi G. Automatically Predicting User Ratings for Conversational Systems (Conference) 2018. Tags: Affective Computing, Conversational and Interactive Systems, Speech Analytics
Gobbi J., Stepanov E. A. and Riccardi G. Concept Tagging for Natural Language Understanding: Two Decadelong Algorithm Development (Conference) 2018. Tags: Conversational and Interactive Systems, Natural Language Processing
Ghosh A., Stepanov E. A., Danieli M. and Riccardi G. Are You Stressed? Detecting High Stress from User Diaries (Proceeding) 8th IEEE International Conference on Cognitive Infocommunications (CogInfoCom 2017), Debrecen, Hungary, September 11-14, 2017. Tags: Health Analytics, Interactive Systems
Mayor Torres J. M. and Stepanov E. A. Proc. International Conference on Web Intelligence (WI '17), pp. 939-946, Leipzig, Germany, August 23-26, 2017. Tags: Affective Computing, Interactive Systems, Signal Annotation and Interpretation
Stepanov E. A., Chowdhury S. A., Bayer A. O., Ghosh A., Klasinas I., Calvo M., Sanchis E. and Riccardi G. Language Resources and Evaluation, https://doi.org/10.1007/s10579-017-9396-5, Springer, 2017. Tags: Signal Annotation and Interpretation
Mogessie A. M., Ronchetti M. and Riccardi G. Exploring the Role of Online Peer-Assessment as a Tool of Early Intervention (Article) In Wu, Gennari, Huang, Xie and Cao Y. (eds.), Emerging Technologies for Education, Lecture Notes in Computer Science, vol. 10108, pp. 635-644, 2017. Tags: Interactive Systems
Cervone A., Stepanov E. A., Celli F. and Riccardi G. Irony Detection: from the Twittersphere to the News Space (Conference) 2017. Tags: Affective Computing, Natural Language Processing
Singla K., Stepanov E. A., Bayer A. O., Riccardi G. and Carenini G. Automatic Community Creation for Abstractive Spoken Summarization (Conference) EMNLP 2017 Workshop on New Frontiers in Summarization, Copenhagen, 2017. Tags: Natural Language Processing, Speech Processing
Chowdhury S. A., Stepanov E. A., Danieli M. and Riccardi G. Functions of Silences towards Information Flow in Spoken Conversation (Conference) EMNLP 2017 Workshop on Speech-Centric Natural Language Processing, Copenhagen, 2017. Tags: Affective Computing, Speech Processing
Bayer A. O., Stepanov E. A. and Riccardi G. Towards End-to-End Spoken Dialogue Systems (Proceeding) Proc. INTERSPEECH, Stockholm, 2017. Tags: Interactive Systems, Speech Processing
Tortoreto G., Ghosh A., Stepanov E. A., Danieli M. and Riccardi G. Affective Behaviour Analysis of User Interactions in Support Web Group (Presentation) 2017. Tags: Affective Computing
Chowdhury S. A. and Riccardi G. A Deep Learning Approach To Modeling Competitiveness In Spoken Conversations (Conference) Proc. ICASSP, New Orleans, 2017. Tags: Affective Computing, Speech Processing
Cervone A., Tortoreto G., Mezza S., Gambi E. and Riccardi G. Roving Mind: a balancing act between open-domain and engaging dialogue systems (Conference) 2017. Tags: Conversational and Interactive Systems, Interactive Systems, Machine Learning, Natural Language Processing, Speech Processing
Celli F., Stepanov E. A., Poesio M. and Riccardi G. Predicting Brexit: Classifying Agreement is Better than Sentiment and Pollsters (Proceeding) Proc. PEOPLES Workshop at COLING, Osaka, 2016. Tags: Affective Computing, Machine Learning, Signal Annotation and Interpretation
Alam F., Celli F., Stepanov E. A., Ghosh A. and Riccardi G. The Social Mood of News: Self-reported Annotations to Design Automatic Mood Detection Systems (Proceeding) Proc. PEOPLES Workshop at COLING, Osaka, 2016. Tags: Affective Computing, Conversational and Interactive Systems
Alam F., Chowdhury S., Danieli M. and Riccardi G. How Interlocutors Coordinate with each other within Emotional Segments? (Proceeding) Proc. COLING, Osaka, 2016. Tags: Affective Computing, Conversational and Interactive Systems, Discourse, Interactive Systems
Alam F., Danieli M. and Riccardi G. Can We Detect Speakers' Empathy? A Real-Life Case Study (Proceeding) Proc. IEEE International Conference on Cognitive Infocommunications, Wrocław, 2016. Tags: Discourse, Interactive Systems
Mogessie M., Ronchetti M. and Riccardi G. Exploring the Role of Online Peer-Assessment as a Tool of Early Intervention (Proceeding) Proc. International Conference on Web-based Learning, Rome, 2016. Tags: Interactive Systems
Chowdhury S., Stepanov E. A. and Riccardi G. Predicting User Satisfaction from Turn-Taking in Spoken Conversations (Proceeding) Proc. INTERSPEECH, San Francisco, 2016. Tags: Affective Computing, Conversational and Interactive Systems, Discourse, Interactive Systems, Signal Annotation and Interpretation, Speech Processing
Mayor Torres J. M., Ghosh A., Stepanov E. A. and Riccardi G. HEAL-T: An Efficient PPG-based Heart-Rate And IBI Estimation Method During Physical Exercise (Proceeding) Proc. EUSIPCO, Budapest, 2016. Tags: Health Analytics, Signal Annotation and Interpretation
Celli F., Stepanov E. A. and Riccardi G. Tell me who you are, I'll tell whether you agree or disagree: Prediction of agreement/disagreement in news blogs (Proceeding) IJCAI Workshop on Natural Language Processing Meets Journalism, New York, 2016. Tags: Discourse
Stepanov E. A. and Riccardi G. The UniTN End-To-End Discourse Parser in CoNLL 2016 Shared Task (Proceeding) Proc. CoNLL, Berlin, 2016. Tags: Discourse
Schenk N., Chiarcos C., Donandt K., Rönnqvist S., Stepanov E. A. and Riccardi G. Do We Really Need All Those Rich Linguistic Features? A Neural Network-Based Approach to Implicit Sense Labeling (Proceeding) Proc. CoNLL, Berlin, 2016. Tags: Discourse, Natural Language Processing
Mogessie M., Ronchetti M. and Riccardi G. Predicting Student Progress from Peer-Assessment Data (Conference) Raleigh, USA, 2016. Tags: Education Analytics
Mayor Torres J. M., Stepanov E. A. and Riccardi G. EEG Semantic Decoding Using Deep Neural Networks (Conference) Workshop on Concepts, Actions and Objects, Rovereto, 2016. Tags: Health Analytics, Signal Annotation and Interpretation
Celli F., Riccardi G. and Alam F. Multilevel Annotation of Agreement and Disagreement in Italian News Blogs (Proceeding) Proc. Language Resources and Evaluation Conference, Portorož, 2016. Tags: Signal Annotation and Interpretation
Chowdhury S., Stepanov E. A. and Riccardi G. Transfer of Corpus-Specific Dialogue Act Annotation to ISO Standard: Is it worth it? (Proceeding) Proc. Language Resources and Evaluation Conference, Portorož, 2016. Tags: Statistical Machine Translation
Danieli M., Balamurali A. R., Stepanov E. A., Favre B., Bechet F. and Riccardi G. Summarizing Behaviors: An Experiment on the Annotation of Call-Centre Conversations (Proceeding) Proc. Language Resources and Evaluation Conference, Portorož, 2016. Tags: Signal Annotation and Interpretation
Riccardi G., Stepanov E. A. and Chowdhury S. Discourse Connective Detection in Spoken Conversations (Proceeding) Proc. ICASSP, Shanghai, 2016. Tags: Discourse, Natural Language Processing, Speech Processing
Danieli M., Ghosh A., Berra E., Fulcheri C., Rabbia F., Testa E., Veglio F. and Riccardi G. Automatically classifying essential arterial hypertension from physiological and daily life stress responses (Conference) ESH 2016, The 26th European Meeting on Hypertension and Cardiovascular Protection, Paris, France, June 10-13, 2016. Tags: Health Analytics
Stepanov E. A., Favre B., Alam F., Chowdhury S., Singla K., Trione J., Bechet F. and Riccardi G. Automatic Summarization of Call-Center Conversations (Conference) 2015. Tags: Natural Language Processing, Speech Processing
Stepanov E. A. and Riccardi G. Sentiment Polarity Classification with Low-level Discourse-based Features (Conference) 2015. Tags: Natural Language Processing, Signal Annotation and Interpretation
Danieli M., Riccardi G. and Alam F. Emotion Unfolding and Affective Scenes: A Case Study in Spoken Conversations (Conference) 2015. Tags: Affective Computing, Speech Processing
Celli F., Ghosh A., Alam F. and Riccardi G. Information Processing and Management, Nov. 2015. Tags: Machine Learning, Natural Language Processing, Signal Annotation and Interpretation
Mogessie M., Riccardi G. and Ronchetti M. Predicting Students' Final Exam Scores from their Course Activities (Article) Proc. IEEE Frontiers in Education, El Paso, USA, 2015. Tags: Education Analytics, Machine Learning
Danieli M., Ghosh A., Berra E., Testa E., Rabbia F., Veglio F. and Riccardi G. Comprendere l'Ipertensione Arteriosa Essenziale a Partire da Costrutti Psicologici e Segnali Fisiologici [Understanding Essential Arterial Hypertension from Psychological Constructs and Physiological Signals] (Conference) 2015. Tags: Affective Computing, Health Analytics
Chowdhury A., Calvo M., Ghosh A., Stepanov E. A., Bayer A. O., Riccardi G., Garcia F. and Sanchis E. Selection and Aggregation Techniques for Crowdsourced Semantic Annotation Task (Conference) 2015. Tags: Machine Learning, Signal Annotation and Interpretation, Statistical Machine Translation
Chowdhury A., Danieli M. and Riccardi G. The Role of Speakers and Context in Classifying Competition in Overlapping Speech (Conference) 2015. Tags: Conversational and Interactive Systems, Machine Learning, Signal Annotation and Interpretation
Bayer A. O. and Riccardi G. Deep Semantic Encodings for Language Modeling (Conference) 2015. Tags: Language Modeling, Signal Annotation and Interpretation, Speech Processing
Favre B., Stepanov E. A., Trione J., Bechet F. and Riccardi G. Call Centre Conversation Summarization: A Pilot Task at Multiling 2015 (Conference) 2015. Tags: Conversational and Interactive Systems, Signal Annotation and Interpretation
Ghosh A., Mayor Torres J. M., Danieli M. and Riccardi G. Detection of Essential Hypertension with Physiological Signals from Wearable Devices (Conference) 2015. Tags: Health Analytics, Machine Learning, Signal Annotation and Interpretation
Ghosh A., Danieli M. and Riccardi G. Annotation and Prediction of Stress and Workload from Physiological and Inertial Signals (Conference) 2015. Tags: Health Analytics, Machine Learning, Signal Annotation and Interpretation
Stepanov E. A., Bayer A. O. and Riccardi G. The UniTN Discourse Parser in CoNLL 2015 Shared Task (Conference) 2015. Tags: Discourse, Machine Learning, Natural Language Processing
Chowdhury A., Danieli M. and Riccardi G. Annotating and Categorizing Competition in Overlap Speech (Conference) 2015. Tags: Discourse, Natural Language Processing, Signal Annotation and Interpretation
Vinciarelli A., Esposito A., André E., Bonin F., Chetouani M., Cohn F. J., Cristani M., Fuhrmann F., Gilmartin E., Hammal Z., Heylen D., Kaiser R., Koutsombogera M., Potamianos A., Renals S., Riccardi G. and Salah A. G. Cognitive Computation, pp. 1-17, April 2015. Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing
Parodi S., Riccardi G., Castagnino N., Tortolina L., Maffei M., Zoppoli G., Nencioni A., Ballestrero A. and Patrone F. Systems Medicine in Oncology: Signaling-networks modeling and new generation decision-support systems (Book) Methods in Molecular Biology, vol. 1386, Schmitz U. and Wolkenhauer O. (eds.): Systems Medicine, Springer Science press, 2015. Tags: Machine Learning, Signal Annotation and Interpretation
Celli F., Riccardi G. and Ghosh A. CorEA: Italian News Corpus with Emotions and Agreement (Conference) 2014. Tags: Affective Computing, Natural Language Processing, Signal Annotation and Interpretation
Bayer A. O. and Riccardi G. Semantic Language Models for Automatic Speech Recognition (Conference) 2014. Tags: Language Modeling, Natural Language Processing, Speech Processing
Danieli M., Riccardi G. and Alam F. Annotation of Complex Emotions in Real-Life Dialogues: The Case of Empathy (Conference) 2014. Tags: Affective Computing, Signal Annotation and Interpretation
Hahn S., Dinarelli M., Raymond C., Lefevre F., Lehnen P., De Mori R., Moschitti A., Ney H. and Riccardi G. Comparing Stochastic Approaches to Spoken Language Understanding in Multiple Languages (Article) IEEE Trans. on Audio, Speech and Language Processing, vol. 19, no. 6, pp. 1569-1583, 2011. Tags: Signal Annotation and Interpretation, Speech Processing
Griol D., Callejas Z., Lopez-Cozar R. and Riccardi G. A Domain-Independent Statistical Methodology for Dialog Management in Spoken Dialog Systems (Article) Computer Speech and Language, 2014. Tags: Conversational and Interactive Systems, Speech Processing
Riccardi G. Towards Healthcare Personal Agents (Conference) 2014. Tags: Conversational and Interactive Systems
Chowdhury S. A. and Riccardi G. Unsupervised Recognition and Clustering of Speech Overlaps in Spoken Conversations (Conference) 2014. Tags: Signal Annotation and Interpretation
Chowdhury S. A., Ghosh A., Stepanov E. A., Bayer A. O., Riccardi G. and Klasinas I. Cross-Language Transfer of Semantic Annotation via Targeted Crowdsourcing (Conference) 2014. Tags: Signal Annotation and Interpretation
Ghosh A. and Riccardi G. Recognizing Human Activities from Smartphone Signals (Conference) 2014. Tags: Discourse, Natural Language Processing
Alam F. and Riccardi G. Predicting Personality Traits using Multimodal Information (Conference) 2014. Tags: Affective Computing
Mogessie M., Riccardi G. and Ronchetti M. A Web Based Peer Interaction Framework for Improved Assessment and Supervision of Students (Conference) 2014. Tags: Education Analytics, Interactive Systems
Stepanov E. A. and Riccardi G. Towards Cross-Domain PDTB-Style Discourse Parsing (Conference) 2014. Tags: Natural Language Processing, Signal Annotation and Interpretation
Alam F. and Riccardi G. Fusion of Acoustic, Linguistic and Psycholinguistic Features for Speaker Personality Traits Recognition (Conference) 2014. Tags: Affective Computing
Stepanov E. A., Riccardi G. and Bayer A. O. The Development of the Multilingual LUNA Corpus for Spoken Language System Porting (Conference) 2014. Tags: Natural Language Processing, Speech Processing, Statistical Machine Translation
Ghosh S., Johansson R., Riccardi G. and Tonelli S. Shallow Discourse Parsing with Conditional Random Fields (Conference) 2014. Tags: Discourse, Natural Language Processing
Stepanov E. A., Kashkarev I., Bayer A. O., Riccardi G. and Ghosh A. Language Style and Domain Adaptation for Cross-Language Porting (Conference) 2013. Tags: Signal Annotation and Interpretation, Statistical Machine Translation
Bayer A. O. and Riccardi G. On-line Adaptation of Semantic Models for Spoken Language Understanding (Conference) 2013. Tags: Machine Learning, Speech Processing
Stepanov E. A. and Riccardi G. Comparative Evaluation of Argument Extraction Algorithms in Discourse Relation Parsing (Conference) 2013. Tags: Natural Language Processing, Speech Processing
Bayer A. O. and Riccardi G. Instance-Based On-Line Language Model Adaptation (Conference) 2013. Tags: Machine Learning, Speech Processing
Alam F. and Riccardi G. Comparative Study of Speaker Personality Traits Recognition in Conversational and Broadcast News Speech (Conference) 2013. Tags: Affective Computing, Speech Processing
Riccardi G., Ghosh A., Chowdhury S. A. and Bayer A. O. Motivational Feedback in Crowdsourcing: a Case Study in Speech Transcriptions (Conference) 2013. Tags: Affective Computing, Conversational and Interactive Systems, Machine Learning, Signal Annotation and Interpretation
Alam F., Stepanov E. A. and Riccardi G. Personality Traits Recognition on Social Network - Facebook (Conference) 2013. Tags: Affective Computing, Natural Language Processing
Dinarelli M., Moschitti A. and Riccardi G. Discriminative Reranking for Spoken Language Understanding (Article) IEEE Trans. on Audio, Speech and Language Processing, vol. 20, no. 2, pp. 526-539, 2012. Tags: Signal Annotation and Interpretation, Speech Processing
Garcia F., Hurtado L. F., Segarra E., Sanchis E. and Riccardi G. Combining Machine Translation Systems for Spoken Language Understanding Portability (Conference) 2012. Tags: Machine Learning, Signal Annotation and Interpretation, Speech Processing, Statistical Machine Translation
Bayer A. O. and Riccardi G. Joint Language Models for Automatic Speech Recognition and Understanding (Conference) 2012. Tags: Machine Learning, Speech Processing
Ghosh S., Riccardi G. and Johansson R. Global Features for Shallow Discourse Parsing (Conference) 2012. Tags: Machine Learning, Natural Language Processing
Riccardi G., Cimiano P., Potamianos A. and Unger C. Up From Limited Dialog Systems! (Conference) 2012. Tags: Conversational and Interactive Systems
Ghosh S., Johansson R., Riccardi G. and Tonelli S. Improving the Recall of a Discourse Parser by Constraint-Based Postprocessing (Conference) 2012. Tags: Discourse, Machine Learning, Natural Language Processing
Ivanov A. V. and Riccardi G. Kolmogorov-Smirnov Test for Feature Selection in Emotion Recognition From Speech (Conference) 2012. Tags: Affective Computing
Stepanov E. A. and Riccardi G. Detecting General Opinions from Customer Surveys (Conference) 2011. Tags: Conversational and Interactive Systems, Machine Learning, Natural Language Processing
Ghosh S., Tonelli S., Riccardi G. and Johansson R. End-to-End Discourse Parser Evaluation (Conference) 2011. Tags: Discourse, Machine Learning, Natural Language Processing, Signal Annotation and Interpretation
Moschitti A., Chu-Carroll J., Patwardhan S., Fan J. and Riccardi G. Using Syntactic and Semantic Structural Kernels for Classifying Definition Questions in Jeopardy! (Conference) 2011. Tags: Conversational and Interactive Systems, Machine Learning, Natural Language Processing, Signal Annotation and Interpretation
Ivanov A. V., Riccardi G., Sporka A. J. and Franc J. Recognition of Personality Traits from Human Spoken Conversations (Conference) 2011. Tags: Affective Computing
Francesconi F., Ghosh A., Riccardi G., Ronchetti M. and Vagin A. Collecting Life Logs for Experience Based Corpora (Conference) 2011. Tags: Conversational and Interactive Systems
Quarteroni S., Ivanov A. V. and Riccardi G. Simultaneous Dialog Act Segmentation and Classification from Human-Human Spoken Conversations (Conference) 2011. Tags: Natural Language Processing, Signal Annotation and Interpretation, Speech Processing
Varges S., Riccardi G., Quarteroni S. and Ivanov A. V. POMDP Concept Policies and Task Structures for Hybrid Dialog Management (Conference) 2011. Tags: Conversational and Interactive Systems
Ludwig B., Haecker M., Schaeller R., Zenker B., Ivanov A. V. and Riccardi G. Tell Me Your Needs: Assistance for Public Transport Users (Conference) 2011. Tags: Conversational and Interactive Systems
Quarteroni S., Gonzalez M., Riccardi G. and Varges S. Combining User Intention and Error Modeling for Statistical Dialog Simulators (Conference) 2010. Tags: Conversational and Interactive Systems, Speech Processing
Dinarelli M., Moschitti A. and Riccardi G. Hypotheses Selection for Re-Ranking Semantic Annotation (Conference) 2010. Tags: Signal Annotation and Interpretation
Quarteroni S. and Riccardi G. Classifying Dialog Acts in Human-Human and Human-Machine Spoken Conversations (Conference) 2010. Tags: Conversational and Interactive Systems, Speech Processing
Varges S., Quarteroni S., Riccardi G. and Ivanov A. V. Investigating Clarification Strategies in a Hybrid POMDP Dialog Manager (Conference) 2010. Tags: Conversational and Interactive Systems, Speech Processing
Gonzalez M., Quarteroni S., Riccardi G. and Varges S. Cooperative User Models in Statistical Dialog Simulators (Conference) 2010. Tags: Conversational and Interactive Systems, Speech Processing
Dinarelli M., Stepanov E. A., Varges S. and Riccardi G. The LUNA Spoken Dialogue System: Beyond Utterance Classification (Conference) 2010. Tags: Conversational and Interactive Systems, Speech Processing
Ivanov A. V., Riccardi G., Ghosh S., Tonelli S. and Stepanov E. A. Acoustic Correlates of Meaning Structure in Conversational Speech (Conference) 2010. Tags: Natural Language Processing, Speech Processing
Ivanov A. V. and Riccardi G. Automatic Turn Segmentation in Spoken Conversations (Conference) 2010. Tags: Signal Annotation and Interpretation
Nguyen T. T., Moschitti A. and Riccardi G. Kernel-based Reranking for Named-Entity Extraction (Conference) 2010. Tags: Machine Learning, Natural Language Processing, Signal Annotation and Interpretation
Tonelli S., Riccardi G., Prasad R. and Joshi A. Annotation of Discourse Relations for Conversational Spoken Dialogs (Conference) 2010. Tags: Discourse, Natural Language Processing, Signal Annotation and Interpretation
Varges S., Quarteroni S., Riccardi G., Ivanov A. V. and Roberti P. Combining POMDPs trained with User Simulations and Rule-based Dialogue Management in a Spoken Dialogue System (Conference) 2009. Tags: Conversational and Interactive Systems
Quarteroni S., Dinarelli M. and Riccardi G. Ontology-Based Grounding of Spoken Language Understanding (Conference) 2009. Tags: Machine Learning, Natural Language Processing, Speech Processing
Varges S., Riccardi G., Quarteroni S. and Ivanov A. V. The Exploration/Exploitation Trade-Off in Reinforcement Learning for Dialogue Management (Conference) 2009. Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing
Varges S., Riccardi G., Quarteroni S., Ivanov A. V. and Roberti P. Leveraging POMDPs trained with User Simulations and Rule-Based Dialog Management in a SDS (Conference) 2009. Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing
Griol D., Riccardi G. and Sanchis E. A Statistical Dialog Manager for the LUNA Project (Conference) 2009. Tags: Conversational and Interactive Systems, Speech Processing
Griol D., Riccardi G. and Sanchis E. Learning the Structure of Human-Computer and Human-Human Spoken Conversations (Conference) 2009. Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing
Quarteroni S., Riccardi G. and Dinarelli M. What's in an Ontology for Spoken Language Understanding (Conference) 2009. Tags: Machine Learning, Natural Language Processing, Speech Processing
Dinarelli M., Moschitti A. and Riccardi G. Concept Segmentation and Labeling for Conversational Speech (Conference) 2009. Tags: Machine Learning, Natural Language Processing, Speech Processing
Sporka A. J., Franc J. and Riccardi G. Can Machines Call People? User Experience While Answering Telephone Calls Initiated by Machine (Conference) 2009. Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing
Dinarelli M., Moschitti A. and Riccardi G. Re-Ranking Models Based on Small Training Data for Spoken Language Understanding (Conference) 2009. Tags: Machine Learning, Natural Language Processing, Speech Processing
Nguyen T. T., Moschitti A. and Riccardi G. Convolution Kernels on Constituent, Dependency and Sequential Structures for Relation Extraction (Conference) 2009. Tags: Machine Learning, Natural Language Processing
Varges S., Riccardi G., Quarteroni S., Ivanov A. V. and Roberti P. On-Line Strategy Computation in Spoken Dialog Systems (Conference) 2009. Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing
Dinarelli M., Quarteroni S., Tonelli S., Moschitti A. and Riccardi G. Annotating Spoken Dialogs: from Speech Segments to Dialog Acts and Frame Semantics (Conference) 2009. Tags: Natural Language Processing, Signal Annotation and Interpretation
Dinarelli M., Moschitti A. and Riccardi G. Re-Ranking Models For Spoken Language Understanding (Conference) 2009. Tags: Machine Learning, Natural Language Processing, Speech Processing
Baggia P., Cutugno F., Danieli M. and Pieraccini R. The Multisite 2009 EVALITA Spoken Dialog System Evaluation (Conference) 2009. Tags: Machine Learning, Natural Language Processing, Signal Annotation and Interpretation
Riccardi G., Mosca N., Roberti P. and Baggia P. The Voice Multimodal Application Framework (Conference) 2009. Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing
Riccardi G., Baggia P. and Roberti P. Spoken Dialog Systems: From Theory to Technology (Conference) 2009. Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing
Rodríguez K. J., Dipper S., Götze M., Poesio M., Riccardi G., Raymond C. and Wisniewska J. Standoff Coordination for Multi-Tool Annotation in a Dialogue Corpus (Conference) 2009. Tags: Machine Learning, Natural Language Processing, Signal Annotation and Interpretation
De Mori R., Bechet F., Hakkani-Tur D., McTear M., Riccardi G. and Tur G. Spoken Language Understanding (Article) IEEE Signal Processing Magazine, vol. 25, pp. 50-58, 2008. Tags: Machine Learning, Natural Language Processing, Speech Processing
Bisazza A., Dinarelli M., Quarteroni S., Tonelli S., Moschitti A. and Riccardi G. Semantic Annotations For Conversational Speech: from speech transcriptions to predicate argument structures (Conference) 2008. Tags: Machine Learning, Natural Language Processing
Dinarelli M., Moschitti A. and Riccardi G. Joint Generative And Discriminative Models For Spoken Language Understanding (Conference) 2008. Tags: Machine Learning, Natural Language Processing, Speech Processing
Coppola B., Moschitti A., Tonelli S. and Riccardi G. Automatic FrameNet-Based Annotation of Conversational Speech (Conference) 2008. Tags: Machine Learning, Natural Language Processing
Varges S., Riccardi G. and Quarteroni S. Persistent Information State in a Data-Centric Architecture (Conference) 2008. Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing
Raymond C. and Riccardi G. Learning with Noisy Supervision for Spoken Language Understanding (Conference) 2008. Tags: Machine Learning, Natural Language Processing, Speech Processing
Rodríguez K., Raymond C. and Riccardi G. Active Annotation in the LUNA Italian Corpus of Spontaneous Dialogues (Conference) 2008. Tags: Machine Learning, Natural Language Processing, Signal Annotation and Interpretation
Varges S. and Riccardi G. A Data-Centric Architecture for Data-Driven Spoken Dialog Systems (Conference) 2007. Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing
Moschitti A., Riccardi G. and Raymond C. Spoken Language Understanding with Kernels for Syntactic/Semantic Structures (Conference) 2007. Tags: Machine Learning, Natural Language Processing, Speech Processing
Raymond C. and Riccardi G. Generative and Discriminative Algorithms for Spoken Language Understanding (Conference) 2007. Tags: Machine Learning, Natural Language Processing, Speech Processing
Fogarolli A., Riccardi G. and Ronchetti M. Searching for Information in Video Lectures (Conference) 2007. Tags: Conversational and Interactive Systems
Raymond C., Riccardi G., Rodríguez K. J. and Wisniewska J. The LUNA Corpus: an Annotation Scheme for a Multi-domain Multi-lingual Dialogue Corpus (Conference) 2007. Tags: Machine Learning, Natural Language Processing, Signal Annotation and Interpretation
Riccardi G. and Baggia P. Spoken Dialog Systems: From Theory to Technology (Article) Edizioni della Normale, Pisa, 2006. Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing
Hakkani-Tur D., Riccardi G. and Tur G. An Active Approach to Spoken Language Processing (Article) ACM Transactions on Speech and Language Processing, vol. 3, no. 3, pp. 1-31, 2006. Tags: Machine Learning, Natural Language Processing, Speech Processing
Hakkani-Tur D., Bechet F., Riccardi G. and Tur G. Beyond ASR 1-Best: Using Word Confusion Networks (Article) Computer Speech and Language, vol. 20, issue 4, pp. 495-514, 2006. Tags: Language Modeling, Speech Processing
Gupta N., Tur G., Hakkani-Tur D., Bangalore S., Riccardi G. and Rahim M. The AT&T Spoken Language Understanding System (Article) IEEE Trans. on Audio, Speech and Language Processing, vol. 14, issue 1, pp. 213-222, 2006. Tags: Machine Learning, Natural Language Processing, Speech Processing
Coppola B., Moschitti A. and Riccardi G. Shallow Semantic Parsing for Spoken Language Understanding (Conference) 2006. Tags: Machine Learning, Natural Language Processing, Speech Processing
Riccardi G. and Ronchetti M. NEEDLE: Next Generation Digital Libraries (Conference) 2006. Tags: Conversational and Interactive Systems
Topkara M., Riccardi G., Hakkani-Tur D. and Atallah M. J. Natural Language Watermarking: Research Challenges and Applications (Conference) 2006. Tags: Natural Language Processing
Riccardi G. and Hakkani-Tur D. Grounding Emotions in Human-Machine Conversational Systems (Article) Lecture Notes in Computer Science, Springer-Verlag, pp. 144-154, 2005. Tags: Affective Computing, Conversational and Interactive Systems
Riccardi G. and Hakkani-Tur D. Active Learning: Theory and Applications to Automatic Speech Recognition (Article) IEEE Trans. on Speech and Audio, vol. 13, no. 4, pp. 504-511, 2005. Tags: Machine Learning
Potamianos A., Narayanan S. and Riccardi G. Adaptive Categorical Understanding for Spoken Dialogue Systems (Article) 2005. Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing
Liscombe J., Riccardi G. and Hakkani-Tur D. Using Context to Improve Emotion Detection in Spoken Dialog Systems (Conference) 2005. Tags: Affective Computing, Speech Processing
Hakkani-Tur D., Tur G., Riccardi G. and Kim H. K. Error Prediction in Spoken Dialog: from Signal-to-Noise Ratio to Semantic Confidence Scores (Conference) 2005. Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing
Goffin V., Allauzen C., Bocchieri E., Hakkani-Tur D., Ljolje A., Parthasarathy S., Rahim M., Riccardi G. and Saraclar M. The AT&T WATSON Speech Recognizer (Conference) 2005. Tags: Speech Processing
Bechet F., Riccardi G. and Hakkani-Tur D. Mining Spoken Dialogue Corpora for System Evaluation and Modeling (Conference) 2005. Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing
Karahan M., Hakkani-Tur D., Riccardi G. and Tur G. Combining Classifiers for Spoken Language Understanding (Conference) 2005. Tags: Machine Learning, Natural Language Processing, Speech Processing
Hakkani-Tur D., Tur G., Rahim M. and Riccardi G. Unsupervised and Active Learning in Automatic Speech Recognition for Call Classification (Conference) 2004. Tags: Language Modeling, Machine Learning, Speech Processing
Tur G., Hakkani-Tur D. and Riccardi G. Extending Boosting For Call Classification Using Word Confusion Networks (Conference) 2004. Tags: Machine Learning, Signal Annotation and Interpretation
Bechet F., Riccardi G. and Hakkani-Tur D. Multi-channel Sentence Classification for Spoken Dialogue Modeling (Conference) 2003. Tags: Machine Learning, Speech Processing
Riccardi G. and Hakkani-Tur D. Active and Unsupervised Learning for Automatic Speech Recognition (Conference) 2003. Tags: Machine Learning
Hakkani-Tur D. and Riccardi G. A General Algorithm for Word Graph Decomposition (Conference) 2003. Tags: Speech Processing
Bangalore S. and Riccardi G. Stochastic Finite-State Models for Spoken Language Machine Translation (Article) Machine Translation, vol. 17, no. 3, pp. 165-184, 2002 (invited paper). Tags: Statistical Machine Translation
Gorin A., Abella A., Alonso T., Riccardi G. and Wright J. Automated Natural Spoken Dialog (Article) IEEE Computer, vol. 35, no. 4, pp. 51-56, April 2002 (invited paper). Tags: Conversational and Interactive Systems, Speech Processing
Bangalore S., Murdock V. and Riccardi G. Bootstrapping Bilingual Data Using Consensus Translation for a Multilingual Instant Messaging System (Conference) 2002. Tags: Statistical Machine Translation
Tur G., Wright J., Gorin A., Riccardi G. and Hakkani-Tur D. Improving Spoken Language Understanding Using Word Confusion Networks (Conference) 2002. Tags: Machine Learning, Signal Annotation and Interpretation, Speech Processing
Falavigna D., Gretter R. and Riccardi G. Acoustic and Word Lattice Based Algorithms for Confidence Scores (Conference) 2002. Tags: Speech Processing
Di Fabbrizio G., Dutton D., Gupta N., Hollister B., Rahim M., Riccardi G., Schapire R. and Schroeter J. AT&T Help Desk (Conference) 2002. Tags: Conversational and Interactive Systems, Machine Learning, Speech Processing
Rochery M., Schapire R., Rahim M., Gupta N., Riccardi G., Bangalore S., Alshawi H. and Douglas S. Combining Prior Knowledge and Boosting for Call Classification in Spoken Language Dialogue (Conference) 2002. Tags: Machine Learning, Signal Annotation and Interpretation, Speech Processing
Hakkani-Tur D., Riccardi G. and Gorin A. L. Active Learning for Automatic Speech Recognition (Conference) 2002. Tags: Machine Learning
Bangalore S., Bordel G. and Riccardi G. Computing Consensus Translation from Multiple Machine Translation Systems (Conference) 2002. Tags: Statistical Machine Translation
Rahim M., Riccardi G., Saul L., Wright J., Buntschuh B. and Gorin A. L. Robust Numeric Recognition in Spoken Language Dialogue (Article) Speech Communication, 34, pp. 195-212, 2001. Tags: Machine Learning, Signal Annotation and Interpretation, Speech Processing
Gretter R. and Riccardi G. On-line learning of language models with word error probability distributions (Conference) 2001. Tags: Machine Learning, Speech Processing
Rose R. C., Yao H., Riccardi G. and Wright J. H. Integration of utterance verification with statistical language modeling and spoken language understanding (Article) Speech Communication, 34, pp. 321-331, 2001. Tags: Machine Learning, Signal Annotation and Interpretation, Speech Processing
Bangalore S. and Riccardi G. A Finite-State Approach to Machine Translation (Conference) 2001. Tags: Statistical Machine Translation
Bangalore S. and Riccardi G. Finite-state models for lexical reordering in spoken language translation (Conference) 2000. Tags: Statistical Machine Translation
Riccardi G. On-line Learning of Acoustic and Lexical Units for Domain-Independent ASR (Conference) 2000. Tags: Machine Learning
Petrovska-Delacretaz D., Gorin A. L., Riccardi G. and Wright J. H. Detecting Acoustic Morphemes in Lattices for Spoken Language Understanding (Conference) 2000. Tags: Machine Learning, Signal Annotation and Interpretation, Speech Processing
Rahim M., Pieraccini R., Eckert W., Levin E., Di Fabbrizio G., Riccardi G., Kamm C. and Narayanan S. A Spoken Dialog System for Conference/Workshop Services (Conference) 2000. Tags: Conversational and Interactive Systems, Speech Processing
Gorin A. L., Wright J. H., Riccardi G., Abella A. and Alonso T. Semantic information processing of spoken language (Conference) 2000. Tags: Machine Learning, Signal Annotation and Interpretation, Speech Processing
Bangalore S. and Riccardi G. Stochastic finite-state models for spoken language machine translation (Conference) Proc. Workshop on Embedded Machine Translation Systems, 2000. Tags: Statistical Machine Translation
Riccardi G. and Gorin A. L. Spoken language adaptation over time and state in a natural spoken dialog system (Article) IEEE Trans. on Speech and Audio, vol. 8, pp. 3-10, 2000. Tags: Conversational and Interactive Systems, Language Modeling, Speech Processing
Rahim M., Pieraccini R., Eckert W., Levin E., Di Fabbrizio G., Riccardi G., Lin C. and Kamm C. W99 - A Spoken Dialog System for the ASRU'99 Workshop (Conference) 1999. Tags: Conversational and Interactive Systems, Speech Processing
Riccardi G., Bangalore S. and Sarin P. Learning head-dependency relations from unannotated corpora (Conference) 1999. Tags: Conversational and Interactive Systems, Language Modeling, Speech Processing
Conkie A., Riccardi G. and Rose R. C. Prosody recognition from speech utterances using acoustic and linguistic based models of prosodic events (Conference) 1999. Tags: Machine Learning, Speech Processing
Potamianos A., Riccardi G. and Narayanan S. Categorical understanding using statistical N-gram models (Conference) 1999. Tags: Machine Learning, Signal Annotation and Interpretation, Speech Processing
Rose R. C. and Riccardi G. Automatic speech recognition using acoustic confidence conditioned language models (Conference) 1999. Tags: Speech Processing
Gorin A. L. and Riccardi G. Spoken language variation over time and state in a natural spoken dialog system (Conference) 1999. Tags: Conversational and Interactive Systems, Language Modeling, Speech Processing
Rose R. C. and Riccardi G. Modeling dysfluency and background events in ASR for a natural language understanding task (Conference) 1999. Tags: Conversational and Interactive Systems, Language Modeling, Speech Processing
Arai K., Wright J. H., Riccardi G. and Gorin A. L. Grammar fragment acquisition using syntactic and semantic clustering (Article) Speech Communication, vol. 27, no. 1, Jan. 1999. Tags: Conversational and Interactive Systems, Language Modeling, Speech Processing
Gorin A. L., Petrovska-Delacretaz D., Riccardi G. and Wright J. H. Learning spoken language without transcription (Conference) 1999. Tags: Machine Learning
Rahim M., Riccardi G., Wright J., Buntschuh B. and Gorin A. Robust automatic speech recognition in a natural spoken dialog (Conference) 1999. Tags: Language Modeling, Speech Processing
Arai K., Wright J. H., Riccardi G. and Gorin A. L. Grammar fragment acquisition using syntactic and semantic clustering (Conference) 1998. Tags: Machine Learning
Riccardi G., Potamianos A. and Narayanan S. Language model adaptation for spoken dialog systems (Conference) 1998. Tags: Conversational and Interactive Systems, Language Modeling, Speech Processing
Riccardi G. and Gorin A. L. Stochastic language models for speech recognition and understanding (Conference) 1998. Tags: Conversational and Interactive Systems, Language Modeling, Speech Processing
Rose R. C., Yao H., Riccardi G. and Wright J. Integration of utterance verification with statistical language modeling and spoken language understanding (Conference) 1998. Tags: Conversational and Interactive Systems, Language Modeling, Speech Processing
Riccardi G. and Bangalore S. Automatic acquisition of phrase grammars for stochastic language modeling (Conference) 1998. Tags: Conversational and Interactive Systems, Language Modeling, Speech Processing
Arai K., Wright J., Riccardi G. and Gorin A. Grammar fragment acquisition using syntactic and semantic clustering (Conference) Proc. Workshop Spoken Language Understanding & Communication, 1997. Tags: Conversational and Interactive Systems, Language Modeling, Speech Processing
Gorin A. L., Riccardi G. and Wright J. H. How may I help you? (Article) Speech Communication, vol. 23, pp. 113-127, Oct. 1997. Tags: Conversational and Interactive Systems, Speech Processing
Riccardi G., Gorin A. L., Ljolje A. and Riley M. A spoken language system for automated call routing (Conference) 1997. Tags: Conversational and Interactive Systems, Speech Processing
Rose R. C., Yao H., Riccardi G. and Wright J. Integrating multiple knowledge sources for utterance verification in a large vocabulary speech understanding system (Conference) 1997. Tags: Machine Learning, Signal Annotation and Interpretation, Speech Processing
Wright J. H., Gorin A. L. and Riccardi G. Automatic acquisition of salient grammar fragments for call-type classification (Conference) 1997. Tags: Conversational and Interactive Systems, Language Modeling, Speech Processing
Riccardi G., Pieraccini R. and Bocchieri E. Stochastic automata for language modeling (Article) Computer Speech and Language, vol. 10(4), pp. 265-293, 1996. Tags: Conversational and Interactive Systems, Language Modeling, Speech Processing
Bocchieri E. and Riccardi G. State tying of triphone HMM's for the 1994 AT&T ARPA ATIS recognizer (Conference) 1995. Tags: Speech Processing
Bocchieri E., Levin E., Pieraccini R. and Riccardi G. Understanding spontaneous speech (Article) Journal of the Italian Association of Artificial Intelligence, Sept. 1995. Tags: Machine Learning, Signal Annotation and Interpretation, Speech Processing
Bocchieri E., Riccardi G. and Anantharaman J. The 1994 AT&T ATIS CHRONUS recognizer (Conference) 1995. Tags: Language Modeling, Signal Annotation and Interpretation
Riccardi G., Bocchieri E. and Pieraccini R. Non deterministic stochastic language models for speech recognition (Conference) 1995. Tags: Conversational and Interactive Systems, Language Modeling, Speech Processing
Mumolo E., Rebelli A. and Riccardi G. Improved multipulse algorithm for speech coding by means of adaptive Boltzmann annealing (Article) European Transactions on Telecommunications, vol. 5, no. 6, Nov. 1994. Tags: Speech Processing
Mian G. A. and Riccardi G. A localization property of line spectrum pairs (Article) IEEE Trans. on Speech and Audio Proc., vol. 2, no. 4, pp. 536-539, Oct. 1994. Tags: Speech Processing
Bocchieri E. and Riccardi G. The 1993 AT&T ATIS system (Conference) 1994. Tags: Language Modeling, Signal Annotation and Interpretation, Speech Processing
Menardi P., Mian G. A. and Riccardi G. Dynamic bit allocation in subband coding of wideband audio with multipulse (Conference) 1994. Tags: Speech Processing
Riccardi G. and Mian G. A. Analysis-by-synthesis algorithms for low bitrate coding (Conference) 1993. Tags: Speech Processing
Fratti M., Mian G. A. and Riccardi G. An approach to parameter reoptimization in multipulse based coders (Article) IEEE Trans. on Speech and Audio Proc., vol. 1, no. 4, pp. 463-465, Oct. 1993. Tags: Speech Processing
Bocchieri E. and Riccardi G. Use of the forward-backward search for large vocabulary recognition with continuous observation density HMM's (Conference) 1993. Tags: Speech Processing
Fratti M., Mian G. A. and Riccardi G. On the effectiveness of parameter reoptimization in multipulse based coders (Conference) 1992. Tags: Speech Processing

2022
@inproceedings{mousavi2022unsupervised,
title = {An Unsupervised Approach to Extract Life-Events from Personal Narratives in the Mental Health Domain},
author = {Mousavi, M. and Negro, R. and Riccardi, G.},
year = {2022},
date = {2022-01-27},
booktitle = {Eighth Italian Conference on Computational Linguistics},
keywords = {Machine Learning, Signal Annotation and Interpretation}
}
2021
@inproceedings{bayerl2021detecting,
title = {Detecting Emotion Carriers By Combining Acoustic and Lexical Representations},
author = {Bayerl, P. S. and Tammewar, A. and Riedhammer, K. and Riccardi, G.},
year = {2021},
date = {2021-10-01},
booktitle = {IEEE Automatic Speech Recognition and Understanding Conference},
keywords = {Affective Computing, Machine Learning}
}
@inproceedings{torres2021interpretable,
title = {Interpretable SincNet-based Deep Learning for Emotion Recognition in Individuals with Autism},
author = {Torres, M. J. and Ravanelli, M. and Medina-Devilliers, S. and Lerner, D. M. and Riccardi, G.},
url = {https://arxiv.org/pdf/2107.10790.pdf},
year = {2021},
date = {2021-07-18},
booktitle = {IEEE Conf. Engineering in Medicine and Biology},
keywords = {Affective Computing, Autism, Machine Learning, Signal Annotation and Interpretation}
}
@inproceedings{mousavi2021psychotherapy,
title = {Would you like to tell me more? Generating a corpus of psychotherapy dialogues},
author = {Mousavi, M. and Cervone, A. and Danieli, M. and Riccardi, G.},
url = {https://aclanthology.org/2021.nlpmc-1.1.pdf},
year = {2021},
date = {2021-07-06},
organization = {NAACL, Workshop on NLP for Medical Conversations},
keywords = {Signal Annotation and Interpretation}
}
@inproceedings{tammewar2021emotion,
title = {Emotion Carrier Recognition from Personal Narratives},
author = {Tammewar, A. and Cervone, A. and Riccardi, G.},
url = {https://arxiv.org/abs/2008.07481},
year = {2021},
date = {2021-06-24},
booktitle = {INTERSPEECH},
keywords = {Affective Computing}
}
@article{danieli2021participatory,
title = {A Participatory Design of Conversational Artificial Intelligence Agents for Mental Healthcare},
author = {Danieli, M. and Ciulli, T. and Mousavi, M. and Riccardi, G.},
url = {https://formative.jmir.org/2021/12/e30053},
year = {2021},
date = {2021-04-29},
journal = {Journal of Medical Internet Research (JMIR) Formative Research Journal},
volume = {5},
number = {12},
keywords = {Conversational and Interactive Systems, Signal Annotation and Interpretation}
}
@article{torres2021facial,
title = {Facial emotions are accurately encoded in the brains of those with autism: A deep learning approach},
author = {Torres, M. J. and Clarkson, T. and Hauschild, K. and Luhmann, C. C. and Lerner, D. M. and Riccardi, G.},
url = {https://www.sciencedirect.com/science/article/abs/pii/S2451902221001075?via%3Dihub},
year = {2021},
date = {2021-04-16},
journal = {Biological Psychiatry: Cognitive Neuroscience and Neuroimaging},
keywords = {Affective Computing, Autism, Machine Learning, Signal Annotation and Interpretation}
}
@inproceedings{roccabruna2021multifunctional,
title = {Multifunctional ISO standard Dialogue Act tagging in Italian},
author = {Roccabruna, G. and Cervone, A. and Riccardi, G.},
url = {https://disi.unitn.it/~riccardi/papers2/Clicit20-ISODAItalian.pdf},
year = {2021},
date = {2021-03-01},
booktitle = {Seventh Italian Conference on Computational Linguistics},
keywords = {Signal Annotation and Interpretation}
}
2020
@inproceedings{roccabruna2020multifunctional,
title = {Multifunctional ISO standard Dialogue Act tagging in Italian},
author = {Roccabruna, G. and Cervone, A. and Riccardi, G.},
booktitle = {Seventh Italian Conference on Computational Linguistics},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2020/12/Clicit20-ISODAItalian.pdf},
year = {2020},
date = {2020-11-02},
keywords = {Discourse, Natural Language Processing}
}
@article{tammewar2020emotion,
title = {Emotion Carrier Recognition from Personal Narratives},
author = {Tammewar, A. and Cervone, A. and Riccardi, G.},
journal = {arXiv preprint arXiv:2008.07481},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2020/12/2008.07481.pdf},
year = {2020},
date = {2020-08-17},
keywords = {Affective Computing, Natural Language Processing}
}
title = {Is This Dialogue Coherent? Learning From Dialogue Acts and Entities},
author = {Cervone A. and Riccardi G.},
editor = {SIGDial, Idaho*, 2020},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2020/12/SIGDIAL20-DialogueCoherence.pdf},
year = {2020},
date = {2020-06-01},
keywords = {Conversational and Interactive Systems , Discourse, Natural Language Processing}
}
title = {Annotation of Emotion Carriers in Personal Narratives},
author = {Tammewar A., Cervone A., Messner E. and Riccardi G.},
editor = {Proc. Language Resources and Evaluation Conference, Marseille*, 2020},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2020/12/LREC20-EmotionCarriers.pdf},
year = {2020},
date = {2020-05-11},
keywords = {Affective Computing, Natural Language Processing, Signal Annotation and Interpretation}
}
2019
title = {Automatic Classification of Speech Overlaps: Feature Representation and Algorithms},
author = {Chowdhury S. A., Stepanov E. A., Danieli M. and Riccardi G.},
url = {https://disi.unitn.it/~riccardi/papers2/CSL19-SpeechOverlapCategorization.pdf},
year = {2019},
date = {2019-05-01},
journal = {Computer Speech and Language},
volume = {55},
pages = {145-167},
keywords = {Discourse, Speech Analytics}
}
title = {Distinct but Effective Neural Networks for Facial Emotion Recognition in Individuals with Autism: A Deep Learning Approach},
author = {Mayor Torres, J.M., Clarkson, T., Luhmann, C. C., Riccardi, G., Lerner, M.D.},
url = {http://disi.unitn.it/~riccardi/papers2/INSAR_JMM_2019_Deep_Learning.pdf},
year = {2019},
date = {2019-01-01},
keywords = {Affective Computing, Autism, Machine Learning}
}
title = {Affective Behaviour Analysis of On-line User Interactions: Are On-line Support Groups more Therapeutic than Twitter?},
author = {Tortoreto G., Stepanov E. A., Cervone A., Dubiel M., Riccardi G.},
editor = {Association for Computational Linguistics Conference, Workshop on Social Media Mining for Health Applications, Florence},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2019/11/ACL19-AffectiveBehaviourOSG.pdf},
year = {2019},
date = {2019-01-01},
keywords = {Discourse, Health, Language Analytics, Machine Learning}
}
title = {Active Annotation: bootstrapping annotation lexicon and guidelines for supervised NLU learning},
author = {Marinelli F., Cervone A., Tortoreto G., Stepanov E. A., Di Fabbrizio G., Riccardi G.},
editor = {INTERSPEECH, Graz},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2019/11/IS19-Active_Annotation.pdf},
year = {2019},
date = {2019-01-01},
keywords = {Machine Learning, Signal Annotation and Interpretation}
}
title = {An Incremental Turn-Taking Model For Task-Oriented Dialog Systems},
author = {Coman C. A., Yoshino K., Murase Y., Nakamura S., Riccardi G.},
editor = {INTERSPEECH, Graz},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2019/11/IS19-Incremental-SLU.pdf},
year = {2019},
date = {2019-01-01},
keywords = {Conversational and Interactive Systems , Natural Language Processing}
}
title = {Modeling user context for valence prediction from narratives},
author = {Tammewar A., Cervone A., Messner E., Riccardi G.},
editor = {INTERSPEECH, Graz},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2019/11/IS19-ValencePredictionNarratives.pdf},
year = {2019},
date = {2019-01-01},
keywords = {Affective Computing, Natural Language Processing}
}
title = {Inquisitive Mind: A Conversational News Companion},
author = {Dubiel M., Cervone A., Riccardi G.},
editor = {Proc. 1st International Conference on Conversational User Interfaces, Dublin},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2019/11/CUI19-InquistiveMindAgent.pdf},
year = {2019},
date = {2019-01-01},
keywords = {Conversational and Interactive Systems }
}
2018
title = {Annotating and Modeling Empathy in Spoken Conversations},
author = {Alam F., Danieli M. and Riccardi G.},
url = {https://www.sciencedirect.com/science/article/pii/S088523081730133X},
year = {2018},
date = {2018-07-01},
journal = {Computer Speech and Language},
volume = {50},
pages = {40-61},
keywords = {Affective Computing, Discourse, Signal Annotation and Interpretation}
}
title = {Depression Severity Estimation from Multiple Modalities},
author = {Stepanov E. A., Lathuiliere S., Chowdhury S. A., Ghosh A., Vieriu R., Sebe N. and Riccardi G.},
editor = {AVEC Challenge},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2019/11/1711.060951.pdf},
year = {2018},
date = {2018-01-01},
keywords = {Affective Computing, Health Analytics}
}
title = {EEG-based Single trial Classification Emotion Recognition: A Comparative Analysis in Individuals with and without Autism Spectrum Disorder},
author = {Mayor Torres, J.M., Libsack, E.J., Clarkson, T., Keifer, C.M., Riccardi, G., Lerner, M.D.},
editor = {Annual Meeting of the International Society for Autism Research, Rotterdam},
url = {https://insar.confex.com/imfar/2018/webprogram/Paper27651.html},
year = {2018},
date = {2018-01-01},
keywords = {Affective Computing, Autism}
}
title = {Enhanced Error Decoding from Error-Related Potentials using Convolutional Neural Networks},
author = {Mayor Torres, J.M., Clarkson, T., Stepanov E. A. , Luhmann C. C., Lerner, M.D., Riccardi, G.},
editor = {IEEE Conf. Engineering in Medicine and Biology Society, Honolulu},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2019/11/EMBC18-Enhanced-Error-Decoding.pdf},
year = {2018},
date = {2018-01-01},
keywords = {Affective Computing, Autism, Machine Learning}
}
title = {Intelligent Interruption Management System to Enhance Safety and Performance in Complex Surgical and Robotic Procedures},
author = {Dias RD, Conboy HM, Gabany JM, Clarke LA, Osterweil LJ, Arney D, Goldman JM, Riccardi G, Avrunin GS, Yule SJ, Zenati MA.},
editor = {Proc. Workshop on OR 2.0, Context-Aware Operating Theaters, Computer Assisted Robotic Endoscopy, Clinical Image-Based Procedures, and Skin Image Analysis, Granada},
year = {2018},
date = {2018-01-01},
keywords = {Interactive Systems, Machine Learning}
}
title = {Development of an Interactive Dashboard to Analyze Cognitive Workload of Surgical Teams During Complex Procedural Care},
author = {Dias R., Conboy M. H., Gabany M. J., Clarke A. L. , Osterweil J. L., Avrunin S. G., Arney D., Goldman M. J., Riccardi G., Yule J. S., Zenati A. M.},
editor = {IEEE Conf. on Cognitive and Computational Aspects of Situation Management, Boston},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2019/11/COGSIMA18ContextAwareDashboardSurgicalTeam.pdf},
year = {2018},
date = {2018-01-01},
keywords = {Health Analytics, Interactive Systems, Machine Learning, Signal Annotation and Interpretation}
}
title = {ISO-Standard Domain-Independent Dialogue Act Tagging for Conversational Agents},
author = {Mezza S., Cervone A., Stepanov E. A., Tortoreto G. and Riccardi G.},
editor = {Conference on Computational Linguistics (COLING), Santa Fe},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2019/11/Coling18-ISO-DA-Tagging.pdf},
year = {2018},
date = {2018-01-01},
keywords = {Conversational and Interactive Systems , Discourse}
}
title = {Coherence Models for Dialogue},
author = {Cervone A., Stepanov E. A. and Riccardi G.},
editor = {INTERSPEECH, Hyderabad},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2019/11/IS18-DiscourseModels.pdf},
year = {2018},
date = {2018-01-01},
keywords = {Conversational and Interactive Systems , Discourse}
}
title = {Depression Severity Estimation from Multiple Modalities},
author = {Stepanov E. A., Lathuiliere S., Chowdhury S. A., Ghosh A., Vieriu R.D., Sebe N. and Riccardi G.},
editor = {IEEE 20th International Conference on e-Health Networking, Applications and Services (Healthcom), Ostrava},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2019/11/HealthCom18-Depression.pdf},
year = {2018},
date = {2018-01-01},
note = {EXCELLENT Paper AWARD},
keywords = {Health Analytics, Machine Learning}
}
title = {HEAL: A Health Analytics Intelligent Agent Platform for the acquisition and analysis of physiological signals},
author = {Ghosh A., Stepanov E. A., Mayor Torres, J.M., Danieli M. and Riccardi G.},
editor = {IEEE 20th International Conference on e-Health Networking, Applications and Services (Healthcom), Ostrava},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2019/11/HealthCom18-healT.pdf},
year = {2018},
date = {2018-01-01},
keywords = {Health Analytics}
}
title = {Automatically Predicting User Ratings for Conversational Systems},
author = {Cervone A., Gambi E., Tortoreto G., Stepanov E. A. and Riccardi G.},
editor = {Fifth Italian Conference on Computational Linguistics, Turin},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2019/11/Clicit18-PredictingUserRatings.pdf},
year = {2018},
date = {2018-01-01},
keywords = {Affective Computing, Conversational and Interactive Systems , Speech Analytics}
}
title = {Concept Tagging for Natural Language Understanding: Two Decadelong Algorithm Development},
author = {Gobbi J., Stepanov E. A. and Riccardi G.},
editor = {Fifth Italian Conference on Computational Linguistics, Turin},
year = {2018},
date = {2018-01-01},
keywords = {Conversational and Interactive Systems , Natural Language Processing}
}
2017
title = {Are You Stressed? Detecting High Stress from User Diaries},
author = {Ghosh A., Stepanov E. A., Danieli M. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2017/09/55_PID4964285_55.pdf},
year = {2017},
date = {2017-09-11},
publisher = {8th IEEE International Conference on Cognitive Infocommunications (CogInfoCom 2017) • September 11-14, 2017 • Debrecen, Hungary},
abstract = {Knowledge of the complete clinical history, lifestyle, behaviour, medication adherence data, and underlying symptoms, all affect the treatment outcomes. Collecting, analysing and using all these data, while treating a patient can often be very challenging. A doctor can spend only a limited time
with a patient. This time is often not enough to learn about all the lifestyle and underlying conditions of a patient’s life. Often patients are asked to maintain diaries of their daily activities. Diaries can help to improve adherence by increasing the consciousness of the patients, and can also serve as a way for the doctors to validate this adherence. However, diaries can be cumbersome to parse, and hence increase the task burden of the doctor. In this paper we demonstrate that automatic analysis of diaries can be used to predict the stress level of the diary writers with an F-measure of 0.70.},
keywords = {Health Analytics, Interactive Systems}
}
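The entry above reports stress prediction with an F-measure of 0.70. Below is a minimal sketch of how such a binary text classifier and its F-measure are computed; the diary texts, labels, and bag-of-words classifier are hypothetical stand-ins, not the paper's pipeline.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Hypothetical diary entries with binary stress labels (1 = high stress).
diaries = [
    "slept badly, deadline at work, skipped lunch",
    "long walk in the park, relaxed evening with friends",
    "argued with my boss, headache all day",
    "quiet day, finished a good book",
]
labels = [1, 0, 1, 0]

# Bag-of-words + logistic regression as a generic stand-in classifier.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(diaries, labels)

# The paper's headline metric is the F-measure (F1) of such predictions.
print("F1:", f1_score(labels, clf.predict(diaries)))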
title = {Enhanced face/audio emotion recognition: video and instance level classification using ConvNets and restricted Boltzmann Machines},
author = {Mayor Torres J. M. and Stepanov E. A.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2017/09/ACMWI2017MayorStepanov.pdf
http://dl.acm.org/citation.cfm?id=3109423},
year = {2017},
date = {2017-08-23},
publisher = {WI '17 Proceedings of the International Conference on Web Intelligence Pages 939-946, Leipzig, Germany - August 23 - 26, 2017},
abstract = {Face-based and audio-based emotion recognition modalities have been studied profusely obtaining successful classification rates for arousal/valence levels and multiple emotion categories settings. However, recent studies only focus their attention on classifying discrete emotion categories with a single image representation and/or a single set of audio feature descriptors. Face-based emotion recognition systems use a single image channel representations such as principal-components-analysis whitening, isotropic smoothing, or ZCA whitening. Similarly, audio emotion recognition systems use a standardized set of audio descriptors, including only averaged Mel-Frequency Cepstral coefficients. Both approaches imply the inclusion of decision-fusion modalities to compensate the limited feature separability and achieve high classification rates. In this paper, we propose two new methodologies for enhancing face-based and audio-based emotion recognition based on a single classifier decision and using the EU Emotion Stimulus dataset: (1) A combination of a Convolutional Neural Networks for frame-level feature extraction with a k-Nearest Neighbors classifier for the subsequent frame-level aggregation and video-level classification, and (2) a shallow Restricted Boltzmann Machine network for arousal/valence classification.},
keywords = {Affective Computing, Interactive Systems, Signal Annotation and Interpretation}
}
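A minimal sketch of the first methodology in the abstract above: frame-level decisions aggregated with a k-Nearest Neighbors step into a single video-level label. The CNN feature extraction is assumed to have already happened; the 64-dimensional frame vectors below are random stand-ins.

import numpy as np
from collections import Counter
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Stand-ins for CNN features of two labeled training videos (30 frames each).
train_frames = rng.normal(size=(60, 64))
train_labels = np.repeat([0, 1], 30)  # per-frame emotion labels

knn = KNeighborsClassifier(n_neighbors=5).fit(train_frames, train_labels)

def classify_video(frames):
    """Aggregate frame-level k-NN decisions into one video-level label."""
    frame_preds = knn.predict(frames)
    return Counter(frame_preds).most_common(1)[0][0]

test_video = rng.normal(size=(30, 64))  # frames of one unseen video
print("video-level label:", classify_video(test_video))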
title = {Cross-Language Transfer of Semantic Annotation via Targeted Crowdsourcing: Task Design and Evaluation},
author = {Stepanov E. A., Chowdhury A. S., Bayer A. O., Ghosh A., Klasinas I., Calvo M., Sanchis E. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2017/10/10.1007s10579-017-9396-5.pdf},
year = {2017},
date = {2017-01-01},
journal = {Language Resources and Evaluation, https://doi.org/10.1007/s10579-017-9396-5, Springer, 2017},
abstract = {Modern data-driven spoken language systems (SLS) require manual semantic annotation for training spoken language understanding parsers. Multilingual porting of SLS demands significant manual effort and language resources, as this manual annotation has to be replicated. Crowdsourcing is an accessible and cost-effective alternative to traditional methods of collecting and annotating data. The application of crowdsourcing to simple tasks has been well investigated. However, complex tasks, like cross-language semantic annotation transfer, may generate low judgment agreement and/or poor performance. The most serious issue in cross-language porting is the absence of reference annotations in the target language; thus, crowd quality control and the evaluation of the collected annotations is difficult. In this paper we investigate targeted crowdsourcing for semantic annotation transfer that delegates to crowds a complex task such as segmenting and labeling of concepts taken from a domain ontology; and evaluation using source language annotation. To test the applicability and effectiveness of the crowdsourced annotation transfer we have considered the case of close and distant language pairs: Italian–Spanish and Italian–Greek. The corpora annotated via crowdsourcing are evaluated against source and target language expert annotations. We demonstrate that the two evaluation references (source and target) highly correlate with each other; thus, drastically reduce the need for the target language reference annotations.
},
keywords = {Signal Annotation and Interpretation}
}
title = {Exploring the Role of Online Peer-Assessment as a Tool of Early Intervention},
author = {Mogessie A. M., Ronchetti M., Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2017/10/PRASE16-PeerAssessmentEarlyIntervention.pdf},
year = {2017},
date = {2017-01-01},
journal = {In Wu, Gennari, Huang, Xie and Cao Y. (eds) Emerging Technologies for Education, Lecture Notes in Computer Science, vol 10108, pp. 635-644, 2017},
abstract = {Peer-assessment in education has a long history. Although the adoption of technological tools is not a recent phenomenon, many peer-assessment studies are conducted in manual environments. Automating peer-assessment tasks improves the efficiency of the practice and provides opportunities for taking advantage of large amounts of student-generated data, which will readily be available in electronic format. Data from three undergraduate-level courses, which utilised an electronic peer-assessment tool were explored in this study in order to investigate the relationship between participation in online peer-assessment tasks and successful course completion. It was found that students with little or no participation in optional peer-assessment activities had very low course completion rates as opposed to those with high participation. In light of this finding, it is argued that electronic peer-assessment can serve as a tool of early intervention. Further advantages of automated peer-assessment are discussed and foreseen extensions of this work are outlined.},
keywords = {Interactive Systems}
}
title = {Irony Detection: from the Twittersphere to the News Space},
author = {Cervone A., Stepanov E. A., Celli F. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2017/10/Clicit17-IronyDetection.pdf},
year = {2017},
date = {2017-01-01},
journal = {Proc. Fourth Italian Conference on Computational Linguistics, Rome, 2017},
abstract = {Automatic detection of irony is one of the hot topics for sentiment analysis, as it changes the polarity of text. Most of the work has been focused on the detection of figurative language in Twitter data due to relative ease of obtaining annotated data, thanks to the use of hashtags to signal irony. However, irony is present generally in natural language conversations and in particular in online public fora. In this paper, we present a comparative evaluation of irony detection from Italian news fora and Twitter posts. Since irony is not a very frequent phenomenon, its automatic detection suffers from data imbalance and feature sparseness problems. We experiment with different representations of text – bag-of-words, writing style, and word embeddings to address the feature sparseness; and balancing techniques to address the data imbalance.},
keywords = {Affective Computing, Natural Language Processing}
}
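A minimal sketch of two ingredients named in the abstract, a bag-of-word-ngrams representation and a countermeasure for data imbalance (class weighting here; resampling would be an alternative). Texts, labels, and the classifier are hypothetical stand-ins.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical posts; irony is the rare positive class.
texts = [
    "great, another Monday",
    "the train is late again, lovely",
    "the council approved the budget",
    "rain expected tomorrow",
    "traffic was heavy this morning",
    "new library opens next week",
]
labels = [1, 1, 0, 0, 0, 0]

clf = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),           # bag of word n-grams
    LogisticRegression(class_weight="balanced"),   # counter the imbalance
)
clf.fit(texts, labels)
print(clf.predict(["wonderful, my laptop died"]))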
title = {Automatic Community Creation for Abstractive Spoken Summarization},
author = {Singla K., Stepanov E. A., Bayer A. O., Riccardi G. and Carenini G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2017/10/2017_NewSum_Singla_etal.pdf},
year = {2017},
date = {2017-01-01},
publisher = {EMNLP 2017 Workshop on New Frontiers in Summarization, Copenhagen, 2017},
abstract = {Summarization of spoken conversations is a challenging task, since it requires deep understanding of dialogs. Abstractive summarization techniques rely on linking the summary sentences to sets of original conversation sentences, i.e. communities. Unfortunately, such linking information is rarely available or requires trained annotators. We propose and experiment automatic community creation using cosine similarity on different levels of representation: raw text, WordNet SynSet IDs, and word embeddings. We show that the abstractive summarization systems with automatic communities significantly outperform previously published results on both English and Italian corpora.},
keywords = {Natural Language Processing, Speech Processing}
}
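A minimal sketch of automatic community creation via cosine similarity, as the abstract describes: each synopsis sentence is linked to the conversation sentences it is most similar to. Sentence vectors (e.g. averaged word embeddings) are assumed given; the random vectors and the threshold are placeholders.

import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(1)
summary_sents = rng.normal(size=(2, 50))   # abstractive synopsis sentences
dialog_sents = rng.normal(size=(10, 50))   # original conversation sentences

THRESHOLD = 0.1  # arbitrary; would be tuned on held-out data
for i, s in enumerate(summary_sents):
    community = [j for j, d in enumerate(dialog_sents) if cosine(s, d) > THRESHOLD]
    print(f"summary sentence {i} -> community {community}")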
title = {Functions of Silences towards Information Flow in Spoken Conversation},
author = {Chowdhury A. S., Stepanov E. A., Danieli M. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2017/10/2017_SCNLP_Chowdhury_etal.pdf},
year = {2017},
date = {2017-01-01},
publisher = {EMNLP 2017 Workshop on Speech-Centric Natural Language Processing, Copenhagen, 2017},
abstract = {Silence is an integral part of the most frequent turn-taking phenomena in spoken conversations. Silence is sized and placed within the conversation flow and it is coordinated by the speakers along with the other speech acts. The objective of this analytical study is twofold: to explore the functions of silence with duration of one second and above, towards information flow in a dyadic conversation utilizing the sequences of dialog acts present in the turns surrounding the silence itself; and to design a feature space useful for clustering the silences using a hierarchical concept formation algorithm. The resulting clusters are manually grouped into functional categories based on their similarities. It is observed that the silence plays an important role in response preparation, also can indicate speakers’ hesitation or indecisiveness. It is also observed that sometimes long silences can be used deliberately to get a forced response from another speaker thus making silence a multi-functional and an important catalyst towards information flow.
},
keywords = {Affective Computing, Speech Processing}
}
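A minimal sketch of the clustering step described above; ordinary agglomerative clustering stands in for the hierarchical concept-formation algorithm used in the paper, and the silence feature vectors (e.g. duration plus surrounding dialog-act counts) are random placeholders.

import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(2)
silence_features = rng.normal(size=(40, 5))  # one 5-d vector per silence

# Cluster the silences; clusters are then grouped manually into
# functional categories, as the abstract describes.
clusters = AgglomerativeClustering(n_clusters=4).fit_predict(silence_features)
print(np.bincount(clusters))  # silences per cluster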
title = {Towards End-to-End Spoken Dialogue Systems},
author = {Bayer A. O., Stepanov E. A. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2017/10/2017_IS_Bayer_etal.pdf},
year = {2017},
date = {2017-01-01},
publisher = {Proc. INTERSPEECH, Stockholm, 2017},
abstract = {Training task-oriented dialogue systems requires a significant amount of manual effort and integration of many independently built components; moreover, the pipeline is prone to error propagation. End-to-end training has been proposed to overcome these problems by training the whole system over the utterances of both dialogue parties. In this paper we present an end-to-end spoken dialogue system architecture that is based on turn embeddings. Turn embeddings encode a robust representation of user turns with a local dialogue history and they are trained using sequence-to-sequence models. Turn embeddings are trained by generating the previous and the next turns of the dialogue and additionally perform spoken language understanding. The end-to-end spoken dialogue system is trained using the pre-trained turn embeddings in a stateful architecture that considers the whole dialogue history. We observe that the proposed spoken dialogue system architecture outperforms the models based on local-only dialogue history and it is robust to automatic speech recognition errors.},
keywords = {Interactive Systems, Speech Processing}
}
title = {Affective Behaviour Analysis of User Interactions in Support Web Group},
author = {Tortoreto G., Ghosh A., Stepanov E. A., Danieli M. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2017/10/2017_ECP_Tortoreto_etal_poster.pdf},
year = {2017},
date = {2017-01-01},
journal = {Proc. European Congress of Psychology, Amsterdam, 2017},
keywords = {Affective Computing},
tppubtype = {presentation}
}
title = {A Deep Learning Approach To Modeling Competitiveness In Spoken Conversations},
author = {Chowdhury A. S. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2017/10/2017_ICASSP_Chowdhury_Riccardi.pdf},
year = {2017},
date = {2017-01-01},
publisher = {Proc. ICASSP, New Orleans, 2017},
abstract = {The motivation behind the research on overlapping speech has always been dominated by the need to model human-machine interaction for dialog systems and conversation analysis. To have more complex insights of the interlocutors’ intentions behind the interaction, we need to understand the type of overlaps. Overlapping speech signals the interlocutor’s intention to grab the floor. This act could be a competitive or non-competitive act, which either signals a problem or indicates assistance in communication. In this paper, we present a Deep Learning approach to modeling competitiveness in overlapping speech using acoustic and lexical features and their combination. We compare a fully-connected feed-forward neural network to the Support Vector Machine (SVM) models on real call center human-human conversations. We have observed that feature combination with DNN (significantly) outperforms SVM models, both the individual feature baselines and the feature combination model by 4% and 2% respectively.},
keywords = {Affective Computing, Speech Processing}
}
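A minimal sketch of the comparison the abstract reports, a fully-connected feed-forward network against an SVM on the same feature vectors; the data below is synthetic, not real call-center acoustic/lexical features.

import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(3)
# Stand-in features for overlap segments: competitive (1) vs not (0).
X = rng.normal(size=(200, 20))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # synthetic signal
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for model in (MLPClassifier(max_iter=1000, random_state=0), SVC()):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, accuracy_score(y_te, model.predict(X_te)))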
title = {Roving Mind: a balancing act between open-domain and engaging dialogue systems},
author = {Cervone A., Tortoreto G., Mezza S., Gambi E. and Riccardi G.},
editor = {1st Alexa Prize Conference, Las Vegas},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2019/11/AMZ17Conf-RovingMIndPaper.pdf},
year = {2017},
date = {2017-01-01},
keywords = {Conversational and Interactive Systems , Interactive Systems, Machine Learning, Natural Language Processing, Speech Processing}
}
2016
title = {Predicting Brexit: Classifying Agreement is Better than Sentiment and Pollsters},
author = {Celli F., Stepanov E. A., Poesio M. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2016/11/Coling16PEOPLE-BrexitPaper.pdf},
year = {2016},
date = {2016-11-01},
publisher = {Proc. PEOPLES Workshop at COLING, Osaka, 2016},
abstract = {On June 23rd 2016, UK held the referendum which ratified the exit from the EU. While most of the traditional pollsters failed to forecast the final vote, there were online systems that hit the result with high accuracy using opinion mining techniques and big data.
Starting one month before, we collected and monitored millions of posts about the referendum from social media conversations, and exploited Natural Language Processing techniques to predict the referendum outcome. In this paper we discuss the methods used by traditional pollsters and compare it to the predictions based on different opinion mining techniques. We find that opinion mining based on agreement/disagreement classification works better than opinion mining based on polarity classification in the forecast of the referendum outcome.},
keywords = {Affective Computing, Machine Learning, Signal Annotation and Interpretation}
}
title = {The Social Mood of News: Self-reported Annotations to Design Automatic Mood Detection Systems},
author = {Alam F., Celli F., Stepanov E. A., Ghosh A. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2016/11/Coling16PEOPLE-MoodClassification.pdf},
year = {2016},
date = {2016-11-01},
publisher = {Proc. PEOPLES Workshop at COLING, Osaka, 2016},
abstract = {In this paper, we address the issue of automatic prediction of readers’ mood from newspaper articles and comments. As online newspapers are becoming more and more similar to social media platforms, users can provide affective feedback, such as mood and emotion. We exploited the self-reported annotation of mood categories obtained from the metadata of the Italian online newspaper corriere.it to design and evaluate a system for predicting five different mood categories from news articles and comments: indignation, disappointment, worry, satisfaction and amusement. The outcome of our experiments shows that overall, bag-of-word-ngrams performs better compared to all other feature sets, however, stylometric features perform better for the mood score prediction of articles. Our study shows that such self-reported annotations can be used to design automatic systems.},
keywords = {Affective Computing, Conversational and Interactive Systems }
}
title = {How Interlocutors Coordinate with each other within Emotional Segments?},
author = {Alam F., Chowdhury S., Danieli M. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2016/11/Coling16-CoordinationEmotionalSegments.pdf},
year = {2016},
date = {2016-11-01},
publisher = {Proc. COLING, Osaka, 2016.},
abstract = {In this paper, we aim to investigate the coordination of interlocutors behavior in different emotional segments. Conversational coordination between the interlocutors is the tendency of speakers to predict and adjust each other accordingly on an ongoing conversation. In order to find such a coordination, we investigated 1) lexical similarities between the speakers in each emotional segments,
2) correlation between the interlocutors using psycholinguistic features, such as linguistic styles, psychological process, personal concerns among others, and 3) relation of interlocutors turn-taking behaviors such as competitiveness. To study the degree of coordination in different emotional segments, we conducted our experiments using real dyadic conversations collected from call centers in which agent’s emotional state include empathy and customer’s emotional states include anger and frustration. Our findings suggest that the most coordination occurs between the interlocutors inside anger segments, whereas a little coordination was observed when the agent was empathic, even though an increase in the amount of non-competitive overlaps was observed. We found no significant difference between anger and frustration segment in terms of turn-taking behaviors. However, the length of pause significantly decreases in the preceding segment of anger whereas it increases in the preceding segment of frustration.},
keywords = {Affective Computing, Conversational and Interactive Systems , Discourse, Interactive Systems}
}
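A minimal sketch of the first of the three measurements listed in the abstract, lexical similarity between the interlocutors inside one emotional segment, computed here as cosine similarity of word-count vectors; the tokens are hypothetical.

from collections import Counter
import math

def lexical_cosine(tokens_a, tokens_b):
    """Cosine similarity of two speakers' word-count vectors."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[w] * cb[w] for w in set(ca) & set(cb))
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

agent = "i understand that this is frustrating let me check".split()
customer = "this is frustrating i have been waiting for hours".split()
print(lexical_cosine(agent, customer))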
title = {Can We Detect Speakers' Empathy? A Real-Life Case Study},
author = {Alam F., Danieli M. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2016/11/CogInfo16-Detect-speakers-empathy.pdf},
year = {2016},
date = {2016-11-01},
publisher = {Proc. IEEE International Conference on Cognitive Infocommunications, Wrocław, 2016},
abstract = {In the context of automatic behavioral analysis, we aim to classify empathy in human-human spoken conversations. Empathy underlies the human ability to recognize, understand and react to emotions, attitudes, and beliefs of others. While empathy and its different manifestations (e.g., sympathy, compassion) have been widely studied in psychology, very little has been done in the computational research literature. In this paper, we present a case study where we investigate the occurrences of empathy in call-centers human-human conversations. In order to propose an operational definition of empathy, we adopt the modal model of emotions, where the appraisal processes of the unfolding of emotional states are modeled sequentially. We have designed a binary classification system to detect the presence of empathic manifestations in spoken conversations. The automatic classification system has been evaluated using spoken conversations by exploiting and comparing performance},
keywords = {Discourse, Interactive Systems}
}
title = {Exploring the Role of Online Peer-Assessment as a Tool of Early Intervention},
author = {Mogessie M., Ronchetti M. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2016/11/PRASE16-PeerAssessmentEarlyIntervention.pdf},
year = {2016},
date = {2016-11-01},
publisher = {Proc. International Conference on Web-based Learning, Rome, 2016.},
abstract = {Peer-assessment in education has a long history. Although the adoption of technological tools is not a recent phenomenon, many peer-assessment studies are conducted in manual environments. Automating peer-assessment tasks improves the efficiency of the practice and provides opportunities for taking advantage of large amounts of student-generated data, which will readily be available in electronic format. Data from three undergraduate-level courses, which utilised an electronic peer-assessment tool were explored in this study in order to investigate the relationship between participation in online peer-assessment tasks and successful course completion. It was found that students with little or no participation in optional peer-assessment activities had very low course completion rates as opposed to those with high participation. In light of this finding, it is argued that electronic peer-assessment can serve as a tool of early intervention. Further advantages of automated peer-assessment are discussed and foreseen extensions of this work are outlined.},
keywords = {Interactive Systems}
}
title = {Predicting User Satisfaction from Turn-Taking in Spoken Conversations},
author = {Chowdhury S., Stepanov E. A. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2016/11/IS16-PredictingUserSatisfactionTurnTaking.pdf},
year = {2016},
date = {2016-11-01},
publisher = {Proc. INTERSPEECH, San Francisco, 2016.},
abstract = {User satisfaction is an important aspect of the user experience while interacting with objects, systems or people. Traditionally user satisfaction is evaluated a-posteriori via spoken or written questionnaires or interviews. In automatic behavioral analysis we aim at measuring the user emotional states and its descriptions as they unfold during the interaction. In our approach, user satisfaction is modeled as the final state of a sequence of emotional states and given ternary values positive, negative, neutral. In this paper, we investigate the discriminating power of turn-taking in predicting user satisfaction in spoken conversations. Turn-taking is used for discourse organization of a conversation by means of explicit phrasing, intonation, and pausing. In this paper, we train different characterization of turn-taking, such as competitiveness of the speech overlaps. To extract turn-taking features we design a turn segmentation and labeling system that incorporates lexical and acoustic information. Given a human-human spoken dialog, our system automatically infers any of the three values of the state of the user satisfaction. We evaluate the classification system on real-life call-center human-human dialogs. The comparative performance analysis shows that the contribution of the turn-taking features outperforms both prosodic and lexical features.},
keywords = {Affective Computing, Conversational and Interactive Systems , Discourse, Interactive Systems, Signal Annotation and Interpretation, Speech Processing}
}
title = {HEAL-T: An Efficient PPG-based Heart-Rate And IBI Estimation Method During Physical Exercise},
author = {Mayor J. M., Ghosh A., Stepanov E. A. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2016/11/EUSIPCO16-HearRateAlgorithm.pdf},
year = {2016},
date = {2016-11-01},
publisher = {Proc. EUSIPCO, Budapest, 2016},
abstract = {Photoplethysmography (PPG) is a simple, unobtrusive and low-cost technique for measuring blood volume pulse (BVP) used in heart-rate (HR) estimation. However, PPG based heart-rate monitoring devices are often affected by motion artifacts in on-the-go scenarios, and can yield a noisy BVP signal reporting erroneous HR values. Recent studies have proposed spectral decomposition techniques (e.g. M-FOCUSS, Joint-Sparse-Spectrum) to reduce motion artifacts and increase HR estimation accuracy, but at the cost of high computational load. The singular-value-decomposition and recursive calculations present in these approaches are not feasible for the implementation in real-time continuous-monitoring scenarios. In this paper, we propose an efficient HR estimation method based on a combination of fast-ICA, RLS and BHW filter stages that avoids sparse signal reconstruction, while maintaining a high HR estimation accuracy. The proposed method outperforms the state-of-the-art systems on the publicly available TROIKA data set both in terms of HR estimation accuracy (absolute error of 2.25 ± 1.93 bpm) and computational load.
},
keywords = {Health Analytics, Signal Annotation and Interpretation}
}
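A deliberately simplified sketch of PPG-based heart-rate estimation: band-pass the signal to the plausible heart-rate band, then read off the dominant spectral peak. The paper's actual fast-ICA + RLS + BHW stages for suppressing motion artifacts are not reproduced here; the synthetic signal is a placeholder.

import numpy as np
from scipy.signal import butter, filtfilt

FS = 125  # Hz, the sampling rate of the TROIKA PPG recordings

def estimate_hr(ppg, fs=FS):
    """Band-pass to 0.7-3.5 Hz (42-210 bpm), return dominant peak in bpm."""
    b, a = butter(4, [0.7 / (fs / 2), 3.5 / (fs / 2)], btype="band")
    spectrum = np.abs(np.fft.rfft(filtfilt(b, a, ppg)))
    freqs = np.fft.rfftfreq(len(ppg), d=1.0 / fs)
    return 60.0 * freqs[np.argmax(spectrum)]

# Synthetic 8 s PPG-like signal: 1.5 Hz (90 bpm) component plus noise.
t = np.arange(0, 8, 1.0 / FS)
ppg = np.sin(2 * np.pi * 1.5 * t) + 0.3 * np.random.default_rng(4).normal(size=t.size)
print(round(estimate_hr(ppg), 1), "bpm")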
title = {Tell me who you are, I’ll tell whether you agree or disagree: Prediction of agreement/disagreement in news blogs},
author = {Celli F., Stepanov E. A. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2016/11/IJCAI16-AgreementDisagreementBlogger.pdf},
year = {2016},
date = {2016-11-01},
publisher = {IJCAI, Proc. Workshop on Natural Language Processing Meets Journalism, New York, 2016},
abstract = {In this paper we address the problem of the automatic classification of agreement and disagreement in news blog conversations. We analyze bloggers, messages and relations between messages. We show that relational features (such as replying to a message or to an article) and information about bloggers (such as personality, stances, mood and discourse structure priors) boost the performance in the classification of agreement/disagreement more than features extracted from messages, such as sentiment, style and general discourse relation senses. We also show that bloggers exhibit reply patterns significantly correlated to the expression of agreement or disagreement. Moreover, we show that there are also discourse structures correlated to agreement (expansion relations), and to disagreement (contingency relations).},
keywords = {Discourse}
}
title = {The UniTN End-To-End Discourse Parser in CoNLL 2016 Shared Task},
author = {Stepanov E. A. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2016/11/CoNLL16-UNITNDiscourseParser.pdf},
year = {2016},
date = {2016-11-01},
publisher = {Proc. CoNLL, Berlin, 2016},
abstract = {Penn Discourse Treebank style discourse parsing is a composite task of detecting explicit and non-explicit discourse relations, their connective and argument spans, and assigning a sense to these relations. Due to the composite nature of the task, the end-to-end performance is greatly affected by the error propagation. This paper describes the end-to-end discourse parser for English submitted to the CoNLL 2016 Shared Task on Shallow Discourse Parsing with the main focus of the parser being on argument spans and the reduction of global error through model selection. In the end-to-end closed-track evaluation the parser achieves F-measure of 0.2510 outperforming the best system of the previous year.},
keywords = {Discourse}
}
title = {Do We Really Need All Those Rich Linguistic Features? A Neural Network-Based Approach to Implicit Sense Labeling},
author = {Schenk N., Chiarcos C., Donandt K., Rönnqvist S., Stepanov E. A. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2016/11/CoNLL16-FrankfurtUNITNDiscourseParser.pdf},
year = {2016},
date = {2016-11-01},
publisher = {Proc. CoNLL, Berlin, 2016},
abstract = {We describe our contribution to the CoNLL 2016 Shared Task on shallow discourse parsing. Our system extends the two best parsers from previous year’s competition by integration of a novel implicit sense labeling component. It is grounded on a highly generic, language-independent feedforward neural network architecture incorporating weighted word embeddings for argument spans which obviates the need for (traditional) hand-crafted features.
Despite its simplicity, our system overall outperforms all results from 2015 on 5 out of 6 evaluation sets for English and achieves an absolute improvement in F1-score of 3.2% on the PDTB test section for non-explicit sense classification.},
keywords = {Discourse, Natural Language Processing}
}
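A minimal sketch of the architecture idea in the abstract: each argument span is represented as a (weighted) average of word embeddings, and the concatenated pair feeds a generic feed-forward classifier. Embeddings, spans, and the two sense labels are hypothetical stand-ins.

import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(5)
VOCAB = "the market fell because investors were nervous however prices recovered".split()
emb = {w: rng.normal(size=16) for w in VOCAB}  # stand-in word embeddings

def span_vector(tokens, weights=None):
    """Weighted average of word embeddings for one argument span."""
    weights = weights or [1.0] * len(tokens)
    return np.mean([emb[t] * w for t, w in zip(tokens, weights)], axis=0)

X = np.array([
    np.concatenate([span_vector("the market fell".split()),
                    span_vector("investors were nervous".split())]),
    np.concatenate([span_vector("investors were nervous".split()),
                    span_vector("however prices recovered".split())]),
])
y = [0, 1]  # e.g. 0 = Contingency, 1 = Comparison (illustrative only)
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0).fit(X, y)
print(clf.predict(X))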
title = {Predicting Student Progress from Peer-Assessment Data},
author = {Mogessie M., Ronchetti M. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2016/11/EDM16-PredictingUserProgressFromPeers.pdf},
year = {2016},
date = {2016-11-01},
publisher = {Raleigh, USA, 2016},
abstract = {Predicting overall student performance and monitoring progress have attracted more attention in the past five years than before. Demographic data, high school grades and test result constitute much of the data used for building prediction models. This study demonstrates how data from a peer-assessment environment can be used to build student progress prediction models. The possibility for automating tasks, coupled with minimal teacher intervention, make peer-assessment an efficient platform for gathering student activity data in a continuous manner. The performances of the prediction models are comparable with those trained using other educational data. Considering the fact that the student performance data do not include any teacher assessments, the results are more than encouraging and shall convince the reader that peer-assessment has yet another advantage to offer in the realm of automated student progress monitoring and supervision.},
keywords = {Education Analytics}
}
title = {EEG Semantic Decoding Using Deep Neural Networks},
author = {Mayor J. M., Stepanov E. A. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2016/11/CAOS16-EEGDeepNN.pdf},
year = {2016},
date = {2016-11-01},
publisher = {Workshop on Concepts, Actions and Objects, Rovereto, 2016.},
keywords = {Health Analytics, Signal Annotation and Interpretation}
}
title = {Multilevel Annotation of Agreement and Disagreement in Italian News Blogs},
author = {Celli F., Riccardi G. and Alam F.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2016/11/LREC16-ADR.pdf},
year = {2016},
date = {2016-11-01},
publisher = {Proc. Language Resources and Evaluation Conference, Portorož, 2016},
abstract = {In this paper, we present a corpus of news blog conversations in Italian annotated with gold standard agreement/disagreement relations at message and sentence levels. This is the first resource of this kind in Italian. From the analysis of ADRs at the two levels emerged that agreement annotated at message level is consistent and generally reflected at sentence level, and that the structure of disagreement is more complex. The manual error analysis revealed that this resource is useful not only for the analysis of argumentation, but also for the detection of irony/sarcasm in online debates. The corpus and annotation tool are available for research purposes on request.
},
keywords = {Signal Annotation and Interpretation}
}
title = {Transfer of Corpus-Specific Dialogue Act Annotation to ISO Standard: Is it worth it ?},
author = {Chowdhury S., Stepanov E. A. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2016/11/LREC16-DA-StandardISO.pdf},
year = {2016},
date = {2016-11-01},
publisher = {Proc. Language Resources and Evaluation Conference, Portorož, 2016},
abstract = {Spoken conversation corpora often adapt existing Dialogue Act (DA) annotation specifications, such as DAMSL, DIT++, etc., to task specific needs, yielding incompatible annotations; thus, limiting corpora re-usability. Recently accepted ISO standard for DA annotation – Dialogue Act Markup Language (DiAML) – is designed as domain and application independent. Moreover, the clear separation of dialogue dimensions and communicative functions, coupled with the hierarchical organization of the latter, allows for classification at different levels of granularity. However, re-annotating existing corpora with the new scheme might require significant effort. In this paper we test the utility of the ISO standard through comparative evaluation of the corpus-specific legacy and the semi-automatically transferred DiAML DA annotations on supervised dialogue act classification task. To test the domain independence of the resulting annotations, we perform cross-domain and data aggregation evaluation. Compared to the legacy annotation scheme, on the Italian LUNA Human-Human corpus, the DiAML annotation scheme exhibits better cross-domain and data aggregation classification performance, while maintaining comparable in-domain performance.},
keywords = {Statistical Machine Translation}
}
title = {Summarizing Behaviors: An Experiment on the Annotation of Call-Centre Conversations},
author = {Danieli M., Balamurali A. R., Stepanov E. A., Favre B., Bechet F. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2016/11/LREC16-Summarization.pdf},
year = {2016},
date = {2016-11-01},
publisher = {Proc. Language Resources and Evaluation Conference, Portorož, 2016},
abstract = {Annotating and predicting behavioural aspects in conversations is becoming critical in the conversational analytics industry. In this paper we look into inter-annotator agreement of agent behaviour dimensions on two call center corpora. We find that the task can be annotated consistently over time, but that subjectivity issues impact the quality of the annotation. The reformulation of some of the annotated dimensions is suggested in order to improve agreement.
},
keywords = {Signal Annotation and Interpretation}
}
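The inter-annotator agreement analysis described above is conventionally quantified with a chance-corrected statistic such as Cohen's kappa; a minimal sketch with hypothetical labels for one behavioural dimension over ten calls.

from sklearn.metrics import cohen_kappa_score

annotator_a = ["empathic", "neutral", "empathic", "neutral", "neutral",
               "empathic", "neutral", "neutral", "empathic", "neutral"]
annotator_b = ["empathic", "neutral", "neutral", "neutral", "neutral",
               "empathic", "neutral", "empathic", "empathic", "neutral"]
# Kappa corrects raw agreement for the agreement expected by chance.
print("kappa:", cohen_kappa_score(annotator_a, annotator_b))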
title = {Discourse Connective Detection in Spoken Conversations},
author = {Riccardi G., Stepanov E. A. and Chowdhury S.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2016/11/ICASSP16-DiscourseConnective.pdf},
year = {2016},
date = {2016-11-01},
publisher = {Proc. ICASSP, Shanghai, 2016},
abstract = {Discourse parsing is an important task in Language Understanding with applications to human-human and human-machine communication modeling. However, most of the research has focused on written text, and parsers heavily rely on syntactic parsers that themselves have low performance on dialog data. In our work, we address the problem of analyzing the semantic relations between discourse units in human-human spoken conversations. In particular, in this paper we focus on the detection of discourse connectives which are the predicate of such relations. The discourse relations are drawn from the Penn Discourse Treebank annotation model and adapted to a domain-specific Italian human-human spoken conversations. We study the relevance of lexical and acoustic context in predicting discourse connectives. We observe that both lexical and acoustic context have mixed effect on the prediction of specific connectives. While the oracle of using lexical and acoustic contextual feature combinations is F1 = 68.53, the lexical context alone significantly outperforms the baseline by more than 10 points with F1 = 64.93.},
keywords = {Discourse, Natural Language Processing, Speech Processing}
}
title = {Automatically classifying essential arterial hypertension from physiological and daily life stress responses.},
author = {Danieli M., Ghosh A., Berra E., Fulcheri C., Rabbia F., Testa E., Veglio F., Riccardi G.},
year = {2016},
date = {2016-06-10},
publisher = {ESH 2016 – The 26th European Meeting on Hypertension and Cardiovascular Protection, Paris, France, June 10 -13 2016.},
keywords = {Health Analytics}
}
2015
title = {Automatic Summarization of Call-Center Conversations},
author = {Stepanov E., Favre B., Alam F., Chowdhury S., Singla K., Trione J., Bechet F. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2015/11/ASRU15-SpeechSummarizationDemo.pdf},
year = {2015},
date = {2015-12-13},
journal = {IEEE ASRU, Scottsdale, 2015. (Demo)},
abstract = {This paper presents the SENSEI approach to automatic summarization which represents spoken conversation in terms of factual descriptors and abstractive synopses that are useful for quality assurance supervision in call centers. We demonstrate a browser-based graphical system that automatically produces these summary descriptors and synopses.
},
keywords = {Natural Language Processing, Speech Processing}
}
title = {Sentiment Polarity Classification with Low-level Discourse-based Features},
author = {Stepanov E. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2015/11/AI-IA15-DiscourseConnect4SenntimentClassification.pdf},
year = {2015},
date = {2015-12-03},
journal = {Proc. CLIC-it, Trento, 2015},
keywords = {Natural Language Processing, Signal Annotation and Interpretation}
}
title = {Emotion Unfolding and Affective Scenes: A Case Study in Spoken Conversations},
author = {Danieli M., Riccardi G. and Alam F.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2015/11/ICMI15-ERM4CT.pdf},
year = {2015},
date = {2015-11-09},
journal = {Proc. ICMI Workshop on Representations and Modelling for Companion Systems, Seattle, 2015},
abstract = {The manifestation of human emotions evolves over time and space. Most of the work on affective computing research is limited to the association of context-free signal segments, such as utterances and images, to basic emotions. In this paper, we discuss the hypothesis that interpreting emotions
requires a conceptual description of their dynamics within the context of their manifestations. We describe the unfolding of emotions through the proposed affective scene framework.
Affective scenes are defined in terms of who first expresses the variation in their emotional state in a conversation, how this affects the other speaker’s emotional appraisal and response, and which modifications occur from the initial through the final state of the scene. This conceptual framework is applied and evaluated on real human-human conversations drawn from call centers. We show that the automatic classification of affective scenes achieves more than satisfactory results and it benefits from acoustic, lexical and psycholinguistic features of the speech and linguistics signals.},
keywords = {Affective Computing, Speech Processing}
}
title = {In the mood for Sharing Contents: Emotions, personality and interaction styles in the diffusion of news},
author = {Celli F., Ghosh A., Alam F. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2015/11/IPM15-MoodSharing.pdf},
year = {2015},
date = {2015-11-01},
journal = {Information Processing and Management, Nov 2015},
abstract = {In this paper, we analyze the influence of Twitter users in sharing news articles that may affect the readers’ mood. We collected data of more than 2000 Twitter users who shared news articles from Corriere.it, a daily newspaper that provides mood metadata annotated by readers on a voluntary basis. We automatically annotated personality types and communication styles of Twitter users and analyzed the correlations between personality, communication style, Twitter metadata (such as following and followers) and the type of mood associated to the articles they shared. We also run a feature selection task, to find the best predictors of positive and negative mood sharing, and a classification task. We automatically predicted positive and negative mood sharers with 61.7% F1-measure.},
keywords = {Machine Learning, Natural Language Processing, Signal Annotation and Interpretation}
}
title = {Predicting Students’ Final Exam Scores from their Course Activities},
author = {Mogessie M., Riccardi G. and Ronchetti M.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2015/11/FIE15-PredictingStudentsScores.pdf},
year = {2015},
date = {2015-10-21},
journal = {Proc. IEEE Frontiers in Education, El Paso (USA), 2015},
abstract = {A common approach to the problem of predicting students’ exam scores has been to base this prediction on the previous educational history of students. In this paper, we present a model that bases this prediction on students’ performance on several tasks assigned throughout the duration
of the course. In order to build our prediction model, we use data from a semi-automated peer-assessment system implemented in two undergraduate-level computer science courses, where students ask questions about topics discussed in class, answer questions from their peers, and rate answers provided by their peers. We then construct features that are used to build several multiple linear regression models. We use the Root Mean Squared Error (RMSE) of the prediction models to evaluate their performance. Our final model, which has recorded an RMSE of 2.9326 for one course and 3.4383 for another on predicting grades on a scale of 18 to 30, is built using 14 features that capture various activities of students. Our work has possible implications in the MOOC arena and in similar online course administration systems.},
keywords = {Education Analytics, Machine Learning}
}
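A minimal sketch of the modeling setup named in the abstract, multiple linear regression over 14 activity features evaluated by RMSE; the feature values and the 18-30 scale scores below are synthetic.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(6)
X = rng.normal(size=(50, 14))                       # 14 stand-in activity features
y = np.clip(24 + X @ rng.normal(size=14), 18, 30)   # synthetic exam scores

model = LinearRegression().fit(X, y)
rmse = np.sqrt(mean_squared_error(y, model.predict(X)))
print("RMSE:", round(float(rmse), 4))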
title = {Comprendere l’Ipertensione Arteriosa Essenziale A Partire da Costrutti Psicologici e Segnali Fisiologici [Understanding Essential Arterial Hypertension from Psychological Constructs and Physiological Signals]},
author = {Danieli M., Ghosh A., Berra E., Testa E., Rabbia F., Veglio F. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2015/11/SIIA-2015-ComprendereIpertensionePoster.pdf},
year = {2015},
date = {2015-09-24},
journal = {XXXI Congresso Società Italiana Ipertensione Arteriosa, Bologna, 2015. (Poster)},
keywords = {Affective Computing, Health Analytics}
}
title = {Selection and Aggregation Techniques for Crowdsourced Semantic Annotation Task},
author = {Chowdhury A., Calvo M., Ghosh A., Stepanov E. A., Bayer A. O., Riccardi G., Garcia F. and Sanchis E.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2015/11/IS15-crowdsourcingSelectionAggregation.pdf},
year = {2015},
date = {2015-09-06},
journal = {Proc. INTERSPEECH, Dresden, 2015},
abstract = {Crowdsourcing is an accessible and cost-effective alternative to traditional methods of collecting and annotating data. The application of crowdsourcing to simple tasks has been well investigated.
However, complex tasks like semantic annotation transfer require workers to take simultaneous decisions on chunk segmentation and labeling while acquiring on-the-go domain-specific knowledge. The increased task complexity may generate low judgment agreement and/or poor performance. The goal of this paper is to cope with these crowdsourcing requirements with semantic priming and unsupervised quality control mechanisms. We aim at an automatic quality control that takes into account different levels of workers’ expertise and annotation task performance. We investigate the judgment selection and aggregation techniques on the task of cross-language semantic annotation transfer. We propose stochastic modeling techniques to estimate the task performance of a worker on a particular judgment with respect to the whole worker group. These estimates are used for the selection of the best judgments as well as weighted consensus-based annotation aggregation. We demonstrate that the technique is useful for increasing the quality of collected annotations.},
keywords = {Machine Learning, Signal Annotation and Interpretation, Statistical Machine Translation}
}
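The aggregation step described in the entry above can be pictured as a weighted vote: each worker's judgment counts in proportion to an estimate of that worker's task performance. A minimal sketch, assuming per-worker weights have already been estimated (the paper's stochastic estimation itself is not reproduced here):

    # Weighted consensus aggregation: the label with the highest total worker
    # weight wins. Worker ids, weights and labels are illustrative.
    from collections import defaultdict

    def aggregate(judgments, worker_weight):
        """judgments: list of (worker_id, label) pairs for one annotation unit."""
        score = defaultdict(float)
        for worker, label in judgments:
            score[label] += worker_weight.get(worker, 1.0)
        return max(score, key=score.get)

    weights = {"w1": 0.9, "w2": 0.4, "w3": 0.7}   # hypothetical performance estimates
    print(aggregate([("w1", "city"), ("w2", "date"), ("w3", "city")], weights))  # -> city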
title = {The Role of Speakers and Context in Classifying Competition in Overlapping Speech},
author = {Chowdhury S. A., Danieli M. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2015/11/IS15-OverlapClassification.pdf},
year = {2015},
date = {2015-09-06},
journal = {Proc. INTERSPEECH , Dresden, 2015},
abstract = {Overlapping speech is one of the most frequently occurring events in the course of human-human conversations. Understanding the dynamics of overlapping speech is crucial for conversational analysis and for modeling human-machine dialog. Overlapping speech may signal the speaker’s intention to grab the floor with a competitive vs non-competitive act. In this paper, we study the role of speakers, whether they initiate (overlapper) or not (overlappee) the overlap, and the context of the event. The speech overlap may be explained and predicted by the dialog context and the linguistic or acoustic descriptors. Our goal is to understand whether the competitiveness of the overlap is best predicted by the overlapper, the overlappee, the context or by their combinations. For each overlap and its context we have extracted acoustic, linguistic, and psycholinguistic features and combined decisions from the best classification models. The evaluation of the classifier has been carried out over call center human-human conversations. The results show that complete knowledge of the speakers’ roles and context contributes strongly to the classification results when using acoustic and psycholinguistic features. Our findings also suggest that the lexical selections of the overlapper are good indicators of the speaker’s competitive or non-competitive intentions.
Index Terms: Spoken Conversation, Automatic Classification, Overlapping Speech, Discourse, Context},
keywords = {Conversational and Interactive Systems , Machine Learning, Signal Annotation and Interpretation}
}
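The entry above combines decisions from the best per-view classifiers (acoustic, linguistic, psycholinguistic). One simple way to realize such decision-level fusion is to average the per-view posteriors for the competitive class; the sketch below assumes calibrated probabilities and equal view weights, neither of which is stated in the entry:

    # Decision-level (late) fusion across feature views: average each view's
    # posterior for "competitive" and threshold. Probabilities are placeholders.
    import numpy as np

    def fuse(posteriors, threshold=0.5):
        """posteriors: P(competitive) from acoustic, linguistic, psycholinguistic views."""
        return "competitive" if np.mean(posteriors) >= threshold else "non-competitive"

    print(fuse([0.8, 0.55, 0.6]))   # -> competitive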
title = {Deep Semantic Encodings for Language Modeling},
author = {Bayer A. O. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2015/11/IS15-SELMAutoEncoding.pdf},
year = {2015},
date = {2015-09-06},
journal = {Proc. INTERSPEECH , Dresden, 2015},
abstract = {Word error rate (WER) is not an appropriate metric for spoken language systems (SLS) because lower WER does not necessarily yield better understanding performance. Therefore, language models (LMs) that are used in SLS should be trained to jointly optimize transcription and understanding performance. Semantic LMs (SELMs) are based on the theory of frame semantics and incorporate features of frames and meaning bearing words (target words) as semantic context when training LMs. The performance of SELMs is affected by the errors on the ASR and the semantic parser output. In this paper we address the problem of coping with such noise in the training phase of the neural network-based architecture of LMs. We propose the use of deep autoencoders for the encoding of semantic context while accounting for ASR errors. We investigate the optimization of SELMs both for transcription and understanding by using deep semantic encodings. Deep semantic encodings suppress the noise introduced by the ASR module, and enable SELMs to be optimized adequately. We assess the understanding performance by measuring the errors made on target words and we achieve 3.7% relative improvement over recurrent neural network LMs.
Index Terms: Language Modeling, Semantic Language Models, Recurrent Neural Networks, Deep Autoencoders},
keywords = {Language Modeling, Signal Annotation and Interpretation, Speech Processing}
}
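The deep semantic encodings above come from autoencoders trained on noisy semantic features. A minimal sketch of that building block, with dimensions, activations and training data as assumptions rather than the paper's actual configuration:

    # A small autoencoder over semantic-context vectors (e.g. frame indicators
    # derived from ASR output): the bottleneck code serves as the LM's semantic
    # encoding. Sizes and data are illustrative.
    import torch
    import torch.nn as nn

    class SemanticAutoencoder(nn.Module):
        def __init__(self, n_features=500, n_code=100):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(n_features, n_code), nn.Tanh())
            self.decoder = nn.Sequential(nn.Linear(n_code, n_features), nn.Sigmoid())

        def forward(self, x):
            code = self.encoder(x)
            return self.decoder(code), code

    model = SemanticAutoencoder()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x = torch.rand(32, 500)                       # stand-in for noisy feature vectors
    recon, code = model(x)
    nn.functional.binary_cross_entropy(recon, x).backward()
    opt.step()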
title = {Call Centre Conversation Summarization: A Pilot Task at Multiling 2015},
author = {Favre B., Stepanov E. A., Trione J., Bechet F. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2015/11/SIGDIAL15-CallCenterConversationSummarizationPilot.pdf},
year = {2015},
date = {2015-09-02},
journal = {Proc. SigDial, Prague, 2015.},
abstract = {This paper describes the results of the Call Centre Conversation Summarization task at Multiling’15. The CCCS task consists in generating abstractive synopses from call centre conversations between a caller and an agent. Synopses are summaries of the problem of the caller and how it is solved by the agent. Generating them is a very challenging task given that deep analysis of the dialogs and text generation are necessary. Three languages were addressed: French, Italian and English translations of conversations from those two languages. The official evaluation metric was ROUGE-2. Two participants submitted a total of four systems, which had trouble beating the extractive baselines. The datasets released for the task will allow more research on abstractive dialog summarization.},
keywords = {Conversational and Interactive Systems , Signal Annotation and Interpretation}
}
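ROUGE-2, the official CCCS metric, measures bigram overlap between a system synopsis and a reference. A bare-bones sketch of the recall variant (real ROUGE adds stemming, stopword options and multi-reference handling):

    # ROUGE-2 recall: fraction of the reference's bigrams found in the system
    # output, with clipped counts.
    from collections import Counter

    def bigrams(tokens):
        return Counter(zip(tokens, tokens[1:]))

    def rouge2_recall(system, reference):
        sys_bg, ref_bg = bigrams(system.split()), bigrams(reference.split())
        matched = sum(min(count, sys_bg[bg]) for bg, count in ref_bg.items())
        return matched / max(sum(ref_bg.values()), 1)

    print(rouge2_recall("the caller reports a billing error",
                        "the caller reports a billing problem"))  # 0.8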
title = {Detection of Essential Hypertension with Physiological Signals from Wearable Devices},
author = {Ghosh A., Mayor Torres J. M., Danieli M. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2015/11/EMBC15-HypertensionMonitoringPrediction.pdf},
year = {2015},
date = {2015-08-25},
journal = { Proc. EMBC, IEEE Conf. Engineering in Biology and Medicine Society, Milan, 2015.},
abstract = {Early detection of essential hypertension can support the prevention of cardiovascular disease, a leading cause of death. The traditional method of identification of hypertension involves periodic blood pressure measurement using brachial cuff-based measurement devices. While these devices are noninvasive, they require manual setup for each measurement and they are not suitable for continuous monitoring. Research has shown that physiological signals such as Heart Rate Variability, a measure of cardiac autonomic activity, are correlated with blood pressure. Wearable devices capable of measuring physiological signals such as Heart Rate, Galvanic Skin Response and Skin Temperature have recently become ubiquitous. However, these signals are not accurate and are prone to noise due to different artifacts. In this paper a) we present a data collection protocol for continuous non-invasive monitoring of physiological signals from wearable devices; b) we implement signal processing techniques for signal estimation; c) we explore how the continuous monitoring of these physiological signals can be used to identify hypertensive patients; d) we conduct a pilot study with a group of normotensive and hypertensive patients to test our techniques. We show that physiological signals extracted from wearable devices can distinguish between these two groups with high accuracy.},
keywords = {Health Analytics, Machine Learning, Signal Annotation and Interpretation}
}
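The entry above builds on Heart Rate Variability extracted from wearable signals. Its exact feature set is not listed here; as an illustration, the standard time-domain HRV statistics over inter-beat intervals look like this:

    # Time-domain HRV features from inter-beat intervals (ms). SDNN and RMSSD
    # are textbook choices, used here only as an illustration.
    import numpy as np

    def hrv_features(ibi_ms):
        ibi = np.asarray(ibi_ms, dtype=float)
        diffs = np.diff(ibi)
        return {
            "mean_ibi": ibi.mean(),
            "sdnn": ibi.std(ddof=1),                # overall variability
            "rmssd": np.sqrt(np.mean(diffs ** 2)),  # beat-to-beat variability
        }

    print(hrv_features([812, 790, 805, 840, 798, 815]))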
title = {Annotation and Prediction of Stress and Workload from Physiological and Inertial Signals},
author = {Ghosh A., Danieli M. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2015/11/EMBC15-StressMonitoringPrediction.pdf},
year = {2015},
date = {2015-08-25},
journal = {Proc. EMBC, IEEE Conf. Engineering in Biology and Medicine Society, Milan, 2015.},
abstract = {Continuous daily stress and high workload can have negative effects on individuals’ physical and mental wellbeing. It has been shown that physiological signals may support the prediction of stress and workload. However, previous research is limited by the low diversity of signals concurring to such predictive tasks and by controlled experimental designs. In this paper we present 1) a pipeline for continuous and real-life acquisition of physiological and inertial signals, 2) a mobile agent application for on-the-go event annotation and 3) an end-to-end signal processing and classification system for stress and workload from diverse signal streams. We study physiological signals such as Galvanic Skin Response (GSR), Skin Temperature (ST), Inter Beat Interval (IBI) and Blood Volume Pulse (BVP) collected using a non-invasive wearable device, and inertial signals collected from accelerometer and gyroscope sensors. We combine them with subjects’ inputs (e.g. event tagging) acquired using the agent application, and their emotion regulation scores. In our experiments we explore signal combination and selection techniques for stress and workload prediction from subjects whose signals have been recorded continuously during their daily life. The end-to-end classification system is described for feature extraction, signal artifact removal, and classification. We show that a combination of physiological, inertial and user event signals provides accurate prediction of stress for real-life users and signals.},
keywords = {Health Analytics, Machine Learning, Signal Annotation and Interpretation}
}
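A recurring step in end-to-end pipelines like the one above is cutting each physiological stream into fixed-length windows and summarizing each window with simple statistics before classification. A sketch under assumed window length, overlap and statistics (the entry does not spell these out):

    # Sliding-window feature extraction over one signal stream (e.g. GSR).
    import numpy as np

    def window_features(signal, win=64, step=32):
        feats = []
        for start in range(0, len(signal) - win + 1, step):
            w = signal[start:start + win]
            feats.append([w.mean(), w.std(), w.min(), w.max()])
        return np.array(feats)

    gsr = np.random.default_rng(1).random(512)   # stand-in for a GSR recording
    print(window_features(gsr).shape)            # (15, 4)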
title = {The UniTN Discourse Parser in CoNLL 2015 Shared Task},
author = {Stepanov E., Bayer A. O. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2015/11/CoNLL15-UNITNDiscourseParser.pdf},
year = {2015},
date = {2015-07-30},
journal = {Proc. CoNLL, Beijing, 2015. (Runner-up, Shallow Discourse Parsing Shared Task)},
abstract = {Penn Discourse Treebank style discourse parsing is a composite task of identifying discourse relations (explicit or non-explicit), their connective and argument spans, and assigning a sense to these relations from the hierarchy of senses. In this paper we describe the University of Trento parser submitted to the CoNLL 2015 Shared Task on Shallow Discourse Parsing. The span detection tasks for explicit relations are cast as token-level sequence labeling. The argument span decisions are conditioned on relations’ being intra- or inter-sentential. Non-explicit relation detection and sense assignment tasks are cast as classification. In the end-to-end closed-track evaluation, the parser ranked second with a global F-measure of 0.2184.},
keywords = {Discourse, Machine Learning, Natural Language Processing}
}
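The parser above casts span detection as token-level sequence labeling, which in practice means predicting BIO-style tags and decoding them back into argument spans. A sketch of that decoding step, with illustrative labels:

    # Decode a per-token BIO label sequence into (start, end) argument spans.
    def bio_to_spans(labels):
        spans, start = [], None
        for i, lab in enumerate(labels + ["O"]):   # sentinel flushes the last span
            if lab.startswith("B") or lab == "O":
                if start is not None:
                    spans.append((start, i - 1))
                    start = None
            if lab.startswith("B"):
                start = i
            elif lab.startswith("I") and start is None:
                start = i                          # tolerate I- without a B-
        return spans

    print(bio_to_spans(["O", "B-Arg1", "I-Arg1", "O", "B-Arg2", "I-Arg2"]))
    # -> [(1, 2), (4, 5)]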
title = {Annotating and Categorizing Competition in Overlap Speech},
author = {Chowdhury S. A., Danieli M. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2015/11/ICASSP15-OverlapClassification.pdf},
year = {2015},
date = {2015-04-19},
journal = {Proc. ICASSP, Brisbane, 2015.},
abstract = {Overlapping speech is a common and relevant phenomenon in human conversations, reflecting many aspects of discourse dynamics. In this paper, we focus on the pragmatic role of overlaps in turn-in-progress, where they can be categorized as competitive or non-competitive. Previous studies on these two categories have mostly relied on controlled scenarios and small datasets. In our study, we focus on call center data, with customers and operators engaged in problem-solving tasks. We propose and evaluate an annotation scheme for these two overlap categories in the context of spontaneous and in-vivo human conversations. We analyze the distinctive predictive characteristics of a very large set of high-dimensional acoustic features. We obtained a significant improvement in classification results as well as a significant reduction in the feature set size.},
keywords = {Discourse, Natural Language Processing, Signal Annotation and Interpretation}
}
title = {Open Challenges in Modelling, Analysis and Synthesis of Human Behaviour in Human–Human and Human–Machine Interactions},
author = {Vinciarelli A., Esposito A., André E., Bonin F., Chetouani M., Cohn F. J., Cristani M., Fuhrmann F., Gilmartin E., Hammal Z., Heylen D., Kaiser R., Koutsombogera M., Potamianos A., Renals S., Riccardi G. and Salah A. G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2015/11/CogniComp15-ChallengesHHHM-Review.pdf},
year = {2015},
date = {2015-04-01},
journal = {Cognitive Computation, pp. 1-17, April 2015},
abstract = {Modelling, analysis and synthesis of behaviour are the subject of major efforts in computing science, especially when it comes to technologies that make sense of human–human and human–machine interactions. This article outlines some of the most important issues that still need to be addressed to ensure substantial progress in the field, namely (1) development and adoption of virtuous data collection and sharing practices, (2) shift in the focus of interest from individuals to dyads and groups, (3) endowment of artificial agents with internal representations of users and context, (4) modelling of cognitive and semantic processes underlying social behaviour and (5) identification of application domains and strategies for moving from the laboratory to real-world products.},
keywords = {Conversational and Interactive Systems , Machine Learning, Speech Processing}
}
title = {Systems Medicine in Oncology: Signaling-networks modeling and new generation decision-support systems},
author = {Parodi S, Riccardi G, Castagnino N, Tortolina L, Maffei M, Zoppoli G, Nencioni A, Ballestrero A, Patrone F.},
year = {2015},
date = {2015-01-01},
publisher = {Methods Molecular Biology, Vol. 1386, Schmitz U and Wolkenhauer O (Eds): Systems Medicine, Springer Science press.},
keywords = {Machine Learning, Signal Annotation and Interpretation}
}
2014
title = {CorEA: Italian News Corpus with Emotions and Agreement},
author = {Celli F., Riccardi G. and Ghosh A.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/AI-IA14-CoreaAgreementDisagreement-1.pdf},
year = {2014},
date = {2014-11-01},
journal = {Conferenza di Linguistica Computazionale, Pisa, 2014},
abstract = {In this paper, we describe an Italian corpus of news blogs, including bloggers’ emotion tags, and annotations of agreement relations amongst blogger-comment pairs. The main contributions of this work are: the formalization of the agreement relation, the design of guidelines for its annotation, and the quantitative analysis of the annotators’ agreement.},
keywords = {Affective Computing, Natural Language Processing, Signal Annotation and Interpretation}
}
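The entry above reports a quantitative analysis of the annotators' agreement. The coefficient used is not named here; Cohen's kappa for two annotators is the textbook choice and serves as a sketch:

    # Cohen's kappa: observed agreement corrected for chance agreement.
    from collections import Counter

    def cohens_kappa(ann1, ann2):
        n = len(ann1)
        observed = sum(a == b for a, b in zip(ann1, ann2)) / n
        c1, c2 = Counter(ann1), Counter(ann2)
        expected = sum(c1[k] * c2[k] for k in c1) / (n * n)
        return (observed - expected) / (1 - expected)

    print(cohens_kappa(["agree", "agree", "disagree", "agree"],
                       ["agree", "disagree", "disagree", "agree"]))  # 0.5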
title = {Semantic Language Models for Automatic Speech Recognition},
author = {Bayer A. O. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/SLT14-SemanticSLM.pdf},
year = {2014},
date = {2014-10-01},
journal = {IEEE/ACL Workshop on Spoken Language Technology, Lake Tahoe, 2014},
abstract = {We are interested in the problem of semantics-aware training of language models (LMs) for Automatic Speech Recognition (ASR). Traditional language modeling research has ignored semantic constraints and focused on limited size histories of words. Semantic structures may provide information to capture lexically realized long-range dependencies as well as the linguistic scene of a speech utterance. In this paper, we present a novel semantic LM (SELM) that is based on the theory of frame semantics. Frame semantics analyzes the meaning of words by considering their role in the semantic frames in which they occur and by considering their syntactic properties. We show that by integrating semantic frames and target words into recurrent neural network LMs we can gain significant improvements in perplexity and word error rates. We have evaluated the semantic LM on the publicly available ASR baselines on the Wall Street Journal (WSJ) corpus. SELMs achieve 50% and 64% relative reduction in perplexity compared to n-gram models by using frames and target words respectively. In addition, 12% and 7% relative improvements in word error rates are achieved by SELMs on the Nov’92 and Nov’93 test sets.},
keywords = {Language Modeling, Natural Language Processing, Speech Processing}
}
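The perplexity reductions reported above refer to the standard LM metric: the exponentiated average negative log-probability of the test words. As a one-function reminder (the probabilities are made up):

    # Perplexity from a model's per-word probabilities:
    # PPL = exp(-(1/N) * sum_i log p_i)
    import math

    def perplexity(word_probs):
        return math.exp(-sum(math.log(p) for p in word_probs) / len(word_probs))

    print(perplexity([0.2, 0.1, 0.25, 0.05]))   # ~7.95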
title = {Annotation of Complex Emotions in Real-Life Dialogues: The Case of Empathy},
author = {Danieli M. , Riccardi G. and Alam F.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/AI-IA14-EmpathyCorpusAnnotation.pdf},
year = {2014},
date = {2014-01-01},
journal = {Conferenza di Linguistica Computazionale, Pisa, 2014},
abstract = {In this paper we discuss the problem of annotating emotions in real-life spoken conversations by investigating the special case of empathy. We propose an annotation model based on the situated theories of emotions. The annotation scheme is directed to observe the natural unfolding of empathy during the conversations. The key component of the protocol is the identification of the annotation unit based both on linguistic and paralinguistic cues. In the last part of the paper we evaluate the reliability of the annotation model.},
keywords = {Affective Computing, Signal Annotation and Interpretation}
}
title = {Comparing Stochastic Approaches to Spoken Language Understanding in Multiple Languages},
author = {Hahn S., Dinarelli M., Raymond C., Lefevre F., Lehnen P., De Mori R., Moschitti A., Ney H. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/IEEETSLP10-MultSLU.pdf},
year = {2014},
date = {2014-01-01},
journal = {IEEE Trans. on Audio, Speech and Language Processing, vol. 19, no. 6, pp. 1569-1583, 2011},
abstract = {One of the first steps in building a spoken language understanding (SLU) module for dialogue systems is the extraction of flat concepts out of a given word sequence, usually provided by an automatic speech recognition (ASR) system. In this paper, six different modeling approaches are investigated to tackle the task of concept tagging. These methods include classical, well-known generative and discriminative methods like Finite State Transducers (FSTs), Statistical Machine Translation (SMT), Maximum Entropy Markov Models (MEMMs), or Support Vector Machines (SVMs) as well as techniques recently applied to natural language processing such as Conditional Random Fields (CRFs) or Dynamic Bayesian Networks (DBNs). Following a detailed description of the models, experimental and comparative results are presented on three corpora in different languages and with different complexity. The French MEDIA corpus has already been exploited during an evaluation campaign and so a direct comparison with existing benchmarks is possible. Recently collected Italian and Polish corpora are used to test the robustness and portability of the modeling approaches. For all tasks, manual transcriptions as well as ASR inputs are considered. In addition to single systems, methods for system combination are investigated. The best performing model on all tasks is based on conditional random fields. On the MEDIA evaluation corpus, a concept error rate of 12.6% could be achieved. Here, in addition to attribute names, attribute values have been extracted using a combination of a rule-based and a statistical approach. Applying system combination using weighted ROVER with all six systems, the concept error rate (CER) drops to 12.0%.},
keywords = {Signal Annotation and Interpretation, Speech Processing}
}
title = {A Domain-Independent Statistical Methodology for Dialog Management in Spoken Dialog Systems},
author = {Griol D., Callejas Z., Lopez-Cozar R. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/CSL14-StatisticalDialogueManager1.pdf},
year = {2014},
date = {2014-01-01},
journal = {Computer Speech and Language, to be published in 2014},
abstract = {This paper proposes a domain-independent statistical methodology to develop dialog managers for spoken dialog systems. Our methodology employs a data-driven classification procedure to generate abstract representations of system turns taking into account the previous history of the dialog. A statistical framework is also introduced for the development and evaluation of dialog systems created using the methodology, which is based on a dialog simulation technique. The benefits and flexibility of the proposed methodology have been validated by developing statistical dialog managers for four spoken dialog systems of different complexity, designed for different languages (English, Italian, and Spanish) and application domains (from transactional to problem-solving tasks). The evaluation results show that the proposed methodology allows rapid development of new dialog managers as well as exploration of new dialog strategies, which permits the development of enhanced versions of already existing systems.},
keywords = {Conversational and Interactive Systems , Speech Processing}
}
title = {Towards Healthcare Personal Agents},
author = {Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/ICMI14-HealthcarePersonalAgentsPositionPaper.pdf},
year = {2014},
date = {2014-01-01},
journal = {ACM International Conference on Multimodal Interaction, Workshop on Roadmapping the Future of Multimodal Interaction Research including Business Opportunities and Challenges, Istanbul 2014},
abstract = {For a long time, the research on human-machine conversation and interaction has inspired futuristic visions created by film directors and science fiction writers. Nowadays, there has been great progress towards this end by the extended community of artificial intelligence scientists, spanning from computer scientists to neuroscientists. In this paper we first review the tension between the latest advances in the technology of virtual agents and the limitations in the modality, complexity and sociability of conversational agent interaction. Then we identify a research challenge and target for the research and technology community. We need to create a vision and research path to create personal agents that are perceived as devoted assistants and counselors in helping end-users manage their own healthcare and well-being throughout their life. Such a target is a high-payoff research agenda with high impact on society. In this position paper, following a review of the state-of-the-art in conversational agent technology, we discuss the challenges in spoken/multimodal/multi-sensorial interaction needed to support the development of Healthcare Personal Agents.},
keywords = {Conversational and Interactive Systems }
}
title = {Unsupervised Recognition and Clustering of Speech Overlaps in Spoken Conversations},
author = {Chowdhury S. A. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/IS-SLAM14-OverlapUnsuperClustering1.pdf},
year = {2014},
date = {2014-01-01},
journal = {Workshop on Speech, Language and Audio in Multimedia, Penang, Malaysia, 2014},
abstract = {We are interested in understanding speech overlaps and their function in human conversations. Previous studies on speech overlaps have relied on supervised methods, small corpora and controlled conversations. The characterization of overlaps based on timing, semantic and discourse function requires an analysis over a very large feature space. In this study, we discover and characterize speech overlaps using unsupervised techniques. Overlapping segments of human-human spoken conversations were extracted and transcribed using a large vocabulary Automatic Speech Recognizer (ASR). Each overlap instance is automatically projected onto a high-dimensional space of acoustic and lexical features. Then, we used unsupervised clustering to discover distinct and well-separated clusters that may correspond to different discourse functions (e.g., competitive, non-competitive overlap). We have evaluated recognition and clustering algorithms over a large set of real human-human spoken conversations. The automatic system separates two classes of speech overlaps. The clusters have been comparatively evaluated in terms of feature distributions and their contribution to the automatic classification of the clusters.},
keywords = {Signal Annotation and Interpretation}
}
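The discovery step described above can be pictured as clustering overlap instances in a feature space and checking that two well-separated groups emerge. A sketch with k-means and random stand-in features (the entry does not detail the actual features or clustering algorithm):

    # Cluster overlap-instance feature vectors into two groups.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(2)
    X = np.vstack([rng.normal(0, 1, (50, 20)),    # one putative overlap class
                   rng.normal(3, 1, (50, 20))])   # the other
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    print(np.bincount(labels))                    # two clusters of ~50 each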
title = {Cross-Language Transfer of Semantic Annotation via Targeted Crowdsourcing},
author = {Chowdhury S. A., Ghosh A., Stepanov E., Bayer A. O., Riccardi G. and Klasinas I.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/IS14-CrossLanguageSemanticTransfer-.pdf},
year = {2014},
date = {2014-01-01},
journal = { INTERSPEECH, Singapore, 2014},
abstract = {The development of a natural language speech application requires the process of semantic annotation. Moreover, multilingual porting of speech applications increases the cost and complexity of the annotation task. In this paper we address the problem of transferring the semantic annotation of the source language corpus to a low-resource target language via crowdsourcing. The current crowdsourcing approach faces several problems. First, the available crowdsourcing platforms have a skewed distribution of language speakers. Second, speech applications require domain-specific knowledge. Third, the lack of reference target language annotation makes crowdsourcing worker control very difficult. In this paper we address these issues on the task of cross-language transfer of domain-specific semantic annotation from an Italian spoken language corpus to Greek, via targeted crowdsourcing. The issue of domain knowledge transfer is addressed by priming the workers with the source language concepts. The lack of reference annotation is handled with a consensus-based annotation algorithm. The quality of annotation transfer is assessed using source language references and inter-annotator agreement. We demonstrate that the proposed computational methodology is viable and achieves acceptable annotation quality.},
keywords = {Signal Annotation and Interpretation}
}
title = {Recognizing Human Activities from Smartphone Signals},
author = {Ghosh A. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/ACMM14-RecognizingHumanActivitiesSensor-Signals.pdf},
year = {2014},
date = {2014-01-01},
journal = {ACM International Conference on Multimedia, Orlando, 2014},
abstract = {In context-aware computing, Human Activity Recognition (HAR) aims to understand the current activity of users from their connected sensors. Smartphones with their various sensors are opening a new frontier in building human-centered applications for understanding users’ personal and world contexts. While in-lab and controlled activity recognition systems have yielded very good results, they do not perform well under in-the-wild scenarios. The objective of this paper is to 1) Investigate how audio signal can complement and improve other on-board sensors (accelerometer and gyroscope) for activity recognition; 2) Design and evaluate the fusion of such multiple signal streams to optimize performance and sampling rate. We show that fusion of these signal streams, including audio, achieves high performance even at very low sampling rates; 3) Evaluate the performance of the multistream human activity recognition under different real enduser activity conditions.},
keywords = {Discourse, Natural Language Processing}
}
title = {Predicting Personality Traits using Multimodal Information},
author = {Alam F. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/ACMM14-PersonalitytraitsFromMultimodal.pdf},
year = {2014},
date = {2014-01-01},
journal = {ACM International Conference on Multimedia, Workshop on Computational Personality Recognition, Orlando, 2014},
abstract = {Measuring personality traits has a long history in psychology, where analysis has been done by asking sets of questions. These question sets (inventories) have been designed by investigating lexical terms that we use in our daily communications or by analyzing biological phenomena. Whether consciously or unconsciously, we express our thoughts and behaviors when communicating with others, either verbally, non-verbally or using visual expressions. Recently, research in behavioral signal processing has focused on automatically measuring personality traits using different behavioral cues that appear in our daily communication. In this study, we present an approach to automatically recognize personality traits using a video-blog (vlog) corpus, consisting of transcriptions and extracted audio-visual features. We analyzed linguistic, psycholinguistic and emotional features in addition to the audio-visual features provided with the dataset. We also studied whether we can better predict a trait by identifying other traits. Using our best models we obtained very promising results compared to the official baseline.},
keywords = {Affective Computing}
}
title = {A Web Based Peer Interaction Framework for Improved Assessment and Supervision of Students},
author = {Mogessie M., Riccardi G. and Ronchetti M.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/EDMEDIA14-Peer-based-Assessment.pdf},
year = {2014},
date = {2014-01-01},
journal = {Proc. World Conference on Educational Multimedia, Hypermedia and Telecommunications, Tampere, 2014},
abstract = {One of the challenges of both traditional and contemporary instructional media in higher education is creating a sustainable teaching-learning environment that ensures continuous engagement of students and provides efficient means of assessing their performance. We present a peer-based framework designed to increase active participation of students in courses administered in both traditional and blended learning settings. Students are continuously engaged in attention-eliciting tasks and are assessed by their peers. The framework allows semi-automated assignment of tasks to students. In completing these tasks, students ask questions, answer questions from other students, evaluate the quality of question-answer pairs and rate answers provided by their peers. We have implemented this framework in several courses and run extensive experiments to assess the effectiveness of our approach. We discuss the results of students’ surveys of this approach, which, in general, has been perceived as useful in achieving better learning outcomes.},
keywords = {Education Analytics, Interactive Systems}
}
title = {Towards Cross-Domain PDTB-Style Discourse Parsing},
author = {Stepanov E. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/EACL14-PDTBCrossDomain.pdf},
year = {2014},
date = {2014-01-01},
journal = {EACL Workshop on Health Text Mining and Information Analysis, Gothenburg, 2014},
abstract = {Discourse relation parsing is an important task with the goal of understanding text beyond the sentence boundaries. With the availability of annotated corpora (Penn Discourse Treebank), statistical discourse parsers were developed. In the literature it was shown that the discourse parsing subtasks of discourse connective detection and relation sense classification do not generalize well across domains. The biomedical domain is of particular interest due to the availability of the Biomedical Discourse Relation Bank (BioDRB). In this paper we present a cross-domain evaluation of a PDTB-trained discourse relation parser and evaluate feature-level domain adaptation techniques on the argument span extraction subtask. We demonstrate that the subtask generalizes well across domains.},
keywords = {Natural Language Processing, Signal Annotation and Interpretation}
}
title = {Fusion of Acoustic, Linguistic and Psycholinguistic Features for Speaker Personality Traits Recognition},
author = {Alam F. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/ICASSP14-FusionAcLingPsychPersonalityReco.pdf},
year = {2014},
date = {2014-01-01},
journal = {ICASSP, Florence, 2014},
abstract = {Behavioral analytics is an emerging research area that aims at automatic understanding of human behavior. For the advancement of this research area, we are interested in the problem of learning the personality traits from spoken data. In this study, we investigated the contribution of different types of speech features to the automatic recognition of Speaker Personality Trait (SPT) across diverse speech corpora (broadcast news and spoken conversation). We have extracted acoustic, linguistic, and psycholinguistic features and modeled their combination as input to the classification task. For the classification, we used Sequential Minimal Optimization for Support Vector Machine (SMO) together with Relief feature selection. The present study shows different levels of performance for automatically selected feature sets, and overall improved performance with their combination across diverse corpora.},
keywords = {Affective Computing}
}
title = {The Development of the Multilingual LUNA Corpus for Spoken Language System Porting},
author = {Stepanov E., Riccardi G. and Bayer A. O.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/LREC14-MultilingualLUNACorpusPorting.pdf},
year = {2014},
date = {2014-01-01},
journal = {LREC , Reykjavik, 2014},
abstract = {The development of annotated corpora is a critical process in the development of speech applications for multiple target languages. While the technology to develop a monolingual speech application has reached satisfactory results (in terms of performance and effort), porting an existing application from a source language to a target language is still a very expensive task. In this paper we address the problem of creating multilingual aligned corpora and its evaluation in the context of a spoken language understanding (SLU) porting task. We discuss the challenges of the manual creation of multilingual corpora, as well as present the algorithms for the creation of multilingual SLU via Statistical Machine Translation (SMT).},
keywords = {Natural Language Processing, Speech Processing, Statistical Machine Translation}
}
title = {Shallow Discourse Parsing with Conditional Random Fields},
author = {Ghosh S., Johansson R., Riccardi G. and Tonelli S.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/I11-1120.pdf},
year = {2014},
date = {2014-01-01},
journal = {International Joint Conference on Natural Language Processing, Chiang Mai, Thailand, 2011},
abstract = {Parsing discourse is a challenging natural language processing task. In this paper we take a data driven approach to identify arguments of explicit discourse connectives. In contrast to previous work we do not make any assumptions on the span of arguments and consider parsing as a token-level sequence labeling task. We design the argument segmentation task as a cascade of decisions based on conditional random fields (CRFs). We train the CRFs on lexical, syntactic and semantic features extracted from the Penn Discourse Treebank and evaluate feature combinations on the commonly used test split. We show that the best combination of features includes syntactic and semantic features. The comparative error analysis investigates the performance variability over connective types and argument positions.},
keywords = {Discourse, Natural Language Processing}
}
2013
title = {Language Style and Domain Adaptation for Cross-Language Porting},
author = {Stepanov E., Kashkarev I., Bayer A. O., Riccardi G. and Ghosh A.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/ASRU13-LangAdaptCrossPorting.pdf},
year = {2013},
date = {2013-01-01},
journal = {IEEE Workshop on Automatic Speech Recognition and Understanding, Olomouc, 2013},
abstract = {Automatic cross-language Spoken Language Understanding (SLU) porting is plagued by two limitations. First, SLU models are usually trained on limited-domain corpora. Second, language pair resources (e.g. aligned corpora) are scarce or unmatched in style (e.g. news vs. conversation). We present experiments on automatic style adaptation of the input for the translation systems and of their output for SLU. We approach the problem of scarce aligned data by adapting the available parallel data to the target domain using limited in-domain and larger web-crawled close-to-domain corpora. SLU performance is optimized by re-ranking its output with a Recurrent Neural Network-based joint language model. We evaluate end-to-end SLU porting on close and distant language pairs: Spanish - Italian and Turkish - Italian; and achieve significant improvements both in translation quality and SLU performance.},
keywords = {Signal Annotation and Interpretation, Statistical Machine Translation}
}
title = {On-line Adaptation of Semantic Models for Spoken Language Understanding},
author = {Bayer A. O. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/ASRU13-OnLineSLUAdapt.pdf},
year = {2013},
date = {2013-01-01},
journal = {IEEE Workshop on Automatic Speech Recognition and Understanding, Olomouc, 2013},
abstract = {Spoken language understanding (SLU) systems extract semantic information from speech signals, which is usually mapped onto concept sequences. The distribution of concepts in dialogues is usually sparse. Therefore, general models may fail to model the concept distribution for a dialogue and semantic models can benefit from adaptation. In this paper, we present an instance-based approach for on-line adaptation of semantic models. We show that we can improve the performance of an SLU system on an utterance by retrieving relevant instances from the training data and using them for on-line adaptation of the semantic models. The instance-based adaptation scheme uses two different similarity metrics, edit distance and n-gram match score, on three different tokenizations: word-concept pairs, words, and concepts. We have achieved a significant improvement (6% relative) in the understanding performance by conducting re-scoring experiments on the n-best lists that the SLU outputs. We have also applied a two-level adaptation scheme, where adaptation is first applied to the automatic speech recognizer (ASR) and then to the SLU.},
keywords = {Machine Learning, Speech Processing}
}
title = {Comparative Evaluation of Argument Extraction Algorithms in Discourse Relation Parsing},
author = {Stepanov E. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/IWPT13-DiscourseParsing.pdf},
year = {2013},
date = {2013-01-01},
journal = {International Conference on Parsing Technologies, Nara, 2013},
abstract = {Discourse relation parsing is an important task with the goal of understanding text beyond the sentence boundaries. One of the subtasks of discourse parsing is the extraction of argument spans of discourse relations. A relation can be either intra-sentential – to have both arguments in the same sentence – or inter-sentential – to have arguments span over different sentences. There are two approaches to the task. In the first approach the parser decision is not conditioned on whether the relation is intra- or intersentential. In the second approach relations are parsed separately for each class. The paper evaluates the two approaches to argument span extraction on Penn Discourse Treebank explicit relations; and the problem is cast as token-level sequence labeling. We show that processing intra- and inter-sentential relations separately, reduces the task complexity and significantly outperforms the single model approach.},
keywords = {Natural Language Processing, Speech Processing}
}
title = {Instance-Based On-Line Language Model Adaptation},
author = {Bayer A. O. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/IS13-InstanceBaseLearning.pdf},
year = {2013},
date = {2013-01-01},
journal = {INTERSPEECH, Lyon, 2013},
abstract = {Language model (LM) adaptation is needed to improve the performance of language-based interaction systems. There are two important issues regarding LM adaptation; the selection of the target data set and the mathematical adaptation model. In the literature, usually statistics are drawn from the target data set (e.g. cache model) to augment (e.g. linearly) background statistical language models, as in the case of automatic speech recognition (ASR). Such models are relatively inexpensive to train, however they do not provide the necessary high-dimensional language context description needed for language-based interaction. Instance-based learning provides high-dimensional description of the lexical, semantic, or dialog context. In this paper, we present an instance-based approach to LM adaptation. We show that by retrieving similar instances from the training data and adapting the model with these instances, we can improve the performance of LMs. We propose two different similarity metrics for instance retrieval, edit distance and n-gram match score. We have performed instance-based adaptation on feed forward neural network LMs (NNLMs) to re-score n-best lists for ASR on the LUNA corpus, which includes conversational speech. We have achieved significant improvements in word error rate (WER) by using instance-based on-line LM adaptation on feed forward NNLMs.},
keywords = {Machine Learning, Speech Processing}
}
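The retrieval step in instance-based adaptation, as described in the entry above, scores training utterances against the current one, e.g. by token-level edit distance, and keeps the closest instances for adapting the model. A minimal sketch with illustrative data:

    # Token-level edit distance and nearest-instance retrieval.
    def edit_distance(a, b):
        d = list(range(len(b) + 1))
        for i, x in enumerate(a, 1):
            prev, d[0] = d[0], i
            for j, y in enumerate(b, 1):
                prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (x != y))
        return d[-1]

    def retrieve(query, corpus, k=2):
        return sorted(corpus, key=lambda s: edit_distance(query.split(), s.split()))[:k]

    corpus = ["i want to report a problem", "my phone is not working",
              "i want to check my balance"]
    print(retrieve("i want to report an issue", corpus))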
title = {Comparative Study of Speaker Personality Traits Recognition in Conversational and Broadcast News Speech},
author = {Alam F. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/IS13-Personality.pdf},
year = {2013},
date = {2013-01-01},
journal = {INTERSPEECH, Lyon, 2013},
abstract = {Natural human-computer interaction requires, in addition to understanding what the speaker is saying, the recognition of behavioral descriptors, such as speaker’s personality traits (SPTs). The complexity of this problem depends on the high variability and dimensionality of the acoustic, lexical and situational context manifestations of the SPTs. In this paper, we present a comparative study of automatic speaker personality trait recognition from speech corpora that differ in the source speaking style (broadcast news vs. conversational) and experimental context. We evaluated different feature selection algorithms, such as information gain and relief, and ensemble classification methods to address the high dimensionality issues. We trained and evaluated ensemble methods to leverage base learners, using three different algorithms: SMO (Sequential Minimal Optimization for Support Vector Machine), RF (Random Forest) and Adaboost. After that, we combined them using majority voting and stacking methods. Our study shows that the performance of the system greatly benefits from feature selection and ensemble methods across corpora.},
keywords = {Affective Computing, Speech Processing}
}
title = {Motivational Feedback in Crowdsourcing: a Case Study in Speech Transcriptions},
author = {Riccardi G., Ghosh A., Chowdhury S. A. and Bayer A. O.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/IS13-Crowdsourcing.pdf},
year = {2013},
date = {2013-01-01},
journal = {INTERSPEECH, Lyon, 2013},
abstract = {A widely used strategy in human and machine performance enhancement is achieved through feedback. In this paper we investigate the effect of live motivational feedback on motivating crowds and improving the performance of the crowdsourcing computational model. The provided feedback allows workers to react in real-time and review past actions (e.g. word deletions); thus, to improve their performance on the current and future (sub) tasks. The feedback signal can be controlled via clean (e.g. expert) supervision or noisy supervision in order to trade-off between cost and target performance of the crowd-sourced task. The feedback signal is designed to enable crowd workers to improve their performance at the (sub) task level. The type and performance of feedback signal is evaluated in the context of a speech transcription task. Amazon Mechanical Turk (AMT) platform is used to transcribe speech utterances from different corpora. We show that in both clean (expert) and noisy (worker/turker) real-time feedback conditions the crowd workers are able to provide significantly more accurate transcriptions in a shorter time.},
keywords = {Affective Computing, Conversational and Interactive Systems , Machine Learning, Signal Annotation and Interpretation}
}
title = {Personality Traits Recognition on Social Network - Facebook},
author = {Alam F. ,Stepanov E. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/WCPR13-PersonalityTraitFacebook.pdf},
year = {2013},
date = {2013-01-01},
journal = {ICWSM, Workshop on Computational Personality Recognition, Boston, 2013},
abstract = {For natural and social interaction it is necessary to understand human behavior. Personality is one of the fundamental aspects by which we can understand behavioral dispositions. It is evident that there is a strong correlation between users’ personality and the way they behave on online social networks (e.g., Facebook). This paper presents automatic recognition of Big-5 personality traits on a social network (Facebook) using users’ status text. For the automatic recognition we studied different classification methods such as SMO (Sequential Minimal Optimization for Support Vector Machine), Bayesian Logistic Regression (BLR) and Multinomial Naïve Bayes (MNB) sparse modeling. Performance of the systems was measured using macro-averaged precision, recall and F1; weighted average accuracy (WA) and un-weighted average accuracy (UA). Our comparative study shows that MNB performs better than BLR and SMO for personality traits recognition on the social network data.},
keywords = {Affective Computing, Natural Language Processing}
}
2012
title = {Discriminative Reranking for Spoken Language Understanding},
author = {Dinarelli M., Moschitti A. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/IEEETSLP11-DRMSLU1.pdf},
year = {2012},
date = {2012-01-01},
journal = {IEEE Trans. on Audio, Speech and Language Processing, vol. 20, no. 2, pp. 526-539, 2012},
abstract = {Spoken language understanding (SLU) is concerned with the extraction of meaning structures from spoken utterances. Recent computational approaches to SLU, e.g., conditional random fields (CRFs), optimize local models by encoding several features, mainly based on simple n-grams. In contrast, recent works have shown that the accuracy of CRF can be significantly improved by modeling long-distance dependency features. In this paper, we propose novel approaches to encode all possible dependencies between features and most importantly among parts of the meaning structure, e.g., concepts and their combination. We rerank hypotheses generated by local models, e.g., stochastic finite state transducers (SFSTs) or CRF, with a global model. The latter encodes a very large number of dependencies (in the form of trees or sequences) by applying kernel methods to the space of all meaning (sub) structures. We performed comparative experiments between SFST, CRF, support vector machines (SVMs), and our proposed discriminative reranking models (DRMs) on representative conversational speech corpora in three different languages: the ATIS (English), the MEDIA (French), and the LUNA (Italian) corpora. These corpora have been collected within three different domain applications of increasing complexity: informational, transactional, and problem-solving tasks, respectively. The results show that our DRMs consistently outperform the state-of-the-art models based on CRF.},
keywords = {Signal Annotation and Interpretation, Speech Processing}
}
title = {Combining Machine Translation Systems for Spoken Language Understanding Portability},
author = {Garcia F., Hurtado L. F., Segarra E., Sanchis E. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/SLT12-MTPortSLU.pdf},
year = {2012},
date = {2012-01-01},
journal = {IEEE/ACL Workshop on Spoken Language Technology, Miami, 2012},
abstract = {We are interested in the problem of learning Spoken Language Understanding (SLU) models for multiple target languages. Learning such models requires annotated corpora, and porting to different languages would require corpora with parallel text translation and semantic annotations. In this paper we investigate how to learn an SLU model in a target language starting from no target text and no semantic annotation. Our proposed algorithm is based on the idea of exploiting the diversity (with regard to performance and coverage) of multiple translation systems to transfer statistically stable word-to-concept mappings, in the case of the romance language pair French and Spanish. Each translation system performs differently at the lexical level (wrt BLEU). The best performance on the semantic task is obtained from their combination at different stages of the portability methodology. We have evaluated the portability algorithms on the French MEDIA corpus, using French as the source language and Spanish as the target language. The experiments show the effectiveness of the proposed methods with respect to the source language SLU baseline.},
keywords = {Machine Learning, Signal Annotation and Interpretation, Speech Processing, Statistical Machine Translation}
}
title = {Joint Language Models for Automatic Speech Recognition and Understanding},
author = {Bayer A. O. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/SLT12-NNLMSLU.pdf},
year = {2012},
date = {2012-01-01},
journal = {IEEE/ACL Workshop on Spoken Language Technology, Miami, 2012},
abstract = {Language models (LMs) are one of the main knowledge sources used by automatic speech recognition (ASR) and Spoken Language Understanding (SLU) systems. In ASR systems they are optimized to decode words from speech for a transcription task. In SLU systems they are optimized to map words into concept constructs or interpretation representations. Performance optimization is generally designed independently for ASR and SLU models in terms of word accuracy and concept accuracy respectively. However, the best word accuracy performance does not always yield the best understanding performance. In this paper we investigate how LMs originally trained to maximize word accuracy can be parametrized to account for speech understanding constraints and maximize concept accuracy. Incremental reduction in concept error rate is observed when a LM is trained on word-to-concept mappings. We show how to optimize the joint transcription and understanding task performance in the lexical-semantic relation space.},
keywords = {Machine Learning, Speech Processing}
}
title = {Global Features for Shallow Discourse Parsing},
author = {Ghosh S., Riccardi G. and Johansson R.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/SigDial12-DParsing.pdf},
year = {2012},
date = {2012-01-01},
journal = {SIGDial, Seoul, 2012},
abstract = {A coherently related group of sentences may be referred to as a discourse. In this paper we address the problem of parsing coherence relations as defined in the Penn Discourse Tree Bank (PDTB). A good model for discourse structure analysis needs to account both for local dependencies at the token level and for global dependencies and statistics. We present techniques for using inter-sentential or sentence-level (global), data-driven, non-grammatical features in the task of parsing discourse. The parser model builds on a previous approach based on token-level (local) features with conditional random fields for shallow discourse parsing, which lacks structural knowledge of discourse. The parser adopts a two-stage approach: first the local constraints are applied, and then global constraints are used on a reduced weighted search space (n-best). In the latter stage we experiment with different rerankers trained on the first-stage n-best parses, which are generated using lexico-syntactic local features. The two-stage parser yields significant improvements over the best-performing discourse parser on the PDTB corpus.},
keywords = {Machine Learning, Natural Language Processing}
}
title = {Up From Limited Dialog Systems!},
author = {Riccardi G., Cimiano P., Potamianos A., and Unger C.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/NAACL12-Portdial.pdf},
year = {2012},
date = {2012-01-01},
journal = {NAACL, Workshop on Future Directions and Needs in Spoken Dialog Community, Montreal, 2012},
abstract = {In the last two decades, information-seeking spoken dialog systems (SDS) have moved from research prototypes to real-life commercial applications. Still, dialog systems are limited by the scale and complexity of the task and by the coverage of knowledge required by problem-solving machines or mobile personal assistants. Future spoken interactions will be required to be multilingual and to understand and act on large-scale knowledge bases in all their forms (from structured to unstructured). The Web research community has striven to build large-scale and open multilingual resources (e.g. Wikipedia) and knowledge bases (e.g. Yago). We argue that a) it is crucial to leverage this massive amount of lightly structured Web knowledge and b) the scale issue can be addressed collaboratively by designing open standards that make tools and resources available to the whole speech and language community.},
keywords = {Conversational and Interactive Systems }
}
title = {Improving the Recall of a Discourse Parser by Constraint-Based Postprocessing},
author = {Ghosh S., Johansson R., Riccardi G. and Tonelli S.},
year = {2012},
date = {2012-01-01},
journal = {LREC Istanbul, 2012},
keywords = {Discourse, Machine Learning, Natural Language Processing}
}
title = {Kolmogorov-Smirnov Test for Feature Selection in Emotion Recognition From Speech},
author = {Ivanov A. V. and Riccardi G.},
url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?reload=true&arnumber=6289074&contentType=Conference+Publications},
year = {2012},
date = {2012-01-01},
journal = { ICASSP, Kyoto, 2012},
abstract = {Automatic emotion recognition from speech is limited by the ability to discover the relevant predicting features. The common approach is to extract a very large set of features over a generally long analysis time window. In this paper we investigate the applicability of the two-sample Kolmogorov-Smirnov statistical test (KST) to the problem of segmental speech emotion recognition. We train emotion classifiers for each speech segment within an utterance. The segment labels are then combined to predict the dominant emotion label. Our findings show that KST can be successfully used to extract statistically relevant features. The KST criterion is used to optimize the parameters of the statistical segmental analysis, namely the window segment size and shift. We carry out seven binary emotion classification experiments on the Emo-DB corpus and evaluate the impact of the segmental analysis and emotion-specific feature selection.},
keywords = {Affective Computing}
}
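The KS-based feature selection described in the abstract above can be pictured in a few lines of SciPy. The sketch below is illustrative only, not the paper's pipeline: feature names and values are synthetic, and it simply ranks per-segment features by the two-sample KS statistic between two emotion classes (a larger statistic means the feature separates the classes better).

# Minimal sketch: rank segmental speech features by the two-sample
# Kolmogorov-Smirnov statistic between two emotion classes.
# The feature names and distributions are made up for illustration.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
features = {
    "f0_mean":   (rng.normal(180, 30, 500), rng.normal(210, 30, 500)),
    "intensity": (rng.normal(60, 5, 500),   rng.normal(61, 5, 500)),
}

ranking = []
for name, (class_a, class_b) in features.items():
    stat, p_value = ks_2samp(class_a, class_b)  # two-sample KS test
    ranking.append((stat, p_value, name))

# Larger KS statistic => better class separation for that feature.
for stat, p, name in sorted(ranking, reverse=True):
    print(f"{name}: KS={stat:.3f}, p={p:.2e}")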
2011
title = {Detecting General Opinions from Customer Surveys},
author = {Stepanov E. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/ICDM11-OpinionDetection.pdf},
year = {2011},
date = {2011-01-01},
journal = {IEEE International Conference on Data Mining, SENTIRE Workshop, Vancouver, 2011},
abstract = {Questionnaire-based surveys and on-line product reviews resemble each other in that they both have user comments and satisfaction ratings. Since a comment might be a general opinion about a product as a whole, or about only one or a few of its attributes (in which case the text might not reflect the rating), surveys and reviews share the problem of pairing free-text comments with these ratings. To train accurate models for automatic evaluation of products from free text, it is important to distinguish these two kinds of opinions. In this paper we present experiments on detecting general opinions that target a product as a whole and thus reflect the user's sentiment better. The task is different from subjectivity detection, since the goal is to detect the generality of an opinion regardless of whether the rest of the document is opinionated. The task complements feature-based opinion analysis and opinion polarity classification, since it can be applied as a preceding step to both tasks. We show that, when used as a classification feature, user ratings are not useful in the general opinion detection task. However, they are effective in predicting the polarity of a comment once it is identified as a general opinion.},
keywords = {Conversational and Interactive Systems , Machine Learning, Natural Language Processing}
}
title = {End-to-End Discourse Parser Evaluation},
author = {Ghosh S., Tonelli S., Riccardi G. and Johansson R.},
url = {http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6061347&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D6061347},
year = {2011},
date = {2011-01-01},
journal = {IEEE International Conference on Semantic Computing, Menlo Park, USA, 2011},
abstract = {We are interested in the problem of discourse parsing of textual documents. We present a novel end-to-end discourse parser that, given a plain text document in input, identifies the discourse relations in the text, assigns them a semantic label and detects discourse argument spans. The parsing architecture is based on a cascade of decisions supported by Conditional Random Fields (CRF). We train and evaluate three different parsers using the PDTB corpus. The three system versions are compared to evaluate their robustness with respect to deep/shallow and automatically extracted syntactic features.},
keywords = {Discourse, Machine Learning, Natural Language Processing, Signal Annotation and Interpretation}
}
title = {Using Syntactic and Semantic Structural Kernels for Classifying Definition Questions in Jeopardy!},
author = {Moschitti A., Chu-Carroll J., Patwardhan S., Fan J. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/EMNLP11-QuesClassJeopardy.pdf},
year = {2011},
date = {2011-01-01},
journal = {EMNLP, Edinburgh, 2011},
abstract = {The last decade has seen many interesting applications of Question Answering (QA) technology. The Jeopardy! quiz show is certainly one of the most fascinating, from the viewpoints of both its broad domain and the complexity of its language. In this paper, we study kernel methods applied to syntactic/semantic structures for accurate classification of Jeopardy! definition questions. Our extensive empirical analysis shows that our classification models largely improve on classifiers based on word-language models. Such classifiers are also used in the state-of-the-art QA pipeline constituting Watson, the IBM Jeopardy! system. Our experiments measuring their impact on Watson show enhancements in QA accuracy and a consequent increase in the amount of money earned in game-based evaluation.},
keywords = {Conversational and Interactive Systems , Machine Learning, Natural Language Processing, Signal Annotation and Interpretation}
}
title = {Recognition of Personality Traits from Human Spoken Conversations},
author = {Ivanov A. V., Riccardi G., Sporka A. J. and Franc J.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/IS11-PersonalityRecog.pdf},
year = {2011},
date = {2011-01-01},
journal = {INTERSPEECH, Florence, 2011},
abstract = {We are interested in understanding human personality and its manifestations in human interactions. The automatic analysis of such personality traits in natural conversation is quite complex due to the acquisition of user-profiled corpora, the annotation task and the multidimensional modeling. While this topic has been addressed extensively in experimental psychology research, speech and language scientists have only recently engaged in limited experiments. In this paper we describe an automated system for speaker-independent personality prediction in the context of human-human spoken conversations. The evaluation of such a system is carried out on the PersIA human-human spoken dialog corpus, annotated with user self-assessments of the Big-Five personality traits. The personality predictor has been trained on paralinguistic features, and its evaluation on five personality traits shows encouraging results for the conscientiousness and extroversion labels.},
keywords = {Affective Computing}
}
title = {Collecting Life Logs for Experience Based Corpora},
author = {Francesconi F., Ghosh A., Riccardi G., Ronchetti M. and Vagin A.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/IS11-Iscout.pdf},
year = {2011},
date = {2011-01-01},
journal = {INTERSPEECH, Florence, 2011},
abstract = {In this paper we propose an approach to lightweight acquisition, sharing and annotation of experience-based corpora via mobile devices. Corpus acquisition is a crucial and often costly process in speech and language science and engineering. To address this problem, we have built a system for creating location-based corpora annotated with multimedia tags (e.g. text, speech, image) generated by end-users. We describe a relevant case study for the collection of mobile user life logs. We plan to make such tools and platforms publicly available to the research community for collaborative development and distributed experiential corpora collection.},
keywords = {Conversational and Interactive Systems }
}
title = {Simultaneous Dialog Act Segmentation and Classification from Human-Human Spoken Conversations},
author = {Quarteroni S., Ivanov A. V. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/ICASSP11-DASegmcClass.pdf},
year = {2011},
date = {2011-01-01},
journal = {ICASSP, Prague, 2011},
abstract = {Accurate identification of dialog acts (DAs), which represent the illocutionary aspect of communication, is essential to support the understanding of human conversations. This requires 1) the segmentation of human-human dialogs into turns, 2) the intra-turn segmentation into DA boundaries and 3) the classification of each segment according to a DA tag. This process is particularly challenging when both segmentation and tagging are automated and utterance hypotheses derive from the erroneous results of ASR. In this paper, we use Conditional Random Fields to learn models for simultaneous segmentation and labeling of DAs from whole human-human spoken dialogs. We identify the best performing lexical feature combinations on the LUNA and SWITCHBOARD human-human dialog corpora and compare performances to those of discriminative DA classifiers based on manually segmented utterances. Additionally, we assess our models' robustness to recognition errors, showing that DA identification is robust in the presence of high word error rates.},
keywords = {Natural Language Processing, Signal Annotation and Interpretation, Speech Processing}
}
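As a rough illustration of how segmentation and DA labeling collapse into a single per-token tagging problem (a standard BIO-style reduction that a linear-chain CRF can then learn; the tag set, tokens and spans below are invented, not taken from the paper):

# Illustrative BIO-style encoding: dialog-act segmentation and
# classification become one tagging decision per token.
def encode_turn(tokens, segments):
    """segments: list of (start, end, dialog_act) spans over tokens."""
    tags = ["O"] * len(tokens)
    for start, end, act in segments:
        tags[start] = f"B-{act}"           # B- marks a segment boundary
        for i in range(start + 1, end):
            tags[i] = f"I-{act}"           # I- continues the segment
    return list(zip(tokens, tags))

turn = ["yes", "I", "need", "help", "with", "my", "printer"]
spans = [(0, 1, "Answer"), (1, 7, "Statement")]
print(encode_turn(turn, spans))
# [('yes', 'B-Answer'), ('I', 'B-Statement'), ('need', 'I-Statement'), ...]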
title = {POMDP Concept Policies and Task Structures for Hybrid Dialog Management},
author = {Varges S., Riccardi G., Quarteroni S. and Ivanov A. V.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/ICASSP11-HybridPOMDPs.pdf},
year = {2011},
date = {2011-01-01},
journal = {ICASSP, Prague, 2011},
abstract = {We address several challenges for applying statistical dialog managers based on Partially Observable Markov Models to real world problems: to deal with large numbers of concepts, we use individual POMDP policies for each concept. To control the use of the concept policies, the dialog manager uses explicit task structures. The POMDP policies model the confusability of concepts at the value level. In contrast to previous work, we use explicit confusability statistics including confidence scores based on real world data in the POMDP models. Since data sparseness becomes a key issue for estimating these probabilities, we introduce a form of smoothing the observation probabilities that maintains the overall concept error rate. We evaluated three POMDP-based dialog systems and a rule-based one in a phone-based user evaluation in a tourist domain. The results show that a POMDP that uses confidence scores, in combination with an imp},
keywords = {Conversational and Interactive Systems }
}
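The value-level confusability modeling mentioned above can be read, loosely, as a Bayesian belief update over a concept's values, where the observation probabilities encode confusion statistics and confidence. A minimal sketch with made-up numbers, making no claim to match the paper's POMDP machinery:

# Illustrative Bayesian belief update over one concept's values.
# p_obs_given_value encodes confusability: the probability of observing
# `obs` when the true value is v (all numbers are invented).
def update_belief(belief, obs, p_obs_given_value):
    posterior = {v: p_obs_given_value[v].get(obs, 1e-6) * b
                 for v, b in belief.items()}
    z = sum(posterior.values())            # normalization constant
    return {v: p / z for v, p in posterior.items()}

belief = {"rome": 0.5, "milan": 0.5}
confus = {"rome":  {"rome": 0.8, "milan": 0.2},
          "milan": {"rome": 0.3, "milan": 0.7}}
belief = update_belief(belief, "rome", confus)
print(belief)  # belief mass shifts toward 'rome'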
title = {Tell Me Your Needs: Assistance for Public Transport Users},
author = {Ludwig B., Haecker M., Schaeller R., Zenker B., Ivanov A. V. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/SIGCHI11-MobSDSPublTransport.pdf},
year = {2011},
date = {2011-01-01},
journal = {ACM SIGCHI Symposium on Engineering Interactive Computing Systems, Pisa, 2011},
abstract = {Providing navigation assistance to users is a complex task generally consisting of two phases: planning a tour (phase one) and supporting the user during the tour (phase two). In the first phase, users interface to databases via constrained or natural language interaction to acquire prior knowledge such as bus schedules. In the second phase, unexpected external events, such as delays or accidents, often happen, user preferences change, or new needs arise. This requires machine intelligence to support users in the real-time navigation task, to update information and to replan the trip. To provide assistance in phase two, a navigation system must monitor external events, detect anomalies of the current situation compared to the plan built in the first phase, and provide assistance when the plan has become unfeasible. In this paper we present a prototypical mobile speech-controlled navigation system that provides assistance in both phases. The system was designed based on implications from an analysis of real user assistance needs investigated in a diary study that underlines the vital importance of assistance in phase two.},
keywords = {Conversational and Interactive Systems }
}
2010
title = {Combining User Intention and Error Modeling for Statistical Dialog Simulators},
author = {Quarteroni S., Gonzalez M., Riccardi G. and Varges S.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/IS10-StatistDUS.pdf},
year = {2010},
date = {2010-02-01},
journal = {INTERSPEECH, Makuhari, 2010},
abstract = {Statistical user simulation is an efficient and effective way to train and evaluate the performance of a (spoken) dialog system. In this paper, we design and evaluate a modular data-driven dialog simulator where we decouple the “intentional” component of the User Simulator from the Error Simulator representing different types of ASR/SLU noisy channel distortion. While the former is composed of a Dialog Act Model, a Concept Model and a User Model, the latter is centered around an Error Model. We test different Dialog Act Models and Error Models against a baseline dialog manager and compare results with real dialogs obtained using the same dialog manager. On the grounds of dialog act, task and concept accuracy, our results show that 1) data-driven Dialog Act Models achieve good accuracy with respect to real user behavior and 2) data-driven Error Models make task completion times and rates closer to real data.},
keywords = {Conversational and Interactive Systems , Speech Processing}
}
title = {Hypotheses Selection for Re-Ranking Semantic Annotation},
author = {Dinarelli M., Moschitti A. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/SLT10-HypothSelectionReranking.pdf},
year = {2010},
date = {2010-01-01},
journal = {IEEE/ACL Workshop on Spoken Language Technology, San Francisco, 2010},
abstract = {Discriminative reranking has been successfully used for several tasks of Natural Language Processing (NLP). Recently it has also been applied to Spoken Language Understanding, improving the state-of-the-art for some applications. However, such models can be further improved by considering: (i) a better selection of the initial n-best hypotheses to be re-ranked and (ii) the use of a strategy that decides when the reranking model should be used, i.e. in some cases only the basic approach should be applied. In this paper, we apply a semantic inconsistency metric to select the n-best hypotheses from a large set generated by a basic SLU system. Then we apply a state-of-the-art re-ranker based on the Partial Tree Kernel (PTK), which encodes SLU hypotheses in Support Vector Machines (SVM) with complex structured features. Finally, we apply a decision model based on confidence values to select between the first hypothesis provided by the basic SLU model and the first hypothesis provided by the re-ranker. We show the effectiveness of our approach by presenting comparative results obtained by reranking hypotheses generated by two very different models: a simple Stochastic Language Model encoded in Finite State Machines (FSM) and a Conditional Random Field (CRF) model. We evaluate our approach on the French MEDIA corpus and on an Italian corpus acquired in the European Project LUNA. The results show a significant improvement with respect to the current state-of-the-art and previous re-ranking models.},
keywords = {Signal Annotation and Interpretation}
}
title = {Classifying Dialog Acts in Human-Human and Human-Machine Spoken Conversations},
author = {Quarteroni S. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/IS10-DAClass.pdf},
year = {2010},
date = {2010-01-01},
journal = {INTERSPEECH, Makuhari, 2010},
abstract = {Dialog acts represent the illocutionary aspect of the communication; depending on the nature of the dialog and its participants, different types of dialog act occur and an accurate classification of these is essential to support the understanding of human conversations. We learn effective discriminative dialog act classifiers by studying the most predictive classification features on Human-Human and Human-Machine corpora such as LUNA and SWITCHBOARD; additionally, we assess classifier robustness to speech errors. Our results exceed the state of the art on dialog act classification from reference transcriptions on SWITCHBOARD and allow us to reach a very satisfying performance on ASR transcriptions.},
keywords = {Conversational and Interactive Systems , Speech Processing}
}
title = {Investigating Clarification Strategies in a Hybrid POMDP Dialog Manager},
author = {Varges S., Quarteroni S., Riccardi G. and Ivanov A. V.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/SIGDial10-ClariStratPOMDP.pdf},
year = {2010},
date = {2010-01-01},
journal = {SIGDial, Tokyo, 2010},
abstract = {We investigate the clarification strategies exhibited by a hybrid POMDP dialog manager based on data obtained from a phone-based user study. The dialog manager combines task structures with a number of POMDP policies each optimized for obtaining an individual concept. We investigate the relationship between dialog length and task completion. In order to measure the effectiveness of the clarification strategies, we compute concept precisions for two different mentions of the concept in the dialog: first mentions and final values after clarifications and similar strategies, and compare this to a rule-based system on the same task. We observe an improvement in concept precision of 12.1% for the hybrid POMDP compared to 5.2% for the rule-based system.},
keywords = {Conversational and Interactive Systems , Speech Processing}
}
title = {Cooperative User Models in Statistical Dialog Simulators},
author = {Gonzalez M., Quarteroni S., Riccardi G. and Varges S.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/SIGDial10-DUSCoopUser.pdf},
year = {2010},
date = {2010-01-01},
journal = {SIGDial 2010, Tokyo 2010},
abstract = {Statistical user simulation is a promising methodology to train and evaluate the performance of (spoken) dialog systems. We work with a modular architecture for data-driven simulation where the “intentional” component of user simulation includes a User Model representing user-specific features. We train a dialog simulator that combines traits of human behavior such as cooperativeness and context with domain-related aspects via the Expectation-Maximization algorithm. We show that cooperativeness provides a finer representation of the dialog context which directly affects task completion rate.},
keywords = {Conversational and Interactive Systems , Speech Processing}
}
title = {The LUNA Spoken Dialogue System: Beyond Utterance Classification},
author = {Dinarelli M., Stepanov E., Varges S. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/ICASSP10-SDSBeyondUttClass.pdf},
year = {2010},
date = {2010-01-01},
journal = {ICASSP, Dallas, 2010},
abstract = {We present a call routing application for complex problem solving tasks. To date, work on call routing has mainly dealt with call-type classification. In this paper we take call routing further: initial call classification is done in parallel with a robust statistical Spoken Language Understanding module. This is followed by a dialogue to elicit further task-relevant details from the user before passing on the call. The dialogue capability also allows us to obtain clarifications of the initial classifier guess. Based on an evaluation, we show that conducting a dialogue significantly improves upon call routing based on call classification alone. We present both subjective and objective evaluation results of the system according to standard metrics on real users.},
keywords = {Conversational and Interactive Systems , Speech Processing}
}
title = {Acoustic Correlates of Meaning Structure in Conversational Speech},
author = {Ivanov A. V., Riccardi G., Ghosh S.,Tonelli S. and Stepanov E.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/IS10-AcousticSemanticCorrelates.pdf},
year = {2010},
date = {2010-01-01},
journal = {INTERSPEECH, Makuhari, 2010},
abstract = {We are interested in the problem of extracting meaning structures from spoken utterances in human communication. In Spoken Language Understanding (SLU) systems, parsing of meaning structures is carried out over the word hypotheses generated by the Automatic Speech Recognizer (ASR). This approach suffers from high word error rates and ad-hoc conceptual representations. In contrast, in this paper we aim at discovering meaning components from direct measurements of acoustic and non-verbal linguistic features. The meaning structures are taken from the frame semantics model proposed in FrameNet, a consistent and extendable semantic structure resource covering a large set of domains. We give a quantitative analysis of meaning structures in terms of speech features across human-human dialogs from the manually annotated LUNA corpus. We show that the acoustic correlations between pitch, formant trajectories, intensity and harmonicity and meaning features are statistically significant over the whole corpus as well as relevant in classifying the target words evoked by a semantic frame.},
keywords = {Natural Language Processing, Speech Processing}
}
title = {Automatic Turn Segmentation in Spoken Conversations},
author = {Ivanov A. V., Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/IS10-AutoTurnSegmentation.pdf},
year = {2010},
date = {2010-01-01},
journal = { INTERSPEECH, Makuhari, 2010},
abstract = {In this paper we study the problem of detecting spoken turn boundaries in human-human spoken conversations. The automation of this task is essential to enable the analysis, recognition and understanding of speech transcriptions and dialog structures (e.g. turn taking, dialog act segmentation etc.). The problem formulation differs from previous work on metadata extraction in that we work in the time domain for the detection of boundaries. This approach has the advantage of giving fine-grained measures of speech events and does not rely on automatic speech transcriptions. We have explored the applicability of different algorithms to this task and have found that a hidden Markov model combining the results of modulation spectrum analysis and the Kullback-Leibler divergence of adjacent signal portions produces the best results. The performance of the algorithms has been evaluated on the Switchboard conversational speech corpus.},
keywords = {Signal Annotation and Interpretation}
}
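A toy rendering of the KL-divergence cue from the abstract above: fit a Gaussian to each of two adjacent windows of a one-dimensional frame-level feature and score candidate boundaries by their symmetric KL divergence. The window size, feature and data are assumptions made purely for illustration; this is not the paper's HMM system.

# Toy boundary cue: symmetric KL divergence between Gaussian fits of
# adjacent windows over a 1-D frame-level feature (e.g. log-energy).
import numpy as np

def gauss_kl(mu1, var1, mu2, var2):
    # KL(N(mu1,var1) || N(mu2,var2)) for univariate Gaussians
    return 0.5 * (np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

def boundary_scores(frames, win=50):
    scores = []
    for t in range(win, len(frames) - win):
        a, b = frames[t - win:t], frames[t:t + win]
        kl = gauss_kl(a.mean(), a.var() + 1e-6, b.mean(), b.var() + 1e-6)
        kl += gauss_kl(b.mean(), b.var() + 1e-6, a.mean(), a.var() + 1e-6)
        scores.append((t, kl))            # high symmetric KL => likely boundary
    return scores

frames = np.concatenate([np.random.normal(0, 1, 300),
                         np.random.normal(3, 1, 300)])
t_best = max(boundary_scores(frames), key=lambda x: x[1])[0]
print("best boundary near frame", t_best)  # close to 300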
title = {Kernel-based Reranking for Named-Entity Extraction},
author = {Nguyen T. T., Moschitti A. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/Coling10-KernelNE.pdf},
year = {2010},
date = {2010-01-01},
journal = {COLING, Bejing, 2010},
abstract = {We present novel kernels based on structured and unstructured features for reranking the N-best hypotheses of conditional random fields (CRFs) applied to entity extraction. The former features are generated by a polynomial kernel encoding entity features whereas tree kernels are used to model dependencies amongst tagged candidate examples. The experiments on two standard corpora in two languages, i.e. the Italian EVALITA 2009 and the English CoNLL 2003 datasets, show a large improvement on CRFs in F-measure, i.e. from 80.34% to 84.33% and from 84.86% to 88.16%, respectively. Our analysis reveals that both kernels provide a comparable improvement over the CRFs baseline. Additionally, their combination improves CRFs much more than the sum of the individual contributions, suggesting an interesting kernel synergy.},
keywords = {Machine Learning, Natural Language Processing, Signal Annotation and Interpretation}
}
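A minimal sketch of n-best reranking with a polynomial kernel, assuming flat per-hypothesis feature vectors and entirely synthetic data; the paper's tree kernels over tagged candidates are not reproduced here, and the label convention (1 = lowest-error hypothesis in its list) is an assumption for illustration.

# Minimal n-best reranking sketch: score each hypothesis with a
# polynomial-kernel SVM and return the index of the best one.
import numpy as np
from sklearn.svm import SVC

def train_reranker(X, y):
    clf = SVC(kernel="poly", degree=2, probability=True)
    clf.fit(X, y)
    return clf

def rerank(clf, nbest_features):
    scores = clf.predict_proba(nbest_features)[:, 1]
    return int(np.argmax(scores))         # index of top-scoring hypothesis

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))              # toy hypothesis feature vectors
y = np.array([0, 1] * 30)                 # 1 = oracle-best hypothesis
clf = train_reranker(X, y)
print("chosen hypothesis:", rerank(clf, rng.normal(size=(5, 4))))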
title = {Annotation of Discourse Relations for Conversational Spoken Dialogs},
author = {Tonelli S., Riccardi G., Prasad R. and Joshi A.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/LREC10-DialoguePDTBAnnotation.pdf},
year = {2010},
date = {2010-01-01},
journal = {LREC Valletta, 2010},
abstract = {In this paper, we make a qualitative and quantitative analysis of discourse relations within the LUNA conversational spoken dialog corpus. In particular, we describe the adaptation of the Penn Discourse Treebank (PDTB) annotation scheme to the LUNA dialogs. We discuss similarities and differences between our approach and the PDTB paradigm and point out the peculiarities of spontaneous dialogs w.r.t. written text, which motivated some changes in the sense hierarchy. Then, we present corpus statistics about the discourse relations within a representative set of annotated dialogs.},
keywords = {Discourse, Natural Language Processing, Signal Annotation and Interpretation}
}
2009
title = {Combining POMDPs trained with User Simulations and Rule-based Dialogue Management in a Spoken Dialogue System},
author = {Varges S., Quarteroni S., Riccardi G., Ivanov A. V. and Roberti P.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/12/ACL09-Demo.pdf},
year = {2009},
date = {2009-08-02},
journal = {ACL, Demo Session, Singapore, 2009},
keywords = {Conversational and Interactive Systems }
}
title = {Ontology-Based Grounding of Spoken Language Understanding},
author = {Quarteroni S., Dinarelli M. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/ASRU09-OntologyGrounding.pdf},
year = {2009},
date = {2009-01-01},
journal = {IEEE Workshop on Automatic Speech Recognition and Understanding, Merano, 2009},
abstract = {Current Spoken Language Understanding models rely on either hand-written semantic grammars or flat attribute-value sequence labeling. In most cases, no relations between concepts are modeled, and both concepts and relations are domain-specific, making it difficult to expand or port the domain model. In contrast, we expand our previous work on a domain model based on an ontology where concepts follow the predicate-argument semantics and domain-independent classical relations are defined on such concepts. We conduct a thorough study on a spoken dialog corpus collected within a customer care problem-solving domain, and we evaluate the coverage and impact of the ontology for the interpretation, grounding and},
keywords = {Machine Learning, Natural Language Processing, Speech Processing}
}
title = {The Exploration/Exploitation Trade-Off in Reinforcement Learning for Dialogue Management},
author = {Varges S., Riccardi G., Quarteroni S., Ivanov A. V.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/ASRU09-ExploitationExplorationTradeoffSDS.pdf},
year = {2009},
date = {2009-01-01},
journal = {IEEE Workshop on Automatic Speech Recognition and Understanding, Merano, 2009.},
abstract = {Conversational systems use deterministic rules that trigger actions such as requests for confirmation or clarification. More recently, Reinforcement Learning and (Partially Observable) Markov Decision Processes have been proposed for this task. In this paper, we investigate action selection strategies for dialogue management, in particular the exploration/exploitation trade-off and its impact on final reward (i.e. the session reward after optimization has ended) and lifetime reward (i.e. the overall reward accumulated over the learner’s lifetime). We propose to use interleaved exploitation sessions as a learning methodology to assess the reward obtained from the current policy. The experiments show a statistically significant difference in final reward of exploitation-only sessions between a system that optimizes lifetime reward and one that maximizes the reward of the final policy.},
keywords = {Conversational and Interactive Systems , Machine Learning, Speech Processing}
}
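The trade-off studied above can be illustrated with the simplest action-selection rule, epsilon-greedy, combined with the interleaving idea from the abstract: every k-th session is run exploitation-only to probe the reward of the current policy. The Q-values and all constants below are toy numbers, not the paper's learner.

# Illustrative epsilon-greedy selection with interleaved
# exploitation-only sessions used to probe the current policy.
import random

def choose_action(q_values, epsilon, exploit_only):
    if not exploit_only and random.random() < epsilon:
        return random.randrange(len(q_values))                  # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit

q = [0.1, 0.5, 0.2]       # toy action values for one dialog state
probe_every = 5
for session in range(1, 21):
    probe = session % probe_every == 0    # interleaved exploitation probe
    a = choose_action(q, epsilon=0.2, exploit_only=probe)
    tag = "probe" if probe else "learn"
    print(f"session {session:2d} [{tag}]: action {a}")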
title = {Leveraging POMDPs trained with User Simulations and Rule-Based Dialog Management in a SDS},
author = {Varges S., Riccardi G., Quarteroni S., Ivanov A. V. and Roberti P.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/SIGDial09-demo.pdf},
year = {2009},
date = {2009-01-01},
journal = {SIGDIAL, Demo Session, London, 2009},
abstract = {We have developed a complete spoken dialogue framework that includes rule-based and trainable dialogue managers, speech recognition, spoken language understanding and generation modules, and a comprehensive web visualization interface. We present a spoken dialogue system based on Reinforcement Learning that goes beyond standard rule-based models and computes on-line decisions of the best dialogue moves. Bridging the gap between handcrafted (e.g. rule-based) and adaptive (e.g. based on Partially Observable Markov Decision Processes - POMDP) dialogue models, this prototype is able to learn high-rewarding policies in a number of dialogue situations.},
keywords = {Conversational and Interactive Systems , Machine Learning, Speech Processing}
}
title = {A Statistical Dialog Manager for the LUNA Project},
author = {Griol D., Riccardi G. and Sanchis E.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/IS09-DM.pdf},
year = {2009},
date = {2009-01-01},
journal = {INTERSPEECH, Brighton, 2009},
abstract = {In this paper, we present an approach for the development of a statistical dialog manager, in which the next system response is selected by means of a classification process that considers the whole previous history of the dialog. In particular, we use decision trees for its implementation. The statistical model is automatically learned from training data which are labeled in terms of different SLU features. This methodology has been applied to develop a dialog manager within the framework of the European LUNA project, whose main goal is the creation of a robust natural spoken language understanding system. We present an evaluation of this approach for both human-machine and human-human conversations acquired in this project. We demonstrate that a statistical dialog manager developed with the proposed technique and learned from a corpus of human-machine dialogs can successfully infer the task-related topics present in spontaneous human-human dialogs.},
keywords = {Conversational and Interactive Systems , Speech Processing}
}
title = {Learning the Structure of Human-Computer and Human-Human Spoken Conversations},
author = {Griol D., Riccardi G. and Sanchis E.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/IS09-HHConvStructure.pdf},
year = {2009},
date = {2009-01-01},
journal = {INTERSPEECH, Brighton, 2009},
abstract = {We are interested in the problem of understanding human conversation structure in the context of human-machine and human-human interaction. We present a statistical methodology for detecting the structure of spoken dialogs based on a generative model learned using decision trees. To evaluate our approach we have used the LUNA corpora, collected from real users engaged in problem-solving tasks. The results of the evaluation show that automatic segmentation of spoken dialogs is very effective not only with models built separately using human-machine or human-human dialogs, but it is also possible to infer the task-related structure of human-human dialogs with a model learned using only human-machine dialogs.},
keywords = {Conversational and Interactive Systems , Machine Learning, Speech Processing}
}
title = {What's in an Ontology for Spoken Language Understanding},
author = {Quarteroni S., Riccardi G. and Dinarelli M.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/IS09-Ontology.pdf},
year = {2009},
date = {2009-01-01},
journal = {INTERSPEECH, Brighton, 2009},
abstract = {Current Spoken Language Understanding systems rely either on hand-written semantic grammars or on flat attribute-value sequence labeling. In both approaches, concepts and their relations (when modeled at all) are domain-specific, thus making it difficult to expand or port the domain model. To address this issue, we introduce: 1) a domain model based on an ontology where concepts are classified into either predicative or argumentative; 2) the modeling of relations between such concept classes in terms of classical relations as defined in lexical semantics. We study and analyze our approach on a corpus of customer care data, where we evaluate the coverage and relevance of the ontology for the interpretation of speech utterances.},
keywords = {Machine Learning, Natural Language Processing, Speech Processing}
}
title = {Concept Segmentation and Labeling for Conversational Speech},
author = {Dinarelli M., Moschitti A. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/IS09-RR.pdf},
year = {2009},
date = {2009-01-01},
journal = {INTERSPEECH, Brighton, 2009},
abstract = {Spoken Language Understanding performs automatic concept labeling and segmentation of speech utterances. For this task, many approaches have been proposed based on both generative and discriminative models. While all these methods have shown remarkable accuracy on manual transcriptions of spoken utterances, robustness to noisy automatic transcriptions is still an open issue. In this paper we study algorithms for Spoken Language Understanding combining complementary learning models: Stochastic Finite State Transducers produce a list of hypotheses, which are re-ranked using a discriminative algorithm based on kernel methods. Our experiments on two different spoken dialog corpora, MEDIA and LUNA, show that the combined generative-discriminative model matches state-of-the-art approaches such as Conditional Random Fields (CRF) on manual transcriptions, and is robust to noisy automatic transcriptions, outperforming the state-of-the-art in some cases.},
keywords = {Machine Learning, Natural Language Processing, Speech Processing}
}
title = {Can Machines Call People? User Experience While Answering Telephone Calls Initiated by Machine},
author = {Sporka A. J., Franc J. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/IS09-RR1.pdf},
year = {2009},
date = {2009-01-01},
journal = {CHI, Boston, 2009},
keywords = {Conversational and Interactive Systems , Machine Learning, Speech Processing}
}
title = {Re-Ranking Models Based on Small Training Data for Spoken Language Understanding},
author = {Dinarelli M., Moschitti A. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/IS09-RR3.pdf},
year = {2009},
date = {2009-01-01},
journal = {EMNLP, Singapore, 2009},
keywords = {Machine Learning, Natural Language Processing, Speech Processing}
}
title = {Convolution Kernels on Constituent, Dependency and Sequential Structures for Relation Extraction},
author = {Nguyen T. T., Moschitti A. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/EMNLP09-ConKernel.pdf},
year = {2009},
date = {2009-01-01},
journal = {EMNLP, Singapore, 2009},
abstract = {This paper explores the use of innovative kernels based on syntactic and semantic structures for a target relation extraction task. Syntax is derived from constituent and dependency parse trees, whereas semantics concerns entity types and lexical sequences. We investigate the effectiveness of such representations in automated relation extraction from texts. We process the above data by means of Support Vector Machines along with the syntactic tree, the partial tree and the word sequence kernels. Our study on the ACE 2004 corpus illustrates that the combination of the above kernels achieves high effectiveness and significantly improves the current state-of-the-art.},
keywords = {Machine Learning, Natural Language Processing}
}
title = {On-Line Strategy Computation in Spoken Dialog Systems},
author = {Varges S., Riccardi G., Quarteroni S., Ivanov A. V. and Roberti P.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/ICASSP09-POMDPs.pdf},
year = {2009},
date = {2009-01-01},
journal = {ICASSP, Demo Session, Singapore, 2009},
keywords = {Conversational and Interactive Systems , Machine Learning, Speech Processing}
}
title = {Annotating Spoken Dialogs: from Speech Segments to Dialog Acts and Frame Semantics},
author = {Dinarelli M., Quarteroni S., Tonelli S., Moschitti A. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/EACL09-LUNACorpusAnnotation.pdf},
year = {2009},
date = {2009-01-01},
journal = {EACL Workshop on Semantic Representation of Spoken Language -Athens, 2009},
abstract = {We are interested in extracting semantic structures from spoken utterances generated within conversational systems. Current Spoken Language Understanding systems rely either on hand-written semantic grammars or on flat attribute-value sequence labeling. While the former approach is known to be limited in coverage and robustness, the latter lacks detailed relations amongst attribute-value pairs. In this paper, we describe and analyze the human annotation process of rich semantic structures in order to train semantic statistical parsers. We have annotated spoken conversations from both a human-machine and a human-human spoken dialog corpus. Given a sentence of the transcribed corpora, domain concepts and other linguistic features are annotated, ranging from e.g. part-of-speech tagging and constituent chunking to more advanced annotations such as syntactic, dialog act and predicate-argument structure. In particular, the two latter annotation layers appear to be promising for the design of complex dialog systems. Statistics and mutual information estimates amongst such features are reported and compared across corpora.},
keywords = {Natural Language Processing, Signal Annotation and Interpretation}
}
title = {Re-Ranking Models For Spoken Language Understanding},
author = {Dinarelli M., Moschitti A. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/EACL09-RR.pdf},
year = {2009},
date = {2009-01-01},
journal = {EACL Conference, Athens, 2009},
abstract = {Spoken Language Understanding aims at mapping a natural language spoken sentence into a semantic representation. In the last decade two main approaches have been pursued: generative and discriminative models. The former is more robust to overfitting whereas the latter is more robust to many irrelevant features. Additionally, the way in which these approaches encode prior knowledge is very different, and their relative performance changes based on the task. In this paper we describe a machine learning framework where both models are used: a generative model produces a list of ranked hypotheses, whereas a discriminative model based on structure kernels and Support Vector Machines re-ranks such a list. We tested our approach on the MEDIA corpus (human-machine dialogs) and on a new corpus (human-machine and human-human dialogs) produced in the European LUNA project. The results show a large improvement on the state-of-the-art in concept segmentation and labeling.},
keywords = {Machine Learning, Natural Language Processing, Speech Processing}
}
title = {The Multisite 2009 EVALITA Spoken Dialog System Evaluation},
author = {Baggia P., Cutugno F., Danieli M., Pieraccini R.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/EVALITA09-MultisiteSDSEvaluation.pdf},
year = {2009},
date = {2009-01-01},
journal = {AI*IA EVALITA Workshop, Brescia, 2009},
abstract = {This document presents the coordination and evaluation procedures for the Spoken Dialogue System Task in EVALITA 2009. Three institutions participated in the competition: University of Trento, University of Naples and Loquendo SpA. EVALITA participants were asked to develop an SDS application operating in the sales force domain; they were provided with a preliminary list of scenarios indicating system accounting modalities and a possible list of subtasks that should be supported. The three systems were hosted on a server at Trento, and 19 volunteers called all of them. The calls have been recorded, transcribed and annotated. The evaluation work, based on scripts run on the annotations, has mainly focused on assessing performance at the dialogue, task and concept levels. Detailed results indicating the systems' performances are reported in the paper.},
keywords = {Machine Learning, Natural Language Processing, Signal Annotation and Interpretation}
}
title = {The Voice Multimodal Application Framework},
author = {Riccardi G., Mosca N., Roberti P. and Baggia P.},
year = {2009},
date = {2009-01-01},
journal = {AVIOS, San Diego, 2009},
keywords = {Conversational and Interactive Systems , Machine Learning, Speech Processing}
}
title = {Spoken Dialog Systems: From Theory to Technology},
author = {Riccardi G., Baggia P. and Roberti P.},
year = {2009},
date = {2009-01-01},
journal = {Proc. Work. Toni Mian, Padua, 2007},
keywords = {Conversational and Interactive Systems , Machine Learning, Speech Processing}
}
title = {Standoff Coordination for Multi-Tool Annotation in a Dialogue Corpus},
author = {Rodríguez K. J., Dipper S., Götze M., Poesio M., Riccardi G., Raymond C. and Wisniewska J.},
year = {2009},
date = {2009-01-01},
journal = {ACL LAW Workshop, Prague, 2007},
keywords = {Machine Learning, Natural Language Processing, Signal Annotation and Interpretation}
}
2008
title = {Spoken Language Understanding},
author = {De Mori R., Bechet F., Hakkani-Tur D., McTear M., Riccardi G. and Tur G.},
year = {2008},
date = {2008-01-01},
journal = {IEEE Signal Processing Magazine, vol. 25, pp. 50-58, 2008},
keywords = {Machine Learning, Natural Language Processing, Speech Processing}
}
title = {Semantic Annotations For Conversational Speech: from speech transcriptions to predicate argument structures},
author = {Bisazza A., Dinarelli M., Quarteroni S., Tonelli S., Moschitti A., Riccardi G.},
year = {2008},
date = {2008-01-01},
journal = {IEEE/ACL Workshop on Spoken Language Technology, Goa, 2008},
keywords = {Machine Learning, Natural Language Processing}
}
title = {Joint Generative And Discriminative Models For Spoken Language Understanding},
author = {Dinarelli M., Moschitti A., Riccardi G.},
year = {2008},
date = {2008-01-01},
journal = {IEEE/ACL Workshop on Spoken Language Technology, Goa, 2008},
keywords = {Machine Learning, Natural Language Processing, Speech Processing}
}
title = {Automatic FrameNet-Based Annotation of Conversational Speech},
author = {Coppola B., Moschitti A., Tonelli S., Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/SLT08-FramenetParser.pdf},
year = {2008},
date = {2008-01-01},
journal = {IEEE/ACL Workshop on Spoken Language Technology, Goa, 2008},
abstract = {Current Spoken Language Understanding technology is based on a simple concept annotation of word sequences, where the interdependencies between concepts and their compositional semantics are neglected. This prevents an effective handling of language phenomena, with a consequential limitation on the design of more complex dialog systems. In this paper, we argue that shallow semantic representation as formulated in the Berkeley FrameNet Project may be useful to improve the capability of managing more complex dialogs. To prove this, the first step is to show that a FrameNet parser of sufficient accuracy can be designed for conversational speech. We show that, exploiting a small set of FrameNet-based manual annotations, it is possible to design an effective semantic parser. Our experiments on an Italian spoken dialog corpus, created within the LUNA project, show that our approach is able to automatically annotate unseen dialog turns with high accuracy.},
keywords = {Machine Learning, Natural Language Processing}
}
title = {Persistent Information State in a Data-Centric Architecture},
author = {Varges S., Riccardi G. and Quarteroni S.},
year = {2008},
date = {2008-01-01},
journal = {SIGdial Workshop on Discourse and Dialogue, Columbus, 2008},
keywords = {Conversational and Interactive Systems , Machine Learning, Speech Processing}
}
title = {Learning with Noisy Supervision for Spoken Language Understanding},
author = {Raymond C. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/ICASSP08-NoisySupervisionSLU.pdf},
year = {2008},
date = {2008-01-01},
journal = {Proc. IEEE ICASSP, Las Vegas,2008},
abstract = {Data-driven Spoken Language Understanding (SLU) systems need semantically annotated data, which are expensive, time-consuming and prone to human errors. Active learning has been successfully applied to automatic speech recognition and utterance classification. In general, corpus annotation for SLU involves tasks such as sentence segmentation, chunking or frame labeling and predicate-argument annotation. In such cases human annotations are subject to errors increasing with the annotation complexity. We investigate two alternative noise-robust active learning strategies that are either data-intensive or supervision-intensive. The strategies detect likely erroneous examples and improve the SLU performance significantly for a given labeling cost. We apply uncertainty-based active learning with conditional random fields to the concept segmentation task for SLU. We perform annotation experiments on two databases, namely ATIS (English) and Media (French). We show that our noise-robust algorithm can improve accuracy by up to 6% (absolute), depending on the noise level and the labeling cost.},
keywords = {Machine Learning, Natural Language Processing, Speech Processing}
}
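Uncertainty-based selection, in its simplest form, just queries the unlabeled examples the current model is least confident about. A minimal sketch with a placeholder scikit-learn classifier and synthetic data; the paper's noise-robust strategies and CRF segmentation model are not reproduced here.

# Minimal uncertainty-sampling sketch: pick the unlabeled examples the
# current model is least confident about and send them for annotation.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_labeled = rng.normal(size=(40, 5))       # toy labeled features
y_labeled = np.array([0, 1] * 20)          # toy labels
X_pool = rng.normal(size=(200, 5))         # toy unlabeled pool

model = LogisticRegression().fit(X_labeled, y_labeled)
confidence = model.predict_proba(X_pool).max(axis=1)
query = np.argsort(confidence)[:10]        # 10 least-confident examples
print("indices to send to annotators:", query)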
title = {Active Annotation in the LUNA Italian Corpus of Spontaneous Dialogues},
author = {Rodríguez K., Raymond C. and Riccardi G.},
year = {2008},
date = {2008-01-01},
journal = {Proc. Language Resources and Evaluation (LREC), Marrakech, 2008},
keywords = {Machine Learning, Natural Language Processing, Signal Annotation and Interpretation}
}
2007
title = {A Data-Centric Architecture for Data-Driven Spoken Dialog Systems},
author = {Varges S. and Riccardi G.},
year = {2007},
date = {2007-01-01},
journal = {IEEE Workshop on Automatic Speech Recognition and Understanding, Kyoto, 2007},
keywords = {Conversational and Interactive Systems , Machine Learning, Speech Processing}
}
title = {Spoken Language Understanding with Kernels for Syntactic/Semantic Structures},
author = {Moschitti A., Riccardi G. and Raymond C.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/ASRU07-SLUKernels.pdf},
year = {2007},
date = {2007-01-01},
journal = {IEEE Workshop on Automatic Speech Recognition and Understanding, Kyoto, 2007},
abstract = {Automatic concept segmentation and labeling are the fundamental problems of Spoken Language Understanding in dialog systems. Such tasks are usually approached by using generative or discriminative models based on n-grams. As the uncertainty or ambiguity of the spoken input to the dialog system increases, we expect to need dependencies beyond n-gram statistics. In this paper, a general-purpose statistical syntactic parser is used to detect syntactic/semantic dependencies between concepts in order to increase the accuracy of sentence segmentation and concept labeling. The main novelty of the approach is the use of new tree kernel functions which encode syntactic/semantic structures in discriminative learning models. We experimented with Support Vector Machines and the above kernels on the standard ATIS dataset. The proposed algorithm automatically parses natural language text with an off-the-shelf statistical parser and labels the syntactic (sub)trees with concept labels. The results show that the proposed model is very accurate and competitive with respect to state-of-the-art models when combined with n-gram based models.},
keywords = {Machine Learning, Natural Language Processing, Speech Processing}
}
title = {Generative and Discriminative Algorithms for Spoken Language Understanding},
author = {Raymond C., Riccardi G.},
year = {2007},
date = {2007-01-01},
journal = {INTERSPEECH, Antwerp, 2007},
keywords = {Machine Learning, Natural Language Processing, Speech Processing}
}
title = {Searching for Information in Video Lectures},
author = {Fogarolli A., Riccardi G. and Ronchetti M.},
year = {2007},
date = {2007-01-01},
journal = {Proc. ED-MEDIA Conference, Vancouver, 2007},
keywords = {Conversational and Interactive Systems }
}
title = {The LUNA Corpus: an Annotation Scheme for a Multi-domain Multi-lingual Dialogue Corpus},
author = {Raymond C., Riccardi G., Rodríguez K. J. and Wisniewska J.},
year = {2007},
date = {2007-01-01},
journal = {11th Workshop on the Semantics and Pragmatics of Dialogue (DECALOG'07), Rovereto, 2007},
keywords = {Machine Learning, Natural Language Processing, Signal Annotation and Interpretation}
}
2006
title = {Spoken Dialog Systems: From Theory to Technology},
author = {Riccardi G. and Baggia P.},
year = {2006},
date = {2006-01-01},
journal = {Edizioni della Normale, Pisa, 2006},
keywords = {Conversational and Interactive Systems , Machine Learning, Speech Processing}
}
title = {An Active Approach to spoken Language Processing},
author = {Hakkani-Tur D., Riccardi G. and Tur G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/acm-tslp-06.pdf},
year = {2006},
date = {2006-01-01},
journal = {ACM Transactions on Speech and Language Processing, Vol. 3, No. 3, pp 1-31, 2006},
abstract = {State of the art data-driven speech and language processing systems require a large amount of human intervention ranging from data annotation to system prototyping. In the traditional supervised passive approach, the system is trained on a given number of annotated data samples and evaluated using a separate test set. Then more data is collected arbitrarily, annotated, and the whole cycle is repeated. In this article, we propose the active approach where the system itself selects its own training data, evaluates itself and re-trains when necessary. We first employ active learning which aims to automatically select the examples that are likely to be the most informative for a given task. We use active learning for both selecting the examples to label and the examples to re-label in order to correct labeling errors. Furthermore, the system automatically evaluates itself using active evaluation to keep track of the unexpected events and decides on-demand to label more examples. The active approach enables dynamic adaptation of spoken language processing systems to unseen or unexpected events for nonstationary input while reducing the manual annotation effort significantly. We have evaluated the active approach with the AT&T spoken dialog system used for customer care applications. In this article, we present our results for both automatic speech recognition and spoken language understanding.},
keywords = {Machine Learning, Natural Language Processing, Speech Processing}
}
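A minimal sketch of the selection step behind this active approach, assuming a generic scikit-learn classifier and synthetic data: the current model scores the unlabeled pool and the least-confident examples are sent for (simulated) human labeling. None of this is the AT&T system; it only shows the loop's shape.

# One round of confidence-based active learning on synthetic data: train on
# the labeled seed set, score the pool, and "ask a human" to label the
# examples the model is least sure about.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + 0.3 * rng.normal(size=500) > 0).astype(int)

labeled = list(range(20))                          # initial annotated seed set
pool = [i for i in range(500) if i not in labeled]

for round_no in range(5):
    model = LogisticRegression().fit(X[labeled], y[labeled])
    confidence = model.predict_proba(X[pool]).max(axis=1)
    query = np.argsort(confidence)[:10]            # 10 least-confident examples
    labeled += [pool[i] for i in query]            # simulate human annotation
    pool = [i for i in pool if i not in set(labeled)]
    print(f"round {round_no}: labeled={len(labeled)} acc={model.score(X, y):.3f}")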
title = {Beyond ASR 1-Best: Using Word Confusion Networks in Spoken Language Understanding},
author = {Hakkani-Tur D., Bechet F., Riccardi G. and Tur G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/CSL-pivot-slu.pdf},
year = {2006},
date = {2006-01-01},
journal = {Computer Speech and Language, volume 20, Issue 4, pp. 495-514, 2006},
abstract = {We are interested in the problem of robust understanding from noisy spontaneous speech input. With the advances in automated speech recognition (ASR), there has been increasing interest in spoken language understanding (SLU). A challenge in large vocabulary spoken language understanding is robustness to ASR errors. State of the art spoken language understanding relies on the best ASR hypotheses (ASR 1-best). In this paper, we propose methods for a tighter integration of ASR and SLU using word confusion networks (WCNs). WCNs obtained from ASR word graphs (lattices) provide a compact representation of multiple aligned ASR hypotheses along with word confidence scores, without compromising recognition accuracy. We present our work on exploiting WCNs instead of simply using ASR one-best hypotheses. In this work, we focus on the tasks of named entity detection and extraction and call classification in a spoken dialog system, although the idea is more general and applicable to other spoken language processing tasks. For named entity detection, we have improved the F-measure by using both word lattices and WCNs, 6–10% absolute. The processing of WCNs was 25 times faster than lattices, which is very important for real-life applications. For call classification, we have shown between 5% and 10% relative reduction in error rate using WCNs compared to ASR 1-best output.},
keywords = {Language Modeling, Speech Processing}
}
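To make the WCN representation concrete, here is a toy confusion network as a list of bins of (word, posterior) pairs, with the 1-best string and the soft bag-of-words counts one might feed a call classifier. All words and posteriors are fabricated.

# A word confusion network as a sequence of "bins", each holding competing
# words with posterior probabilities. Expected word counts over the WCN can
# feed a call classifier instead of the 1-best string.
from collections import defaultdict

wcn = [
    [("i", 0.9), ("hi", 0.1)],
    [("want", 0.7), ("won't", 0.3)],
    [("to", 0.95), ("two", 0.05)],
    [("check", 0.6), ("czech", 0.4)],
    [("my", 0.8), ("a", 0.2)],
    [("bill", 0.75), ("deal", 0.25)],
]

one_best = " ".join(max(bin_, key=lambda wp: wp[1])[0] for bin_ in wcn)

expected_counts = defaultdict(float)       # soft bag-of-words features
for bin_ in wcn:
    for word, posterior in bin_:
        expected_counts[word] += posterior

print(one_best)                # "i want to check my bill"
print(dict(expected_counts))   # e.g. "check": 0.6 and "czech": 0.4 both survive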
title = {The AT&T Spoken Language Understanding System},
author = {Gupta N., Tur G., Hakkani-Tur D., Bangalore S., Riccardi G. and Rahim M.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/IEEE-SAP-2005-SLU.pdf},
year = {2006},
date = {2006-01-01},
journal = {IEEE Trans. on Audio, Speech and Language Processing, volume 14, Issue 1, pp. 213-222, 2006},
abstract = {Spoken language understanding (SLU) aims at extracting meaning from natural language speech. Over the past decade, a variety of practical goal-oriented spoken dialog systems have been built for limited domains. SLU in these systems ranges from understanding predetermined phrases through fixed grammars, extracting some predefined named entities, extracting users' intents for call classification, to combinations of users' intents and named entities. In this paper, we present the SLU system of VoiceTone® (a service provided by AT&T where AT&T develops, deploys and hosts spoken dialog applications for enterprise customers). The SLU system includes extracting both intents and the named entities from the users' utterances. For intent determination, we use statistical classifiers trained from labeled data, and for named entity extraction we use rule-based fixed grammars. The focus of our work is to exploit data and to use machine learning techniques to create scalable SLU systems which can be quickly deployed for new domains with minimal human intervention. These objectives are achieved by 1) using the predicate-argument representation of semantic content of an utterance; 2) extending statistical classifiers to seamlessly integrate hand-crafted classification rules with the rules learned from data; and 3) developing an active learning framework to minimize the human labeling effort for quickly building the classifier models and adapting them to changes. We present an evaluation of this system using two deployed applications of VoiceTone.},
keywords = {Machine Learning, Natural Language Processing, Speech Processing}
}
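A toy rendering of the two-part recipe the abstract describes: a statistical intent classifier plus fixed grammars (here plain regular expressions) for named entities. The utterances, intents, and patterns are invented and are not VoiceTone's actual inventory.

# Sketch of the two-part SLU recipe: a statistical intent classifier and
# rule-based fixed grammars for named entities. All data is illustrative.
import re
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train = [
    ("i want to pay my bill", "Pay_Bill"),
    ("there is a wrong charge on my bill", "Billing_Dispute"),
    ("pay my balance please", "Pay_Bill"),
    ("this charge is not mine", "Billing_Dispute"),
]
texts, intents = zip(*train)
clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, intents)

ENTITY_GRAMMAR = {                 # fixed grammars, here as regular expressions
    "phone_number": re.compile(r"\b\d{3}[- ]\d{3}[- ]\d{4}\b"),
    "amount": re.compile(r"\$\d+(?:\.\d{2})?"),
}

def understand(utterance):
    intent = clf.predict([utterance])[0]
    entities = {name: pat.findall(utterance) for name, pat in ENTITY_GRAMMAR.items()}
    return intent, {k: v for k, v in entities.items() if v}

print(understand("i want to pay $25.00 of my bill, call me at 555 123 4567"))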
title = {Shallow Semantic Parsing for Spoken Language Understanding},
author = {Coppola B., Moschitti A., Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/NAACL09-ShallowSemanticParsingFrameNet.pdf},
year = {2006},
date = {2006-01-01},
journal = {NAACL, Boulder, Colorado, 2009},
abstract = {Most Spoken Dialog Systems are based on speech grammars and frame/slot semantics. The semantic descriptions of input utterances are usually defined ad-hoc with no ability to generalize beyond the target application domain or to learn from annotated corpora. The approach we propose in this paper exploits machine learning of frame semantics, borrowing its theoretical model from computational linguistics. While traditional automatic Semantic Role Labeling approaches on written texts may not perform as well on spoken dialogs, we show successful experiments on such porting. Hence, we design and evaluate automatic FrameNet-based parsers both for English written texts and for Italian dialog utterances. The results show that disfluencies of dialog data do not severely hurt performance. Also, a small set of FrameNet-like manual annotations is enough for realizing accurate Semantic Role Labeling on the target domains of typical Dialog Systems.},
keywords = {Machine Learning, Natural Language Processing, Speech Processing}
}
title = {NEEDLE: Next Generation Digital Libraries},
author = {Riccardi G., Ronchetti M.},
year = {2006},
date = {2006-01-01},
journal = {AISV Workshop, Trento, November 2006},
keywords = {Conversational and Interactive Systems }
}
title = {Natural Language Watermarking: Research Challenges and Applications},
author = {Topkara M., Riccardi G., Hakkani-Tur D., Atallah M. J.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/SPIE06.pdf},
year = {2006},
date = {2006-01-01},
journal = {SPIE Conference, San Diego, January, 2006},
abstract = {This paper gives an overview of the research and implementation challenges we encountered in building an end-to-end natural language processing based watermarking system. With natural language watermarking, we mean embedding the watermark into a text document, using the natural language components as the carrier, in such a way that the modifications are imperceptible to the readers and the embedded information is robust against possible attacks. Of particular interest is using the structure of the sentences in natural language text in order to insert the watermark. We evaluated the quality of the watermarked text using an objective evaluation metric, the BLEU score. BLEU scoring is commonly used in the statistical machine translation community. Our current system prototype achieves 0.45 BLEU score on a scale [0,1].},
keywords = {Natural Language Processing}
}
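Since the abstract reports quality as a BLEU score, a compact, unsmoothed version of modified n-gram precision BLEU is sketched below; the sentence pair is invented and the implementation is simplified relative to standard BLEU tooling.

# Minimal single-reference BLEU: modified n-gram precisions up to 4-grams,
# geometric mean, and a brevity penalty. No smoothing, for illustration only.
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    c, r = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = Counter(ngrams(c, n)), Counter(ngrams(r, n))
        overlap = sum(min(count, ref[g]) for g, count in cand.items())
        total = max(sum(cand.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)   # avoid log(0)
    bp = min(1.0, math.exp(1 - len(r) / len(c)))        # brevity penalty
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

original    = "the suspect was seen leaving the bank shortly after noon"
watermarked = "the suspect was observed leaving the bank shortly after midday"
print(f"BLEU = {bleu(watermarked, original):.2f}")      # similarity of the texts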
2005
title = {Grounding Emotions in Human-Machine Conversational Systems},
author = {Riccardi G. and Hakkani-Tur D.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/intetain05.pdf},
year = {2005},
date = {2005-01-01},
journal = {Lecture Notes in Computer Science, Springer-Verlag, pp. 144-154, 2005},
abstract = {In this paper we investigate the role of user emotions in human-machine goal-oriented conversations. There has been a growing interest in predicting emotions from acted and non-acted spontaneous speech. Much of the research work has gone in determining what are the correct labels and improving emotion prediction accuracy. In this paper we evaluate the value of user emotional state towards a computational model of emotion processing. We consider a binary representation of emotions (positive vs. negative) in the context of a goal-driven conversational system. For each human-machine interaction we acquire the temporal emotion sequence going from the initial to the final conversational state. These traces are used as features to characterize the user state dynamics. We ground the emotion traces by associating its patterns to dialog strategies and their effectiveness. In order to quantify the value of emotion indicators, we evaluate their predictions in terms of speech recognition and spoken language understanding errors as well as task success or failure. We report results on the 11.5K dialog corpus samples from the How may I Help You? corpus.},
keywords = {Affective Computing, Conversational and Interactive Systems }
}
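A minimal sketch of grounding emotion traces, assuming dialogs reduced to +/- sequences with fabricated outcomes: simple trace statistics become features for predicting task success. It only illustrates the shape of the computation, not the paper's models or data.

# Encode each dialog's positive/negative emotion trace as features and relate
# them to task outcome. Traces and outcomes below are fabricated.
import numpy as np
from sklearn.linear_model import LogisticRegression

dialogs = [  # (emotion trace over turns, task succeeded?)
    ("++++", 1), ("+++-", 1), ("+-++", 1), ("+---", 0),
    ("----", 0), ("-+--", 0), ("++--", 0), ("++++", 1),
]

def trace_features(trace):
    neg = trace.count("-") / len(trace)
    switches = sum(a != b for a, b in zip(trace, trace[1:]))
    ends_negative = float(trace[-1] == "-")
    return [neg, switches / (len(trace) - 1), ends_negative]

X = np.array([trace_features(t) for t, _ in dialogs])
y = np.array([s for _, s in dialogs])
model = LogisticRegression().fit(X, y)
# Columns follow sorted classes: [P(failure), P(success)].
print(model.predict_proba([trace_features("+-+-")])[0])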
title = {Active Learning: Theory and Applications to Automatic Speech Recognition},
author = {Riccardi G. and Hakkani-Tur D.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/ieee-al-05.pdf},
year = {2005},
date = {2005-01-01},
journal = {IEEE Trans. on Speech and Audio, vol. 13, n.4 , pp. 504-511, 2005},
abstract = {We are interested in the problem of adaptive learning in the context of automatic speech recognition (ASR). In this paper, we propose an active learning algorithm for ASR. Automatic speech recognition systems are trained using human supervision to provide transcriptions of speech utterances. The goal of Active Learning is to minimize the human supervision for training acoustic and language models and to maximize the performance given the transcribed and untranscribed data. Active learning aims at reducing the number of training examples to be labeled by automatically processing the unlabeled examples, and then selecting the most informative ones with respect to a given cost function for a human to label. In this paper we describe how to estimate the confidence score for each utterance through an on-line algorithm using the lattice output of a speech recognizer. The utterance scores are filtered through the informativeness function and an optimal subset of training samples is selected. The active learning algorithm has been applied to both batch and on-line learning scheme and we have experimented with different selective sampling algorithms. Our experiments show that by using active learning the amount of labeled data needed for a given word accuracy can be reduced by more than 60% with respect to random sampling.},
keywords = {Machine Learning}
}
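To illustrate the selective-sampling step, the toy code below averages per-word posteriors (the kind of score one could derive from a recognizer's lattice) into an utterance confidence and queues low-confidence utterances for transcription. The posteriors and threshold are made up and the scoring is far simpler than the paper's on-line lattice algorithm.

# Select utterances for human transcription by utterance-level confidence,
# here the mean of per-word posteriors. All numbers are invented.
utterances = {
    "utt01": [0.98, 0.95, 0.99, 0.97],
    "utt02": [0.40, 0.85, 0.30, 0.90],
    "utt03": [0.88, 0.91, 0.79, 0.95],
    "utt04": [0.55, 0.45, 0.62, 0.50],
}

def utterance_confidence(word_posteriors):
    return sum(word_posteriors) / len(word_posteriors)

THRESHOLD = 0.8   # informativeness cutoff: below it, a human transcribes
to_label = sorted(
    (uid for uid, post in utterances.items()
     if utterance_confidence(post) < THRESHOLD),
    key=lambda uid: utterance_confidence(utterances[uid]),
)
print(to_label)   # ['utt04', 'utt02'] -- lowest confidence first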
title = {Adaptive Categorical Understanding for Spoken Dialogue Systems},
author = {Potamianos A., Narayanan S. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/ieee_adapt-categ-05.pdf},
year = {2005},
date = {2005-01-01},
journal = {IEEE Trans. on Speech and Audio, vol. 13, n.3, pp. 321-329, 2005},
keywords = {Conversational and Interactive Systems , Machine Learning, Speech Processing}
}
title = {Using Context to Improve Emotion Detection in Spoken Dialog Systems},
author = {Liscombe J., Riccardi G. and Hakkani-Tur D.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/IS05-EmotRecog.pdf},
year = {2005},
date = {2005-01-01},
journal = {INTERSPEECH, Lisbon, Sept. 2005},
abstract = {Most research that explores the emotional state of users of spoken dialog systems does not fully utilize the contextual nature that the dialog structure provides. This paper reports results of machine learning experiments designed to automatically classify the emotional state of user turns using a corpus of 5,690 dialogs collected with the “How May I Help You?” spoken dialog system. We show that augmenting standard lexical and prosodic features with contextual features that exploit the structure of spoken dialog and track user state increases classification accuracy by 2.6%.},
keywords = {Affective Computing, Speech Processing}
}
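A sketch of the feature augmentation described here, with placeholder lexical and prosodic features joined by two contextual ones (the previous turn's label and the position in the dialog). All field names and values are illustrative, not the paper's feature set.

# Augment per-turn lexical/prosodic features with dialog-context features.
def turn_features(turn, prev_label, turn_index, n_turns):
    return {
        "n_words": len(turn["text"].split()),      # crude lexical feature
        "mean_pitch": turn["mean_pitch"],          # crude prosodic feature
        # contextual features from the dialog structure:
        "prev_negative": float(prev_label == "negative"),
        "relative_position": turn_index / n_turns,
    }

dialog = [
    {"text": "i need help with my bill", "mean_pitch": 180.0},
    {"text": "no that is wrong", "mean_pitch": 230.0},
]
prev = "neutral"
for i, turn in enumerate(dialog):
    print(turn_features(turn, prev, i, len(dialog)))
    # Pretend label for the demo, standing in for the previous classification.
    prev = "negative" if turn["mean_pitch"] > 200 else "neutral"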
title = {Error Prediction in Spoken Dialog: from Signal-to-Noise Ratio to Semantic Confidence Scores},
author = {Hakkani-Tur D., Tur G., Riccardi G. and Kim H. K.},
year = {2005},
date = {2005-01-01},
journal = {IEEE ICASSP, Philadelphia, March 2005},
keywords = {Conversational and Interactive Systems , Machine Learning, Speech Processing}
}
title = {The AT&T WATSON Speech Recognizer},
author = {Goffin V., Allauzen C., Bocchieri E., Hakkani-Tur D., Ljolje A., Parthasarathy S., Rahim M., Riccardi G. and Saraclar M.},
year = {2005},
date = {2005-01-01},
journal = {IEEE ICASSP, Philadelphia, March 2005},
keywords = {Speech Processing}
}
title = {Mining Spoken Dialogue Corpora for System Evaluation and Modeling},
author = {Bechet F., Riccardi G. and Hakkani-Tur D.},
year = {2005},
date = {2005-01-01},
journal = {EMNLP Conference, Barcelona, 2004},
keywords = {Conversational and Interactive Systems , Machine Learning, Speech Processing}
}
title = {Combining Classifiers for Spoken Language Understanding},
author = {Karahan M., Hakkani-Tur D., Riccardi G. and Tur G.},
year = {2005},
date = {2005-01-01},
journal = {IEEE ASRU, U.S. Virgin Islands, Dec. 2003},
keywords = {Machine Learning, Natural Language Processing, Speech Processing}
}
2004
title = {Unsupervised and Active Learning in Automatic Speech Recognition for Call Classification},
author = {Hakkani-Tur D., Tur G., Rahim M. and Riccardi G.},
year = {2004},
date = {2004-01-01},
journal = {ICASSP, Montreal, May 2004},
keywords = {Language Modeling, Machine Learning, Speech Processing}
}
title = {Extending Boosting For Call Classification Using Word Confusion Networks},
author = {Tur G., Hakkani-Tur D. and Riccardi G.},
year = {2004},
date = {2004-01-01},
journal = {IEEE ICASSP, Montreal, May 2004},
keywords = {Machine Learning, Signal Annotation and Interpretation}
}
2003
title = {Multi-channel Sentence Classification for Spoken Dialogue Modeling},
author = {Bechet F., Riccardi G. and Hakkani-Tur D.},
year = {2003},
date = {2003-01-01},
journal = {EUROSPEECH, Geneva, Switzerland, Sept. 2003},
keywords = {Machine Learning, Speech Processing}
}
title = {Active and Unsupervised Learning for Automatic Speech Recognition},
author = {Riccardi G. and Hakkani-Tur D.},
year = {2003},
date = {2003-01-01},
journal = {EUROSPEECH, Geneva, Switzerland, Sept. 2003},
keywords = {Machine Learning}
}
title = {A General Algorithm for Word Graph Decomposition},
author = {Hakkani-Tur D. and Riccardi G.},
year = {2003},
date = {2003-01-01},
journal = {IEEE ICASSP, Hong Kong, 2003},
keywords = {Speech Processing}
}
2002
title = {Stochastic Finite-State Models for Spoken Language Machine Translation},
author = {Bangalore S. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/mt_journal_special-02.pdf},
year = {2002},
date = {2002-01-01},
journal = {Machine Translation, vol. 17, n. 3, pp. 165-184, 2002 (Invited paper)},
abstract = {The problem of machine translation can be viewed as consisting of two subproblems: (a) lexical selection and (b) lexical reordering. In this paper, we propose stochastic finite-state models for these two subproblems. Stochastic finite-state models are efficiently learnable from data, effective for decoding and are associated with a calculus for composing models which allows for tight integration of constraints from various levels of language processing. We present a method for learning stochastic finite-state models for lexical selection and lexical reordering that are trained automatically from pairs of source and target utterances. We use this method to develop models for English–Japanese and English–Spanish translation and present the performance of these models for translation on speech and text. We also evaluate the efficacy of such a translation model in the context of a call routing task of unconstrained speech utterances.},
keywords = {Statistical Machine Translation}
}
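A brute-force toy of the lexical selection / lexical reordering decomposition: a one-word translation table does the selection, and a tiny target-side bigram score picks the best reordering. Real systems compose weighted finite-state transducers instead of enumerating permutations; the lexicon and scores here are invented.

# (a) lexical selection via a word translation table, then (b) lexical
# reordering chosen by a toy target-side bigram score.
from itertools import permutations

lexicon = {"yo": "i", "tengo": "have", "hambre": "hunger"}
bigram_scores = {("<s>", "i"): 0.5, ("i", "have"): 0.4,
                 ("have", "hunger"): 0.3, ("hunger", "</s>"): 0.5}

def lm_score(words):
    seq = ["<s>"] + list(words) + ["</s>"]
    return sum(bigram_scores.get(bg, -1.0) for bg in zip(seq, seq[1:]))

source = "yo tengo hambre".split()
selected = [lexicon[w] for w in source]            # (a) lexical selection
best = max(permutations(selected), key=lm_score)   # (b) lexical reordering
print(" ".join(best))                              # -> "i have hunger"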
title = {Automated Natural Spoken Dialog},
author = {Gorin A., Abella A., Alonso T., Riccardi G. and Wright J.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/computer_magazine_2002.pdf},
year = {2002},
date = {2002-01-01},
journal = {IEEE Computer, vol. 35, n.4, pp. 51-56, April, 2002 (invited paper)},
abstract = {Engineers have long sought to design systems that understand and act upon spoken language. Extracting meaning from natural, unconstrained speech over the telephone is technically challenging, and quantifying semantic content is crucial for engineering and evaluating such systems.},
keywords = {Conversational and Interactive Systems , Speech Processing}
}
title = {Bootstrapping Bilingual Data Using Consensus Translation for a Multilingual Instant Messaging System},
author = {Bangalore S., Murdock V., Riccardi G.},
year = {2002},
date = {2002-01-01},
journal = {COLING, Taipei, 2002},
keywords = {Statistical Machine Translation}
}
title = {Improving Spoken Language Understanding Using Word Confusion Networks},
author = {Tur G., Wright J., Gorin A., Riccardi G. and Hakkani-Tur D.},
year = {2002},
date = {2002-01-01},
journal = {ICSLP, Denver, 2002},
keywords = {Machine Learning, Signal Annotation and Interpretation, Speech Processing}
}
title = {Acoustic and Word Lattice Based Algorithms for Confidence Scores},
author = {Falavigna D., Gretter R. and Riccardi G.},
year = {2002},
date = {2002-01-01},
journal = {Proc. ICSLP, Denver, 2002},
keywords = {Speech Processing}
}
title = {AT&T Help Desk},
author = {Di Fabbrizio G., Dutton D., Gupta N., Hollister B., Rahim M., Riccardi G., Schapire R. and Schroeter J.},
year = {2002},
date = {2002-01-01},
journal = {Proc. ICSLP, Denver, 2002},
keywords = {Conversational and Interactive Systems , Machine Learning, Speech Processing}
}
title = {Combining Prior Knowledge and Boosting for Call Classification in Spoken Language Dialogue},
author = {Rochery M., Schapire R., Rahim M., Gupta N., Riccardi G., Bangalore S., Alshawi H. and Douglas S.},
year = {2002},
date = {2002-01-01},
journal = {Proc. IEEE ICASSP, Orlando, 2002},
keywords = {Machine Learning, Signal Annotation and Interpretation, Speech Processing}
}
title = {Active Learning for Automatic Speech Recognition},
author = {Hakkani-Tur D., Riccardi G. and Gorin A. L.},
year = {2002},
date = {2002-01-01},
journal = {Proc. IEEE ICASSP, Orlando, 2002},
keywords = {Machine Learning}
}
title = {Computing Consensus Translation from Multiple Machine Translation Systems},
author = {Bangalore S., Bordel G. and Riccardi G.},
year = {2002},
date = {2002-01-01},
journal = {Proc. IEEE ASRU, Madonna di Campiglio, Italy, 2001},
keywords = {Statistical Machine Translation}
}
2001
title = {Robust Numeric Recognition in Spoken Language Dialogue},
author = {Rahim M., Riccardi G., Saul L., Wright J., Buntschuh B. and Gorin A. L.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/numericlang-speechcomm-2001.pdf},
year = {2001},
date = {2001-11-01},
journal = {Speech Communication, 34, pp. 195-212, 2001},
abstract = {This paper addresses the problem of automatic numeric recognition and understanding in spoken language dialogue. We show that accurate numeric understanding in fluent unconstrained speech demands maintaining robustness at several different levels of system design, including acoustic, language, understanding and dialogue. We describe a robust system for numeric recognition and present algorithms for feature extraction, acoustic and language modeling, discriminative training, utterance verification and numeric understanding and validation. Experimental results from a field trial of a spoken dialogue system are presented that include customers' responses to credit card and telephone number requests.},
keywords = {Machine Learning, Signal Annotation and Interpretation, Speech Processing}
}
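As one concrete example of the "numeric understanding and validation" layer, a recognized credit-card hypothesis can be checked with the standard Luhn checksum before the dialog proceeds; this is a generic illustration, not the paper's actual validation logic.

# Validate a recognized credit-card digit string with the Luhn checksum, so a
# dialog system could re-prompt on an invalid hypothesis.
def luhn_valid(digits: str) -> bool:
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:            # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

print(luhn_valid("4539148803436467"))   # True  -> accept hypothesis
print(luhn_valid("4539148803436468"))   # False -> re-prompt the caller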
title = {On-line learning of language models with word error probability distributions},
author = {Gretter R. and Riccardi G.},
year = {2001},
date = {2001-05-07},
journal = {Proc. IEEE ICASSP 2001, Salt Lake City, Utah, 7-11 May 2001},
keywords = {Machine Learning, Speech Processing}
}
title = {Integration of Utterance Verification with Statistical Language Modeling and Spoken Language Understanding},
author = {Rose R. C., Yao H., Riccardi G. and Wright J. H.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/uv-speechcomm-2001.pdf},
year = {2001},
date = {2001-01-01},
journal = {Speech Communication, 34, pp. 321-331, 2001},
abstract = {Methods for utterance verification (UV) and their integration into statistical language modeling and understanding formalisms for a large vocabulary spoken understanding system are presented. The paper consists of three parts. First, a set of acoustic likelihood ratio (LR) based UV techniques are described and applied to the problem of rejecting portions of a hypothesized word string that may have been incorrectly decoded by a large vocabulary continuous speech recognizer. Second, a procedure for integrating the acoustic level confidence measures with the statistical language model is described. Finally, the effect of integrating acoustic level confidence into the spoken language understanding unit (SLU) in a call-type classification task is discussed. These techniques were evaluated on utterances collected from a highly unconstrained call routing task performed over the telephone network. They have been evaluated in terms of their ability to classify utterances into a set of 15 call-types that are accepted by the application.},
keywords = {Machine Learning, Signal Annotation and Interpretation, Speech Processing}
}
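A schematic of likelihood-ratio-based rejection: a hypothesized word is accepted only if its acoustic log-likelihood beats an alternative model's by a threshold, and rejected words can be flagged as unreliable for the understanding module. The log-likelihoods and threshold below are invented.

# Accept a word when log P(acoustics | word) - log P(acoustics | alternative)
# clears a threshold; otherwise mark it unreliable. All values are invented.
hypothesis = [
    # (word, log-lik under word model, log-lik under alternative model)
    ("collect", -42.0, -48.5),
    ("call",    -30.1, -31.9),
    ("to",      -12.7, -12.5),   # barely worse than the alternative
    ("boston",  -55.0, -70.2),
]

TAU = 1.0   # acceptance threshold on the log likelihood ratio

for word, ll_word, ll_alt in hypothesis:
    accepted = (ll_word - ll_alt) >= TAU
    print(f"{word:8s} {'accept' if accepted else 'REJECT'}")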
title = {A Finite-State Approach to Machine Translation},
author = {Bangalore S. and Riccardi G.},
year = {2001},
date = {2001-01-01},
journal = {Proc. NAACL Conference, Pittsburgh, June, 2001},
keywords = {Statistical Machine Translation}
}
2000
title = {Finite-state models for lexical reordering in spoken language translation},
author = {Bangalore S. and Riccardi G.},
year = {2000},
date = {2000-10-01},
journal = {Proc. ICSLP, Beijing, Oct. 2000},
keywords = {Statistical Machine Translation}
}
title = {On-line Learning of Acoustic and Lexical Units for Domain-Independent ASR},
author = {Riccardi G.},
year = {2000},
date = {2000-10-01},
journal = {Proc. ICSLP, Beijing, Oct. 2000},
keywords = {Machine Learning}
}
title = {Detecting Acoustic Morphemes in Lattices for Spoken Language Understanding},
author = {Petrovska-Delacretaz D., Gorin A. L., Riccardi G. and Wright J. H.},
year = {2000},
date = {2000-10-01},
journal = {Proc. ICSLP, Beijing, Oct. 2000},
keywords = {Machine Learning, Signal Annotation and Interpretation, Speech Processing}
}
title = {A Spoken Dialog System for Conference/Workshop Services},
author = {Rahim M., Pieraccini R., Eckert W., Levin E., Di Fabbrizio G., Riccardi G., Kamm C., Narayanan S.},
year = {2000},
date = {2000-10-01},
journal = {Proc. ICSLP, Beijing, Oct. 2000},
keywords = {Conversational and Interactive Systems , Speech Processing}
}
title = {Semantic information processing of spoken language},
author = {Gorin A. L., Wright J. H., Riccardi G., Abella A. and Alonso T.},
year = {2000},
date = {2000-10-01},
journal = {ATR Workshop on Multi-Lingual Speech Communication, Oct. 2000},
keywords = {Machine Learning, Signal Annotation and Interpretation, Speech Processing}
}
title = {Stochastic finite-state models for spoken language machine translation},
author = {Bangalore S. and Riccardi G.},
year = {2000},
date = {2000-05-01},
journal = {Proc. Workshop on Embedded Machine Translation Systems, NAACL, pp. 52-59, Seattle, May 2000},
keywords = {Statistical Machine Translation}
}
title = {Spoken language adaptation over time and state in a natural spoken dialog system},
author = {Riccardi G. and Gorin A. L.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/IEEETSLP00-LMAdapt.pdf},
year = {2000},
date = {2000-01-01},
journal = {IEEE Trans. on Speech and Audio, vol. 8, pp. 3-10, 2000},
abstract = {We are interested in adaptive spoken dialog systems for automated services. People's spoken language usage varies over time for a given task, and furthermore varies depending on the state of the dialog. Thus, it is crucial to adapt automatic speech recognition (ASR) language models to these varying conditions. We characterize and quantify these variations based on a database of 30K user-transactions with AT&T's experimental How May I Help You? spoken dialog system. We describe a novel adaptation algorithm for language models with time and dialog-state varying parameters. Our language adaptation framework allows for recognizing and understanding unconstrained speech at each stage of the dialog, enabling context-switching and error recovery. These models have been used to train state-dependent ASR language models. We have evaluated their performance with respect to word accuracy and perplexity over time and dialog states. We have achieved a reduction of 40% in perplexity and of 8.4% in word error rate over the baseline system, averaged across all dialog states.},
keywords = {Conversational and Interactive Systems , Language Modeling, Speech Processing}
}
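A minimal sketch of state-conditioned language modeling, assuming bigram counts and interpolating a dialog-state-specific estimate with a global one. The counts, state name, and mixing weight are illustrative, not the paper's adaptation algorithm.

# Interpolate a dialog-state-specific bigram estimate with a global one.
from collections import Counter

global_bigrams = Counter({("my", "bill"): 40, ("my", "account"): 35,
                          ("my", "number"): 25})
state_bigrams = {  # counts collected only from turns in this dialog state
    "confirm_number": Counter({("my", "number"): 18, ("my", "account"): 2}),
}

def p_bigram(w1, w2, state, lam=0.7):
    def mle(counter):
        ctx = sum(c for (a, _), c in counter.items() if a == w1)
        return counter[(w1, w2)] / ctx if ctx else 0.0
    return (lam * mle(state_bigrams.get(state, Counter()))
            + (1 - lam) * mle(global_bigrams))

print(p_bigram("my", "number", "confirm_number"))  # boosted in this state
print(p_bigram("my", "number", "greeting"))        # falls back to global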
1999
title = {W99 - A Spoken Dialog System for the ASRU99 Workshop},
author = {Rahim M., Pieraccini R., Eckert W., Levin E., Di Fabbrizio G., Riccardi G., Lin C., Kamm C.},
year = {1999},
date = {1999-12-01},
journal = {Proc. IEEE ASRU, Keystone, Colorado, Dec. 1999},
keywords = {Conversational and Interactive Systems , Speech Processing}
}
title = {Learning head-dependency relations from unannotated corpora},
author = {Riccardi G., Bangalore S. and Sarin P.},
year = {1999},
date = {1999-12-01},
journal = {Proc. IEEE ASRU, Keystone, Colorado, Dec. 1999},
keywords = {Conversational and Interactive Systems , Language Modeling, Speech Processing}
}
title = {Prosody recognition from speech utterances using acoustic and linguistic based models of prosodic events},
author = {Conkie A., Riccardi G. and Rose R. C.},
year = {1999},
date = {1999-09-01},
journal = {Proc. EUROSPEECH, Budapest, Hungary, Sept. 1999},
keywords = {Machine Learning, Speech Processing}
}
title = {Categorical understanding using statistical N-gram models},
author = {Potamianos A., Riccardi G. and Narayanan S.},
year = {1999},
date = {1999-09-01},
journal = {Proc. EUROSPEECH, Budapest, Hungary, Sept. 1999},
keywords = {Machine Learning, Signal Annotation and Interpretation, Speech Processing}
}
title = {Automatic speech recognition using acoustic confidence conditioned language models},
author = {Rose R. C. and Riccardi G.},
year = {1999},
date = {1999-09-01},
journal = {Proc. EUROSPEECH, Budapest, Hungary, Sept. 1999},
keywords = {Speech Processing}
}
title = {Spoken language variation over time and state in a natural spoken dialog system},
author = {Gorin A. L. and Riccardi G.},
year = {1999},
date = {1999-03-01},
journal = {Proc. ICASSP, Phoenix, Mar. 1999},
keywords = {Conversational and Interactive Systems , Language Modeling, Speech Processing}
}
title = {Modeling dysfluency and background events in ASR for a natural language understanding task},
author = {Rose R. C. and Riccardi G.},
year = {1999},
date = {1999-03-01},
journal = {Proc. ICASSP, Phoenix, March 1999},
keywords = {Conversational and Interactive Systems , Language Modeling, Speech Processing}
}
title = {Grammar fragment acquisition using syntactic and semantic clustering},
author = {Arai K., Wright J. H., Riccardi G. and Gorin A. L.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/fragclustering-speechcomm-19981.pdf},
year = {1999},
date = {1999-01-01},
journal = {Speech Communication, vol. 27, no. 1, Jan. 1999},
abstract = {A new method for automatically acquiring Fragments for understanding fluent speech is proposed. The goal of this method is to generate a collection of Fragments, each representing a set of syntactically and semantically similar phrases. First, phrases observed frequently in the training set are selected as candidates. Each candidate phrase has three associated probability distributions: of following contexts, of preceding contexts, and of associated semantic actions. The similarity between candidate phrases is measured by applying the Kullback-Leibler distance to these three probability distributions. Candidate phrases that are close in all three distances are clustered into a Fragment. Salient sequences of these Fragments are then automatically acquired, and exploited by a spoken language understanding module to classify calls in AT&T's "How may I help you?" task. These Fragments allow us to generalize to unobserved phrases. For instance, they detected 246 phrases in the test set that were not present in the training set. This result shows that unseen phrases can be automatically discovered by our new method. Experimental results show a 2.8% improvement in call-type classification.},
keywords = {Conversational and Interactive Systems , Language Modeling, Speech Processing}
}
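To illustrate the clustering signal, the snippet below computes a symmetrized Kullback-Leibler distance between candidate phrases' following-context distributions (the paper also uses preceding contexts and semantic actions). The phrases and distributions are invented.

# Compare candidate phrases by the KL distance between their following-context
# distributions; small distances suggest the phrases belong in one Fragment.
import math

def kl(p, q, eps=1e-6):
    keys = set(p) | set(q)
    return sum(p.get(k, eps) * math.log(p.get(k, eps) / q.get(k, eps))
               for k in keys)

following = {   # P(next word | phrase), toy estimates
    "charge on my":  {"bill": 0.6, "account": 0.3, "card": 0.1},
    "charges on my": {"bill": 0.55, "account": 0.35, "card": 0.1},
    "speak to a":    {"person": 0.5, "representative": 0.4, "human": 0.1},
}

a, b, c = "charge on my", "charges on my", "speak to a"
print(kl(following[a], following[b]) + kl(following[b], following[a]))  # small
print(kl(following[a], following[c]) + kl(following[c], following[a]))  # large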
title = {Learning spoken language without transcription},
author = {Gorin A. L., Petrovska-Delacretaz D., Riccardi G. and Wright J. H.},
year = {1999},
date = {1999-01-01},
journal = {Proc. IEEE ASRU Workshop, Colorado, 1999},
keywords = {Machine Learning}
}
title = {Robust automatic speech recognition in a natural spoken dialog},
author = {Rahim M., Riccardi G., Wright J., Buntschuh B. and Gorin A.},
year = {1999},
date = {1999-01-01},
journal = {Workshop on Robust Methods for Speech Recognition in Adverse Conditions, Tampere, Finland, 1999},
keywords = {Language Modeling, Speech Processing}
}
1998
title = {Grammar fragment acquisition using syntactic and semantic clustering},
author = {Arai K., Wright J. H., Riccardi G. and Gorin A. L.},
year = {1998},
date = {1998-11-01},
journal = {Proc. ICSLP, Sydney, Nov. 1998},
keywords = {Machine Learning}
}
title = {Language model adaptation for spoken dialog systems},
author = {Riccardi G., Potamianos A. and Narayanan S.},
year = {1998},
date = {1998-11-01},
journal = {Proc. ICSLP, Sydney, Nov. 1998},
keywords = {Conversational and Interactive Systems , Language Modeling, Speech Processing}
}
title = {Stochastic language models for speech recognition and understanding},
author = {Riccardi G. and Gorin A. L.},
year = {1998},
date = {1998-11-01},
journal = {Proc. ICSLP, Sydney, Nov. 1998},
keywords = {Conversational and Interactive Systems , Language Modeling, Speech Processing}
}
title = {Integration of utterance verification with statistical language modeling and spoken language understanding},
author = {Rose R. C., Yao H., Riccardi G. and Wright J.},
year = {1998},
date = {1998-05-01},
journal = {Proc. ICASSP, Seattle, May 1998},
keywords = {Conversational and Interactive Systems , Language Modeling, Speech Processing}
}
title = {Automatic acquisition of phrase grammars for stochastic language modeling},
author = {Riccardi G. and Bangalore S.},
year = {1998},
date = {1998-01-01},
journal = {Proc. 6th ACL Workshop on Very Large Corpora, Montreal, 1998},
keywords = {Conversational and Interactive Systems , Language Modeling, Speech Processing}
}
1997
title = {Grammar fragment acquisition using syntactic and semantic clustering},
author = {Arai K., Wright J., Riccardi G. and Gorin A.},
year = {1997},
date = {1997-12-01},
journal = {Proc. Workshop on Spoken Language Understanding & Communication, Yokosuka, Japan, Dec. 1997},
keywords = {Conversational and Interactive Systems , Language Modeling, Speech Processing}
}
title = {How may I help you?},
author = {Gorin A. L., Riccardi G. and Wright J. H.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/specom97.pdf},
year = {1997},
date = {1997-01-01},
journal = {Speech Communication, vol. 23, Oct. 1997, pp. 113-127.},
abstract = {We are interested in providing automated services via natural spoken dialog systems. By natural, we mean that the machine understands and acts upon what people actually say, in contrast to what one would like them to say. There are many issues that arise when such systems are targeted for large populations of non-expert users. In this paper, we focus on the task of automatically routing telephone calls based on a user's fluently spoken response to the open-ended prompt of "How may I help you?". We first describe a database generated from 10,000 spoken transactions between customers and human agents. We then describe methods for automatically acquiring language models for both recognition and understanding from such data. Experimental results evaluating call-classification from speech are reported for that database. These methods have been embedded within a spoken dialog system, with subsequent processing for information retrieval and form-filling.},
keywords = {Conversational and Interactive Systems , Speech Processing}
}
title = {A spoken language system for automated call routing},
author = {Riccardi G., Gorin A. L., Ljolje A. and Riley M.},
year = {1997},
date = {1997-01-01},
journal = {Proc. ICASSP '97, 1997, pp. 1143-1146},
keywords = {Conversational and Interactive Systems , Speech Processing}
}
title = {Integrating multiple knowledge sources for utterance verification in a large vocabulary speech understanding system},
author = {Rose R. C., Yao H., Riccardi G. and Wright J.},
year = {1997},
date = {1997-01-01},
journal = {Proc. IEEE ASR Workshop, Santa Barbara, 1997},
keywords = {Machine Learning, Signal Annotation and Interpretation, Speech Processing}
}
title = {Automatic acquisition of salient grammar fragments for call-type classification},
author = {Wright J. H., Gorin A. L. and Riccardi G.},
year = {1997},
date = {1997-01-01},
journal = {Proc. EUROSPEECH, Rhodes, Greece, 1997, pp. 1419-1422},
keywords = {Conversational and Interactive Systems , Language Modeling, Speech Processing}
}
1996
title = {Stochastic automata for language modeling},
author = {Riccardi G., Pieraccini R. and Bocchieri E.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/csl96.pdf},
year = {1996},
date = {1996-01-01},
journal = {Computer Speech and Language, vol. 10(4), 1996, pp. 265-293},
abstract = {Stochastic language models are widely used in spoken language understanding to recognize and interpret the speech signal: the speech samples are decoded into word transcriptions by means of acoustic and syntactic models and then interpreted according to a semantic model. Both for speech recognition and understanding, search algorithms use stochastic models to extract the most likely uttered sentence and its correspondent interpretation. The design of the language models has to be effective in order to mostly constrain the search algorithms and has to be efficient to comply with the storage space limits. In this work we present the Variable N-gram Stochastic Automaton (VNSA) language model that provides a unified formalism for building a wide class of language models. First, this approach allows for the use of accurate language models for large vocabulary speech recognition by using the standard search algorithm in the one-pass Viterbi decoder. Second, the unified formalism is an effective approach to incorporate different sources of information for computing the probability of word sequences. Third, the VNSAs are well suited for those applications where speech and language decoding cascades are implemented through weighted rational transductions. The VNSAs have been compared to standard bigram and trigram language models and their reduced set of parameters does not degrade performance in terms of perplexity. The design of a stochastic language model through the VNSA is described and applied to word and phrase class-based language models. The effectiveness of VNSAs has been tested within the Air Travel Information System (ATIS) task.},
keywords = {Conversational and Interactive Systems , Language Modeling, Speech Processing}
}
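A dictionary-based caricature of the variable n-gram idea: higher-order contexts are kept only where counts support them, otherwise the model backs off to shorter histories. A real VNSA compiles this policy into a stochastic automaton; the corpus and cutoff here are toy values.

# Use the longest context whose count clears MIN_COUNT, else back off.
from collections import Counter

corpus = "i want a flight to boston i want a fare to denver".split()
MIN_COUNT = 2                       # contexts rarer than this are pruned

counts = Counter()
for n in (1, 2, 3):
    for i in range(len(corpus) - n + 1):
        counts[tuple(corpus[i:i + n])] += 1

def p_next(history, word):
    """Return (probability, length of the context actually used)."""
    for k in (2, 1, 0):             # try the longest reliable context first
        ctx = tuple(history[-k:]) if k else ()
        if k == 0 or counts[ctx] >= MIN_COUNT:
            den = counts[ctx] if k else sum(c for g, c in counts.items() if len(g) == 1)
            if den:
                return counts[ctx + (word,)] / den, k
    return 0.0, 0

print(p_next(["want", "a"], "flight"))   # ("want", "a") is frequent: order-2 context kept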
1995
title = {State tying of triphone HMM's for the 1994 AT&T ARPA ATIS recognizer},
author = {Bocchieri E. and Riccardi G.},
year = {1995},
date = {1995-09-01},
journal = {Proc. EUROSPEECH '95, Madrid, Sept. 1995, pp. 1499-1503},
keywords = {Speech Processing}
}
title = {Understanding spontaneous speech},
author = {Bocchieri E., Levin E., Pieraccini R. and Riccardi G.},
year = {1995},
date = {1995-01-01},
journal = {J. of the Italian Assoc. of Artificial Intelligence, Sept. 1995},
keywords = {Machine Learning, Signal Annotation and Interpretation, Speech Processing}
}
title = {The 1994 AT&T ATIS CHRONUS recognizer},
author = {Bocchieri E., Riccardi G. and Anantharaman J.},
year = {1995},
date = {1995-01-01},
journal = {Proc. 1995 ARPA Spoken Language Technology Workshop, Austin, Texas, Jan. 1995, pp. 265-268},
keywords = {Language Modeling, Signal Annotation and Interpretation}
}
title = {Non deterministic stochastic language models for speech recognition},
author = {Riccardi G., Bocchieri E. and Pieraccini R.},
year = {1995},
date = {1995-01-01},
journal = {Proc. ICASSP, Detroit, pp. 247-250, Detroit, 1995},
keywords = {Conversational and Interactive Systems , Language Modeling, Speech Processing}
}
1994
title = {Improved multipulse algorithm for speech coding by means of adaptive Boltzmann annealing},
author = {Mumolo E., Rebelli A. and Riccardi G.},
year = {1994},
date = {1994-01-01},
journal = {European Transactions on Telecommunications, vol. 5, no. 6, Nov. 1994},
keywords = {Speech Processing}
}
title = {A localization property of line spectrum pairs},
author = {Mian G. A. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/IEEE_LSP.pdf},
year = {1994},
date = {1994-01-01},
journal = {IEEE Trans. on Speech and Audio Proc., vol. 2, no. 4, pp. 536-539, Oct. 1994},
keywords = {Speech Processing}
}
title = {The 1993 AT&T ATIS system},
author = {Bocchieri E. and Riccardi G.},
year = {1994},
date = {1994-01-01},
journal = {Proc. 1994 ARPA Spoken Language Technology Workshop, Plainsboro, NJ, March 1994, pp. 41-42},
keywords = {Language Modeling, Signal Annotation and Interpretation, Speech Processing}
}
title = {Dynamic bit allocation in subband coding of wideband audio with multipulse},
author = {Menardi P., Mian G. A. and Riccardi G.},
year = {1994},
date = {1994-01-01},
journal = {Proc. EUSIPCO, Edinburgh, 1994, pp. 1449-1452},
keywords = {Speech Processing}
}
1993
title = {Analysis-by-synthesis algorithms for low bitrate coding},
author = {Riccardi G. and Mian G. A.},
year = {1993},
date = {1993-10-01},
journal = {IEEE Workshop on Speech Coding for Telecommunications, Montreal, Oct. 1993},
keywords = {Speech Processing}
}
title = {An approach to parameter reoptimization in multipulse based coders},
author = {Fratti M., Mian G. A. and Riccardi G.},
url = {http://sisl.disi.unitn.it/wp-content/uploads/2014/11/IEEE_Multipulse.pdf},
year = {1993},
date = {1993-01-01},
journal = {IEEE Trans. Speech & Audio Proc., vol. 1, no. 4, pp. 463-465, Oct. 1993},
keywords = {Speech Processing}
}
title = {Use of the forward-backward search for large vocabulary recognition with continuous observation density HMM's},
author = {Bocchieri E. and Riccardi G.},
year = {1993},
date = {1993-01-01},
journal = {Proc. IEEE Workshop on Automatic Speech Recognition, pp. 85-86, Snowbird, 1993},
keywords = {Speech Processing}
}
1992
title = {On the effectiveness of parameter reoptimization in multipulse based coders},
author = {Fratti M., Mian G. A. and Riccardi G.},
year = {1992},
date = {1992-11-01},
journal = {Proc. ICASSP '92, San Francisco, 1992, pp. 72-77},
keywords = {Speech Processing}
}