|
Abstract: . . . it is not known how “typical” of modern British English dialogue or text the BNC data really is, but we believe that the task of producing a large, well-balanced, representative corpus is probably very difficult. However, the results presented here suggest that characteristics particular to dialogue can and should be exploited in building statistical language models for dialogue. In particular, this work suggests that such models be made more sensitive to speaker turns and to the location of changes in speaker, and not just to the previous lexical context. It is relatively well-known that reductions . . .

. . . and a target – probably because of its frequent occurrence in twentieth-century dates

6. CONCLUSIONS AND FUTURE WORK

This study has shown that cache and word trigger pair models, which both allow a statistical language model some scope to adapt to the current context and which both have analogues in psycholinguistics, can be of some practical value in the statistical modelling of both text and dialogue material in modern British English. However, the relative success of the two types of models is not the same for both types of data. The relatively simple cache model is remarkably successful for . . .

. . . Statistical Language Modelling”, Computer Speech & Language, Vol. 10, pp 187-228.
[19] Rosenfeld, R. (2000a) “Incorporating Linguistic Structure into Statistical Language Modelling”, Philosophical Transactions of the Royal Society of London A, Vol. 358, pp 1311-1324.
[20] Rosenfeld, R. (2000b) “Two Decades of Statistical Language Modelling: Where do we go from here?”, Proceedings of the IEEE, Vol. 88(8), pp 1270-1278.
[21] Traum, D.R. & Heeman, P.A. (1996) “Utterance Units and Grounding in Spoken Dialogue”, Proceedings of ICSLP '96, October 1996.
[22] Walker, M.A. (1996) “Limited Attention and Discourse . . .
studies by applying statistical modelling techniques – some of which have natural analogues in psycholinguistic models – to modern British English dialogue, to compare the results obtained with those for modern British English text data and to interpret the findings in the light of psycholinguistic approaches to natural language processing in humans, and dialogue in particular.

2. THE BRITISH NATIONAL CORPUS (BNC)

The British National Corpus [3], henceforth referred to as the BNC, is a very large corpus of both spoken and written contemporary British English, compiled between 1991 and 1994. The . . .

. . . Why and How?”, Knowledge-Based Systems, Vol. 6, no. 4, pp 258-266.
[27] Jönsson, A. & Dahlbäck, N. (2000) “Distilling Dialogues – A Method Using Natural Dialogue Corpora for Dialogue Systems Development”, Proceedings of the 6th Applied Natural Language Processing Conference, Seattle, USA, pp 44-51.
[28] Pirker, H., Lderer, G. & Trost, H. (1999) “Thus Spoke the User to the Wizard”, Proceedings of Eurospeech ’99, Budapest, Hungary, Vol. 3, pp 1171-1174.
[29] Stuttle, M.N., Williams, J.D. & Young, S. (2004) “A Framework for Dialogue Data Collection with a Simulated ASR Channel”, Proceedings of . . .
|