- Abstract:
Multiparty meetings are a ubiquitous feature of organizations, and their automatic analysis and structuring would bring considerable economic benefits. In this paper we are concerned with the segmentation and structuring of meetings (recorded using multiple cameras and microphones) into sequences of group meeting actions such as monologue, discussion and presentation. We outline four families of multimodal features, based on speaker turns, lexical transcription, prosody and visual motion, that are extracted from the raw audio and video recordings. We relate these low-level features to more complex group behaviours using a modelling framework based on multistream dynamic Bayesian networks. This yields an effective approach to the segmentation problem, with an action error rate of only 12.2%, compared with 43% for an approach based on hidden Markov models. Moreover, the multistream dynamic Bayesian network developed here leaves scope for many further improvements and extensions.
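The action error rate (AER) quoted above can be read, by analogy with the word error rate used in speech recognition, as the number of substituted, deleted and inserted meeting actions divided by the number of actions in the reference sequence. The sketch below assumes that definition and computes it as a Levenshtein distance between label sequences. The function name `action_error_rate` and the example labels are hypothetical, and this simplified version ignores the timing of segment boundaries, which a full evaluation may also take into account.

```python
from typing import Sequence


def action_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Hypothetical AER sketch: (substitutions + deletions + insertions) / len(reference).

    Assumes AER is defined like word error rate, i.e. the minimum edit
    (Levenshtein) distance between the reference and hypothesised sequences
    of meeting-action labels, normalised by the reference length.
    Segment boundary times are ignored here.
    """
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference sequence must be non-empty")
    # dist[i][j] = minimum edits turning reference[:i] into hypothesis[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # delete all i reference actions
    for j in range(m + 1):
        dist[0][j] = j  # insert all j hypothesised actions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,        # deletion of a reference action
                dist[i][j - 1] + 1,        # insertion of a spurious action
                dist[i - 1][j - 1] + sub,  # match (cost 0) or substitution (cost 1)
            )
    return dist[n][m] / n


if __name__ == "__main__":
    ref = ["monologue", "discussion", "presentation", "discussion"]
    hyp = ["monologue", "presentation", "discussion"]
    # One reference action ("discussion") is missed: 1 deletion / 4 actions
    print(action_error_rate(ref, hyp))  # 0.25
```

Because the distance is normalised by the reference length, many spurious insertions can push this quantity above 100%, just as with word error rate.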
- Bibtex format
@Article{EDI-INF-RR-0930,
  author    = {Alfred Dielmann and Steve Renals},
  title     = {Automatic meeting segmentation using dynamic {B}ayesian networks},
  journal   = {IEEE Transactions on Multimedia},
  publisher = {IEEE Signal Processing Society (+ 3 other IEEE societies)},
  year      = 2007,
  month     = jan,
  volume    = {9},
  pages     = {25--36},
  doi       = {10.1109/TMM.2006.886337},
  url       = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=4032598&arnumber=4032608&count=20&index=3},
}