Project Description

Operating Framework and General Objectives

Virtual communities on the Internet have grown considerably. They range from development communities (open source) to cultural communities (Wikipedia), and include communities of all kinds built around forums and blogs that link to one another. Each community gathers around one or several topics, about which its members create information, distribute it and comment on it. The delivery of thematic information is fundamental to these communities, whatever their nature or duration: some are ephemeral, such as those related to the current presidential elections. These communities face the challenge that the mass of information can be excessive, making it difficult to monitor and view everything. This is all the more true as audio and video podcasts become more numerous, since it is much harder to skim through an audio or video stream than through text.

The RPM2 project focuses on summarization in order to provide condensed and highly relevant information. Its main objectives are the following:

  • summaries will be produced for all media: text, audio, image and video;
  • summaries will be multi-document: for instance, a daily flow comprising several documents will be condensed into a single summary;
  • summaries will take several opinions into account so as to reflect opposing views.

Summaries will be able either to include only the essentials (summarization by condensation) or to retain as much distinct information as possible (summarization that removes duplicates but keeps all of the original information).
The work will have to be validated in a real-world context built around virtual communities.
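The difference between the two summarization modes described above can be illustrated with a minimal Python sketch. It is not the RPM2 method: it works on text only, and the sentence splitter, the Jaccard word-overlap test for duplicates, the frequency-based scoring and the 0.6 threshold are all simplifying assumptions chosen for illustration.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter on '.', '!' and '?' (an assumption, not a real tokenizer)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(sentence):
    """Lowercased word tokens of a sentence."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def jaccard(a, b):
    """Word-set overlap between two sentences, between 0.0 and 1.0."""
    sa, sb = set(words(a)), set(words(b))
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def deduplicate(sentences, threshold=0.6):
    """Mode 2: drop near-duplicate sentences, keep every distinct one."""
    kept = []
    for s in sentences:
        if all(jaccard(s, k) < threshold for k in kept):
            kept.append(s)
    return kept


def condense(sentences, max_sentences=2, threshold=0.6):
    """Mode 1: keep only the highest-scoring distinct sentences."""
    # Word frequencies are counted over all input sentences, so content
    # repeated across documents weighs more (an illustrative heuristic).
    freq = Counter(w for s in sentences for w in words(s))

    def score(s):
        ws = words(s)
        return sum(freq[w] for w in ws) / len(ws) if ws else 0.0

    unique = deduplicate(sentences, threshold)
    ranked = sorted(range(len(unique)), key=lambda i: score(unique[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore original order
    return [unique[i] for i in chosen]


# Two hypothetical documents from the same daily flow.
documents = [
    "The council approved the new budget. Opposition members criticised the vote.",
    "The council approved the new budget. A public hearing is planned for May.",
]
sentences = [s for d in documents for s in split_sentences(d)]

print(deduplicate(sentences))                # every distinct sentence, once
print(condense(sentences, max_sentences=1))  # only the essentials
```

Pooling the sentences of several documents before summarizing is also what makes the sketch multi-document: the whole flow is reduced to a single summary rather than one summary per document.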

Economic Stakes

The economic stakes involved in the project are major:

  • automatic summarization is the most logical answer to information overload;
  • multimedia summarization is required given the exponential increase in the number of podcasts;
  • summaries that take opinion criteria into account, giving a voice to the various trends, guarantee an ethical delivery of information.

The prospects are therefore significant, both on the industrial side, with its commercial potential, and on the scientific side, with the opportunities opened up by new processing and summarization methods.

Technological Barriers

The barriers are significant and concern every development area of RPM2:

  • Summarization is a difficult technique, even for text. Applying it to audio and video in a multimedia context makes the project even more complex. This is, however, what makes it so interesting, because it is the challenge that today's society has to face.
  • Fundamentally, the task consists of providing concise and relevant representations of an unstructured set of heterogeneous documents; part of the project's scientific added value will come from the methods developed to assess the expressive quality of each medium and to combine the media efficiently.
  • Assessment: evaluating summaries is always difficult, and even more so in a multimedia context. The metrics and methods in use are not yet standardized. To our knowledge, no published work presents assessment metrics for plurimedia summarization. The RPM2 project will therefore treat assessment issues carefully and rigorously. Subjective assessments will undoubtedly be possible thanks to the deployment of virtual communities, but other methods must also be considered.
  • Opinion-based classification is a new approach. It requires a number of techniques which, although individually known, will need a significant research effort to be combined. Assessment is also a concern here.
  • Taking external parameters into account when creating summaries will bring additional difficulties. The aim is to develop methods that allow:
    • a summary of all the articles concerning a specific event;
    • a summary of all the events over a specific period (limited set of topics);
    • a summary presenting the different opinions.
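
The three kinds of summary listed above differ mainly in which documents are selected before summarization. The following Python sketch shows that selection step only. It assumes that each document already carries an event label, a publication date and an opinion label; the Document class, its fields and the sample values are hypothetical, and how such labels would be obtained is outside the scope of the sketch.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    text: str
    event: str       # hypothetical event label
    published: date  # publication date
    opinion: str     # hypothetical opinion label, e.g. "for" / "against"


def by_event(docs, event):
    """Documents concerning one specific event."""
    return [d for d in docs if d.event == event]


def by_period(docs, start, end):
    """Documents published within [start, end], whatever the event."""
    return [d for d in docs if start <= d.published <= end]


def by_opinion(docs):
    """Documents grouped by opinion label, so that each side can be summarized."""
    groups = {}
    for d in docs:
        groups.setdefault(d.opinion, []).append(d)
    return groups


# Hypothetical sample data.
docs = [
    Document("Candidate A wins the debate.", "debate", date(2007, 4, 20), "for"),
    Document("Candidate A dodged every question.", "debate", date(2007, 4, 21), "against"),
    Document("Turnout reaches a record level.", "vote", date(2007, 4, 22), "for"),
]

debate_docs = by_event(docs, "debate")
weekend_docs = by_period(docs, date(2007, 4, 21), date(2007, 4, 22))
sides = by_opinion(debate_docs)
```

Each resulting subset (or each opinion group) would then be passed to whatever summarizer is in use, so that the summary of an event can present one condensed view per opinion.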
