Project Description

Operating Framework and General Objectives

Virtual communities on the Internet have grown considerably. They range from open-source development communities to cultural communities such as Wikipedia, and include communities of all kinds built around forums and blogs that link to one another. Each community forms around one or several topics, on which its members create, distribute, and comment on information. Delivering thematic information is fundamental to these communities, whatever their nature or lifespan: some are ephemeral, such as those related to the current presidential elections. These communities face a common challenge: the mass of information can be excessive, which makes it difficult to monitor and view everything. This is all the more true as audio and video podcasts become more numerous, since it is much harder to skim through an audio or video stream than through text.

The RPM2 project focuses on summarization in order to provide condensed and highly relevant information. Its main objectives are the following:

  • summarization will be carried out on all media: text, audio, image, and video;
  • summarization will be multi-document: for instance, a daily flow comprising several documents will be condensed into a single summary;
  • summarization will take several opinions into account so as to reflect opposing views.

Summaries will be able to include either only the essentials (summarization by condensation) or as much distinct information as possible (summarization that removes duplicates but keeps all of the original information).
The work will have to be validated in a real-world context built around virtual communities.
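
The two summarization modes described above can be illustrated on plain text with a minimal sketch. It is not the RPM2 method: the function names, the Jaccard similarity threshold, and the word-frequency scoring are all assumptions chosen for illustration only.

```python
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def summarize_by_deduplication(sentences, threshold=0.8):
    """Keep every sentence unless it nearly duplicates one already kept.

    The threshold is an illustrative assumption, not a tuned value.
    """
    kept = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        if all(jaccard(tokens, tokenize(k)) < threshold for k in kept):
            kept.append(sentence)
    return kept


def summarize_by_condensation(sentences, max_sentences=2):
    """Keep only the highest-scoring sentences, in their original order."""
    unique = summarize_by_deduplication(sentences)
    # Count in how many sentences each word appears.
    freq = Counter(word for s in unique for word in tokenize(s))

    def score(sentence):
        tokens = tokenize(sentence)
        # Average word frequency, so long sentences are not favored by length.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(unique, key=score, reverse=True)[:max_sentences]
    return [s for s in unique if s in ranked]
```

Deduplication keeps every distinct piece of information and only drops near-repeats, whereas condensation additionally discards the lowest-scoring sentences. Audio, image, and video would need their own similarity and scoring functions, which this sketch does not attempt.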

Economic Stakes

The economic stakes of the project are major:

  • automatic summarization is the most logical answer to an excess of information;
  • multimedia summarization is required given the exponential increase in the number of podcasts;
  • summarization that takes opinion criteria into account, giving a voice to the various viewpoints, guarantees an ethical delivery of information.

The prospects are therefore significant, both on the industrial side, with its commercial potential, and on the scientific side, with the opportunities opened up by new processing and summarization methods.

Technological Barriers

The barriers are significant and concern every development point of RPM2:

  • Summarization is a difficult technique, even on text. Applying it to audio and video in a multimedia context makes the project even more complex. This is also what makes it interesting, since it is the challenge posed by today's society.
  • Fundamentally, summarization consists in providing synthetic and relevant representations of an unstructured set of heterogeneous documents; part of the scientific added value of the project will come from the methods developed to assess the expressive quality of each medium and to combine the media efficiently.
  • Assessment: assessing summaries is always difficult, and even more so in a multimedia context. The metrics and methods in use are still not standardized. To our knowledge, no work presenting assessment metrics for plurimedia summarization has been published yet. The challenges of assessment will therefore be treated carefully and rigorously in the RPM2 project. Subjective assessments will be possible thanks to the deployment of virtual communities, but other methods must also be considered.
  • Opinion-based classification is a new approach. It requires a number of techniques which, although individually known, will need a substantial research effort to be combined. Assessment is also a concern in this case.
  • Taking external parameters into account when creating summaries will bring additional difficulties. The aim is to develop methods that allow:
    • a summary of all the articles concerning a specific event
    • a summary of all the events in a specific period (limited set of topics)
    • a summary presenting the different opinions
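
The three kinds of summary listed above each start from a different selection of source documents. The sketch below shows only that selection step, under strong assumptions: the `Document` fields and the event and opinion labels are hypothetical, and in practice such labels would have to be produced by the event detection and opinion classification methods that the project sets out to develop.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    title: str
    event: str       # hypothetical event label
    published: date  # publication date
    opinion: str     # hypothetical opinion label


def select_by_event(docs, event):
    """Documents concerning one specific event."""
    return [d for d in docs if d.event == event]


def select_by_period(docs, start, end, topics):
    """Documents published in [start, end] and limited to a set of topics."""
    return [d for d in docs if start <= d.published <= end and d.event in topics]


def group_by_opinion(docs):
    """Group documents by opinion label so each viewpoint can be summarized."""
    groups = defaultdict(list)
    for d in docs:
        groups[d.opinion].append(d)
    return dict(groups)
```

Each selection would then be passed to a summarizer. Grouping by opinion before summarizing is one simple way to make sure that every viewpoint is represented in the result.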


