Previous ANR projects

From the conception of the SegCor project, we planned to build the segmentation on previous projects on prosody and syntax, so as not to reinvent the wheel. The main issue was to verify that their results were suited to interaction corpora.

For interaction itself, this was not possible, because no prior segmentation work existed for this linguistic domain.


In syntax, the most recent projects were the ANR projects Orfeo and Rhapsodie, so we focused on their results and checked whether they were suitable for interaction data:

  • Orfeo ANR: As ICAR was a partner in the project, we already had good knowledge of its results. We nevertheless contacted the person in charge, Jeanne-Marie Debaisieux, who gave us her agreement, and we benefited from her know-how on the segmentation of the corpus:
    • we reused the segmenter developed in Orfeo, together with its guidelines, to split the corpus into "macro units", a first step before defining more precise units;
    • for syntactic units, the division was suited to the processing by automatic tools of the 3.5-million-word ORFEO oral corpus, but some of the choices made did not match our theoretical requirements for interaction data, so we proposed another solution better suited to our needs.
  • Rhapsodie ANR: As Nathalie Rossi Gensane was a member of the project, she contributed to the discussions in which we made decisions and established the guidelines:
    • we took the categories into account and tried to apply them to interaction corpora;
    • we simplified some of them because they were too complex for non-expert users.

In summary, we proposed new segmentation rules suited to interaction corpora, as a compromise between the macro categories of Orfeo, which are adapted to automatic tools, and the very fine-grained categories of Rhapsodie, which are too complex for non-expert users. We organized a workshop in Lyon in 2017 with Sylvain Kahane and a session with Kim Gerdes at our symposium in Lyon in 2018 to discuss our solutions, particularly for the categories specific to interaction.


In prosody, we reused the guidelines of the most recent ANR project, Rhapsodie. As François Delafontaine had previously worked with Mathieu Avanzi, he contacted him again for advice on how to get started:

  • we reused the Rhapsodie guidelines but suggested some adaptations required for our pilot corpus on interaction, a type of oral data that is not included in the Rhapsodie corpus;
  • we adopted the automatic tool Analor and compared its automatic annotations with the manual ones, and found many differences that we did not manage to resolve.

In summary, we proposed an adaptation of the Rhapsodie prosody guidelines suited to interaction corpora. We organized a workshop on prosody with Mathieu Avanzi in Lyon in 2017 to discuss this adaptation.