German-French collaboration - SEGmentation of oral CORpora

From the beginning of the project, the German and French teams worked closely together, meeting frequently through videoconferences, regular thematic workshops and annual symposia to jointly define:

A pilot corpus: we jointly selected comparable interactional settings in the German and French data to facilitate comparison.
A set of common critical cases: we identified a shared set of critical segmentation cases in German and French, each illustrated with several examples (cf. deliverables).
The conditions and requirements for task automation: we discussed the feasibility of automatic and semi-automatic segmentation of interactional corpora.
The level of expertise of the annotators: we discussed how to choose the annotators (number, level of expertise, etc.) so that they could apply the guidelines and help us improve them.
The necessary exploratory work on segmentation in interaction: the choice of units, initial comparative analyses, the need to publish a scientific article (position paper) on these questions, and the annotation guidelines.
The conditions for a contrastive study of German and French:
- Divergence in macrosyntactic units: we analysed our common set of critical segmentation cases and arrived at different segmentation solutions for German and French, based on different sets of categories (cf. deliverables).
- Convergence in interactional units: the same methodology was applied to German and French, and the resulting annotations were similar, using a common set of categories (cf. deliverables).
The use of a common annotation tool, EXMaRALDA, developed by Thomas Schmidt.