Automatic tools

Automatic tools have been developed to segment oral corpora:

  • An automatic syntactic segmenter for German: The tool takes FOLKER transcripts (FLN-XML format) segmented into inter-pausal units as input and automatically calculates the boundaries of syntactic segments, merging and splitting the respective contributions as needed and adjusting the alignment through an appropriate interpolation. The output is again a FOLKER transcript in FLN-XML format. The segmenter is currently used in the internal workflow of the FOLK project. More information can be found in: Rehbein, Ines/Ruppenhofer, Josef/Schmidt, Thomas (2020): Improving Sentence Boundary Detection for Spoken Language Transcripts. In: Calzolari, Nicoletta/Béchet, Frédéric/Blache, Philippe/Choukri, Khalid/Cieri, Christopher/Declerck, Thierry/Goggi, Sara/Isahara, Hitoshi/Maegaard, Bente/Mariani, Joseph/Mazo, Hélène/Moreno, Asuncion/Odijk, Jan/Piperidis, Stelios (eds.): Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC), May 11-16, 2020, Palais du Pharo, Marseille, France. Paris: European Language Resources Association, 2020. pp. 7102-7111.
  • Developers: Ines Rehbein and Josef Ruppenhofer
  • CHOUCAS (Chunker l’Oral : Unités linguistiques, Corpus Alignés et Segmentés), an automatic segmentation tool with a chunker for French transcriptions. The French-language tutorial and the software have been freely available since January 2022.
  • Developer: Flora Badin

    Figure: software launch interface
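
The merging, splitting and interpolation step described above for the German segmenter can be illustrated with a minimal sketch. This is not the FOLK implementation: the `Segment` class, the `resegment` function and the `boundaries` argument are hypothetical names, FLN-XML reading and writing is left out, and the interpolation is assumed to be linear over the token count of each inter-pausal unit, which may differ from what the actual tool does.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Segment:
    start: float       # start time in seconds
    end: float         # end time in seconds
    tokens: List[str]  # transcribed tokens in this unit


def token_times(unit: Segment) -> List[Tuple[float, float]]:
    """Spread a unit's time span evenly over its tokens.

    Assumption: linear interpolation by token count.
    """
    n = len(unit.tokens)
    step = (unit.end - unit.start) / n
    return [(unit.start + i * step, unit.start + (i + 1) * step) for i in range(n)]


def resegment(units: List[Segment], boundaries: Set[int]) -> List[Segment]:
    """Merge and split pause-based units into syntactic segments.

    `boundaries` holds 0-based indices, counted over all tokens in order,
    of the last token of each syntactic segment (hypothetical output of a
    boundary detector). The final token always closes a segment.
    """
    tokens: List[str] = []
    times: List[Tuple[float, float]] = []
    for unit in units:
        if not unit.tokens:
            continue  # skip empty units (e.g. pure pauses)
        tokens.extend(unit.tokens)
        times.extend(token_times(unit))

    segments: List[Segment] = []
    first = 0
    for i in range(len(tokens)):
        if i in boundaries or i == len(tokens) - 1:
            # New segment runs from the interpolated start of its first
            # token to the interpolated end of its last token.
            segments.append(Segment(times[first][0], times[i][1], tokens[first:i + 1]))
            first = i + 1
    return segments


if __name__ == "__main__":
    units = [
        Segment(0.0, 2.0, ["ja", "das", "ist", "gut"]),
        Segment(2.5, 3.5, ["und", "dann"]),
    ]
    for seg in resegment(units, {0, 3}):
        print(seg)
```

In this sketch a boundary inside a unit splits it at an interpolated time point, while a missing boundary at the end of a unit merges it with the following one, so the new segment inherits the start of the first unit and the end of the last.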