The Lexique des Formes Fléchies du Français (Lefff), distributed under the « LGPLLR » licence (Lesser General Public License For Linguistic Resources), provides POS and lemma information. It was used to obtain a segmentation into words and multiword expressions such as « aujourd’hui », « ciné-club » and « par exemple ».
This software was used before the project, in both ESLO and CLAPI, to transcribe and align corpora easily; the resulting files (trs format) were then imported into EXMARaLDA.
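The lexicon-based segmentation described above can be sketched as a greedy longest-match lookup. This is only an illustration under stated assumptions: the function name `segment`, the toy `LEXICON` set and the `max_len` parameter are hypothetical, and the actual project pipeline may work differently.

```python
# Hypothetical toy lexicon; the real resource contains far more entries.
LEXICON = {"par exemple", "aujourd’hui", "ciné-club"}

def segment(text, lexicon, max_len=3):
    """Greedy longest-match segmentation over whitespace-separated tokens.

    At each position, try the longest candidate (up to max_len tokens)
    and keep it if it is a lexicon entry; otherwise fall back to one token.
    """
    tokens = text.split()
    units = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if n == 1 or candidate.lower() in lexicon:
                units.append(candidate)
                i += n
                break
    return units

print(segment("il va par exemple au ciné-club aujourd’hui", LEXICON))
```

With this toy lexicon, « par exemple » is kept as a single unit, while « ciné-club » and « aujourd’hui » remain whole because whitespace tokenisation never splits them.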
The EXMARaLDA tools (FOLKER, Partitur-Editor) are used for most of the segmentation and annotation tasks in this project. Our results will be based on this software.
This software was used for the French pilot corpus to annotate the signal precisely, in order to identify prominences and disfluencies, and to handle interactions with more than three speakers, which were difficult to process in EXMARaLDA.
This software is an annotation tool. The results of the CHOUCAS tool (one of the automatic tools created) can be viewed in it.
To build our automatic tools (cf. Automatic tools), several pre-existing pieces of software were used:
Wapiti was used for segmenting and labeling sequences with discriminative models (maxent models, maximum entropy Markov models and linear-chain CRF) for the chunker.
TreeTagger was used to annotate the transcriptions of the French corpus with part-of-speech and lemma information, and also for the chunker (machine learning).
This software was used for the French pilot corpus to obtain a word-to-speech alignment, which is useful for all the other annotations created.
JTrans was used for its automatic word-to-speech alignment, for the chunker.
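As an illustration of how part-of-speech and lemma annotations of this kind can be consumed downstream, here is a minimal sketch that reads TreeTagger-style output, assuming the usual three tab-separated columns (token, tag, lemma). The names `Token` and `parse_treetagger` and the sample lines are hypothetical and are not taken from the project.

```python
from typing import List, NamedTuple

class Token(NamedTuple):
    form: str   # surface form as transcribed
    pos: str    # part-of-speech tag
    lemma: str  # lemma

def parse_treetagger(output: str) -> List[Token]:
    """Parse tab-separated 'token<TAB>tag<TAB>lemma' lines into Token tuples.

    Lines that do not have exactly three columns are skipped.
    """
    tokens = []
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue
        tokens.append(Token(*parts))
    return tokens

# Illustrative sample only, not project data.
sample = "le\tDET:ART\tle\nchat\tNOM\tchat\ndort\tVER:pres\tdormir\n"
for tok in parse_treetagger(sample):
    print(tok.form, tok.pos, tok.lemma)
```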
This software is a helpful conversion tool from Elan, Clan, Transcriber and Praat files to TEI files and back, used for the chunker.