Segmented data - SEGmentation of oral CORpora

The available segmented data consist of:

The recordings:

records_CLAPI_pack1

records_CLAPI_pack2

records_ESLO2_pack1

records_ESLO2_pack2

The original transcription documents: Annotations_original

The orthographic annotation: Annotations_ortho

The Interactional Units annotation: Annotations_IAU

The chunk annotation, divided into two parts: Annotations_chunk

the first segmentation is based on the interactional units
the second is segmented at the pauses present in the recordings

The Fribourg macrosyntax annotation: Annotations_MSF

The prominence annotation: Annotations_prom

All resources (corpus, annotations, guides, tools) may be reused as is or modified, for non-commercial purposes only and with citation of the source: SegCor (http://segcor.cnrs.fr) for the annotations and guides; ESLO (http://eslo.huma-num.fr/) or CLAPI (http://clapi.icar.cnrs.fr) for the respective corpus; and http://segcor.cnrs.fr/deliverable/tools/ for the tools. They may be redistributed under the same conditions, in accordance with the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license (CC BY-NC-SA 4.0, https://creativecommons.org/licenses/by-nc-sa/4.0/).

The German annotated data (pilot corpus annotation, gold-standard segmentation, and annotation results from the pause annotation experiment) are available from the download pages of the Database for Spoken German (DGD) via the following link:

https://dgd.ids-mannheim.de/DGD2Web/ExternalAccessServlet?command=goldStandard&corpus=FOLK-SegCor