Definition: A chunk is the smallest possible sequence of linguistic items and a non-recursive constituent. It comprises a head and the other items that depend on it. Dividing an utterance into chunks provides a first segmentation that reveals potential macro-syntactic boundaries. This type of annotation is particularly useful for spoken language, which can be hard to analyse syntactically, especially because of its unfinished sentences.
Example from Clapi:
Example from ESLO2:
Categories:
tag | chunk type | corpus examples |
AP | adjectival chunk | trop [AP 0] joli [AP 1] |
AdP | adverbial chunk | peut-être [AdP 0] |
NP | nominal chunk | tes [NP 0] chaussures [NP 1] |
PP | prepositional chunk | de [PP 0] loin [PP 1] |
VP | verbal chunk | on [VP 0] nous [VP 1] entend [VP 2] |
ARTIC | connective (articulateur) | et [ARTIC 0] |
FNO | nucleus form (forme noyau) | salut [FNO 0] |
UNKNOW | unidentified chunk (truncated word, unknown word) | le [UNKNOW] |
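The examples in the table pair each token with a tag and a position inside its chunk, starting at 0. As a minimal sketch, the Python code below regroups such a sequence into chunks. It assumes the inline `token [TAG index]` layout shown in the table, which may differ from the format of the distributed corpus files. The function name `group_chunks` and the sample string are hypothetical, and the sample simply concatenates examples from the table rather than quoting a real utterance.

```python
import re

# Matches "token [TAG index]" pairs as displayed in the table (assumed layout).
# The index is optional because the UNKNOW example carries none.
PAIR = re.compile(r"(\S+)\s*\[([A-Za-z]+)(?:\s+(\d+))?\]")


def group_chunks(annotated):
    """Group annotated tokens into a list of (tag, [tokens]) chunks."""
    chunks = []
    for token, tag, index in PAIR.findall(annotated):
        # A new chunk starts at index 0, at a missing index,
        # or whenever the tag differs from the previous chunk's tag.
        starts_new = index in ("", "0") or not chunks or chunks[-1][0] != tag
        if starts_new:
            chunks.append((tag, [token]))
        else:
            chunks[-1][1].append(token)
    return chunks


# Hypothetical input built from the table's examples.
sample = "on [VP 0] nous [VP 1] entend [VP 2] tes [NP 0] chaussures [NP 1] et [ARTIC 0]"
for tag, tokens in group_chunks(sample):
    print(tag, " ".join(tokens))
```

Because chunks are non-recursive, a flat list of (tag, tokens) pairs is enough to represent the segmentation; no tree structure is needed.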
The annotation was carried out by Marie Skrovec and Iris Eshkol.
Maarouf_2017_apprentissage_auto_chunk_fr_oral