Institut für Deutsche Sprache (IDS)
With the Research and Teaching Corpus of Spoken German FOLK (Forschungs- und Lehrkorpus Gesprochenes Deutsch), the Institute for the German Language (IDS) is building up a large corpus of spoken interactions, recorded on audio and/or video.
FOLK is intended as a reference corpus for research and teaching, to be used by scholars and students in different disciplines (or at least different subfields of linguistics) and with diverse qualitative and quantitative methodological approaches. FOLK is currently available via the Database for Spoken German (DGD), an internet platform for oral corpora. The DGD attempts to combine tools for qualitative approaches to the data (browsing and viewing/listening to metadata, transcripts and recordings) with query mechanisms that are known from corpus linguistics and have been adapted to better suit work with spoken data.
FOLK is a constantly growing corpus of natural interaction data that aims to cover as wide a variety of interaction types as possible from private, institutional and public settings. In its current version, the corpus comprises 332 interactions with 285 hours of audio and 93 hours of video recordings. All recordings are transcribed according to the cGAT conventions for minimal transcripts using the FOLKER annotation tool. In total, FOLK amounts to 2.7 million transcribed tokens. Orthographic normalisation (using OrthoNormal), lemmatisation and part-of-speech tagging are added as further annotation layers. Together with the explicit, XML-based data model of the FOLK transcripts, these tools also yielded important parameters for the (semi-)automatic segmentation methods developed in the project.
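To make the idea of annotation layers on transcribed tokens more concrete, here is a minimal Python sketch. It does not reproduce the actual FOLK data model: the element name `w`, the attribute names `norm`, `lemma` and `pos`, the sample utterance and the STTS-style tags are all assumptions chosen for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment. Element and attribute names are illustrative
# assumptions and NOT the actual FOLK transcript schema.
SAMPLE = """
<contribution speaker="AB">
  <w norm="haben" lemma="haben" pos="VAFIN">ham</w>
  <w norm="wir" lemma="wir" pos="PPER">wa</w>
  <w norm="nicht" lemma="nicht" pos="PTKNEG">nich</w>
</contribution>
"""


def read_tokens(xml_text):
    """Return one dict per token: the transcribed form plus its annotation layers."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "transcribed": w.text,         # form as written in the transcript
            "normalised": w.get("norm"),   # orthographic normalisation layer
            "lemma": w.get("lemma"),       # lemmatisation layer
            "pos": w.get("pos"),           # part-of-speech layer
        }
        for w in root.iter("w")
    ]


tokens = read_tokens(SAMPLE)
for token in tokens:
    print(token)
```

Keeping the transcribed form as the element text and the added layers as attributes is only one possible design; it is used here because it lets each layer be queried independently of the others.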
In the context of FOLK, a series of standards and guidelines was developed to establish best practices for working with oral corpora. Most importantly, these comprise an ISO standard for transcriptions of spoken language, based on the guidelines of the Text Encoding Initiative, and two guidelines discussing technical and legal aspects of working with oral data.
More information on FOLK (in German) can be found on the website of the Archive for Spoken German (AGD). The corpus itself is accessible (after registration) via the Database for Spoken German (DGD). More than 9000 researchers, teachers and students have registered to use FOLK in the DGD.
The following papers give more information on different aspects of FOLK:
- Thomas Schmidt (2014): The Database for Spoken German – DGD2. In: Proceedings of the Ninth conference on International Language Resources and Evaluation (LREC’14), Reykjavik, Iceland: European Language Resources Association (ELRA), 1451-1457.
- Thomas Schmidt (2017): Construction and Dissemination of a Corpus of Spoken Interaction – Tools and Workflows in the FOLK project. In: Kupietz, Marc & Geyken, Alexander (eds.): Corpus Linguistic Software Tools, Journal for Language Technology and Computational Linguistics (JLCL 31/1), 127-154.
- Thomas Schmidt (2016): Good practices in the compilation of FOLK, the Research and Teaching Corpus of Spoken German. In: International Journal of Corpus Linguistics, Volume 21, Issue 3, Jan 2016, 396-418.
- Thomas Schmidt (2014): The research and teaching corpus of spoken German – FOLK. In: Proceedings of the Ninth conference on International Language Resources and Evaluation (LREC’14), Reykjavik, Iceland: European Language Resources Association (ELRA), 383-387.
Team
- Core team:
- Thomas SCHMIDT, Head of the Department "Oral Corpora", German Coordinator, Research Director
- Arnulf DEPPERMANN, Professor, Research Director
- Swantje WESTPFAHL, Post-doc, Mannheim Research Coordinator (until July 2019)
- Ines REHBEIN, Post-doc, Research Associate (September 2019 – December 2019)
- Student assistants
- Isabell NEISE, Student Assistant
- Melanie HOBICH, Student Assistant
- Julia LARBIG, Student Assistant
- Anton BORLINGHAUS, Student Assistant
- Hanna STRUB, Student Assistant
- Arthur BERGS, Student Assistant
- Associated members
- Henrike HELMER, Post-doc, Research Associate
- Jan GORISCH, Post-doc, Research Associate
- Nadine PROSKE, Post-doc, Research Associate
- Joachim GASCH, Post-doc, Research Associate
- Josef RUPPENHOFER, Post-doc, Research Associate