Research | Verena Blaschke

Publications & preprints

(*Denotes equal contribution.)

Standard-to-dialect transfer trends differ across text and speech:
A case study on intent and topic classification in German dialects

Verena Blaschke, Miriam Winkler & Barbara Plank

Unpublished preprint · Abstract Preprint

Research on cross-dialectal transfer from a standard to a non-standard dialect variety has typically focused on text data. However, dialects are primarily spoken, and non-standard spellings are known to cause issues in text processing. We compare standard-to-dialect transfer in three settings: text models, speech models, and cascaded systems where speech first gets automatically transcribed and then further processed by a text model. In our experiments, we focus on German and multiple German dialects in the context of written and spoken intent and topic classification. To that end, we release the first dialectal audio intent classification dataset. We find that the speech-only setup provides the best results on the dialect data while the text-only setup works best on the standard data. While the cascaded systems lag behind the text-only models for German, they perform relatively well on the dialectal data if the transcription system generates normalized, standard-like output.
Make every letter count:
Building dialect variation dictionaries from monolingual corpora

Robert Litschko, Verena Blaschke, Diana Burkhardt, Barbara Plank & Diego Frassinelli

EMNLP Findings 2025 · Abstract Cite PDF Code/Data

Dialects exhibit a substantial degree of variation due to the lack of a standard orthography. At the same time, the ability of Large Language Models (LLMs) to process dialects remains largely understudied. To address this gap, we use Bavarian as a case study and investigate the lexical dialect understanding capability of LLMs by examining how well they recognize and translate dialectal terms across different parts-of-speech. To this end, we introduce DiaLemma, a novel annotation framework for creating dialect variation dictionaries from monolingual data only, and use it to compile a ground truth dataset consisting of 100K human-annotated German-Bavarian word pairs. We evaluate how well nine state-of-the-art LLMs can judge Bavarian terms as dialect translations, inflected variants, or unrelated forms of a given German lemma. Our results show that LLMs perform best on nouns and lexically similar word pairs, and struggle most in distinguishing between direct translations and inflected variants. Interestingly, providing additional context in the form of example usages improves the translation performance, but reduces their ability to recognize dialect variants. This study highlights the limitations of LLMs in dealing with orthographic dialect variation and emphasizes the need for future work on adapting LLMs to dialects.
Robert Litschko, Verena Blaschke, Diana Burkhardt, Barbara Plank, and Diego Frassinelli (2025). “Make Every Letter Count: Building Dialect Variation Dictionaries from Monolingual Corpora.” In Findings of the Association for Computational Linguistics: EMNLP 2025, pp. 14157–14174. Suzhou, China. Association for Computational Linguistics.
@inproceedings{litschko-etal-2025-make, title = {Make Every Letter Count: Building Dialect Variation Dictionaries from Monolingual Corpora}, author = {Litschko, Robert and Blaschke, Verena and Burkhardt, Diana and Plank, Barbara and Frassinelli, Diego}, editor = {Christodoulopoulos, Christos and Chakraborty, Tanmoy and Rose, Carolyn and Peng, Violet}, booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2025}, month = nov, year = {2025}, address = {Suzhou, China}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2025.findings-emnlp.762/}, pages = {14157--14174}, isbn = {979-8-89176-335-7}, }
DistaLs: A comprehensive collection of language distance measures

Rob van der Goot, Esther Ploeger, Verena Blaschke & Tanja Samardžić

EMNLP System Demos 2025 · Abstract Cite PDF Code

Languages vary along a wide variety of dimensions. In Natural Language Processing (NLP), it is useful to know how “distant” languages are from each other, so that we can inform NLP models about these differences or predict good transfer languages. Furthermore, it can inform us about how diverse language samples are. However, there are many different perspectives on how distances across languages could be measured, and previous work has predominantly focused on either intuition or a single type of distance, like genealogical or typological distance. Therefore, we propose DistaLs, a toolkit that is designed to provide users with easy access to a wide variety of language distance measures. We also propose a filtered subset, which contains less redundant and more reliable features. DistaLs is designed to be accessible for a variety of use cases, and offers a Python, CLI, and web interface. It is easily updateable, and available as a pip package. Finally, we provide a case-study in which we use DistaLs to measure correlations of distance measures with performance on four different morphosyntactic tasks.
Rob van der Goot, Esther Ploeger, Verena Blaschke, and Tanja Samardžić (2025). “DistaLs: A Comprehensive Collection of Language Distance Measures.” In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pp. 307–318. Suzhou, China. Association for Computational Linguistics.
@inproceedings{van-der-goot-etal-2025-distals, title = {{DistaLs:} A Comprehensive Collection of Language Distance Measures}, author = {van~der~Goot, Rob and Ploeger, Esther and Blaschke, Verena and Samardžić, Tanja}, year = {2025}, month = nov, editor = {Habernal, Ivan and Schulam, Peter and Tiedemann, J{\"o}rg}, booktitle = {Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations}, address = {Suzhou, China}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2025.emnlp-demos.23/}, pages = {307--318}, isbn = {979-8-89176-334-0}, }
A multi-dialectal dataset for German dialect ASR and dialect-to-standard speech translation

Verena Blaschke, Miriam Winkler, Constantin Förster, Gabriele Wenger-Glemser & Barbara Plank

Interspeech 2025 · Abstract Cite PDF Slides Supplementary material Press

Although Germany has a diverse landscape of dialects, they are underrepresented in current automatic speech recognition (ASR) research. To enable studies of how robust models are towards dialectal variation, we present Betthupferl, an evaluation dataset containing four hours of read speech in three dialect groups spoken in Southeast Germany (Franconian, Bavarian, Alemannic), and half an hour of Standard German speech. We provide both dialectal and Standard German transcriptions, and analyze the linguistic differences between them. We benchmark several multilingual state-of-the-art ASR models on speech translation into Standard German, and find differences between how much the output resembles the dialectal vs. standardized transcriptions. Qualitative error analyses of the best ASR model reveal that it sometimes normalizes grammatical differences, but often stays closer to the dialectal constructions.
- Bedtime stories reveal how automatic speech recognition struggles with Bavarian dialect, AI for Media Network, 18 Aug 2025
- The press release got picked up by multiple German news outlets, for instance: Bairisch für Anfänger: KI versteht keine bayerischen Dialekte [Bavarian for beginners: AI does not understand dialects from Bavaria], Der Spiegel, 21 Aug 2025
- Warum KI kein Bairisch versteht [Why AI doesn’t understand Bavarian]. Segment in the tv programme Capriccio on BR (Bavarian Broadcasting Station), 23 Oct 2025
Verena Blaschke, Miriam Winkler, Constantin Förster, Gabriele Wenger-Glemser, and Barbara Plank (2025). “A Multi-Dialectal Dataset for German Dialect ASR and Dialect-to-Standard Speech Translation.” In Interspeech 2025, pp. 913–917. DOI: 10.21437/Interspeech.2025-318
@inproceedings{blaschke-etal-2025-multi, title = {A Multi-Dialectal Dataset for {German} Dialect {ASR} and Dialect-to-Standard Speech Translation}, author = {Blaschke, Verena and Winkler, Miriam and Förster, Constantin and Wenger-Glemser, Gabriele and Plank, Barbara}, booktitle = {Interspeech 2025}, pages = {913--917}, doi = {10.21437/Interspeech.2025-318}, year = {2025}, month = aug, <li><a href="https://aiformedia.network/bedtime-stories-reveal-how-automatic-speech-recognition-struggles-with-bavarian-dialect/">Bedtime stories reveal how automatic speech recognition struggles with Bavarian dialect</a>, AI for Media Network, 18 Aug 2025</li> </ul>} }
Analyzing the effect of linguistic similarity on cross-lingual transfer:
Tasks and experimental setups matter

Verena Blaschke, Masha Fedzechkina & Maartje ter Hoeve

ACL Findings 2025 · Abstract Cite PDF Slides Poster

Cross-lingual transfer is a popular approach to increase the amount of training data for NLP tasks in a low-resource context. However, the best strategy to decide which cross-lingual data to include is unclear. Prior research often focuses on a small set of languages from a few language families and/or a single task. It is still an open question how these findings extend to a wider variety of languages and tasks. In this work, we analyze cross-lingual transfer for 263 languages from a wide variety of language families. Moreover, we include three popular NLP tasks: POS tagging, dependency parsing, and topic classification. Our findings indicate that the effect of linguistic similarity on transfer performance depends on a range of factors: the NLP task, the (mono- or multilingual) input representations, and the definition of linguistic similarity.
Verena Blaschke, Masha Fedzechkina, and Maartje ter Hoeve (2025). “Analyzing the Effect of Linguistic Similarity on Cross-Lingual Transfer: Tasks and Experimental Setups Matter.” In Findings of the Association for Computational Linguistics: ACL 2025, pp. 8653–8684. Vienna, Austria. Association for Computational Linguistics. DOI: 10.18653/v1/2025.findings-acl.454
@inproceedings{blaschke-etal-2025-analyzing, title = {Analyzing the Effect of Linguistic Similarity on Cross-Lingual Transfer: Tasks and Experimental Setups Matter}, author = {Blaschke, Verena and Fedzechkina, Masha and ter~Hoeve, Maartje}, editor = {Che, Wanxiang and Nabende, Joyce and Shutova, Ekaterina and Pilehvar, Mohammad Taher}, year = {2025}, url = {https://aclanthology.org/2025.findings-acl.454/}, doi = {10.18653/v1/2025.findings-acl.454}, pages = {8653--8684}, isbn = {979-8-89176-256-5}, booktitle = {Findings of the Association for Computational Linguistics: ACL 2025}, address = {Vienna, Austria}, publisher = {Association for Computational Linguistics}, month = jul, }
Methods and resources in Germanic variationist linguistics

John Nerbonne, Verena Blaschke, Hinrich Schütze & Barbara Plank

Oxford Research Encyclopedia of Linguistics 2025 · Abstract Cite PDF Preprint

Variationist linguistics, encompassing dialectology and sociolinguistics, studies how linguistic variation is distributed and the dynamics behind the distribution. This article aims to present the most important current resources—methods and data and software archives—for research in Germanic variationist linguists. It is novel to include an article on resources in a collection such as this Encyclopedia, so we begin by motivating its inclusion, justifying why contemporary scholars are expected to make resources available to the discipline. With respect to methods, the emphasis is on analytical methods as opposed to methods for field work, site selection, or interviews, and the focus is on software for data analysis. With respect to archives, we emphasize digital repositories. We report on resources important in the variationist research community, that is, dialectology and sociolinguistics, but also on resources used in the growing community of computational linguists interested in variation.
John Nerbonne, Verena Blaschke, Hinrich Schütze, and Barbara Plank (2025). “Methods and Resources in Germanic Variationist Linguistics.” Oxford University Press. DOI: 10.1093/acrefore/9780199384655.013.1033
@incollection{nerbonne-etal-2025-methods, title = {Methods and Resources in {Germanic} Variationist Linguistics}, author = {Nerbonne, John and Blaschke, Verena and Sch\"{u}tze, Hinrich and Plank, Barbara}, booktitle = {Oxford Research Encyclopedia of Linguistics}, publisher = {Oxford University Press}, editor = {K\"{u}rschner, Sebastian and Dammel, Antje}, doi = {10.1093/acrefore/9780199384655.013.1033}, url = {https://doi.org/10.1093/acrefore/9780199384655.013.1033}, year = {2025}, month = may, }
Evaluating pixel language models on non-standardized languages

Alberto Muñoz-Ortiz, Verena Blaschke & Barbara Plank

COLING 2025 · Abstract Cite PDF Model

We explore the potential of pixel-based models for transfer learning from standard languages to dialects. These models convert text into images that are divided into patches, enabling a continuous vocabulary representation that proves especially useful for out-of-vocabulary words common in dialectal data. Using German as a case study, we compare the performance of pixel-based models to token-based models across various syntactic and semantic tasks. Our results show that pixel-based models outperform token-based models in part-of-speech tagging, dependency parsing and intent detection for zero-shot dialect evaluation by up to 26 percentage points in some scenarios, though not in Standard German. However, pixel-based models fall short in topic classification. These findings emphasize the potential of pixel-based models for handling dialectal data, though further research should be conducted to assess their effectiveness in various linguistic contexts.
Alberto Muñoz-Ortiz, Verena Blaschke, and Barbara Plank (2025). “Evaluating Pixel Language Models on Non-Standardized Languages.” In Proceedings of the 31st International Conference on Computational Linguistics, pp. 6412–6419. Abu Dhabi, UAE. Association for Computational Linguistics.
@inproceedings{munoz-ortiz-etal-2025-evaluating, author = {Mu{\~n}oz-Ortiz, Alberto and Blaschke, Verena and Plank, Barbara}, title = {Evaluating Pixel Language Models on Non-Standardized Languages}, booktitle = {Proceedings of the 31st International Conference on Computational Linguistics}, year = {2025}, month = jan, address = {Abu Dhabi, UAE}, editor = {Rambow, Owen and Wanner, Leo and Apidianaki, Marianna and Al-Khalifa, Hend and Di Eugenio, Barbara and Schockaert, Steven}, publisher = {Association for Computational Linguistics}, pages = {6412–6419}, url = {https://aclanthology.org/2025.coling-main.427/}, }
Cross-dialect information retrieval:
Information access in low-resource and high-variance languages

Robert Litschko, Oliver Kraus, Verena Blaschke & Barbara Plank

COLING 2025 · Abstract Cite PDF Data

A large amount of local and culture-specific knowledge (e.g., people, traditions, food) can only be found in documents written in dialects. While there has been extensive research conducted on cross-lingual information retrieval (CLIR), the field of cross-dialect retrieval (CDIR) has received limited attention. Dialect retrieval poses unique challenges due to the limited availability of resources to train retrieval models and the high variability in non-standardized languages. We study these challenges on the example of German dialects and introduce the first German dialect retrieval dataset, dubbed WikiDIR, which consists of seven German dialects extracted from Wikipedia. Using WikiDIR, we demonstrate the weakness of lexical methods in dealing with high lexical variation in dialects. We further show that commonly used zero-shot cross-lingual transfer approach with multilingual encoders do not transfer well to extremely low-resource setups, motivating the need for resource-lean and dialect-specific retrieval models. We finally demonstrate that (document) translation is an effective way to reduce the dialect gap in CDIR.
Robert Litschko, Oliver Kraus, Verena Blaschke, and Barbara Plank (2025). “Cross-Dialect Information Retrieval: Information Access in Low-Resource and High-Variance Languages.” In Proceedings of the 31st International Conference on Computational Linguistics, pp. 10158–10171. Abu Dhabi, UAE. Association for Computational Linguistics.
@inproceedings{litschko-etal-2025-cross, author = {Litschko, Robert and Kraus, Oliver and Blaschke, Verena and Plank, Barbara}, title = {Cross-Dialect Information Retrieval: {Information} Access in Low-Resource and High-Variance Languages}, booktitle = {Proceedings of the 31st International Conference on Computational Linguistics}, year = {2025}, month = jan, address = {Abu Dhabi, UAE}, editor = {Rambow, Owen and Wanner, Leo and Apidianaki, Marianna and Al-Khalifa, Hend and Di Eugenio, Barbara and Schockaert, Steven}, publisher = {Association for Computational Linguistics}, pages = {10158--10171}, url = {https://aclanthology.org/2025.coling-main.678/}, }
Improving dialectal slot and intent detection with auxiliary tasks:
A multi-dialectal Bavarian case study

Xaver Maria Krückl*, Verena Blaschke* & Barbara Plank

VarDial @ COLING 2025 · Abstract Cite PDF Slides Data Code

Reliable slot and intent detection (SID) is crucial in natural language understanding for applications like digital assistants. Encoder-only transformer models fine-tuned on high-resource languages generally perform well on SID. However, they struggle with dialectal data, where no standardized form exists and training data is scarce and costly to produce. We explore zero-shot transfer learning for SID, focusing on multiple Bavarian dialects, for which we release a new dataset for the Munich dialect. We evaluate models trained on auxiliary tasks in Bavarian, and compare joint multi-task learning with intermediate-task training. We also compare three types of auxiliary tasks: token-level syntactic tasks, named entity recognition (NER), and language modelling. We find that the included auxiliary tasks have a more positive effect on slot filling than intent classification (with NER having the most positive effect), and that intermediate-task training yields more consistent performance gains. Our best-performing approach improves intent classification performance on Bavarian dialects by 5.1 and slot filling F1 by 8.4 percentage points.
Xaver Maria Krückl, Verena Blaschke, and Barbara Plank (2025). “Improving Dialectal Slot and Intent Detection with Auxiliary Tasks: A Multi-Dialectal Bavarian Case Study.” In Proceedings of the Twelfth Workshop on NLP for Similar Languages, Varieties, and Dialects (VarDial 2025), pp. 128–146. Abu Dhabi, UAE. Association for Computational Linguistics.
@inproceedings{krueckl-etal-2025-improving, title = {Improving Dialectal Slot and Intent Detection with Auxiliary Tasks: {A} Multi-Dialectal {Bavarian} Case Study}, author = {Kr{\"u}ckl, Xaver Maria and Blaschke, Verena and Plank, Barbara}, year = {2025}, editor = {Scherrer, Yves and Jauhiainen, Tommi and Ljube{\v{s}}i{\'c}, Nikola and Nakov, Preslav and Tiedemann, J{\"o}rg and Zampieri, Marcos}, booktitle = {Proceedings of the Twelfth Workshop on NLP for Similar Languages, Varieties, and Dialects (VarDial 2025)}, publisher = {Association for Computational Linguistics}, month = jan, address = {Abu Dhabi, UAE}, url = {https://aclanthology.org/2025.vardial-1.10/}, pages = {128--146}, }
Add noise, tasks, or layers?
MaiNLP at the VarDial 2025 shared task on Norwegian dialectal slot and intent detection

Verena Blaschke*, Felicia Körner* & Barbara Plank

VarDial @ COLING 2025 · Abstract Cite PDF Poster Code

Slot and intent detection (SID) is a classic natural language understanding task. Despite this, research has only more recently begun focusing on SID for dialectal and colloquial varieties. Many approaches for low-resource scenarios have not yet been applied to dialectal SID data, or compared to each other on the same datasets. We participate in the VarDial 2025 shared task on slot and intent detection in Norwegian varieties, and compare multiple set-ups: varying the training data (English, Norwegian, or dialectal Norwegian), injecting character-level noise, training on auxiliary tasks, and applying Layer Swapping, a technique in which layers of models fine-tuned on different datasets are assembled into a model. We find noise injection to be beneficial while the effects of auxiliary tasks are mixed. Though some experimentation was required to successfully assemble a model from layers, it worked surprisingly well; a combination of models trained on English and small amounts of dialectal data produced the most robust slot predictions. Our best models achieve 97.6% intent accuracy and 85.6% slot F1 in the shared task.
Verena Blaschke, Felicia Körner, and Barbara Plank (2025). “Add Noise, Tasks, or Layers? MaiNLP at the VarDial 2025 Shared Task on Norwegian Dialectal Slot and Intent Detection.” In Proceedings of the Twelfth Workshop on NLP for Similar Languages, Varieties, and Dialects (VarDial 2025), pp. 182–199. Abu Dhabi, UAE. Association for Computational Linguistics.
@inproceedings{blaschke-etal-2025-add, title = {Add Noise, Tasks, or Layers? {MaiNLP} at the {VarDial} 2025 Shared Task on {Norwegian} Dialectal Slot and Intent Detection}, author = {Blaschke, Verena and K{\"o}rner, Felicia and Plank, Barbara}, year = {2025}, booktitle = {Proceedings of the Twelfth Workshop on NLP for Similar Languages, Varieties, and Dialects (VarDial 2025)}, editor = {Scherrer, Yves and Jauhiainen, Tommi and Ljube{\v{s}}i{\'c}, Nikola and Nakov, Preslav and Tiedemann, J{\"o}rg and Zampieri, Marcos}, publisher = {Association for Computational Linguistics}, pages = {182--199}, month = jan, address = {Abu Dhabi, UAE}, url = {https://aclanthology.org/2025.vardial-1.14/}, }
What do dialect speakers want?
A survey of attitudes towards language technology for German dialects

Verena Blaschke, Christoph Purschke, Hinrich Schütze & Barbara Plank

ACL 2024 · Abstract Cite PDF Poster

Natural language processing (NLP) has largely focused on modelling standardized languages. More recently, attention has increasingly shifted to local, non-standardized languages and dialects. However, the relevant speaker populations’ needs and wishes with respect to NLP tools are largely unknown. In this paper, we focus on dialects and regional languages related to German – a group of varieties that is heterogeneous in terms of prestige and standardization. We survey speakers of these varieties (N=327) and present their opinions on hypothetical language technologies for their dialects. Although attitudes vary among subgroups of our respondents, we find that respondents are especially in favour of potential NLP tools that work with dialectal input (especially audio input) such as virtual assistants, and less so for applications that produce dialectal output such as machine translation or spellcheckers.
Verena Blaschke, Christoph Purschke, Hinrich Schütze, and Barbara Plank (2024). “What Do Dialect Speakers Want? A Survey of Attitudes Towards Language Technology for German Dialects.” In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 823–841. Bangkok, Thailand. Association for Computational Linguistics.
@inproceedings{blaschke-etal-2024-dialect, title = {What Do Dialect Speakers Want? {A} Survey of Attitudes Towards Language Technology for {German} Dialects}, author = {Blaschke, Verena and Purschke, Christoph and Schütze, Hinrich and Plank, Barbara}, editor = {Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek}, year = {2024}, booktitle = {Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)}, address = {Bangkok, Thailand}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2024.acl-short.74}, pages = {823--841}, }
MaiBaam: A multi-dialectal Bavarian Universal Dependency treebank

Verena Blaschke, Barbara Kovačić, Siyao Peng, Hinrich Schütze & Barbara Plank

LREC–COLING 2024 · Abstract Cite PDF Poster Data Code Annotation guidelines

Despite the success of the Universal Dependencies (UD) project exemplified by its impressive language breadth, there is still a lack in within-language breadth’: most treebanks focus on standard languages. Even for German, the language with the most annotations in UD, so far no treebank exists for one of its language varieties spoken by over 10M people: Bavarian. To contribute to closing this gap, we present the first multi-dialect Bavarian treebank (MaiBaam) manually annotated with part-of-speech and syntactic dependency information in UD, covering multiple text genres (wiki, fiction, grammar examples, social, non-fiction). We highlight the morphosyntactic differences between the closely-related Bavarian and German and showcase the rich variability of speakers’ orthographies. Our corpus includes 15k tokens, covering dialects from all Bavarian-speaking areas spanning three countries. We provide baseline parsing and POS tagging results, which are lower than results obtained on German and vary substantially between different graph-based parsers. To support further research on Bavarian syntax, we make our dataset, language-specific guidelines and code publicly available.
Verena Blaschke, Barbara Kovačić, Siyao Peng, Hinrich Schütze, and Barbara Plank (2024). “MaiBaam: A Multi-Dialectal Bavarian Universal Dependency Treebank.” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pp. 10921–10938. Torino, Italy. ELRA and ICCL.
@inproceedings{blaschke-etal-2024-maibaam, title = {{MaiBaam}: {A} Multi-Dialectal {Bavarian} {Universal} {Dependency} Treebank}, author = {Blaschke, Verena and Kova{\v{c}}i{\'c}, Barbara and Peng, Siyao and Sch{\"u}tze, Hinrich and Plank, Barbara}, editor = {Calzolari, Nicoletta and Kan, Min-Yen and Hoste, Veronique and Lenci, Alessandro and Sakti, Sakriani and Xue, Nianwen}, booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)}, month = may, year = {2024}, address = {Torino, Italy}, publisher = {ELRA and ICCL}, url = {https://aclanthology.org/2024.lrec-main.953}, pages = {10921--10938}, }
Sebastian, Basti, Wastl?!
Recognizing named entities in Bavarian dialectal data

Siyao Peng, Zihang Sun, Huangyan Shan, Marie Kolm, Verena Blaschke, Ekaterina Artemova & Barbara Plank

LREC–COLING 2024 · Abstract Cite PDF Data

Named Entity Recognition (NER) is a fundamental task to extract key information from texts, but annotated resources are scarce for dialects. This paper introduces the first dialectal NER dataset for German, BarNER, with 161K tokens annotated on Bavarian Wikipedia articles (bar-wiki) and tweets (bar-tweet), using a schema adapted from German CoNLL 2006 and GermEval. The Bavarian dialect differs from standard German in lexical distribution, syntactic construction, and entity information. We conduct in-domain, cross-domain, sequential, and joint experiments on two Bavarian and three German corpora and present the first comprehensive NER results on Bavarian. Incorporating knowledge from the larger German NER (sub-)datasets notably improves on bar-wiki and moderately on bar-tweet. Inversely, training first on Bavarian contributes slightly to the seminal German CoNLL 2006 corpus. Moreover, with gold dialect labels on Bavarian tweets, we assess multi-task learning between five NER and two Bavarian-German dialect identification tasks and achieve NER SOTA on bar-wiki. We substantiate the necessity of our low-resource BarNER corpus and the importance of diversity in dialects, genres, and topics in enhancing model performance.
Siyao Peng, Zihang Sun, Huangyan Shan, Marie Kolm, Verena Blaschke, Ekaterina Artemova, and Barbara Plank (2024). “Sebastian, Basti, Wastl?! Recognizing Named Entities in Bavarian Dialectal Data.” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pp. 14478–14493. Torino, Italy. ELRA and ICCL.
@inproceedings{peng-etal-2024-sebastian, title = {{Sebastian}, {Basti}, {Wastl}?! Recognizing Named Entities in {Bavarian} Dialectal Data}, author = {Peng, Siyao and Sun, Zihang and Shan, Huangyan and Kolm, Marie and Blaschke, Verena and Artemova, Ekaterina and Plank, Barbara}, editor = {Calzolari, Nicoletta and Kan, Min-Yen and Hoste, Veronique and Lenci, Alessandro and Sakti, Sakriani and Xue, Nianwen}, booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)}, month = may, year = {2024}, address = {Torino, Italy}, publisher = {ELRA and ICCL}, url = {https://aclanthology.org/2024.lrec-main.1262}, pages = {14478--14493}, }
Exploring the robustness of task-oriented dialogue systems for colloquial German varieties

Ekaterina Artemova, Verena Blaschke & Barbara Plank

EACL 2024 · Abstract Cite PDF Slides Poster Code

Mainstream cross-lingual task-oriented dialogue (ToD) systems leverage the transfer learning paradigm by training a joint model for intent recognition and slot-filling in English and applying it, zero-shot, to other languages. We address a gap in prior research, which often overlooked the transfer to lower-resource colloquial varieties due to limited test data. Inspired by prior work on English varieties, we craft and manually evaluate perturbation rules that transform German sentences into colloquial forms and use them to synthesize test sets in four ToD datasets. Our perturbation rules cover 18 distinct language phenomena, enabling us to explore the impact of each perturbation on slot and intent performance.Using these new datasets, we conduct an experimental evaluation across six different transformers. Here, we demonstrate that when applied to colloquial varieties, ToD systems maintain their intent recognition performance, losing 6% (4.62 percentage points) in accuracy on average. However, they exhibit a significant drop in slot detection, with a decrease of 31% (21 percentage points) in slot F_1 score. Our findings are further supported by a transfer experiment from Standard American English to synthetic Urban African American Vernacular English.
Ekaterina Artemova, Verena Blaschke, and Barbara Plank (2024). “Exploring the robustness of task-oriented dialogue systems for colloquial German varieties.” In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 445–468. St. Julian’s, Malta. Association for Computational Linguistics.
@inproceedings{artemova-etal-2024-exploring, title = {Exploring the robustness of task-oriented dialogue systems for colloquial {German} varieties}, author = {Artemova, Ekaterina and Blaschke, Verena and Plank, Barbara}, editor = {Graham, Yvette and Purver, Matthew}, booktitle = {Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)}, year = {2024}, month = mar, address = {St. Julian{'}s, Malta}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2024.eacl-long.28}, pages = {445--468}, }
A survey of corpora for Germanic low-resource languages and dialects

Verena Blaschke, Hinrich Schütze & Barbara Plank

NoDaLiDa 2023 · Abstract Cite Website PDF Slides

Despite much progress in recent years, the vast majority of work in natural language processing (NLP) is on standard languages with many speakers. In this work, we instead focus on low-resource languages and in particular non-standardized low-resource languages. Even within branches of major language families, often considered well-researched, little is known about the extent and type of available resources and what the major NLP challenges are for these language varieties. The first step to address this situation is a systematic survey of available corpora (most importantly, annotated corpora, which are particularly valuable for NLP research). Focusing on Germanic low-resource language varieties, we provide such a survey in this paper. Except for geolocation (origin of speaker or document), we find that manually annotated linguistic resources are sparse and, if they exist, mostly cover morphosyntax. Despite this lack of resources, we observe that interest in this area is increasing: there is active development and a growing research community. To facilitate research, we make our overview of over 80 corpora publicly available.
Verena Blaschke, Hinrich Schütze, and Barbara Plank (2023). “A survey of corpora for Germanic low-resource languages and dialects.” In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), pp. 392–414. Tórshavn, Faroe Islands. University of Tartu Library.
@inproceedings{blaschke-etal-2023-survey, title = {A survey of corpora for {Germanic} low-resource languages and dialects}, author = {Blaschke, Verena and Sch{\"u}tze, Hinrich and Plank, Barbara}, year = {2023}, month = may, booktitle = {Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)}, address = {T{\'o}rshavn, Faroe Islands}, publisher = {University of Tartu Library}, url = {https://aclanthology.org/2023.nodalida-1.41}, pages = {392--414}, }
Does manipulating tokenization aid cross-lingual transfer?
A study on POS tagging for non-standardized languages

Verena Blaschke, Hinrich Schütze & Barbara Plank

VarDial @ EACL 2023 · Abstract Cite PDF Slides Video Poster Code

One of the challenges with finetuning pretrained language models (PLMs) is that their tokenizer is optimized for the language(s) it was pretrained on, but brittle when it comes to previously unseen variations in the data. This can for instance be observed when finetuning PLMs on one language and evaluating them on data in a closely related language variety with no standardized orthography. Despite the high linguistic similarity, tokenization no longer corresponds to meaningful representations of the target data, leading to low performance in, e.g., part-of-speech tagging.
In this work, we finetune PLMs on seven languages from three different families and analyze their zero-shot performance on closely related, non-standardized varieties. We consider different measures for the divergence in the tokenization of the source and target data, and the way they can be adjusted by manipulating the tokenization during the finetuning step. Overall, we find that the similarity between the percentage of words that get split into subwords in the source and target data (the split word ratio difference) is the strongest predictor for model performance on target data.
Verena Blaschke, Hinrich Schütze, and Barbara Plank (2023). “Does manipulating tokenization aid cross-lingual transfer? A study on POS tagging for non-standardized languages.” In Proceedings of the Tenth Workshop on NLP for Similar Languages, Varieties and Dialects, pp. 40–54. Dubrovnik, Croatia. Association for Computational Linguistics.
@inproceedings{blaschke-etal-2023-manipulating, title = {Does manipulating tokenization aid cross-lingual transfer? {A} study on {POS} tagging for non-standardized languages}, author = {Blaschke, Verena and Sch{\"u}tze, Hinrich and Plank, Barbara}, booktitle = {Proceedings of the Tenth Workshop on NLP for Similar Languages, Varieties and Dialects}, year = {2023}, month = may, address = {Dubrovnik, Croatia}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2023.vardial-1.5}, pages = {40--54}, }
Navigable atom-rule interactions in PSL models enhanced by rule verbalizations, with an application to etymological inference

Verena Blaschke, Thora Daneyko, Jekaterina Kaparina, Zhuge Gao & Johannes Dellert

ILP 2022 · Abstract Cite PDF Slides Poster Code (PSL-Infrastructure) Code (PSL-RAGviewer)

Adding to the budding landscape of advanced analysis tools for Probabilistic Soft Logic (PSL), we present a graphical explorer for grounded PSL models. It exposes the structure of the model from the perspective of any single atom, listing the ground rules in which it occurs. The other atoms in these rules serve as links for navigation through the resulting rule-atom graph (RAG). As additional diagnostic criteria, each associated rule is further classified as exerting upward or downward pressure on the atom’s value, and as active or inactive depending on its importance for the MAP estimate.
Our RAG viewer further includes a general infrastructure for making PSL results explainable by stating the reasoning patterns in terms of domain language. For this purpose, we provide a Java interface for “talking” predicates and rules which can generate verbalized explanations of the atom interactions effected by each rule. If the model’s rules are structured similarly to the way the domain is conceptualized by users, they will receive an intuitive explanation of the result in natural language.
As an example application, we present the current state of the loanword detection component of EtInEn, our upcoming software for machine-assisted etymological theory development.
Verena Blaschke, Thora Daneyko, Jekaterina Kaparina, Zhuge Gao, and Johannes Dellert (2024). “Navigable atom-rule interactions in PSL models enhanced by rule verbalizations, with an application to etymological inference.” In Inductive Logic Programming (ILP 2022), pp. 15–24. Cham. Springer Nature Switzerland. DOI: 10.1007/978-3-031-55630-2_2
@inproceedings{blaschke2024navigable, author = {Blaschke, Verena and Daneyko, Thora and Kaparina, Jekaterina and Gao, Zhuge and Dellert, Johannes}, year = {2024}, url = {https://link.springer.com/chapter/10.1007/978-3-031-55630-2_2}, doi = {10.1007/978-3-031-55630-2_2}, booktitle = {Inductive Logic Programming (ILP 2022)}, publisher = {Springer Nature Switzerland}, address = {Cham}, pages = {15--24}, editor = {Muggleton, Stephen H. and Tamaddoni-Nezhad, Alireza}, }
CyberWallE at SemEval-2020 task 11: An analysis of feature engineering for ensemble models for propaganda detection

Verena Blaschke, Maxim Korniyenko & Sam Tureski

SemEval @ COLING 2020 · Abstract Cite PDF Poster Code

This paper describes our participation in the SemEval-2020 task Detection of Propaganda Techniques in News Articles. We participate in both subtasks: Span Identification (SI) and Technique Classification (TC). We use a bi-LSTM architecture in the SI subtask and train a complex ensemble model for the TC subtask. Our architectures are built using embeddings from BERT in combination with additional lexical features and extensive label post-processing. Our systems achieve a rank of 8 out of 35 teams in the SI subtask (F1-score: 43.86%) and 8 out of 31 teams in the TC subtask (F1-score: 57.37%).
Verena Blaschke, Maxim Korniyenko, and Sam Tureski (2020). “CyberWallE at SemEval-2020 task 11: An analysis of feature engineering for ensemble models for propaganda detection.” In Proceedings of the Fourteenth Workshop on Semantic Evaluation (SemEval 2020), pp. 1469–1480. Barcelona (online). International Committee for Computational Linguistics. DOI: 10.18653/v1/2020.semeval-1.192
@inproceedings{blaschke-etal-2020-cyberwalle, author = {Blaschke, Verena and Korniyenko, Maxim and Tureski, Sam}, booktitle = {Proceedings of the Fourteenth Workshop on Semantic Evaluation (SemEval 2020)}, year = {2020}, address = {Barcelona (online)}, publisher = {International Committee for Computational Linguistics}, url = {https://aclanthology.org/2020.semeval-1.192}, doi = {10.18653/v1/2020.semeval-1.192}, pages = {1469--1480}, }
Tübingen-Oslo Team at the VarDial 2018 evaluation campaign: An analysis of n-gram features in language variety identification

Çağrı Çöltekin, Taraka Rama & Verena Blaschke

VarDial @ COLING 2018 · Abstract Cite PDF

This paper describes our systems for the VarDial 2018 evaluation campaign. We participated in all language identification tasks, namely, Arabic dialect identification (ADI), German dialect identification (GDI), discriminating between Dutch and Flemish in Subtitles (DFS), and Indo-Aryan Language Identification (ILI). In all of the tasks, we only used textual transcripts (not using audio features for ADI). We submitted system runs based on support vector machine classifiers (SVMs) with bag of character and word n-grams as features, and gated bidirectional recurrent neural networks (RNNs) using units of characters and words. Our SVM models outperformed our RNN models in all tasks, obtaining the first place on the DFS task, third place on the ADI task, and second place on others according to the official rankings. As well as describing the models we used in the shared task participation, we present an analysis of the n-gram features used by the SVM models in each task, and also report additional results (that were run after the official competition deadline) on the GDI surprise dialect track.
Çağrı Çöltekin, Taraka Rama, and Verena Blaschke (2018). “Tübingen-Oslo Team at the VarDial 2018 evaluation campaign: An analysis of n-gram features in language variety identification.” In Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018), pp. 55–65. Santa Fe, New Mexico, USA. Association for Computational Linguistics.
@inproceedings{coltekin-etal-2018-tubingen, title = {{T{\"u}bingen}-{Oslo} Team at the {VarDial} 2018 evaluation campaign: An analysis of n-gram features in language variety identification}, author = {{\c{C}}{\"o}ltekin, {\c{C}}a{\u{g}}r{\i} and Rama, Taraka and Blaschke, Verena}, booktitle = {Proceedings of the Fifth Workshop on {NLP} for Similar Languages, Varieties and Dialects ({V}ar{D}ial 2018)}, year = {2018}, address = {Santa Fe, New Mexico, USA}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/W18-3906}, pages = {55--65}, }

Theses

Explainable machine learning in linguistics and applied NLP:
Two case studies of Norwegian dialectometry and sexism detection in French tweets

Master’s thesis, 2021, supervised by Çağrı Çöltekin & John Nerbonne · Abstract PDF Code

This thesis presents an exploration of explainable machine learning in the context of a traditional linguistic area (dialect classification) and an applied task (sexism detection). In both tasks, the input features deemed especially relevant for the classification form meaningful groups that fit in with previous research on the topic, although not all such features are easy to understand or provide plausible explanations. In the case of dialect classification, some important features show that the model also learned patterns that are not typically presented by dialectologists. For both case studies, I use LIME [1] to rank features by their importance for the classification. For the sexism detection task, I additionally examine attention weights, which produce feature rankings that are in many cases similar to the LIME results but that are overall worse at showcasing tokens that are especially characteristic of sexist tweets.

[1] M. T. Ribeiro, S. Singh, and C. Guestrin (2016). “Why should I trust you?”: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, New York, NY, USA, pp. 1135–1144. Association for Computing Machinery.
Clustering dialect varieties based on historical sound correspondences

Bachelor’s thesis, 2018, supervised by Çağrı Çöltekin. Finalist for the GSCL Award for the best Bachelor’s thesis in computational linguistics 2017–2019 · Abstract PDF Summary Slides Code

While information on historical sound shifts plays an important role for examining the relationships between related language varieties, it has rarely been used for computational dialectology. This thesis explores the performance of two algorithms for clustering language varieties based on sound correspondences between Proto-Germanic and modern continental West Germanic dialects. Our experiments suggest that the results of agglomerative clustering match common dialect groupings more closely than the results of (divisive) bipartite spectral graph co-clustering. We also observe that adding phonetic context information to the sound correspondences yields clusters that are more frequently associated with representative and distinctive sound correspondences).

Publications & preprints

Standard-to-dialect transfer trends differ across text and speech: A case study on intent and topic classification in German dialects

Make every letter count: Building dialect variation dictionaries from monolingual corpora

DistaLs: A comprehensive collection of language distance measures

A multi-dialectal dataset for German dialect ASR and dialect-to-standard speech translation

Analyzing the effect of linguistic similarity on cross-lingual transfer: Tasks and experimental setups matter

Methods and resources in Germanic variationist linguistics

Evaluating pixel language models on non-standardized languages

Cross-dialect information retrieval: Information access in low-resource and high-variance languages

Improving dialectal slot and intent detection with auxiliary tasks: A multi-dialectal Bavarian case study

Add noise, tasks, or layers?MaiNLP at the VarDial 2025 shared task on Norwegian dialectal slot and intent detection

What do dialect speakers want?A survey of attitudes towards language technology for German dialects

MaiBaam: A multi-dialectal Bavarian Universal Dependency treebank

Sebastian, Basti, Wastl?!Recognizing named entities in Bavarian dialectal data

Exploring the robustness of task-oriented dialogue systems for colloquial German varieties

A survey of corpora for Germanic low-resource languages and dialects

Does manipulating tokenization aid cross-lingual transfer?A study on POS tagging for non-standardized languages

Navigable atom-rule interactions in PSL models enhanced by rule verbalizations, with an application to etymological inference

CyberWallE at SemEval-2020 task 11: An analysis of feature engineering for ensemble models for propaganda detection

Tübingen-Oslo Team at the VarDial 2018 evaluation campaign: An analysis of n-gram features in language variety identification

Theses

Explainable machine learning in linguistics and applied NLP:Two case studies of Norwegian dialectometry and sexism detection in French tweets

Clustering dialect varieties based on historical sound correspondences

Standard-to-dialect transfer trends differ across text and speech:
A case study on intent and topic classification in German dialects

Make every letter count:
Building dialect variation dictionaries from monolingual corpora

Analyzing the effect of linguistic similarity on cross-lingual transfer:
Tasks and experimental setups matter

Cross-dialect information retrieval:
Information access in low-resource and high-variance languages

Improving dialectal slot and intent detection with auxiliary tasks:
A multi-dialectal Bavarian case study

Add noise, tasks, or layers?
MaiNLP at the VarDial 2025 shared task on Norwegian dialectal slot and intent detection

What do dialect speakers want?
A survey of attitudes towards language technology for German dialects

Sebastian, Basti, Wastl?!
Recognizing named entities in Bavarian dialectal data

Does manipulating tokenization aid cross-lingual transfer?
A study on POS tagging for non-standardized languages

Explainable machine learning in linguistics and applied NLP:
Two case studies of Norwegian dialectometry and sexism detection in French tweets