Publications · talks · theses & other


  1. A survey of corpora for Germanic low-resource languages and dialects
    Verena Blaschke, Hinrich Schütze & Barbara Plank
    NoDaLiDa 2023 | Abstract Cite Website PDF Slides
  2. Does manipulating tokenization aid cross-lingual transfer?
    A study on POS tagging for non-standardized languages
    Verena Blaschke, Hinrich Schütze & Barbara Plank
    VarDial @ EACL 2023 | Abstract Cite PDF Slides Video Poster Code
  3. [Forthcoming] Navigable atom-rule interactions in PSL models enhanced by rule verbalizations, with an application to etymological inference
    Verena Blaschke, Thora Daneyko, Jekaterina Kaparina, Zhuge Gao & Johannes Dellert
    ILP 2022 | Abstract Slides Poster Code (PSL-Infrastructure) Code (PSL-RAGviewer)
  4. CyberWallE at SemEval-2020 task 11:
    An analysis of feature engineering for ensemble models for propaganda detection
    Verena Blaschke, Maxim Korniyenko & Sam Tureski
    SemEval @ COLING 2020 | Abstract Cite PDF Poster Code
  5. Tübingen-Oslo Team at the VarDial 2018 evaluation campaign:
    An analysis of n-gram features in language variety identification
    Çağrı Çöltekin, Taraka Rama & Verena Blaschke
    VarDial @ COLING 2018 | Abstract Cite PDF


(Excluding paper presentations, which are linked in the Publications section where applicable.)

  1. [Accepted] Configurable language-specific tokenization for CLDF databases
    Johannes Dellert & Verena Blaschke
    Exploiting standardized cross-linguistic data in historical linguistics @ ICHL 2023 | Abstract
  2. Correlating borrowing events across concepts to derive a data-driven source of evidence for loanword etymologies
    Verena Blaschke & Johannes Dellert
    Model and Evidence in Quantitative Comparative Linguistics @ DGfS 2021 | Abstract Slides Code
  3. Clustering dialect varieties based on historical sound correspondences
    Verena Blaschke
    GSCL Student Award nominee presentations @ KONVENS 2019 | Abstract Summary Slides Code


  1. Explainable machine learning in linguistics and applied NLP:
    Two case studies of Norwegian dialectometry and sexism detection in French tweets
    Master’s thesis, 2021, supervised by Çağrı Çöltekin & John Nerbonne | Abstract PDF Code
  2. Clustering dialect varieties based on historical sound correspondences
    Bachelor’s thesis, 2018, supervised by Çağrı Çöltekin. Finalist for the GSCL Award for the best Bachelor’s thesis in computational linguistics 2017–2019 | Abstract PDF Summary Slides Code


  1. LanguageStructure/TuLeD [Tupían Language Database]:
    Pre-release (version 0.9)
    Fabrício Ferraz Gerardi, Stanislav Reichert, Tim Wientzek, Verena Blaschke, Eric Mattos, Zhuge Gao, Mihai Manolescu & Nianheng Wu
    2020 | DOI