Previous seminars

Seminars from previous years are still being added; in the meantime, the archive remains available on the old website.


Week 9

Thursday 8th December 2022



Assessing Hybrid Identities in Online Extremist Communities through sociolinguistic styles

Shengnan Liu

Psychology, Lancaster University

  • Abstract

Style-shifting has been a focus of research on language variation and change in sociolinguistics since the 1960s. As sociolinguistic styles are sensitive to social change (Ure, 1982), it is not surprising that they have become a focus of social psychologists who seek to assess social identities through linguistic styles. ASIA (Automated Social Identity Assessment toolkit) (Koschate et al., 2021), a toolkit which leverages machine learning and natural language processing to automatically assess which identity is situationally salient through sociolinguistic styles, has proved successful in assessing feminist and parent identities in the Reddit and Mumsnet online communities. Cork et al. (2022) have applied ASIA to assess entrepreneur and libertarian identities. Motivated by the recent rise in the online influence of hybrid communities, which are characterised by ideological mutations, this study investigates the dynamic nature and influence of hybrid eco-fascist identities. It trains and validates an ASIA model to automatically assess which identity (eco or fascist) is situationally salient. This allows us to examine the dynamic interplay of these identities over time, and the role that linguistic style plays in the expression of the ecological and the fascist identities in eco-fascist movements. To train the model, the study used Reddit data from environmental and far-right forums that were publicly available for the period 2016-2020. Once trained, ASIA was applied to public data from Reddit eco-fascist forums. Topic modelling and corpus linguistic analysis were then adopted to validate the results produced by the ASIA model. The results demonstrate that 1) sociolinguistic styles can indeed be used to detect and assess hybrid identities, and 2) interdisciplinary research on hybrid identity assessment provides new methodological and theoretical insights for social psychology, sociolinguistics, and computational linguistics.

Week 8

Thursday 1st December 2022


Microsoft Teams

Categorising keywords: a case study on German conspiracy discourse

Nathan Dykes

Friedrich-Alexander-Universität Erlangen-Nürnberg

  • Abstract

Keyword analysis is central to corpus-assisted discourse studies (CADS), as a means of comparing two corpora on a high level. It is typically used to identify starting points for a more detailed analysis.

Usually, keywords are grouped into thematic categories, which are seen as pointers to central topics of the discourse at hand.

However, there is no best practice as to how these categories are formed, and this question has so far received little attention.

In this talk, two different approaches to keyword categorisation in CADS are compared on the keywords of two actors known to spread conspiracies and misinformation on German Telegram channels.

The first strategy examined is the classic approach of topic-based categories, where the categories formed by two independent researchers are compared to explore how individual experts might differ in what central topics are identified.

The second strategy places more focus on linguistic form by annotating surface-level semantic and grammatical features rather than discourse-dependent topics.

Overall, the study hopes to open up the methodological discussion about the role of the researcher and of linguistic versus thematic categories.

Week 4

Thursday 3rd November 2022


Microsoft Teams - request a link via email

Speech Analytics for the Detection of Neurological Conditions in Global English

Sam Hollands

University of Sheffield

  • Abstract

Dementia is an umbrella term for the loss of cognitive and memory abilities caused by a wide variety of neurological conditions. It has been discovered that both the content of an individual's discourse and the acoustics of their produced speech can be automatically analysed to help detect dementia and other neurological conditions. Whilst the cutting edge demonstrates effective diagnostic capabilities on L1 (native) speakers of English, this talk will explore ongoing research assessing the efficacy of these methods for L2+ (non-native) English speakers and exploring solutions to improve performance. This research treats a dementia classification pipeline as a modular system containing an automatic speech recognition (ASR) component to extract transcribed language, followed by a classification stage that uses features extracted from the acoustic signal and the transcribed output. Limitations of ASR across a wide range of L2+ backgrounds will be explored, challenging existing beliefs about the competency of state-of-the-art cloud-based ASR APIs on non-native speech and critically assessing the limitations of word error rate (WER) as the ubiquitous metric for ASR evaluation. My talk will then explore ongoing research into the features of dementia, potential issues in the generalisability of sparse dementia corpora, and early work looking at the impact of features of non-native speech.
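For readers unfamiliar with the metric mentioned in the abstract, the following is a minimal sketch of how word error rate is conventionally computed: the word-level edit distance between a reference transcript and an ASR hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions and are not taken from the talk.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (not from the talk); tokenises on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Invented example sentences: both hypotheses contain exactly one word error.
reference = "she did not take her tablets"
minor_error = "she did not take the tablets"   # "her" -> "the"
meaning_changed = "she did take her tablets"   # "not" dropped

print(word_error_rate(reference, minor_error))
print(word_error_rate(reference, meaning_changed))
```

Both hypotheses receive the same score even though only the second reverses the meaning of the sentence, which is one way of seeing why a single WER figure can hide differences that matter for downstream classification.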

Week 4

Monday 31st October 2022


Microsoft Teams - request a link via email

FoLD: a permanent, controlled-access, online repository for forensic linguistic research

Tim Grant

School of Languages & Social Sciences, Aston University

  • Abstract

This talk presents an innovative online resource for sharing and accessing forensic linguistics data, the Forensic Linguistic Databank (FoLD), developed in the Aston Institute for Forensic Linguistics (AIFL) at Aston University, Birmingham. FoLD is a permanent, controlled-access online repository for forensic linguistic data, including malicious communication data, investigative interview data, hate speech, and legal language.

Access to relevant forensic linguistic data has been notoriously challenging since the conception of the discipline in the 1960s. FoLD represents the first attempt to provide researchers with the opportunity to share datasets of different levels of sensitivity and ethical concern.

In this talk we present the FoLD repository, how to donate data, and how to access already existing datasets from the website.

We further showcase a project carried out by researchers in the FoLD research centre at AIFL using data from FoLD.

This talk is a cross-over with FORGE, who provide seminars on forensic linguistics.

Week 3

Thursday 27th October 2022


Microsoft Teams - request a link via email

Towards a methodological tree: combining Discourse Theory, Critical Discourse Studies and Corpus Linguistics

Katy Brown

University of Bath

  • Abstract

Discourse studies as a broad field has demonstrated openness to incorporating mixed methodologies and perspectives to provide a range of insights into complex phenomena. This paper seeks to propose a new framework which brings together the diverse traditions of Discourse Theory (DT), Critical Discourse Studies (CDS) and Corpus Linguistics (CL). While there are some excellent examples of work combining two of these approaches, particularly CDS and CL (e.g., Subtirelu and Baker, 2018; Baker, 2012), and a growing discussion around the potential compatibility of DT and CDS (Brown, 2020; De Cleen et al., 2021), or DT and CL (Wilkinson, 2022; Nikisianis et al., 2019), there have been very few attempts to bring them all together into a coherent research programme. The aim here then, expanding on recent studies conducted using this framework (Brown and Mondon, 2020; Brown, Mondon and Winter, 2021), is to develop a detailed account of how this combination can be achieved and what benefits it brings to the field of discourse studies. To demonstrate the way this can be implemented in textual analysis, examples are drawn from a study of far-right Brexit discourse and the process of mainstreaming.

Week 1

Thursday 13th October 2022


Welcome Lecture LT1 / Teams link available via email

A year to remember? Introducing the BE21 corpus and exploring recent part of speech tag change in British English

Paul Baker

CASS, Lancaster University

  • Abstract

This talk describes the collection and analysis of the most recent edition of the Brown family, the BE21 corpus, consisting of 1 million words of written British English texts published in 2021. Using measures of the Coefficient of Variance, the frequencies of part-of-speech tags in BE21 are compared against the other four British members of the Brown family (from 1931, 1961, 1991 and 2006). Part-of-speech tags that are steadily increasing or decreasing in all five or the latest three corpora are examined via concordance lines and their distributions in order to identify new and emerging trends in British English. The analysis points to the continuation of some trends (such as declines in modal verbs and titles of address), along with newer trends like the rise of first-person pronouns. The analysis indicates that more general trends of densification, democratisation and colloquialisation are continuing in British English.
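As a rough illustration of the measure named in the abstract, the sketch below computes a coefficient of variance for one tag's frequency across five corpora: the standard deviation divided by the mean, expressed as a percentage. The use of the population standard deviation, the function name `coefficient_of_variance`, and the frequency figures are all assumptions made for illustration; they are not BE21 results and may not match the exact procedure used in the talk.

```python
from statistics import mean, pstdev


def coefficient_of_variance(frequencies: list[float]) -> float:
    """Standard deviation as a percentage of the mean.

    Assumption: population standard deviation (pstdev) is used here;
    the talk may have used a different variant.
    """
    average = mean(frequencies)
    if average == 0:
        raise ValueError("mean frequency is zero; the measure is undefined")
    return pstdev(frequencies) / average * 100


# Invented per-million frequencies for two hypothetical tags across the
# five sampling points (1931, 1961, 1991, 2006, 2021). Not real data.
steadily_declining_tag = [15000, 14000, 12500, 11500, 10000]
stable_tag = [60000, 60500, 59800, 60200, 60100]

print(round(coefficient_of_variance(steadily_declining_tag), 2))
print(round(coefficient_of_variance(stable_tag), 2))
```

A higher value flags a tag whose frequency varies more across the sampling points. The measure alone does not show the direction of change, which is why the abstract also describes checking for steady increases or decreases and inspecting concordance lines.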