Seminars from previous years are still being added; the archive remains available on the old website.
This talk describes an analysis of the ways in which pain is described by people experiencing a particular health condition, trigeminal neuralgia (TN), in comparison with people experiencing a wider range of painful conditions. The research was prompted by a request from a healthcare professional seeking a more nuanced understanding of how people voluntarily describe pain relating to TN and pain relating to more generic musculoskeletal conditions, in order to assist clinical practice and patient communication. Using a range of corpus linguistic techniques, the use of different terms to describe and evaluate pain is explored in two corpora of online forum contributions, with particular focus on the pain descriptors that feature in the short version of the McGill Pain Questionnaire (a widely used instrument in healthcare settings for the diagnosis and treatment of pain).
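One basic step in comparing descriptor use across two corpora of different sizes is normalising raw counts, for example per million words. The sketch below is a minimal illustration under stated assumptions, not the study's method: the naive tokeniser, the three-word descriptor subset and the toy sentences are all hypothetical, and multi-word or hyphenated descriptors would need extra handling.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokeniser (an assumption): lowercase alphabetic runs only,
    # so hyphenated forms are split into separate tokens.
    return re.findall(r"[a-z]+", text.lower())


def per_million(tokens: list[str], targets: set[str]) -> dict[str, float]:
    """Frequency of each target word per million tokens."""
    if not tokens:
        raise ValueError("empty corpus")
    counts = Counter(t for t in tokens if t in targets)
    return {w: counts[w] * 1_000_000 / len(tokens) for w in sorted(targets)}


# Hypothetical descriptor subset and invented toy "corpora", for illustration only.
descriptors = {"stabbing", "shooting", "aching"}
tn_corpus = tokenize("The stabbing pain came again, then a shooting pain.")
msk_corpus = tokenize("My back is aching and my knee is aching too.")

for name, toks in [("TN", tn_corpus), ("MSK", msk_corpus)]:
    print(name, per_million(toks, descriptors))
```

Normalised frequencies only show where a descriptor is relatively more common; how it is used and evaluated still has to be read off concordance lines.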
Style-shifting has been a focus of research on language variation and change in sociolinguistics since the 1960s. As sociolinguistic styles are sensitive to social change (Ure, 1982), it is not surprising that they have also become a focus for social psychologists who seek to assess social identities through linguistic styles. ASIA (Automated Social Identity Assessment toolkit) (Koschate et al., 2021), a toolkit that leverages machine learning and natural language processing to automatically assess which identity is situationally salient through sociolinguistic styles, has been shown to successfully assess feminist and parent identities in the Reddit and Mumsnet online communities. Cork et al. (2022) have applied ASIA to assess entrepreneur and libertarian identities. Motivated by the recent rise in the online influence of hybrid communities characterised by ideological mutations, this study investigates the dynamic nature and influence of hybrid eco-fascist identities. It trains and validates an ASIA model to automatically assess which identity (eco or fascist) is situationally salient. This allows us to examine the dynamic interplay of these identities over time, and the role that linguistic style plays in the expression of ecological and fascist identities in eco-fascist movements. To train the model, the study used publicly available Reddit data from environmental and far-right forums for the period 2016-2020. Once trained, ASIA was applied to public data from Reddit eco-fascist forums. Topic modelling and corpus linguistic analysis were then used to validate the results produced by the ASIA model. The results demonstrate that 1) sociolinguistic styles can indeed be used to detect and assess hybrid identities, and 2) interdisciplinary research on hybrid identity assessment provides new methodological and theoretical insights for social psychology, sociolinguistics and computational linguistics.
Keyword analysis is central to corpus-assisted discourse studies (CADS) as a means of comparing two corpora at a high level. It is typically used to identify starting points for a more detailed analysis.
Usually, keywords are grouped into thematic categories, which are seen as pointers to central topics of the discourse at hand.
However, there is no best practice as to how these categories are formed, and this question has so far received little attention.
In this talk, two different approaches to keyword categorisation in CADS are compared, using the keywords of two actors known to spread conspiracy theories and misinformation on German Telegram channels.
The first strategy examined is the classic approach of topic-based categories, in which the categories formed by two independent researchers are compared to explore how individual experts might differ in the central topics they identify.
The second strategy places more focus on linguistic form, annotating surface-level semantic and grammatical features rather than discourse-dependent topics.
Overall, the study aims to open up the methodological discussion by shifting its focus to the role of the researcher and to linguistic versus thematic categories.
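Before any categorisation, keyword lists are produced with a keyness statistic. The sketch below assumes log-likelihood (G2), one widely used keyness measure; the abstract does not say which statistic or cut-off the study used. The 3.84 threshold (p < 0.05 with one degree of freedom), the invented tokens and the `categories` mapping are hypothetical. The mapping stands in for the manual, researcher-dependent step that the talk examines.

```python
import math
from collections import Counter, defaultdict


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """G2 keyness for a word occurring a times in a study corpus of c tokens
    and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def keywords(study, reference, min_ll=3.84):
    """Positive keywords (relatively more frequent in `study`), strongest first."""
    fs, fr = Counter(study), Counter(reference)
    c, d = len(study), len(reference)
    out = []
    for word, a in fs.items():
        b = fr.get(word, 0)
        if a / c > b / d:  # keep only words overused in the study corpus
            ll = log_likelihood(a, b, c, d)
            if ll >= min_ll:
                out.append((word, ll))
    return sorted(out, key=lambda pair: pair[1], reverse=True)


def group_by_category(kws, categories):
    """Group keywords using a researcher-supplied word -> category mapping."""
    groups = defaultdict(list)
    for word, _ in kws:
        groups[categories.get(word, "uncategorised")].append(word)
    return dict(groups)


# Invented tokens, not data from the study.
study = "apple apple apple pear pear the the the".split()
reference = "the the the the the the pear plum".split()
kws = keywords(study, reference)
print(kws)
print(group_by_category(kws, {"apple": "fruit"}))  # hypothetical category
```

Two researchers' categorisations could be compared by running `group_by_category` with each person's mapping over the same keyword list.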
Dementia is an umbrella term for the loss of cognitive and memory abilities caused by a wide variety of neurological conditions. It has been shown that both the content of an individual's discourse and the acoustics of their speech can be automatically analysed to help detect dementia and other neurological conditions. Whilst state-of-the-art systems demonstrate effective diagnostic capabilities for L1 (native) speakers of English, this talk will explore ongoing research assessing their efficacy for L2+ (non-native) speakers of English and exploring solutions. This research treats a dementia classification pipeline as a modular system: an automatic speech recognition (ASR) component extracts transcribed language, followed by the challenge of classifying speakers using features extracted from both the acoustic signal and the transcribed output. Limitations of ASR across a wide range of L2+ backgrounds will be explored, challenging existing beliefs about the competency of state-of-the-art cloud-based ASR APIs on non-native speech and critically assessing the limitations of word error rate (WER) as the ubiquitous metric for ASR evaluation. My talk will then explore ongoing research into the features of dementia, potential issues in the generalisability of sparse dementia corpora, and early work on the impact of features of non-native speech.
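Word error rate is conventionally defined as a word-level edit distance, WER = (S + D + I) / N, where S, D and I are substitutions, deletions and insertions and N is the number of words in the reference transcript. The minimal sketch below implements that standard definition. It is an illustration, not the evaluation code used in this research, and the example sentences are invented. It makes two well-known properties visible: every error costs the same regardless of its effect on meaning, and the score can exceed 1 when the output contains many insertions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with a word-level Levenshtein alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j]: minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match if cost 0)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# Invented sentences: each hypothesis has exactly one substitution, so both
# score the same WER, although only the second changes the meaning much.
ref = "i put the keys in the kitchen drawer"
print(word_error_rate(ref, "i put the keys into the kitchen drawer"))
print(word_error_rate(ref, "i put the keys in the kitchen door"))
```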
This talk presents an innovative online resource for sharing and accessing forensic linguistic data, the Forensic Linguistic Databank (FoLD - https://fold.aston.ac.uk), developed within the Aston Institute for Forensic Linguistics (AIFL) at Aston University, Birmingham. FoLD is a permanent, controlled-access online repository for forensic linguistic data, including malicious communication data, investigative interview data, hate speech and legal language.
Because access to relevant forensic linguistic data has been notoriously difficult since the discipline emerged in the 1960s, FoLD represents the first attempt to give researchers the opportunity to share datasets of different levels of sensitivity and ethical concern.
In this talk we present the FoLD repository and explain how to donate data and how to access existing datasets through the website.
We further showcase a project carried out by researchers in the FoLD research centre at AIFL using data from FoLD.
This talk is a crossover with FORGE, who provide seminars on forensic linguistics.
As a broad field, discourse studies has shown openness to incorporating mixed methodologies and perspectives to provide a range of insights into complex phenomena. This paper proposes a new framework that brings together the diverse traditions of Discourse Theory (DT), Critical Discourse Studies (CDS) and Corpus Linguistics (CL). While there are some excellent examples of work combining two of these approaches, particularly CDS and CL (e.g., Subtirelu and Baker, 2018; Baker, 2012), and a growing discussion around the potential compatibility of DT and CDS (Brown, 2020; De Cleen et al., 2021), or DT and CL (Wilkinson, 2022; Nikisianis et al., 2019), there have been very few attempts to bring all three together into a coherent research programme. Building on recent studies conducted using this framework (Brown and Mondon, 2020; Brown, Mondon and Winter, 2021), the aim here is to develop a detailed account of how this combination can be achieved and what benefits it brings to the field of discourse studies. To demonstrate how this can be implemented in textual analysis, examples are drawn from a study of far-right Brexit discourse and the process of mainstreaming.
This talk describes the collection and analysis of the most recent member of the Brown family of corpora, BE21, which consists of 1 million words of written British English published in 2021. Using the Coefficient of Variance, the frequencies of part-of-speech tags in BE21 are compared with those in the other four British members of the Brown family (from 1931, 1961, 1991 and 2006). Part-of-speech tags whose frequencies steadily increase or decrease across all five corpora, or across the latest three, are examined via concordance lines and their distributions in order to identify new and emerging trends in British English. The analysis points to the continuation of some trends (such as declines in modal verbs and titles of address), along with newer trends such as the rise of first-person pronouns. It also indicates that the more general trends of densification, democratisation and colloquialisation are continuing in British English.
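The two quantities this comparison relies on can be sketched as follows. The sketch assumes each tag's frequency has already been normalised (for example per million words) in each of the five corpora. It takes the Coefficient of Variance to be the standard deviation of those frequencies divided by their mean (the statistic often called the coefficient of variation), using the population standard deviation. The abstract does not specify these details, and the tag names and frequencies below are invented.

```python
from statistics import mean, pstdev

YEARS = [1931, 1961, 1991, 2006, 2021]  # the five British Brown-family corpora


def coefficient_of_variation(values):
    """Standard deviation divided by mean (population SD assumed)."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean frequency is zero")
    return pstdev(values) / m


def trend(values):
    """'increasing' or 'decreasing' if every step moves the same way, else 'mixed'."""
    steps = list(zip(values, values[1:]))
    if all(b > a for a, b in steps):
        return "increasing"
    if all(b < a for a, b in steps):
        return "decreasing"
    return "mixed"


def describe_tag(freq_by_year):
    """Spread across all five corpora plus direction over all five and the latest three."""
    series = [freq_by_year[y] for y in YEARS]
    return {
        "cv": round(coefficient_of_variation(series), 3),
        "all_five": trend(series),
        "latest_three": trend(series[-3:]),
    }


# Invented per-million frequencies for two hypothetical tags.
toy = {
    "TAG_A": {1931: 100, 1961: 90, 1991: 80, 2006: 70, 2021: 60},
    "TAG_B": {1931: 50, 1961: 40, 1991: 45, 2006: 55, 2021: 65},
}
for tag, freqs in toy.items():
    print(tag, describe_tag(freqs))
```

Requiring every step to move in the same direction is a strict reading of "steadily increasing or decreasing"; a looser criterion would flag more tags for concordance analysis.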