Seminars from previous years are still being added; the archive is still available on the old website.
Abstract:
In this talk, we will discuss the behavior of YouTube's recommendation algorithm on competing narratives to identify potential biases. For our analysis, we collected recommended videos across five recommendation depths, starting from seed videos related to the competing narratives. We used drift analysis to examine how video characteristics, such as emotion and content, evolve at each recommendation depth as a function of the model's decision-making process. We also developed a methodology to identify the 'highly influential' video(s) responsible for driving recommendations at each depth. By tracking how video characteristics evolve across recommendation depths, we were able to identify narrative-dependent biases in YouTube's recommendation algorithm as a function of content. This analysis adds a layer of understanding to the 'black-box' nature of the YouTube recommendation algorithm. The study is also applicable to judging the fairness of recommender systems and to understanding patterns of model recommendation, information diffusion, echo-chamber formation, and other significant problems.
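As a rough illustration of drift analysis across recommendation depths, the sketch below compares the distribution of one video characteristic (here, a hypothetical emotion label per video) at each depth against the seed depth using Jensen-Shannon divergence. The choice of divergence, the label set, and the function names are assumptions made for illustration; they are not the speakers' actual method.

```python
import math
from collections import Counter

def distribution(labels, categories):
    """Relative frequency of each category among one depth's video labels."""
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in categories]

def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, so the result lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0 here.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)
    return (kl(p, m) + kl(q, m)) / 2

def drift_by_depth(labels_by_depth):
    """Divergence of each depth's label distribution from the seed depth (index 0)."""
    categories = sorted({lab for labels in labels_by_depth for lab in labels})
    seed = distribution(labels_by_depth[0], categories)
    return [js_divergence(seed, distribution(d, categories)) for d in labels_by_depth]

# Hypothetical emotion labels for the videos collected at each depth.
labels_by_depth = [
    ["joy", "joy", "anger", "fear"],      # depth 0: seed videos
    ["joy", "anger", "anger", "fear"],    # depth 1
    ["anger", "anger", "anger", "fear"],  # depth 2
]
for depth, score in enumerate(drift_by_depth(labels_by_depth)):
    print(depth, round(score, 3))
```

A rising score means the recommended videos are moving away from the seed videos' emotional profile; the same function could be applied to any categorical characteristic, such as topic labels.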
Profile - Nitin Agarwal (Ph.D.):
Dr. Nitin Agarwal's research aims to push the boundaries of our understanding of digital and cyber social behaviors that emerge and evolve constantly in modern information and communication platforms. At COSMOS, he leads projects with a combined funding of over $25 million from an array of U.S. federal agencies, including the Department of Defense, DARPA, the Department of State, and the National Science Foundation. He plays a significant role in the long-term partnership between UA Little Rock and the Department of Homeland Security. He developed publicly available social media analysis tools (Blogtracker and VTracker) that have assisted NATO Strategic Communications and Public Affairs, European Defense agencies, the Australian Defense Science and Technology agency's strategic policy group, the Singapore government, and the Arkansas Attorney General's office, among others. Dr. Agarwal participates in the National Tech Innovation Hub launched by the U.S. Department of State to defeat foreign-based propaganda.
Dr. Agarwal's research contributions lie at the intersection of social computing, behavior-cultural modeling, collective action, social cyber forensics, artificial intelligence, data mining, machine learning, smart health, and privacy. From the Saudi Arabian women's right-to-drive cyber campaigns to Autism awareness campaigns to ISIS and anti-West/anti-NATO disinformation campaigns, he directs several projects at COSMOS that have made foundational and applied contributions to the social and computational sciences, particularly in understanding coordinated cyber campaigns. He has published 11 books and over 300 articles in top-tier peer-reviewed forums, including NATO's Defense StratCom Journal, Army University Press, CANSOFCOM's Future Conflict journal, and Baltic Security, among others, and has received several best paper awards and nominations. His most recent book explores deviant behaviors on the Internet and is published by Springer in its series on cybersecurity. Local, national, and international media, including Bloomberg, US News, KUAR, Arkansas Business, Arkansas Times, the Arkansas Democrat-Gazette, and many others, have covered his work. Over the last several years, Dr. Agarwal has spoken at various public and professional, national and international forums, such as NATO's StratCom COE (Riga, Latvia), DARPA, the US Department of State, the US Navy's Space and Naval Warfare Systems Command (SPAWAR), the US Pentagon's Strategic Multilayer Assessment groups, the US National Academies of Sciences, Engineering, and Medicine, the US Office of the Director of National Intelligence, Facebook Asia Pacific HQ, Twitter Asia Pacific HQ, the US Embassy in Singapore, the Singapore Ministry of Communication and Information, NATO Senior Leadership meetings, and USIP, among others. He serves as a technical advisor to Little Rock-based firms, including through the FinTech Accelerator.
Dr. Agarwal obtained his Ph.D. from Arizona State University in 2009, receiving outstanding dissertation recognition. He was recognized as one of 'The New Influentials: 20 In Their 20s' by Arkansas Business in 2012. He received the University-wide Faculty Excellence Award in Research and Creative Endeavors from UALR in 2015 and 2021. Dr. Agarwal received the Social Media Educator of the Year Award at the 21st International Education and Technology Conference in 2015. In 2017, the Arkansas Times featured Dr. Agarwal in its special issue on "Visionary Arkansans". Dr. Agarwal was nominated as an International Academy, Research and Industry Association (IARIA) Fellow in 2017, an Arkansas Academy of Computing (AAoC) Fellow in 2018, and an Arkansas Research Alliance (ARA) Fellow in 2018. In 2021, his research was recognized by NATO's Innovation Hub as one of the top 10 solutions for "Countering Cognitive Warfare: The Invisible Threat", out of 132 teams from the 30 NATO member nations. In 2022, his COVID-19 Misinformation tracker was recognized by the World Health Organization (WHO) as one of the key technological innovations globally to address the COVID-19 pandemic. IEEE, the world's premier electrical and electronic engineering professional organization, recognized Dr. Agarwal as a Senior Member in 2022. He can be reached at nxagarwal@ualr.edu.
For queries or meeting link, contact Dr. Ignatius Ezeani (i.ezeani@lancaster.ac.uk)
Abstract
In March 2022, the number of North Korean Defectors (NKDs) residing in South Korea reached 33,882, according to the Ministry of Unification. Despite a surge in the number of defectors over two decades, the socio-economic challenges facing those who crossed the border, and the prejudice against them, continue to intensify. In this presentation, I provide a corpus-based analysis of public discourse regarding NKDs. The analysis examines how the media shape the identity of NKDs, and the stereotypes and prejudices attached to them, through linguistic representation. Noting how power is exercised through language in social and political structures, my study presents an analysis of how South Korean and Western news media identify, categorize, and represent NKDs, and explores the dynamics of language, identity, and power in public discourse. Such discourse must be interrogated within its broader social, historical, and political contexts. As part of a long-term research project on the comprehensive discourse analysis of NKDs, the current work focuses on four major broadsheet newspapers with distinct political stances and investigates the interactive discourse features that contribute to representations of NKDs in the South Korean community. Additionally, I examine how the same topics have been represented in Western media, including The New York Times, The Guardian, and The Times (London). Combining qualitative analysis with quantitative tools, the study explicates how the media use language to substantiate stereotypes and bias, yielding a credible analysis. The outcomes are expected to reveal empirical issues and challenges not only for current South Korean society and its inclusion of NKDs but also for a reunified Korea in the future.
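Corpus-based discourse analysis of this kind typically starts from concordance (keyword-in-context, KWIC) lines. The sketch below is a minimal, hypothetical KWIC function, not the speaker's tooling; its regular-expression tokenizer is a simplifying assumption (Korean newspaper text would normally need a morphological analyser first).

```python
import re

def kwic(text, node, width=4):
    """Return (left context, node, right context) for each occurrence of `node`."""
    # Simplistic tokenizer: runs of word characters, allowing internal ' or -.
    tokens = re.findall(r"\w+(?:['-]\w+)*", text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        if tok == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits

# Hypothetical snippet standing in for a newspaper corpus.
sample = "Defectors face prejudice. Many defectors report discrimination at work."
for left, node, right in kwic(sample, "defectors", width=2):
    print(f"{left:>20} [{node}] {right}")
```

Aligning the node word in a fixed column is what lets an analyst scan recurring modifiers and verbs around a term such as "defectors" across thousands of hits.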
Bio Profile
Dr. Sun-Hee Lee is a Professor of Korean in the Department of East Asian Languages and Cultures at Wellesley College in the US. She earned her doctoral degrees from the Linguistics Department at The Ohio State University and from the Korean Language and Literature Department at Yonsei University. Dr. Lee's research areas include corpus linguistics, learner corpora, and discourse analysis. She has published several books and articles on Korean grammatical constructions, corpus analysis, and learner language. Her recent research interests include corpus-based analysis of media, gender, and personal narratives, in addition to learner corpus research.
For queries or meeting link, contact Dr. Ignatius Ezeani (i.ezeani@lancaster.ac.uk)
Systems that support users in the automatic creation of visualizations must address several subtasks: understanding the semantics of data, enumerating relevant visualization goals, and generating visualization specifications. In this work, we pose visualization generation as a multi-stage generation problem and argue that well-orchestrated pipelines based on large language models (LLMs) and image generation models (IGMs) are suitable for addressing these tasks. This talk presents LIDA, a novel tool for generating grammar-agnostic visualizations and infographics. LIDA comprises four modules: a SUMMARIZER that converts data into a rich but compact natural language summary, a GOAL EXPLORER that enumerates visualization goals given the data, a VISGENERATOR that generates, refines, executes, and filters visualization code, and an INFOGRAPHER module that yields data-faithful stylized graphics using IGMs. LIDA provides a Python API and a hybrid user interface (direct manipulation and multilingual natural language) for interactive chart, infographic, and data story generation.
Overall, the talk will cover:
Project Page: https://microsoft.github.io/lida/
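LIDA provides its own Python API (see the project page). Purely to illustrate what the first, summarizing stage of such a pipeline does, the sketch below turns a small table (a list of dicts) into a compact natural-language summary that a downstream LLM-based goal explorer could consume. The `summarize` function is hypothetical; it is not LIDA's implementation or API, and it assumes a non-empty table whose rows share the same columns.

```python
from statistics import mean

def summarize(rows):
    """Build a compact natural-language summary of a non-empty table (list of dicts)."""
    lines = [f"The dataset has {len(rows)} rows and {len(rows[0])} columns."]
    for col in rows[0]:
        values = [r[col] for r in rows]
        if all(isinstance(v, (int, float)) for v in values):
            lines.append(f"- '{col}' is numeric, ranging from {min(values)} to "
                         f"{max(values)} (mean {mean(values):.2f}).")
        else:
            distinct = sorted({str(v) for v in values})
            lines.append(f"- '{col}' is categorical with {len(distinct)} distinct "
                         f"values, e.g. {', '.join(distinct[:3])}.")
    return "\n".join(lines)

# Hypothetical toy data.
rows = [
    {"city": "Lagos", "year": 2020, "visitors": 150},
    {"city": "Accra", "year": 2021, "visitors": 250},
]
print(summarize(rows))
```

Keeping the summary short but typed (numeric ranges, sample categories) is what makes it usable as LLM context when enumerating visualization goals.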
Victor Dibia is a Principal Research Software Engineer on the Human-AI eXperiences (HAX) team at Microsoft Research, where he focuses on Generative AI. His research interests span human-computer interaction, computational social science, and applied machine learning. Victor's work has been published at conferences such as ACL, EMNLP, AAAI, and CHI, earning multiple best paper awards and garnering attention from media outlets like the Wall Street Journal and VentureBeat. He is an IEEE Senior Member, a Google Certified Professional in Data Engineering and Cloud Architecture, and a Google Developer Expert in Machine Learning. Victor holds a Ph.D. in Information Systems from the City University of Hong Kong and a Master's in Information Networking from Carnegie Mellon University.
For queries or meeting link, contact Dr. Ignatius Ezeani (i.ezeani@lancaster.ac.uk)
Abstract
A narrative is a conceptual basis of collective human understanding. Humans use stories to represent characters' intentions, feelings, and the attributes of objects and events. A widely held thesis in psychology to justify the centrality of narrative in human life is that humans make sense of reality by structuring events into narratives. Narratives are therefore central to human activity in cultural, scientific, and social areas. Story maps are computer science realizations of narratives based on maps: online interactive maps enriched with text, pictures, videos, and other multimedia information, whose aim is to tell a story over a territory. This talk presents a semi-automatic workflow that, using a CRM-based ontology and Semantic Web technologies, produces semantic narratives in the form of story maps (and timelines as an alternative representation) from textual documents. An expert user first assembles one territory-contextual document containing text and images. Then, automatic processes use natural language processing and Wikidata services to (i) extract entities and geospatial points of interest associated with the territory, (ii) assemble a logically ordered sequence of events that constitute the narrative, enriched with entities and images, and (iii) openly publish online semantic story maps and an interoperable Linked Open Data-compliant knowledge base for event exploration and inter-story correlation analyses. Once the story maps are published, users can review them through a user-friendly web tool. Overall, our workflow complies with the Open Science directives of open publication and multi-discipline support and is appropriate for conveying "information going beyond the map" to scientists and the general public.
As demonstrations, the talk will show workflow-produced story maps representing (i) 23 European rural areas across 16 countries, with their value chains and territories, (ii) a Medieval journey, and (iii) the history of the legends, biological investigations, and AI-based modelling for habitat discovery of the giant squid Architeuthis dux.
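The final publication step can be pictured as turning an ordered event sequence into map features. The sketch below orders events by date and emits a GeoJSON FeatureCollection (GeoJSON stores positions as longitude, latitude). The event fields, the `to_story_map` function, and the sample events are hypothetical; the workflow in the talk publishes semantic (CRM-based, Linked Open Data) story maps, which this sketch does not model.

```python
import json
from datetime import date

def to_story_map(events):
    """Order events chronologically and emit them as a GeoJSON FeatureCollection."""
    ordered = sorted(events, key=lambda e: e["date"])
    features = [
        {
            "type": "Feature",
            # GeoJSON positions are [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [e["lon"], e["lat"]]},
            "properties": {"step": step, "title": e["title"], "date": e["date"].isoformat()},
        }
        for step, e in enumerate(ordered, start=1)
    ]
    return {"type": "FeatureCollection", "features": features}

# Hypothetical events of a journey, deliberately listed out of order.
events = [
    {"title": "Stop B", "date": date(1200, 5, 2), "lat": 43.84, "lon": 10.50},
    {"title": "Stop A", "date": date(1200, 5, 1), "lat": 43.72, "lon": 10.40},
]
print(json.dumps(to_story_map(events), indent=2))
```

The `step` property carries the narrative order, so a map viewer can walk through the story even if features are rendered in arbitrary order.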
Profile
Valentina Bartalesi Lenzi is a researcher at the Institute of Information Science and Technologies (ISTI) of the National Research Council of Italy (CNR) and an external professor of Semantic Web in the Computer Science master's degree course at the University of Pisa. She earned her PhD in Information Engineering and her degree in Digital Humanities from the University of Pisa. Her research mainly concerns Knowledge Representation, Semantic Web technologies, and the development of formal ontologies for representing textual content and narratives. She has participated in several European and national research projects, including CRAEFT, MOVING, MINGEI, PARTHENOS, E-RIHS PP, IMAGO, and DanteSources. She is the author of over 50 peer-reviewed articles in national and international conferences and scientific journals.
For queries or meeting link, contact Dr. Ignatius Ezeani (i.ezeani@lancaster.ac.uk)
Contrastive training still underlies many technologies in machine learning. It has shown much promise in multimodal activations and logical abilities. However, replication remains an ongoing challenge in academic and low-resource communities. This talk explores the use of different data shapes to train models with multiple input streams. There are myriad applications in supervised training, low-resource languages, cross-modal training, and machine translation tasks where annotations are almost nonexistent.
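For readers less familiar with contrastive training, a common objective is InfoNCE: each anchor embedding should be more similar to its own paired example than to the other examples in the batch. The sketch below computes a one-directional InfoNCE loss in plain Python with cosine similarity and a temperature. It is a generic illustration (CLIP-style models typically average both directions), not the speaker's training setup.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss: anchors[i] should match positives[i] against all positives[j]."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the matching pair (index i) as the target class.
        total += log_sum_exp - logits[i]
    return total / n

anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(anchors, [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce(anchors, [[0.0, 1.0], [1.0, 0.0]])
print(f"aligned pairs: {aligned:.5f}, swapped pairs: {swapped:.5f}")
```

The loss is near zero when each anchor is closest to its own pair and large when the pairing is wrong; the other items in the batch act as negatives, which is why batch composition (the "data shape") matters for this objective.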
For queries, contact Ignatius Ezeani (i.ezeani@lancaster.ac.uk)
Patient forums are online forums centered around patient communities. Previous qualitative work has shown that patients gather on these forums to exchange information and experiences and to support each other emotionally. Patient forums can also be a source of medical hypotheses, e.g. on the effectiveness and side effects of medication. This is especially valuable for patients with a rare disease, for whom clinical trials are often too costly. In the Ph.D. project of Anne Dirkson, we have developed text mining techniques to process and extract information from the large volume of messages on a patient forum. Specifically, we have focused on extracting the side effects of medications and the coping strategies of patients who suffer from these side effects. Extracting and aggregating this information is more challenging than extracting regular named entities (like names and locations), because side effects and coping strategies are not proper nouns; they can be described in many different ways. For example, one could describe a headache as 'my head is bursting', 'throbbing pain in my head', or 'pounding headache', to name a few. In my presentation, I will explain the challenges of knowledge discovery from patient forum data, the methods that we developed, and the results that we obtained. I will also show how the extracted information relates to results from questionnaire data among patients.
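Because side effects and coping strategies are free-form spans rather than proper nouns, one common way to frame their extraction is as sequence labelling with BIO tags (Begin, Inside, Outside). The sketch below shows only the final decoding step, turning per-token tags into text spans; the tag names (`EFFECT`, `COPING`) and the example sentence are hypothetical, and it does not reproduce the models developed in the project.

```python
def bio_to_spans(tokens, tags):
    """Collect (label, text) spans from BIO tags such as B-EFFECT / I-EFFECT / O."""
    spans, current, label = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            # A new span starts (an I- tag with a different label is treated leniently as B-).
            if current:
                spans.append((label, " ".join(current)))
            current, label = [tok], tag[2:]
        elif tag.startswith("I-"):
            current.append(tok)  # continue the open span
        else:
            if current:
                spans.append((label, " ".join(current)))
            current, label = [], None
    if current:
        spans.append((label, " ".join(current)))
    return spans

tokens = ["throbbing", "pain", "in", "my", "head", ",", "so", "I", "take", "a", "walk"]
tags = ["B-EFFECT", "I-EFFECT", "I-EFFECT", "I-EFFECT", "I-EFFECT",
        "O", "O", "O", "B-COPING", "I-COPING", "I-COPING"]
print(bio_to_spans(tokens, tags))
```

Span-level output like this still has to be normalised (e.g. mapping 'throbbing pain in my head' to a headache concept) before mentions can be aggregated across posts.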
Short bio: Suzan Verberne is an associate professor at the Leiden Institute of Advanced Computer Science at Leiden University, where she leads the Text Mining and Retrieval group. She obtained her Ph.D. in 2010 on the topic of Question Answering and has since been working at the boundary between Natural Language Processing (NLP) and Information Retrieval (IR). She has supervised projects in a large number of application domains, from social media to law and from archaeology to health. Her research focus is to advance NLP "beyond the benchmark" by addressing challenging problems in specific domains. She is highly active in the NLP and IR communities, holding chairing positions at large worldwide conferences.
For queries, contact Chloe: c.humphreys@lancaster.ac.uk or Ignatius: i.ezeani@lancaster.ac.uk
This talk describes the analysis of ways in which pain is described by people experiencing a particular health condition, trigeminal neuralgia (TN), in comparison to people experiencing a wider range of painful conditions. The research was prompted by a request from a healthcare professional seeking a more nuanced understanding of the ways people voluntarily describe pain relating to TN and pain relating to more generic musculoskeletal conditions, to assist in clinical practice and patient communication. Using a range of corpus linguistic techniques, the use of different terms to describe and evaluate pain is explored in two corpora of online forum contributions, with particular focus on the pain descriptors which feature in the short version of the McGill Pain Questionnaire (a widely used instrument in healthcare settings in the diagnosis and treatment of pain).
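Because two corpora of forum posts rarely have the same size, descriptor counts are usually normalised (for example, per million words) before comparison. The sketch below does this for the single-word sensory descriptors of the short-form McGill Pain Questionnaire. The tokenizer, the function names, and the two toy "corpora" are assumptions for illustration, not the study's data or pipeline.

```python
import re
from collections import Counter

# Single-word sensory descriptors from the short-form McGill Pain Questionnaire
# (the hyphenated item "hot-burning" is left out to keep tokenization simple).
DESCRIPTORS = ["throbbing", "shooting", "stabbing", "sharp", "cramping",
               "gnawing", "aching", "heavy", "tender", "splitting"]

def per_million(text, words=DESCRIPTORS):
    """Frequency of each word per million tokens of a non-empty `text`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return {w: counts[w] / len(tokens) * 1_000_000 for w in words}

def compare(corpus_a, corpus_b, words=DESCRIPTORS):
    """(word, per-million in A, per-million in B), largest absolute difference first."""
    a, b = per_million(corpus_a, words), per_million(corpus_b, words)
    rows = [(w, a[w], b[w]) for w in words]
    return sorted(rows, key=lambda r: abs(r[1] - r[2]), reverse=True)

# Hypothetical toy corpora.
tn_posts = "a shooting pain like an electric shock, stabbing and shooting"
msk_posts = "a dull aching pain, heavy and aching after a long walk"
for word, fa, fb in compare(tn_posts, msk_posts)[:3]:
    print(f"{word:<10} {fa:>10.0f} {fb:>10.0f}")
```

Raw differences in normalised frequency are only a first pass; a significance or effect-size measure, plus concordance reading, is needed before claiming that one group prefers a descriptor.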
Style-shifting has been a focus of research on language variation and change in sociolinguistics since the 1960s. As sociolinguistic styles are sensitive to social change (Ure, 1982), it is not surprising that they have become a focus for social psychologists who seek to assess social identities through linguistic styles. ASIA (Automated Social Identity Assessment toolkit) (Koschate et al., 2021), a toolkit that leverages machine learning and natural language processing to automatically assess which identity is situationally salient through sociolinguistic styles, has been shown to successfully assess feminist and parent identities in the Reddit and Mumsnet online communities. Cork et al. (2022) have applied ASIA to assess entrepreneur and libertarian identities. Given the recent rise in the online influence of hybrid communities characterised by ideological mutations, this study investigates the dynamic nature and influence of hybrid eco-fascist identities. It trains and validates an ASIA model to automatically assess which identity (eco or fascist) is situationally salient. This allows us to examine the dynamic interplay of these identities over time, and the role that linguistic style plays in the expression of ecological and fascist identities in eco-fascist movements. To train the model, the study used publicly available Reddit data from environmental and far-right forums for the period 2016-2020. Once trained, ASIA was applied to public data from Reddit eco-fascist forums. Topic modelling and corpus linguistic analysis were then adopted to validate the results produced by the ASIA model. The results demonstrate that (1) sociolinguistic styles can indeed be used to detect and assess hybrid identities, and (2) interdisciplinary research on hybrid identity assessment provides new methodological and theoretical insights for social psychology, sociolinguistics, and computational linguistics.
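ASIA's own models are described in Koschate et al. (2021). As a much simpler stand-in that only conveys the general idea of inferring a salient identity from style rather than topic, the sketch below represents each text by the relative frequencies of a few function words (a classic stylometric feature) and assigns it to the nearest identity centroid by cosine similarity. The feature list, the labels, and the toy training texts are hypothetical, and this is not ASIA's method.

```python
import math
import re
from collections import Counter

# Hypothetical style features: relative frequencies of a few function words.
FUNCTION_WORDS = ["i", "we", "you", "they", "the", "a", "not", "must", "should", "our"]

def style_vector(text):
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]

def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def train(labelled_texts):
    """Map {identity: [text, ...]} to {identity: centroid style vector}."""
    return {label: centroid([style_vector(t) for t in texts])
            for label, texts in labelled_texts.items()}

def salient_identity(model, text):
    """Identity whose style centroid is most similar to the text's style vector."""
    v = style_vector(text)
    return max(model, key=lambda label: cosine(model[label], v))

model = train({
    "eco": ["we must protect our planet", "our forests, we should save them"],
    "contrast": ["they say you are not right", "you and they are not the same"],
})
print(salient_identity(model, "we should protect our rivers"))
```

Scoring each post in a hybrid community against both centroids over time is one simple way to picture tracking which identity is salient when.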
Keyword analysis is central to corpus-assisted discourse studies (CADS), as a means of comparing two corpora on a high level. It is typically used to identify starting points for a more detailed analysis.
Usually, keywords are grouped into thematic categories, which are seen as pointers to central topics of the discourse at hand.
However, there is no best practice as to how these categories are formed, and this question has so far received little attention.
In this talk, two different approaches to keyword categorisation in CADS are compared, using the keywords of two actors known to spread conspiracies and misinformation on German Telegram channels.
The first strategy examined is the classic approach of topic-based categories: the categories formed by two independent researchers are compared to explore how individual experts might differ in the central topics they identify.
The second strategy places more focus on linguistic form, annotating surface-level semantic and grammatical features rather than discourse-dependent topics.
Overall, the study aims to open up the discussion and shift the methodological focus towards the role of the researcher and of linguistic versus thematic categories.
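The keyword lists that such categorisations start from are usually produced with a keyness statistic. As one common option (an assumption here; the talk does not specify which statistic was used), the sketch below ranks words by log-likelihood (G2) in the expected-frequency formulation widely used in corpus linguistics, and keeps only words that are relatively more frequent in the target corpus. The toy token lists are hypothetical.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """G2 for a word seen a times in the target (c tokens) and b times in the reference (d tokens)."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(target_tokens, reference_tokens, top=10):
    """Words relatively more frequent in the target corpus, ranked by G2."""
    t, r = Counter(target_tokens), Counter(reference_tokens)
    c, d = len(target_tokens), len(reference_tokens)
    scored = [(log_likelihood(t[w], r[w], c, d), w) for w in t if t[w] / c > r[w] / d]
    scored.sort(reverse=True)
    return [(w, round(score, 2)) for score, w in scored[:top]]

target = "the vaccine agenda the vaccine lies the media".split()
reference = "the weather the news the football the media".split()
print(keywords(target, reference))
```

The resulting list is exactly the input the two categorisation strategies above would then group, either by topic or by surface-level semantic and grammatical features.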
Dementia is an umbrella term for the loss of cognitive and memory abilities caused by a wide variety of neurological conditions. Research has shown that both the content of an individual's discourse and the acoustics of their speech can be automatically analysed to help detect dementia and other neurological conditions. Whilst the state of the art demonstrates effective diagnostic capabilities for L1 (native) speakers of English, this talk will explore ongoing research that assesses the efficacy of these methods for L2+ (non-native) English speakers and explores solutions. This research treats a dementia classification pipeline as a modular system: an automatic speech recognition (ASR) component extracts transcribed language, and classification then draws on features extracted from both the acoustic signal and the transcribed output. The limitations of ASR across a wide range of L2+ backgrounds will be explored, challenging existing beliefs about the competency of state-of-the-art cloud-based ASR APIs on non-native speech and critically assessing the limitations of word error rate (WER) as the ubiquitous metric for ASR evaluation. My talk will then explore ongoing research into the features of dementia, potential issues in the generalisability of sparse dementia corpora, and early work on the impact of features of non-native speech.
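Since the talk questions word error rate as the default ASR metric, it helps to see what WER actually counts. The sketch below computes WER as the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words; it is a generic implementation, not the evaluation code used in the research, and it assumes a non-empty reference.

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

print(wer("the cat sat on the mat", "the cat sit on mat"))
```

Every word error counts equally, whether it hits a content word that a dementia classifier relies on or a filler, and insertions can push WER above 1.0; both properties are worth keeping in mind when comparing ASR performance across speaker groups.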
This talk presents an innovative online resource for sharing and accessing forensic linguistics data, the Forensic Linguistic Databank (FoLD - https://fold.aston.ac.uk), developed in the Aston Institute for Forensic Linguistics (AIFL) at Aston University, Birmingham. FoLD is a permanent, controlled-access online repository for forensic linguistic data, including malicious communication data, investigative interview data, hate speech, and legal language.
Because access to relevant forensic linguistic data has been notoriously difficult since the discipline's inception in the 1960s, FoLD represents the first attempt to give researchers the opportunity to share datasets of different levels of sensitivity and ethical concern.
In this talk we present the FoLD repository, how to donate data, and how to access already existing datasets from the website.
We further showcase a project carried out by researchers in the FoLD research centre at AIFL using data from FoLD.
This talk is a cross-over with FORGE, who provide seminars on forensic linguistics.
Discourse studies as a broad field has demonstrated openness to incorporating mixed methodologies and perspectives to provide a range of insights into complex phenomena. This paper proposes a new framework that brings together the diverse traditions of Discourse Theory (DT), Critical Discourse Studies (CDS), and Corpus Linguistics (CL). While there are some excellent examples of work combining two of these approaches, particularly CDS and CL (e.g., Subtirelu and Baker, 2018; Baker, 2012), and a growing discussion around the potential compatibility of DT and CDS (Brown, 2020; De Cleen et al., 2021), or DT and CL (Wilkinson, 2022; Nikisianis et al., 2019), there have been very few attempts to bring all three together into a coherent research programme. The aim here, expanding on recent studies conducted using this framework (Brown and Mondon, 2020; Brown, Mondon and Winter, 2021), is to develop a detailed account of how this combination can be achieved and what benefits it brings to the field of discourse studies. To demonstrate how this can be implemented in textual analysis, examples are drawn from a study of far-right Brexit discourse and the process of mainstreaming.
This talk describes the collection and analysis of the most recent member of the Brown family, the BE21 corpus, consisting of 1 million words of written British English texts published in 2021. Using the coefficient of variation, the frequencies of part-of-speech tags in BE21 are compared against the other four British members of the Brown family (from 1931, 1961, 1991, and 2006). Part-of-speech tags that steadily increase or decrease across all five corpora, or across the latest three, are examined via concordance lines and their distributions in order to identify new and emerging trends in British English. The analysis points to the continuation of some trends (such as declines in modal verbs and titles of address), along with newer trends such as the rise of first-person pronouns. It also indicates that the more general trends of densification, democratisation, and colloquialisation are continuing in British English.
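Both steps described above, measuring variation across the corpora and spotting steady trends, can be sketched in a few lines. The example computes the coefficient of variation (standard deviation divided by mean) of one tag's relative frequencies across the five corpora and labels the series as increasing, decreasing, or mixed. Whether the study uses the population or sample standard deviation is not stated, so the use of `pstdev` is an assumption, and the frequencies below are invented placeholders, not BE21 results.

```python
from statistics import mean, pstdev

def coefficient_of_variation(freqs):
    """Standard deviation divided by mean (population SD assumed here)."""
    return pstdev(freqs) / mean(freqs)

def trend(freqs):
    """'increasing' or 'decreasing' if every step moves the same way, else 'mixed'."""
    steps = list(zip(freqs, freqs[1:]))
    if all(b > a for a, b in steps):
        return "increasing"
    if all(b < a for a, b in steps):
        return "decreasing"
    return "mixed"

# Invented per-million frequencies for one POS tag in the 1931, 1961, 1991, 2006 and 2021 corpora.
freqs = [9500, 9100, 8800, 8400, 8100]
print(f"CV = {coefficient_of_variation(freqs):.3f}")
print("all five corpora:", trend(freqs))
print("latest three:", trend(freqs[-3:]))
```

The coefficient of variation is scale-free, so tags with very different overall frequencies can be ranked by how much they vary over time; the monotonic-trend check then separates steady change from fluctuation.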