Previous seminars

Seminars from previous years are still being added; in the meantime, the archive remains available on the old website.

Academic year:

2020/2021

Week 28

Thursday 10th June 2021

1:00-2:00pm

Microsoft Teams - request a link via email

The Language of Risk-Taking in Bipolar Disorder

Daisy Harvey

Spectrum Centre for Mental Health Research, Lancaster University

  • Abstract

The ever-increasing amount of language available online on social media sites and in Electronic Health Records (EHRs) provides a unique opportunity to understand more about mental health conditions. Applying Natural Language Processing (NLP) methods to this data allows us to study language empirically and provides an insight into typically hard-to-reach populations. This paper presents a PhD project which will investigate the lived experience of risk-taking for people living with bipolar disorder as presented in different contexts (on social media and in medical records), using a mixed-methods approach encompassing qualitative and quantitative analysis. The project aims to produce three outputs: 1) a risk-taking lexicon for bipolar disorder, 2) a corpus of risk-taking posts from bipolar subreddits, and 3) a linguistic analysis of the risk-taking behaviours extracted from free text within anonymised EHRs. It is hoped that this research will shed light on how risk-taking behaviours manifest in bipolar disorder and whether there are differences in how these behaviours are described in online and offline settings.

Week 27

Thursday 3rd June 2021

1:00-2:00pm

Microsoft Teams - request a link via email

More than the sum of its parts: The textual and psychological reality of collocation networks

Hanna Schmueck

LAEL, Lancaster University

  • Abstract
TBC

Week 27

Thursday 3rd June 2021

1:00-2:00pm

Online

Traversing Language Structures: Creating, Exploring and Visualising Large Scale Linguistic Networks

Hanna Schmueck

LAEL, Lancaster University

  • Abstract

How does the language we are surrounded by differ from the language in our mental lexicon? We explore this question by developing new methods to display, analyse and compare large-scale linguistic networks using graph-theoretical parameters. The aim is to examine structural similarities and differences between a collocation network (based on the BNC2014 / BNC2014 Baby+ (Brezina, 2019)) and a psycholinguistic network (based on cue-response pairs provided by over 90,000 participants for the SWOW-EN project (De Deyne et al., 2018)) in order to further our understanding of the relationship between language perception and language production. In addition, a new, dynamic visualisation of these networks allows for identifying "latent patterns" (Dong & Buckingham, 2018) in the data that would not have been observable when starting an analysis from pre-determined words of interest. In this presentation, the methodology and overarching justifications for the project will be presented alongside a demonstration of several custom network visualisations in Cytoscape (Shannon et al., 2003) and a case study exploring properties of the BNC2014 Baby+ and SWOW-EN.
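As a rough illustration of what "graph-theoretical parameters" can mean for a word network, the sketch below computes degree, density and a local clustering coefficient for a tiny undirected graph. The edge list and the function names (build_graph, density, clustering) are invented for this example and are not taken from the project, which may use different parameters and tooling.

```python
from collections import defaultdict

def build_graph(edges):
    """Build an undirected graph as a dict mapping each word to its neighbours."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def density(graph):
    """Share of all possible edges that are actually present."""
    n = len(graph)
    m = sum(len(neighbours) for neighbours in graph.values()) / 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0

def clustering(graph, node):
    """Local clustering coefficient: how interconnected a node's neighbours are."""
    neighbours = list(graph[node])
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbours[j] in graph[neighbours[i]]
    )
    return 2 * links / (k * (k - 1))

# Hypothetical collocation pairs, invented for illustration only.
edges = [("tea", "cup"), ("tea", "drink"), ("cup", "drink"), ("drink", "water")]
graph = build_graph(edges)
degree = {word: len(neighbours) for word, neighbours in graph.items()}
```

Computing the same parameters for a collocation network and for a cue-response network would give one simple basis for the kind of structural comparison described in the abstract.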

Week 26

Thursday 27th May 2021

1:00-2:00pm

Microsoft Teams

MasakhaNER: Named Entity Recognition for African Languages

David Ifeoluwa Adelani

Saarland University, Germany & Masakhane

  • Abstract

We take a step towards addressing the under-representation of the African continent in NLP research, by bringing together different stakeholders to create the first large, publicly available, high-quality dataset for named entity recognition (NER) in ten African languages. We detail the characteristics of these languages to help researchers and practitioners better understand the challenges they pose for NER tasks. We analyze our datasets and conduct an extensive empirical evaluation of state-of-the-art methods across both supervised and transfer learning settings. Finally, we release the data, code, and models to inspire future research on African NLP.

Link to Masakhane

Week 23

Thursday 6th May 2021

1:00-2:00pm

Microsoft Teams - request a link via email

Project Ziggurat and the future of Corpus Workbench and CQPweb

Andrew Hardie

CASS, Lancaster University

  • Abstract

The Open Corpus Workbench (CWB) software is a major component of the computational infrastructure of our field - including Lancaster's CASS research centre. It includes the powerful Corpus Query Processor (CQP), as well as the user-friendly CQPweb browser interface. The lead developers of CWB, Stefan Evert and Andrew Hardie, are presently working on a new wave of CWB development, to renew the power of the thirty-year-old system for the new century. In this (moderately informal) talk, Andrew Hardie will present a progress report on Project Ziggurat, a refreshed low-level system for managing indexed corpus data, and on the current directions of work on CQPweb. There will be plenty of time for questions, discussion, and suggestions. The session will be of particular interest to users of CWB or CQPweb who would like to know a little more about how they work behind the scenes.

Week 20

Thursday 18th March 2021

1:00-3:00pm

Online: join mailing list or contact organisers to receive link

Visualise my Corpus

Mahmoud El-Haj

SCC, Lancaster University

  • Abstract

Are you a researcher (postgrad or staff) who works with large data sets or critical discourse analysis, or who uses tools to analyse and interpret data? Have you run into limits when visualising your results?

Digital Humanities, Social Sciences, and Natural Language Processing (NLP) all use computational methods for corpus research but face similar problems when critically interpreting and subsequently visualising their empirical data.

In this presentation, I will start by taking you through some of the online tools and software that you can use to visualise your corpus.

The second part of the talk will be a step-by-step Python 3 tutorial for those who want an easy start to some common techniques in NLP, Text Analysis, Machine Learning, Topic Modelling, and Corpus Linguistics.

The Python tutorial will be made available on GitHub so you can practise these techniques in your spare time.

Joint session with the SCC DSG group

Week 19

Thursday 11th March 2021

1:00-2:00pm

Online: join mailing list or contact organisers to receive link

Emotion Annotations: Understanding Annotators' Disagreements

Enrica Troiano

Institute for NLP, University of Stuttgart

  • Abstract

Analysing emotions in text consists in automatically understanding its emotional content. This includes a number of phenomena, from basic, discrete emotions to more fine-grained affective information, like intensity. Similar to most ML-based tasks, emotion analysis relies on manually annotated data, thus facing the problem of annotation subjectivity: it is particularly challenging to achieve substantial agreement on emotions.

In this presentation, I will address two annotation tasks, which face separate issues that lead to disagreements. In one setting, human judges infer emotion intensities, and in the other, they annotate specific emotion components (cognitive appraisal). I will show that annotations of intensity correlate both with the confidence of annotators and with their agreement. For cognitive appraisal annotations, I will argue that reconstructing emotion components from descriptions of events is particularly challenging if annotators are not provided with additional emotional information.

This is joint work with Jan Hofmann, Roman Klinger, and Sebastian Padó.

Week 18

Thursday 4th March 2021

1:00-2:00pm

Microsoft Teams - request a link via email

Corpus-based Contrastive Analysis and Reader engagement in academic writing: methodological and analytical perspectives

Niall Curry

Coventry University

  • Abstract

In this talk, I discuss an in-depth analysis of questions as reader engagement devices in economics research articles in English, French, and Spanish. Merging contrastive and corpus linguistic approaches, the study interrogates issues of comparability and establishes a base from which to draw meaningful comparisons between discourses within the global, multilingual academy. The corpus-based contrastive analysis approach is applied to the study of questions in the English and French economics subcorpora of KIAP (Fløttum et al. 2006), as well as a comparable Spanish subcorpus created for this study. Direct questions are identified through the presence of a "?", and illocutionary force indicating devices are used to extract indirect questions. In the analysis, each direct and indirect question that serves to allow the writer to interact with the reader is analysed in terms of the following equivalences: frequency, function, type and form, location, passivity, tense/aspect and verbal modality, and question sentence type. A second analysis is presented in terms of these same equivalences; however, the second analysis focuses on shared question function across languages. The findings of this study indicate key similarities and differences across languages and allow for engagement with wider conversations on academic language in the multilingual academy. In concluding this talk, the findings are considered in terms of their applicability to the teaching and learning of academic language as well as future directions in corpus-based contrastive linguistics.
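The first extraction step mentioned in the abstract, finding direct questions by their "?", can be sketched as follows. This is only a minimal illustration under simple assumptions: the sentence splitter is a naive regular expression, the function name direct_questions and the sample text are invented here, and the study's actual procedure (including the extraction of indirect questions) is not reproduced.

```python
import re

def direct_questions(text):
    """Return the sentences in `text` that end with a question mark."""
    # Naive splitter: break after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [sentence for sentence in sentences if sentence.endswith("?")]

# Invented sample text, including a French question with a space before "?".
sample = (
    "Inflation rose sharply in 2008. What explains this pattern? "
    "We consider two hypotheses. Pourquoi ce choix ?"
)
questions = direct_questions(sample)
```

A real corpus would need a more careful tokeniser, since abbreviations and quoted material can mislead a punctuation-based splitter.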

References

Fløttum, K., Dahl, T., & Kinn, T. (2006). Academic voices: Across languages and disciplines (Vol. 148). John Benjamins Publishing.

Week 17

Thursday 25th February 2021

1:00-2:00pm

Online: join mailing list or contact organisers to receive link

Dismantling Online Dating Fraud

Matthew Edwards

University of Bristol

  • Abstract

Online romance scams are a prevalent form of mass-marketing fraud in the West. In this type of scam, fraudsters craft fake profiles and manually interact with their victims. Due to the characteristics of this type of fraud, and the peculiarities of how dating sites operate, traditional detection methods (e.g., those used in spam filtering) are ineffective.

This talk will report on our investigation into the archetype of online dating profiles used in this form of fraud, including their use of demographics, profile descriptions, and images, shedding light on both the strategies deployed by scammers to appeal to victims and the implicit traits of victims themselves. Our work is presented in the context of building and evaluating a machine-learning classifier for detecting spam profiles, and elaborates on our findings from investigating areas of under-performance.

Joint talk with the SCC DSG group

Week 16

Thursday 18th February 2021

1:00-2:00pm

Online: join mailing list or contact organisers to receive link

"WE ARE SO ANGRY! #ibscandal": Covid-19 and the International Baccalaureate

Saira Fitzgerald

Visiting researcher at CASS

  • Abstract

In this talk, I will present work in progress and discuss some preliminary results of a study examining discourses in global press reports and Twitter relating to the International Baccalaureate (IB) final examination results. These reports and tweets appeared over a two-month period, July-September 2020.

As a result of Covid-19 and worldwide school closures in 2020, the IB organization cancelled its high-stakes diploma examination for the first time in its 52-year history and, in its place, devised an alternate form of assessment based on an algorithm. The results for 174,355 students in 146 countries were published on July 6, and showed large discrepancies between students' predicted and final grades, which placed the postsecondary aspirations of many in jeopardy. Students, parents, teachers, academics and journalists demanded to know how grades were calculated and what statistical model was used. An online petition calling for "Justice for May 2020 IB Graduates" with the hashtag #ibscandal collected 15,000 signatures within the first four days.

This study is part of a larger research project on discourses surrounding the IB, which up to now have shown an overwhelmingly positive prosody constructed through repetition and incremental effects. The present study aims to uncover values and attitudes associated with the IB in this new context that previously may have been hidden or taken for granted. Preliminary findings point to shifts in discourses that can be linked to events taking place in the wider world, providing a rare and important window into the impact of "the global education industry" on students.

Week 14

Thursday 4th February 2021

1:00-2:00pm

Online: join mailing list or contact organisers to receive link

The Application of Natural Language Processing in a Study of LGBTQ+ Cancer Experiences

Daisy Harvey

Spectrum Centre for Mental Health Research, Lancaster University

  • Abstract

Current research suggests that there are knowledge gaps in institutional practices towards lesbian, gay, bisexual, transgender, and queer/questioning (LGBTQ+) cancer patients, and that the LGBTQ+ community represents a 'growing and medically underserved population' (Quinn et al. 2015). In the context of cancer care, quantitative evidence shows that there are disparities in cancer outcomes between LGBTQ+ cancer patients and their 'heterosexual and cisgender counterparts' (Kamen et al. 2019), but there is a lack of qualitative research to address where services are lacking from the perspective of an LGBTQ+ service user.

This research demonstrates how the age of the internet can be utilised to provide an insight into underserved populations, and to gain empirical evidence and honest accounts from service users who might experience a fear of stigma or mistreatment in offline settings. The research explores the practical application of NLP and presents a methodology that encompasses web scraping, corpus creation, data annotation and anonymisation, a hybrid system for emotion detection utilising the NRC emotion intensity lexicon (Mohammad 2017) with machine learning methods, and topic modelling using Latent Dirichlet Allocation (LDA). The results of the research demonstrate an emotion detection classifier with a micro F1 score of 65%, and 8 clusters of topics that emerge from the topic modelling task. These topics provide insights that provoke further discussion, particularly within the theme of Diagnosis, Treatment and Sexuality, where excerpts describe that 'LGBT people with cancer can face discrimination and disqualification', and 'healthcare resources are all based on heteronormative assumptions'.
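For readers unfamiliar with the metric, a micro F1 score pools true positives, false positives and false negatives across all emotion labels before computing precision and recall. The sketch below assumes each text may carry a set of labels; that assumption, the function name micro_f1 and the toy labels are illustrative only, and nothing here reproduces the study's data or classifier.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over per-item label sets."""
    tp = fp = fn = 0
    for gold_labels, predicted_labels in zip(gold, predicted):
        tp += len(gold_labels & predicted_labels)   # labels correctly predicted
        fp += len(predicted_labels - gold_labels)   # labels predicted but wrong
        fn += len(gold_labels - predicted_labels)   # labels missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Invented gold and predicted emotion labels for four texts.
gold = [{"fear"}, {"joy", "trust"}, {"sadness"}, {"anger"}]
predicted = [{"fear"}, {"joy"}, {"anger"}, {"anger", "fear"}]
score = micro_f1(gold, predicted)
```

Because counts are pooled before averaging, frequent labels weigh more heavily in a micro score than rare ones.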

References

Kamen, C.S., Alpert, A., Margolies, L., Griggs, J.J., Darbes, L., Smith-Stoner, M., Lytle, M., Poteat, T., Scout, N.F.N. and Norton, S.A. (2019). "Treat us with dignity": a qualitative study of the experiences and recommendations of lesbian, gay, bisexual, transgender, and queer (LGBTQ) patients with cancer. Supportive Care in Cancer, 27(7), 2525-2532.

Mohammad, S. M. (2017). Word affect intensities. arXiv preprint arXiv:1704.08798.

Quinn, G.P., Sanchez, J.A., Sutton, S.K., Vadaparampil, S.T., Nguyen, G.T., Green, B.L., Kanetsky, P.A. and Schabath, M.B. (2015). Cancer and lesbian, gay, bisexual, transgender/transsexual, and queer/questioning (LGBTQ) populations. CA: a cancer journal for clinicians, 65(5), 384-400.

Week 9

Thursday 3rd December 2020

2:00-3:00pm

Online: join mailing list or contact organisers to receive link

Triangulating corpus linguistics and clinical psychology in a study of narratives of voice-hearers

Luke Collins¹ & Elena Semino²

¹CASS, Lancaster University  ²LAEL, Lancaster University

  • Abstract

We present collaborative work between the 'Hearing the Voice' project (Durham University) and the ESRC Centre for Corpus Approaches to Social Science (CASS) with colleagues Dr Zsófia Demjén and Dr Vaclav Brezina, investigating the reports of individuals who hear voices that others cannot hear. Focusing on the description of such voices as 'person-like', we demonstrate how methods from corpus linguistics can be triangulated with approaches in clinical psychology. We find that an approach to investigating personhood based on the selection of specific linguistic aspects of the reports is convergent with the characterisation of participant experiences as 'minimal' or 'complex', based on a manual coding scheme developed by our colleagues in psychology. Furthermore, our corpus-based approach provides further insights into degrees of complexity, provisionally outlining a 'complexity scale' and contributing to increased understanding of experiences of voice-hearing in terms of personification of voices. The implementation of corpus methods in this work also highlighted important methodological considerations for the wider application of corpus linguistics.

Week 8

Thursday 26th November 2020

12:00-1:00pm

Online: join mailing list or contact organisers to receive link

Usage-based perspective on the meaning-preserving hypothesis in voice alternation: Corpus linguistic and experimental studies in Indonesian

Gede Primahadi Wijaya Rajeg¹, I Made Rajeg¹ & I Wayan Arka²

¹Universitas Udayana  ²Australian National University & Universitas Udayana

  • Abstract

Voice alternation between active (AV) and passive (PASS) clauses is viewed as a "meaning-preserving alternation" (Kroeger, 2005, p. 271). It means that AV and PASS clauses based on the same verb should convey the same kind of event/meaning (cf. (1) & (2)).

(1) Indonesian (ind_mixed_2012_1M-sentences.txt:755227)

murid Go bie-pay yang meng-(k)ena-kan baju warna hitam.

pupil NAME REL AV-hit-CAUS shirt colour black

'Go bie-pay's student who wears/puts on a black shirt'

(2) Indonesian (ind_mixed_2012_1M-sentences.txt:802596)

Gaun yang di-kena-kan ber-warna hitam

dress REL PASS-hit-CAUS have.colour black

'The dress that is worn/put on is black'

Examples (1) and (2) convey the same event of wearing clothing. The difference lies in the alignment of semantic roles and grammatical relations, especially as it affects the identity of the syntactic SUBJ(ect): in (2), the Theme (i.e. clothing) is PASS SUBJ, which is the direct OBJ(ect) in (1). The argument for the meaning-preserving status of AV-PASS alternation is typically illustrated using a pair of (often constructed) examples as in (1) and (2). Following up on our earlier work with the root kena 'hit' (Rajeg et al., 2020), we offer a usage-based, quantitative perspective in testing the meaning-preserving hypothesis in voice alternation, by bringing together evidence from (i) corpus analysis and (ii) a sentence-production experiment (cf. Dąbrowska, 2009; Newman & Sorenson Duncan, 2019, for a similar approach). We analysed the distribution of (non-)metaphoric senses of a set of Indonesian CAUSED FORWARD/BACKWARD motion verbs in AV-PASS alternation. Our study demonstrates that voice alternation can be sensitive to the senses of the verbs, given that a verb can be polysemous (cf. Bernolet & Colleman, 2016, for Dative alternation in Dutch). Quantitative findings indicate that voice alternation exhibits frequency effects (Diessel, 2016), such that certain senses strongly (dis)prefer one voice type over the other. These findings offer initial evidence for McDonnell's (2016) hypothesis on the role of semantic properties (e.g. senses) of a verb in accounting for the strong preference of that verb to occur in a given voice (cf. Gries & Stefanowitsch, 2004). Converging results between corpus and experimental data also suggest that speakers may store detailed semantic preferences of the verb in a given voice type, contributing to the idea of item-specific knowledge in usage-based Construction Grammar (Goldberg, 2006, pp. 49, 56; cf. Dąbrowska, 2009; Diessel, 2016).

References

Bernolet, S., & Colleman, T. (2016). Sense-based and lexeme-based alternation biases in the Dutch dative alternation. In J. Yoon & S. Th. Gries (Eds.), Corpus-based approaches to Construction Grammar (pp. 165-198). John Benjamins Publishing Company.

Dąbrowska, E. (2009). Words as constructions. In V. Evans & S. Pourcel (Eds.), New directions in cognitive linguistics (pp. 214-237). John Benjamins Pub. Co.

Diessel, H. (2016). Frequency and lexical specificity in grammar: A critical review. In H. Behrens & S. Pfänder (Eds.), Experience Counts: Frequency Effects in Language. De Gruyter. https://doi.org/10.1515/9783110346916-009

Goldberg, A. E. (2006). Constructions at work: The nature of generalization in language. Oxford University Press.

Gries, S. Th., & Stefanowitsch, A. (2004). Extending collostructional analysis: A corpus-based perspective on "alternations." International Journal of Corpus Linguistics, 9(1), 97-129.

Kroeger, P. R. (2005). Analyzing Grammar: An Introduction. Cambridge University Press.

McDonnell, B. (2016). Symmetrical voice constructions in Besemah: A usage-based approach [PhD dissertation, University of California, Santa Barbara]. https://www.alexandria.ucsb.edu/lib/ark:/48907/f3mp53bw

Newman, J., & Sorenson Duncan, T. (2019). The subject of ROAR in the mind and in the corpus: What divergent results can teach us. Linguistica Atlantica, 37(1), 1-27.

Rajeg, G. P. W., Rajeg, I. M., & Arka, I. W. (2020). Corpus-based approach meets LFG: Puzzling voice alternation in Indonesian. Paper presented at the 25th International Lexical-Functional Grammar Conference. Figshare. https://doi.org/10.6084/M9.FIGSHARE.12423788.V3

Week 7

Thursday 19th November 2020

2:00-3:00pm

Online: join mailing list or contact organisers to receive link

Affix substitution in Indonesian and its impact for discriminative learning

Karlina Denistia & R. Harald Baayen

Universität Tübingen

  • Abstract

This study explores computational modelling of two Indonesian nominal prefixes, PE- and PEN-, that realise a function similar to that of the English suffix -er (e.g., perenang 'swimmer' and penari 'dancer'). These prefixes are described as being very similar in form and meaning (Sneddon et al., 2010). Interestingly, PE- and PEN- often stand in a paradigmatic relation to verbal base words with the prefixes BER- and MEN- respectively (Dardjowidjojo, 1983). Thus, one could form sets of verb-noun pairs, such as berenang 'to swim' - perenang 'swimmer' and menari 'to dance' - penari 'dancer'. The central question addressed in the present study is whether the form similarities between PEN- (and its allomorphs) and MEN- (and its allomorphs) make this prefix easier to learn compared to PE-. To address this question, we made use of a computational model of lexical processing in the mental lexicon, the 'discriminative lexicon' (DL) model introduced by Baayen et al. (2019). Using data compiled from the Leipzig Corpora Collection, a written Indonesian corpus (Goldhahn et al., 2012), we trained the model on 2517 word forms that were inflected or derived variants of 99 different base words. Of these 2517 word forms, 109 were nouns with PE- and 221 were nouns with PEN-. Our results show that PE- is learnt somewhat better than PEN- for several reasons. As PE- is found to have a longer mean character length, it allows the model to discriminate better than PEN-. In the same vein, PEN- and MEN- compete for semantic cues, which makes the model less precise in predicting PEN-. Thus, the systematic paradigmatic similarities between PEN- and MEN- render these words more difficult for implicit lexical learning.

References

Baayen, R. H., Chuang, Y.-Y., Shafaei-Bajestan, E., and Blevins, J. P. (2019). The discriminative lexicon: A unified computational model for the lexicon and lexical processing in comprehension and production grounded not in (de)composition but in linear discriminative learning. Complexity, pages 1-39.

Dardjowidjojo, S. (1983). Some Aspects of Indonesian Linguistics. Djambatan, Jakarta.

Goldhahn, D., Eckart, T., and Quasthoff, U. (2012). Building large monolingual dictionaries at the Leipzig Corpora Collection: From 100 to 200 languages. In Proceedings of the Eighth International Conference on Language Resources and Evaluation, pages 1799-1802.

Sneddon, J. N., Adelaar, A., Djenar, D. N., and Ewing, M. C. (2010). Indonesian: A Comprehensive Grammar. Routledge, New York, second edition.

Week 6

Thursday 12th November 2020

2:00-3:00pm

Online: join mailing list or contact organisers to receive link

Measuring lexical complexity in L2 spoken production: Evidence from the Trinity Lancaster Corpus

Raffaella Bottini

CASS, Lancaster University

  • Abstract

The study validates lexical complexity measures for L2 spoken language using the 4.2-million-word Trinity Lancaster Corpus of L2 spoken English. Studies on learner language have shown that vocabulary knowledge is one of the best predictors of language use and overall proficiency (e.g. Milton, 2013). Different measures of vocabulary knowledge have been proposed in the field and lexical complexity plays a key role among them (e.g. Kim et al., 2018; Kyle & Crossley, 2015; Lu, 2012). However, little is known about different aspects of lexical complexity in L2 speech; also, there is no general agreement about which of the many existing complexity indices to use. This corpus-based study examines the reliability and validity of existing lexical measures - including indices which have not been validated before - and their relationship with learner characteristics (L1 and proficiency) and task-related features (topic familiarity). It introduces Lex Complexity Tool, a new tool which computes all the measures analysed and which includes a spoken wordlist from the Spoken BNC2014. The findings inform the choice of lexical indices tailored to research in second language acquisition and language testing, especially when L2 English speech is considered.

References

Kyle, K., & Crossley, S. A. (2015). Automatically assessing lexical sophistication: Indices, tools, findings, and application. TESOL Quarterly, 49(4), 757-786.

Lu, X. (2012). The relationship of lexical richness to the quality of ESL learners' oral narratives. The Modern Language Journal, 96(2), 190-208.

Milton, J. (2013). Measuring the contribution of vocabulary knowledge to proficiency in the four skills. In C. Bardel, C. Lindqvist, & B. Laufer (Eds.), L2 vocabulary acquisition, knowledge and use. New perspectives on assessment and corpus analysis (pp. 57-78). Eurosla.

This session is not going to be recorded

Week 5

Thursday 5th November 2020

2:00-3:00pm

Online: join mailing list or contact organisers to receive link

CLEC: Colombian Learner English Corpus

Maria Victoria Pardo, Antonio Tamayo, Manuel Alejandro Gómez & Nicolás Alberto Henao

Universidad del Norte

  • Abstract

The objective of this presentation is to introduce the CLEC (Colombian Learner English Corpus) to the research community. This corpus was created following the guidelines of computational corpus linguistics (McEnery & Hardie, 2011) and according to the compilation parameters of learner corpora, defined as "electronic collections of natural or almost natural data produced by foreign or second language learners (L2) and gathered according to explicit design criteria" (Granger, 2002, p. 7; Gilquin, 2015, p. 1). The TNT (Translation and New Technologies) research group of the University of Antioquia created the CLEC. It is an application that compiles 515 written compositions by university-level students of English as a foreign language. The application supports searching the tagged data, filters error labels systematically by category or type, and reveals trends in learner errors. The resulting product is a responsive web application that performs searches and analyses on the error-tagged corpus.

Week 4

Thursday 29th October 2020

12:00-1:00pm

Online: join mailing list or contact organisers to receive link

Trainee EFL teachers' DDL lesson planning: Improving corpus-focused TPACK in Indonesia

Peter Crosthwaite

University of Queensland, Australia

  • Abstract

The use of corpora for language teaching/learning, via teacher-prepared corpus-assisted materials development or learners' direct use of corpus query software (commonly known as "data-driven learning", DDL) is gaining in popularity in pre-tertiary EFL contexts. However, improving trainee English teachers' technological and pedagogical content knowledge (TPACK) regarding integration of corpus tools/DDL pedagogy into classroom practice has received little attention from a language teacher education perspective.

This qualitative study therefore reports on a DDL lesson planning intervention for pre-service secondary school EFL teachers in Indonesia. I explore how trainee language teachers integrate DDL into their lesson planning following DDL training, and whether the trainees' lesson plans demonstrate the TPACK required for successful future implementation. Nine pre-service EFL teacher trainees were enrolled in a teacher education program in Jakarta, Indonesia. The DDL training regimen included partial completion of a Short Private Online Course on DDL (Improving Writing Through Corpora, Crosthwaite, 2020) covering basic corpus techniques required for DDL (e.g. generating corpus queries, reading/manipulating concordances, understanding frequency information) using SkELL (Baisa & Suchomel, 2014) and Sketch Engine (Kilgarriff et al., 2014). Trainees then submitted a sample lesson plan which was scrutinized for components where corpus data could enhance the proposed lesson's materials or where learners could engage in direct corpus consultation/DDL. Three three-hour workshops on DDL were then conducted online via Zoom. Following these, trainees discussed their DDL training and lesson planning via Google Classroom chat, before working alone to create a new lesson plan including at least one direct DDL activity. Data include the researcher's initial feedback regarding integration of DDL into trainees' original (non-DDL) lesson plans, trainees' Google Classroom chat logs, and trainees' completed lesson plans involving DDL resources/activities. Harris et al.'s (2010) Technology Integration Observation Instrument was used to evaluate trainees' completed lesson plans for TPACK regarding curriculum goals and technologies; instructional strategies and technologies; technology selection(s); and 'fit'.

The data suggest trainees each integrated corpora/DDL into their lesson planning despite none reporting using a corpus prior to training. Submitted lesson plans featured DDL for language-related concerns (e.g. 'grammar focus'), and to support task-based genre-focused pedagogies as required by the Indonesian national curriculum. While submitted plans demonstrated high levels of 'fit' regarding curriculum goals and technology selection, some plans lacked DDL-relevant instructional strategies. However, TPACK scores for submitted lesson plans were generally high following only a short (but intensive) period of DDL training, underscoring the significant potential for integrating DDL into pre-tertiary classroom practice.

References

Baisa, V. & Suchomel, V. (2014). SkELL: Web interface for English language learning. In Horák, A. & Rychlý, P. (Eds.), Proceedings of Recent Advances in Slavonic Natural Language Processing. Karlova Studánka, Czech Republic, 5-7 December, 63-70.

Crosthwaite, P. (2020). Taking DDL online: Designing, implementing and evaluating a SPOC on data-driven learning for tertiary L2 writing. Australian Review of Applied Linguistics, 43(2), 169-195.

Harris, J., Grandgenett, N., & Hofer, M. (2010). Testing a TPACK-based technology integration assessment rubric. In C. D. Maddux, D. Gibson, & B. Dodge (Eds.), Research highlights in technology and teacher education (pp. 323-331). Chesapeake, VA: Society for Information Technology & Teacher Education (SITE).

Kilgarriff, A., Baisa, V., Bušta, J., Jakubíček, M., Kovář, V., Michelfeit, J., & Suchomel, V. (2014). The Sketch Engine: ten years on. Lexicography, 1(1), 7-36.