Previous seminars

Seminars from previous years are still being added; the archive remains available on the old website.



Week 8

Thursday 26th November 2020


Online: join mailing list or contact organisers to receive link

Usage-based perspective on the meaning-preserving hypothesis in voice alternation: Corpus linguistic and experimental studies in Indonesian

Gede Primahadi Wijaya RAJEG¹, I Made RAJEG¹ & I Wayan Arka²

¹Universitas Udayana  ²Australian National University & Universitas Udayana

  • Abstract

Voice alternation between active (AV) and passive (PASS) clauses is viewed as a "meaning-preserving alternation" (Kroeger, 2005, p. 271). That is, AV and PASS clauses based on the same verb should convey the same kind of event or meaning (cf. (1) and (2)).

(1) Indonesian (ind_mixed_2012_1M-sentences.txt:755227)

murid Go bie-pay yang meng-(k)ena-kan baju warna hitam.

pupil NAME REL AV-hit-CAUS shirt colour black

'Go bie-pay's student who wears/puts on a black shirt'

(2) Indonesian (ind_mixed_2012_1M-sentences.txt:802596)

Gaun yang di-kena-kan ber-warna hitam

dress REL PASS-hit-CAUS have.colour black

'The dress that is worn/put on is black'

Examples (1) and (2) convey the same event of wearing clothing. The difference lies in the alignment of semantic roles and grammatical relations, especially as it affects the identity of the syntactic SUBJ(ect): in (2), the Theme (i.e. the clothing) is the PASS SUBJ, whereas in (1) it is the direct OBJ(ect). Arguments for the meaning-preserving status of the AV-PASS alternation are typically illustrated with a pair of (often constructed) examples such as (1) and (2). Following up on our earlier work with the root kena 'hit' (Rajeg et al., 2020), we offer a usage-based, quantitative perspective on the meaning-preserving hypothesis in voice alternation, bringing together evidence from (i) corpus analysis and (ii) a sentence-production experiment (cf. Dąbrowska, 2009; Newman & Sorenson Duncan, 2019, for a similar approach). We analysed the distribution of the (non-)metaphoric senses of a set of Indonesian CAUSED FORWARD/BACKWARD motion verbs in the AV-PASS alternation. Our study demonstrates that voice alternation can be sensitive to the senses of a verb, given that a verb can be polysemous (cf. Bernolet & Colleman, 2016, for the dative alternation in Dutch). Quantitative findings indicate that voice alternation exhibits frequency effects (Diessel, 2016), such that certain senses strongly prefer or disprefer one voice type over the other. These findings offer initial evidence for McDonnell's (2016) hypothesis on the role of a verb's semantic properties (e.g. its senses) in accounting for that verb's strong preference to occur in a given voice (cf. Gries & Stefanowitsch, 2004). Converging results from the corpus and experimental data also suggest that speakers may store detailed semantic preferences of a verb in a given voice type, supporting the idea of item-specific knowledge in usage-based Construction Grammar (Goldberg, 2006, pp. 49, 56; cf. Dąbrowska, 2009; Diessel, 2016).
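The abstract reports that certain senses strongly prefer or disprefer one voice. In the collostructional line of work it cites (Gries & Stefanowitsch, 2004), such a preference is commonly quantified with a one-tailed Fisher exact test on a 2x2 table crossing the target sense (vs. all other senses of the verb) with voice (AV vs. PASS). The following is a minimal sketch of that kind of test; the function names and the counts in the usage note are hypothetical, and this is not necessarily the exact procedure used in the study.

```python
from __future__ import annotations

from math import comb, log10


def fisher_one_tailed(a: int, b: int, c: int, d: int) -> float:
    """One-tailed Fisher exact test for the 2x2 table

                      AV    PASS
        target sense   a     b
        other senses   c     d

    Returns P(X >= a): the hypergeometric probability of seeing at least
    `a` AV uses of the target sense if sense and voice were independent.
    """
    row1 = a + b        # all uses of the target sense
    col1 = a + c        # all AV uses
    n = a + b + c + d   # grand total
    upper = min(row1, col1)
    # Sum the hypergeometric probabilities of every table at least as extreme.
    hits = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return hits / comb(n, col1)


def voice_preference(av: int, pass_: int, other_av: int, other_pass: int) -> tuple[str, float]:
    """Return the voice the target sense leans towards and -log10(p) as its strength."""
    n = av + pass_ + other_av + other_pass
    expected_av = (av + pass_) * (av + other_av) / n
    if av >= expected_av:
        return "AV", -log10(fisher_one_tailed(av, pass_, other_av, other_pass))
    # Swap the columns so the same one-tailed test measures attraction to PASS.
    return "PASS", -log10(fisher_one_tailed(pass_, av, other_pass, other_av))
```

For a hypothetical sense attested 8 times in AV and 2 times in PASS, against 2 AV and 8 PASS uses of the other senses, `voice_preference(8, 2, 2, 8)` returns `"AV"` with a strength of about 1.94 (p of about 0.0115). Reporting -log10(p) as a strength score follows common collostructional practice, so larger values mean a stronger preference.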


Bernolet, S., & Colleman, T. (2016). Sense-based and lexeme-based alternation biases in the Dutch dative alternation. In J. Yoon & S. Th. Gries (Eds.), Corpus-based approaches to Construction Grammar (pp. 165-198). John Benjamins Publishing Company.

Dąbrowska, E. (2009). Words as constructions. In V. Evans & S. Pourcel (Eds.), New directions in cognitive linguistics (pp. 214-237). John Benjamins Publishing Company.

Diessel, H. (2016). Frequency and lexical specificity in grammar: A critical review. In H. Behrens & S. Pfänder (Eds.), Experience Counts: Frequency Effects in Language. De Gruyter.

Goldberg, A. E. (2006). Constructions at work: The nature of generalization in language. Oxford University Press.

Gries, S. Th., & Stefanowitsch, A. (2004). Extending collostructional analysis: A corpus-based perspective on "alternations." International Journal of Corpus Linguistics, 9(1), 97-129.

Kroeger, P. R. (2005). Analyzing Grammar: An Introduction. Cambridge University Press.

McDonnell, B. (2016). Symmetrical voice constructions in Besemah: A usage-based approach [PhD dissertation, University of California, Santa Barbara].

Newman, J., & Sorenson Duncan, T. (2019). The subject of ROAR in the mind and in the corpus: What divergent results can teach us. Linguistica Atlantica, 37(1), 1-27.

Rajeg, G. P. W., Rajeg, I. M., & Arka, I. W. (2020). Corpus-based approach meets LFG: Puzzling voice alternation in Indonesian. Paper presented at the 25th International Lexical-Functional Grammar Conference. Figshare.

Week 7

Thursday 19th November 2020


Online: join mailing list or contact organisers to receive link

Affix substitution in Indonesian and its impact for discriminative learning

Karlina Denistia & R. Harald Baayen

Universität Tübingen

  • Abstract

This study applies computational modelling to two Indonesian nominal prefixes, PE- and PEN-, which serve a function similar to that of the English suffix -er (e.g., perenang 'swimmer' and penari 'dancer'). These prefixes are described as very similar in form and meaning (Sneddon et al., 2010). Interestingly, PE- and PEN- often stand in a paradigmatic relation to verbal base words with the prefixes BER- and MEN-, respectively (Dardjowidjojo, 1983). Thus, one can form sets of verb-noun pairs, such as berenang 'to swim' and perenang 'swimmer', or menari 'to dance' and penari 'dancer'. The central question addressed in the present study is whether the formal similarity between PEN- (and its allomorphs) and MEN- (and its allomorphs) makes PEN- easier to learn than PE-. To address this question, we used a computational model of lexical processing in the mental lexicon, the 'discriminative lexicon' (DL) model introduced by Baayen et al. (2019). Using written Indonesian data compiled from the Leipzig Corpora Collection (Goldhahn et al., 2012), we trained the model on 2517 word forms that were inflected or derived variants of 99 different base words. Of these 2517 word forms, 109 were nouns with PE- and 221 were nouns with PEN-. Our results show that PE- is learnt somewhat better than PEN-, for several reasons. As PE- was found to have a longer mean character length, the model discriminates it better than PEN-. In the same vein, PEN- and MEN- compete over semantic cues, which lowers the model's precision in predicting PEN-. Thus, the systematic paradigmatic similarities between PEN- and MEN- make these words more difficult for implicit lexical learning.
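The DL model learns linear mappings from form cues to meanings, so words that share many form cues must be told apart by the few cues they do not share. A closely related, incremental way to learn such a cue-to-meaning mapping is the Widrow-Hoff (delta) rule, which this minimal sketch uses in place of the DL model's own matrix estimation. It assumes letter trigrams as form cues and uses hypothetical one-label outcomes (SWIM, DANCE, AGENT) as stand-ins for the model's semantic vectors; the four words come from the abstract, everything else is illustrative.

```python
from __future__ import annotations

from collections import defaultdict


def trigrams(word: str) -> list[str]:
    """Letter trigrams with '#' marking word boundaries."""
    padded = f"#{word}#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def train_widrow_hoff(events, outcomes, rate=0.01, epochs=200):
    """Incrementally learn cue->outcome weights with the Widrow-Hoff (delta) rule.

    events:   list of (word, set of outcome labels present for that word)
    outcomes: every outcome label the model can predict
    """
    weights = defaultdict(float)  # (cue, outcome) -> association weight
    for _ in range(epochs):
        for word, present in events:
            cues = set(trigrams(word))
            for outcome in outcomes:
                # Prediction error: target (1 if present, else 0) minus current activation.
                act = sum(weights[(c, outcome)] for c in cues)
                error = (1.0 if outcome in present else 0.0) - act
                for c in cues:
                    weights[(c, outcome)] += rate * error
    return weights


def activation(weights, word, outcome):
    """Summed support from a word's trigram cues for one outcome."""
    return sum(weights.get((c, outcome), 0.0) for c in set(trigrams(word)))


# Hypothetical toy events built from the abstract's example pairs.
EVENTS = [
    ("berenang", {"SWIM"}),
    ("perenang", {"SWIM", "AGENT"}),
    ("menari", {"DANCE"}),
    ("penari", {"DANCE", "AGENT"}),
]
OUTCOMES = ["SWIM", "DANCE", "AGENT"]
```

After `w = train_widrow_hoff(EVENTS, OUTCOMES)`, `activation(w, "penari", "AGENT")` exceeds `activation(w, "menari", "AGENT")`. Because menari and penari share the cues `ena`, `nar`, `ari` and `ri#`, only `#pe`/`pen` versus `#me`/`men` carry the agentive contrast, a small-scale picture of the kind of cue competition the abstract describes for PEN- and MEN-.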


Baayen, R. H., Chuang, Y.-Y., Shafaei-Bajestan, E., and Blevins, J. P. (2019). The discriminative lexicon: A unified computational model for the lexicon and lexical processing in comprehension and production grounded not in (de)composition but in linear discriminative learning. Complexity, pages 1-39.

Dardjowidjojo, S. (1983). Some Aspects of Indonesian Linguistics. Djambatan, Jakarta.

Goldhahn, D., Eckart, T., and Quasthoff, U. (2012). Building large monolingual dictionaries at the Leipzig Corpora Collection: From 100 to 200 languages. In Proceedings of the Eighth International Conference on Language Resources and Evaluation, pages 1799-1802.

Sneddon, J. N., Adelaar, A., Djenar, D. N., and Ewing, M. C. (2010). Indonesian: A Comprehensive Grammar. Routledge, New York, second edition.

Week 6

Thursday 12th November 2020


Online: join mailing list or contact organisers to receive link

Measuring lexical complexity in L2 spoken production: Evidence from the Trinity Lancaster Corpus

Raffaella Bottini

CASS, Lancaster University

  • Abstract

The study validates lexical complexity measures for L2 spoken language using the 4.2-million-word Trinity Lancaster Corpus of L2 spoken English. Studies on learner language have shown that vocabulary knowledge is one of the best predictors of language use and overall proficiency (e.g. Milton, 2013). Different measures of vocabulary knowledge have been proposed in the field, and lexical complexity plays a key role among them (e.g. Kim et al., 2018; Kyle & Crossley, 2015; Lu, 2012). However, little is known about the different aspects of lexical complexity in L2 speech, and there is no general agreement on which of the many existing complexity indices to use. This corpus-based study examines the reliability and validity of existing lexical measures (including indices that have not previously been validated) and their relationship with learner characteristics (L1 and proficiency) and task-related features (topic familiarity). It also introduces the Lex Complexity Tool, a new tool that computes all the measures analysed and includes a spoken wordlist from the Spoken BNC2014. The findings inform the choice of lexical indices suited to research in second language acquisition and language testing, especially when L2 English speech is considered.
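The abstract does not list the specific indices, so as a hedged illustration of what a lexical complexity measure can look like, here is a minimal sketch of two common kinds: lexical diversity as a type-token ratio (TTR), and lexical sophistication as the share of tokens outside a high-frequency reference list. The naive tokenizer and the small reference list are hypothetical stand-ins (the latter for something like a Spoken BNC2014 frequency list); this is not the Lex Complexity Tool's implementation.

```python
from __future__ import annotations

import re


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a deliberately naive tokenizer for illustration."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens: list[str]) -> float:
    """Lexical diversity: distinct word types divided by running tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def sophistication(tokens: list[str], frequent_words: set[str]) -> float:
    """Lexical sophistication: share of tokens NOT found in a high-frequency list."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t not in frequent_words) / len(tokens)
```

For the utterance "I think the city is big and the city is busy", the tokenizer yields 11 tokens and 8 types, so the TTR is 8/11; with the hypothetical frequent list `{"i", "think", "the", "is", "and", "big"}`, the sophistication score is 3/11 ("city" twice and "busy"). Raw TTR falls as texts get longer, because frequent words recur, which is one reason the reliability and validity of such indices need to be checked before they are compared across speakers and tasks.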

Kyle, K., & Crossley, S. A. (2015). Automatically assessing lexical sophistication: Indices, tools, findings, and application. TESOL Quarterly, 49(4), 757-786.

Lu, X. (2012). The relationship of lexical richness to the quality of ESL learners' oral narratives. The Modern Language Journal, 96(2), 190-208.

Milton, J. (2013). Measuring the contribution of vocabulary knowledge to proficiency in the four skills. In C. Bardel, C. Lindqvist, & B. Laufer (Eds.), L2 vocabulary acquisition, knowledge and use. New perspectives on assessment and corpus analysis (pp. 57-78). Eurosla.

This session will not be recorded.

Week 5

Thursday 5th November 2020


Online: join mailing list or contact organisers to receive link

CLEC: Colombian Learner English Corpus

Maria Victoria Pardo, Antonio Tamayo, Manuel Alejandro Gómez & Nicolás Alberto Henao

Universidad del Norte

  • Abstract

This presentation introduces the CLEC (Colombian Learner English Corpus) to the research community. The corpus was created following the guidelines of computational corpus linguistics (McEnery & Hardie, 2011) and the compilation parameters for learner corpora, which are defined as "electronic collections of natural or almost natural data produced by foreign or second language learners (L2) and gathered according to explicit design criteria" (Granger, 2002, p. 7; Gilquin, 2015, p. 1). The CLEC was created by the TNT (Translation and New Technologies) research group at the University of Antioquia. It is an application that compiles 515 written compositions by university-level students of English as a foreign language. The application supports searches of the tagged data, filters error tags systematically by category or type, and helps users identify trends in learner errors. The resulting product is a responsive web application that performs searches and analyses on the error-tagged corpus.

Week 4

Thursday 29th October 2020


Online: join mailing list or contact organisers to receive link

Trainee EFL teachers' DDL lesson planning: Improving corpus-focused TPACK in Indonesia

Peter Crosthwaite

University of Queensland, Australia

  • Abstract

The use of corpora for language teaching/learning, via teacher-prepared corpus-assisted materials development or learners' direct use of corpus query software (commonly known as "data-driven learning", DDL), is gaining in popularity in pre-tertiary EFL contexts. However, improving trainee English teachers' technological and pedagogical content knowledge (TPACK) regarding the integration of corpus tools/DDL pedagogy into classroom practice has received little attention from a language teacher education perspective.

This qualitative study therefore reports on a DDL lesson planning intervention for pre-service secondary school EFL teachers in Indonesia. I explore how trainee language teachers integrate DDL into their lesson planning following DDL training, and whether the trainees' lesson plans (LPs) demonstrate the TPACK required for successful future implementation. The participants were nine pre-service EFL teacher trainees enrolled in a teacher education program in Jakarta, Indonesia. The DDL training regimen included partial completion of a Short Private Online Course (SPOC) on DDL (Improving Writing Through Corpora; Crosthwaite, 2020) covering the basic corpus techniques required for DDL (e.g. generating corpus queries, reading/manipulating concordances, understanding frequency information) using SkELL (Baisa & Suchomel, 2014) and Sketch Engine (Kilgarriff et al., 2014). Trainees then submitted a sample lesson plan, which was scrutinized for components where corpus data could enhance the proposed lesson's materials or where learners could engage in direct corpus consultation/DDL. Three three-hour workshops on DDL were then conducted online via Zoom. Following these, trainees discussed their DDL training and lesson planning via Google Classroom chat, before working alone to create a new lesson plan including at least one direct DDL activity. Data include the researcher's initial feedback on integrating DDL into the trainees' original (non-DDL) lesson plans, the trainees' Google Classroom chat logs, and the trainees' completed lesson plans involving DDL resources/activities. Harris et al.'s (2010) Technology Integration Observation Instrument was used to evaluate the trainees' completed lesson plans for TPACK regarding curriculum goals and technologies; instructional strategies and technologies; technology selection(s); and 'fit'.

The data suggest that each trainee integrated corpora/DDL into their lesson planning, even though none reported having used a corpus before the training. Submitted lesson plans featured DDL for language-related concerns (e.g. 'grammar focus') and to support the task-based, genre-focused pedagogies required by the Indonesian national curriculum. While the submitted plans demonstrated high levels of 'fit' regarding curriculum goals and technology selection, some plans lacked DDL-relevant instructional strategies. Nonetheless, TPACK scores for the submitted lesson plans were generally high after only a short (but intensive) period of DDL training, underscoring the significant potential for integrating DDL into pre-tertiary classroom practice.


Baisa, V., & Suchomel, V. (2014). SkELL: Web interface for English language learning. In A. Horák & P. Rychlý (Eds.), Proceedings of Recent Advances in Slavonic Natural Language Processing (pp. 63-70). Karlova Studánka, Czech Republic, 5-7 December.

Crosthwaite, P. (2020). Taking DDL online: Designing, implementing and evaluating a SPOC on data-driven learning for tertiary L2 writing. Australian Review of Applied Linguistics, 43(2), 169-195.

Harris, J., Grandgenett, N., & Hofer, M. (2010). Testing a TPACK-based technology integration assessment rubric. In C. D. Maddux, D. Gibson, & B. Dodge (Eds.), Research highlights in technology and teacher education (pp. 323-331). Chesapeake, VA: Society for Information Technology & Teacher Education (SITE).

Kilgarriff, A., Baisa, V., Bušta, J., Jakubíček, M., Kovář, V., Michelfeit, J., & Suchomel, V. (2014). The Sketch Engine: ten years on. Lexicography, 1(1), 7-36.