Affix substitution in Indonesian and its impact for discriminative learning

Karlina Denistia & R. Harald Baayen

Universität Tübingen

This study applies computational modelling to two Indonesian nominal prefixes, PE- and PEN-, which realize a function similar to that of the English suffix -er (e.g., perenang 'swimmer' and penari 'dancer'). These prefixes are described as very similar in form and meaning (Sneddon et al., 2010). Interestingly, PE- and PEN- often stand in a paradigmatic relation to verbal base words with the prefixes BER- and MEN- respectively (Dardjowidjojo, 1983). Thus, one can form verb-noun pairs such as berenang 'to swim' - perenang 'swimmer' and menari 'to dance' - penari 'dancer'. The central question addressed in the present study is whether the form similarities between PEN- (and its allomorphs) and MEN- (and its allomorphs) make PEN- easier to learn than PE-. To address this question, we made use of a computational model of lexical processing in the mental lexicon, the 'discriminative lexicon' (DL) model introduced by Baayen et al. (2019). Using data compiled from the written Indonesian corpora of the Leipzig Corpora Collection (Goldhahn et al., 2012), we trained the model on 2517 word forms that were inflected or derived variants of 99 different base words. Of these 2517 word forms, 109 were nouns with PE- and 221 were nouns with PEN-. Our results show that PE- is learnt somewhat better than PEN-, for several reasons. Words with PE- have a longer mean length in characters, which allows the model to discriminate them better than words with PEN-. In the same vein, words with PEN- and MEN- are subject to competition between semantic cues, which makes the model less precise in predicting PEN-. Thus, the systematic paradigmatic similarities between PEN- and MEN- render these words more difficult for implicit lexical learning.
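As a rough illustration of the kind of form-to-meaning mapping that linear discriminative learning estimates, the sketch below fits a comprehension mapping on the four example words from the abstract. It is a minimal sketch under stated assumptions, not the authors' implementation or data: the four-word lexicon, the use of letter trigrams as form cues, and the binary semantic features (SWIM, DANCE, VERB, AGENT) are all hypothetical choices made for illustration, whereas the study itself used 2517 corpus-derived word forms.

```python
import numpy as np

def trigrams(word):
    """Return the letter trigrams of a word, with '#' marking word boundaries."""
    padded = f"#{word}#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

# Toy lexicon (assumption): the four forms cited in the abstract, each paired
# with hypothetical binary semantic features (a lexeme plus a function label).
lexicon = {
    "berenang": ["SWIM", "VERB"],
    "perenang": ["SWIM", "AGENT"],
    "menari": ["DANCE", "VERB"],
    "penari": ["DANCE", "AGENT"],
}

words = list(lexicon)
cues = sorted({t for w in words for t in trigrams(w)})
features = sorted({f for fs in lexicon.values() for f in fs})

# C: words x trigram cues, 1.0 where the cue occurs in the word.
C = np.array([[1.0 if c in trigrams(w) else 0.0 for c in cues] for w in words])
# S: words x semantic dimensions (binary here purely for illustration).
S = np.array([[1.0 if f in lexicon[w] else 0.0 for f in features] for w in words])

# Comprehension mapping F: least-squares solution of C @ F = S,
# obtained with the Moore-Penrose pseudoinverse of C.
F = np.linalg.pinv(C) @ S
S_hat = C @ F  # predicted semantic vectors

def predicted_word(i):
    """Return the word whose semantic vector correlates best with prediction i."""
    corrs = [np.corrcoef(S_hat[i], S[j])[0, 1] for j in range(len(words))]
    return words[int(np.argmax(corrs))]

accuracy = np.mean([predicted_word(i) == w for i, w in enumerate(words)])
print(f"comprehension accuracy on the toy lexicon: {accuracy:.2f}")
```

In this toy setting every word has at least one trigram of its own, so the mapping recovers each semantic vector exactly. Competition of the kind discussed in the abstract would only show up in a larger lexicon, where many words share cues such as "ena" or "nar" and the mapping has to trade their meanings off against each other.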

References

Baayen, R. H., Chuang, Y.-Y., Shafaei-Bajestan, E., and Blevins, J. P. (2019). The discriminative lexicon: A unified computational model for the lexicon and lexical processing in comprehension and production grounded not in (de)composition but in linear discriminative learning. Complexity, pages 1-39.

Dardjowidjojo, S. (1983). Some Aspects of Indonesian Linguistics. Djambatan, Jakarta.

Goldhahn, D., Eckart, T., and Quasthoff, U. (2012). Building large monolingual dictionaries at the Leipzig Corpora Collection: From 100 to 200 languages. In Proceedings of the Eighth International Conference on Language Resources and Evaluation, pages 1799-1802.

Sneddon, J. N., Adelaar, A., Djenar, D. N., and Ewing, M. C. (2010). Indonesian: A Comprehensive Grammar. Routledge, New York, second edition.

Week 7 2020/2021

Thursday 19th November 2020
2:00-3:00pm

Online: join mailing list or contact organisers to receive link