The Application of Natural Language Processing in a Study of LGBTQ+ Cancer Experiences

Daisy Harvey

Spectrum Centre for Mental Health Research, Lancaster University

Current research suggests that there are knowledge gaps in institutional practices towards lesbian, gay, bisexual, transgender, and queer/questioning (LGBTQ+) cancer patients, and that the LGBTQ+ community represents a 'growing and medically underserved population' (Quinn et al. 2015). In the context of cancer care, quantitative evidence shows disparities in cancer outcomes between LGBTQ+ cancer patients and their 'heterosexual and cisgender counterparts' (Kamen et al. 2019), but there is a lack of qualitative research identifying where services fall short from the perspective of LGBTQ+ service users.

This research demonstrates how the internet can be used to gain insight into underserved populations, and to gather empirical evidence and candid accounts from service users who might fear stigma or mistreatment in offline settings. The research explores the practical application of natural language processing (NLP) and presents a methodology that encompasses web scraping, corpus creation, data annotation and anonymisation, a hybrid system for emotion detection that combines the NRC emotion intensity lexicon (Mohammad 2017) with machine learning methods, and topic modelling using Latent Dirichlet Allocation (LDA). The emotion detection classifier achieves a micro F1 score of 65%, and 8 clusters of topics emerge from the topic modelling task. These topics provide insights that provoke further discussion, particularly within the theme of Diagnosis, Treatment and Sexuality, where excerpts state that 'LGBT people with cancer can face discrimination and disqualification', and 'healthcare resources are all based on heteronormative assumptions'.
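For readers unfamiliar with the micro F1 metric reported above, the following is a minimal sketch of how a micro-averaged F1 score can be computed. It is not the study's evaluation code: the function name `micro_f1`, the representation of each item's labels as a set, and the toy gold and predicted labels are all assumptions made for illustration.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1: pool true/false positives and false negatives
    over all items and labels, then compute a single F1 score."""
    tp = fp = fn = 0
    for gold_labels, predicted_labels in zip(gold, predicted):
        g, p = set(gold_labels), set(predicted_labels)
        tp += len(g & p)   # labels predicted and correct
        fp += len(p - g)   # labels predicted but not in the gold annotation
        fn += len(g - p)   # gold labels the classifier missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy annotations (hypothetical, not taken from the study's corpus).
gold = [{"fear"}, {"sadness", "fear"}, {"joy"}, {"anger"}]
predicted = [{"fear"}, {"sadness"}, {"sadness"}, {"anger", "fear"}]

score = micro_f1(gold, predicted)
print(f"micro F1 = {score:.2f}")
```

Because the counts are pooled before averaging, frequent emotion labels contribute more to a micro F1 score than rare ones, which is worth bearing in mind when interpreting a single headline figure.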


Kamen, C.S., Alpert, A., Margolies, L., Griggs, J.J., Darbes, L., Smith-Stoner, M., Lytle, M., Poteat, T., Scout, N.F.N. and Norton, S.A. (2019). "Treat us with dignity": a qualitative study of the experiences and recommendations of lesbian, gay, bisexual, transgender, and queer (LGBTQ) patients with cancer. Supportive Care in Cancer, 27(7), 2525-2532.

Mohammad, S. M. (2017). Word affect intensities. arXiv preprint arXiv:1704.08798.

Quinn, G.P., Sanchez, J.A., Sutton, S.K., Vadaparampil, S.T., Nguyen, G.T., Green, B.L., Kanetsky, P.A. and Schabath, M.B. (2015). Cancer and lesbian, gay, bisexual, transgender/transsexual, and queer/questioning (LGBTQ) populations. CA: a cancer journal for clinicians, 65(5), 384-400.

Week 14 2020/2021

Thursday 4th February 2021

Online: join mailing list or contact organisers to receive link