MSc Thesis: Contrastive Pre-Training for Radiology Reports
In recent years, transformer-based language models have proven highly successful in natural language processing (NLP). These models require large amounts of training data and are therefore typically pre-trained on unlabelled corpora with self-supervised objectives such as masked language modelling (MLM), as proposed in BERT [1]. While models like BioBERT [2] are pre-trained on medical-domain text, objectives such as MLM treat text as independent sentences and do not exploit the structure of medical documents. In this project we instead make use of the semi-structured nature of radiology reports and apply contrastive methods to the sections of these reports. Your task is to adapt such contrastive methods (e.g. SimCLR [3], BYOL [4], DINO [5]) so that they work effectively with language models.
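To give a rough flavour of the kind of method involved, the sketch below applies a simplified InfoNCE-style contrastive loss to pairs of report sections, treating the findings and impression sections of the same report as a positive pair and all other in-batch pairs as negatives. The encoder checkpoint, mean pooling, and pairing strategy are illustrative assumptions for this example, not a prescription for the project.

```python
# Illustrative sketch: an InfoNCE-style contrastive loss over report sections.
# The checkpoint, mean pooling, and section pairing below are assumptions
# chosen for the example, not part of the project specification.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder encoder
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(sections: list[str]) -> torch.Tensor:
    """Encode section strings into mean-pooled embeddings of shape (B, H)."""
    batch = tokenizer(sections, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state            # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()   # (B, T, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)    # average over real tokens

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Matching rows of z1/z2 are positives; all other in-batch pairs are negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / temperature                       # (B, B) cosine similarities
    targets = torch.arange(z1.size(0))                     # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage: findings/impression sections of the same (made-up) report form a pair.
findings = ["Mild cardiomegaly. No focal consolidation.",
            "Lungs are clear bilaterally. No pleural effusion."]
impressions = ["Stable mild cardiomegaly.",
               "No acute cardiopulmonary process."]
loss = info_nce(embed(findings), embed(impressions))
```

Note that BYOL and DINO replace the explicit in-batch negatives with a momentum ("teacher") encoder; how best to carry such asymmetric designs over to language models is exactly the open question of this project.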
What we offer
- Close supervision and access to state-of-the-art computer hardware
- A strong research group with extensive practical experience
- Cutting-edge research in Medical NLP with the opportunity to publish your work
What we expect
- Advanced programming skills in Python and deep learning frameworks such as PyTorch, JAX, or TensorFlow
- Strong background in deep learning, ideally (but not required) with experience in NLP
- Basic familiarity with self-supervised methods such as SimCLR is a plus, but not required
- [1] J. Devlin et al. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding." arXiv preprint [arXiv:1810.04805] (2018).
- [2] J. Lee et al. "BioBERT: a pre-trained biomedical language representation model for biomedical text mining." Bioinformatics 36(4) [link] (2020).
- [3] T. Chen et al. "Big Self-Supervised Models are Strong Semi-Supervised Learners." NeurIPS [arXiv:2006.10029] (2020).
- [4] J. Grill et al. "Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning." NeurIPS [link] (2020).
- [5] M. Caron et al. "Emerging Properties in Self-Supervised Vision Transformers." ICCV [arXiv:2104.14294] (2021).