Vision-Language Pretraining for Bone Tumor Classification

Abstract:

Classifying bone tumors is challenging, even for expert radiologists, because the visual differences among tumor entities are subtle. This thesis aims to improve diagnostic performance by using vision-language pretraining to classify bone tumors from X-ray images. By pretraining on large datasets and incorporating anatomical context through captions, it seeks to address two key limitations in the field of bone tumors: data scarcity and anatomical heterogeneity.

Methodology:

  • Review the literature on state-of-the-art techniques for bone tumor classification and self-supervised vision-language pretraining.
  • Implement a supervised learning model for bone tumor classification from X-ray images, which will serve as a baseline.
  • Pretrain a vision-language model in a self-supervised manner to serve as a general-purpose model for downstream tasks.
  • Evaluate several fine-tuning strategies for bone tumor classification and assess the model's zero-shot capabilities.
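
To give applicants a feel for the pretraining and zero-shot steps, the following is a minimal sketch of a symmetric image-text contrastive loss in the style of Radford et al. (see References), followed by zero-shot classification via similarity to text prompts. It is an illustration only, not the method fixed for this thesis: the function names, the temperature value of 0.07, the toy two-dimensional embeddings, and the labels "entity_a" and "entity_b" are all assumptions made for the example. A real implementation would most likely use a deep learning framework and learned image and text encoders; plain Python is used here only to keep the sketch self-contained.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def similarity_logits(image_embs, text_embs, temperature=0.07):
    """Cosine similarity of every image with every text, divided by a temperature."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    return [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct column for row k is column k."""
    total = 0.0
    for k, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[k]
    return total / len(logits)


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric loss: image-to-text and text-to-image, averaged.

    Assumes image_embs[k] and text_embs[k] come from the same X-ray/caption pair.
    """
    logits = similarity_logits(image_embs, text_embs, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))


def zero_shot_predict(image_emb, prompt_embs, labels):
    """Return the label whose text-prompt embedding is most similar to the image."""
    logits = similarity_logits([image_emb], prompt_embs)[0]
    best = max(range(len(labels)), key=lambda k: logits[k])
    return labels[best]


# Toy embeddings standing in for encoder outputs (hypothetical values).
image_embs = [[1.0, 0.0], [0.0, 1.0]]
caption_embs = [[1.0, 0.0], [0.0, 1.0]]
print("loss with matched pairs:", contrastive_loss(image_embs, caption_embs))
print("loss with swapped captions:", contrastive_loss(image_embs, caption_embs[::-1]))

# Zero-shot: compare one image against one prompt embedding per hypothetical class.
labels = ["entity_a", "entity_b"]
print("predicted:", zero_shot_predict([0.9, 0.1], caption_embs, labels))
```

The loss is small when each image is most similar to its own caption and large when captions are swapped, which is the signal that drives pretraining. Zero-shot classification reuses the same similarity computation, so no labeled bone tumor images are needed at that step.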

Prerequisites:

  • Advanced knowledge of deep learning with imaging data.
  • Beneficial but not necessary: experience in medicine/oncology.
  • Preferred starting date: January-February 2025 (with flexibility).

What we offer:

  • Very rare medical data with high potential for publication.
  • Highly qualified, interdisciplinary environment.
  • Top-level hardware for scientific computing.
  • Constant feedback from medical and computer science experts.

How to apply:

Send an email to anna.curto-vilalta@tum.de with your CV and a short introduction about yourself and your motivation.

References:

A. Radford et al., “Learning Transferable Visual Models From Natural Language Supervision,” arXiv:2103.00020, Feb. 26, 2021, doi: 10.48550/arXiv.2103.00020.

H. Q. Vo et al., “Frozen Large-scale Pretrained Vision-Language Models are the Effective Foundational Backbone for Multimodal Breast Cancer Prediction,” in IEEE Journal of Biomedical and Health Informatics, doi: 10.1109/JBHI.2024.3507638.


Anna Curto Vilalta
PhD Student

Multi-Modal Deep Learning in Medical Imaging.