MSc Thesis: Large Language Models in Medicine

Description:

Large Language Models (LLMs) have shown exceptional capabilities in understanding and generating human-like text. In the medical field, these models hold the potential to revolutionize patient care, medical research, and healthcare administration. By leveraging LLMs, we can enhance diagnostic accuracy, streamline administrative tasks, and provide personalized patient care. However, the integration of these models into medical practice poses unique challenges and opportunities that warrant thorough investigation.

Objective:

The primary objective of this thesis is to develop and evaluate AI systems powered by Large Language Models that aid medical professionals in their work. Specific research topics and applications will be agreed upon with the student, allowing for a tailored approach that aligns with the student’s interests and expertise. Potential areas of focus include (but are not limited to):

  • LLMs and Knowledge Graphs: Integrating LLMs with medical knowledge graphs to improve information retrieval and decision support.
  • Retrieval-Augmented Generation: Enhancing the generation capabilities of LLMs by incorporating external data sources.
  • Explainability: Developing methods to interpret and explain the outputs of LLMs to ensure transparency and trustworthiness in medical applications.
  • Uncertainty Quantification: Quantifying and communicating the uncertainty in LLM predictions to assist medical professionals in making informed decisions.
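To make the retrieval-augmented generation topic above more concrete, here is a minimal sketch of its retrieval step in Python. It is illustrative only and not the method used in the thesis: the toy corpus `DOCS`, the function names, and the bag-of-words cosine similarity are all assumptions chosen for brevity. A real system would more likely use learned embeddings and pass the resulting prompt to an LLM.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"\w+", text.lower())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda d: cosine_similarity(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 1) -> str:
    """Prepend the retrieved passages to the question (the 'augmented' prompt)."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Hypothetical toy corpus, for illustration only (not clinical data).
DOCS = [
    "Metformin is a first-line medication for type 2 diabetes.",
    "Amoxicillin is an antibiotic used to treat bacterial infections.",
    "Ibuprofen is a nonsteroidal anti-inflammatory drug.",
]

if __name__ == "__main__":
    print(build_prompt("Which medication is used for type 2 diabetes?", DOCS))
```

The prompt returned by `build_prompt` would then be sent to a language model, so that the generated answer can draw on the retrieved passage rather than only on the model's parameters.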

Methodology:

  1. Literature review: A comprehensive literature review will be conducted to understand the current state of LLMs in medicine from the chosen perspective.
  2. Data collection and preprocessing: Utilizing real clinical data in German, provided by the institution. The data will be preprocessed and anonymized to ensure compliance with privacy regulations. If needed, this will be extended with public data.
  3. Implementation and comparison of approaches: Exploring, implementing, and comparing different approaches, including training and fine-tuning LLMs.
  4. Evaluation: Assessing the performance of the developed systems through quantitative metrics and qualitative feedback from medical experts.
  5. Discussion and presentation of results

Prerequisites:

  • Language skills: Good English and at least intermediate German language skills are necessary for working with clinical data.
  • Technical skills: Advanced knowledge of deep learning, proficiency in Python, and experience with PyTorch are essential. Familiarity with the HuggingFace ecosystem is a plus.
  • Medical data experience: Previous experience with medical data is beneficial.

What we offer:

  • Access to real clinical data: The thesis will provide access to unique clinical data with high potential for publication and impactful research.
  • Interdisciplinary environment: Students will work in a highly educated and interdisciplinary team, fostering collaboration between computer science and medical experts.
  • Advanced computing resources: The institution offers top-level hardware for scientific computing, ensuring efficient and effective model training and experimentation.
  • Expert feedback: Continuous feedback from both medical professionals and computer science experts will guide the research process, ensuring the development of relevant and high-quality solutions.

How to apply:

Preferred start date: September 2024, with flexibility

Send an email to marton.szep@tum.de with your CV, transcript of records, and a short introduction about yourself and your motivation.

References:

Pan, S., Luo, L., Wang, Y., Chen, C., Wang, J., & Wu, X. (2024). Unifying large language models and knowledge graphs: A roadmap. IEEE Transactions on Knowledge and Data Engineering.

Zhang, B., & Soh, H. (2024). Extract, Define, Canonicalize: An LLM-based Framework for Knowledge Graph Construction. arXiv preprint arXiv:2404.03868.

Cao, L., Sun, J., & Cross, A. (2024). AutoRD: An Automatic and End-to-End System for Rare Disease Knowledge Graph Construction Based on Ontologies-enhanced Large Language Models. arXiv preprint arXiv:2403.00953.

Kommineni, V. K., König-Ries, B., & Samuel, S. (2024). From human experts to machines: An LLM supported approach to ontology and knowledge graph construction. arXiv preprint arXiv:2403.08345.

Márton Szép
PhD Student

My research focuses on natural language processing and large language models in medicine.