MSc Thesis: LLM/VLM-based AI Agents Workflow for Simplifying Medical Image Analysis
This project is a collaboration with the University of Oxford.
Background:
Building deep-learning-based medical image analysis pipelines can be a challenge for clinicians and medical science researchers due to the reliance on deep learning development expertise, coupled with significant heterogeneity in real-world medical data and dynamic tasks centered on diverse research questions. Previous low-code deep learning approaches for medical image analysis [1, 2] cannot effectively mitigate the knowledge and experience gaps for inexperienced users. Recent large language model (LLM) or vision-language model (VLM)-based agent systems [3, 4] offer a novel way to provide a higher level of autonomy and less reliance on expertise for the development of medical image analysis pipelines [5, 6].
Your tasks:
First, you will familiarize yourself with the basic deployment and development of LLM-agent frameworks (e.g., LangChain [3]). You will then familiarize yourself with a few key considerations in a specific medical imaging analysis application (e.g., the analysis of cardiac MRI), where proper handling currently relies on data scientists’ experience (technical details will be briefed by the project advisors). You will then build an LLM/VLM-based agent system for the automated building and self-refinement of a toy medical image analysis pipeline. You will evaluate the performance of a few mainstream LLM/VLMs as agent backends in terms of success rate, cost-effectiveness, and reliance on the clarity of the human prompts defining the task.
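The self-refinement workflow described above can be sketched, at its simplest, as a propose–evaluate–refine loop. The following is a minimal, framework-free illustration: the function names (`propose_pipeline`, `evaluate_pipeline`, `self_refine`) and the toy score metric are assumptions made for this sketch, not part of any real library. In the actual project, the proposal step would be an LLM/VLM call (e.g., via LangChain) and the evaluation step would run a real image analysis pipeline.

```python
from typing import Optional, Tuple

def propose_pipeline(feedback: Optional[str]) -> dict:
    """Stand-in for the LLM backend: maps textual feedback to a new config.

    A real agent would send the feedback to an LLM/VLM and parse its response.
    """
    config = {"model": "unet", "lr": 1e-3, "epochs": 5}
    if feedback == "underfitting":
        config["epochs"] = 20  # the "agent" reacts to feedback
    return config

def evaluate_pipeline(config: dict) -> Tuple[float, Optional[str]]:
    """Stand-in for training/validating the pipeline: returns (score, feedback).

    Here the score is a toy monotone function of the epoch count; a real
    evaluation would report, e.g., a Dice score on a validation set.
    """
    score = 0.6 + 0.02 * min(config["epochs"], 20)
    feedback = None if score >= 0.9 else "underfitting"
    return score, feedback

def self_refine(max_rounds: int = 3, target: float = 0.9):
    """Iterate propose -> evaluate -> refine until the target score is met."""
    feedback = None
    config, score = {}, 0.0
    for _ in range(max_rounds):
        config = propose_pipeline(feedback)
        score, feedback = evaluate_pipeline(config)
        if score >= target:
            break
    return config, score

config, score = self_refine()
```

The evaluation criteria listed above (success rate, cost-effectiveness, prompt-clarity sensitivity) would then be measured by running such a loop with different LLM/VLM backends and prompt variants.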
Your qualifications:
We are looking for a highly motivated Master’s student in CS, Physics, Engineering, or Mathematics with
- Good understanding of and/or experience with LLM/VLM-backed agent systems (development and/or deployment). Experience with frameworks such as LangChain [3], TextGrad [4], etc. is highly desirable.
- Advanced programming skills in Python and common DL frameworks. Experience with multi-GPU development and Docker.
- Experience with DL for computer vision tasks; experience with data preprocessing, model development, and validation for real-world medical images is a plus.
- Strong interest in teamwork and interdisciplinary research.
What we offer:
- The opportunity to join an ongoing project with the aim of publishing a top-tier conference paper.
- An exciting research project with many possibilities to bring in your own ideas and combine them with state-of-the-art algorithms.
- Potential transition into a PhD project at TUM / University of Oxford.
- Close supervision by an interdisciplinary network of computer vision / medical imaging experts from top-tier universities.
Start date: 1 May or 1 June 2025
How to apply:
Please send your CV and transcript to Jiazhen Pan (jiazhen.pan@tum.de). Links to previous work (e.g., your GitHub profile, papers) are highly appreciated.
References:
[1] Isensee et al. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation, Nature Methods 2021
[2] Ma et al. Segment anything in medical images, Nature Communications 2023
[3] LangChain, https://github.com/langchain-ai/langchain
[4] Yuksekgonul et al. TextGrad: Automatic “Differentiation” via Text, arXiv 2024
[5] Hoopes et al. VoxelPrompt: A Vision-Language Agent for Grounded Medical Image Analysis, arXiv 2024
[6] Feng et al. M^3Builder: A Multi-Agent System for Automated Machine Learning in Medical Imaging, arXiv 2025