MSc Thesis: Transformer-based Optical Flow Estimation in General Computer Vision


Deep learning entered a new era in 2021, with Transformer-based networks making a name for themselves in computer vision and topping the leaderboards in recognition, detection and segmentation [1-3]. However, the power of Transformers has not yet been explored in optical flow estimation. Based on our current knowledge of optical flow and Transformers, we believe that Transformers have the potential to surpass state-of-the-art convolution-based networks such as [4-6] in flow estimation. During this project, you will develop a new Transformer-based neural network for the flow estimation problem and test it on leading benchmarks such as Sintel [7] and KITTI [8]. Are you ready for this challenge?

We offer

  • A warm start to the project, building on the group's state-of-the-art knowledge in this field
  • A chance to collaborate with international deep learning experts connected with our lab
  • A chance to publish if the work shines

We expect you to have

  • A strong background in linear algebra and deep learning, including familiarity with the classic CNN backbones
  • Proficiency in Python and experience with TensorFlow, PyTorch and/or JAX
  • Knowledge of optical flow estimation and/or Transformers (a big plus)
  • A passion for research and computer vision (the most important thing)

If you are interested in this work and ready for a new challenge, please feel free to contact us :)


[1] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In International Conference on Learning Representations, 2021.

[2] Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, and Jifeng Dai. Deformable DETR: Deformable transformers for end-to-end object detection. In International Conference on Learning Representations, 2021.

[3] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. 2021.

[4] Dosovitskiy A., Fischer P., Ilg E., Häusser P., Hazırbas C., Golkov V., Smagt P., Cremers D., Brox T.: FlowNet: Learning optical flow with convolutional networks. In: Proceedings of the Fifteenth IEEE International Conference on Computer Vision, pp. 2758–2766. Santiago, Chile, 2015

[5] Sun D., Yang X., Liu M.Y., Kautz J.: PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 8934–8943. Salt Lake City, Utah, 2018

[6] Teed Z., Deng J., RAFT: Recurrent All-Pairs Field Transforms for Optical Flow. In European Conference on Computer Vision, pp. 402-419, 2020

[7] Butler D.J., Wulff J., Stanley G.B., Black M.J.: A naturalistic open source movie for optical flow evaluation. In: European Conference on Computer Vision, pp. 611–625. Springer, 2012

[8] Geiger A., Lenz P., Stiller C., Urtasun R.: Vision meets robotics: The KITTI dataset. The International Journal of Robotics Research 32(11), 1231–1237, 2013

Jiazhen Pan
PhD Student

My main research interests lie in medical image computing, semantic segmentation and flow estimation.