MSc Thesis: Transformer-based Optical Flow Estimation in General Computer Vision
Deep learning entered a new era in 2021, with Transformer-based networks making a name for themselves in computer vision and topping the leaderboards in recognition, detection, and segmentation [1-3]. However, the potential of Transformers for optical flow estimation remains largely unexplored. Based on our current knowledge of optical flow and Transformers, we believe that Transformers have the potential to surpass state-of-the-art convolution-based networks such as [4-6] in flow estimation. During this project, you will develop a new Transformer-based neural network for the flow estimation problem and test it on leading benchmarks such as Sintel [7] and KITTI [8]. Are you ready for this challenge?
We offer
- A warm start to the project, building on the group's state-of-the-art knowledge in this field
- A chance to collaborate with international experts in deep learning who are connected with our lab
- A chance to publish if the work shines
We expect you to have
- A strong background in linear algebra and deep learning, including familiarity with the classic CNN backbones
- Proficiency in Python and experience with TensorFlow, PyTorch, and/or JAX
- Knowledge of optical flow estimation and/or Transformers (a big plus)
- A passion for research and computer vision (the most important thing)
If you are interested in this work and ready for a new challenge, please feel free to contact us :)
[1] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In International Conference on Learning Representations, 2021.
[2] Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai. Deformable DETR: Deformable Transformers for End-to-End Object Detection. In International Conference on Learning Representations, 2021.
[3] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. 2021. arxiv.org/abs/2103.14030
[4] Dosovitskiy A., Fischer P., Ilg E., Häusser P., Hazırbaş C., Golkov V., van der Smagt P., Cremers D., Brox T.: FlowNet: Learning optical flow with convolutional networks. In: Proceedings of the Fifteenth IEEE International Conference on Computer Vision, pp. 2758–2766. Santiago, Chile, 2015
[5] Sun D., Yang X., Liu M.Y., Kautz J.: PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 8934–8943. Salt Lake City, Utah, 2018
[6] Teed Z., Deng J.: RAFT: Recurrent all-pairs field transforms for optical flow. In: European Conference on Computer Vision, pp. 402–419, 2020
[7] Butler D.J., Wulff J., Stanley G.B., Black M.J.: A naturalistic open source movie for optical flow evaluation. In: European Conference on Computer Vision, pp. 611–625. Springer, 2012
[8] Geiger A., Lenz P., Stiller C., Urtasun R.: Vision meets robotics: The KITTI dataset. The International Journal of Robotics Research 32(11), 1231–1237, 2013