Transformers in image processing
In recent years, artificial intelligence (AI) methods have been used increasingly in industrial image processing, where they enable complex problems to be solved with high accuracy and thereby improve product quality, optimize processes, and reduce costs. A recent trend is transformer architectures, which originated in natural language processing and have since found their way into image processing. The growing popularity of transformer models, whose self-attention mechanism allows them to capture global context information, is shaping the further development of AI technologies, including in image processing. At the same time, these new methods confront companies with new challenges and compete with established approaches such as convolutional neural networks (CNNs).
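The self-attention mechanism mentioned above can be sketched in a few lines. The following NumPy example (dimensions, variable names, and the single-head setup are illustrative assumptions, not taken from the study) shows why transformers capture global context: every output token is computed as a weighted mix of all input tokens.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x:             (n_tokens, d_model) input embeddings (e.g. image patches)
    w_q, w_k, w_v: (d_model, d_k) learned projection matrices
    """
    q = x @ w_q                                      # queries
    k = x @ w_k                                      # keys
    v = x @ w_v                                      # values
    scores = q @ k.T / np.sqrt(k.shape[-1])          # pairwise token similarities
    # Numerically stable softmax over each row of the score matrix
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row mixes ALL value vectors, weighted by attention:
    # this is the "global receptive field" that CNNs lack per layer.
    return weights @ v

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))                      # e.g. 4 image patches, 8-dim embeddings
w_q, w_k, w_v = (rng.standard_normal((8, 8)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 8)
```

In a vision transformer, the tokens are flattened image patches, and many such attention heads run in parallel across multiple layers; this toy version only illustrates the core operation.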
This study therefore examines recent developments in image processing and critically compares transformer approaches with traditional CNNs. To this end, it presents a comprehensive overview of computer vision methods, from traditional CNNs to modern transformer architectures.
The benchmarking experiments conducted as part of the study provide an objective quality comparison between transformer models and CNNs across several use cases, including object detection and segmentation in 2D and 3D contexts. In addition, the potentials and limitations of transformers are discussed, particularly with regard to data requirements, computational effort, and inference times. The aim is to ease the practical adoption of transformer models for readers' own use cases.