Transfer Learning Basic Network Examination

Quick Check

Initial situation

The more often deep learning is used in projects, the sooner it becomes clear that a suitable network is usually designed for exactly one task. This also holds for object detection and recognition. The basic technology exists, but how can a suitable deep learning model for a new task be obtained quickly? Transfer learning has proven to be a good basis, but it requires suitable base networks that are available and, if possible, already pretrained. This matters in particular when time-to-market must be as short as possible. It is therefore necessary to examine how such base networks can be developed for the most relevant tasks and how they can best be reused. In this project, single-character recognition is treated as a special use case of object recognition that a good base model and a suitable transfer learning strategy are expected to cover.

Solution idea

Optical character recognition (OCR), which comprises character detection and recognition, is a special application of object recognition. The problem with OCR is that relevant data annotated at the single-character level is difficult to find. Therefore, a range of existing models for object recognition and detection is investigated. The ability of each model to recognize small objects is crucial. Existing work that applies the same models to OCR is examined further. In addition to fine-tuning, new transfer learning solutions are also sought, with the goal of obtaining new data with minimal effort. Domain adaptation is one subfield examined here: it takes model parameters learned from annotated data and adapts the model's predictions to the learning task of a new use case. Domain adaptation could thus avoid the effort of generating part of the required new data.
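Why the ability to recognize small objects is crucial can be illustrated with intersection over union (IoU), the overlap measure commonly used to score detections. The following is a minimal sketch, not part of the project itself. The box sizes and the 3-pixel localization error are hypothetical values chosen for illustration, and the helper names `iou` and `shifted_iou` are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def shifted_iou(size, shift):
    """IoU between a square box of side `size` and a copy shifted right by `shift` pixels."""
    truth = (0.0, 0.0, float(size), float(size))
    pred = (float(shift), 0.0, float(size + shift), float(size))
    return iou(truth, pred)


# Hypothetical sizes: a 10 px character and a 100 px object,
# both localized with the same 3 px horizontal error.
small = shifted_iou(10, 3)
large = shifted_iou(100, 3)
print(f"IoU for the 10 px box:  {small:.3f}")
print(f"IoU for the 100 px box: {large:.3f}")
```

The same absolute localization error costs the small box far more overlap than the large one, which is one reason why single characters are harder to detect than typical objects.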

Figure 1: OCR and transfer learning, Alai Bürlike, Fraunhofer IPA

Benefit

The project provides an overview of existing models for object recognition and detection. In addition, the flexibility of each model's recognition capability is investigated for the use case of optical character detection and recognition. The model finally selected, Faster R-CNN with multiple region proposal networks (RPN), appears flexible enough to output single-character recognition from input annotated only at the word level. This model is ready to be tested on a given dataset and optimized for foreground object recognition and single-character recognition. The research part on transfer learning strategy yields several state-of-the-art approaches, covering both the best fine-tuning approaches and heterogeneous transfer learning approaches.

Implementation of the AI application

In this project, all investigated object recognition and detection models are built on deep learning architectures, e.g. Faster R-CNN, Transformer, and YOLO. In the transfer learning part, the fine-tuning and heterogeneous transfer learning approaches are likewise based on deep learning models, e.g. Faster R-CNN and Transformer.
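The idea behind reusing a base network can be sketched with a toy example. The code below is not the project's Faster R-CNN setup. It shows only the simplest transfer learning variant, in which the pretrained base network is frozen and a new task head is trained on a small target dataset. The weights in `FROZEN_WEIGHTS`, the data, and all function names are assumptions made for illustration.

```python
import math

# Stand-in for a pretrained base network: a fixed (frozen) feature map.
# In a real detector this would be the pretrained backbone layers.
FROZEN_WEIGHTS = [[1.0, -1.0], [0.5, 0.5]]


def backbone(x):
    """Frozen feature extractor: linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_WEIGHTS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(samples, labels, epochs=500, lr=0.5):
    """Train only the new task head (logistic regression) on frozen features."""
    feats = [backbone(x) for x in samples]
    weights, bias = [0.0] * len(feats[0]), 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            # Gradient of the log loss with respect to the head's pre-activation.
            err = sigmoid(sum(w * fi for w, fi in zip(weights, f)) + bias) - y
            weights = [w - lr * err * fi for w, fi in zip(weights, f)]
            bias -= lr * err
    return weights, bias


def predict(x, weights, bias):
    f = backbone(x)
    score = sigmoid(sum(w * fi for w, fi in zip(weights, f)) + bias)
    return 1 if score >= 0.5 else 0


# Tiny hypothetical target task: label 1 if the first input exceeds the second.
train_x = [[2.0, 0.0], [3.0, 1.0], [0.0, 2.0], [1.0, 3.0]]
train_y = [1, 1, 0, 0]
head_w, head_b = train_head(train_x, train_y)
print(predict([4.0, 1.0], head_w, head_b), predict([1.0, 4.0], head_w, head_b))
```

Full fine-tuning would additionally update the base network's weights, usually with a smaller learning rate. Freezing them is the cheaper option when little target data is available.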

These two approaches can now be tested and compared with each other. The additional effort of creating new training data can be avoided only if the heterogeneous transfer learning approach performs similarly to the traditional transfer learning approach with fine-tuning.
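The decision criterion can be written down as a simple check. This is a sketch under assumptions: the function name, the use of the mean over several evaluation runs, and the tolerance of 0.02 are all hypothetical, and the scores below are placeholders, not measured results.

```python
from statistics import mean


def can_skip_new_annotations(finetune_scores, hetero_scores, tolerance=0.02):
    """Return True if heterogeneous transfer learning is on par with fine-tuning.

    Both arguments are per-run evaluation scores where higher is better
    (for example mean average precision). `tolerance` is an assumed margin
    for what counts as "similar" performance.
    """
    return mean(hetero_scores) >= mean(finetune_scores) - tolerance


# Placeholder scores for illustration only, not measured results.
print(can_skip_new_annotations([0.80, 0.82], [0.80, 0.81]))
print(can_skip_new_annotations([0.80, 0.82], [0.70, 0.72]))
```

In practice the tolerance would be chosen per use case, weighing the acceptable loss in recognition quality against the cost of annotating new data.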