Extracting information from medical publications is a demanding task that takes a lot of time. A tool that supports this task and helps us to significantly reduce the effort would be a great improvement.

Wörwag Pharma

Contact at the AI Innovation Center

Janina Bierkandt

Information extraction from medical publications

Quick Check

Initial situation

When preparing approval documents (core dossiers), specific information from medical studies must be compiled. This information includes the study design, the purpose of the study, the indication, the drugs used, the size of the participant group, the study results and the conclusion. Because medical studies vary widely in terminology and phrasing, extracting this information requires expert knowledge in the pharmaceutical field and takes a lot of time. Tools that simplify and accelerate the extraction process are therefore desirable.

Solution idea

In the Quick Check, a common understanding of the documents and the information they contain was developed first. Test documents were used to examine the features to be extracted and to identify formulations that point to particular feature values. Attention was also paid to how far the document structure can be used to determine the best candidates for feature values. Finally, it was discussed how the extraction functions can be made available to users and how the extracted information should be stored to achieve the greatest possible benefit.

Benefit

The results achieved in the Quick Check show that the available AI approaches can extract the desired information from the documents and thereby support the medical writers in their tasks. The potential benefits should be evaluated in more detail using a larger volume of documents. In the first step, the aim should not be to extract the required features fully automatically. Instead, an assistance function with a correction option should be implemented. The correction data can later be used to further train the AI and achieve better results.
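The assistance function with a correction option could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the `Suggestion` class, its field names and the JSON Lines export are all assumptions made for the example. Each extracted value is stored next to the value the medical writer finally accepted, so that corrected cases can later be exported as training data.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Suggestion:
    """One extracted feature value plus an optional human correction (hypothetical schema)."""
    document_id: str
    feature: str                      # e.g. "study_design", "indication"
    proposed: str                     # value suggested by the extraction step
    corrected: Optional[str] = None   # value entered by the medical writer, if any

    @property
    def final(self) -> str:
        """Value to store: the correction if one exists, otherwise the proposal."""
        return self.corrected if self.corrected is not None else self.proposed


def to_jsonl(suggestions: list[Suggestion]) -> str:
    """Export only the corrected suggestions, one JSON object per line."""
    return "\n".join(
        json.dumps(asdict(s)) for s in suggestions if s.corrected is not None
    )
```

Keeping the original proposal alongside the correction, rather than overwriting it, is what makes the data usable for further training: each exported line pairs what the system suggested with what an expert accepted.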

Implementation of the AI application

The existing methods for feature extraction were evaluated, and the most suitable method was selected for each feature. For the features study design, medication and indication, AI models were created by adapting freely available language models and were made available for initial tests in Fraunhofer IAO's existing Thorpedo software framework. To identify the relevant sections and sentences for the individual features, the AI models were combined with rule-based approaches (e.g. keyword lists) into a hybrid approach.
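The hybrid idea described above can be illustrated with a short sketch: a keyword list first narrows the text down to candidate sentences, and a scoring function then ranks them. Everything here is an assumption for illustration only. The keyword lists are invented, the sentence splitter is deliberately naive, and `keyword_density` is a simple stand-in for a trained language model, which is not shown. The sketch does not use or reproduce the Thorpedo framework.

```python
import re
from typing import Callable, Optional

# Hypothetical keyword lists; the lists used in the project are not known.
KEYWORDS = {
    "study_design": ["randomized", "double-blind", "placebo-controlled",
                     "open-label", "crossover"],
    "indication": ["patients with", "diagnosed with", "treatment of"],
}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def keyword_candidates(text: str, feature: str) -> list[str]:
    """Rule-based step: keep sentences containing at least one keyword."""
    terms = KEYWORDS[feature]
    return [s for s in split_sentences(text)
            if any(t in s.lower() for t in terms)]


def keyword_density(sentence: str, feature: str) -> float:
    """Placeholder scorer standing in for a model: counts matching keywords."""
    lowered = sentence.lower()
    return float(sum(t in lowered for t in KEYWORDS[feature]))


def best_candidate(text: str, feature: str,
                   score: Callable[[str, str], float] = keyword_density
                   ) -> Optional[str]:
    """Hybrid step: rank the keyword-filtered sentences with the scorer."""
    candidates = keyword_candidates(text, feature)
    if not candidates:
        return None
    return max(candidates, key=lambda s: score(s, feature))
```

In a real setup, the `score` argument would be replaced by a call to a model adapted for the respective feature, while the keyword filter keeps the number of sentences passed to that model small.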