Research @ PLANET AI


Enhancing PLANETBRAIN every day


Recent progress in the areas of Artificial Intelligence (AI) and Machine Learning (ML) is tremendous. Almost monthly, we see reports announcing breakthroughs in different technological aspects of AI.

As an organization focussing on research and development, we can look back on an increasing number of awards, publications, and research projects.


Research Topics

We are pushing the state-of-the-art beyond human performance:

  • Automatic Text Recognition (ATR)

  • Language Modeling (LM)

  • Named-Entity Recognition (NER)

  • Visual Question Answering (VQA)

  • Image Segmentation (IS)

In Detail

Our team is working with and improving technologies such as:

  • Fully convolutional neural networks

  • Attention-based models, both recurrent-free and in combination with recurrent models

  • Graph neural networks (GNN)
  • Neural memory techniques
  • Unsupervised and self-supervised pre-training strategies
  • Improved learning strategies

With its standardized MRI datasets of the entire spine, the German National Cohort (GNC) has the potential to deliver standardized biometric reference values for intervertebral discs (VD), vertebral bodies (VB) and spinal canal (SC). To handle such large-scale big data, artificial intelligence (AI) tools are needed. In this manuscript, we will present an AI software tool to analyze spine MRI and generate normative standard values. 330 representative GNC MRI datasets were randomly selected in equal distribution regarding parameters of age, sex and height. By using a 3D U-Net, an AI algorithm was trained, validated and tested. Finally, the machine learning algorithm explored the full dataset (n = 10,215). VB, VD and SC were successfully segmented and analyzed by using an AI-based algorithm. A software tool was developed to analyze spine-MRI and provide age, sex, and height-matched comparative biometric data. Using an AI algorithm, the reliable segmentation of MRI datasets of the entire spine from the GNC was possible and achieved an excellent agreement with manually segmented datasets. With the analysis of the total GNC MRI dataset with almost 30,000 subjects, it will be possible to generate real normative standard values in the future.

Authors: Felix Streckenbach (University Medical Center Rostock), Gundram Leifert (PLANET AI GmbH), Thomas Beyer (University Medical Center Rostock) et al.

Journal: Healthcare 2022 (MDPI)


In contrast to Connectionist Temporal Classification (CTC) approaches, Sequence-To-Sequence (S2S) models for Handwritten Text Recognition (HTR) suffer from errors such as skipped or repeated words, which often occur at the end of a sequence. In this paper, to combine the best of both approaches, we propose to use the CTC-Prefix-Score during S2S decoding. Hereby, during beam search, paths that are invalid according to the CTC confidence matrix are penalised. Our network architecture is composed of a Convolutional Neural Network (CNN) as visual backbone, bidirectional Long-Short-Term-Memory-Cells (LSTMs) as encoder, and a decoder which is a Transformer with inserted mutual attention layers. The CTC confidences are computed on the encoder while the Transformer is only used for character-wise S2S decoding. We evaluate this setup on three HTR data sets: IAM, Rimes, and StAZH. On IAM, we achieve a competitive Character Error Rate (CER) of 2.95% when pretraining our model on synthetic data and including a character-based language model for contemporary English. Compared to other state-of-the-art approaches, our model requires about 10–20 times fewer parameters. Our implementation is shared on GitHub.

Authors: Christoph Wick (PLANET AI GmbH), Jochen Zöllner (PLANET AI GmbH, University of Rostock), Tobias Grüning (PLANET AI GmbH)

Series: DAS 2022 – 15th IAPR International Workshop on Document Analysis Systems

DOI: 10.1007/978-3-031-06555-2_18
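The Character Error Rate (CER) reported in the abstract above is commonly computed as the edit distance between the recognized text and the ground truth, divided by the number of ground-truth characters. The following is a minimal sketch under that assumption, not the authors' evaluation code; the function names `edit_distance` and `character_error_rate` are hypothetical, and the paper may normalize differently.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions cost 1 each."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def character_error_rate(references, hypotheses) -> float:
    """Total edit distance divided by total number of reference characters."""
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    total_errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_errors / total_chars


# A repeated word at the end of a line, the kind of S2S error the abstract mentions:
cer = character_error_rate(["the cat sat"], ["the cat sat sat"])
print(f"{cer:.3f}")  # 4 inserted characters / 11 reference characters
```

A repeated trailing word counts as one insertion per extra character, so even a single such error on a short line raises the CER noticeably.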

Currently, the most widespread neural network architecture for training language models is the so-called BERT, which led to improvements in various Natural Language Processing (NLP) tasks. In general, the larger the number of parameters in a BERT model, the better the results obtained in these NLP tasks. Unfortunately, the memory consumption and the training duration drastically increase with the size of these models. In this article, we investigate various training techniques of smaller BERT models: We combine different methods from other BERT variants, such as ALBERT, RoBERTa, and relative positional encoding. In addition, we propose two new fine-tuning modifications leading to better performance: Class-Start-End tagging and a modified form of Linear Chain Conditional Random Fields. Furthermore, we introduce Whole-Word Attention, which reduces BERT's memory usage and leads to a small increase in performance compared to classical Multi-Head-Attention. We evaluate these techniques on five public German Named Entity Recognition (NER) tasks, of which two are introduced by this article.

Authors: Jochen Zöllner (PLANET AI GmbH, University of Rostock), Konrad Sperfeld (University of Rostock), Christoph Wick (PLANET AI GmbH), Roger Labahn (University of Rostock)

Journal: MDPI Information

DOI: 10.3390/info12110443

In order to apply Optical Character Recognition (OCR) to historical printings of Latin script fully automatically, we report on our efforts to construct a widely-applicable polyfont recognition model yielding text with a Character Error Rate (CER) around 2% when applied out-of-the-box. Moreover, we show how this model can be further finetuned to specific classes of printings with little manual and computational effort. The mixed or polyfont model is trained on a wide variety of materials, in terms of age (from the 15th to the 19th century), typography (various types of Fraktur and Antiqua), and languages (among others, German, Latin, and French). To optimize the results we combined established techniques of OCR training like pretraining, data augmentation, and voting. In addition, we used various preprocessing methods to enrich the training data and obtain more robust models. We also implemented a two-stage approach which first trains on all available, considerably unbalanced data and then refines the output by training on a selected more balanced subset. Evaluations on 29 previously unseen books resulted in a CER of 1.73%, outperforming a widely used standard model with a CER of 2.84% by almost 40%. Training a more specialized model for some unseen Early Modern Latin books starting from our mixed model led to a CER of 1.47%, an improvement of up to 50% compared to training from scratch and up to 30% compared to training from the aforementioned standard model. Our new mixed model is made openly available to the community.

Authors: Christian Reul (University of Würzburg), Christoph Wick (PLANET AI GmbH), Maximilian Nöth, Andreas Büttner, Maximilian Wehner (all University of Würzburg), Uwe Springmann (LMU München)

Series: ICDAR 2021

Pages: 112 – 126

DOI: 10.1007/978-3-030-86334-0_8
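The "almost 40%" figure in the abstract above can be checked as a relative error reduction between the two reported CERs (2.84% for the standard model, 1.73% for the mixed model). This is a small worked example; the helper name `relative_reduction` is hypothetical, and it assumes the improvement is measured relative to the baseline CER.

```python
def relative_reduction(baseline_cer: float, new_cer: float) -> float:
    """Fraction of the baseline error that the new model removes."""
    return (baseline_cer - new_cer) / baseline_cer


# CER values as reported in the abstract, in percent.
reduction = relative_reduction(baseline_cer=2.84, new_cer=1.73)
print(f"{reduction:.1%}")  # 39.1%
```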

Most recently, Transformers – which are recurrent-free neural network architectures – achieved tremendous performance on various Natural Language Processing (NLP) tasks. Since Transformers represent a traditional Sequence-To-Sequence (S2S) approach, they can be used for several different tasks such as Handwritten Text Recognition (HTR). In this paper, we propose a bidirectional Transformer architecture for line-based HTR that is composed of a Convolutional Neural Network (CNN) for feature extraction and a Transformer-based encoder/decoder, whereby the decoding is performed in reading-order direction and reversed. A voter combines the two predicted sequences to obtain a single result. Our network performed worse compared to a traditional Connectionist Temporal Classification (CTC) approach on the IAM dataset but reduced the error of state-of-the-art Transformer-based approaches by about 25% without using additional data. On a significantly larger dataset, the proposed Transformer significantly outperformed our reference model by about 26%. In an error analysis, we show that the Transformer is able to learn a strong language model, which explains why a larger training dataset is required to outperform traditional approaches, and discuss why Transformers should be used with caution for HTR due to several shortcomings such as repetitions in the text.

Authors: Christoph Wick (PLANET AI GmbH), Jochen Zöllner (PLANET AI GmbH, University of Rostock), Tobias Grüning (PLANET AI GmbH)

Series: ICDAR 2021

Pages: 112 – 126

In this paper, we propose a novel method for Automatic Text Recognition (ATR) on early printed books. Our approach significantly reduces the Character Error Rates (CERs) for book-specific training when only a few lines of Ground Truth (GT) are available and considerably outperforms previous methods. An ensemble of models is trained simultaneously by optimising each one independently but also with respect to a fused output obtained by averaging the individual confidence matrices. Various experiments on five early printed books show that this approach already outperforms the current state-of-the-art by up to 20% and 10% on average. Replacing the averaging of the confidence matrices during prediction with a confidence-based voting boosts our results by an additional 8% leading to a total average improvement of about 17%.

Authors: Christoph Wick (PLANET AI GmbH), Christian Reul (University of Würzburg)

Series: ICDAR 2021

Pages: 385 – 399

DOI: 10.1007/978-3-030-86549-8_25
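The fusion step described in the abstract above, averaging the individual confidence matrices of an ensemble, can be illustrated with a short sketch. This is not the authors' implementation. It assumes that every model emits a matrix with one row per time step and one column per label, that label index 0 is a CTC-style blank, and that the fused matrix is decoded by taking the best label per time step. All names (`BLANK`, `average_confidences`, `greedy_ctc_decode`) are hypothetical, and the confidence-based voting mentioned in the abstract is not shown.

```python
BLANK = 0  # assumed index of the blank label


def average_confidences(matrices):
    """Element-wise mean of equally shaped (time steps x labels) matrices."""
    count = len(matrices)
    steps = len(matrices[0])
    labels = len(matrices[0][0])
    return [
        [sum(m[t][c] for m in matrices) / count for c in range(labels)]
        for t in range(steps)
    ]


def greedy_ctc_decode(matrix, alphabet):
    """Best-path decoding: argmax per time step, merge repeats, drop blanks."""
    chars = []
    previous = BLANK
    for row in matrix:
        label = max(range(len(row)), key=row.__getitem__)
        if label != BLANK and label != previous:
            chars.append(alphabet[label])
        previous = label
    return "".join(chars)


alphabet = ["", "a", "b"]  # index 0 is the blank
model_1 = [[0.1, 0.8, 0.1], [0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
model_2 = [[0.2, 0.7, 0.1], [0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]

fused = average_confidences([model_1, model_2])
print(greedy_ctc_decode(model_1, alphabet))  # aa
print(greedy_ctc_decode(model_2, alphabet))  # ab
print(greedy_ctc_decode(fused, alphabet))    # ab
```

In this toy input the two models disagree on the last character, and the averaged matrix follows the model that is more confident at that time step.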


Screening all relevant international research, extracting the essence for PLANETBRAIN and at the same time realizing our own ambitious research projects would never be possible without highly qualified and committed partners.

Additionally, we have been co-funded by the European Union for several years.

CITlab and PLANET AI have joined forces in research for many years and in more than five large research projects, aiming to advance the state of the art in the area of Artificial Intelligence and Cognitive Computing.

Joint workshops, monthly CITnet colloquiums, and frequent technology presentations are some examples of our exciting cooperation.

Research Projects

Doctor AI

… is a revolutionary healthcare solution that utilizes advanced AI technology to enhance the accuracy and efficiency of MRI diagnostic scans.


IRA Spine visualizes every single result directly in the sectional image of the MRI scan, displays all measured values and relates them graphically to the reference group.

Deviations are marked by traffic light colors and sorted into the three classes “inoffensive”, “hint” and “warning”.

The visualization in the sense of “Explainable AI” supports physicians not only in diagnostics but also in communication with their patients.
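The sorting of deviations into the three classes can be pictured with a small sketch. It is purely illustrative and not how IRA Spine works internally: it assumes a deviation is measured as the distance from the reference group's mean in standard deviations, and the two thresholds are hypothetical values chosen for the example.

```python
from statistics import mean, stdev

# Hypothetical thresholds in standard deviations; not taken from the product.
HINT_THRESHOLD = 2.0
WARNING_THRESHOLD = 3.0


def classify_deviation(value, reference_values):
    """Sort a measured value into 'inoffensive', 'hint', or 'warning'."""
    spread = stdev(reference_values)
    if spread == 0:
        raise ValueError("reference values must not all be identical")
    z_score = abs(value - mean(reference_values)) / spread
    if z_score >= WARNING_THRESHOLD:
        return "warning"
    if z_score >= HINT_THRESHOLD:
        return "hint"
    return "inoffensive"


# Made-up reference measurements with mean 10.0 and standard deviation 1.0.
reference_group = [9.0, 10.0, 11.0]
for measured in (10.5, 12.5, 14.0):
    print(measured, classify_deviation(measured, reference_group))
```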


Security Engine

… is a software solution that utilizes AI-powered Object Detection to analyze X-Ray images and enhance threat detection at airports and other secure facilities.


Expanding lists of threats on multiple screens complicate manual baggage controls at airports or government buildings. Our Security Engine tackles the challenge of monitoring and analyzing large volumes of images. Image Classification determines if there is a threat, whereas Object Detection recognizes the class of threat itself. New categories can be adapted easily, thanks to neural nets.

Publicly Funded Research Projects

Goal: Development and validation of a radiological AI assistance system to support dementia diagnosis

Duration: 3 years

Partner: Arivis AG, DZNE, Institute for Diagnostic and Interventional Radiology, Pediatric and Neuroradiology

Goal: Real-time artificial intelligence annotation of multimodality endoscopy images in pancreatic cancer, allowing tumor cells to be detected during the examination and treated or removed directly

Duration: 3 years

Partner: PolyDiagnost GmbH, University Medical Center Göttingen, Institute for Diagnostic and Interventional Radiology, Faculty of Engineering & Health of the University of Applied Sciences and Arts

Goal: Holistic view of data from several contexts (user data and control data), which have mostly been collected and processed separately up to now, and their evaluation in a uniform system and software architecture

Duration: 3 years

Partner: EvoLogics GmbH, IAV GmbH, Fraunhofer IGD, University of Rostock, IOW

Goal: Extend existing environmental monitoring methods of aquatic habitats by new innovative analytical methods based on microbial nucleic acids (16S rRNA genes) and freely available environmental nucleic acids (eDNA; 18S rRNA genes) from water samples

Duration: 3 years

Partner: Leibniz Institute for Baltic Sea Research Warnemünde (IOW), LGC Genomics, Hydrobios, Fraunhofer IGD

Goal: Evaluation of imaging modalities (X-ray, CT, MRI) using an AI assistant with a focus on thoracic scans and reasonable/explainable AI

Duration: 3 years

Partner: University of Rostock