Publications
Enhancing PLANETBRAIN every day
Recent progress in the areas of Artificial Intelligence (AI) and Machine Learning (ML) is tremendous. Almost monthly, we see reports announcing breakthroughs in different technological aspects of AI.
As an organization focusing on research and development, we can look back on a growing number of publications.
PUBLICATIONS
With its standardized MRI datasets of the entire spine, the German National Cohort (GNC) has the potential to deliver standardized biometric reference values for intervertebral discs (VD), vertebral bodies (VB) and the spinal canal (SC). To handle such large-scale data, artificial intelligence (AI) tools are needed. In this manuscript, we present an AI software tool to analyze spine MRI and generate normative standard values. 330 representative GNC MRI datasets were randomly selected, equally distributed with regard to age, sex and height. Using a 3D U-Net, an AI algorithm was trained, validated and tested. Finally, the machine learning algorithm explored the full dataset (n = 10,215). VB, VD and SC were successfully segmented and analyzed by the AI-based algorithm. A software tool was developed to analyze spine MRI and provide age-, sex-, and height-matched comparative biometric data. Using the AI algorithm, reliable segmentation of MRI datasets of the entire spine from the GNC was possible and achieved excellent agreement with manually segmented datasets. With the analysis of the total GNC MRI dataset of almost 30,000 subjects, it will be possible to generate real normative standard values in the future.
Authors: Felix Streckenbach (University Medical Center Rostock), Gundram Leifert (PLANET AI GmbH), Thomas Beyer (University Medical Center Rostock) et al.
Journal: Healthcare 2022 (MDPI)
In contrast to Connectionist Temporal Classification (CTC) approaches, Sequence-To-Sequence (S2S) models for Handwritten Text Recognition (HTR) suffer from errors such as skipped or repeated words which often occur at the end of a sequence. In this paper, to combine the best of both approaches, we propose to use the CTC-Prefix-Score during S2S decoding. During beam search, paths that are invalid according to the CTC confidence matrix are penalised. Our network architecture is composed of a Convolutional Neural Network (CNN) as visual backbone, bidirectional Long-Short-Term-Memory-Cells (LSTMs) as encoder, and a decoder which is a Transformer with inserted mutual attention layers. The CTC confidences are computed on the encoder while the Transformer is only used for character-wise S2S decoding. We evaluate this setup on three HTR data sets: IAM, Rimes, and StAZH. On IAM, we achieve a competitive Character Error Rate (CER) of 2.95% when pretraining our model on synthetic data and including a character-based language model for contemporary English. Compared to other state-of-the-art approaches, our model requires about 10–20 times fewer parameters. Our implementation is shared on GitHub.
Authors: Christoph Wick (PLANET AI GmbH), Jochen Zöllner (PLANET AI GmbH, University of Rostock), Tobias Grüning (PLANET AI GmbH)
Series: DAS 2022 – 15th IAPR International Workshop on Document Analysis Systems
DOI: 10.1007/978-3-031-06555-2_18
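The core idea of the paper above, mixing S2S decoder scores with a CTC prefix score during beam search, can be sketched roughly as follows. This is a toy illustration, not the paper's implementation: `s2s_step` and `ctc_prefix` are hypothetical stand-ins for the real network outputs, and the interpolation weight `lam` is an assumption.

```python
import math

def beam_search(s2s_step, ctc_prefix, beam_width=3, lam=0.5, max_len=4):
    """Beam search mixing S2S decoder log-probs with a CTC prefix score.

    s2s_step(prefix)   -> {token: log-prob} from the S2S decoder
    ctc_prefix(prefix) -> log-prob of the prefix under the CTC matrix;
                          prefixes invalid under CTC get a huge penalty.
    """
    beams = [((), 0.0)]
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for tok, lp in s2s_step(prefix).items():
                new = prefix + (tok,)
                # incremental CTC contribution of the appended token
                ctc_delta = ctc_prefix(new) - ctc_prefix(prefix)
                candidates.append((new, score + (1 - lam) * lp + lam * ctc_delta))
        beams = sorted(candidates, key=lambda c: -c[1])[:beam_width]
    return beams[0][0]

# Toy example: the decoder always prefers 'a', but CTC forbids 'aa',
# so the penalised paths never reach the top of the beam.
s2s = lambda prefix: {"a": math.log(0.9), "b": math.log(0.1)}
ctc = lambda prefix: -1e9 if "aa" in "".join(prefix) else 0.0
decoded = beam_search(s2s, ctc)
```

The CTC term only vetoes hypotheses the confidence matrix considers impossible; the S2S decoder still drives the ranking among the remaining paths.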
Currently, the most widespread neural network architecture for training language models is the so-called BERT, which led to improvements in various Natural Language Processing (NLP) tasks. In general, the larger the number of parameters in a BERT model, the better the results obtained in these NLP tasks. Unfortunately, memory consumption and training duration increase drastically with the size of these models. In this article, we investigate various training techniques of smaller BERT models: We combine different methods from other BERT variants, such as ALBERT, RoBERTa, and relative positional encoding. In addition, we propose two new fine-tuning modifications leading to better performance: Class-Start-End tagging and a modified form of Linear Chain Conditional Random Fields. Furthermore, we introduce Whole-Word Attention, which reduces BERT's memory usage and leads to a small increase in performance compared to classical Multi-Head-Attention. We evaluate these techniques on five public German Named Entity Recognition (NER) tasks, of which two are introduced by this article.
Authors: Jochen Zöllner (PLANET AI GmbH, University of Rostock), Konrad Sperfeld (University of Rostock), Christoph Wick (PLANET AI GmbH), Roger Labahn (University of Rostock)
Journal: MDPI Information
DOI: 10.3390/info12110443
In order to apply Optical Character Recognition (OCR) to historical printings of Latin script fully automatically, we report on our efforts to construct a widely applicable polyfont recognition model yielding text with a Character Error Rate (CER) of around 2% when applied out of the box. Moreover, we show how this model can be further fine-tuned to specific classes of printings with little manual and computational effort. The mixed or polyfont model is trained on a wide variety of materials in terms of age (from the 15th to the 19th century), typography (various types of Fraktur and Antiqua), and language (among others, German, Latin, and French). To optimize the results, we combined established OCR training techniques such as pretraining, data augmentation, and voting. In addition, we used various preprocessing methods to enrich the training data and obtain more robust models. We also implemented a two-stage approach which first trains on all available, considerably unbalanced data and then refines the output by training on a selected, more balanced subset. Evaluations on 29 previously unseen books resulted in a CER of 1.73%, outperforming a widely used standard model with a CER of 2.84% by almost 40%. Training a more specialized model for some unseen Early Modern Latin books starting from our mixed model led to a CER of 1.47%, an improvement of up to 50% compared to training from scratch and up to 30% compared to training from the aforementioned standard model. Our new mixed model is made openly available to the community.
Authors: Christian Reul (University of Würzburg), Christoph Wick (PLANET AI GmbH), Maximilian Nöth, Andreas Büttner, Maximilian Wehner (all University of Würzburg), Uwe Springmann (LMU München)
Series: ICDAR 2021
Pages: 112 – 126
DOI: 10.1007/978-3-030-86334-0_8
Most recently, Transformers – which are recurrent-free neural network architectures – have achieved tremendous performance on various Natural Language Processing (NLP) tasks. Since Transformers represent a traditional Sequence-To-Sequence (S2S) approach, they can be used for several different tasks such as Handwritten Text Recognition (HTR). In this paper, we propose a bidirectional Transformer architecture for line-based HTR that is composed of a Convolutional Neural Network (CNN) for feature extraction and a Transformer-based encoder/decoder, whereby the decoding is performed both in reading-order direction and reversed. A voter combines the two predicted sequences to obtain a single result. Our network performed worse than a traditional Connectionist Temporal Classification (CTC) approach on the IAM dataset but reduced the state-of-the-art error of Transformer-based approaches by about 25% without using additional data. On a significantly larger dataset, the proposed Transformer significantly outperformed our reference model by about 26%. In an error analysis, we show that the Transformer is able to learn a strong language model, which explains why a larger training dataset is required to outperform traditional approaches, and we discuss why Transformers should be used with caution for HTR due to several shortcomings such as repetitions in the text.
Authors: Christoph Wick (PLANET AI GmbH), Jochen Zöllner (PLANET AI GmbH, University of Rostock), Tobias Grüning (PLANET AI GmbH)
Series: ICDAR 2021
Pages: 112 – 126
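The voter described above combines a reading-order decode with a reversed decode. A minimal sketch of one possible voter, keeping the hypothesis with the higher length-normalized confidence, might look as follows; the input format and the selection rule are assumptions, and the paper's actual voter is more elaborate.

```python
def vote(forward, backward):
    """Combine a reading-order decode with a reversed decode.

    Each argument is (characters, per-character log-probs). The backward
    pass decodes right-to-left, so its output is re-reversed first.
    This simple voter keeps the hypothesis with the higher mean log-prob.
    """
    bwd_chars, bwd_lps = backward
    backward = (bwd_chars[::-1], bwd_lps[::-1])

    def mean_lp(hyp):
        chars, lps = hyp
        return sum(lps) / max(len(lps), 1)

    return max((forward, backward), key=mean_lp)[0]

# Hypothetical decodes of the same line image: the forward pass
# misreads one character with low confidence, the backward pass
# (reading right-to-left) is more confident overall.
fwd = ("he1lo", [-0.1, -0.2, -1.5, -0.1, -0.1])
bwd = ("olleh", [-0.1, -0.1, -0.2, -0.1, -0.1])
best = vote(fwd, bwd)
```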
In this paper, we propose a novel method for Automatic Text Recognition (ATR) on early printed books. Our approach significantly reduces the Character Error Rates (CERs) for book-specific training when only a few lines of Ground Truth (GT) are available and considerably outperforms previous methods. An ensemble of models is trained simultaneously by optimising each one independently but also with respect to a fused output obtained by averaging the individual confidence matrices. Various experiments on five early printed books show that this approach already outperforms the current state-of-the-art by up to 20%, and by 10% on average. Replacing the averaging of the confidence matrices during prediction with a confidence-based voting boosts our results by an additional 8%, leading to a total average improvement of about 17%.
Authors: Christoph Wick (PLANET AI GmbH), Christian Reul (University of Würzburg)
Series: ICDAR 2021
Pages: 385 – 399
DOI: 10.1007/978-3-030-86549-8_25
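The fused output mentioned above can be sketched as follows: average the per-model confidence matrices, then decode the result greedily. The matrix shapes, the blank index, and the toy numbers are assumptions; the paper's confidence-based voting is a refinement of this plain averaging.

```python
import numpy as np

BLANK = 0  # index of the CTC blank label (assumption)

def vote_and_decode(conf_matrices, alphabet):
    """Average the per-model confidence matrices, then greedily decode.

    conf_matrices: list of (T, C) softmax outputs, one per ensemble member.
    Greedy best-path decoding: take the best class per time step,
    collapse repeats, and drop blanks.
    """
    fused = np.mean(conf_matrices, axis=0)  # (T, C)
    best_path = fused.argmax(axis=1)
    chars, prev = [], None
    for c in best_path:
        if c != prev and c != BLANK:
            chars.append(alphabet[c])
        prev = c
    return "".join(chars)

# Two hypothetical models; model 2 alone would read 'a' at the second
# time step, but the fused matrix decides on 'b'.
alphabet = {1: "a", 2: "b"}
m1 = np.array([[0.1, 0.8, 0.1],   # 'a'
               [0.1, 0.3, 0.6],   # 'b'
               [0.8, 0.1, 0.1]])  # blank
m2 = np.array([[0.1, 0.9, 0.0],   # 'a'
               [0.1, 0.5, 0.4],   # 'a' on its own
               [0.9, 0.1, 0.0]])  # blank
text = vote_and_decode([m1, m2], alphabet)
```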
Authors: Christoph Wick, Benjamin Kühn, Gundram Leifert (all PLANET AI GmbH), Konrad Sperfeld (CITlab, University of Rostock), Jochen Zöllner (PLANET AI GmbH, University of Rostock), Tobias Grüning (PLANET AI GmbH)
Journal: The Journal of Open Source Software (JOSS)
DOI: 10.21105/joss.03297
Automated text recognition is a fundamental problem in Document Image Analysis. Optical models are used for modeling characters while language models are used for composing sentences. Since scripts and linguistic contexts differ widely, it is mandatory to specialize the models by training on task-dependent ground truth. However, to create a sufficient amount of ground truth, at least for historical handwritten scripts, well-qualified persons have to mark and transcribe text lines, which is very time-consuming. On the other hand, in many cases unassigned transcripts are already available at page level from another process chain, or at least transcripts from a similar linguistic context are available. In this work, we present two approaches that make use of such transcripts: the first creates training data by automatically assigning page-dependent transcripts to text lines, while the second uses a task-specific language model to generate highly confident training data. Both approaches are successfully applied to a very challenging historical handwritten collection.
Authors: Gundram Leifert (PLANET AI GmbH), Joan Andreu Sànchez (Pattern Recognition and Human Language Technologies Center), Roger Labahn (Computational Intelligence Technology Lab)
Series: ICFHR ’20
Pages: To appear
Note: This work was partially funded by the Generalitat Valenciana under the EU-FEDER Comunitat Valenciana 2014-2020 grant IDIFEDER/2018/025 “Sistemas de fabricación inteligente para la indústria 4.0”. | in proceedings
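The first of the two approaches, assigning page-level transcripts to text lines, could look like the following minimal sketch. The greedy similarity matching (via `difflib`) and the example strings are assumptions, stand-ins for the paper's actual alignment method.

```python
import difflib

def assign_transcripts(recognized_lines, transcript_lines):
    """Assign unassigned page-level transcript lines to text-line images.

    recognized_lines: rough output of a preliminary recognizer, one
    string per detected text line. transcript_lines: the page-level
    transcript. Each recognized line is matched greedily to the most
    similar unused transcript line; the matched pairs then serve as
    training data for the optical model.
    """
    unused = list(transcript_lines)
    pairs = []
    for rec in recognized_lines:
        best = max(unused,
                   key=lambda t: difflib.SequenceMatcher(None, rec, t).ratio())
        unused.remove(best)
        pairs.append((rec, best))
    return pairs

# Noisy recognizer output vs. the clean (but unordered) page transcript:
recognized = ["in the yaer 1684", "was marri3d to"]
transcript = ["was married to", "in the year 1684"]
pairs = assign_transcripts(recognized, transcript)
```

A production system would replace the greedy ratio matching with a proper edit-distance alignment over the whole page, but the principle — letting an imperfect recognizer anchor existing transcripts to line images — is the same.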
Authors: Johannes Michael, Roger Labahn, Tobias Grüning, Jochen Zöllner
Booktitle: Proceedings of the 2019 15th International Conference on Document Analysis and Recognition
Series: ICDAR ’19
Pages: To appear
Note: Partially funded by the European Union's Horizon 2020 research and innovation programme under grant agreement No 674943 (READ) | in proceedings
Authors: Gundram Leifert, Roger Labahn, Tobias Grüning, Svenja Leifert
Booktitle: Proceedings of the 2019 15th International Conference on Document Analysis and Recognition
Series: ICDAR ’19
Pages: To appear
Note: Partially funded by the European Union's Horizon 2020 research and innovation programme under grant agreement No 674943 (READ) | in proceedings
We present a recognition and retrieval system for the ICDAR2017 Competition on Information Extraction in Historical Handwritten Records which successfully infers person names and other data from marriage records. The system extracts information from the line images with high accuracy and outperforms the baseline. The optical model is based on neural networks. To infer the desired information, regular expressions are used to describe the set of feasible word sequences.
Authors: Tobias Strauß, Max Weidemann, Johannes Michael, Gundram Leifert, Tobias Grüning, Roger Labahn
Journal: CoRR
Volume: abs/1804.09943
Note: Partially funded by the European Union's Horizon 2020 research and innovation programme under grant agreement No 674943 (READ)
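Constraining recognition output with regular expressions, as described above, can be sketched in a few lines. The pattern, field names, and example strings below are hypothetical; the competition system applies such expressions to the recognizer's hypotheses to restrict them to feasible word sequences.

```python
import re

# Hypothetical pattern for one record type: a groom name followed by a
# bride name. Named groups map directly to the fields to be extracted.
RECORD = re.compile(
    r"(?P<groom>[A-Z][a-z]+ [A-Z][a-z]+) married (?P<bride>[A-Z][a-z]+ [A-Z][a-z]+)"
)

def extract(line):
    """Return the fields of a recognized line, or None if infeasible."""
    m = RECORD.search(line)
    return m.groupdict() if m else None

fields = extract("Joan Puig married Maria Ferrer")
```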
Accessibility of the valuable cultural heritage which is hidden in countless scanned historical documents is the motivation for the presented dissertation. The developed (fully automatic) text line extraction methodology combines state-of-the-art machine learning techniques and modern image processing methods. It demonstrates its quality by outperforming several other approaches on a number of benchmark datasets. The method is already being used by a wide audience of researchers from different disciplines and thus contributes its (small) part to the aforementioned goal.
Author: Tobias Grüning
Type: PhD thesis
School: Universität Rostock
Authors: Tobias Grüning, Roger Labahn, Markus Diem, Florian Kleber, Stefan Fiel
Booktitle: 2018 13th IAPR International Workshop on Document Analysis Systems (DAS)
Pages: 351-356
Note: Partially funded by the European Union's Horizon 2020 research and innovation programme under grant agreement No 674943 (READ) | in proceedings
DOI: 10.1109/DAS.2018.38
Authors: Tobias Grüning, Gundram Leifert, Tobias Strauß, Roger Labahn
Booktitle: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)
Volume: 01
Pages: 351-356
Note: Partially funded by the European Union's Horizon 2020 research and innovation programme under grant agreement No 674943 (READ) | in proceedings
DOI: 10.1109/ICDAR.2017.47
Author: Tobias Strauß
Type: PhD thesis
School: Universität Rostock
Authors: Tobias Grüning, Gundram Leifert, Tobias Strauß, Roger Labahn
Booktitle: CLEF2016 Working Notes
Series: CEUR Workshop Proceedings
Publisher: CEUR-WS.org
Pages: 351-356
Note: Partially funded by grant no. KF2622304SS3 (Kooperationsprojekt) in Zentrales Innovationsprogramm Mittelstand (ZIM) by Bundesrepublik Deutschland (BMWi) and the European Union's Horizon 2020 research and innovation programme under grant agreement No 674943 (READ) | in proceedings
DOI: 10.1109/ICDAR.2017.47
Authors: Gundram Leifert, Tobias Strauß, Tobias Grüning, Welf Wustlich, Roger Labahn
Journal: Journal of Machine Learning Research
Volume: 17
Number: 97
Pages: 1-37
Authors: Tobias Strauß, Gundram Leifert, Tobias Grüning, Roger Labahn
Journal: Neural Networks
Volume: 79
Pages: 1 – 11
Note: Partially funded by grant no. KF2622304SS3 (Kooperationsprojekt) in Zentrales Innovationsprogramm Mittelstand (ZIM) by Bundesrepublik Deutschland (BMWi)
Authors: Gundram Leifert, Tobias Strauß, Tobias Grüning, Roger Labahn
Journal: CoRR
Volume: abs/1605.08412
Note: Partially funded by the European Union's Horizon 2020 research and innovation programme under grant agreement No 674943 (READ)
Authors: Tobias Strauß, Tobias Grüning, Gundram Leifert, Roger Labahn
Journal: CoRR
Volume: abs/1412.3949
Note: Partially funded by research grant no. V220-630-08-TFMV-S/F-059 (Verbundvorhaben, Technologieförderung Land Mecklenburg-Vorpommern) in European Social / Regional Development Funds
Authors: Tobias Strauß, Tobias Grüning, Gundram Leifert, Roger Labahn
Journal: CoRR
Volume: abs/1412.6012
Note: Partially funded by research grant no. V220-630-08-TFMV-S/F-059 (Verbundvorhaben, Technologieförderung Land Mecklenburg-Vorpommern) in European Social / Regional Development Funds
Authors: Tobias Strauß, Tobias Grüning, Gundram Leifert, Roger Labahn
Journal: CoRR
Volume: abs/1412.6061
Note: Partially funded by research grant no. V220-630-08-TFMV-S/F-059 (Verbundvorhaben, Technologieförderung Land Mecklenburg-Vorpommern) in European Social / Regional Development Funds
This article develops approaches to generating dynamical reservoirs of echo state networks with desired properties while reducing the amount of randomness. It is possible to create weight matrices with a predefined singular value spectrum, and the procedure guarantees stability (the echo state property). We prove that the impact of noise on the training process is minimized. The resulting reservoir types are strongly related to reservoirs already known in the literature. Our experiments show that well-chosen input weights can improve performance.
Authors: Tobias Strauß, Welf Wustlich, Roger Labahn
Journal: Neural Computation
Volume: 24
Number: 12
Pages: 3246-3276
Note: Partially funded by the research grant no. V220-630-08-TFMV-S/F-059 (Verbundvorhaben, Technologieförderung Land Mecklenburg-Vorpommern) in European Social / Regional Development Funds
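A reservoir weight matrix with a predefined singular value spectrum, as described in the abstract above, can be sketched via an SVD-style construction: choose the singular values, then wrap them in random orthogonal factors. This is a generic sketch, not the paper's exact procedure; keeping the largest singular value below 1 makes the state map contractive, which is a sufficient condition for the echo state property.

```python
import numpy as np

def reservoir_from_spectrum(singular_values, seed=0):
    """Build a reservoir matrix W = U diag(s) V^T with a prescribed
    singular value spectrum s and random orthogonal factors U, V."""
    rng = np.random.default_rng(seed)
    n = len(singular_values)
    # QR decomposition of a Gaussian matrix yields a random orthogonal Q
    u, _ = np.linalg.qr(rng.standard_normal((n, n)))
    v, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return u @ np.diag(singular_values) @ v.T

# Desired spectrum, decaying linearly, with maximum singular value < 1:
spectrum = np.linspace(0.9, 0.1, 50)
W = reservoir_from_spectrum(spectrum)
```

Because orthogonal factors leave singular values untouched, the constructed matrix carries exactly the chosen spectrum, removing one source of randomness from reservoir generation.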