Neuro-symbolic integration for ontology-based classification of structured objects

Reference ontologies play an essential role in organising knowledge in the life sciences and other domains. They are built and maintained manually, and since this is an expensive process, many reference ontologies cover only a small fraction of their domain. We develop techniques that automatically extend the coverage of a reference ontology with entities that have not yet been added manually. The extension should be faithful to the (often implicit) design decisions of the developers of the reference ontology. While this is a generic problem, our use case addresses the Chemical Entities of Biological Interest (ChEBI) ontology, which contains classes of molecules, since the chemical domain is particularly suited to our approach: ChEBI provides annotations that represent the structure of chemical entities (e.g., molecules and functional groups). We show that classical machine learning approaches can outperform ClassyFire, a rule-based system that represents the state of the art for classifying new molecules and is already being used for the extension of ChEBI. Moreover, we develop RoBERTa and Electra transformer neural networks that achieve even better performance. In addition, the axioms of the ontology can be used during the training of prediction models as a form of semantic loss function. Furthermore, we show that ontology pre-training can improve the performance of transformer networks for the task of predicting the toxicity of chemical molecules. Finally, we show that with ontology pre-training our model learns to focus attention on more meaningful chemical groups when making predictions, paving a path towards greater robustness and interpretability. This strategy is generally applicable as a neuro-symbolic approach to embedding meaningful semantics into neural networks.
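The semantic loss over ontology axioms mentioned above can be sketched roughly as follows. This is a minimal illustration, not the exact formulation used in the work: the function name, the restriction to subsumption axioms (A SubClassOf B), and the product-form penalty are all illustrative assumptions.

```python
def semantic_loss(probs, axioms):
    """Penalise predictions that violate subsumption axioms.

    probs:  dict mapping class name -> predicted membership probability.
    axioms: list of (sub, sup) pairs, each meaning 'sub SubClassOf sup'.

    For each axiom sub <= sup, the term p(sub) * (1 - p(sup)) estimates
    the probability that the subclass holds while its superclass does not,
    which any ontology-consistent prediction should drive towards zero.
    """
    return sum(probs[sub] * (1.0 - probs[sup]) for sub, sup in axioms)

# Hypothetical example: 'alcohol' SubClassOf 'organic molecule'.
probs = {"alcohol": 0.9, "organic molecule": 0.2}
axioms = [("alcohol", "organic molecule")]
print(semantic_loss(probs, axioms))  # approx. 0.72 = 0.9 * (1 - 0.2)
```

In training, such a term would be added to the ordinary classification loss, nudging the network towards predictions that respect the ontology's class hierarchy even on unlabelled aspects of the input.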
Till Mossakowski
Last modified: Tue Nov 13 2024