Neuro-symbolic integration for ontology-based classification of
structured objects
Reference ontologies play an essential role in organising knowledge in
the life sciences and other domains. Because they are built and
maintained manually, which is an expensive process, many reference
ontologies cover only a small fraction of their domain. We develop
techniques that automatically extend the coverage of a reference
ontology with entities that have not yet been added manually. The
extension should be faithful to the (often implicit) design decisions
of the developers of the reference ontology. While this is a generic
problem, our use case addresses the Chemical Entities of Biological
Interest (ChEBI) ontology, whose classes describe kinds of molecules,
since the chemical domain is particularly well suited to our approach.
ChEBI provides
annotations that represent the structure of chemical entities (e.g.,
molecules and functional groups).
We show that classical machine learning approaches can outperform
ClassyFire, a rule-based system that represents the state of the art
for classifying new molecules and is already used to extend ChEBI.
Moreover, we develop transformer neural networks based on RoBERTa and
Electra that achieve even better performance. In addition, the
axioms of the ontology can be used during the training of prediction
models as a form of semantic loss function. Furthermore, we show that
ontology pre-training can improve the performance of transformer
networks for the task of predicting the toxicity of chemical molecules.
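To illustrate the semantic-loss idea mentioned above, a minimal sketch: subsumption axioms can be turned into a penalty on predicted class probabilities, punishing predictions in which a subclass is considered more probable than its superclass. The class names and the hinge-style penalty here are illustrative assumptions, not ChEBI's actual axioms or the exact loss formulation used in our models.

```python
def semantic_loss(probs, subsumptions):
    """Hinge-style penalty for predictions that violate subsumption
    axioms: for each axiom (sub, sup), read as "sub SubClassOf sup",
    the predicted probability of the subclass should not exceed that
    of its superclass."""
    return sum(max(0.0, probs[sub] - probs[sup])
               for sub, sup in subsumptions)

# Illustrative (hypothetical) axioms and predicted probabilities:
axioms = [("ethanol", "alcohol"), ("alcohol", "organic molecule")]
preds = {"ethanol": 0.9, "alcohol": 0.6, "organic molecule": 0.8}

# Only the first axiom is violated (0.9 > 0.6), contributing 0.3.
penalty = semantic_loss(preds, axioms)
```

In an actual training loop, such a term would be added (suitably weighted) to the standard classification loss, so that the network is nudged towards predictions consistent with the ontology's axioms.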
Finally, we show that with ontology pre-training our model learns to
focus its attention on chemically more meaningful groups when making
predictions, paving the way towards greater robustness and
interpretability. This strategy is generally applicable as a
neuro-symbolic approach to embedding meaningful semantics into neural
networks.
Till Mossakowski
Last modified: Tue Nov 13 2024