World Logic Day | Logic-based Explanations for Neural Networks
Neural networks have been key to solving a wide variety of problems. However, neural network models are still regarded as black boxes, since they provide no human-interpretable evidence for why they produce a given output. In this talk, we will explore a procedure for inducing human-understandable, logic-based theories that attempt to represent the classification process of a given neural network model. The procedure is based on establishing mappings from the activation values produced by that model's neurons to human-defined concepts, which are then used in the induced logic-based theory. Through a series of experiments, we discuss how to map the internal state of a neural network to human-defined concepts, examine whether the results obtained by the established mappings match our understanding of those concepts, and analyse the fidelity of the resulting theory and how it can be used to generate symbolic justifications for the outputs of neural network models.
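To give a flavour of the idea, here is a minimal, self-contained sketch (not the talk's actual method; all data, names, and the threshold-based mapping are illustrative assumptions). It maps a single neuron's activation to a human-defined concept via a learned threshold, induces a one-rule "theory" over that concept, and measures the rule's fidelity to the network's own outputs:

```python
# Hypothetical sketch of concept mapping: all names and data are illustrative.
# We treat one hidden neuron's activation as a candidate indicator for a
# human-defined concept, learn a threshold mapping, and measure how faithfully
# a rule over the mapped concept reproduces the model's output.
import random

random.seed(0)

# Synthetic "network internals": one neuron's activation per example, a
# human-annotated concept label, and the model's predicted class.
examples = []
for _ in range(200):
    has_concept = random.random() < 0.5
    activation = random.gauss(1.0 if has_concept else -1.0, 0.5)
    model_output = 1 if activation > 0 else 0  # stand-in for the network's decision
    examples.append((activation, has_concept, model_output))

def learn_threshold(data):
    """Pick the activation threshold that best predicts the concept label."""
    best_t, best_acc = 0.0, 0.0
    for t in [a for a, _, _ in data]:
        acc = sum((a > t) == c for a, c, _ in data) / len(data)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

t, concept_acc = learn_threshold(examples)

# Induced one-rule "theory": predict class 1 iff the mapped concept holds.
# Fidelity = agreement between the rule's verdicts and the network's outputs.
fidelity = sum((a > t) == bool(y) for a, _, y in examples) / len(examples)

print(f"threshold={t:.2f} concept_accuracy={concept_acc:.2%} fidelity={fidelity:.2%}")
```

In a realistic setting the threshold probe would be replaced by a classifier trained on annotated activations, and the induced theory would combine several mapped concepts into rules whose fidelity to the model is then evaluated, as discussed in the talk.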
This work was carried out in collaboration with Manuel de Sousa Ribeiro, João Ferreira, and Ricardo Gonçalves.