Detail

Publication date: 1 June 2021

Finding and Parsimoniously Generalizing Fuzzy Clusters using Hierarchical Taxonomies: a Case Study in Data Science

Taxonomies play a fundamental role in structuring concepts in knowledge domains such as Biology, Medical Sciences, Education, and Computer Science.
This talk presents PARGen, an algorithm that lifts a fuzzy cluster of topics to higher ranks in a hierarchical taxonomy. PARGen minimizes a penalty function that balances, with suitable weights, the number of ‘head subjects’ introduced against the associated errors: ‘gaps’ (false positives) and ‘offshoots’ (false negatives). The result is a parsimonious generalization of the topic cluster in the taxonomy.
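As an illustration only, the trade-off captured by the penalty can be sketched in a few lines of Python. This is not the PARGen algorithm itself: it simply scores a given candidate set of head subjects. The toy taxonomy, the topic names, the weights `lam` and `gamma`, and the simplification of counting gaps and offshoots as crisp leaf counts (ignoring membership values) are all assumptions made for this example.

```python
# Hypothetical toy taxonomy: parent -> list of children.
TAXONOMY = {
    "root": ["A", "B"],
    "A": ["A1", "A2", "A3"],
    "B": ["B1", "B2"],
}


def leaves_under(node, taxonomy):
    """Return the set of leaf topics in the subtree rooted at `node`."""
    children = taxonomy.get(node, [])
    if not children:
        return {node}
    result = set()
    for child in children:
        result |= leaves_under(child, taxonomy)
    return result


def penalty(heads, cluster, taxonomy, lam=0.2, gamma=0.9):
    """Score a candidate set of head subjects (simplified, crisp counts).

    gaps      = leaves covered by a head subject but absent from the cluster
    offshoots = cluster leaves not covered by any head subject
    `lam` and `gamma` are assumed weights, not values from the talk.
    """
    covered = set()
    for head in heads:
        covered |= leaves_under(head, taxonomy)
    in_cluster = {leaf for leaf, u in cluster.items() if u > 0}
    gaps = covered - in_cluster
    offshoots = in_cluster - covered
    return len(heads) + lam * len(gaps) + gamma * len(offshoots)


# Fuzzy cluster: leaf topic -> membership value (made-up numbers).
cluster = {"A1": 0.8, "A2": 0.5, "B1": 0.3}

lifted = penalty({"A"}, cluster, TAXONOMY)                  # 1 head, 1 gap, 1 offshoot
unlifted = penalty({"A1", "A2", "B1"}, cluster, TAXONOMY)   # 3 heads, no errors
print(lifted, unlifted)
```

With these assumed weights, lifting to the single head subject `A` scores lower than keeping three separate leaf topics, even though it incurs one gap (`A3`) and one offshoot (`B1`). Different weights would shift that balance.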
PARGen is applied to a collection of 17,685 abstracts of research papers published in 17 Springer journals related to Data Science over a 20-year period (1998-2017). The ground truth is a hierarchical Taxonomy of Data Science (TDS) taken from the 2012 ACM Computing Classification System (ACM-CCS). The talk discusses the methodology for finding fuzzy clusters of TDS leaf topics retrieved from the text, lifting them with PARGen, and identifying ‘head subjects’ that highlight research tendencies in Data Science.

Presenter


Date 30/10/2019
Status Concluded