Publication date: 1 June 2021

Measuring the Structural Similarity of Semistructured Documents Using Entropy

We propose a technique for measuring the structural similarity
of semistructured documents based on entropy. After extracting the
structural information from two documents, we use either Ziv-Lempel
encoding or Ziv-Merhav cross-parsing to determine the entropy and,
from it, the similarity between the documents. To the best of
our knowledge, this is the first linear-time approach for evaluating
structural similarity. In an experimental evaluation, we
demonstrate that, in terms of clustering quality, the results of our
algorithm are on a par with or even better than those of existing approaches.
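The abstract does not spell out the extraction step or the exact similarity formula. The Python sketch below illustrates the general idea under stated assumptions: a document's structure is taken to be its sequence of element tags in document order (text and attributes dropped), and the Ziv-Merhav relative-entropy estimator serves as a dissimilarity score. The function names structure_sequence, lz78_phrase_count, cross_parse_count, relative_entropy_estimate and structural_distance are hypothetical, averaging the two directions is our own choice, and the naive substring search makes this sketch far from linear time, unlike the approach described in the talk.

```python
import math
import xml.etree.ElementTree as ET


def structure_sequence(xml_text):
    """Assumed structure model: element tags in document (pre-order) order."""
    root = ET.fromstring(xml_text)
    return [elem.tag for elem in root.iter()]


def lz78_phrase_count(seq):
    """Number of phrases in an LZ78 (Ziv-Lempel) incremental parse of seq."""
    seen = set()
    phrase = ()
    count = 0
    for symbol in seq:
        phrase += (symbol,)
        if phrase not in seen:
            seen.add(phrase)
            count += 1
            phrase = ()
    if phrase:  # leftover phrase that already occurred earlier
        count += 1
    return count


def _occurs_in(pattern, seq):
    """True if the tuple pattern appears as a contiguous run in seq (naive scan)."""
    k = len(pattern)
    return any(tuple(seq[i:i + k]) == pattern for i in range(len(seq) - k + 1))


def cross_parse_count(x, y):
    """Ziv-Merhav cross-parsing: number of phrases needed to cover x, where each
    phrase is the longest prefix of the rest of x that occurs as a substring
    of y (a single symbol if that symbol does not occur in y at all)."""
    count, i = 0, 0
    while i < len(x):
        length = 0
        while i + length < len(x) and _occurs_in(tuple(x[i:i + length + 1]), y):
            length += 1
        i += max(length, 1)
        count += 1
    return count


def relative_entropy_estimate(x, y):
    """Ziv-Merhav estimate of the relative entropy of x with respect to y,
    (c(x|y) log n - c(x) log c(x)) / n, in bits per symbol."""
    n = len(x)
    if n == 0:
        return 0.0
    c_x = lz78_phrase_count(x)
    c_xy = cross_parse_count(x, y)
    return (c_xy * math.log2(n) - c_x * math.log2(c_x)) / n


def structural_distance(doc1, doc2):
    """Hypothetical symmetric score: mean of both directional estimates.
    Lower values mean more similar structure."""
    s1, s2 = structure_sequence(doc1), structure_sequence(doc2)
    return (relative_entropy_estimate(s1, s2) + relative_entropy_estimate(s2, s1)) / 2
```

For example, with doc_a = "<a><b/><c/><b/><c/></a>" and doc_b = "<x><y/><z/></x>", structural_distance(doc_a, doc_a) comes out lower than structural_distance(doc_a, doc_b). Because the estimator is asymptotic, scores on very short documents can even be negative; only comparisons between scores are meaningful.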

Presenter

Sven Helmer

Date: 02/03/2007
State: Concluded