Dissertation details

  • Implementation for Spatial Data of the Shared Nearest Neighbour with Metric Data Structures
  • Nov 2012
  • The Shared Nearest Neighbour (SNN) is a data clustering algorithm that identifies noise in data and finds clusters with different densities, shapes and sizes, which makes it a good candidate for dealing with spatial data. However, the SNN can become a bottleneck in spatial data clustering, since its worst-case time complexity is O(n²). This thesis proposes using metric data structures to index spatial data and support the SNN in querying for the k-nearest neighbours. When dealing with spatial data, the average-case time complexity of the SNN, using a metric data structure in primary storage (kd-Tree), is improved to at most O(n × log n). Furthermore, by reusing the k-nearest neighbours between consecutive runs, it is possible to obtain a worst-case time complexity of O(n). The experiments were carried out using the kd-Tree and the DF-Tree, which work in primary and secondary storage, respectively.
  • Universidade Nova de Lisboa
  • Bruno Faustino
  • João Moura Pires
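
The idea summarised in the abstract, answering k-nearest-neighbour queries through a kd-Tree and then counting shared neighbours, can be illustrated with a minimal sketch. This is not the thesis implementation: the function names (build_kdtree, knn, snn_similarity), the nested-tuple tree layout, and the rule that two points are compared only when each appears in the other's neighbour list are all assumptions made for illustration. Noise detection, density estimation, the DF-Tree, and the reuse of neighbours between runs are not shown.

```python
import heapq


def build_kdtree(points, indices=None, depth=0):
    """Build a kd-tree as nested tuples (point_index, left, right)."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[0])
    # Split on the median along the current axis.
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return (indices[mid],
            build_kdtree(points, indices[:mid], depth + 1),
            build_kdtree(points, indices[mid + 1:], depth + 1))


def knn(points, tree, query, k, exclude=None):
    """Return the indices of the k points nearest to query, nearest first."""
    heap = []  # max-heap emulated with (-squared_distance, index)

    def visit(node, depth):
        if node is None:
            return
        idx, left, right = node
        p = points[idx]
        if idx != exclude:
            d2 = sum((a - b) ** 2 for a, b in zip(p, query))
            if len(heap) < k:
                heapq.heappush(heap, (-d2, idx))
            elif d2 < -heap[0][0]:
                heapq.heapreplace(heap, (-d2, idx))
        axis = depth % len(query)
        diff = query[axis] - p[axis]
        near, far = (left, right) if diff < 0 else (right, left)
        visit(near, depth + 1)
        # Cross the splitting plane only if a closer point could lie beyond it.
        if len(heap) < k or diff * diff < -heap[0][0]:
            visit(far, depth + 1)

    visit(tree, 0)
    return [idx for _, idx in sorted(heap, reverse=True)]


def snn_similarity(points, k):
    """Count shared neighbours for pairs that list each other as neighbours.

    Assumption for this sketch: a pair (i, j) is scored only when i is among
    the k nearest neighbours of j and vice versa.
    """
    tree = build_kdtree(points)
    neighbours = [set(knn(points, tree, p, k, exclude=i))
                  for i, p in enumerate(points)]
    sim = {}
    for i, ni in enumerate(neighbours):
        for j in ni:
            if i < j and i in neighbours[j]:
                sim[(i, j)] = len(ni & neighbours[j])
    return sim


if __name__ == "__main__":
    # Two well-separated groups of three points each (made-up data).
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    for pair, shared in sorted(snn_similarity(pts, k=2).items()):
        print(pair, shared)
```

In this sketch every point issues one tree query instead of being compared with all other points, which is where the saving over the quadratic all-pairs approach comes from. Only pairs inside the same group receive a similarity score, so a later clustering step would never link the two groups.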