1377. sredin seminar: Iztok Savnik: Large-Scale Set Similarity Index
Large-Scale Set Similarity Index Iztok Savnik
The seminar presents the set-trie data structure, a trie-based structure adapted for storing and querying sets rather than sequences. The construction of a set-trie relies on a precomputed ordering of set elements that maximizes the number of common prefixes across a collection of sets.
The primary operation supported by the set-trie is set similarity search, where the Hamming distance is used to measure similarity between sets. The search algorithm employs an efficient depth-first traversal constrained by the ordering of set elements. The size of the explored portion of the set-trie depends on the required Hamming distance threshold.
Optimizations of the set-trie data structure include filtering based on set length, partitioning, selection of an optimal element ordering, handling very large sets, and a hardware-accelerated implementation in the C programming language.
The work is the result of a collaboration with Prof. Nikolaus Augsten from the University of Salzburg.
PS. Kdor bi rad kaj povedal na naslednjih seminarjih, naj mi sporoči naslov teme in doda kratek povzetek.