Skip to main content

1377. sredin seminar: Iztok Savnik: Large-Scale Set Similarity Index

Date of publication: 16. 2. 2026
Computer mathematics seminar (Wednesday seminar)
Wednesday
18
February
Time:
18:00 - 19:45
ID: 869 5394 3473 – Password: 778851
Sreda, 18. februar 2026, od 18:00 do 19:45, po Zoomu v angleščini

Large-Scale Set Similarity Index Iztok Savnik

The seminar presents the set-trie data structure, a trie-based structure adapted for storing and querying sets rather than sequences. The construction of a set-trie relies on a precomputed ordering of set elements that maximizes the number of common prefixes across a collection of sets.

The primary operation supported by the set-trie is set similarity search, where the Hamming distance is used to measure similarity between sets. The search algorithm employs an efficient depth-first traversal constrained by the ordering of set elements. The size of the explored portion of the set-trie depends on the required Hamming distance threshold.

Optimizations of the set-trie data structure include filtering based on set length, partitioning, selection of an optimal element ordering, handling very large sets, and a hardware-accelerated implementation in the C programming language.

The work is the result of a collaboration with Prof. Nikolaus Augsten from the University of Salzburg.


Seznam preteklih seminarjev

PS. Kdor bi rad kaj povedal na naslednjih seminarjih, naj mi sporoči naslov teme in doda kratek povzetek.