Hashing, Load Balancing and Multiple Choice

Author(s): Udi Wieder

Source:
    Journal:Foundations and Trends® in Theoretical Computer Science
    ISSN Print:1551-305X,  ISSN Online:1551-3068
    Publisher:Now Publishers
    Volume 12 Number 3-4,
Pages: 108(275-379)
DOI: 10.1561/0400000070

Abstract:

Many tasks in computer systems could be abstracted as distributing items into buckets, so that the allocation of items across buckets is as balanced as possible, and furthermore, given an item’s identifier it is possible to determine quickly to which bucket it was assigned. A canonical example is a dictionary data structure, where ‘items’ stands for key-value pairs and ‘buckets’ for memory locations. Another example is a distributed key-value store, where the buckets represent locations in disk or even whole servers. A third example may be a distributed execution engine where items represent processes and buckets compute devices, and so on. A common technique in this domain is the use of a hash-function that maps an item into a relatively short fixed length string. The hash function is then used in some way to associate the item to its bucket. The use of a hash function is typically the first step in the solution and additional algorithmic ideas are required to deal with collisions and the imbalance of hash values. In this monograph we survey some of these techniques. We focus on multiple choice schemes where items are placed into buckets via the use of several independent hash functions, and typically an item is placed at the least loaded bucket at the time of placement. We analyze the distributions obtained in detail, and show how these ideas could be used to design basic data structures. With respect to data structures we focus on dictionaries, presenting linear probing, cuckoo hashing and many of their variants.

Full Text