|
|
|
|
|
Hashing, Load Balancing and Multiple Choice
Author(s): Udi Wieder
Source: Journal:Foundations and Trends® in Theoretical Computer Science ISSN Print:1551-305X, ISSN Online:1551-3068 Publisher:Now Publishers Volume 12 Number 3-4, Pages: 108(275-379) DOI: 10.1561/0400000070
Abstract:
Many tasks in computer systems could be abstracted as distributing
items into buckets, so that the allocation of items across buckets is as
balanced as possible, and furthermore, given an item’s identifier it is
possible to determine quickly to which bucket it was assigned. A canonical
example is a dictionary data structure, where ‘items’ stands for
key-value pairs and ‘buckets’ for memory locations. Another example
is a distributed key-value store, where the buckets represent locations
in disk or even whole servers. A third example may be a distributed
execution engine where items represent processes and buckets compute
devices, and so on. A common technique in this domain is the use of
a hash-function that maps an item into a relatively short fixed length
string. The hash function is then used in some way to associate the
item to its bucket. The use of a hash function is typically the first step
in the solution and additional algorithmic ideas are required to deal
with collisions and the imbalance of hash values. In this monograph we
survey some of these techniques. We focus on multiple choice schemes
where items are placed into buckets via the use of several independent
hash functions, and typically an item is placed at the least loaded
bucket at the time of placement. We analyze the distributions obtained
in detail, and show how these ideas could be used to design basic data
structures. With respect to data structures we focus on dictionaries,
presenting linear probing, cuckoo hashing and many of their variants.
|
|
|
|