1 | This directory contains several hash-map implementations, similar in
|
---|
2 | API to SGI's hash_map class, but with different performance
|
---|
3 | characteristics. sparse_hash_map uses very little space overhead, 1-2
|
---|
4 | bits per entry. dense_hash_map is very fast, particulary on lookup.
|
---|
5 | (sparse_hash_set and dense_hash_set are the set versions of these
|
---|
6 | routines.) On the other hand, these classes have requirements that
|
---|
7 | may not make them appropriate for all applications.
|
---|
8 |
|
---|
9 | All these implementation use a hashtable with internal quadratic
|
---|
10 | probing. This method is space-efficient -- there is no pointer
|
---|
11 | overhead -- and time-efficient for good hash functions.
|
---|
12 |
|
---|
13 | COMPILING
|
---|
14 | ---------
|
---|
15 | To compile test applications with these classes, run ./configure
|
---|
16 | followed by make. To install these header files on your system, run
|
---|
17 | 'make install'. See INSTALL for more details.
|
---|
18 |
|
---|
19 | USING
|
---|
20 | -----
|
---|
21 | See the html files in the doc directory for small example programs
|
---|
22 | that use these classes. It's enough to just include the header file:
|
---|
23 |
|
---|
24 | #include <google/sparse_hash_map> // or sparse_hash_set, dense_hash_map, ...
|
---|
25 | std::sparse_hash_set<int, int> number_mapper;
|
---|
26 |
|
---|
27 | and use the class the way you would other hash-map implementations.
|
---|
28 | (Though see "API" below for caveats.)
|
---|
29 |
|
---|
30 | By default (you can change it via a flag to ./configure), these hash
|
---|
31 | implementations are defined in the std namespace.
|
---|
32 |
|
---|
33 | API
|
---|
34 | ---
|
---|
35 | The API for sparse_hash_map, dense_hash_map, sparse_hash_set, and
|
---|
36 | dense_hash_set, are a superset of the API of SGI's hash_map class.
|
---|
37 | See doc/sparse_hash_map.html, et al., for more information about the
|
---|
38 | API.
|
---|
39 |
|
---|
40 | The usage of these classes differ from SGI's hash_map, and other
|
---|
41 | hashtable implementations, in the following major ways:
|
---|
42 |
|
---|
43 | 1) dense_hash_map requires you to set aside one key value as the
|
---|
44 | 'empty bucket' value, set via the set_empty_key() method. This
|
---|
45 | *MUST* be called before you can use the dense_hash_map. It is
|
---|
46 | illegal to insert any elements into a dense_hash_map whose key is
|
---|
47 | equal to the empty-key.
|
---|
48 |
|
---|
49 | 2) For both dense_hash_map and sparse_hash_map, if you wish to delete
|
---|
50 | elements from the hashtable, you must set aside a key value as the
|
---|
51 | 'deleted bucket' value, set via the set_deleted_key() method. If
|
---|
52 | your hash-map is insert-only, there is no need to call this
|
---|
53 | method. If you call set_deleted_key(), it is illegal to insert any
|
---|
54 | elements into a dense_hash_map or sparse_hash_map whose key is
|
---|
55 | equal to the deleted-key.
|
---|
56 |
|
---|
57 | 3) Both dense_hash_map and sparse_hash_map must take "plain old
|
---|
58 | data" for the key and value. For keys this is usually true: basic
|
---|
59 | C types and C++ types like string are both acceptable, for
|
---|
60 | instance. For values, this is often true. If it is not, consider
|
---|
61 | storing a pointer to the value in the hash-map, rather than the
|
---|
62 | value itself. This also yields speed improvements, since the
|
---|
63 | hashtable will have less memory to move around when it resizes.
|
---|
64 |
|
---|
65 | 4) These hash-map implementation support I/O. See below.
|
---|
66 |
|
---|
67 | There are also some smaller differences:
|
---|
68 |
|
---|
69 | 1) The constructor takes an optional argument that specifies the
|
---|
70 | number of elements you expect to insert into the hashtable. This
|
---|
71 | differs from SGI's hash_map implementation, which takes an optional
|
---|
72 | number of buckets.
|
---|
73 |
|
---|
74 | 2) erase() does not immediately reclaim memory. As a consequence,
|
---|
75 | erase() does not invalidate any iterators, making loops like this
|
---|
76 | correct:
|
---|
77 | for (it = ht.begin(); it != ht.end(); ++it)
|
---|
78 | if (...) ht.erase(it);
|
---|
79 | As another consequence, a series of erase() calls can leave your
|
---|
80 | hashtable using more memory than it needs to. The hashtable will
|
---|
81 | automatically compact() at the next call to insert(), but to
|
---|
82 | manually compact a hashtable, you can call
|
---|
83 | ht.resize(0)
|
---|
84 |
|
---|
85 | 3) While sparse_hash_map et al. accept an Allocator template argument,
|
---|
86 | they ignore it. They use malloc() and free() for all memory
|
---|
87 | allocations.
|
---|
88 |
|
---|
89 | 4) sparse_hash_map et al. do not use exceptions.
|
---|
90 |
|
---|
91 | I/O
|
---|
92 | ---
|
---|
93 | In addition to the normal hash-map operations, sparse_hash_map can
|
---|
94 | read and write hashtables to disk. (dense_hash_map also has the API,
|
---|
95 | but it has not yet been implemented, and writes will always fail.)
|
---|
96 |
|
---|
97 | In the simplest case, writing a hashtable is as easy as calling two
|
---|
98 | methods on the hashtable:
|
---|
99 | ht.write_metadata(fp);
|
---|
100 | ht.write_nopointer_data(fp);
|
---|
101 |
|
---|
102 | Reading in this data is equally simple:
|
---|
103 | sparse_hash_map<...> ht;
|
---|
104 | ht.read_metadata(fp);
|
---|
105 | ht.read_nopointer_data(fp);
|
---|
106 |
|
---|
107 | The above is sufficient if the key and value do not contain any
|
---|
108 | pointers: they are basic C types or agglomorations of basic C types.
|
---|
109 | If the key and/or value do contain pointers, you can still store the
|
---|
110 | hashtable by replacing write_nopointer_data() with a custom writing
|
---|
111 | routine. See sparse_hash_map.html et al. for more information.
|
---|
112 |
|
---|
113 | SPARSETABLE
|
---|
114 | -----------
|
---|
115 | In addition to the hash-map and hash-set classes, this package also
|
---|
116 | provides sparsetable.h, an array implementation that uses space
|
---|
117 | proportional to the number of elements in the array, rather than the
|
---|
118 | maximum element index. It uses very little space overhead: 1 bit per
|
---|
119 | entry. See sparsetable.html for the API.
|
---|
120 |
|
---|
121 | RESOURCE USAGE
|
---|
122 | --------------
|
---|
123 | * sparse_hash_map has memory overhead of about 2 bits per hash-map
|
---|
124 | entry.
|
---|
125 | * dense_hash_map has a factor of 2-3 memory overhead: if your
|
---|
126 | hashtable data takes X bytes, dense_hash_map will use 3X-4X memory
|
---|
127 | total.
|
---|
128 |
|
---|
129 | Hashtables tend to double in size when resizing, creating an
|
---|
130 | additional 50% space overhead. dense_hash_map does in fact have a
|
---|
131 | significant "high water mark" memory use requirement.
|
---|
132 | sparse_hash_map, however, is written to need very little space
|
---|
133 | overhead when resizing: only a few bits per hashtable entry.
|
---|
134 |
|
---|
135 | PERFORMANCE
|
---|
136 | -----------
|
---|
137 | You can compile and run the included file time_hash_map.cc to examine
|
---|
138 | the performance of sparse_hash_map, dense_hash_map, and your native
|
---|
139 | hash_map implementation on your system. One test against the
|
---|
140 | SGI hash_map implementation gave the following timing information for
|
---|
141 | a simple find() call:
|
---|
142 | SGI hash_map: 22 ns
|
---|
143 | dense_hash_map: 13 ns
|
---|
144 | sparse_hash_map: 117 ns
|
---|
145 | SGI map: 113 ns
|
---|
146 |
|
---|
147 | See doc/performance.html for more detailed charts on resource usage
|
---|
148 | and performance data.
|
---|
149 |
|
---|
150 | ---
|
---|
151 | 16 March 2005
|
---|