[2162] | 1 | This directory contains several hash-map implementations, similar in
|
---|
| 2 | API to SGI's hash_map class, but with different performance
|
---|
| 3 | characteristics. sparse_hash_map uses very little space overhead, 1-2
|
---|
| 4 | bits per entry. dense_hash_map is very fast, particulary on lookup.
|
---|
| 5 | (sparse_hash_set and dense_hash_set are the set versions of these
|
---|
| 6 | routines.) On the other hand, these classes have requirements that
|
---|
| 7 | may not make them appropriate for all applications.
|
---|
| 8 |
|
---|
| 9 | All these implementation use a hashtable with internal quadratic
|
---|
| 10 | probing. This method is space-efficient -- there is no pointer
|
---|
| 11 | overhead -- and time-efficient for good hash functions.
|
---|
| 12 |
|
---|
| 13 | COMPILING
|
---|
| 14 | ---------
|
---|
| 15 | To compile test applications with these classes, run ./configure
|
---|
| 16 | followed by make. To install these header files on your system, run
|
---|
| 17 | 'make install'. See INSTALL for more details.
|
---|
| 18 |
|
---|
| 19 | USING
|
---|
| 20 | -----
|
---|
| 21 | See the html files in the doc directory for small example programs
|
---|
| 22 | that use these classes. It's enough to just include the header file:
|
---|
| 23 |
|
---|
| 24 | #include <google/sparse_hash_map> // or sparse_hash_set, dense_hash_map, ...
|
---|
| 25 | std::sparse_hash_set<int, int> number_mapper;
|
---|
| 26 |
|
---|
| 27 | and use the class the way you would other hash-map implementations.
|
---|
| 28 | (Though see "API" below for caveats.)
|
---|
| 29 |
|
---|
| 30 | By default (you can change it via a flag to ./configure), these hash
|
---|
| 31 | implementations are defined in the std namespace.
|
---|
| 32 |
|
---|
| 33 | API
|
---|
| 34 | ---
|
---|
| 35 | The API for sparse_hash_map, dense_hash_map, sparse_hash_set, and
|
---|
| 36 | dense_hash_set, are a superset of the API of SGI's hash_map class.
|
---|
| 37 | See doc/sparse_hash_map.html, et al., for more information about the
|
---|
| 38 | API.
|
---|
| 39 |
|
---|
| 40 | The usage of these classes differ from SGI's hash_map, and other
|
---|
| 41 | hashtable implementations, in the following major ways:
|
---|
| 42 |
|
---|
| 43 | 1) dense_hash_map requires you to set aside one key value as the
|
---|
| 44 | 'empty bucket' value, set via the set_empty_key() method. This
|
---|
| 45 | *MUST* be called before you can use the dense_hash_map. It is
|
---|
| 46 | illegal to insert any elements into a dense_hash_map whose key is
|
---|
| 47 | equal to the empty-key.
|
---|
| 48 |
|
---|
| 49 | 2) For both dense_hash_map and sparse_hash_map, if you wish to delete
|
---|
| 50 | elements from the hashtable, you must set aside a key value as the
|
---|
| 51 | 'deleted bucket' value, set via the set_deleted_key() method. If
|
---|
| 52 | your hash-map is insert-only, there is no need to call this
|
---|
| 53 | method. If you call set_deleted_key(), it is illegal to insert any
|
---|
| 54 | elements into a dense_hash_map or sparse_hash_map whose key is
|
---|
| 55 | equal to the deleted-key.
|
---|
| 56 |
|
---|
| 57 | 3) Both dense_hash_map and sparse_hash_map must take "plain old
|
---|
| 58 | data" for the key and value. For keys this is usually true: basic
|
---|
| 59 | C types and C++ types like string are both acceptable, for
|
---|
| 60 | instance. For values, this is often true. If it is not, consider
|
---|
| 61 | storing a pointer to the value in the hash-map, rather than the
|
---|
| 62 | value itself. This also yields speed improvements, since the
|
---|
| 63 | hashtable will have less memory to move around when it resizes.
|
---|
| 64 |
|
---|
| 65 | 4) These hash-map implementation support I/O. See below.
|
---|
| 66 |
|
---|
| 67 | There are also some smaller differences:
|
---|
| 68 |
|
---|
| 69 | 1) The constructor takes an optional argument that specifies the
|
---|
| 70 | number of elements you expect to insert into the hashtable. This
|
---|
| 71 | differs from SGI's hash_map implementation, which takes an optional
|
---|
| 72 | number of buckets.
|
---|
| 73 |
|
---|
| 74 | 2) erase() does not immediately reclaim memory. As a consequence,
|
---|
| 75 | erase() does not invalidate any iterators, making loops like this
|
---|
| 76 | correct:
|
---|
| 77 | for (it = ht.begin(); it != ht.end(); ++it)
|
---|
| 78 | if (...) ht.erase(it);
|
---|
| 79 | As another consequence, a series of erase() calls can leave your
|
---|
| 80 | hashtable using more memory than it needs to. The hashtable will
|
---|
| 81 | automatically compact() at the next call to insert(), but to
|
---|
| 82 | manually compact a hashtable, you can call
|
---|
| 83 | ht.resize(0)
|
---|
| 84 |
|
---|
| 85 | 3) While sparse_hash_map et al. accept an Allocator template argument,
|
---|
| 86 | they ignore it. They use malloc() and free() for all memory
|
---|
| 87 | allocations.
|
---|
| 88 |
|
---|
| 89 | 4) sparse_hash_map et al. do not use exceptions.
|
---|
| 90 |
|
---|
| 91 | I/O
|
---|
| 92 | ---
|
---|
| 93 | In addition to the normal hash-map operations, sparse_hash_map can
|
---|
| 94 | read and write hashtables to disk. (dense_hash_map also has the API,
|
---|
| 95 | but it has not yet been implemented, and writes will always fail.)
|
---|
| 96 |
|
---|
| 97 | In the simplest case, writing a hashtable is as easy as calling two
|
---|
| 98 | methods on the hashtable:
|
---|
| 99 | ht.write_metadata(fp);
|
---|
| 100 | ht.write_nopointer_data(fp);
|
---|
| 101 |
|
---|
| 102 | Reading in this data is equally simple:
|
---|
| 103 | sparse_hash_map<...> ht;
|
---|
| 104 | ht.read_metadata(fp);
|
---|
| 105 | ht.read_nopointer_data(fp);
|
---|
| 106 |
|
---|
| 107 | The above is sufficient if the key and value do not contain any
|
---|
| 108 | pointers: they are basic C types or agglomorations of basic C types.
|
---|
| 109 | If the key and/or value do contain pointers, you can still store the
|
---|
| 110 | hashtable by replacing write_nopointer_data() with a custom writing
|
---|
| 111 | routine. See sparse_hash_map.html et al. for more information.
|
---|
| 112 |
|
---|
| 113 | SPARSETABLE
|
---|
| 114 | -----------
|
---|
| 115 | In addition to the hash-map and hash-set classes, this package also
|
---|
| 116 | provides sparsetable.h, an array implementation that uses space
|
---|
| 117 | proportional to the number of elements in the array, rather than the
|
---|
| 118 | maximum element index. It uses very little space overhead: 1 bit per
|
---|
| 119 | entry. See sparsetable.html for the API.
|
---|
| 120 |
|
---|
| 121 | RESOURCE USAGE
|
---|
| 122 | --------------
|
---|
| 123 | * sparse_hash_map has memory overhead of about 2 bits per hash-map
|
---|
| 124 | entry.
|
---|
| 125 | * dense_hash_map has a factor of 2-3 memory overhead: if your
|
---|
| 126 | hashtable data takes X bytes, dense_hash_map will use 3X-4X memory
|
---|
| 127 | total.
|
---|
| 128 |
|
---|
| 129 | Hashtables tend to double in size when resizing, creating an
|
---|
| 130 | additional 50% space overhead. dense_hash_map does in fact have a
|
---|
| 131 | significant "high water mark" memory use requirement.
|
---|
| 132 | sparse_hash_map, however, is written to need very little space
|
---|
| 133 | overhead when resizing: only a few bits per hashtable entry.
|
---|
| 134 |
|
---|
| 135 | PERFORMANCE
|
---|
| 136 | -----------
|
---|
| 137 | You can compile and run the included file time_hash_map.cc to examine
|
---|
| 138 | the performance of sparse_hash_map, dense_hash_map, and your native
|
---|
| 139 | hash_map implementation on your system. One test against the
|
---|
| 140 | SGI hash_map implementation gave the following timing information for
|
---|
| 141 | a simple find() call:
|
---|
| 142 | SGI hash_map: 22 ns
|
---|
| 143 | dense_hash_map: 13 ns
|
---|
| 144 | sparse_hash_map: 117 ns
|
---|
| 145 | SGI map: 113 ns
|
---|
| 146 |
|
---|
| 147 | See doc/performance.html for more detailed charts on resource usage
|
---|
| 148 | and performance data.
|
---|
| 149 |
|
---|
| 150 | ---
|
---|
| 151 | 16 March 2005
|
---|