source: GTP/trunk/Lib/Vis/Preprocessing/src/sparsehash/README @ 2162

Revision 2162, 6.2 KB checked in by mattausch, 18 years ago (diff)

improved hash performance with google hashmap

Line 
1This directory contains several hash-map implementations, similar in
2API to SGI's hash_map class, but with different performance
3characteristics.  sparse_hash_map uses very little space overhead, 1-2
4bits per entry.  dense_hash_map is very fast, particulary on lookup.
5(sparse_hash_set and dense_hash_set are the set versions of these
6routines.)  On the other hand, these classes have requirements that
7may not make them appropriate for all applications.
8
9All these implementation use a hashtable with internal quadratic
10probing.  This method is space-efficient -- there is no pointer
11overhead -- and time-efficient for good hash functions.
12
13COMPILING
14---------
15To compile test applications with these classes, run ./configure
16followed by make.  To install these header files on your system, run
17'make install'.  See INSTALL for more details.
18
19USING
20-----
21See the html files in the doc directory for small example programs
22that use these classes.  It's enough to just include the header file:
23
24   #include <google/sparse_hash_map> // or sparse_hash_set, dense_hash_map, ...
25   std::sparse_hash_set<int, int> number_mapper;
26
27and use the class the way you would other hash-map implementations.
28(Though see "API" below for caveats.)
29
30By default (you can change it via a flag to ./configure), these hash
31implementations are defined in the std namespace.
32
33API
34---
35The API for sparse_hash_map, dense_hash_map, sparse_hash_set, and
36dense_hash_set, are a superset of the API of SGI's hash_map class.
37See doc/sparse_hash_map.html, et al., for more information about the
38API.
39
40The usage of these classes differ from SGI's hash_map, and other
41hashtable implementations, in the following major ways:
42
431) dense_hash_map requires you to set aside one key value as the
44   'empty bucket' value, set via the set_empty_key() method.  This
45   *MUST* be called before you can use the dense_hash_map.  It is
46   illegal to insert any elements into a dense_hash_map whose key is
47   equal to the empty-key.
48
492) For both dense_hash_map and sparse_hash_map, if you wish to delete
50   elements from the hashtable, you must set aside a key value as the
51   'deleted bucket' value, set via the set_deleted_key() method.  If
52   your hash-map is insert-only, there is no need to call this
53   method.  If you call set_deleted_key(), it is illegal to insert any
54   elements into a dense_hash_map or sparse_hash_map whose key is
55   equal to the deleted-key.
56
573) Both dense_hash_map and sparse_hash_map must take "plain old
58   data" for the key and value.  For keys this is usually true: basic
59   C types and C++ types like string are both acceptable, for
60   instance.  For values, this is often true.  If it is not, consider
61   storing a pointer to the value in the hash-map, rather than the
62   value itself.  This also yields speed improvements, since the
63   hashtable will have less memory to move around when it resizes.
64
654) These hash-map implementation support I/O.  See below.
66
67There are also some smaller differences:
68
691) The constructor takes an optional argument that specifies the
70   number of elements you expect to insert into the hashtable.  This
71   differs from SGI's hash_map implementation, which takes an optional
72   number of buckets.
73
742) erase() does not immediately reclaim memory.  As a consequence,
75   erase() does not invalidate any iterators, making loops like this
76   correct:
77      for (it = ht.begin(); it != ht.end(); ++it)
78        if (...) ht.erase(it);
79   As another consequence, a series of erase() calls can leave your
80   hashtable using more memory than it needs to.  The hashtable will
81   automatically compact() at the next call to insert(), but to
82   manually compact a hashtable, you can call
83      ht.resize(0)
84
853) While sparse_hash_map et al. accept an Allocator template argument,
86   they ignore it.  They use malloc() and free() for all memory
87   allocations.
88
894) sparse_hash_map et al. do not use exceptions.
90
91I/O
92---
93In addition to the normal hash-map operations, sparse_hash_map can
94read and write hashtables to disk.  (dense_hash_map also has the API,
95but it has not yet been implemented, and writes will always fail.)
96
97In the simplest case, writing a hashtable is as easy as calling two
98methods on the hashtable:
99   ht.write_metadata(fp);
100   ht.write_nopointer_data(fp);
101
102Reading in this data is equally simple:
103   sparse_hash_map<...> ht;
104   ht.read_metadata(fp);
105   ht.read_nopointer_data(fp);
106
107The above is sufficient if the key and value do not contain any
108pointers: they are basic C types or agglomorations of basic C types.
109If the key and/or value do contain pointers, you can still store the
110hashtable by replacing write_nopointer_data() with a custom writing
111routine.  See sparse_hash_map.html et al. for more information.
112
113SPARSETABLE
114-----------
115In addition to the hash-map and hash-set classes, this package also
116provides sparsetable.h, an array implementation that uses space
117proportional to the number of elements in the array, rather than the
118maximum element index.  It uses very little space overhead: 1 bit per
119entry.  See sparsetable.html for the API.
120
121RESOURCE USAGE
122--------------
123* sparse_hash_map has memory overhead of about 2 bits per hash-map
124  entry.
125* dense_hash_map has a factor of 2-3 memory overhead: if your
126  hashtable data takes X bytes, dense_hash_map will use 3X-4X memory
127  total.
128
129Hashtables tend to double in size when resizing, creating an
130additional 50% space overhead.  dense_hash_map does in fact have a
131significant "high water mark" memory use requirement.
132sparse_hash_map, however, is written to need very little space
133overhead when resizing: only a few bits per hashtable entry.
134
135PERFORMANCE
136-----------
137You can compile and run the included file time_hash_map.cc to examine
138the performance of sparse_hash_map, dense_hash_map, and your native
139hash_map implementation on your system.  One test against the
140SGI hash_map implementation gave the following timing information for
141a simple find() call:
142   SGI hash_map:     22 ns
143   dense_hash_map:   13 ns
144   sparse_hash_map: 117 ns
145   SGI map:         113 ns
146
147See doc/performance.html for more detailed charts on resource usage
148and performance data.
149
150---
15116 March 2005
Note: See TracBrowser for help on using the repository browser.