Context Navigation

← Previous Revision
Latest Revision
Next Revision →
Blame
Revision Log

README @ 2162

Revision 2162, 6.2 KB checked in by mattausch, 18 years ago (diff)
improved hash performance with google hashmap

Line
1	This directory contains several hash-map implementations, similar in
2	API to SGI's hash_map class, but with different performance
3	characteristics. sparse_hash_map uses very little space overhead, 1-2
4	bits per entry. dense_hash_map is very fast, particulary on lookup.
5	(sparse_hash_set and dense_hash_set are the set versions of these
6	routines.) On the other hand, these classes have requirements that
7	may not make them appropriate for all applications.
8
9	All these implementation use a hashtable with internal quadratic
10	probing. This method is space-efficient -- there is no pointer
11	overhead -- and time-efficient for good hash functions.
12
13	COMPILING
14	---------
15	To compile test applications with these classes, run ./configure
16	followed by make. To install these header files on your system, run
17	'make install'. See INSTALL for more details.
18
19	USING
20	-----
21	See the html files in the doc directory for small example programs
22	that use these classes. It's enough to just include the header file:
23
24	#include <google/sparse_hash_map> // or sparse_hash_set, dense_hash_map, ...
25	std::sparse_hash_set<int, int> number_mapper;
26
27	and use the class the way you would other hash-map implementations.
28	(Though see "API" below for caveats.)
29
30	By default (you can change it via a flag to ./configure), these hash
31	implementations are defined in the std namespace.
32
33	API
34	---
35	The API for sparse_hash_map, dense_hash_map, sparse_hash_set, and
36	dense_hash_set, are a superset of the API of SGI's hash_map class.
37	See doc/sparse_hash_map.html, et al., for more information about the
38	API.
39
40	The usage of these classes differ from SGI's hash_map, and other
41	hashtable implementations, in the following major ways:
42
43	1) dense_hash_map requires you to set aside one key value as the
44	'empty bucket' value, set via the set_empty_key() method. This
45	MUST be called before you can use the dense_hash_map. It is
46	illegal to insert any elements into a dense_hash_map whose key is
47	equal to the empty-key.
48
49	2) For both dense_hash_map and sparse_hash_map, if you wish to delete
50	elements from the hashtable, you must set aside a key value as the
51	'deleted bucket' value, set via the set_deleted_key() method. If
52	your hash-map is insert-only, there is no need to call this
53	method. If you call set_deleted_key(), it is illegal to insert any
54	elements into a dense_hash_map or sparse_hash_map whose key is
55	equal to the deleted-key.
56
57	3) Both dense_hash_map and sparse_hash_map must take "plain old
58	data" for the key and value. For keys this is usually true: basic
59	C types and C++ types like string are both acceptable, for
60	instance. For values, this is often true. If it is not, consider
61	storing a pointer to the value in the hash-map, rather than the
62	value itself. This also yields speed improvements, since the
63	hashtable will have less memory to move around when it resizes.
64
65	4) These hash-map implementation support I/O. See below.
66
67	There are also some smaller differences:
68
69	1) The constructor takes an optional argument that specifies the
70	number of elements you expect to insert into the hashtable. This
71	differs from SGI's hash_map implementation, which takes an optional
72	number of buckets.
73
74	2) erase() does not immediately reclaim memory. As a consequence,
75	erase() does not invalidate any iterators, making loops like this
76	correct:
77	for (it = ht.begin(); it != ht.end(); ++it)
78	if (...) ht.erase(it);
79	As another consequence, a series of erase() calls can leave your
80	hashtable using more memory than it needs to. The hashtable will
81	automatically compact() at the next call to insert(), but to
82	manually compact a hashtable, you can call
83	ht.resize(0)
84
85	3) While sparse_hash_map et al. accept an Allocator template argument,
86	they ignore it. They use malloc() and free() for all memory
87	allocations.
88
89	4) sparse_hash_map et al. do not use exceptions.
90
91	I/O
92	---
93	In addition to the normal hash-map operations, sparse_hash_map can
94	read and write hashtables to disk. (dense_hash_map also has the API,
95	but it has not yet been implemented, and writes will always fail.)
96
97	In the simplest case, writing a hashtable is as easy as calling two
98	methods on the hashtable:
99	ht.write_metadata(fp);
100	ht.write_nopointer_data(fp);
101
102	Reading in this data is equally simple:
103	sparse_hash_map<...> ht;
104	ht.read_metadata(fp);
105	ht.read_nopointer_data(fp);
106
107	The above is sufficient if the key and value do not contain any
108	pointers: they are basic C types or agglomorations of basic C types.
109	If the key and/or value do contain pointers, you can still store the
110	hashtable by replacing write_nopointer_data() with a custom writing
111	routine. See sparse_hash_map.html et al. for more information.
112
113	SPARSETABLE
114	-----------
115	In addition to the hash-map and hash-set classes, this package also
116	provides sparsetable.h, an array implementation that uses space
117	proportional to the number of elements in the array, rather than the
118	maximum element index. It uses very little space overhead: 1 bit per
119	entry. See sparsetable.html for the API.
120
121	RESOURCE USAGE
122	--------------
123	* sparse_hash_map has memory overhead of about 2 bits per hash-map
124	entry.
125	* dense_hash_map has a factor of 2-3 memory overhead: if your
126	hashtable data takes X bytes, dense_hash_map will use 3X-4X memory
127	total.
128
129	Hashtables tend to double in size when resizing, creating an
130	additional 50% space overhead. dense_hash_map does in fact have a
131	significant "high water mark" memory use requirement.
132	sparse_hash_map, however, is written to need very little space
133	overhead when resizing: only a few bits per hashtable entry.
134
135	PERFORMANCE
136	-----------
137	You can compile and run the included file time_hash_map.cc to examine
138	the performance of sparse_hash_map, dense_hash_map, and your native
139	hash_map implementation on your system. One test against the
140	SGI hash_map implementation gave the following timing information for
141	a simple find() call:
142	SGI hash_map: 22 ns
143	dense_hash_map: 13 ns
144	sparse_hash_map: 117 ns
145	SGI map: 113 ns
146
147	See doc/performance.html for more detailed charts on resource usage
148	and performance data.
149
150	---
151	16 March 2005

Note: See TracBrowser for help on using the repository browser.

Context Navigation

source: GTP/trunk/Lib/Vis/Preprocessing/src/sparsehash/README @ 2162

Download in other formats: