\chapter{Online Visibility Culling}
%*****************************************************************************

\section{Introduction}

Visibility culling is one of the major acceleration techniques for
the real-time rendering of complex scenes. The ultimate goal of
visibility culling techniques is to prevent invisible objects from
being sent to the rendering pipeline. A standard visibility-culling
technique is {\em view-frustum culling}, which eliminates objects
outside of the current view frustum. View-frustum culling is a fast
and simple technique, but it does not eliminate objects in the view
frustum that are occluded by other objects. This can lead to significant
{\em overdraw}, i.e., the same image area gets covered more than
once. Overdraw wastes computational effort in both the
pixel and the vertex processing stages of modern graphics hardware.
%that is, we render each pixel in the image several times. For many complex
%scenes with high occlusion, the effort wasted on overdraw cannot simply be
%eliminated by increasing the raw computational power of the hardware.
%This problem becomes even more apparent on recent graphics hardware with
%programmable vertex and pixel shaders: all complex per-vertex or per-pixel
%computations on an object get completely wasted if the object is occluded.
The elimination of occluded objects is addressed by {\em occlusion
culling}. In an optimized rendering pipeline, occlusion culling complements
other rendering acceleration techniques such as levels of detail or
impostors.

Occlusion culling can either be applied offline or online. When
applied offline as a preprocess, we compute a potentially visible set
(PVS) for cells of a fixed subdivision of the scene. At runtime, we
can quickly identify the PVS for the given viewpoint. However, this
approach suffers from four major problems: (1) the PVS is valid only
for the original static scene configuration, (2) for a given
viewpoint, the corresponding cell-based PVS can be overly
conservative, (3) computing all PVSs is computationally expensive, and (4)
an accurate PVS computation is difficult to implement for general
scenes. Online occlusion culling solves these problems at the cost of
additional computations at each frame. To make these
computations efficient, most online occlusion culling methods rely on a
number of assumptions about the scene structure and its occlusion
characteristics (e.g., presence of large occluders, occluder
connectivity, occlusion by a few closest depth layers).

Recent graphics hardware natively supports an \emph{occlusion query}
that detects the visibility of an object against the current contents of the
z-buffer. Although the query itself is processed quickly using the
raw power of the graphics processing unit (GPU), its result is not
available immediately due to the delay between issuing the query and
its actual processing in the graphics pipeline. As a result, a naive
application of occlusion queries can even decrease the overall
application performance due to the associated CPU stalls and GPU
starvation. In this chapter, we present an algorithm that aims to
overcome these problems by reducing the number of issued queries and
eliminating CPU stalls and GPU starvation. To schedule the
queries, the algorithm makes use of both the spatial and the temporal
coherence of visibility. A major strength of our technique is its
simplicity and versatility: the method can be easily integrated into
existing real-time rendering packages on architectures supporting the
underlying occlusion query.

%Using spatial and assuming temporal coherence

\section{Related Work}

With the demand for rendering scenes of ever increasing size, a number
of visibility culling methods have been developed over the last
decade. A comprehensive survey of visibility culling methods
was presented by Cohen-Or et al.~\cite{Cohen:2002:survey}. Another
recent survey by Bittner and Wonka~\cite{bittner03:jep} discusses
visibility culling in the broader context of other visibility problems.

According to the domain of visibility computation, we distinguish
between {\em from-point} and {\em from-region} visibility algorithms.
From-region algorithms compute a PVS and are applied offline in a
preprocessing phase~\cite{Airey90,Teller91a,Leyvand:2003:RSF}.
From-point algorithms are applied online for each particular
viewpoint~\cite{Greene93a,Hudson97,EVL-1997-163,bittner98b,Wonka:1999:OSF,Klosowski:2001:ECV}.
In the following discussion we focus on online occlusion culling methods
that exploit graphics hardware.

A conceptually important online occlusion culling method is the
hierarchical z-buffer introduced by Greene et al.~\cite{Greene93a}. It
organizes the z-buffer as a pyramid, where the standard z-buffer is
the finest level. At all other levels, each z-value is the farthest in
the window corresponding to the adjacent finer level. The hierarchical
z-buffer makes it possible to quickly determine whether the geometry in
question is occluded. To a certain extent this idea is used in the current
generation of graphics hardware by applying early z-tests of
fragments in the graphics pipeline (e.g., the Hyper-Z technology of ATI or
the Z-cull of NVIDIA). However, the geometry still needs to be sent to the
GPU, transformed, and coarsely rasterized even if it is later
determined invisible.

Zhang~\cite{EVL-1997-163} proposed hierarchical occlusion maps, which
do not rely on hardware support for the z-pyramid, but instead
make use of hardware texturing. The hierarchical occlusion map is
computed on the GPU by rasterizing and downsampling a given set of
occluders. The occlusion map is used for overlap tests, whereas the
depths are compared using a coarse depth estimation buffer. Wonka and
Schmalstieg~\cite{Wonka:1999:OSF} use occluder shadows to compute
from-point visibility in \m25d scenes with the help of the GPU. This
method has been further extended to the online computation of from-region
visibility executed on a server~\cite{wonka:01:eg}.

%% In
%% parallel to rendering, the visibility server calculates visibility for
%% the neighborhood of the given viewpoint and sends them back to the
%% display host.

%% A similar method designed by Aila and
%% Miettinen~\cite{aila:00:msc}. The incremental occlusion maps are
%% created on the CPU, which takes the CPU computational time but it does
%% not suffer the problem of slow read-back from the GPU.

%Klosowski:2001:ECV

Bartz et al.~\cite{Bartz98} proposed an OpenGL extension for occlusion
queries along with a discussion concerning a potential realization in
hardware. A first hardware implementation of occlusion queries came
with the VISUALIZE fx graphics hardware~\cite{Scott:1998:OVF}. The
corresponding OpenGL extension is called \hpot{}. A more recent OpenGL
extension, \nvoq{}, was introduced by NVIDIA with the GeForce 3 graphics
card and is now also available as an official ARB extension.

%Both
%extensions have been used in several occlusion culling
%algorithms~\cite{Meissner01,Hillesland02,Staneker:2004:OCO}.
%\cite{Meissner01}

Hillesland et al.~\cite{Hillesland02} have proposed an algorithm
that employs the \nvoq. They subdivide the scene using a uniform
grid. The cubes are then traversed in slabs roughly perpendicular to
the viewport. The queries are issued for all cubes of a slab at once,
after the visible geometry of this slab has been rendered. The method
can also use nested grids: a cell of the grid contains another grid
that is traversed if the cell is proven visible. This method, however,
does not exploit the temporal and spatial coherence of visibility, and it
is restricted to regular subdivision data structures. Our new method
addresses both of these problems and provides natural extensions for
balancing the accuracy of visibility classification against the associated
computational costs.

Recently, Staneker et al.~\cite{Staneker:2004:OCO} developed a method
integrating occlusion culling into the OpenSG scene graph
framework. Their technique uses occupancy maps maintained in software
to avoid queries on visible scene graph nodes, and temporal coherence
to reduce the number of occlusion queries. The drawback of the method
is that it performs the queries in a serial fashion and thus
suffers from CPU stalls and GPU starvation.

%% The methods proposed in this paper can be used to make use of temporal
%% and spatial coherence in the scope of existing visibility algorithms,
%% that utilise a spatial hierarchy. Examples of these are algorithms
%% based on hierarchical occlusion maps~\cite{Zhang97}, coverage
%% masks~\cite{Greene:1996:HPT}, shadow frusta~\cite{Hudson97}, and
%% occlusion trees~\cite{bittner98b_long}.

On a theoretical level, our work is related to methods aiming to
exploit the temporal coherence of visibility. Greene et
al.~\cite{Greene93a} used the set of visible objects from one frame to
initialize the z-pyramid in the next frame in order to reduce the
overdraw of the hierarchical z-buffer. The algorithm of Coorg and
Teller~\cite{Coorg96b} restricts the hierarchical traversal to nodes
associated with visual events that were crossed between successive
viewpoint positions. Another method of Coorg and Teller~\cite{Coorg97}
exploits temporal coherence by caching occlusion relationships.
Chrysanthou and Slater have proposed a probabilistic scheme for
view-frustum culling~\cite{Chrysanthou:97}.

The above-mentioned methods for exploiting temporal coherence are
tightly interwoven with the particular culling algorithm. In
contrast, Bittner et al.~\cite{bittner01:jvca} presented a general
acceleration technique for exploiting spatial and temporal coherence
in hierarchical visibility algorithms. The central idea, which is also
vital for this chapter, is to avoid repeated visibility tests of
interior nodes of the hierarchy. The problem with directly adopting
this method is that it is designed for use with instantaneous CPU-based
occlusion queries, whereas hardware occlusion queries exhibit
significant latency. The method presented herein efficiently overcomes
the latency problem while keeping the benefits of the generality and
simplicity of the original hierarchical technique. As a result we
obtain a simple and efficient occlusion culling algorithm utilizing
hardware occlusion queries.


%In this paper we show how to overcome this problem while
%keeping the benefits of exploiting.


The rest of the chapter is organized as follows:
Section~\ref{sec:hwqueries} discusses hardware supported occlusion
queries and a basic application of these queries using a kD-tree.
Section~\ref{sec:hoq} presents our new algorithm and
Section~\ref{sec:optimizations} describes several additional
optimizations. Section~\ref{sec:results} presents results obtained by
an experimental evaluation of the method and discusses its
behavior. Finally, Section~\ref{sec:conclusion} concludes the chapter.


\section{Hardware Occlusion Queries}
\label{sec:hwqueries}

Hardware occlusion queries follow a simple pattern: To test the visibility
of an occludee, we send its bounding volume to the GPU. The volume is
rasterized and its fragments are compared to the current contents of
the z-buffer. The GPU then returns the number of visible
fragments. If there is no visible fragment, the occludee is invisible
and need not be rendered.
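To make this pattern concrete, the following toy model mimics the query in software (a sketch with made-up numbers and a single constant depth per volume; a real application would issue the query through the graphics API and the rasterization would happen on the GPU):

```python
def occlusion_query(zbuffer, bbox, bbox_depth):
    """Count 'visible fragments' of an axis-aligned screen-space box.

    zbuffer    -- 2D list of current depth values (smaller = closer)
    bbox       -- (x0, y0, x1, y1) pixel rectangle covered by the volume
    bbox_depth -- nearest depth of the bounding volume (simplified to a constant)
    """
    x0, y0, x1, y1 = bbox
    visible = 0
    for y in range(y0, y1):
        for x in range(x0, x1):
            if bbox_depth < zbuffer[y][x]:   # fragment passes the z-test
                visible += 1
    return visible

# 4x4 z-buffer: the left half is already covered by a close occluder (0.2).
zb = [[0.2, 0.2, 1.0, 1.0] for _ in range(4)]

# Occludee at depth 0.5 behind the occluder -> zero fragments: cull it.
assert occlusion_query(zb, (0, 0, 2, 4), 0.5) == 0
# The same occludee over the uncovered right half -> visible fragments.
assert occlusion_query(zb, (2, 0, 4, 4), 0.5) == 8
```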

\subsection{Advantages of hardware occlusion queries}

There are several advantages of hardware occlusion queries:

\begin{itemize}

\item {\em Generality of occluders.} We can use the original scene geometry
as occluders, since the queries use the current contents of the z-buffer.

\item {\em Occluder fusion.} The occluders are merged in the z-buffer,
so the queries automatically account for occluder fusion. Additionally,
this fusion comes for free since we use the intermediate result of the
rendering itself.

\item {\em Generality of occludees.} We can use complex
occludees. Anything that can be rasterized quickly is suitable.

\item {\em Exploiting the GPU power.} The queries take full advantage of
the high fill rates and internal parallelism provided by modern GPUs.

\item {\em Simple use.} Hardware occlusion queries can be easily
integrated into a rendering algorithm. They provide a powerful tool to
minimize the implementation effort, especially when compared to
CPU-based occlusion culling.

\end{itemize}

\subsection{Problems of hardware occlusion queries}

Currently there are two main hardware supported variants of occlusion
queries: the HP test (\hpot) and the more recent NV query (\nvoq, now
also available as \arboq). The most important difference between the
HP test and the NV query is that multiple NV queries can be
issued before asking for their results, while only one HP test is
allowed at a time, which severely limits its possible algorithmic
usage. Additionally, the NV query returns the number of visible pixels,
whereas the HP test returns only a binary visibility
classification.

%% Issuing multiple independent NV queries is crucial for
%% the design of algorithms that strive to exploit the power of the GPU.

The main problem of both the HP test and the NV query is the latency
between issuing the query and the availability of the result. The
latency occurs due to the delayed processing of the query in the long
graphics pipeline, the cost of processing the query itself, and the
cost of transferring the result back to the CPU. The latency causes
two major problems: CPU stalls and GPU starvation. After issuing the
query, the CPU waits for its result and does not feed the GPU with new
data. When the result finally becomes available, the GPU pipeline can
already be empty. Thus the GPU needs to wait for the CPU to process
the result of the query and to feed the GPU with new data.

A major challenge when using hardware occlusion queries is to avoid
CPU stalls by filling the latency time with other tasks, such as
rendering visible scene objects or issuing other, independent
occlusion queries (see Figure~\ref{fig:latency}).
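The effect of filling the latency with useful work can be illustrated with a toy timeline model (all timing constants are made up for illustration): each query result arrives \texttt{LATENCY} time units after the query is issued, and rendering one object takes \texttt{RENDER} time units.

```python
LATENCY, RENDER = 3.0, 1.0

def stop_and_wait_time(num_objects):
    """Issue a query, stall the CPU until its result, then render."""
    time = 0.0
    for _ in range(num_objects):
        time += LATENCY                 # CPU stalls; GPU starves
        time += RENDER
    return time

def interleaved_time(num_objects):
    """Issue queries up front and fill the latency with rendering work."""
    return max(num_objects * RENDER,    # GPU kept busy the whole time
               LATENCY)                 # only the first result is waited for

assert stop_and_wait_time(8) == 32.0
assert interleaved_time(8) == 8.0       # the latency is completely hidden
```

The numbers are arbitrary, but the structure of the argument is the one depicted in Figure~\ref{fig:latency}: serial querying pays the full latency per object, while interleaved scheduling pays it at most once.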


\begin{figure}[htb]
\centerline{
\includegraphics[width=0.5\textwidth,draft=\DRAFTFIGS]{figs/latency2}
}

\caption{(top) Illustration of CPU stalls and GPU starvation.
  Qn, Rn, and Cn denote querying, rendering, and culling of object
  n, respectively. Note that object 5 is found invisible by Q5 and
  thus not rendered. (bottom) More efficient query scheduling. The
  scheduling assumes that objects 4 and 6 will be visible in the
  current frame and renders them without waiting for the result of
  the corresponding queries.}
\label{fig:latency}
\end{figure}

\subsection{Hierarchical stop-and-wait method}
\label{sec:basic}

Many rendering algorithms rely on hierarchical structures in order to
deal with complex scenes. In the context of occlusion culling, such a
data structure allows us to efficiently cull large scene blocks, and thus
to exploit the spatial coherence of visibility; it also provides a key to
achieving output sensitivity.

This section outlines a naive application of occlusion queries in the
scope of a hierarchical algorithm. We refer to this approach as the
{\em hierarchical stop-and-wait} method. Our discussion is based on
kD-trees, which have proved to be efficient for point location, ray
tracing, and visibility
culling~\cite{MacDonald90,Hudson97,Coorg97,bittner01:jvca}. The
concept applies to general hierarchical data structures as well, though.

The hierarchical stop-and-wait method proceeds as follows: Once a
kD-tree node passes view-frustum culling, it is tested for
occlusion by issuing the occlusion query and waiting for its
result. If the node is found visible, we continue by recursively
testing its children in a front-to-back order. If the node is a
leaf, we render its associated objects.
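The traversal can be sketched as follows (nodes are reduced to dictionaries, children are assumed to be given in front-to-back order, and `query` is a stand-in for the hardware occlusion query that delivers its result synchronously, which is exactly what stalls the CPU):

```python
def stop_and_wait(node, query, rendered):
    if not query(node):                  # issue the query and WAIT for it
        return                           # occluded: cull the whole subtree
    if 'children' in node:               # visible interior node: recurse,
        for child in node['children']:   # children assumed front-to-back
            stop_and_wait(child, query, rendered)
    else:                                # visible leaf: render its objects
        rendered.append(node['name'])

tree = {'name': 'root', 'children': [
    {'name': 'near_leaf'},
    {'name': 'far', 'children': [{'name': 'far_leaf'}]},
]}

# Assume everything except the 'far' subtree passes the occlusion test.
rendered = []
stop_and_wait(tree, lambda n: n['name'] != 'far', rendered)
assert rendered == ['near_leaf']         # 'far_leaf' culled with one query
```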

The problem with this approach is that we can continue the tree
traversal only when the result of the last occlusion query becomes
available. If the result is not available, we have to stall the CPU,
which causes significant performance penalties. As we document in
Section~\ref{sec:results}, these penalties together with the overhead
of the queries themselves can even decrease the overall application
performance compared to pure view-frustum culling. Our new method aims
to eliminate this problem by issuing multiple occlusion queries for
independent scene parts and by exploiting the temporal coherence of
visibility classifications.


\section{Coherent Hierarchical Culling}
\label{sec:hoq}

In this section we first present an overview of our new
algorithm. Then we discuss its steps in more detail.

%and present a
%generalization of the method to other spatial data structures.

\subsection{Algorithm Overview}
\label{sec:overview}

Our method is based on exploiting the temporal coherence of visibility
classification. In particular, it is centered on the following
three ideas:

\begin{itemize}
\item We initiate occlusion queries on nodes of the hierarchy where
  the traversal terminated in the last frame. Thus we avoid queries
  on all previously visible interior nodes~\cite{bittner01:jvca}.
\item We assume that a previously visible leaf node remains visible
  and render the associated geometry without waiting for the result
  of the corresponding occlusion query.
\item Issued occlusion queries are stored in a query queue until their
  results become available. This allows us to interleave the queries
  with the rendering of visible geometry.
\end{itemize}


The algorithm performs a traversal of the hierarchy that is
terminated either at leaf nodes or at nodes that are classified as
invisible. Let us call such nodes the {\em termination nodes}, and
interior nodes that have been classified visible the {\em opened
nodes}. We denote the sets of termination and opened nodes in the $i$-th
frame by $\mathcal{T}_i$ and $\mathcal{O}_i$, respectively. In the
$i$-th frame, we traverse the kD-tree in a front-to-back order, skip
all nodes of $\mathcal{O}_{i-1}$, and apply occlusion queries first on
the termination nodes $\mathcal{T}_{i-1}$. When reaching a
termination node, the algorithm proceeds as follows:

\begin{itemize}
\item For a previously visible node (this must be a leaf), we issue
  the occlusion query and store it in the query queue. Then we
  immediately render the associated geometry without waiting for the
  result of the query.
\item For a previously invisible node, we issue the query and store
  it in the query queue.
\end{itemize}

\begin{figure}[htb]
{\footnotesize
\input{code/pseudocode2}
}
\caption{Pseudo-code of coherent hierarchical culling.}
\label{fig:pseudocode}
\end{figure}

When the query queue is not empty, we check if the result of the
oldest query in the queue is already available. If the query result is
not available, we continue by recursively processing other nodes of
the kD-tree as described above. If the query result is available, we
fetch the result and remove the node from the query queue. If the node
is visible, we process its children recursively. Otherwise, the whole
subtree of the node is invisible and thus it is culled.


In order to propagate changes in visibility upwards in the hierarchy,
the visibility classification is \emph{pulled up} according to the
following rule: An interior node is invisible only if all its children
have been classified invisible. Otherwise, it remains visible and thus
opened. The pseudo-code of the complete algorithm is given in
Figure~\ref{fig:pseudocode}. An example of the behavior of the method
on a small kD-tree for two subsequent frames is depicted in
Figure~\ref{fig:cut}.
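The per-frame traversal rules can be condensed into the following simplified, synchronous sketch (the actual algorithm interleaves the query queue with rendering as described above; here query results are simply assumed to have arrived by the time they are needed, and all names are illustrative):

```python
class Node:
    def __init__(self, name, children=(), visible=True):
        self.name, self.children = name, list(children)
        self.visible = visible          # classification from the previous frame

def chc_frame(node, is_visible_now, queries, rendered):
    was_visible = node.visible
    if was_visible and node.children:               # opened node: skip query
        for c in node.children:                     # assumed front-to-back order
            chc_frame(c, is_visible_now, queries, rendered)
        node.visible = any(c.visible for c in node.children)   # pull-up
        return
    queries.append(node.name)                       # termination node: query it
    if was_visible:                                 # previously visible leaf:
        rendered.append(node.name)                  # render without waiting
    node.visible = is_visible_now(node)             # fetch the query result
    if node.visible and not was_visible:            # newly visible: pull-down
        if node.children:
            for c in node.children:
                chc_frame(c, is_visible_now, queries, rendered)
        else:
            rendered.append(node.name)              # render once proven visible

# Previous frame: A visible, B occluded; this frame everything is visible.
root = Node('root', [Node('A'), Node('B', visible=False)])
queries, rendered = [], []
chc_frame(root, lambda n: True, queries, rendered)

assert queries == ['A', 'B']       # no query issued on the opened node 'root'
assert rendered == ['A', 'B']      # 'A' is rendered before its result is known
assert root.visible                # pull-up keeps the root opened
```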

\begin{figure*}[htb]
\centerline{
\includegraphics[width=0.73\textwidth,draft=\DRAFTFIGS]{figs/cut} }
\caption{(left) Visibility classification of a node of the kD-tree and
  the termination nodes. (right) Visibility classification after the
  application of the occlusion test and the new set of termination nodes.
  Nodes on which occlusion queries were applied are depicted with a
  solid outline. Note the pull-up and pull-down due to visibility changes.}
\label{fig:cut}
\end{figure*}



The sets of opened nodes and termination nodes need not be maintained
explicitly. Instead, these sets can be easily identified by
associating with each node information about its visibility and the
id of the last frame in which it was visited. A node is an opened node
if it is an interior visible node that was visited in the last frame
(line 23 in the pseudocode). Note that in the actual implementation of
the pull-up we can set all visited nodes to invisible by default and
then pull up any changes from invisible to visible (lines 25 and
12 in Figure~\ref{fig:pseudocode}). This modification eliminates
checking children for invisibility during the pull-up.
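This bookkeeping can be sketched as follows (the class and function names are illustrative, not the paper's API; the default-invisible pull-up walks only the ancestor chain of a node proven visible):

```python
class Node:
    def __init__(self, parent=None, is_leaf=False):
        self.parent, self.is_leaf = parent, is_leaf
        self.visible = False
        self.last_visited = -1       # id of the last frame the node was visited

def is_opened(node, frame_id):
    """Interior node classified visible and visited in the previous frame."""
    return (node.visible and not node.is_leaf
            and node.last_visited == frame_id - 1)

def pull_up(node):
    """All visited nodes start as invisible; a visible descendant flips its
    ancestors back to visible, so siblings never need to be checked."""
    while node is not None and not node.visible:
        node.visible = True
        node = node.parent

root = Node()
leaf = Node(parent=root, is_leaf=True)
pull_up(leaf)                        # leaf found visible by its query
assert root.visible and leaf.visible

root.last_visited = 9
assert is_opened(root, 10)           # interior, visible, visited last frame
assert not is_opened(leaf, 10)       # leaves are never opened nodes
```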



%% \begin{figure*}[htb]
%% \centerline{\includegraphics[width=0.8\textwidth,draft=\DRAFTFIGS]{figs/basic}}
%% \caption{Illustration of the hierarchy traversal. Initially the algorithm
%% starts at the root of the hierarchy (left). In the second frame the
%% opened nodes $\mathcal{O}_0$ are skipped and the occlusion queries
%% are first applied on the termination nodes $\mathcal{T}_0$. Visibility changes are propagated upwards in the hierarchy and a
%% new set of termination nodes $\mathcal{T}_1$ is established.}
%% \label{fig:basic}
%% \end{figure*}

\subsection{Reduction of the number of queries}

%Skipping the opened nodes and issuing occlusion queries for the
%termination nodes assists in exploiting two main characteristics of
%scene visibility:

%Identifying the set of termination nodes assists in finding a subdivision

Our method reduces the number of visibility queries in two ways:
Firstly, as in other hierarchical culling methods, we consider only a
subtree of the whole hierarchy (opened nodes + termination
nodes). Secondly, by avoiding queries on opened nodes we eliminate
part of the overhead of identifying this subtree. These
reductions reflect the following coherence properties of scene
visibility:

\begin{itemize}

\item {\em Spatial coherence.} The invisible termination nodes
  approximate the occluded part of the scene with the smallest number of
  nodes with respect to the given hierarchy, i.e., each invisible
  termination node has a visible parent. This induces an adaptive
  spatial subdivision that reflects the spatial coherence of visibility,
  more precisely the coherence of occluded regions. The adaptive nature
  of the subdivision allows us to minimize the number of subsequent
  occlusion queries by applying the queries on the largest spatial
  regions that are expected to remain occluded.

%, and visible termination
%nodes correspond to the smallest unoccluded regions (visible
%leafs)

\item {\em Temporal coherence.} If visibility remains constant, the set
  of termination nodes needs no adaptation. If an occluded node becomes
  visible, we recursively process its children (pull-down). If a visible
  node becomes occluded, we propagate the change higher in the hierarchy
  (pull-up). A pull-down reflects the spatial growth of visible
  regions; similarly, a pull-up reflects the spatial growth of occluded
  regions.

%The first case corresponds to pull-down of the cut, the latter to pull-up.

\end{itemize}

By avoiding queries on the opened nodes, we save $1/k$ of the queries
for a hierarchy with branching factor $k$ (assuming visibility remains
constant). Thus for the kD-tree, up to half of the queries can be
saved. The actual savings in the total query time are even larger: the
higher we are in the hierarchy, the larger the boxes we would have to
check for occlusion, and consequently the higher the fill rate that
would have been required to rasterize them. In particular,
assuming that the sum of the screen-space projected areas of the nodes at
each level of the kD-tree is equal and the opened nodes form a
complete binary subtree of depth $d$, the fill rate is reduced $(d+2)$
times.
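The $(d+2)$ factor follows from a simple counting argument (a back-of-the-envelope derivation under the stated assumptions):

```latex
% Let A denote the (assumed equal) sum of screen-space projected areas of the
% nodes at any single level of the kD-tree. The stop-and-wait method issues
% queries on all opened levels 0, 1, ..., d as well as on the termination
% nodes below them, i.e., on d+2 levels in total, whereas our method queries
% only the termination nodes:
\[
  \frac{\mathit{fill}_{\mathrm{stop\text{-}and\text{-}wait}}}
       {\mathit{fill}_{\mathrm{ours}}}
  \;=\; \frac{(d+2)\,A}{A}
  \;=\; d+2 .
\]
```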


\subsection{Reduction of CPU stalls and GPU starvation}
\label{sec:latency}

The reduction of CPU stalls and GPU starvation is achieved by
interleaving occlusion queries with the rendering of visible geometry. The
immediate rendering of previously visible termination nodes and the
subsequent issuing of occlusion queries eliminates the need to
wait for query results while the initial depth layers containing
previously visible nodes are processed. In the optimal case,
new query results become available in the meantime and thus we completely
eliminate CPU stalls. In a static scenario, we achieve exactly the
same visibility classification as the hierarchical stop-and-wait
method.
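The interleaving described above can be illustrated by a small, self-contained simulation. The `Query` class with a fixed tick-based latency and the list-based scene are illustrative stand-ins for the hardware occlusion-query machinery, not the actual implementation:

```python
from collections import deque

class Query:
    """Simulated asynchronous occlusion query whose result becomes
    available a fixed number of ticks after it was issued (a stand-in
    for a hardware occlusion query)."""
    def __init__(self, node, issue_tick, latency):
        self.node = node
        self.ready_tick = issue_tick + latency

    def result_available(self, tick):
        return tick >= self.ready_tick

def traverse_interleaved(nodes, visible_last_frame, latency=2):
    """Interleave issuing queries with rendering: previously visible
    nodes are rendered immediately and their query results are fetched
    later, so the CPU only stalls once it runs out of other work."""
    query_queue = deque()
    stack = list(reversed(nodes))          # nodes in front-to-back order
    rendered, tick, stalls = [], 0, 0
    while stack or query_queue:
        # Fetch finished results; block only when no work remains.
        while query_queue and (query_queue[0].result_available(tick)
                               or not stack):
            query = query_queue.popleft()
            if not query.result_available(tick):
                stalls += 1                # forced wait: a CPU stall
                tick = query.ready_tick
            # (the visibility result would be processed here)
        if stack:
            node = stack.pop()
            query_queue.append(Query(node, tick, latency))
            if node in visible_last_frame:
                rendered.append(node)      # render without waiting
        tick += 1
    return rendered, stalls
```

In this toy model the previously visible nodes are rendered without any waiting, and the only stall occurs once the traversal stack is exhausted, mirroring the behavior described above.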

If the visibility is changing, the situation can be different: if the
results of the queries arrive too late, it is possible that we
initiated an occlusion query on a previously occluded node $A$ that is
in fact occluded by another previously occluded node $B$ that became
visible. If $B$ is still in the query queue, we do not capture the
possible occlusion of $A$ by $B$, since the geometry associated with
$B$ has not yet been rendered. In Section~\ref{sec:results} we show
that the increase in the number of rendered objects compared to the
stop-and-wait method is usually very small.

%% It is possible that we have already traversed all previously
%% visible nodes and we must stall the CPU by waiting for the result
%% of the oldest occlusion query. A technique completely eliminating
%% CPU stalls at the cost of an increase of the number of rendered
%% objects will be discussed in Section~\ref{sec:full}.

\subsection{Front-to-back scene traversal}

For kD-trees the front-to-back scene traversal can be easily
implemented using a depth-first
traversal~\cite{bittner01:jvca}. However, at a modest increase in
computational cost we can also use a more general breadth-first
traversal based on a priority queue. The priority of a node then
corresponds to the inverse of the minimal distance between the viewpoint
and the bounding box associated with the given node of the
kD-tree~\cite{Klosowski:2001:ECV,Staneker:2004:OCO}.
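Such a priority-queue traversal can be sketched with a standard binary heap. The dictionary-based node representation with a `"box"` entry and the `children` callback are illustrative assumptions, not the paper's data structures:

```python
import heapq

def min_distance(viewpoint, box):
    """Minimal distance from a point to an axis-aligned bounding box,
    given as a tuple of per-axis (lo, hi) intervals."""
    return sum(max(lo - v, 0.0, v - hi) ** 2
               for v, (lo, hi) in zip(viewpoint, box)) ** 0.5

def front_to_back(root, viewpoint, children):
    """Breadth-first front-to-back traversal keyed on the
    viewpoint-to-box distance; works for any spatial hierarchy,
    not just kD-trees. `children(node)` returns the child nodes
    (an empty sequence for leaves)."""
    counter = 0                       # tie-breaker for equal distances
    heap = [(0.0, counter, root)]
    order = []
    while heap:
        _, _, node = heapq.heappop(heap)
        order.append(node)            # queries/rendering would go here
        for child in children(node):
            counter += 1
            heapq.heappush(
                heap, (min_distance(viewpoint, child["box"]), counter, child))
    return order
```

The integer counter keeps heap entries comparable when two nodes are equidistant, which also makes the order deterministic.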

In the context of our culling algorithm, there are two main advantages
of the breadth-first front-to-back traversal:

\begin{itemize}
\item {\em Better query scheduling.} By spreading the traversal of the
  scene in a breadth-first manner, we process the scene in depth
  layers. Within each layer, the node processing order is practically
  independent, which minimizes the problem of occlusion query
  dependence. The breadth-first traversal interleaves
  occlusion-independent nodes, which can provide a more accurate
  visibility classification if visibility changes quickly. In
  particular, it reduces the problem of false classifications due to
  missed occlusion by nodes waiting in the query queue (discussed in
  Section~\ref{sec:latency}).

\item {\em Using other spatial data structures.} By using a
  breadth-first traversal, we are no longer restricted to the
  kD-tree. Instead we can use an arbitrary spatial data structure
  such as a bounding volume hierarchy, octree, grid, hierarchical grid,
  etc. Once we compute the distance from a node to the viewpoint, the
  node processing order is established by the priority queue.
\end{itemize}

When using the priority queue, our culling algorithm can also be
applied directly to the scene graph hierarchy, thus avoiding the
construction of any auxiliary data structure for spatial
partitioning. This is especially important for dynamic scenes, in
which maintaining a spatial classification of moving objects can be
costly.

\subsection{Checking the query result}

The presented algorithm repeatedly checks whether the result of the
oldest occlusion query is available before fetching any node from the
traversal stack (line 6 in Figure~\ref{fig:pseudocode}). Our
practical experiments have shown that the cost of this check is
negligible, and thus it can be used frequently without any performance
penalty. If the cost of this check were significantly higher, we could
delay asking for the query result by a time established by empirical
measurements for the particular hardware. This delay should also
reflect the size of the queried node in order to match the expected
availability of the query result as precisely as possible.
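A size-dependent polling delay of this kind could look as follows. This is a minimal sketch: the constants, the linear cost model, and the `result_available` callback are illustrative assumptions that would have to be calibrated per GPU by measurement:

```python
import time

def estimate_poll_delay(node_area, base_delay=1e-4, area_scale=1e-7):
    """Heuristic delay (in seconds) before first asking for a query
    result: larger nodes rasterize more pixels, so their results tend
    to arrive later. Both constants are illustrative placeholders."""
    return base_delay + area_scale * node_area

def fetch_result(result_available, node_area, poll_interval=1e-5):
    """Sleep through the estimated latency, then poll the (assumed)
    cheap non-blocking availability check until it succeeds."""
    time.sleep(estimate_poll_delay(node_area))
    while not result_available():
        time.sleep(poll_interval)
```

The first sleep matches the expected latency of the query, so the cheap availability check is typically issued only once or twice per query.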

\section{Further Optimizations}
\label{sec:optimizations}

This section discusses a couple of optimizations of our method that
can further improve the overall rendering performance. In contrast to
the basic algorithm from the previous section, these optimizations
rely on user-specified parameters that should be tuned for a
particular scene and hardware configuration.

\subsection{Conservative visibility testing}

The first optimization reduces the number of
visibility tests at the cost of a possible increase in the number of
rendered objects. It is based on the idea of skipping
some occlusion tests of visible nodes. We assume that whenever a node
becomes visible, it remains visible for a number of frames. Within the
given number of frames we avoid issuing occlusion queries and simply
assume the node remains visible~\cite{bittner01:jvca}.

This technique can significantly reduce the number of visibility tests
applied to visible nodes of the hierarchy. Especially in
sparsely occluded scenes, a large number of visible nodes is
tested without any benefit, since most of them
remain visible. On the other hand, we do not immediately capture all
changes from visibility to invisibility, and thus we may render
objects that have already become invisible since the moment the
last occlusion test was issued.

In the simplest case, the number of frames a node is assumed visible
can be a predefined constant. In a more elaborate scenario, this
number should be influenced by the history of the success of occlusion
queries and/or the current speed of camera movement.
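In the simplest constant-based variant, the skipping rule reduces to a per-node frame counter. The field names are illustrative:

```python
def needs_query(node, frame, assumed_visible_frames=2):
    """Return True when the node must be tested this frame. A node
    found visible is assumed to stay visible for
    `assumed_visible_frames` frames, so queries on it are skipped
    during that period; invisible nodes are always tested."""
    if not node["visible"]:
        return True
    return frame - node["last_query_frame"] >= assumed_visible_frames
```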

\subsection{Approximate visibility}

The algorithm as presented computes a conservative visibility
classification with respect to the resolution of the z-buffer. We
can easily modify the algorithm to cull nodes more aggressively
when only a small part of a node is visible: we compare the
number of visible pixels returned by the occlusion query with a
user-specified constant and cull the node if the number drops
below this constant.
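As a sketch, the modified classification is a single threshold test; a threshold of zero recovers the conservative behavior:

```python
def is_visible(visible_pixel_count, pixel_threshold=0):
    """Approximate visibility: classify a node as occluded when the
    occlusion query reports no more visible pixels than the
    user-specified threshold."""
    return visible_pixel_count > pixel_threshold
```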

\subsection{Complete elimination of CPU stalls}
\label{sec:full}

The basic algorithm eliminates CPU stalls unless the traversal stack
is empty. If there is no node to traverse in the traversal stack and
the result of the oldest query in the query queue is still not
available, the algorithm stalls the CPU by waiting for the query result. To
completely eliminate CPU stalls, we can speculatively render some
nodes with undecided visibility. In particular, we select a node from
the query queue and render the geometry associated with the node (or
with the whole subtree if it is an interior node). The node is marked as
rendered, but the associated occlusion query is kept in the queue so that
its result can be fetched later. If we are unlucky and the node remains
invisible, the effort of rendering the node's geometry is wasted. On
the other hand, if the node has become visible, we have used the time
slot before the next query result arrives in an optimal manner.

To avoid spending more time on rendering invisible
nodes than would be spent waiting for the result of the query, we
select the node with the lowest estimated rendering cost and compare
this cost with a user-specified constant. If the cost is larger than
the constant, we conclude that it is too risky to render the node and
wait until the result of the query becomes available.
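The selection rule can be sketched as follows; the `estimated_cost` function and the cost limit are illustrative, since the paper leaves the cost model and the constant to per-scene tuning:

```python
def choose_speculative_node(query_queue, estimated_cost, max_cost):
    """Pick the pending node with the lowest estimated rendering cost
    for speculative rendering. Return None when even the cheapest
    node exceeds the user-specified cost limit, i.e. when waiting
    for the query result is the safer choice."""
    if not query_queue:
        return None
    cheapest = min(query_queue, key=estimated_cost)
    return cheapest if estimated_cost(cheapest) <= max_cost else None
```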

\section{Results}
\label{sec:results}

We have incorporated our method into an OpenGL-based scene graph
library and tested it on three scenes of different types. All tests
were conducted on a PC with a 3.2GHz P4, 1GB of memory, and a
GeForce FX5950 graphics card.

\subsection{Test scenes}

The three test scenes comprise a synthetic arrangement of 5000
randomly positioned teapots (11.6M polygons); an urban environment (1M
polygons); and the UNC power plant model (13M polygons). The test
scenes are depicted in Figure~\ref{fig:scenes}. All scenes were
partitioned using a kD-tree constructed according to the surface-area
heuristic~\cite{MacDonald90}.

Although the teapot scene would intuitively offer good occlusion, it
is a complicated case for occlusion culling to handle. Firstly, the
teapots consist of small triangles, so a culling benefit can only come
from fused occlusion caused by a large number of visible
triangles. Secondly, there are many thin holes through which it
is possible to see quite far into the arrangement of teapots. Thirdly,
the arrangement is long and thin, so along its longer side we can see
almost half of the teapots.

%Although the teapot scene would intuitively seem to offer good
%occlusion, it is one of the more complicated cases for occlusion
%culling. Due to the high number of very dense objects, a very fine
%kD-tree subdivision was necessary, which in turn leads to higher costs
%for testing many nodes in the hierarchy. Additionally, misclassifying
%a single teapot can lead to a significant increase in frame time due
%to the high polygon density (2,304 polygons per teapot). There are also
%many locations where it is possible to see very far into the block of
%teapots. This can be seen in the kD-tree node classification which is
%shown in the accompanying video.

The complete power plant model is quite challenging even to load into
memory, but on the other hand it offers good
occlusion. This scene is an interesting candidate for testing not only
due to its size, but also due to significant changes in visibility and
depth complexity in its different parts.

%as can also be seen in the
%walkthrough. A relatively coarse subdivision of the model triangles
%into about 4,600 nodes was sufficient to capture most occlusion
%interdependencies.

The city scene is a classical target for occlusion culling
algorithms. Due to the urban structure consisting of buildings and
streets, most of the model is occluded when viewed from the
streets. Note that the scene does not contain any detailed geometry
inside the buildings. See Figure~\ref{fig:city_vis} for a
visualization of the visibility classification of the kD-tree nodes
for the city scene.

%% However, we still included a section in the walkthrough where the
%% viewpoint is elevated above the building roofs in order to show that
%% the algorithm continues to work without modification, albeit at some
%% expense. It will be an interesting topic of research to automatically
%% recognize such situations and decrease the frequency of visibility
%% queries or turn the occlusion culling algorithm off completely.

\begin{figure}
\centering \includegraphics[width=0.32\textwidth,draft=\DRAFTIMAGES]{images/city_vis}
\caption{Visibility classification of the kD-tree nodes in the city scene.
The orange nodes were found visible; all the other depicted nodes are invisible.
Note the increasing size of the occluded nodes with increasing distance from the visible set.}
\label{fig:city_vis}
\end{figure}

\subsection{Basic tests}

We have measured the frame times for rendering with only view-frustum culling
(VFC), for the hierarchical stop-and-wait method (S\&W), and for our new
coherent hierarchical culling method (CHC). Additionally, we have
evaluated the frame time of an ``ideal'' algorithm. The ideal algorithm
renders the visible objects found by the S\&W algorithm without
performing any visibility tests. This is an optimal solution with
respect to the given hierarchy, i.e., no occlusion culling algorithm
operating on the same hierarchy can be faster. For the basic tests we
did not apply any of the optimizations discussed in
Section~\ref{sec:optimizations}, since they require user-specified
parameters.

%% In order to obtain repeatable and comparable timings, the process
%% priority of our application was set to high, timings were obtained
%% using the CPU cycle counter instruction, and the OpenGL {\tt Finish()}
%% instruction was executed before each invocation of the timer at the
%% start and the end of a frame. Furthermore, we did not include clear
%% and buffer swapping times in our results, since they depend strongly
%% on the windowing mode used.

For each test scene, we have constructed a walkthrough which is shown
in full in the accompanying
video. Figures~\ref{fig:teapot_walkthrough},~\ref{fig:city_walkthrough},
and~\ref{fig:plant_walkthrough} depict the frame times measured for
the walkthroughs. Note that Figure~\ref{fig:plant_walkthrough} uses a
logarithmic scale to capture the high variations in frame times during
the power plant walkthrough.
\begin{figure}
\centering
\includegraphics[width=0.5\textwidth,draft=\DRAFTFIGS]{images/teapotgraph1}
\caption{Frame times for the teapot scene.}\label{fig:teapot_walkthrough}
\end{figure}
\begin{figure}
\centering
\includegraphics[width=0.5\textwidth,draft=\DRAFTFIGS]{images/citygraph1}
\caption{Frame times for the city walkthrough. Note
the spike around frame 1600, where the viewpoint was elevated above the roofs,
practically eliminating any occlusion.}\label{fig:city_walkthrough}
\end{figure}
\begin{figure}
\centering
\includegraphics[width=0.5\textwidth,draft=\DRAFTFIGS]{images/plantgraph1log}
\caption{Frame times for the power plant walkthrough.
The plot shows the weakness of the S\&W method: when there is not much occlusion, it becomes slower than VFC (near frame 2200).
The CHC method can keep up even in these situations, and at the same time it can exploit occlusion when it appears
(e.g., near frame 3700).}
\label{fig:plant_walkthrough}
\end{figure}
To better demonstrate the behavior of our algorithm, all walkthroughs
contain sections with both restricted and unrestricted visibility. For
the teapots, we viewed the arrangement along its longer
side (frames 25--90). In the city, we elevated the
viewpoint above the roofs and gained sight over most of the city
(frames 1200--1800). The power plant walkthrough contains several
viewpoints from which a large part of the model is visible (spikes in
Figure~\ref{fig:plant_walkthrough} where all algorithms are slow),
viewpoints along the border of the model directed outwards with low depth complexity (holes in
Figure~\ref{fig:plant_walkthrough} where all algorithms are fast), and
viewpoints inside the power plant with high depth complexity where occlusion culling produces a
significant speedup over VFC (e.g., frame 3800).

As we can see, for a number of frames in the walkthroughs the CHC method
produces a speedup of more than one order of magnitude compared to
VFC. The maximum speedup for the teapot, city, and power
plant walkthroughs is 8, 20, and 70, respectively. We can also observe
that CHC maintains a significant gain over S\&W and in many cases
almost matches the performance of the ideal algorithm. In complicated
scenarios the S\&W method caused a significant slowdown compared to
VFC (e.g., frames 1200--1800 of
Figure~\ref{fig:city_walkthrough}). Even in these cases, the CHC
method maintained a good speedup over VFC except for a small number of
frames.

\begin{table*}
\centering \footnotesize
\begin{tabular}{|c|c|c|c|c|c|c|}
\hline\hline
scene & method & \#queries & wait time [ms] & rendered triangles & frame time [ms] & speedup \\\hline\hline
Teapots & VFC & --- & --- & 11,139,928 & 310.42 & 1.0 \\\hhline{~|-|-|-|-|-|-|}
11,520,000 triangles & S\&W & 4704 & 83.19 & 2,617,801 & 154.95 & 2.3 \\\hhline{~|-|-|-|-|-|-|}
21,639 kD-tree nodes & {\bf CHC} & {\bf 2827} & {\bf 1.31} & {\bf 2,852,514} & {\bf 81.18} & {\bf 4.6} \\\hhline{~|-|-|-|-|-|-|}
 & Ideal & --- & --- & 2,617,801 & 72.19 & 5.2 \\
\hline\hline
City & VFC & --- & --- & 156,521 & 19.79 & 1.0 \\\hhline{~|-|-|-|-|-|-|}
1,036,146 triangles & S\&W & 663 & 9.49 & 30,594 & 19.9 & 1.5 \\\hhline{~|-|-|-|-|-|-|}
33,195 kD-tree nodes & {\bf CHC} & {\bf 345} & {\bf 0.18} & {\bf 31,034} & {\bf 8.47} & {\bf 4.0} \\\hhline{~|-|-|-|-|-|-|}
 & Ideal & --- & --- & 30,594 & 4.55 & 6.6 \\
\hline\hline
Power Plant & VFC & --- & --- & 1,556,300 & 138.76 & 1.0 \\\hhline{~|-|-|-|-|-|-|}
12,748,510 triangles & S\&W & 485 & 16.16 & 392,962 & 52.29 & 3.2 \\\hhline{~|-|-|-|-|-|-|}
18,719 kD-tree nodes & {\bf CHC} & {\bf 263} & {\bf 0.70} & {\bf 397,920} & {\bf 38.73} & {\bf 4.7} \\\hhline{~|-|-|-|-|-|-|}
 & Ideal & --- & --- & 392,962 & 36.34 & 5.8 \\\hline\hline
\end{tabular}
\caption{Statistics for the three test scenes. VFC is rendering with only view-frustum culling, S\&W is the
hierarchical stop-and-wait method, CHC is our new method, and Ideal
is a perfect method with respect to the given hierarchy. All values are averages over
all frames (including the speedup).}
\label{tab:averages}
\end{table*}
Next, we summarize the scene statistics and the average values per
frame in Table~\ref{tab:averages}. The table shows the number of
issued occlusion queries, the wait time representing the CPU stalls,
the number of rendered triangles, the total frame time, and the
speedup over VFC.

We can see that the CHC method practically eliminates the CPU stalls
(wait time) compared to the S\&W method. This is paid for by a slight
increase in the number of rendered triangles. For the three
walkthroughs, the CHC method produces average speedups of 4.6, 4.0, and
4.7 over view-frustum culling and average speedups of 2.0, 2.6, and
1.6 over the S\&W method. CHC is only 1.1, 1.7, and 1.2 times
slower than the ideal occlusion culling algorithm. Concerning
accuracy, the increase in the average number of rendered triangles for
the CHC method compared to S\&W was 9\%, 1.4\%, and 1.3\%. This increase
was always outweighed by the reduction of CPU stalls in the tested
walkthroughs.

%% While this number may be very low, the algorithm also incurs a
%% non-negligible overhead which is caused by the graphics card
%% reconfiguration required for the queries and rasterization of the
%% bounding volume geometries.


\subsection{Optimizations}

First of all, we have observed that the technique for the complete
elimination of CPU stalls discussed in Section~\ref{sec:full} has a
very limited scope. In fact, for all our tests the stalls were almost
completely eliminated by the basic algorithm already (see the wait times in
Table~\ref{tab:averages}), and we did not find constants that could
produce an additional speedup using this technique.

The measurements for the other optimizations discussed in
Section~\ref{sec:optimizations} are summarized in
Table~\ref{tab:assumed}. We have measured the average number of issued queries
and the average frame time as functions of the number of frames a node is
assumed visible and of the pixel threshold of approximate visibility. We
have observed that the effectiveness of the optimizations depends strongly on the scene. If
the hierarchy is deep and the geometry associated with a leaf node is
not too complex, conservative visibility testing produces a
significant speedup (city and power plant). For the teapot scene, the
penalty for falsely rendering actually occluded objects became larger
than the savings achieved by the reduction of the number of queries. On the other
hand, since the teapot scene contains complex visible geometry, the
approximate visibility optimization produced a significant
speedup. This is, however, paid for by errors in the image
proportional to the pixel threshold used.

%Good parameters for these
%optimizations have to be estimated on a scene-by-scene basis.

%% , where we compare the reference frame time and
%% number of queries already shown in the previous table to the
%% approximate visibility optimization using a higher pixel threshold,
%% and to the conservative visibility testing optimization assuming a
%% node remains visible for 2 frames, both for the teapot and the city
%% scene. These tests clearly show that a high visibility threshold
%% improves frame times, albeit at the expense of image quality. The
%% benefit of the conservative visibility optimization, is however
%% limited. While the number of queries can be reduced significantly, the
%% number of misclassified objects rises and increases the frame time
%% again.

%% \begin{table}
%% \centering \footnotesize
%% \begin{tabular}{|l|c|c|}
%% \hline\hline
%% & Teapots & City \\\hline\hline
%% frame time CHC & 130.42 & 13.91\\\hline
%% \#queries & 2267 & 94.04\\\hline\hline
%% \multicolumn{3}{|l|}{assume 2 frames visible}\\\hline
%% frame time & 123.84 & 13.27 \\\hline
%% \#queries & 1342 & 42 \\\hline\hline
%% \multicolumn{3}{|l|}{pixel threshold 50 pixels and assume 2 frames visible}\\\hline
%% frame time & 73.22 & 11.47 \\\hline
%% \#queries & 1132 & 50 \\\hline\hline
%% \end{tabular}
%% \caption{Influence of optimizations on two test scenes. All values are averages
%% over all frames, frame times are in [ms].}\label{tab:assumed}
%% \end{table}

\begin{table}
\centering \footnotesize
\begin{tabular}{|l|c|c|c|c|}
\hline\hline
scene & $t_{av}$ & $n_{vp}$ & \#queries & frame time [ms] \\ \hline\hline
\multirow{3}{*}{Teapots} & 0 & 0 & 2827 & 81.18 \\ \hhline{~----}
 & 2 & 0 & 1769 & 86.31 \\ \hhline{~----}
 & 2 & 25 & 1468 & 55.90 \\ \hline\hline
\multirow{3}{*}{City} & 0 & 0 & 345 & 8.47 \\ \hhline{~----}
 & 2 & 0 & 192 & 6.70 \\ \hhline{~----}
 & 2 & 25 & 181 & 6.11 \\ \hline\hline
\multirow{3}{*}{Power Plant} & 0 & 0 & 263 & 38.73 \\ \hhline{~----}
 & 2 & 0 & 126 & 31.17 \\ \hhline{~----}
 & 2 & 25 & 120 & 36.62 \\ \hline\hline
\end{tabular}
\caption{Influence of the optimizations on the CHC method.
$t_{av}$ is the number of frames a node is assumed visible for conservative
visibility testing, and $n_{vp}$ is the pixel threshold for
approximate visibility.}\label{tab:assumed}
\end{table}

\subsection{Comparison to PVS-based rendering}

We also compared the CHC method against precalculated visibility. In
particular, we used a PVS computed by an offline visibility
algorithm~\cite{wonka00}. While the walkthrough using the PVS was
1.26ms faster per frame on average, our method does not require costly
precomputation and can be used at an arbitrary 3D position in the
model, not only within a predefined view space.

\section{Conclusion}
\label{sec:conclusion}

We have presented a method for the optimized scheduling of hardware-accelerated
occlusion queries. The method schedules occlusion queries
so as to minimize both the number of queries and their latency.
This is achieved by exploiting the spatial and temporal coherence of
visibility. Our results show that CPU stalls and GPU starvation
are almost completely eliminated, at the cost of a slight increase in
the number of rendered objects.

%Additionally it schedules the queries in an order
%minimizing the overhead due to the latency of availability of query
%results.

Our technique can be used with practically arbitrary scene-partitioning
data structures such as kD-trees, bounding volume
hierarchies, or hierarchical grids. The implementation of the method
is straightforward, as it uses a simple OpenGL interface to the
hardware occlusion queries. In particular, the method requires no
complicated geometric operations or data structures. The algorithm
is suitable for scenes of arbitrary structure and
requires no preprocessing or scene-dependent tuning.

We have experimentally verified that the method is well suited to the
\nvoq{} supported on current consumer-grade graphics hardware. We
have obtained average speedups of 4.0--4.7 compared to pure view-frustum
culling and of 1.6--2.6 compared to the hierarchical
stop-and-wait application of occlusion queries.

The greatest potential for improving the method lies in a better estimation
of changes in the visibility classification of hierarchy nodes. If
nodes tend to be mostly visible, we could automatically decrease the
frequency of occlusion tests and thus better adapt the method to the
actual occlusion in the scene. Another possibility for improvement is
better tuning for particular graphics hardware by means of a more
accurate rendering cost estimation: skipping occlusion tests for
simple geometry can be faster than issuing comparably expensive
occlusion queries.


\begin{figure*}[htb]
\centerline{
\hfill
\includegraphics[height=0.2\textwidth,draft=\DRAFTFIGS]{images/teapots}
\hfill
\includegraphics[height=0.2\textwidth,draft=\DRAFTFIGS]{images/city}
\hfill
\includegraphics[height=0.2\textwidth,draft=\DRAFTFIGS]{images/pplant}
\hfill
}
\caption{The test scenes: the teapots, the city, and the power plant.}
\label{fig:scenes}
\end{figure*}

%% \begin{figure}[htb]
%% \centerline{
%% \includegraphics[width=0.2\textwidth,draft=\DRAFTFIGS]{images/city}
%% }
%% \caption{The city scene.}
%% \label{fig:images1}
%% \end{figure}

%% \begin{figure}[htb]
%% \centerline{
%% \includegraphics[width=0.25\textwidth,draft=\DRAFTFIGS]{images/teapots}
%% }
%% \caption{The teapot scene.}
%% \label{fig:images2}
%% \end{figure}

%% \begin{figure}[htb]
%% \centerline{
%% \includegraphics[width=0.2\textwidth,draft=\DRAFTFIGS]{images/pplant}
%% }
%% \caption{The power plant scene.}
%% \label{fig:images3}
%% \end{figure}
