Lecture 8 : High-Performance Ray Tracing


For extremely large rendering projects (e.g., a feature-length film), I would expect that using a task abstraction would make submitting the rendering project to a map-reduce system like Hadoop trivial. The map operation divides the project into tasks, and the reduce operation combines the results of tasks into a larger picture. In addition to handling load balancing automatically, high-quality map-reduce implementations have also solved issues like instrumentation (statistics and timing) and fault tolerance.


What ties the implementation of the atomic types to associative operations? I looked at a few C++11 references and saw that atomic is only implemented for integral, pointer, and Boolean types, but I didn't find anything about associativity being essential to the guarantee of atomicity. (Java also doesn't offer atomic floating-point types, and I've always wondered why not.)


It's mostly an issue of ensuring consistent results across different execution schedules. In other words, one certainly could build hardware that provides atomic versions of non-associative operations (like floating-point addition), but then a program using them could produce different results across different runs (which is an undesirable property in most cases).

In other words, if some memory location had a value A and then two threads did atomic floating-point addition of values B and C, respectively, to A, one execution schedule could cause a series of operations ordered like:

A = (A + B) + C

and another schedule could cause:

A = (A + C) + B

which may give a different final value for A in floating-point arithmetic. For rendering, we may not care about this, but I think the general philosophy is that most application areas do care about this kind of run-to-run consistency.

FWIW, in core/parallel.h in the PBRT code there's an atomic floating-point add built on top of atomic compare-and-exchange, which all platforms do support for floats (since compare-and-exchange just operates on raw bits, the element type doesn't matter).
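A minimal sketch of that compare-and-exchange technique (the function name is mine, and PBRT's actual version differs in details):

```cpp
#include <atomic>

// Atomic float add built on compare-and-exchange, in the spirit of
// PBRT's core/parallel.h (simplified sketch, not the real code).
float atomicAdd(std::atomic<float> &target, float value) {
    float expected = target.load();
    // If another thread changed target between our load and the CAS,
    // compare_exchange_weak fails and refreshes expected; retry.
    while (!target.compare_exchange_weak(expected, expected + value))
        ;  // spin until our read-modify-write wins
    return expected + value;
}
```

Note that this gives you atomicity, not order: concurrent adds still land in whatever order the threads happen to win the CAS, so the run-to-run consistency issue above remains.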


E also overlaps triangle 5. Why don't we add 5 to E in the tree?


Good question; the structure depicted here is a bounding volume hierarchy (vs. a kd-tree), so it's based on subdividing primitives into disjoint sets. An implication of this approach is that the bounding boxes of the two children of a node may overlap.

So, here, the construction algorithm decided to group (3,4) together and to put (3,4) and (5) as children of node D. It happens that E's and F's bounding boxes overlap, which is fine; this just means that some rays that visit D (like r7) will visit both children of D.

We don't want to add 5 to E, since then rays like r4 that hit E but aren't close to 5 would be tested against 5.
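To make the traversal behavior concrete, here's a schematic recursive BVH traversal (all the types and names below are mine, not from any particular renderer). Since sibling bounds may overlap, an interior node never picks just one child; a ray entering both children's boxes, like r7 at D, simply recurses into both:

```cpp
#include <algorithm>
#include <vector>

struct Ray { float o[3], d[3]; };  // origin, direction

struct BBox {
    float lo[3], hi[3];
    // Standard slab test (assumes no zero direction components, for brevity).
    bool intersects(const Ray &r) const {
        float t0 = 0.0f, t1 = 1e30f;
        for (int i = 0; i < 3; ++i) {
            float tn = (lo[i] - r.o[i]) / r.d[i];
            float tf = (hi[i] - r.o[i]) / r.d[i];
            if (tn > tf) std::swap(tn, tf);
            t0 = std::max(t0, tn);
            t1 = std::min(t1, tf);
            if (t0 > t1) return false;
        }
        return true;
    }
};

struct BVHNode {
    BBox bounds;
    BVHNode *left = nullptr, *right = nullptr;  // null at leaves
    std::vector<int> prims;                     // primitive ids, leaves only
};

// Collects the ids of all primitives whose leaf boxes the ray enters.
void traverse(const BVHNode *n, const Ray &r, std::vector<int> &hits) {
    if (!n || !n->bounds.intersects(r)) return;
    if (!n->left && !n->right) {  // leaf: record its primitives
        hits.insert(hits.end(), n->prims.begin(), n->prims.end());
        return;
    }
    // Interior node: the children's boxes may overlap, so we may
    // recurse into BOTH subtrees (this is what happens to r7 at D).
    traverse(n->left, r, hits);
    traverse(n->right, r, hits);
}
```

A ray like r4, whose path misses 5's leaf box, pays only the cheap box test against that child and never runs the expensive primitive test, which is exactly why we don't duplicate 5 into E.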