Assignment 2: Fast Construction of BVHs

Due Date: Tuesday, May 6, 11:59PM.

04/29 - Added additional 'villa' test scene.


In this assignment, you will implement an efficient algorithm for building BVH trees for ray intersection acceleration. The algorithm you implement should be faster than the surface area heuristic-based approach currently implemented in pbrt, though the trees it builds generally aren't as high quality.

Step 1: Read and digest


For both of these papers, ignore the discussion of the issues related to mapping the algorithm to GPUs; just focus on the core ideas of Morton codes and sorting them to build BVHs (the Lauterbach et al paper) and the top-down construction algorithm based on finding where particular bits change (Garanzha). Further, don't worry about the HLBVH variant or the hybrid SAH-LBVH algorithms.

Step 2: Review pbrt's current BVH construction code

Review section 4.4 of Physically Based Rendering, which describes the BVH accelerator in pbrt, and review the source files accelerators/bvh.h and accelerators/bvh.cpp.

Step 3: Build and run pbrt

See the page on building pbrt for information about downloading and building pbrt. Make sure that you can compile an unmodified version of the system, render a simple scene, and view the rendered image on your system before continuing.

Step 4: Implement the LBVH algorithm in pbrt

In pbrt scene description files, the algorithm used for building BVHs is specified via a string parameter to the Accelerator directive. For example,

  Accelerator "bvh" "string splitmethod" "sah"

First, you'll want to add support for a “lbvh” method; you'll need to modify the BVHAccel constructor to call your code when the splitmethod string (named “sm” in the constructor's code) is “lbvh”.

Recall that when BVHs are built in pbrt, the construction code first creates a pointer-based BVH of BVHBuildNode structures; these are then turned into a more compact representation of LinearBVHNodes. You only need to worry about building a tree of BVHBuildNodes–thus, in the constructor, you'll want to call your own code instead of recursiveBuild() when “lbvh” is selected. Your function should in turn return a BVHBuildNode * that points to the root of the tree.

These are the main steps you should implement:

  1. Compute a bounding box of the primitive centroids.
  2. Compute quantized integer (x,y,z) coordinates for each primitive with respect to the bounding box. Use m=10 bits for each component, which means you should have values between 0 and 1023.
  3. Convert the coordinates to be Morton coded.
  4. Sort the Morton coded indices using a radix sort. (Note that you'll need to keep track of which Primitive each Morton code is associated with!)
  5. A top-down build of the BVH can be done by, starting with the highest bit:
    1. Searching for the offset in the array of Morton indices where the high bit goes from zero to one.
    2. Splitting the primitives in half at that offset.
    3. Recursively building sub-trees with the two sets of primitives, moving to the next lower bit.
    4. Stopping the recursion when a small number of primitives are left.

See Figure 3 and 4 and discussion in section 3.2 of Garanzha et al's paper to better understand how the primitives are split.

Step 5: Test your implementation

Download the test scenes; each scene has two variants, one with an “Accelerator” statement to use a SAH-based BVH and one to use your LBVH.

One more, extra-complicated test scene: villa.tar.gz

Render each scene with both variants and compare the resulting images. You may find the exrdiff program from the pbrt distribution to be useful compare the two images for each of the two renderings. (It's ok if your images have a small number of differences as long as those differences are not noticeable when compared visually.) If geometry has obviously disappeared, then you have a bug to chase down.

Step 6: Evaluation

Evaluate the performance of your implementation versus the SAH-construction algorithm. You will probably find the Timer class in core/timer.h useful. For example, you might add code like this to the BVHAccel constructor. Note that you should start the timer right before recursiveBuild() or your LBVH building code is called.

Timer myTimer;
// build LBVH or call regular recursiveBuild(), depending...
printf("Elapsed time to build LBVH: %.2f seconds\n", myTimer.Time());

For each scene, record the results for building the BVH as well as the time to render the scene with the BVH.

In your writeup, report and discuss these performance numbers. Can you explain why they vary for different scenes?


Carefully review the mechanics of the code in the recursiveBuild() function. Important things to note (and do in your BVH construction code!) include:

  1. Call buildArena.Alloc<BVHBuildNode>() to allocate a new BVHBuildNode *. (Don't use new, which will be much slower.)
  2. Call (*totalNodes)++ for each node you create so that the final node count is correct.
  3. When you create a leaf node, add the primitives in the leaf to the orderedPrims vector.
  4. Make sure you compute correct primitive bounding boxes to pass to InitLeaf() when you create a leaf node.

It's easiest to implement a radix sort that operates on 1 bit at a time and loops over the data 30 times (30 from 3 dimensions * m = 10). A more efficient radix sort would probably consider more but not too many bits at a time and make fewer passes over the data. You may want to investigate this trade-off.

A linear search isn't the fastest way to find the offset where a given bit has changed in a sorted array.

It's useful to understand the observation that if we have 3D points represented with 3m Morton-interleaved bits, then this corresponds to a grid with 2^m voxels in each dimension. Further, if we only consider the 3n high-order bits, this corresponds to a lower-resolution grid superimposed on the high-resolution one with 2^n voxels in each dimension. Understanding this point well will be very helpful for debugging.


To submit your work, do the following:

  1. Make a wiki page containing your writeup and evaluation (as described above) at user:<yourUserName>:assignment2. Put a link to this page on your user home page (user:<yourUserName>:home).
  2. Create a .zip or .tar.gz file that contains your accelerators/bvh.h and accelerators/bvh.cpp files.
  3. Upload this archive to user:<yourUserName>:assignment2.{zip|tar.gz}. Put a link to this file on the assignment 2 page you made in step 1.

You can submit multiple times if you wish to make changes after your initial submission; we will grade the submission with the latest timestamp.


In addition to the correctness of your code, the quality of analysis in your writeup will determine your final grade.

0 points: Code doesn't compile, crashes when run, or produces no output.

1 point: Code does not work correctly: crashes or generates invalid BVHs.

2 points: Code works correctly with some but not all scenes.

3 points: Correct implementation, but minimal writeup (e.g. missing questions, etc), or no analysis of the observed results.

4 points: Completion of all requirements for 3 points as well as a clear writeup and analysis of the observed performance.

Extra credit:

One point of extra credit will go to the student who implements the fastest (correct!) LBVH builder, as measured by your TA on a machine of his choosing, using a different scene than the ones we provided for your writeup.

For one point of extra credit, modify your LBVH construction code to be multi-threaded, using the EnqueueTasks() and WaitForAllTasks() functions from core/parallel.h in pbrt. Add information to your writeup describing your strategy for decomposing the various phases of the algorithm into tasks and compare BVH construction time with your parallel implementation to your serial implementation. How close is your observed speedup to ideal? What could be the cause of any discrepancies seen here?