SIGGRAPH 2012 – Monday

Pointed Illumination session http://s2012.siggraph.org/attendees/sessions/100-119

Progressive Lightcuts for GPU (Progressive Lightcuts for GPU Homepage – ACM digital library version)

The first presentation of the session was about offloading the lightcuts process (explained the previous day) to the GPU and producing a progressive result, since the intermediate images converge to the final result quite quickly. The obvious idea of caching where the tree was cut per pixel was thrown out due to the massive amount of memory required to store that state. Instead, they average several lightcuts images, each rendered from a different set of VPLs. The system is limited to a heapless traversal on the GPU and schedules CPU work if the depth of the cut is too great. The presentation also demonstrated a new way to clamp the lighting contribution that helps define the number of iterations required.
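
The progressive part is essentially a running average, so here’s a minimal sketch of that idea as I understood it (mine, not the paper’s code): each iteration produces a full lightcuts image from its own VPL set, and the displayed result is the mean of all iterations so far.

```cpp
#include <cstddef>
#include <vector>

// One RGB image, 3 floats per pixel.
struct Image {
    std::vector<float> rgb;
    explicit Image(std::size_t pixels) : rgb(pixels * 3, 0.0f) {}
};

// Running average of per-iteration lightcuts images: after n iterations
// the displayed result is the mean of n independent estimates, each
// rendered from its own set of VPLs.
class ProgressiveAccumulator {
public:
    explicit ProgressiveAccumulator(std::size_t pixels)
        : accum_(pixels), iterations_(0) {}

    void addIteration(const Image& sample) {
        ++iterations_;
        const float w = 1.0f / static_cast<float>(iterations_);
        for (std::size_t i = 0; i < accum_.rgb.size(); ++i) {
            // Incremental mean: accum += (sample - accum) / n
            accum_.rgb[i] += (sample.rgb[i] - accum_.rgb[i]) * w;
        }
    }

    const Image& current() const { return accum_; }

private:
    Image accum_;
    std::size_t iterations_;
};
```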

It seemed that the major contribution was the new clamping method, which was skated over quite quickly with a reference to “Mathematica happens”.

SGRT: A Scalable Mobile GPU Architecture Based on Ray Tracing (ACM digital library version)

In this presentation, Won-Jong Lee presented a ray-tracing-based GPU. He started by showing that ray tracing requires more FLOPs than are available on current mobile GPUs, and that the underlying process doesn’t map well to multithreaded SIMD operations due to the incoherence between rays. Their first question was whether to go fixed function or programmable: fixed function is lower power, but programmable elements are required for things like ray generation and surface shaders.

The underlying hardware is split into several parts. The fixed-function traversal and intersection system is based on T&I Engine: Traversal and Intersection Engine for Hardware Accelerated Ray Tracing, presented at SIGGRAPH Asia last year. Internally, the system supports optionally restarting traversal based on a short stack of stored kd-tree nodes. The Ray Accumulation Unit gathers rays that hit the same cache line but are still waiting for that data to arrive.
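
As a rough illustration of the short-stack idea (my own sketch of the general technique, not SGRT’s actual hardware): a fixed-capacity traversal stack simply drops its oldest entry on overflow, and a pop from an empty stack tells the caller to restart traversal from the root.

```cpp
#include <array>
#include <cstdint>
#include <optional>

// Fixed-capacity traversal stack: pushes beyond the capacity overwrite
// the oldest entry, so deep traversals lose far-ancestor state and fall
// back to a restart from the root when the stack runs dry.
// NodeRef is typically a small node index (e.g. uint32_t).
template <typename NodeRef, std::size_t Capacity>
class ShortStack {
public:
    void push(NodeRef node) {
        slots_[head_ % Capacity] = node;
        ++head_;
        if (head_ - tail_ > Capacity) tail_ = head_ - Capacity;  // drop oldest
    }

    // Returns nothing when the stack is exhausted; the traversal loop
    // should then restart from the root (typically with a shortened t-range).
    std::optional<NodeRef> pop() {
        if (head_ == tail_) return std::nullopt;
        --head_;
        return slots_[head_ % Capacity];
    }

private:
    std::array<NodeRef, Capacity> slots_{};
    std::uint64_t head_ = 0;  // next write position
    std::uint64_t tail_ = 0;  // oldest valid entry
};
```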

The numbers given for 2 test scenes rendered at 800×480 were as follows:
Fairy – 255M rays/sec, 87 fps
Ferrari – 170M rays/sec, 67 fps
These compare favorably with Kepler ray tracing figures of 156-317M rays/sec.

Overall, the system looks interesting, although a few things stood out:
* The example was based on Samsung’s reconfigurable processor which represented the bulk of the posters at HPG (for example A Scalable GPU Architecture based on Dynamically Reconfigurable Embedded Processor). It appears to be an area of research investment at Samsung that is being reused in many different hardware projects.
* The BVH is currently generated on the CPU and they haven’t investigated that area much, so there’s no support for dynamic scenes.
* The clock rate being discussed was 1GHz, which would result in burnt fingers if the battery lasts long enough.
* The presenter asserted many times that this was the first known hardware implementation, but I thought Imagination (of PowerVR fame) had already been looking at something called Caustic, and I managed to find the Imagination Caustic PowerVR ray-tracing hardware reference platform, which was announced at SIGGRAPH this year.

Point-Based Global Illumination Directional Importance Mapping

This presentation by Eric Tabellion showed 2 applications of importance mapping to Point-Based Global Illumination (PBGI). For a quick recap: PBGI involves creating point samples of a scene, clustering them, and then sampling the clusters based on a solid angle error metric to gather global illumination. Eric showed some live demos of the points selected for rendering from the point of view of a position being illuminated, which gave a good idea of the blockiness of the point sprites used when rasterizing the cubemap that represents the GI.

The first use of importance mapping exploited the fact that more important directions need higher quality: the BRDF bounds a cone of directions whose width depends on the roughness of the surface, so the solid angle error metric can be varied in proportion to the roughness in order to rasterize finer points for smoother surfaces.

The second use was similar, but based on high dynamic range environment maps. The system already mipmaps the HDR envmap for quick lookups, and they updated it to also scale the solid angle error by the luminance sampled from the envmap.
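
As a rough sketch of how both importance terms might feed the cluster refinement (my guess at the shape of it, not DreamWorks’ code): the per-direction solid angle threshold is scaled down for smooth surfaces and for bright envmap directions, forcing the traversal to descend deeper into the point hierarchy where it matters.

```cpp
#include <algorithm>

// Hypothetical importance-mapped threshold: clusters are refined until
// their subtended solid angle drops below this value, so a smaller
// threshold means finer points are rasterized for that direction.
float solidAngleThreshold(float baseThreshold,
                          float roughness,          // 0 = mirror, 1 = diffuse
                          float envmapLuminance,    // sampled from the prefiltered HDR envmap
                          float averageLuminance)   // envmap average
{
    // Smooth (glossy) surfaces reflect a narrow cone of directions, so
    // they need finer points: scale the threshold down as roughness drops.
    const float roughnessScale = std::clamp(roughness, 0.05f, 1.0f);

    // Bright envmap directions contribute more energy, so refine further
    // where the sampled luminance exceeds the average.
    const float luminanceScale =
        averageLuminance / std::max(envmapLuminance, 1e-4f);

    return baseThreshold * roughnessScale * std::min(luminanceScale, 1.0f);
}
```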

Following on from the PBGI presentations at HPG/EGSR, I found this a really interesting talk with a takeaway that could be applied by anyone with a working PBGI implementation. The results also highlighted the time that can be saved with importance mapping, cutting a 4-hour render to 90 minutes.

Ill-Loom-inating Handmade Fabric in “Brave”

This presentation demonstrated the techniques used to solve the complex requirements of the fabric in the film Brave.

The first attempt was a ray marcher in tangent space against a distance field representing the fibers of the fabric (as soon as I heard “distance field” I thought it was possibly due to Inigo Quilez having joined Pixar, given his history of ray marching demos). The example shown was highly detailed, going so far as letting you see the fibers separate when the material was compressed. The first problem was the lack of a silhouette on curved areas, so the marcher was changed to bend the ray (rather than the distance field) based on the curvature, as is usual in distance field renderers. One advantage of the distance field representation is that it can also be used to calculate local ambient occlusion. However, there were several disadvantages, such as the lack of AA support, shadows, and support for the existing lighting model.
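
For reference, the core of such a marcher is tiny; the sketch below is mine, not Pixar’s, and just shows basic sphere tracing plus the kind of per-step ray bending that can fake a curved silhouette, with `bendPerUnit` standing in for the surface curvature they mentioned.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 scale(Vec3 v, float s) { return {v.x * s, v.y * s, v.z * s}; }
static Vec3 normalize(Vec3 v) {
    const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return scale(v, 1.0f / len);
}

// Sphere-trace through a signed distance field describing the fibers,
// bending the ray direction slightly each step (toward -Z here) to
// approximate the macro curvature of the surface instead of deforming
// the field itself. `distanceToFibers` is any callable returning the
// signed distance at a point.
template <typename DistanceFn>
bool marchFibers(Vec3 origin, Vec3 dir, float bendPerUnit,
                 DistanceFn distanceToFibers, Vec3* hit) {
    Vec3 p = origin;
    Vec3 d = normalize(dir);
    for (int i = 0; i < 128; ++i) {
        const float dist = distanceToFibers(p);
        if (dist < 1e-4f) { *hit = p; return true; }   // close enough: hit
        p = add(p, scale(d, dist));                    // safe sphere-trace step
        d = normalize(add(d, {0.0f, 0.0f, -bendPerUnit * dist}));  // bend the ray
    }
    return false;  // no intersection within the step budget
}
```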

The second attempt was based on RiCurves in Pixar’s RenderMan and was named Loom. The system procedurally generates the fabric from subdivision surfaces. At render time, the aim is to only generate the geometry required for one face and its neighbors at a time, in an effort to reduce memory usage. However, there were still problems with memory usage, as well as with geometry LODding and shadows. This technique was used a lot in the final film, with the presenter focussing on a tapestry that features prominently.

As part of the Q&A, someone asked how the first technique compared with Relief Mapping. The presenter highlighted that he only had a procedural description of the surface, whereas Relief Mapping typically works from a 2.5D description.

Although not as practical as the preceding PBGI talk, this presentation was more of an inspiration and a review of a set of different ways of achieving the same goal.

Virtual Texturing in Software and Hardware http://s2012.siggraph.org/attendees/sessions/virtual-texturing-software-and-hardware

All of the slides for this course are on J.M.P. van Waveren’s site.

The first time I actually understood exactly how Sparse Virtual Textures worked was at Sean Barrett’s (@nothings) GDC 2008 talk (site here) which was performed from a set of slides stored in a megatexture.

The session started with an overview of the technique, reusing some of the recognizable images from the GDC 2008 talk, so you could read either version to catch up. The session then moved on to the practical use of software to implement virtual textures in Rage.

Most of the Rage content has also been covered before in J.M.P. van Waveren’s id Tech 5 Challenges talk at SIGGRAPH 2009, with this year’s talk mostly focussing on the work of addressing the virtual texture, and the issues with filtered texture sampling from virtual textures, especially in the case of anisotropic filtering, which can apparently produce artifacts.

The meat of the talk was AMD’s presentation of how their Partially Resident Texture (PRT) system works. In a similar way to how Sean Barrett defined sparse virtual texturing in terms of Virtual Memory (VM), the presenter started with an overview of the hardware virtual memory subsystem on a modern GPU and a description of how every texture sampling operation uses the VM to find the physical address of the texture data. PRT works by allowing you to allocate virtual address space and only map the pages of physical memory you want to access for your texture, meaning that your actual memory usage can be lower, or you can use the indirection to relocate blocks of your texture. However, this means that when you sample your texture it’s possible to hit unmapped pages; this error condition needs to be returned to the caller of the texture lookup and, in theory, handled in some way. This section of the talk finished by describing the kind of work required inside a driver to expose this functionality.
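
To make the failure path concrete, here’s a tiny software model of the lookup the hardware is doing (obviously not the real page table format): a virtual texture address is split into a page index and an offset, the page table either yields a physical address or reports the page unmapped, and the unmapped case is exactly the error condition the shader has to handle.

```cpp
#include <cstdint>
#include <unordered_map>

constexpr std::uint64_t kPageSize = 64 * 1024;  // 64 KiB, a common GPU page size

struct Translation {
    bool resident;                  // false -> the PRT page is not mapped
    std::uint64_t physicalAddress;  // only valid when resident
};

// Toy page table: virtual page number -> physical page base address.
class PageTable {
public:
    void map(std::uint64_t virtualPage, std::uint64_t physicalBase) {
        entries_[virtualPage] = physicalBase;
    }

    Translation translate(std::uint64_t virtualAddress) const {
        const std::uint64_t page = virtualAddress / kPageSize;
        const std::uint64_t offset = virtualAddress % kPageSize;
        const auto it = entries_.find(page);
        if (it == entries_.end()) {
            // Unmapped page: a real PRT fetch reports this back to the
            // shader instead of returning texture data.
            return {false, 0};
        }
        return {true, it->second + offset};
    }

private:
    std::unordered_map<std::uint64_t, std::uint64_t> entries_;
};
```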

Next up was the AMD_sparse_texture extension to OpenGL, which is AMD’s method for exposing this functionality (covered in some detail here). The extension adds the following (a rough usage sketch follows the list):

1) glTexStorageSparseAMD() as a simple replacement for glTexStorage2D() when you want to allocate virtual memory space to use as a PRT.
2) New queries for texture page sizes, since they vary with format (glGetInternalformativ() with GL_VIRTUAL_PAGE_SIZE_X/Y/Z_AMD), and GL_MIN_SPARSE_LEVEL_AMD to get the level at which mips are packed.
3) An extra texture parameter, GL_MIN_WARNING_LOD_AMD, defining the low water mark for mips – which I assume means the point at which you need higher res mipmaps, since I’m not sure of the value of the alternative. If the watermark is hit, a warning is returned to the shader, but at least the data is valid. (It would be nice if this were stored to a flag on the texture).
4) GLSL sparseTexture() sampling functions that return residency flags, plus additional functions to check those flags. The previous texture() functions still work, but they have undefined results if the texture data is not resident, so you still need to manage that externally.
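
Putting the pieces from the list together, allocation and the page size query look roughly like this. This is written from my notes on the extension rather than tested code, so treat the call details as approximate.

```cpp
#include <GL/glew.h>  // or any loader that exposes the AMD_sparse_texture tokens

// Reserve virtual address space for a 4K x 4K RGBA8 partially resident
// texture and query its page dimensions. No physical memory is committed
// here; pages are mapped later as tiles of texture data are streamed in.
GLuint createSparseTexture()
{
    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);

    // Replacement for glTexStorage2D() that only allocates virtual
    // address space for use as a PRT.
    glTexStorageSparseAMD(GL_TEXTURE_2D, GL_RGBA8, 4096, 4096,
                          1 /*depth*/, 1 /*layers*/,
                          GL_TEXTURE_STORAGE_SPARSE_BIT_AMD);

    // Page dimensions depend on the internal format, so query them
    // rather than assuming a fixed tile size.
    GLint pageSizeX = 0, pageSizeY = 0;
    glGetInternalformativ(GL_TEXTURE_2D, GL_RGBA8,
                          GL_VIRTUAL_PAGE_SIZE_X_AMD, 1, &pageSizeX);
    glGetInternalformativ(GL_TEXTURE_2D, GL_RGBA8,
                          GL_VIRTUAL_PAGE_SIZE_Y_AMD, 1, &pageSizeY);

    (void)pageSizeX; (void)pageSizeY;
    return tex;
}
```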

You are also able to use the sparse storage for render targets, where writes to unmapped memory are simply discarded. A list of possible issues, such as running out of virtual or physical memory, or GL texture size limitations (for example 4Kx4K), was also covered, although in the case of GL you could allocate large texture arrays or volumes instead. Of the future extensions, the most interesting was extending this VM support to other GL types such as the ubiquitous buffers.

As part of the Q&A session, one question was something I’d already noted down to ask: what happens during filtering if one of the samples is unavailable? The lookup fails.

The last part of the session was a demo of Rage running with PRTs, based on a version of the engine that could swap between software and hardware virtual texturing. Once the required assets were cached (once for each path), you could swap between the SW and HW versions. This was the first chance to see the artifacts from the anisotropic sampling, thanks to the super-magnifying glass they applied to the SW version to find a microscopic line of failing pixels – something I’d never seen on the PS3 version. They also demonstrated the better texture sampling of the HW version, since they could increase the anisotropy level when sampling higher-frequency textures such as tire tracks.

A couple of things came up here and in the final Q&A:

1) Rage still uses a prepass to detect required pages, rather than using the errors returned when texture sampling.
2) Sampling from a lower mip on lookup fail requires looping in the shader.
3) There’s no real LRU support in the PRT system – you’d still need to go back to the feedback pass method.
4) In the case of the 3 textures per surface in Rage, for SW virtual textures, there’s one VM (texture) lookup before sampling, but in the HW version, there’s a VM (HW) lookup per texture. (Of course the SW version has 3 HW VM lookups too).

Overall, I’m glad I understand how it works, but I’m not sure of the practical use. My only ideas so far:

1) The obvious megatexturing.
2) Streaming of higher resolution mips.

I’m not sure of the mileage you’d get trying to apply PRTs to sparse voxel techniques, since all it would really add is a high-level error return. I’m looking forward to seeing more practical applications.

Surf & Turf http://s2012.siggraph.org/attendees/sessions/100-121

From a Calm Puddle to a Stormy Ocean: Rendering Water in Uncharted (NaughtyDog) (ACM digital library version – similar to the GDC 2012 presentation)

The presentation started with the water tech from the earlier Uncharted games, which is based on multiple layers, such as depth-based refraction, soft shadows, and foam. The system blends 2 textures advected by flow and offset in phase (very similar to Valve’s Water Flow in Portal 2, presented at SIGGRAPH 2010). For the triangle mesh representing the surface, they also mentioned that they move the vertices in circles.
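
A quick sketch of the two ideas mentioned (my paraphrase of the standard techniques, not Naughty Dog’s code): two copies of the advected texture are sampled with phases half a period apart and cross-faded so neither is visible at the point where its distortion resets, and each surface vertex is displaced along a small circle, which gives water its characteristic bobbing motion.

```cpp
#include <cmath>

struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };

struct FlowLayers {
    float phase0, weight0;  // first advected layer
    float phase1, weight1;  // second layer, half a period out of phase
};

// Cross-fade weights for two flow layers offset by half a period: each
// layer's weight reaches zero exactly when its phase wraps, which hides
// the visible reset of the advected texture.
FlowLayers flowLayers(float time, float period) {
    FlowLayers f;
    f.phase0  = std::fmod(time / period, 1.0f);              // 0..1
    f.phase1  = std::fmod(time / period + 0.5f, 1.0f);       // offset by half
    f.weight0 = 1.0f - std::fabs(2.0f * f.phase0 - 1.0f);    // triangle wave, 0 at wrap
    f.weight1 = 1.0f - f.weight0;                            // weights sum to 1
    // The shader samples: tex(uv + flow * f.phase0) * f.weight0
    //                   + tex(uv + flow * f.phase1) * f.weight1
    return f;
}

// Circular vertex motion: displace each vertex of the surface mesh on a
// small circle in the vertical plane, phase-shifted by its position.
Vec3 circularDisplacement(Vec2 vertexPos, float time,
                          float amplitude, float frequency, Vec2 waveDir) {
    const float phase = (vertexPos.x * waveDir.x + vertexPos.y * waveDir.y)
                        * frequency + time;
    return { amplitude * waveDir.x * std::cos(phase),
             amplitude * waveDir.y * std::cos(phase),
             amplitude * std::sin(phase) };
}
```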

The talk then moved on to the ocean in Uncharted 3. Rather than use Gerstner or Tessendorf waves, they wanted something simpler and settled on Wave Particles, presented at SIGGRAPH 2007, which offered a lot of advantages, although the only ones I recall were SPU friendliness and tileability. Another example from Uncharted 3 was the flood in the corridors of the ocean liner. The flood itself was based on a simulation in Houdini and was rendered as a skinned mesh driven by a set of joints (apparently enough joints to reach the limits of the animation system). The particle effects during the flood were actually placed by hand. They also covered the forces applied to objects in the water, which were simply based on an average position and normal.
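
The object interaction they described sounded like the usual trick of sampling the water surface under the object and averaging; something along these lines (a guess at the shape of it, not their code), with the averaged position driving buoyancy and the averaged normal driving orientation.

```cpp
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

// A query of the simulated water surface at one point under the object.
struct SurfaceSample { Vec3 position; Vec3 normal; };

// Average a handful of surface samples taken under an object's hull and
// derive a single support position and normal; the object is then pushed
// up toward the averaged position and tilted toward the averaged normal.
SurfaceSample averageSupport(const std::vector<SurfaceSample>& samples) {
    Vec3 p{0, 0, 0}, n{0, 0, 0};
    for (const SurfaceSample& s : samples) {
        p.x += s.position.x; p.y += s.position.y; p.z += s.position.z;
        n.x += s.normal.x;   n.y += s.normal.y;   n.z += s.normal.z;
    }
    const float inv = 1.0f / static_cast<float>(samples.size());
    p = {p.x * inv, p.y * inv, p.z * inv};
    const float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    n = {n.x / len, n.y / len, n.z / len};
    return {p, n};
}
```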

The whole talk showed how Naughty Dog are a team that combines artistic vision with a strong technical implementation.

What if the Earth Was Flat: The Globe UI System in SSX (EA) (ACM digital library version)

This talk presented the mechanisms used to render the globe in SSX, which had a specific set of limitations: it should be available from anywhere in the game and it could only use a minimal amount of memory. This meant they needed to go procedural: basically render a quad and populate it with a heavy shader – apparently it ended up being a 600-line shader, was described as “not pretty”, and midway through development it cost 17ms on 360 and 23ms on PS3.

The mountains were rendered with the Relaxed Cone Stepping for Relief Mapping technique. The actual memory used was limited to 2 textures, which contained the diffuse image of the earth’s surface, a cloud channel, the cone information, and the height. Other effects were based on simple tricks using this information, such as using the height map to generate a specular value, i.e. low is ocean.
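
Cone stepping is worth a tiny sketch because the step-size rule is the whole trick. This is the generic cone-step formulation from my notes rather than SSX’s shader (the relaxed variant then refines the result with a binary search): each texel stores, besides the height, the ratio of the widest empty cone above it, and the ray is advanced to where it exits that cone.

```cpp
#include <cmath>

struct Vec2 { float u, v; };

// Per-texel data from a precomputed cone-step map: the relief depth at
// this texel (0 = top surface, 1 = deepest) and the ratio (radius per
// unit height) of the widest empty cone above that surface point.
struct ConeTexel { float depth; float coneRatio; };

// March a view ray through the relief in tangent space. dirUV is the
// lateral movement of the ray per unit of depth, so each iteration can
// advance exactly to the boundary of the current texel's empty cone.
// `sampleConeMap` is any callable mapping a Vec2 texcoord to a ConeTexel.
template <typename ConeMap>
Vec2 coneStepIntersection(Vec2 startUV, Vec2 dirUV, ConeMap sampleConeMap,
                          int maxSteps = 32) {
    const float lateralPerDepth =
        std::sqrt(dirUV.u * dirUV.u + dirUV.v * dirUV.v);
    Vec2 uv = startUV;
    float depth = 0.0f;
    for (int i = 0; i < maxSteps; ++i) {
        const ConeTexel t = sampleConeMap(uv);
        if (depth >= t.depth) break;  // the ray is at or below the surface
        // Largest depth advance that stays inside the empty cone: lateral
        // movement (lateralPerDepth * dz) must not exceed the cone radius
        // at the new depth, coneRatio * (t.depth - (depth + dz)).
        const float dz = t.coneRatio * (t.depth - depth)
                         / (lateralPerDepth + t.coneRatio);
        depth += dz;
        uv.u += dirUV.u * dz;
        uv.v += dirUV.v * dz;
    }
    return uv;  // approximate intersection texcoord
}
```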

Other random things that came up:
* They missed the perspective transform of the globe when rendering the quad – similar to what is covered here.
* The quad was changed to an octagon to avoid rendering pixels that didn’t represent the globe – a trick mentioned elsewhere at SIGGRAPH this year by Emil Persson.
* Many hacks were applied, such as passing through cloud, when transitioning from globe to close up to avoid artifacts.

Although the problem being solved was interesting, none of the techniques used were particularly new or revolutionary; this was more a presentation of producing nice effects within a set of minimal requirements.

Adaptive Level-of-Detail System for End of Nations (Petroglyph Games) (ACM digital library version)

This presentation was about automatic control of level of detail (LOD) in an RTS environment, where there’s a high object count and varying load with a large possible number of players (56 in this case). The game includes a frame rate monitor that can manipulate various detail settings in an attempt to maintain at least a minimum frame rate. When the system was implemented, the developers informed the QA and art teams so that they would be aware of possible quality drops in response to low frame rates. The changes made to the levels of detail are based on voting, and hysteresis is applied to avoid the changes toggling every frame as the frame rate recovers.
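
The control loop itself is simple enough to sketch (a generic illustration of monitor-plus-voting-plus-hysteresis, not Petroglyph’s system): the frame time is smoothed, detail drops only after it stays above one threshold for a while, and it comes back only after it stays below a noticeably lower one, so the system doesn’t oscillate as the frame rate recovers.

```cpp
#include <algorithm>

// Frame-rate-driven LOD controller with hysteresis: detail only drops
// when the smoothed frame time stays above the upper threshold, and only
// comes back when it stays below the lower one, so the two thresholds
// never fight each other frame to frame.
class AdaptiveLodController {
public:
    AdaptiveLodController(float targetMs, int maxLodLevel)
        : dropAboveMs_(targetMs), restoreBelowMs_(targetMs * 0.8f),
          maxLevel_(maxLodLevel) {}

    // Call once per frame; returns the detail-reduction level to apply
    // (0 = full detail, maxLodLevel = the "nuclear option").
    int update(float frameMs) {
        // Exponential smoothing keeps single spikes from triggering changes.
        smoothedMs_ = smoothedMs_ * 0.9f + frameMs * 0.1f;

        if (smoothedMs_ > dropAboveMs_) {
            if (++overBudgetFrames_ >= kVoteFrames) {
                level_ = std::min(level_ + 1, maxLevel_);
                overBudgetFrames_ = 0;
            }
            underBudgetFrames_ = 0;
        } else if (smoothedMs_ < restoreBelowMs_) {
            if (++underBudgetFrames_ >= kVoteFrames) {
                level_ = std::max(level_ - 1, 0);
                underBudgetFrames_ = 0;
            }
            overBudgetFrames_ = 0;
        }
        return level_;
    }

private:
    static constexpr int kVoteFrames = 30;  // frames that must agree before a change
    float dropAboveMs_, restoreBelowMs_;
    int maxLevel_;
    float smoothedMs_ = 0.0f;
    int level_ = 0;
    int overBudgetFrames_ = 0, underBudgetFrames_ = 0;
};
```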

The finale of the talk was an example of the LOD control system being applied in a frantic scene with a lot of units attacking each other, overlaid with various effects. To my eyes, the effects of the LOD system couldn’t be seen until they got to their “nuclear option”, which toggles the list of units rendered each frame, resulting in flickering units. I think this is because most of the content in an RTS beyond units and projectiles is arbitrary eye candy, and its loss doesn’t affect someone playing the game.

Screen Space Decals in Warhammer 40,000: Space Marine (Relic) (ACM digital library version)

I missed this due to needing to leave as it began.

Electronic Theater

The Electronic Theater is one of my best sources of inspiration at the SIGGRAPH conference. Some of my favourites:

For The Remainder
I found this animation beautiful and wished I was one of the programmers involved, since the credits mention NPR software and tools programming.

How to Eat Your Apple
This was an intricate video which reminded me of Lil and Laarg from Escape Plan.

Mac ‘n’ Cheese
A frenetic chase with great animation.

Wanted Melody (Extract (NSFW))
A weird one that had the audience going ‘huh’ at the start and laughing like drains by the end.
