HPG 2013

This year HPG took place in Anaheim on July 19th-21st, co-located with and running just prior to SIGGRAPH. The program is here.

Friday July 19

Advanced Rasterization

Moderator: Charles Loop, Microsoft Research

Theory and Analysis of Higher-Order Motion Blur Rasterization Site  Slides
Carl Johan Gribel, Lund University; Jacob Munkberg, Intel Corporation; Jon Hasselgren, Intel Corporation; Tomas Akenine-Möller, Lund University/Intel Corporation

The conference started with a return to Intel’s work on Higher Order Rasterization. The presentation highlighted that motion is typically curved rather than linear and is therefore better represented by quadratics. The next part showed how to change the common types of traversal to handle this curved motion. The presenter demonstrated Interval and Tile based methods and how to extend them to handle quadratic motion. This section introduced Subdividable Linear Efficient Function Enclosures (SLEFES) which I’d not heard of before. SLEFES allows you to give tighter bounds on a function over an interval which are better than the convex hull of control points that you’d typically use – definitely something to look at later.

PixelPie: Maximal Poisson-disk Sampling with Rasterization Paper Slides (should be)
Cheuk Yiu Ip, University of Maryland, College Park; M. Adil Yalçi, University of Maryland, College Park; David Luebke, NVIDIA Research; Amitabh Varshney, University of Maryland, College Park

All Poisson-disk sampling talks start with a discussion of the basic dart-throwing and rejection based implementation first put forward in 1986, before going into the details of their own implementation. The contribution of this talk was the idea of using rasterization to maintain the minimum distance requirement. This is handled by rendering disks which will occlude each other if overlapping, where overlapping means too close – simple but effective. Of course there’s a couple of issues. Firstly there’s some angular bias due to the rasterization if the radius is small because of the projection of the disk’s edge to the pixels. The other problem was that even once you have a good set of initial points, there’s extra non-rasterization compute work to handle the empty space via stream compaction. One extra feature you get cheaply is support for importance sampling since you can change the size of each disk based on some additional input. This was shown by using the technique to select points that map to features on images – something I’d not seen before.
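The basic dart-throwing and rejection scheme that these talks open with can be sketched in a few lines. This is only the naive CPU baseline, not the PixelPie rasterization approach; the names `Point` and `dartThrow`, the unit-square domain and the fixed attempt budget are assumptions made for illustration.

```cpp
#include <cassert>
#include <random>
#include <vector>

struct Point { float x, y; };

// Naive dart throwing in the unit square: propose a random point and keep it
// only if it is at least minDist away from every point accepted so far.
std::vector<Point> dartThrow(float minDist, int attempts, unsigned seed) {
    assert(minDist > 0.0f && attempts >= 0);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> uni(0.0f, 1.0f);
    std::vector<Point> accepted;
    for (int i = 0; i < attempts; ++i) {
        const Point p{uni(rng), uni(rng)};
        bool farEnough = true;
        for (const Point& q : accepted) {
            const float dx = p.x - q.x, dy = p.y - q.y;
            if (dx * dx + dy * dy < minDist * minDist) { farEnough = false; break; }
        }
        if (farEnough) accepted.push_back(p);
    }
    return accepted;
}
```

Every candidate is tested against every accepted point, which is the cost that later methods (including the rasterization trick in this talk) try to avoid.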

Out-of-Core Construction of Sparse Voxel Octrees Paper Slides
Jeroen Baert, Department of Computer Science, KU Leuven; Ares Lagae, Department of Computer Science, KU Leuven; Philip Dutré, Department of Computer Science, KU Leuven

The fundamental contribution from this talk was the use of Morton ordering when partitioning the mesh to minimize the amount of local memory when voxelising. One interesting side effect of this memory reduction is improved locality resulting in faster voxelization. In the example cases, this meant that the tests with 128MB were quicker than 1GB or 4GB. The laid back nature of the presenter and the instant results made it feel like you could go implement it right now, but then the source was made available taking the fun out of that idea!
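For reference, Morton (Z-order) encoding interleaves the bits of the voxel coordinates so that voxels close in space tend to be close in the resulting 1D order. The sketch below is a plain bit-by-bit version, not the authors' code; the name `morton3`, the 21 bits per axis and the choice of x as the lowest bit are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Interleave the low 21 bits of x, y and z into a 63-bit Morton code.
// Bit i of x lands at bit 3i, of y at 3i + 1 and of z at 3i + 2.
std::uint64_t morton3(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
    assert(x < (1u << 21) && y < (1u << 21) && z < (1u << 21));
    std::uint64_t code = 0;
    for (int bit = 0; bit < 21; ++bit) {
        code |= static_cast<std::uint64_t>((x >> bit) & 1u) << (3 * bit);
        code |= static_cast<std::uint64_t>((y >> bit) & 1u) << (3 * bit + 1);
        code |= static_cast<std::uint64_t>((z >> bit) & 1u) << (3 * bit + 2);
    }
    return code;
}
```

Sorting or partitioning by this code visits the grid one octant at a time, which is what keeps the working set small.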

Shadows

Moderator: Samuli Laine, NVIDIA Research

Screen-Space Far-Field Ambient Obscurance Slides Site including source Paper (Video)
Ville Timonen, Åbo Akademi University

The first thing to note is the difference between occlusion and obscurance; obscurance includes a falloff term such as a distance weight. The aim is to find a technique that can operate over greater distances. The talk highlighted the issues with previous techniques: direct sampling misses important values, while the alternative of mipmapping the average, minimum or maximum depth results in flattening, or over- or under-occlusion. The contribution of this talk was to focus on the details important for AO based on scanning the depth map in multiple directions. This information is then converted into prefix sums to easily get the range of important height samples across a sector. The results of the technique were shown to be closer to ray traces of a depth buffer than the typical mipmap technique. One other thing I noticed was the use of a 10% guard band on each edge, so from 1280×720 (921600 pixels) to 1536×864 (1327104), a 44% increase in pixels! Another useful result was a comment from the presenter that it’s better to treat possibly occluding surfaces as a thin shell rather than a full volume since the eye notices incorrect shadowing before incorrect lighting.
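The guard band arithmetic is easy to check. The sketch below assumes the 10% is added on every edge (which is what turns 1280 into 1536); `Extent`, `withGuardBand` and `pixelCount` are made-up names for illustration.

```cpp
#include <cassert>

struct Extent { int width, height; };

// Pad a render target by `percent` of each dimension on every edge,
// so each dimension grows by twice that amount.
Extent withGuardBand(Extent e, int percent) {
    assert(percent >= 0);
    return { e.width + 2 * (e.width * percent / 100),
             e.height + 2 * (e.height * percent / 100) };
}

// Pixel count as long long to stay clear of overflow on large targets.
long long pixelCount(Extent e) { return 1LL * e.width * e.height; }
```

Each dimension grows by a factor of 1.2, so the pixel count grows by 1.2 × 1.2 = 1.44.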

Imperfect Voxelized Shadow Volumes Paper
Chris Wyman, NVIDIA; Zeng Dai, University of Iowa

The aim of this paper was interactive performance or better when generating a shadow volume per virtual point light (VPL) on an area light. The initial naive method, one voxelized shadow volume per point light, ran at less than 1 FPS. The problem is how to handle many VPLs. The first part of the solution is imperfect shadow maps (ISMs), a technique for calculating and storing lots of small shadow maps generated from point splats within the scene with the gaps filled in (area lights are actually described as another application in the ISM paper). After creating an ISM, each shadow sub-map can be processed in parallel. The results looked good with a lot of maps and there’s the ability to balance the number of maps against their required size in the ISM. For example, a sharper point light could use the entire ISM space for a single map for sharpness, but a more diffuse light with many samples could pack more, smaller maps into the ISM.

Panel: High-Performance Graphics in Film

Moderator: Matt Pharr

Dreamworks, Eric Tabellion; Weta Digital, Luca Fascione; Disney Animation, David Adler / Rasmus Tamstorf; Solid Angle, Thiago Ize / Marcos Fajardo

Introductions:

Disney
Use OpenGL in some preview tools
Major GPU challenges are development and deployment
They are interested in the use of compute and are hiring a research scientist

PDI Dreamworks
OpenGL display pipeline for tools
Useful for early iterations
Also mentioned Amorphous – An OpenGL Sparse Volume Renderer

Weta
Highlighted that the production flow included a kickback loop where everything fed back to an earlier stage
Not seeing GPU as an option

Arnold
Long code life – can’t be updated to each new language/driver
Reuse of hardware too
Highlighted that 6GB GPUs cost $2k (and I was thinking a PS4 was much less than that and had more memory)
Preview lighting must be accurate including errors of final render

Questions: (replies annotated with speaker/company where possible)

How much research is reused in Film?
Tabellion: The relevant research is used.
Disney: Other research used, not just rendering i.e. physics
Thiago: Researchers need access to data
Kayvon: Providing content to researchers has come up before. And the access to the environment too – lots of CPUs.
Tabellion: Feels that focus on research may be more towards games at HPG
Need usable licenses and no patents
Lots of work focused on polys and not on curves
Need to consider performance and memory usage of larger solutions

Convergence between films and games
Tabellion: Content production – game optimize for scene, film is many artists in parallel with no optimisation
Rasmus: Both seeing complexity increase
Weta: More tracking than convergence. Games have to meet hard limit of frame time

Discussion of Virtual Production
Real time preview of mocap in scene
With moveable camera tracked in the stage

Separate preview renderer?
Have to maintain 2 renderers
Using same [huge] assets – sometimes not just slow to render but load too
Difficult to match final in real time now moving to GI and ray tracing

Work to optimise management of data
Lots of render nodes want the same data
Disney: Just brute forces it
Weta: Don’t know of scheduler that knows about the data required. Can solve abstractly but not practically. Saw bittorrent-like example.

What about exploiting coherence?
Some renders could take 6-10 hours, but need the result next day so can’t try putting two back-to-back

Do you need all of the data all of the time? Could you tile the work to be done?
Not in Arnold – need all of the data for possible intersections
Needs pipeline integration, render management

Example of non water tight geometry – solving in Arnold posted to JCGT (Robust BVH Ray Traversal)
Missing ray intersection can add minutes of pre processing and gigs of memory

Double precision?
Due to some hacks when using floats, you could have done it just as fast in double instead
Arnold: Referred to JCGT paper
Disney: Don’t have to think when using doubles
Tabellion: Work in camera space or at focal point
Expand bvh by double precision – fail – look up JCGT paper

Saturday July 20

Keynote 1: Michael Shebanow (Samsung): An Evolution of Mobile Graphics Slides

Not a lot to report here and the slides cover a lot of what was said.

Fast Interactive Systems

Moderator: Timo Aila, NVIDIA Research

Lazy Incremental Computation for Efficient Scene Graph Rendering Slides Paper
Michael Wörister, VRVis Research Center; Harald Steinlechner, VRVis Research Center; Stefan Maierhofer, VRVis Research Center; Robert F. Tobler, VRVis Research Center

The problem with the scenegraph traversal in this case was the cost. The aim was to reduce this cost by maintaining an external optimized structure and propagate changes from the scenegraph to this structure. Most of the content was based on how to minimize the cost of keeping the two structures synchronized and the different techniques. Overall, using the caching did improve performance since it enabled a set of optimizations. Despite the relatively small amount of additional memory required, I did note a 50% increase in startup time was mentioned.

Real-time Local Displacement using Dynamic GPU Memory Management Site
Henry Schäfer, University of Erlangen-Nuremberg; Benjamin Keinert, University of Erlangen-Nuremberg; Marc Stamminger, University of Erlangen-Nuremberg

The examples for this paper were footsteps in terrain, sculpturing and vector displacement. The displacements are stored in a buffer dynamically allocated from a larger memory area and then sampled when rendering. The storage of the displacement is based on an earlier work by the same authors: Multiresolution Attributes for Tessellated Meshes. The memory management part of the work seems quite familiar having seen quite a few presentations on partially resident textures. The major advantage is that the management can take place GPU side, rather than needing a CPU to update memory mapping tables.

Real-Time High-Resolution Sparse Voxelization with Application to Image Based Modeling (similar site)
Charles Loop, Microsoft Research; Cha Zhang, Microsoft Research; Zhengyou Zhang, Microsoft Research

This presentation introduced an MS Research project using multiple cameras to generate a voxel representation of a scene that could be textured. The aim was a possible future use as a visualization of remote scenes for something like teleconferencing. The voxelization is performed on the GPU based on the images from the cameras and the results appear very plausible with only minor issues on common problem areas such as hair. Judging by the videos of the testers using it, it looks like fun.

Building Acceleration Structures for Ray Tracing

Moderator: Warren Hunt, Google

Efficient BVH Construction via Approximate Agglomerative Clustering Slides Paper
Yan Gu, Carnegie Mellon University; Yong He, Carnegie Mellon University; Kayvon Fatahalian, Carnegie Mellon University; Guy Blelloch, Carnegie Mellon University

This work extends the agglomerative clustering described in Bruce Walter et al’s 2008 paper Fast Agglomerative Clustering for Rendering to improve performance by exposing additional parallelism. The parallelism comes from partitioning the primitives to allow multiple instances of the agglomeration to run in their own local partition. This provides a greater win at the lower level where most of the time is typically spent. The sizing of the partitions and the number of clusters in each partition give parameters that can be tweaked to trade speed against quality.

Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Slides Page
Tero Karras, NVIDIA; Timo Aila, NVIDIA

This presentation started with the idea of effective performance, based on the number of rays traced per unit rendering time, but rendering time includes the time to build your bounding volume hierarchy as well as the time to intersect rays with that hierarchy, so you need to balance speed and quality of the BVH. This work takes the idea of building a fast low quality BVH (from the same presenter at last year’s HPG – Maximizing Parallelism in the Construction of BVHs, Octrees, and kd Trees) and then improving the BVH by optimizing treelets, subtrees of internal nodes. Perfect optimization of these treelets is NP-hard based on the size of the treelets so instead they iterate 3 times on treelets with a maximum size of 7 nodes – which actually has 10K possible layouts! This gives a good balance between performance and diminishing returns. The presentation also covers a practical implementation of splitting triangles with bounding boxes that are a poor approximation to the underlying triangle.

On Quality Metrics of Bounding Volume Hierarchies Slides Page
Timo Aila, NVIDIA; Tero Karras, NVIDIA; Samuli Laine, NVIDIA

This presentation started with an overview of the Surface Area Heuristic (SAH), which gives great results despite the questionable assumptions on which it rests. To check how well the SAH actually correlates with performance, they tested multiple top-down BVH builders and calculated how the surface area heuristic predicted the ray intersection performance of the BVH from the builder for multiple scenes. A lot of the results correlated well, but the San Miguel and Hairball scenes typically showed a loss of correlation which indicated that maybe SAH doesn’t give a complete picture of performance. Reconsidering the work done in ray tracing, an additional End Point Overlap metric was introduced for handling the points at each end of the ray which appears to improve the correlation. This was then further supplemented with another possible contribution to the cost, leaf variability, which was introduced to account for how the resulting BVH affects SIMD traversal. This paper reminded me of the Power Efficiency for Software Algorithms running on Graphics Processors paper from the previous year, leading us to question the basis for how we evaluate our work.
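As a reminder of what is being measured, the usual SAH estimate for one split weights each child's primitive count by the chance of a ray hitting that child, taken as the ratio of child to parent surface area. The sketch below shows only that textbook form; it does not include the End Point Overlap or leaf variability terms from the talk, and the names and unit cost constants are assumptions.

```cpp
#include <cassert>

struct Aabb { float min[3], max[3]; };

// Surface area of an axis-aligned box.
float surfaceArea(const Aabb& b) {
    const float dx = b.max[0] - b.min[0];
    const float dy = b.max[1] - b.min[1];
    const float dz = b.max[2] - b.min[2];
    return 2.0f * (dx * dy + dy * dz + dz * dx);
}

// Textbook SAH cost of splitting `parent` into two children:
// one traversal step plus the expected number of primitive tests.
float sahSplitCost(const Aabb& parent,
                   const Aabb& left, int leftCount,
                   const Aabb& right, int rightCount,
                   float traverseCost = 1.0f, float intersectCost = 1.0f) {
    const float parentArea = surfaceArea(parent);
    assert(parentArea > 0.0f);
    const float pLeft = surfaceArea(left) / parentArea;
    const float pRight = surfaceArea(right) / parentArea;
    return traverseCost + intersectCost * (pLeft * leftCount + pRight * rightCount);
}
```

The area ratio is only a hit probability under the SAH's assumptions about ray distribution, which are the assumptions the talk calls questionable.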

Hot3D

Michael Mantor, Senior Fellow Architect (AMD): The Kabini/Temash APU: bridging the gap between tablets, hybrids and notebooks
Marco Salvi (Intel): Haswell Processor Graphics
John Tynefield & Xun Wang (NVIDIA): GPU Hardware and Remote Interaction in the Cloud

Hot3D is a session that typically gives a lot of low level details about the latest hardware or tech. AMD started by introducing the Kabini/Temash APU. This was the most technical of the talks, discussing the HD 8000 GPU which features their Graphics Core Next (GCN) architecture and asynchronous compute engines – all seems quite familiar really. Intel were next, discussing Haswell and covering some of the mechanisms used for lowering power usage and allowing better power control, such as moving the voltage regulator off the motherboard. Marco also mentioned the new Pixel Sync features of Haswell, which were covered many times during HPG and SIGGRAPH. NVIDIA were last in this section and they presented some of their cloud computing work.

Sunday July 21st

Keynote 2: Steve Seitz (U. Washington (and Google)): A Trillion Photos (Slides very similar to EPFL 2011)

Very similar to Alexei’s presentation from EGSR last year (Big Data and the Pursuit of Visual Realism), Steve wowed the audience with the possibilities available when you have the entirety of the images from Flickr available and know the techniques you need to match them up. Scale-invariant feature transform (SIFT) was introduced first. This (apparently patented) technique detects local features in images then uses this description to identify similar features in other images. The description of the features was described as a histogram of edges. This was shown applied to images from the NASA Mars Rover to match locations across images. Next Steve introduced Structure from Motion which allows the reconstruction of an approximate 3D environment based on multiple 2D images. This allowed the Building Rome in a day project which reconstructed the landmarks of Rome based on the million photos of Rome in Flickr in 24 hours! This was later followed by a Rome on a Cloudless day project that produced much denser geometry and appearance information. Steve also referenced other work by Yasutaka Furukawa on denser geometry generation such as Towards Internet-scale Multi-view Stereo which later led to the tech for GL maps in Google Maps. One of the last examples was a 3D Wikipedia that could cross reference text with a 3D reconstruction of a scene from photos where auto-detected keywords could be linked to locations in the scene.

Ray Tracing Hardware and Techniques

Moderator: Philipp Slusallek, Saarland University

SGRT: A Mobile GPU Architecture for Real-Time Ray Tracing Slides Page
Won-Jong Lee, SAMSUNG Advanced Institute of Technology; Youngsam Shin, SAMSUNG Advanced Institute of Technology; Jaedon Lee, SAMSUNG Advanced Institute of Technology; Jin-Woo Kim, Yonsei University; Jae-Ho Nah, University of North Carolina at Chapel Hill; Seokyoon Jung, SAMSUNG Advanced Institute of Technology; Shihwa Lee, SAMSUNG Advanced Institute of Technology; Hyun-Sang Park, National Kongju University; Tack-Don Han, Yonsei University

Similar to last year’s talk, the reasoning behind aiming for mobile realtime ray tracing was better quality for augmented reality, which also reminds me of Jon Olick’s keynote from last year and his AR results. The solution presented was the same hybrid CPU/GPU solution with updates from SIGGRAPH Asia from the Parallel-pipeline-based Traversal Unit for Hardware-accelerated Ray Tracing presentation, which showed performance improvements with coherent rays by splitting the pipeline into separate parts, such as AABB or leaf tests, to allow rays to be iteratively processed in one part without needing to occupy the entire pipeline.

An Energy and Bandwidth Efficient Ray Tracing Architecture Slides Page
Daniel Kopta, University of Utah; Konstantin Shkurko, University of Utah; Josef Spjut, University of Utah; Erik Brunvand, University of Utah; Al Davis, University of Utah

This presentation was based on TRaX (TRaX: A Multi-Threaded Architecture for Real-Time Ray Tracing from 2009) and investigating how to reduce energy usage without reducing performance. Most of the energy usage is in data movement so the main aim is to change the pipeline to use macro instructions which will perform multiple operations without needing to write intermediate operands back to the register file. Also, the new system is treelet based since they can be streamed in and remain in L1 cache. The result was a 38% reduction in power with no major loss of performance.

Efficient Divide-And-Conquer Ray Tracing using Ray Sampling Slides Page
Kosuke Nabata, Wakayama University; Kei Iwasaki, Wakayama University/UEI Research; Yoshinori Dobashi, Hokkaido University/JST CREST; Tomoyuki Nishita, UEI Research/Hiroshima Shudo University

Following last year’s SIGGRAPH Naive Ray Tracing: A Divide-And-Conquer Approach presentation by Benjamin Mora, this research focuses on problems discovered with the initial implementation. These problems stem from inefficiencies when splitting geometry without considering the coherence in the rays and low quality filtering during ray division which can result in only a few rays being filtered against geometry. The fix is to select some sample rays, generate partitioning candidates to create bins for the triangles, then use the selected samples to calculate inputs for a cost function to minimize. While discussing this cost metric, they mentioned the poor estimates of the SAH metric with non-uniform ray distributions, seeming timely with Timo’s earlier presentations. The samples can also indicate which child bounding box to traverse first. The results look good although it appears to work best with incoherent rays which have a lot of applications in ray tracing after dealing with primary paths.

Megakernels Considered Harmful: Wavefront Path Tracing on GPUs Slides Page
Samuli Laine, NVIDIA; Tero Karras, NVIDIA; Timo Aila, NVIDIA

A megakernel is a ray tracer with all of the code in a single kernel, which is bad for several reasons: instruction cache thrashing, low occupancy due to register consumption, and divergence. In the case of this paper, one of the materials shown is a beautiful 4 layer car paint whose shader was white and green specks of code on a powerpoint slide. A pooling mechanism (maintaining something like a million paths) is used to allow the raytracing to queue similar work to be batch processed by smaller kernels performing path generation or material intersection, reducing the amount of code and registers required and minimizing divergence. The whole thing sounds very similar to the work queuing performed in hardware by GPUs until there is sufficient work to kick off a wavefront, nicely described by Fabian Giesen in his Graphics Pipeline posts. It would be good to know what the hardware ray tracing guys think of these results since the separation of the pipeline appears similar to Won-Jong Lee’s parallel pipeline traversal unit.

Panel: Hardware/API Co-evolution

Moderator: Peter Glaskowsky (replies annotated with speaker/company where possible)

ARM: Tom Olson, Intel: Michael Apodaca, Microsoft: Chas Boyd, NVIDIA: Neil Trevett, Qualcomm: Vineet Goel, Samsung: Michael Shebanow

Introduction – Thoughts on API HW Evolution
AMD: deprecate cost of API features
Tom Olson: Is TBDR dead with tessellation? Is tessellation dead?
Intel: Memory is just memory. Bindless and precompiled states.
Microsoft: API as convergence.
NVIDIA: Power and more feedback to devs
Qualcomm: Showed GPU use cases
Samsung: Reiterated that APIs are power inefficient as mentioned in keynote

Power usage?
AMD: Good practice. Examples of power use.
ARM: We need better IHV tools
Intel, Microsoft, NVIDIA: Agree
NVIDIA: OpenGL 4 efficient hinting is difficult
Qualcomm: Offers tile based hints
Samsung: Need to stop wasting work

Charles Loop: Tessellation not dead. Offers advantages, geometry internal to GPU, don’t worry about small tris and rasterise differently – derivatives
? Possibly poor tooling
? Opensubdiv positive example of work being done
Tom: Not broken but needs tweaking

Expose query objects as first class?
Chas: typically left to 3rd parties
Not really hints but required features

When will we see tessellation in mobile? Eg on 2W rather than 200W
Qualcomm: Mobile content different
Neil: Tessellation can save power
Chas: quality will grow
Tom: Mobile evolving differently due to ratios

Able to get info on what happens below driver?
? Very complex scheduling

What about devs that don’t want to save power?
Tom: It doesn’t matter to $2 devs, but AAA
Chas: Devs will become more sensitive

Ray tracing in hardware? Current API
Chas: Don’t know but could add minor details to gpus
Samsung: RT needs all the geometry

SOC features affect usage?
Qualcomm: Heterogenous cores to be exposed to developers

Shared/unified memory?
AMD: Easy to use power
Neil: Yes we need more tools

What about lower level access?

Best Paper Award

All 3 places went to the NVIDIA raytracing team:

1st: On Quality Metrics of Bounding Volume Hierarchies Timo Aila, Tero Karras, Samuli Laine
2nd: Megakernels Considered Harmful: Wavefront Path Tracing on GPUs Samuli Laine, Tero Karras, Timo Aila
3rd: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras, Timo Aila

Next year

HPG 2014 is currently expected to be in Lyon during the week of 23-27 June. Hope to see you there!

Spherical Harmonics for Beginners

Spherical Harmonics seem really hard. Most articles are equation heavy, and if you’ve not understood the equations before, seeing them again doesn’t help. Despite reading a lot about them, the first time things fell into place was when I finally found some example code I could throw some numbers at and then visualize the results. In this post I aim to cover the fundamentals of using Spherical Harmonics without the use of equations and maybe just a little code.

What are they really?

The simplest way to think of Spherical Harmonics (SH from here on in) is in terms of what you would use them for. If you have some value that varies based on direction, say for example, the effect of a light at a specific position, then you can sample it in every possible direction and store it using SH. The values are stored as an approximation so they’re quite diffuse aka blurry; you won’t be using them as ray-traced reflections.

You have a choice of the level of detail at which you store the values. SH is an infinite series, so you cut it off after a number of bands. Bands are zero indexed, and each band B adds 2B + 1 values to the series. Bands are gathered by order, where order O means all of the bands from 0 up to O-1*, so order 1 requires 1 value, order 2 needs 4 and order 3 needs 9 – which is typically where most implementations stop. This is because the coefficient used when applying band 3 (the first band beyond order 3) is zero, so the data is somewhat redundant in this case. Then you can consider what the values actually mean at each band; the single value for band 0 could be used as an ambient occlusion term and the three values for band 1 could be considered something like a bent normal. Each subsequent band adds detail.
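The counting rule is simple enough to write down directly. This is just the arithmetic from the paragraph above, with hypothetical helper names `shBandSize` and `shCoefficientCount`.

```cpp
#include <cassert>

// Band B (zero indexed) contributes 2B + 1 coefficients.
int shBandSize(int band) {
    assert(band >= 0);
    return 2 * band + 1;
}

// Order O covers bands 0 .. O-1, which adds up to O * O coefficients.
int shCoefficientCount(int order) {
    assert(order >= 0);
    int total = 0;
    for (int band = 0; band < order; ++band) total += shBandSize(band);
    return total;
}
```

So an order 3 representation of an RGB light is three arrays of 9 floats, one per colour channel.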

And then, once you have your SH coefficients, you can add, scale and rotate them. Adding means that you can accumulate the effects of multiple lights, scaling means that you can lerp between different values, for example at different points, and rotation means that you can easily move your SH into the space of your model rather than transforming per vertex or per pixel normals to the same space as the SH.
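Adding and scaling really are just per-coefficient operations on the arrays. Here is a minimal sketch for a single channel of order 3 SH; the type alias `Sh9` and the function names are assumptions, and rotation is left out because it needs per-band rotation matrices rather than a one-liner.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Sh9 = std::array<float, 9>;  // one colour channel of order 3 SH

// Accumulate two lights: the SH of the sum is the sum of the SH.
Sh9 shAdd(const Sh9& a, const Sh9& b) {
    Sh9 r{};
    for (std::size_t i = 0; i < r.size(); ++i) r[i] = a[i] + b[i];
    return r;
}

// Scale every coefficient, e.g. to dim a light.
Sh9 shScale(const Sh9& a, float s) {
    Sh9 r{};
    for (std::size_t i = 0; i < r.size(); ++i) r[i] = a[i] * s;
    return r;
}

// Blend between two SH samples, e.g. two probe positions, with t in [0, 1].
Sh9 shLerp(const Sh9& a, const Sh9& b, float t) {
    assert(t >= 0.0f && t <= 1.0f);
    return shAdd(shScale(a, 1.0f - t), shScale(b, t));
}
```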

For an example of what you can do with SH, I’ve created a ShaderToy example which demonstrates some of the results you can get with SH. Here’s an image:

SH

In this image you can see the following applications of SH:

  • Top left : Order 2 Directional light SH. Note the diffuse appearance. If you follow the ShaderToy link, the view alternates between this and its error compared with a standard dot().
  • Top right : Order 3 Directional light SH. Note that this is less diffuse than the order 2. If you follow the ShaderToy link, the view alternates between this and its error compared with a standard dot().
  • Low left : Order 2 Spherical light SH.
  • Low right : Order 3 Spherical light SH.

*My understanding here is based on Peter-Pike Sloan’s SH Tricks.

What to read first

The canonical and most quoted reference I’ve seen is the Robin Green Spherical Harmonic Lighting: The Gritty Details paper. It takes a couple of reads to gain a full understanding but makes a good basis for most of the content that you read afterwards. I started here and read it 3 times.

Next I read Tom Forsyth’s presentation from GDCE 2003. It’s easy to understand (along with the followup notes) and shows some practical examples of real world use. There’s some important ideas in the slides that have been taken and advanced upon over the last decade:

  1. You can bake the distant lights that your lighting model can’t handle into SH and add them on.
  2. Convert High Dynamic Range skyboxes to SH to provide diffuse environmental lighting.
  3. Calculate the SH at points in the environment and use them to provide local detail to the lighting.

Show me the Code!

For me, everything started to fall into place when I saw some code because I find code easier to understand and experiment with. About a year ago, Chuck Walbourn posted the parts of the D3DXMath library lost when moving to the DirectXMath library, including the source to D3DXSH-like functions. That page is worth keeping open thanks to the links to the MSDN documentation for the D3DXSH versions of the functions.

Starting with XMSHEvalDirectionalLight(), I evaluated my first 3rd order SH representation of a directional light pointing up the Y axis, then I used XMSHEvalDirection() to convert my test Y axis vector to a 3rd order SH direction and then dotted the two values together XMSHDot(). Outside of SH, I’d expect this dot to return 1.0f, but with my SH code, I got 2.1176472 and that’s not some special SH thing turned up to 11, I was just doing it wrong. Here’s the code:

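	// Assumes the DirectXMath headers and the SH header from Chuck's post are included,
	// along with using namespace DirectX.
	const unsigned int c_shOrder = 3;
	XMVECTORF32 lightDir = {0.0f, 1.0f, 0.0f};
	XMVECTORF32 lightColor = {1.0f, 1.0f, 1.0f};
	// The light is white, so only the red channel is requested here.
	float evalledLight[c_shOrder * c_shOrder];
	XMSHEvalDirectionalLight(c_shOrder, lightDir, lightColor, evalledLight, NULL, NULL);

	// Convert the test normal to SH and take a plain SH dot against the light.
	XMVECTORF32 normal = {0.0f, 1.0f, 0.0f};
	float dir[c_shOrder * c_shOrder];
	XMSHEvalDirection(dir, c_shOrder, normal);
	float result = XMSHDot(c_shOrder, dir, evalledLight); // 2.1176472 rather than 1.0f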

It took a while to find Stephen Hill’s (@self_shadow) code in his comment on Seb Lagarde’s blog post about the use of pi in game lighting which applies the exact same functions to generate the SH representation of the light and normal but uses a custom dot with coefficients per band {1.0f, 2.0f/3.0f, 1.0f/4.0f} (it’s the 4th value in that array that would be zero). Updating the code to use that custom dot gives the expected 1.0f – win! Looking at the code, the per-band coefficients could even be baked into the SH representation of the light, but I’ve only seen it done once, earlier in the Seb Lagarde blog post – look for ConvolveCosineLobeBandFactor.
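To see the difference without DirectXMath, here is a self-contained sketch of both dots. It is not Stephen Hill's code or the library source: the function names are stand-ins, the basis constants are the usual real SH values for bands 0 to 2, and the light normalisation factor of pi / 1.0625 is an assumption, chosen because it reproduces the 2.1176472 above.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Sh9 = std::array<double, 9>;

// Real SH basis for bands 0..2 at a unit direction. Ordering and sign
// conventions vary between libraries but do not change the dots below.
Sh9 evalDirection(double x, double y, double z) {
    assert(std::fabs(x * x + y * y + z * z - 1.0) < 1e-6);
    return {0.282095,
            0.488603 * y, 0.488603 * z, 0.488603 * x,
            1.092548 * x * y, 1.092548 * y * z,
            0.315392 * (3.0 * z * z - 1.0),
            1.092548 * x * z, 0.546274 * (x * x - y * y)};
}

// Directional light: the direction's SH scaled by an ASSUMED normalisation.
Sh9 evalDirectionalLight(double x, double y, double z) {
    const double kPi = 3.14159265358979323846;
    const double norm = kPi / (0.25 + 0.5 + 5.0 / 16.0);
    Sh9 sh = evalDirection(x, y, z);
    for (double& c : sh) c *= norm;
    return sh;
}

// Plain SH dot: every coefficient has the same weight.
double plainDot(const Sh9& a, const Sh9& b) {
    double sum = 0.0;
    for (int i = 0; i < 9; ++i) sum += a[i] * b[i];
    return sum;
}

// Dot with the per-band coefficients {1, 2/3, 1/4} quoted above.
double bandWeightedDot(const Sh9& a, const Sh9& b) {
    const double bandWeight[3] = {1.0, 2.0 / 3.0, 1.0 / 4.0};
    double sum = 0.0;
    int i = 0;
    for (int band = 0; band < 3; ++band)
        for (int m = 0; m < 2 * band + 1; ++m, ++i)
            sum += bandWeight[band] * a[i] * b[i];
    return sum;
}
```

With the light and the normal both pointing up the Y axis, `plainDot` comes out at roughly 2.1176 and `bandWeightedDot` at roughly 1.0, up to the precision of the basis constants.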

Digging further into the code from Chuck you can also find analytical lights such as Spherical lights (good for faking volumes), Conical lights and Hemispherical lights (good for blue up, green down) as well as support for projecting a D3D11 cubemap into SH – SHProjectCubeMap() – which was the beast I was after.

Cubemaps eh?

With a function like SHProjectCubeMap() you can convert a cubemap into spherical harmonics, a topic covered by a paper called An Efficient Representation for Irradiance Environment Maps by Ravi Ramamoorthi and Pat Hanrahan. This paper is the foundation of techniques regarding converting environment maps such as cubemaps to SH and it highlights the low error rate when using 3rd order SH.
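The idea behind the projection can be sketched as a weighted sum over the cubemap texels: each texel's value is multiplied by the SH basis in its direction and by the solid angle it covers. This is not the SHProjectCubeMap() implementation; the texel fetch is replaced by a callable taking a direction, the face orientations are arbitrary rather than D3D's, and all of the names are assumptions.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <functional>

struct Vec3 { double x, y, z; };
using Sh9 = std::array<double, 9>;

// Real SH basis for bands 0..2 at unit direction d.
Sh9 shBasis(const Vec3& d) {
    return {0.282095,
            0.488603 * d.y, 0.488603 * d.z, 0.488603 * d.x,
            1.092548 * d.x * d.y, 1.092548 * d.y * d.z,
            0.315392 * (3.0 * d.z * d.z - 1.0),
            1.092548 * d.x * d.z, 0.546274 * (d.x * d.x - d.y * d.y)};
}

// Project one channel of an n x n per-face cubemap into order 3 SH.
Sh9 projectCubeToSh(int n, const std::function<double(const Vec3&)>& radiance) {
    assert(n > 0);
    const double kPi = 3.14159265358979323846;
    Sh9 sh{};
    double weightSum = 0.0;
    for (int face = 0; face < 6; ++face) {
        const double s = (face & 1) ? -1.0 : 1.0;  // negative or positive face
        for (int j = 0; j < n; ++j) {
            for (int i = 0; i < n; ++i) {
                // Texel centre on the face, in [-1, 1].
                const double u = (i + 0.5) * 2.0 / n - 1.0;
                const double v = (j + 0.5) * 2.0 / n - 1.0;
                const double len2 = u * u + v * v + 1.0;
                const double len = std::sqrt(len2);
                // Relative solid angle of the texel; the constant texel area
                // is dropped because the final normalisation cancels it.
                const double w = 1.0 / (len2 * len);
                Vec3 d;
                switch (face / 2) {
                    case 0:  d = {s, v, u}; break;
                    case 1:  d = {u, s, v}; break;
                    default: d = {u, v, s}; break;
                }
                d = {d.x / len, d.y / len, d.z / len};
                const double value = radiance(d);
                const Sh9 basis = shBasis(d);
                for (int k = 0; k < 9; ++k) sh[k] += value * basis[k] * w;
                weightSum += w;
            }
        }
    }
    // Normalise so that the weights add up to the full sphere, 4 * pi.
    const double scale = 4.0 * kPi / weightSum;
    for (double& c : sh) c *= scale;
    return sh;
}
```

A constant white cubemap ends up entirely in the band 0 coefficient, which matches the idea of band 0 as the ambient term.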

Using a technique like this gives you a diffuse representation of that cubemap that you can use for global or local lighting. In the global case, you’d take your skybox texture, convert to SH and use it to add a little extra to your lighting. In the local case, you can calculate a local cubemap or cubemaps at runtime, convert to SH and use that for more local diffuse lighting – if you have enough local samples you can consider that an irradiance volume, first discussed in this paper in 1998.

If you want to look further at irradiance volumes, it’s worth having a look at Natalya Tatarchuk’s GDCE 05 Irradiance Volumes for Games presentation which gives a high level overview of the techniques, covers material from the aforementioned irradiance volume paper and also discusses irradiance gradients to improve the results of calculating the irradiance in between samples.

Even more practical information can be found in a post about production use of irradiance volumes from Steve Anichini (@solid_angle). Reading this after Natalya’s presentation, I could see the reasoning behind the decisions made. I especially liked the idea of calculating a local irradiance gradient for each dynamic object.

Further Reading

There’s a lot of detail on Spherical Harmonics all over the internet. As Tom Forsyth’s presentation mentioned, always search for “irradiance” along with “spherical harmonics” because of the wide range of applications for spherical harmonics. I’d also recommend searching for “games” at the same time since that’s where a lot of the realtime ideas are covered.

Peter-Pike Sloan’s publication on Stupid Spherical Harmonics (SH) Tricks is a useful reference for a lot of the additional things you can do with SH. It’s very commonly referenced when discussing practical use of SH.

SIGGRAPH 2005 had a course on Precomputed Radiance Transfer: Theory and Practice.

The presentation Adding Spherical Harmonic Lighting to the Sushi Engine by Chris Oat mostly covers Precomputed Radiance Transfer, from when it was very popular in the mid-2000s, with an SH chaser at the end.

At GDC 2008, Manny Ko from Naughty Dog and Jerome Ko from UCSD / Bunkspeed presented Practical Spherical Harmonics based PRT Methods. There’s some covering the same old ground to start with, but the meat of the presentation is Manny Ko’s description of the compression of SH data. With the increasing number of ops/byte available on modern GPUs and access to real integer instructions, considering compression like this is a great idea.

If you want to go above order 3, i.e. straight to 5 skipping that zeroed out 4th order, then obtaining the coefficients can be difficult. Spherical harmonics, WTF? on the I’m doing it wrong blog has the required numbers multiplied by pi. The origin of the coefficients is another Ravi Ramamoorthi and Pat Hanrahan paper – Equation 19 in On the relationship between radiance and irradiance: determining the illumination from images of a convex Lambertian object – referenced from their environment map paper. Those equations are also included on Simon Brown’s Spherical Harmonics Basis Functions post.
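If you would rather compute the per-band cosine lobe scale than copy it from a table, the closed form is short. This is a small Python sketch of my reading of that equation (the function name is made up): band 0 gives pi, band 1 gives 2pi/3, odd bands above 1 are zero, and even bands follow the factorial expression.

```python
import math

def cosine_lobe_coefficient(band):
    """Scale applied to SH band `band` when convolving with the clamped cosine."""
    if band == 0:
        return math.pi
    if band == 1:
        return 2.0 * math.pi / 3.0
    if band % 2 == 1:
        return 0.0  # odd bands above 1 vanish, e.g. the zeroed out 4th order
    half = band // 2
    sign = 1.0 if half % 2 == 1 else -1.0  # (-1)^(band/2 - 1)
    ratio = math.factorial(band) / (2 ** band * math.factorial(half) ** 2)
    return 2.0 * math.pi * sign / ((band + 2) * (band - 1)) * ratio

# Bands 0..4 correspond to going up to 5th order.
scales = [cosine_lobe_coefficient(band) for band in range(5)]
```

Band 2 comes out as pi/4 and band 4 as -pi/24, so the magnitudes fall off quickly, which is why 3rd order is usually considered enough for diffuse lighting.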

For an example of what you can do with the accumulation of SH, take a look at another post from Steve Anichini – Screen Space Spherical Harmonic Lighting. In this post, he uses SH to accumulate light influences per pixel at quarter res and then extracts a dominant light (covered in Peter-Pike Sloan’s Stupid Spherical Harmonics (SH) Tricks) to perform the lighting. The results look good if a little diffuse. I’d be interested to know what the results would be with higher order SH.

At SIGGRAPH 2008, Hao Chen from Bungie and Xinguo Liu from Zhejiang University presented the Lighting and Material of Halo3 (I remember attending this too). The first half of the talk covers their use of SH lightmaps and gives a set of practical ideas about how to pack, compress and optimize the lightmaps. The second half is less SH and more material focused.

Guerrilla’s Develop 2007 presentation on Deferred Rendering in Killzone 2 includes a few slides (24/25) on image based lighting where each object receives SH lighting from artist placed probes. The lighting is represented by an 8×8 environment map calculated on the SPUs.

For really in-depth details about more real world use in game engines, take a look at:

  1. Shading in Valve’s Source Engine – using their own basis which is an even more diffuse approximation.
  2. Light Propagation Volumes in CryEngine 3 – using SH as part of their GI approximation.
  3. Deferred Radiance Transfer Volumes – the GI solution for Far Cry 3.

Call to Arms

Now that there’s code more easily available, I think that Spherical Harmonics are much more accessible to everyone without needing a library bound to a specific rendering API.

Blog Hiatus 1

So it’s been a while since I last published a post – actually, it’s four and a half months since CCC and DefCon Videos – Part 1 in March. I knew it was going to happen, for several major reasons:

  1. First came GDC, an incredibly busy time of preparation, then meetings and then following up those meetings. And this was a great GDC for Phyre, with the 3.5 announcement and more positive quotes from our users than we could include in that blog post.
  2. An amazing Vita line up, especially in May and July. Thomas was Alone, Velocity Ultra, Rymdkapsel, and Soul Sacrifice in May and then Hotline Miami and Stealth Inc in July. Every one of these I’d recommend to anyone with a Vita.
  3. HPG 2013 and SIGGRAPH 2013 in Anaheim. More planning and preparation and a whole lot of busy. And now, pages and pages of notes to transcribe.

Other than these distractions, there have been some things that I’ve been looking at for new posts:

  1. More DefCon videos. I’ve already got a list of great videos which is longer than the first post.
  2. Dabbling in Android. Thanks to vs-android, I’ve been able to stay in my happy place with Visual Studio while developing for Android. I wanted to see how hard it would be to a) run something on my new Nexus 10 b) then port and run something with several hundred thousand lines of code.
  3. Spherical Harmonics. Having read the Robin Green paper several times until I understood it and then the Peter-Pike Sloan doc, then anything else I could find, I’ve come to the conclusion that there’s quite a gulf between the simplicity of the underlying maths as exposed by a library like the DirectXSH library from Chuck Walbourn or the D3DXSH maths functions, and the literature that describes how to apply the maths. This makes me want to write an “SH for dummies” style post that would let me verify that I understand what is needed to integrate basic SH lighting and then highlight what I’ve missed.
  4. Actually writing up my HPG 2013 and SIGGRAPH 2013 experiences – that normally takes a few weeks!
  5. Updates to my HTML5/JS experiments. I still pick them up from time to time. I just love the simplicity of JS and HTML for prototyping, but I miss the exclusive control over the device that you get from console development.

Give me time, the next post isn’t too far away. (And the 1 in the title is for the fact that this isn’t the last time this is going to happen!)

CCC and DefCon Videos – Part 1

Recently, @rygorous tweeted a link to a video called Writing a Thumbdrive from Scratch presented at 29c3. YouTube’s recommended other videos list led me through some other C3 talks and then on to Defcon videos. Defcon is described as “the world’s longest running and largest underground hacking conference.” C3 in this case is the Chaos Communication Congress and the about page says that the events blog is maintained by members of the Chaos Computer Club.

For me, a lot of the enjoyment of watching these videos comes from learning something from an area completely outside of my comfort zone. I’ve only been to Games and Graphics conferences and even at those, it’s always the talks about subjects wildly different to what I know well that I learn the most from. Hacking and security topics are something I find interesting despite a lack of exposure to that kind of content. There’s also the fact that some of the presenters are entertaining and the material can be quite funny.

The Videos

Writing a Thumbdrive from Scratch – Travis Goodspeed (29c3)

This talk was about the possibility of designing your own USB drive from scratch. Using something like a Facedancer (prototyped by the presenter) you can prototype your own USB mass storage device. However since you’re able to program the behaviour of the device, beyond handling standard storage requests, you can also add your own behaviour based on how the device is used. For example, based on the different ways in which operating systems access the drive, you can decide whether to allow access to the drive, expose different data or just not work at all. Similarly, by detecting block copy operations, you can tell when the drive is being copied, possibly for later forensic examination, and this means you could respond by returning something else or just destroying the drive.

There’s more information about Travis’s work on his blog.

Trolling reverse engineers with math – frank^2 (DefCon 18)

Initially the title confused me but the underlying principle is about thinking of a different way to obfuscate your code to obstruct any reverse engineering efforts. This is based on remapping the code into memory based on a lookup function. The example uses a sine wave to distribute chunks of code so that you have multiple ops or basic blocks evenly distributed over memory and someone using a disassembler (or if they’re unlucky, a debugger) will have to track all of the jumping back and forth. Further obfuscation involved adding prologues and epilogues to each op or basic block with extra branches not taken and setting states to be picked up after the jump.

Looking from the obfuscator’s point of view at complicating the lives of those disassembling their work makes this quite a funny presentation.

And that’s how I lost my eye: Exploring Emergency Data Destruction Shane Lawson Senior Security Engineer (DefCon 19)

A simple premise: how do you destroy a hard disk in 60 seconds? There’s a couple of gotchas like limited physical space, not setting off alarms (smoke, seismic etc) and no killing any sysadmins or other humans nearby. The range of explored options is entertaining and the final solution is surprising. Any discussion of the talk will give away some of the highlights, so just go watch it instead.

My life as a spyware developer Garry Pejski (Defcon 18)

The story of a guy who picked up a shady job from Craigslist and ended up writing spyware. Since this happened several years ago, the application was designed to work as an IE plugin and was delivered by a custom installer using an exploit. The installer meant that there was an element of legitimacy since the user was partially involved with the install, despite it being almost impossible to exit – there was even a set of terms and conditions! The application itself performed affiliate link redirection (changing affiliate IDs to its own) and displayed popups based on intercepted searches. However, it sounds like the affiliate/popup monetization strategy didn’t make as much as using the application for installing other people’s spyware, leading to more hilarious stories.

At the time, the state of the art malware removal was quite basic and sounds easy to defeat. I noticed that the presenter raised the idea of needing to whitelist applications in the future which made me think of the discussion of whitelisting in the Windows Store.

Worth a watch, especially for the view from the Dark Side from one of the guys building the Death Star.

The Art of Trolling (Slides) – Matt ‘openfly’ Joyce (DefCon 19)

A comedy piece covering examples of trolls through history and examples of the different types of trolling. With all of the anecdotes in this talk and the underlying material, it’s difficult to know what’s true. You won’t learn much, but you might have a laugh.

How I met your girlfriend – Samy Kamkar (DefCon 18)

The presenter starts by introducing himself as someone previously banned from using computers for a little misbehaviour. The talk is based on penetrating someone’s Facebook account and starts by focusing heavily on the quality of the elements that form the basis for hashed passwords in PHP. Reducing the entropy of each of these elements results in something that limits the possible range of values to brute force. Ironically this doesn’t affect Facebook because they use their own HipHop system and since this vulnerability was found, PHP has been patched too, as well as the obvious recommendation that you use your own mechanism for seeding the random numbers.

This talk also introduced me to NAT pinning, enabling forwarding of a port from a user’s computer through their router. This is based on submitting invisible HTML forms which I think scared me even more. Getting these invisible forms talking to an IRC server behind the user’s back further escalates the fear. The final link in the chain was establishing location which is easy thanks to Google’s roving mappers having grabbed the location of a lot of routers based on their MAC address.

Definitely worth watching to learn a few new things and for the entertainment value too.

The Dark Side of Crime-fighting, Security, and Professional Intelligence – Richard Thieme ThiemeWorks (DefCon 19)

I assumed this would be funny anecdotes, but it’s something darker. The presenter started by highlighting his history with the conference, as a father figure it sounds like he’s seen it all, and as the talk goes on you realize that his experience means that he knows a lot of people. However his stories from the dark side are less comedic and more about reiterating the scary state of affairs in the professional intelligence community.

Overall, a sober and honest look at where we were in 2011.

Practical Cellphone Spying – Chris Paget (DefCon 18)

This is one I watched wanting to find out how simple it actually was. Apparently very simple. Spend a couple of thousand dollars on a laptop and USRP (Universal Software Radio Peripheral). You start by spoofing a network for phones to connect to – easy enough with well known IDs for the major networks.

Of course you’ll be thinking it’s all encrypted, and it might be, but you can ask the phone to turn off encryption. Nevermind, you’d say, no-one can turn off my encryption without asking, but in fact they can and a lot of phones ship with the disable-encryption warning turned off. This is thanks to countries that need it off by default (for example, India) and not wanting to confuse consumers when they get told that it’s being disabled. Even more worrying is the comparison of 2G’s security to HTTP against 3G’s HTTPS: when you’re so grateful for any kind of connection, you don’t typically think of the security implications.

The whole presentation makes the whole thing seem incredibly easy. Slightly scary, but interesting.

Physical Security You’re Doing It Wrong A.P. Delchi (DefCon 18)

This presentation covers considerations for physical security, and the fact that physical security implemented poorly is useless, or more realistically, funny presentation content. The presentation starts with the 5 As:

  1. Assessment – where and what to protect.
  2. Assignment – prioritize what to protect.
  3. Arrangement – how to protect.
  4. Approval – get it signed off.
  5. Action – install it.

Starting at 21 minutes is the what could possibly go wrong section: a discussion of the management and vendor level problems and how to handle them. I do like the example of talking to the construction workers as they know what’s actually going on. The last few minutes cover users’ and HR’s greatest hits. I also learnt Spafford’s Law of Security: “If you have responsibility for security, but no authority to make changes, then you’re just there to take the blame when something goes wrong.”

A good balance of very practical advice and comedy things to be aware of.

You spent all that money and you still got owned – Joseph McCray (DefCon 18)

In this, the presenter tells a good story all about his experiences penetration testing and mechanisms for actually performing the testing. There’s discussion of the different tools and scripts to help with things like load balancer detection, handling intrusion prevention and detection systems, and discovering web application firewalls – all things that cost money and in the examples, are all providing only the illusion of security. The talk then goes on to what you can do when you’re in.

This was probably the first of the talks I saw that tells you what’s available for this kind of work and how easy the tools are to use. And be warned, the presenter uses NSFW language, and you may be offended if you like Ruby.

Steal Everything, Kill Everyone, Cause Total Financial Ruin! Jayson E. Street CIO of Stratagem 1 Solutions (DefCon 19)

A good follow-on from Physical Security You’re Doing It Wrong, this starts with a history of the presenter’s work, with entertaining stories from actually penetration testing offices by entering them. The presentation is split into 3 parts based on the title:

  1. Examples of stealing – what you can find lying about in an office.
  2. How to kill everyone – due to a lack of security in a hotel allowing access to the kitchen or plant room.
  3. Financial thievery – such as grabbing the paper in the shred bin.

Start to end, this is an entertaining talk that will show you what a focused intruder can achieve, and hopefully while you’re thinking that couldn’t happen where you work, I hope you’re also thinking about how you’d stop it.

Pwned By the owner What happens when you steal a hacker’s computer (DefCon 18)

This is the story of the consequences of using a hacker’s stolen computer – with a bonus 5 minute story while setting up the equipment. Starting with the theft, it’ll make you think about your security situation. But after that it’s an exciting story about what you can do with a back door to the computer you used to own, and you’ll be glad not to be the new guy using it.

One of the shorter presentations but worth watching. The only thought that came to mind was that maybe the presenter wasn’t picking on the thief, but a new owner, but either way, it was his kit being used.

Nmap: Scanning the Internet Fyodor, Hacker, insecure.org (DefCon 16)

Nmap is a tool I’ve always wondered about – never having had to use it or really understanding what it does. This talk gives a lot of examples of how to use it and then tips on more advanced usage. The examples show the epic command lines you use to drive the thing and it’s quite obvious that the presenter is the author of the tool. The presentation also shows a nicer GUI frontend to Nmap with extra features like a graph of connectivity between nodes.

Interesting stuff if you know very little about Nmap.

Jackpotting Automated Teller Machines Redux – Barnaby Jack (DefCon 18)

To be honest, I confused myself with the title, assuming it was something to do with fruit machines, but even more intriguing, it’s about the gritty internals of ATMs, focusing on the simple boxes you find in a small shop or petrol station. Although appearing suspicious buying and transporting his own ATMs, the presenter has taken the time to investigate what’s inside. Starting with reverse engineering, he moved on to writing tools to remotely access the ATMs (Dillinger) and a rootkit to install (Scrooge).

Although most of the real life excitement of experimenting with the machines happens off screen, the rest of the talk is fascinating enough to make this worth watching.

2012 – A Year In Games – In Haiku form

Since last year’s write up of the games I enjoyed took 5 separate posts (1, 2, 3, 4, 5) and more than 6K words, this year, I thought I’d take a different approach.

Uncharted 3

Aiming? Controls changed? / Beautiful vistas but hard / 2’s still favourite

God of War: Chains of Olympus

First PSP port / Enjoyable hack and slash / Near perfect for me

God of War: Ghost of Sparta

Didn’t play before / Yet more Kratos epicness / Best played in series

The Simpsons Arcade Game

Classic beat-em-up / Saved 30 quid from arcade / Unlimited plays

Tomb Raider: Underworld

First I’ve played of all / More difficult jumps than Drake / Going to take time

Saints Row: The Third

Fantastic intro / Driving, shooting, epic fun / Race quad bikes on fire!

Portal 2

Hadn’t played first one / Polished gameplay and detail / Now want to try first

Fifa (Vita)

Perfect for the train / Even if you don’t football / Backtouch for the win

Touch My Katamari

Sticky ball roll fun / Mad arse king protagonist / Great port, worth buying

Fallout: New Vegas (Game of the year)

Continued from end / Enjoyable new stories / Many hours of play 

Uncharted: Golden Abyss

Epic adventure / Amazing length, depth and plot / First Vita must buy

Cars 2

Racer for the kids / Typical karting action / with the Cars 2 cast

Mass Effect 3

Yay! More chest high walls! / Tried net play but not my thing / OK with Ending

Rayman Origins (Vita)

Blimey – beautiful! / Favourite platformer yet! / Perfect portable

Resistance: Burning Skies

First Vita shooter / Strong start but difficult end / Glad to have 2 sticks

Renegade Ops

PSPlus Freebie / Great, pretty twin stick shooter / Top down SWIV-em-up

inFAMOUS 2

Amazing graphics / Huge city plus UGC / Going to take time

Gravity Rush

Took a while to learn / Something new – refreshing change / Worth a try – get Plus!

Plants Vs Zombies (Vita)

Well-polished and fun / Favourite tower defense / Stiff hand after hour!

Dungeon Hunter: Alliance

Standard RPG / Gave up! Refighting bosses / Not a great ending

Sound Shapes

Surprised by how good / Epic moments as tune grows / Inspirational

Ratchet and Clank

One more chance to play / Jumping and guns nostalgia / 1mill bolt trophy?

Lumines Electronic Symphony

Loved on PSP / Great choice from modern soundtrack / Could play forever

Borderlands DLC: Claptrap’s New Robot Revolution

Last of first’s extras / Early bosses as claptraps! / Long but walking lots

Ultimate Marvel Vs Capcom 3 (Vita)

Super hard scrapper / Combo-centred for experts / Not Street Fighter 2

Disney Universe

Simple platformer / Limited character choice / Levels drag a bit

Journey

Wish I’d played before / Beauty, art and emotion / Need to revisit

LittleBigPlanet (Vita)

Platform and sub-games / Impressive tech achievement / Not one to Create

FIFA 13

Similar to 12 / Oops, set to semi-pro – fail / Need a year to learn

Borderlands 2

Loved first, couldn’t wait / RPG-lite shooter fun / Bring on DLC

LittleBigPlanet 2

Early Plus title / Nice look, heavy inertia / Drop-in, time-to-time

Tales From Space: Mutant Blobs Attack

Jumpy munching blob / Reminiscent of Sound Shapes / Katamari-like

Chronovolt

Didn’t really like / Ball-rolley, time-freezy game / Not my type of thing

Welcome Park

Vita Intro app / Initially ignored it / Good, better for kids

Ratchet and Clank: Full Frontal Assault

2 play or too hard / 3rd person tower defense / Also on Vita!

LEGO The Lord of the Rings

Retells story well / No real challenge but still fun / Platinumable!

Limbo

Dark in art and plot / Increasingly difficult / So glad I played it.

Need For Speed Most Wanted (Vita)

Free roaming racer / Another Vita classic / My A1 beats all!

Knytt Underground (Vita)

Simple mechanics / Huge explorable game space / Challenging but fun

Vanquish

PSPlus download / Inspired by Access preview / Hyper-intense game

Disney Pixar Toy Story Mania

Disney Move Shooter / Christmas break game for littl’un / 2 player havoc

Treasures of Montezuma Blitz (Vita)

Free Vita Match-3 / Lives limited by timer / Portable pastime

Call of Duty Black Ops II

Love me a shooter / Future weapons much more fun / Yet to play Zombies

Assassins Creed 3 Liberation (Vita)

First AC for me / Story heavy, some action / Want to try console

The Final Word

Screw this for a game of soldiers! I know you should embrace constraints but this has been a difficult job. Halfway through I realized I could have done something simpler, like limiting myself to a tweet-like 140 characters. If I never have to count syllables again, I’ll be a happy man.

In review – My 1st year (and a bit) of blogging

History of the blog

The first post was written on 6th October 2011 (as you can probably work out from the link). From then to date I’ve reached the following dizzy heights:

I know that 3.5K-ish views could be considered a drop in the ocean of page views, but I think it makes for a good start. I’m glad that the HPG 2012 page has seen so many views since it was my favourite conference of 2012 and I’m proud of its write up (I’m also looking forward to this year’s HPG!).

I’m also happy the Disneyland Paris review has had some views because it’s the kind of helpful information you need for a visit and although it’s off-topic for the main focus of this blog, I’m glad to write and share content like that.

Also this year

Since starting the blog, I’ve also started using Twitter as @dickyjimforster (which I also posted about) and I’ve been able to discover some interesting things as well as managing to forward other interesting things on! This really satisfies my Google reader sharing urges (which I also discovered was very common for a lot of people) as well as providing more context for what people are doing at events such as SIGGRAPH and GDC.

The future

My aim for this year is to keep writing although I don’t think I can maintain the posting rate of the last year. I’d like to spend some more time practising game development (although I don’t think I can achieve the rate of one game a month however much the efforts of others inspire me) since it’s an area where I feel I need to improve. I’d also like to write a few off-topic but helpful posts like last time.

The final word

I’d like to think that my first year (and a bit) has been successful. Based on a post from Eric@realtimerendering: “One survey (from Caslon Analytics) gives 126 days for the average lifetime of a typical blog. Another (from problogger) notes even the top 100 blogs last an average of less than 3 years.” I’m still here!

SIGGRAPH 2012 – Tuesday

Beyond Programmable Shading (Slides, SIGGRAPH Page)

Five Major Challenges in Real-Time Rendering http://bps12.idav.ucdavis.edu/talks/02_johanAndersson_5MajorChallenges_bps2012.pdf
After the introduction to the BPS course, Johan led straight into a followup to his talk from the 2010 course. The fact is that the 5 challenges from 2 years ago are still there. However, each of these challenges can be broken down into different areas resulting in a total of roughly 25 different topics to cover spread over the 5 major groups:

  1. Cinematic Image Quality – Types of aliasing, anti-aliasing and blur.
  2. Illumination – Dynamic GI, shadows, reflections
  3. Programmability – Exposing a common front end shader language for different backends (for example HSAIL, the Heterogeneous System Architecture Intermediate Language), supporting the GPU generating its own work, improving coherency between tasks with things like queues, and simpler CPU-GPU collaboration. Also programmable blending (as is being exposed by Apple in iOS6 via APPLE_shader_framebuffer_fetch discussed here).
  4. Production costs – Reduce iteration time. A great renderer won’t reduce costs, need to reduce cost of creating content.
  5. Scaling – Power vs resolution.

I do know that prior to the talk, Johan was soliciting feedback from others in the industry which meant that you knew the challenges listed were more than one man’s personal list and it was good to see the names of the contributors at the end of the talk.

Intersecting Lights with Pixels: Reasoning about Forward and Deferred Rendering http://bps12.idav.ucdavis.edu/talks/03_lauritzenIntersectingLights_bps2012.pdf

This was an interesting overview of the state of the art for generating the lists of lights to apply to pixels in a forward or deferred renderer, without going into a forward vs deferred discussion beyond some bandwidth comparisons that were nearly inconclusive because the two came out approximately equal.

There were 2 main things that I thought made useful takeaways:
1) Doing per tile checks against the bounds of each tile performs a lot of the same tests multiple times. Instead you should use something like a hierarchical quad-tree which should reduce the number of tests required and avoid redundancy.
2) The suggestion of using bimodal Z clustering rather than a higher number of buckets as used in Clustered Deferred and Forward Shading by Olsson et al at HPG. While reading the Olsson paper, I thought that heavily subdividing the Z range was overkill and some twitter discussion around that time highlighted that in some games, there’s limited content between the end of the sights and the nearest wall, in which case a split into two Z ranges would help.
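
To make the second point concrete, here is a small Python sketch of how I picture a bimodal depth split for one tile. It is not code from the talk: the function names are made up, the split point is simply the midpoint of the tile’s min and max depth, and only the depth test is shown (the screen-space XY test against the tile is ignored).

```python
def depth_ranges(depths, bimodal=True):
    """Return the depth interval(s) covering the samples in one tile."""
    z_min, z_max = min(depths), max(depths)
    if not bimodal:
        return [(z_min, z_max)]
    mid = 0.5 * (z_min + z_max)
    near = [z for z in depths if z <= mid]  # never empty, z_min is always here
    far = [z for z in depths if z > mid]
    ranges = [(min(near), max(near))]
    if far:
        ranges.append((min(far), max(far)))
    return ranges

def light_touches_tile(light_z, light_radius, ranges):
    """True if the light's depth extent overlaps any of the tile's ranges."""
    return any(light_z + light_radius >= lo and light_z - light_radius <= hi
               for lo, hi in ranges)

# A tile containing the end of the gun sights and a distant wall.
tile = [1.0, 1.2, 1.1, 50.0, 52.0]
single = light_touches_tile(25.0, 5.0, depth_ranges(tile, bimodal=False))
split = light_touches_tile(25.0, 5.0, depth_ranges(tile, bimodal=True))
```

With a single min/max range the light floating in the empty space between the sights and the wall is kept for the tile, while the two-range version rejects it.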

I’d recommend taking the time to read through the slides if you didn’t attend.

Dynamic Sparse Voxel Octrees for Next-Gen Real-Time Rendering http://bps12.idav.ucdavis.edu/talks/04_crassinVoxels_bps2012.pdf

With all of the buzz regarding Sparse Voxel Octrees based on their usage in Unreal 4 (covered in the Advances in Real-Time Rendering course), I thought it would be good to see a presentation from Cyril Crassin who has been working hard and presenting a lot of the work in this area. Most of the talk was an overview of how the system works and how it fits in versus a polygonal geometry representation, comparing the advantages that you can get from cone tracing an SVO.

Yet again, this was another SVO talk where processing cost and storage space were skated over, but since the slides are now available, you can see the numbers. 9 levels of SVO comes in at 200MB to 1GB and the initial construction for the GL Sponza demo is 70ms with an update cost of 4-5ms for animated data each frame. The performance figures were slightly confused later when it was said that the GL demo from the new OpenGL Insights book (released chapter here) can build Sponza in 15.44ms. However since the technique is being used in Unreal Engine 4, it was expected that The Technology Behind the “Unreal Engine 4 Elemental Demo” presentation on the following day would show where you could reduce the time and space requirements.

I do like SVOs for the fact that they provide a scalable solution to the visibility part of a GI system and their use in Unreal shows that they can have a runtime implementation. If only they didn’t cost so much to generate and store!

Power Friendly GPU Programming http://bps12.idav.ucdavis.edu/talks/05_ribblePowerRendering_bps2012.pdf

This was a Snapdragon-based presentation on general optimizations that could be applied to save power. Unfortunately the generality of the optimizations – compress textures, draw front to back, and consider your render target changes – brought very little to the power saving discussion. Once frame limiting was recommended as the best way to save power I did wonder how helpful the content would be. I missed the end of the talk with an aim to return for the panel.

From Publication to Product: How Recent Graphics Research has (and has not) Shaped the Industry

This panel was led by Kayvon Fatahalian and discussed the relationship between research departments and industry with a set of industry luminaries on the panel.

Some of the things that I took note of:

  • No one wants to learn a new language – every time someone comes up with a new language, it’s unlikely to get a lot of adoption due to the languages already in use.
  • Papers need to use realistic workloads and industry needs to provide better workloads to facilitate this. Researchers working closely with industry typically get the most relevant workloads due to the requirements of the research.
  • The HLSL language was not expected to last this long – they thought it might run for 5 years or so.

Light Rays Technical Papers http://s2012.siggraph.org/attendees/sessions/100-59

Naïve Ray Tracing: A Divide-And-Conquer Approach ACM Digital Library Version

This presentation started with a back-to-basics description of ray tracing – intersect a bunch of rays with a bunch of geometry. A lot of optimization has gone into the geometry intersection using techniques such as bounding volume hierarchies, which add time to build, memory to store and complexity to create and intersect, and which typically ignore the distribution of the rays and can even increase cost with dynamic scenes. This new technique is based on recursively splitting the set of rays and the set of geometry until you can perform a naive set of intersection tests. It’s a simple algorithm so most of the rest of the talk consisted of results. The performance looked good and they said that the major limit was bandwidth as they reordered the rays and geometry.
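
A minimal Python sketch of the recursive splitting idea as I understood it, not the paper’s implementation: spheres stand in for the geometry, the space is halved at the midpoint of its longest axis, and the names (dac_trace, trace_all), the leaf size of 8 and the depth cap of 16 are my own assumptions for illustration.

```python
import math

def hit_sphere(origin, direction, sphere):
    """Nearest positive ray parameter t for a ((cx, cy, cz), radius) sphere, or None."""
    centre, radius = sphere
    oc = [origin[k] - centre[k] for k in range(3)]
    a = sum(d * d for d in direction)
    b = 2.0 * sum(oc[k] * direction[k] for k in range(3))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)):
        if t > 1e-9:
            return t
    return None

def ray_hits_box(origin, direction, lo, hi):
    """Slab test: does the ray (t >= 0) pass through the axis aligned box?"""
    t0, t1 = 0.0, math.inf
    for k in range(3):
        if abs(direction[k]) < 1e-12:
            if origin[k] < lo[k] or origin[k] > hi[k]:
                return False
            continue
        a = (lo[k] - origin[k]) / direction[k]
        b = (hi[k] - origin[k]) / direction[k]
        if a > b:
            a, b = b, a
        t0, t1 = max(t0, a), min(t1, b)
        if t0 > t1:
            return False
    return True

def naive(ray_ids, sphere_ids, rays, spheres, nearest):
    """Brute force: test every ray in the set against every sphere in the set."""
    for i in ray_ids:
        origin, direction = rays[i]
        for j in sphere_ids:
            t = hit_sphere(origin, direction, spheres[j])
            if t is not None and t < nearest[i][0]:
                nearest[i] = (t, j)

def dac_trace(ray_ids, sphere_ids, rays, spheres, lo, hi, nearest, depth=0, leaf=8):
    """Split space, filter both sets into each half, recurse until a set is small."""
    if len(ray_ids) <= leaf or len(sphere_ids) <= leaf or depth >= 16:
        naive(ray_ids, sphere_ids, rays, spheres, nearest)
        return
    axis = max(range(3), key=lambda k: hi[k] - lo[k])
    mid = 0.5 * (lo[axis] + hi[axis])
    left_hi = hi[:axis] + (mid,) + hi[axis + 1:]
    right_lo = lo[:axis] + (mid,) + lo[axis + 1:]
    for c_lo, c_hi in ((lo, left_hi), (right_lo, hi)):
        sub_spheres = [j for j in sphere_ids
                       if spheres[j][0][axis] - spheres[j][1] <= c_hi[axis]
                       and spheres[j][0][axis] + spheres[j][1] >= c_lo[axis]]
        sub_rays = [i for i in ray_ids
                    if ray_hits_box(rays[i][0], rays[i][1], c_lo, c_hi)]
        dac_trace(sub_rays, sub_spheres, rays, spheres, c_lo, c_hi,
                  nearest, depth + 1, leaf)

def trace_all(rays, spheres):
    """Return a (t, sphere index) pair per ray, or (inf, None) for a miss."""
    lo = tuple(min(c[k] - r for c, r in spheres) for k in range(3))
    hi = tuple(max(c[k] + r for c, r in spheres) for k in range(3))
    nearest = [(math.inf, None)] * len(rays)
    dac_trace(list(range(len(rays))), list(range(len(spheres))),
              rays, spheres, lo, hi, nearest)
    return nearest
```

Note that nothing is stored between calls: there is no tree to build or keep in memory, which is the appeal for dynamic scenes. The filtering lists here are rebuilt at every level, which is roughly where the bandwidth cost mentioned in the talk would show up in a real implementation.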

This was an interesting presentation since it could lead to a rethink in the way that ray tracing is performed. I imagined the tests to partition the sets of rays and geometry would be prohibitively expensive but it sounds like it could be a good win. It’ll be interesting to see what comes of this technique and any further research. There’s a good write up here too.

Manifold Exploration: Rendering Scenes With Difficult Specular Transport Site PDF

This talk centered on a new way of dealing with specular that can be applied to Markov Chain Monte Carlo (MCMC) based rendering. My understanding is that once you know which points you want to transport light between and which surface to reflect from or refract through, you can make multiple paths that fulfil that transport. This was best expressed by the images in the slides. There was also an extension to account for roughness based on the area around the manifold. This works well in the highly reflective and rough scenes that they showed.

The source is available and includes an implementation of Veach’s Metropolis Light Transport which is notoriously difficult to implement – this meant an extra round of applause for the presenter.

Bidirectional Lightcuts http://www.cs.cornell.edu/~kb/publications/SIG12BidirLC.pdf

Continuing the theme of bias reduction in Virtual Point Light (VPL) systems, Bidirectional Lightcuts extends multidimensional lightcuts by using the same tracing mechanism that creates VPLs to also create Virtual Point Sensors (VPSs). The paper introduces weighting mechanisms for the VPL/VPS pairings to allow the use of more advanced features such as gloss, subsurface scattering and anisotropic volumes.

Virtual Ray Lights for Rendering Scenes With Participating Media Site PDF

Virtual Ray Lights are intended to fix the issues that arise when rendering participating media with a VPL technique (firefly-like singularities around each VPL). The ray lights are the paths traced through the volume when adding VPLs and they can be integrated with the camera ray when rendering to add their contribution which gives very convincing results.

As mentioned before, this paper was superseded by beam lights (Progressive Virtual Beam Lights as seen at EGSR) since the ray lights change the firefly singularities to bright streaks.

Fun with Video http://s2012.siggraph.org/attendees/sessions/100-56

Video Deblurring for Hand-Held Cameras Using Patch-Based Synthesis http://cg.postech.ac.kr/research/video_deblur/

This paper discussed a method for fixing the motion blur that remains even after image stabilization. The algorithm is based on finding periods with sharp frames (since shaky videos contain a mix of sharper and blurrier frames) then applying a patch-based process that finds neighbours and blends. They find blur kernels to match the blur and use them to match the patches.

I’d not realized how bad the remaining motion blur could be, but this was a very interesting presentation and covered a lot of previous work.

Eulerian Video Magnification for Revealing Subtle Changes in the World http://people.csail.mit.edu/mrub/vidmag/

The fast forward version of this paper showed two example videos that demonstrated detecting a person’s heart rate and an infant’s breathing based on amplifying changes in videos – the main reason I wanted to see this. The previous work section referenced Motion Magnification from SIGGRAPH 2005. For this technique, the video is spatially decomposed and then they calculate the luminance and apply a temporal filter that smooths the trace. Many equations are used to explain why this works, and the slides refer you to the paper for more details.
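
As a rough illustration of the temporal step only, here is a sketch that amplifies small changes in a single pixel's luminance trace. It is my own simplification, not the paper's pipeline: it skips the spatial decomposition entirely and assumes a band-pass built from the difference of two exponential moving averages. The function name and the `fast`, `slow` and `gain` parameters are hypothetical values chosen for illustration.

```python
def magnify_trace(trace, fast=0.4, slow=0.05, gain=20.0):
    """Amplify subtle temporal variation in one pixel's luminance trace.

    The difference between a fast and a slow moving average acts as a crude
    band-pass filter. That band is scaled by `gain` and added back on.
    """
    low_fast = low_slow = trace[0]
    out = []
    for x in trace:
        low_fast += fast * (x - low_fast)   # follows quick changes
        low_slow += slow * (x - low_slow)   # follows only slow drift
        out.append(x + gain * (low_fast - low_slow))
    return out
```

A constant trace passes through unchanged because both averages agree, while a faint periodic flicker (such as skin colour varying with a pulse) comes out several times larger. Running this per pixel on each level of a spatial pyramid would be the next step towards what was presented.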

More interesting results included detecting Bruce Wayne’s pulse in a Batman film when he’s supposed to be asleep and a running demo where the user moved slightly and his eyes didn’t which gave scary results.

Selectively De-Animating Video http://graphics.berkeley.edu/papers/Bai-SDV-2012-08/

The presentation demonstrated a user-driven tool that could warp parts of a video to make some elements appear stationary while leaving the motion of other parts intact. These can be looped to create a cinemagraph. The features marked by the user are tracked through the video and used to define the required warp. The warped version is composited with the original to remove the last of the motion. A major advantage was that the user input requirement seemed minimal.

The main example was a beer being poured into a glass (a common SIGGRAPH video source) where the video was warped such that the glass looked stationary. Other examples given were a roulette wheel where the motion of the ball was faked or the wheel was held still, a video of a guitar player where the guitar was kept still, and a model that was kept still while moving her hair and eyes.

SIGGRAPH Dailies – http://s2012.siggraph.org/attendees/siggraph-dailies

SIGGRAPH Dailies is an end-of-day session made up of multiple presentations of a minute each on a wide variety of topics. My memory of the event was that it was very artist driven with a very vocal set of presenters from Texas A&M. The length of the presentations means that they have a very strong visual component and it’s the artists presenting their work that I remember most.