HPG colocates with the EGSR conference every other year (alternating with SIGGRAPH colocation), as it did in 2012. This year both conferences were held in Paris at the Maison des Mines, a teaching centre with a good presentation room with a maximum capacity of 200. HPG runs from Monday to midday Wednesday and then EGSR runs until Friday.
Personally, I strongly value attending conferences, mostly because the impenetrable language of the papers becomes crystal clear when one of the authors explains it with PowerPoint or Keynote. I’d prefer it if each paper were available online with at least a transcript, if not a recording, of the author presenting the concept alongside a set of slides. It’s typically while discussing previous work that authors explain the basis of their technique in plain English.
Please note that any errors or omissions are my own. Slides are linked from the HPG program pages.
For me the most interesting things were:
- “Power Efficiency for Software Algorithms running on Graphics Processors” since it opens the door to looking at power usage while rendering
- “Maximizing Parallelism in the Construction of BVHs, Octrees and k-d Trees” since it introduced an alternative way of looking at the algorithms involved.
- The Jon Olick Keynote – Computer Graphics meets Computer Vision since it showed me much of the state of the art of computer vision.
- The Town Hall since it was good to see some of the inner workings of such a conference and the passion of those involved.
Design and Novel Uses of Higher-Dimensional Rasterization
As the first talk, this focused on how anything that uses sampling can be reconfigured to use the same rasterisation mechanism as higher-order rasterisation. Several different techniques were demonstrated:
- Occlusion queries
- Continuous collision detection
- Sampling glossy reflections/refractions
- Motion blurred soft shadows
- Multi-view rendering such as stereoscopy
Adaptive Image Space Shading for Motion and Defocus Blur
This paper presented a mechanism for reducing the number of shading samples that need to be calculated, by working out a sufficient set from the rate of change of the samples (driven by the depth of field (DOF) circle of confusion (COC) and by motion). This allows reduced computation and redistribution of the shading, similar to importance sampling. The adaptively sampled result showed some visual differences from the fully sampled version, due to the need to band-limit shaders (which is not currently well handled) and the need for some kind of LEAN mapping on normal-map mipmaps to avoid issues when rendering specular highlights.
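For intuition, the COC-driven part of such a scheme might look like the following. This is a hedged sketch with invented function and parameter names (`circle_of_confusion`, `shading_rate`, `blur_pixels_per_sample`), not the paper’s actual method:

```python
def circle_of_confusion(z, focus_dist, focal_len, aperture):
    """Thin-lens circle of confusion diameter (same units as aperture)
    for a point at depth z, given the focus distance and focal length."""
    return abs(aperture * focal_len * (focus_dist - z)
               / (z * (focus_dist - focal_len)))

def shading_rate(coc_pixels, blur_pixels_per_sample=4.0):
    """Map a per-pixel blur footprint to a shading-sample density in
    (0, 1]: sharp pixels shade every sample, heavily blurred regions
    shade sparsely and rely on redistribution/reconstruction."""
    return 1.0 / max(1.0, coc_pixels / blur_pixels_per_sample)

# A point exactly at the focus distance is perfectly sharp...
assert circle_of_confusion(2.0, 2.0, 0.05, 0.01) == 0.0
# ...and sharp pixels keep the full shading rate,
assert shading_rate(0.0) == 1.0
# while a 16-pixel blur footprint drops the rate to a quarter.
assert shading_rate(16.0) == 0.25
```

The idea is just that the shading-sample budget falls off with the blur footprint; motion blur would contribute a second, velocity-based footprint term in the same way.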
High-Quality Parallel Depth-of-Field Using Line Samples
This talk presented the use of line samples in place of the more typically used point samples. The talk started by showing the high number of point samples required to resolve a DOF effect to sufficient quality. The use of line samples gives much better results with far fewer samples. (One other thing to note: DOF doesn’t need the typical fifth dimension (time) used by most other higher-order rasterisation techniques, so that wasn’t covered by this talk.)
The rasterisation pipeline was built on CUDA and depends on a limited shading model, currently only Phong, due to a simplification in the line sampling. The line sampling uses a pinwheel of lines, with reweighting of the samples to compensate for the more densely sampled area towards the centre of the pinwheel. Despite using techniques to coalesce common samples and discard samples of minimal value, the example also needed LOD due to high triangle density both near and far in the scene: minification in the distance and a very large COC near the camera.
The figures quoted were about 2 fps, most likely at a previously mentioned 800×600. One of the audience members asked about the use of rotated line samples which they hadn’t tried yet.
Maximizing Parallelism in the Construction of BVHs, Octrees and k-d Trees
I found this a very good presentation of an easy-to-understand paper that I had read before attending. The technique is based on a binary radix tree combined with a novel way of numbering the nodes that allows for parallel building.
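The core of the parallel build can be sketched as follows: after sorting primitives by Morton code, every internal node of the binary radix tree can find its split point independently (so all nodes can be processed in parallel). This is an illustrative sketch of the idea, not the paper’s exact indexing scheme:

```python
def morton3d(x, y, z, bits=10):
    """Interleave the bits of quantised x, y, z into a 3D Morton code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i + 2)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 0)
    return code

def find_split(codes, first, last):
    """Split a sorted Morton-code range at the highest differing bit.
    Each internal node computes its split from only its range, which
    is what makes the build parallel rather than top-down sequential."""
    if codes[first] == codes[last]:
        return (first + last) // 2      # identical codes: split evenly
    common = (codes[first] ^ codes[last]).bit_length()
    split, step = first, last - first
    while step > 1:  # binary search for the last code sharing the prefix
        step = (step + 1) // 2
        if split + step < last and \
           (codes[first] ^ codes[split + step]).bit_length() < common:
            split += step
    return split

assert morton3d(1, 0, 0) == 4 and morton3d(0, 0, 1) == 1
assert morton3d(1, 1, 1) == 7
# [0b000, 0b001, 0b100, 0b101]: the top bit splits between indices 1 and 2.
assert find_split([0, 1, 4, 5], 0, 3) == 1
```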
kANN on the GPU with Shifted Sorting (for reference, kANN = k Approximate Nearest Neighbours: the k results you want, approximate because they can be wrong)
This paper was based on preexisting work called shifted sorting (from the Space Filling Curves paper). They’ve changed the spatial hashing from Hilbert to Morton codes, since Morton codes are easier to generate at runtime, at the expense of reduced locality due to the ordering. The most complex part of the talk was a sorting section in the middle describing the mechanism by which they simulate the shifted sorting by getting 5 lots of 2k results and then merging them into the final k results on the GPU. The final results were compared to FLANN, which supports higher k values but is not as fast. They also highlighted that with approximate nearest neighbours, accuracy is a metric that needs to be considered alongside speed.
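The shifted-sorting idea can be sketched like this: sort the points by Morton code under several coordinate shifts, take a candidate window around the query in each sorted order, then merge the candidates and keep the k truly closest. This is an illustrative CPU sketch (2D, with made-up shift values); the paper does the 5×2k merge on the GPU:

```python
import bisect

def morton2d(x, y, bits=16):
    """Interleave bits of quantised x, y into a 2D Morton code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i + 1)
        code |= ((y >> i) & 1) << (2 * i)
    return code

def kann_shifted(points, query, k, shifts=(0, 3, 7, 13, 21), window=None):
    """Approximate k nearest neighbours via shifted sorting: each shift
    perturbs the Morton order, so points missed near a curve boundary in
    one ordering tend to be captured in another."""
    window = window or 2 * k
    candidates = set()
    for s in shifts:
        keyed = sorted(range(len(points)),
                       key=lambda i: morton2d(points[i][0] + s, points[i][1] + s))
        keys = [morton2d(points[i][0] + s, points[i][1] + s) for i in keyed]
        pos = bisect.bisect_left(keys, morton2d(query[0] + s, query[1] + s))
        lo, hi = max(0, pos - window), min(len(points), pos + window)
        candidates.update(keyed[lo:hi])
    dist2 = lambda i: (points[i][0] - query[0]) ** 2 + (points[i][1] - query[1]) ** 2
    return sorted(candidates, key=dist2)[:k]

points = [(0, 0), (1, 1), (10, 10), (50, 50), (2, 2)]
result = kann_shifted(points, (0, 0), 2)
assert result[0] == 0 and set(result) == {0, 1}
```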
SRDH: Specializing BVH Construction and Traversal Order Using Representative Shadow Ray Sets
This paper contributes a heuristic for improving the quality of the Bounding Volume Hierarchy (BVH) used to optimise shadow ray lookups when raytracing. The introduction highlighted that a scene-dependent heuristic is typically required. This led them to realise that they needed the results from tracing the scene in order to build the BVH for the shadow rays. To achieve this, they used a low resolution (16×16) reference render of the scene to build the shadow ray tree (i.e. pretesting shadow rays to find common occluders), which gave sufficiently good results.
Most of the posters on display were from Samsung regarding the reimplementation of common features of low power mobile GPUs based on a tile-based or tile-based-deferred renderer. nVidia and Square Enix had more complex posters on display and there was a Comic Sans poster about 2D Delaunay triangulation which I found nearly unreadable due to the font.
Panel Discussion: The Trajectory of Mobile Platforms
Each of the members of the panel had a chance to present their feelings on mobile platforms which I’ve attempted to summarise here:
nVidia: Compared move from Workstation to Home PC with GPU to mobile.
Kayvon Fatahalian (CMU): Moving towards a world of mashups.
Arm: Need to consider nanojoules/pixel – and this will need to drop for upcoming 4k screens (and a TV can nowadays be considered a phone without a modem).
Ray tracing based and stochastic rendering will most likely use too much power.
GPU Compute more likely to be more power efficient in the general case than vertex/fragment.
Samsung: Looking at mobile ray tracing (developing their own hardware) for mobile, tv etc.
Presenting at SIGGRAPH 2012 – Won-Jong Lee.
IMG: Dedicated HW uses least power – 150mW for 1080p decode.
Typically learn from desktop then simplify – eg OpenGL.
Maybe roles will reverse with mobile leading.
Jon Olick: Need to be practical. Games will be price limited. Need to be aware of limitations.
AMD: Common that PC assets reused on mobile (contrasting another comment).
Need to make more immersive – too different to TVs.
It’s OK to separate APIs – like OpenGL.
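Arm’s nanojoules-per-pixel metric and IMG’s 150mW figure invite a quick back-of-the-envelope check. The sketch below assumes 60fps decode (my assumption; the frame rate wasn’t stated):

```python
# Energy per pixel for IMG's quoted 150 mW 1080p decode figure,
# assuming (my assumption, not stated) 60 frames per second.
power_w = 0.150
pixels_per_frame = 1920 * 1080
fps = 60

nj_per_pixel = power_w / (pixels_per_frame * fps) * 1e9
print(round(nj_per_pixel, 2))  # roughly 1.21 nJ per pixel
```

That order of magnitude, around a nanojoule per pixel for dedicated hardware, is the kind of budget the 4k-screen comment implies will have to shrink further.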
An audience member asked if there would be ray tracing hardware in future?
Arm said no.
Samsung says yes – it’ll make AR more immersive and the UI more realistic
IMG says it will simplify effects like shadow mapping which would reduce engine costs (audience member highlighted that cost is currently more in content rather than engine development).
nVidia says there’s already hybrid techniques, such as casting secondary rays from rasterised geometry.
nVidia: nVidia are working with Gaikai (which is interesting since the Sony purchase).
Someone highlighted: Doesn’t it take more energy to receive the data than render it?
Should APIs expose lower level access to things like TBDR?
Olick: Devs should have lower level access
Kayvon: But what about when it all changes – the APIs are for abstracting.
Arm: OpenGL hides that
Audience member: But Qualcomm and Apple have GL ES extensions to expose lower level features.
Arm: Oh yeah
Keynote from Kelly Gaither, Director, Data & Information Analysis at Texas Advanced Computing Center (TACC) – Picture This: Visualizing Yesterday, Today and Tomorrow
The focus of the talk was the history of visualisation, the hardware used at TACC for visualisation, and working with those who wanted to use it. There were lots of pictures of Tera/PetaFLOP cluster hardware and multipanel displays which were basically hardware pornography. One other interesting thing to note was that all of the hardware systems were given Western-themed names, for example lasso, stallion, and stampede.
Algorithm and VLSI Architecture for Real-Time 1080p60 Video Retargeting
This talk presented a hardware implementation of “A System for Retargeting of Streaming Video” (available here) from SIGGRAPH Asia 2009. Due to limited memory space they had to split the input into horizontal strips to calculate the required saliency map (based on “Spatio-temporal Saliency Detection Using Phase Spectrum of Quaternion Fourier Transform”), which highlights the high-frequency features of the image. They also mentioned that they needed a heavy infinite impulse response filter with a scene-dependent constant parameter to handle temporal artifacts (most easily seen in the videos). This filtering also means that there’s currently quite a bit of latency in the hardware, plus a constant that needs to be set correctly.
Power Efficiency for Software Algorithms running on Graphics Processors
This paper presents the GPU power consumption for a set of rendering and shadowing algorithms on various GPUs. For rendering, they tested forward, z-pre-pass and deferred rendering. For shadowing they tested shadow maps, shadow volumes and variance shadow maps. For desktop GPUs they were able to sample the power consumption inline, but for the portable devices, they needed to sample the entire amount of power consumed by the device.
As the first paper of its type, the results were a little shaky, so you couldn’t really compare between devices; you could only compare results on the same device, such that you could say: on a 7970, deferred rendering is cheaper than forward rendering. However, this paper does open the door to lots of considerations about power consumption, repeating the same metric highlighted by Arm on the previous day – nanojoules per pixel. I hope that this will lead to IHVs providing this analysis themselves to increase the accuracy, giving the graphics community the ability to reproduce the results. However, there’ll always be a problem with apples-to-apples comparisons: when comparing different rendering techniques, the geometry and shaders may give different visual results as well as different performance.
Hot3D Talks and Panel
Intel: Processor Graphics – Tom Piazza
Tom discussed the progression of Intel graphics from the i740 to Ivy Bridge and mentioned 2 TFlop as a nextgen target. One interesting point was that Intel is changing to target systems with batteries such as laptops, tablets, and phones. Tom also mentioned that it was funny using a 486 as a scheduler. A member of the audience asked about the ISA on these GPUs; Tom indicated that they would aim for an intermediate language and just-in-time compilation.
Autodesk: Scalable and Robust Rendering in the Cloud
Lots of Autodesk’s tools support rendering in the cloud, the examples given being Revit, Autocad, and Homestyler. The rendering is performed on Autodesk’s own HW and then spills over to Amazon EC2, although hardware and rendering costs weren’t really mentioned. They use a specialised renderer called Spectrum which performs lighting based on multidimensional lightcuts. They said that the renderer currently has a limited set of features but covers most of the requirements of common users. An audience member asked what they do with low-quality geometry in the render, such as a single-triangle world ground plane or highly tessellated crockery. They said “crap in, crap out”, so they give warnings, but most architects want to improve their geometry, so it’s a positive result.
nVidia: Kepler design tradeoffs
This talk was a collection of revelations gathered from previous presentations. Early on they mentioned that Fermi was power limited, so they considered dropping the voltage and/or the clock (another case of performance being bounded by power consumption). For Kepler, they doubled the hardware and halved the clock rate. This wasn’t a problem since more area was available at 28nm than at 40nm, and more hardware meant fewer stores to registers, so less power consumed. This change depended on a compiler technology improvement to manage the flow control, which contrasts with AMD’s Graphics Core Next architecture, which moved away from VLIW. Another major area of fixes for Kepler was atomics, which saw a 2x-10x performance improvement; this was also mentioned in a couple of the other talks.
Reducing Aliasing Artifacts through Resampling
This talk introduces a new AA technique called RSAA – Resampling Antialiasing. This technique clusters similar samples, and the clustering generates a bitmask which can be used to look up constants for blending. The paper introduces a new metric called SADP (sum of the absolute dot products) which is used for clustering samples. The table indexed with the bitmask is generated using computer vision techniques, minimising errors over a training set, and could be tuned for a particular game. This was a remarkably well presented talk, with Alexander Reshetov coming across as a true character.
Clustered Deferred and Forward Shading
This was another paper I had read before attending. The value of the clustering for the samples is based on the spatial distribution of a large group of lights and how that interacts with tile-based deferred shading, where each tile contains a large range of depths. As an additional clustering key, they’ve also looked at normal cone clustering. Although this was expensive in their scenes, the normal cone clustering looked like it would have value in tiles with less depth and normal variance, which is more likely in a game environment. One other thing of note: while discussing cluster management they mentioned that their code was 2 passes on Fermi but could be 1 on Kepler due to the improved atomic performance.
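As a rough illustration of the clustering idea, a cluster key can combine the screen tile with an exponential depth slice, so clusters stay roughly cube-shaped in view space. This is a sketch with made-up tile and slice parameters, not the paper’s exact scheme:

```python
import math

def cluster_key(px, py, view_z, tile_size=32,
                near=0.1, fov_y=math.radians(60), tiles_y=32):
    """Cluster key for clustered shading: (screen tile x, screen tile y,
    exponential depth slice). The slice thickness grows with depth so
    clusters are near-cubical in view space."""
    tile_x, tile_y = px // tile_size, py // tile_size
    slice_ratio = 1.0 + 2.0 * math.tan(fov_y / 2) / tiles_y
    depth_slice = int(math.log(view_z / near) / math.log(slice_ratio))
    return (tile_x, tile_y, depth_slice)

# A pixel at the near plane lands in slice 0 of its tile...
assert cluster_key(0, 0, 0.1) == (0, 0, 0)
# ...tiles come straight from the pixel position...
assert cluster_key(100, 40, 5.0)[:2] == (3, 1)
# ...and deeper fragments land in later slices.
assert cluster_key(0, 0, 10.0)[2] > cluster_key(0, 0, 1.0)[2]
```

Lights are then assigned to the clusters their bounds overlap, and each fragment only considers its own cluster’s light list; the normal-cone variant would extend this key with a quantised normal direction.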
Overall, I think that a large part of the positive results for the test was due to the selective nature of the scenes used to test the technique. I’d like to see the performance results in a wider range of scenes (for example as in Matt Pettineo’s light indexed work).
Scalable Ambient Obscurance
This paper was presented by Morgan McGuire and was a follow-on from his work on AO in previous years. The paper adds 2 major contributions to the previous work:
1) Improved depth value storage and subsequent normal recovery (using ddx/ddy in pixel quads). The first problem they found was the lack of accuracy in the depth values in the depth buffer which was subsequently reducing the quality of the recovered normal.
2) Use of mipmaps to balance the cost of calculating distant contributions. This change actually levels the cost of the technique across varying resolutions (ignoring the mipmap generation). Based on future issues with increasing pixel density, this is a major advantage for this technique.
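A minimal sketch of the normal-recovery idea in contribution 1, using numpy gradients in place of the pixel-quad ddx/ddy a shader would use (and glossing over the depth-to-position reconstruction):

```python
import numpy as np

def normals_from_positions(P):
    """Recover per-pixel normals from a view-space position buffer
    (H x W x 3) by crossing the screen-space derivatives. The paper's
    point is that this only works well if the stored depth is precise
    enough, hence their improved depth-value storage."""
    dPdy = np.gradient(P, axis=0)   # derivative down the image rows
    dPdx = np.gradient(P, axis=1)   # derivative along the image columns
    n = np.cross(dPdx, dPdy)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

# A flat z=0 plane should recover a constant (0, 0, 1) normal.
P = np.zeros((4, 4, 3))
P[..., 0] = np.arange(4)[None, :]
P[..., 1] = np.arange(4)[:, None]
assert np.allclose(normals_from_positions(P), [0.0, 0.0, 1.0])
```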
Longer term, Morgan sees the future using SSAO for local effects and GI raycasting to generate more global effects.
Texture and Appearance
Adaptive Scalable Texture Compression
This paper was presented by Tom Olson from Arm and was based on work from Arm and AMD. It was Tom who had previously raised the issue of power usage in mobile, and he referred back to the relationship between power and bandwidth use.
The new format is 128bit block based and the format supports a variety of block sizes. Each block can be partitioned and therefore use multiple colour spaces (the format supports 3K partitioning methods). For sampling the colours, it’s possible to have lower resolution palette indices and then bilinearly filter the colours. The format has specific modes for encoding luminance and so can handle a fixed chroma and a varying luminance.
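The lower-resolution palette indices can be illustrated as follows: the block stores a small weight grid, and each texel’s weight is bilinearly interpolated from it. This is an illustrative sketch of the idea, not the spec’s exact fixed-point infill arithmetic:

```python
def upsample_weight(grid, gw, gh, tx, ty, bw, bh):
    """Bilinearly interpolate a low-resolution (gw x gh) weight grid up
    to the full block footprint (bw x bh texels), as done for the
    format's reduced-resolution palette indices."""
    # Map the texel position into the weight grid's coordinate space.
    fx = tx * (gw - 1) / (bw - 1) if bw > 1 else 0.0
    fy = ty * (gh - 1) / (bh - 1) if bh > 1 else 0.0
    x0, y0 = int(fx), int(fy)
    x1, y1 = min(x0 + 1, gw - 1), min(y0 + 1, gh - 1)
    ax, ay = fx - x0, fy - y0
    top = grid[y0][x0] * (1 - ax) + grid[y0][x1] * ax
    bot = grid[y1][x0] * (1 - ax) + grid[y1][x1] * ax
    return top * (1 - ay) + bot * ay

# A 2x2 weight grid stretched over a 6x6 block: corners hit the grid
# values exactly, interior texels get smooth in-between weights.
grid = [[0.0, 1.0], [0.0, 1.0]]
assert upsample_weight(grid, 2, 2, 0, 0, 6, 6) == 0.0
assert upsample_weight(grid, 2, 2, 5, 0, 6, 6) == 1.0
assert abs(upsample_weight(grid, 2, 2, 2, 3, 6, 6) - 0.4) < 1e-9
```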
The results looked good against the existing lower bit rate formats, but against higher bit rate formats such as BC6/7 the results were mixed. Funnily enough, there’s no 4bpp format, but 6×6 blocks work out at 3.56bpp, which was compared to other 4bpp formats. Although the format looks difficult to encode, apparently they already have an encoder and are looking at creating a runtime one.
An audience member asked about alpha channel support. Tom said it depends on the component count where 1 channel = luminance, 2 = luminance alpha, 3 = RGB1, and 4 = RGBA, so some swizzling may be required.
Parallel Patch-based Texture Synthesis
This talk demonstrated a texture synthesis method based on circular patches that could be done in parallel. The patches to be blended in are approximately circular and are then converted to a 2D rectangle using a polar basis. The cuts to insert the patch are found in the rectangle with the cuts needing to start and end at the same place to make a coherent edge. Some deformation is also supported to ensure that the patch fits well. The talk discussed parallelising the dynamic programming for each of the splits based on keeping them local in cells.
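The polar unwrap can be sketched as a mapping from the rectangle back onto the circular patch; because the u axis walks the angle, the rectangle’s left and right edges meet, which is what lets a cut start and end at the same place. An illustrative sketch with invented names, not the paper’s parameterisation:

```python
import math

def polar_rect_to_patch(u, v, cx, cy, radius, rect_w, rect_h):
    """Map a point (u, v) in the unwrapped rectangle onto the circular
    patch centred at (cx, cy): u spans the angle, v spans the radius."""
    theta = 2 * math.pi * u / rect_w
    r = radius * v / rect_h
    return (cx + r * math.cos(theta), cy + r * math.sin(theta))

# The rectangle's top edge at u = 0 lands on the patch boundary...
x, y = polar_rect_to_patch(0, 4, 0.0, 0.0, 5.0, 10, 4)
assert abs(x - 5.0) < 1e-9 and abs(y) < 1e-9
# ...and u = rect_w wraps around to the same point, so a cut crossing
# the rectangle meets itself when mapped back to the patch.
x2, y2 = polar_rect_to_patch(10, 4, 0.0, 0.0, 5.0, 10, 4)
assert abs(x2 - x) < 1e-6 and abs(y2 - y) < 1e-6
```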
Representing Appearance and Pre-filtering Subpixel Data in Sparse Voxel Octrees
This talk won best paper but I’m not sure it added much to previous sparse voxel octree presentations beyond splitting the surface representation into macro and micro versions. The signed distance field was a novel addition and the memory cost of this wasn’t clarified when an audience member asked.
Jon Olick Keynote – Computer Graphics meets Computer Vision
Jon showed lots of computer vision techniques that could be achieved on current hardware. He started by showing two structure from motion techniques:
1) Parallel Tracking and Mapping [for Small AR Workspaces] (PTAM). This technique seemed sufficient to map out lots of local points that could be used for tracking features.
2) Dense Tracking and Mapping (DTAM). This technique generated denser detail and could be used to generate surfaces that could then be used for interaction. In the video shown, the items on the surface of a desk could be used as obstacles for a racing car.
The next vision examples demonstrated adding or removing objects or lights to/from the scene. Jon demonstrated the value of more GI like lighting on the augmentations to better integrate them with the scene (harking back to what Samsung said about using raytracing for more realistic AR).
The finale was Jon demonstrating an AR version of his game.
Overall, the message was great, and the tracking was solid with graphical quality beating your average AR demo. However, the demo didn’t really take advantage of any of the great AR mechanisms for interaction, so I was unsure why the game needed AR.
The Town Hall is an open arena for discussing the state of the HPG conference. Since it alternates its colocation with SIGGRAPH, it’s difficult to get a year-on-year comparison, but apparently there’s been a drop in attendance and submissions, which someone suggested was possibly due to the ray tracing bubble bursting. There was discussion of attracting AR and visual computing interests. The topic of publishing the proceedings of the conference in Computer Graphics Forum (CGF) was raised as a future possibility, which would add a lot of value since some academics won’t submit to HPG as it doesn’t currently count as a publication of their work for future citation. CGF’s policy of ownership of papers was raised in light of the current state of publishing in the industry, but it definitely sounds like both the ACM and CGF work better with those who publish with them. One other point was raised about the quality of the posters, since they mostly came from Samsung, and there was a discussion about how they could be handled in future and whether anything could change with them.
The next HPG is expected to precede SIGGRAPH 2013 in Anaheim, running Friday to Sunday, with SIGGRAPH starting on that Sunday.