Original Link: http://www.anandtech.com/show/6134/khronos-announces-opengl-es-30-opengl-43-astc-texture-compression-clu



As we approach August the technical conference season for graphics is finally reaching its apex. NVIDIA and AMD held their events in May and June respectively, and this week the two of them along with the other major players in the GPU space are coming together for the graphics industry’s marquee event: SIGGRAPH 2012.

Outside of individual vendor events, SIGGRAPH is typically the major venue for new technology and standards announcements. And though it isn’t really a gaming conference – the show is about professional usage in both senses of the word – a number of those announcements do end up being gaming related. Overall we’ll see a number of announcements this week, and kicking things off will be the Khronos Group.

The Khronos Group is the industry consortium responsible for OpenGL, OpenCL, WebGL, and other open graphics/multimedia standards and APIs. Khronos membership in turn is a who’s who of technology, and includes virtually every major GPU vendor, both desktop and mobile. Like other industry consortiums the group’s president is elected from a member company, with NVIDIA’s Neil Trevett currently holding the Khronos presidency.

OpenGL ES 3.0 Specification Released

Khronos’s first SIGGRAPH 2012 announcement – and certainly the biggest – is that OpenGL ES 3.0 has finally been ratified and the specification released. As the primary graphics API for most mobile/embedded devices, including both iOS and Android, Khronos’s work on OpenGL ES is in turn the primary conduit for mobile graphics technology development. Khronos and its members have been working on OpenGL ES 3.0 (codename: Halti) for some time now, and while the nitty-gritty details of the specification have been finally been hammered out, the first OpenGL 3.0 ES hardware has long since taped-out and is near release.

OpenGL ES 3.0 follows on the heels of OpenGL ES 2.0, Khronos’s last OpenGL ES API released over 5 years ago. Like past iterations of OpenGL ES, 3.0 is intended to continue the convergence of mobile and desktop graphics technologies and APIs where it makes sense to do so. As such, where OpenGL ES 2.0 was an effort to bring a suitable subset of OpenGL 2.x functionality to mobile devices, OpenGL ES 3.0 will inject a selection of OpenGL 3.x and 4.x functionality into the OpenGL ES API.

Unlike the transition from OpenGL ES 1.x to 2.0 (which saw hardware move from fixed function to programmable hardware), OpenGL ES 3.0 is going to be fully backwards compatible with OpenGL ES 2.0, which will make this a much more straightforward iteration for developers, and with any luck we will see a much faster turnaround time on new software taking advantage of the additional functionality. Alongside an easier iteration of the API for developers, hardware manufacturers are far more in sync with Khronos this time around. So unlike OpenGL ES 2.0, which saw the first mass-market hardware nearly 2 years later, OpenGL ES 3.0 hardware will be ready by 2013. Desktop support for OpenGL ES 3.0 is also much farther along, and while we’ll get to the desktop side of things in-depth when we talk about OpenGL 4.3, it’s worth noting at this point that OpenGL 4.3 will offer full OpenGL ES 3.0 compatibility, allowing developers to start targeting both platforms at the same time.

On that note, OpenGL ES 3.0 will mark an interesting point for the graphics industry. Thanks to its use both in gaming and in professional applications, desktop OpenGL has to serve both groups, a position that sometimes leaves it caught in the middle and generating some controversy in the process. The lack of complete modernization for OpenGL 3.0 ruffled some game developers’ feathers, and while the situation has improved immensely since then, the issue never completely goes away.

OpenGL ES on the other hand is primarily geared towards consumer devices and has little legacy functionality to speak of, which makes it easier to implement but also allows OpenGL ES to be relatively more cutting edge. All things considered, for game developers OpenGL ES 3.0 is going to very nearly be the OpenGL 3.0 they didn’t get in 2008, although geometry shaders will be notably absent. Consequently, while OpenGL 4.x (or even 3.3) is still more advanced than OpenGL ES 3.0, it’s not out of the question that we’ll see a bigger share of OpenGL desktop game development use OpenGL ES as opposed to desktop OpenGL.



What’s New in OpenGL ES 3.0

In conjunction with next-generation GPUs, OpenGL ES 3.0 will make a number of new features available to mobile and embedded devices. In terms of functionality, OpenGL ES 3.0 is largely a mobile implementation of the OpenGL 3.3 feature set, with a couple notable features missing and a few additional features plucked from later revisions of OpenGL. In terms of backwards compatibility only OpenGL 4.3 is a complete superset of OpenGL ES 3.0, but for most purposes OpenGL 3.1 is probably the closest desktop OpenGL specification.

On that note, though drawing a comparison to Direct3D isn’t particularly straightforward, since we get asked about it so much we’ll try to answer it. Direct3D of course had a major reachitecting with Direct3D 10 back in 2007, which added a number of features to the API  while giving Microsoft a chance to clean out a great deal of fixed-function legacy cruft. From a major feature perspective OpenGL did not reach parity with Direct3D 10 until OpenGL 3.2, which among other things introduced geometry shader support.

So if we had to place OpenGL ES 3.0 along a Direct3D continuum, as it’s primarily based on OpenGL 3.1, it would be somewhere between Direct3D feature level 9_3 and feature level 10_0, again primarily due to a lack of geometry shaders. Consequently, this is why some mobile GPUs like Adreno 320 can support OpenGL ES 3.0, but not D3D feature level 10_0. If you only implement the baseline OpenGL ES 3.0 feature set, then it won’t be enough for D3D 10_0 compliance.

Strictly Defined Pixel/Uniform/Frame Buffer Objects

The first major addition to OpenGL ES 3.0 is support for a number of buffer formats, alongside a general tightening up of the buffer format specifications. OpenGL ES 2.0’s buffer format specification had some ambiguity, which lead to GPU vendors sometimes implementing the same buffer format in slightly different ways, which in turn could lead to problems for developers.

OpenGL ES 3.0 also adds support for Uniform Buffer Objects, which is a useful and efficient data buffer type for use with shaders.

GLSL ES 3.0

As is common with most OpenGL releases, OpenGL ES 3.0 includes a new version of the GL ES Shading Langauge, used to program shader effects. The primary addition for GLSL ES 3.0 is full support for 32bit integer and 32bit floating point (i.e. full precision) data types and operations.  Previously only lower precisions were supported, which are easier to compute (it takes less hardware and less memory), but as shader complexity increases the relatively large precision errors become even larger.

GLSL ES 3.0 has also seen some syntax and feature tweaks to make it more like desktop OpenGL. This doesn’t change the fact that only OpenGL 4.3 is a complete superset of OpenGL ES 3.0, but it makes it easier for developers used to desktop GLSL to work on GLSL ES and vice versa.

Occlusion Queries and Geometry Instancing

While OpenGL ES 3.0 doesn’t get geometry shaders, it does get several features to help with geometry in general. Chief among these are the addition of occlusion queries and geometry instancing. Occlusion queries allows for fast hardware testing of whether an object’s pixels are blocking (occluding) another object, which is helpful for quickly figuring out whether something can be skipped because it’s occluded. Meanwhile geometry instancing allows the hardware to draw the same object multiple times while only requiring the complete object to be submitted to the rendering pipeline once. This makes trees and other common objects easier for the CPU to set up as it doesn’t need to keep resubmitting the entire object in different locations.

Numerous Texture Features

OpenGL ES 3.0 adds support for a number of texture features; in fact it’s far too many to break down. The big additions are support for floating point textures (to go with the aforementioned FP32 support), 3D textures, depth textures, non-power-of-two textures, and 1 & 2 channel textures (R & R/G).

Multiple Render Targets

Multiple Render Target support allows the GPU to render to multiple textures at once. Simply put, the significance of this feature is that it’s necessary for practical real-time deferred rendering.

MSAA Render To Texture

When rendering to a texture, special consideration must be taken for anti-aliasing, which on earlier hardware generations is only available when run against the framebuffer. OpenGL ES 3.0 will add support for MSAA’d rendering to a texture.

Standardized Texture Compression Format: ETC

Wrapping up our look at OpenGL ES 3.0’s features, we have texture compression. One of the big problems for OpenGL for a number of years was that it didn’t have a standardized texture compression format. In the desktop space the earliest and most common texture compression format is S3TC, which is not available royalty-free, and as such cannot be a part of the core standard (instead only available as an extension). This is a problem that carried over to OpenGL ES, which led to vendors implementing their own incompatible texture compression standards. Among the major OpenGL ES GPUs, S3TC, PVRTC, ETC, and ATITC are the most common texture compression formats.

Because there isn’t a standard texture compression format in OpenGL ES 2.0, developers have to pack their textures multiple times for different hardware, which takes up time and more importantly space. This is a particular problem for Android developers since the platform supports multiple GPUs (versus the PowerVR-only iOS).

For OpenGL ES 3.0, Ericsson has offered up their ETC family of texture compression algorithms on a royalty free basis, allowing Khronos to implement a standard texture compression format and thereby over time resolving the issue of having multiple texture compression formats. Compared to where Khronos eventually wants to go ETC is somewhat dated at this point in time – it only offers 6:1 compression for RGB and 4:1 compression for RGBA – but it will get the job done for now.

For Khronos this is a huge breakthrough since texture compression is even more important on mobile devices than it is desktops, due to the much tighter memory bandwidth requirements. This also allows Khronos to evangelize texture compression to developers who had previously been shying away from using texture compression because of the aforementioned issues.

At the same time it will be interesting to see how developers adopt ETC. Just because it’s a standard doesn’t mean it has to be used, and while we can’t imagine Android developers not using it once OpenGL ES 3.0 is the baseline for applications, Apple bears keeping an eye on. As a PowerVR-only shop they have used PVRTC since day one, and so long as they don’t change to another brand of GPUs they wouldn’t need to actually back ETC.

Of course ETC isn’t the only texture compression format in the pipeline. Khronos is also working on the very interesting ASTC, which we’ll get to in a bit.



OpenGL 4.3 Specification Also Released

Alongside OpenGL ES 3.0, desktop OpenGL is also being iterated upon today with the launch of OpenGL 4.3. Unlike OpenGL ES this is only a much smaller point update. This is not to say that it’s unimportant, but it will bring a smaller list of new features to the table.

At the same time though, this also marks what may potentially be an interesting inflection point for OpenGL gaming on the desktop. On Windows OpenGL usage has never been lower; the only AAA game engine still based on OpenGL is id Tech 5, which with the termination of licensing by id is only used internally. Khronos of course would like to change that, so when Valve Software says that they’re going to be porting Source over to Linux (thereby increasing the audience for OpenGL games) it’s of quite some interest to Khronos. At the same time desktop OpenGL still has a long way to go to recapture its glory days in the late 90’s and early 2000’s, so Khronos is doing what they can to spur that on.

OpenGL ES 3.0 Superset & ETC Texture Compression

Moving on to looking at OpenGL 4.3’s features, unsurprisingly, one of the big additions to OpenGL 4.3 is to add the necessary features to make it a proper superset of OpenGL ES 3.0. One of Khronos’s missteps with OpenGL ES 2.0 was that it wasn’t until OpenGL 4.1 in 2010 that desktop OpenGL become a proper superset of OpenGL ES 2.0, which they’re rectifying this time around by launching OpenGL ES and its equivalent OpenGL at the same time.

Because OpenGL ES 3.0 is largely taken from desktop OpenGL in the first place, there aren’t a ton of changes due to this. But there is one: texture compression. The aforementioned standardization around ETC applies to both OpenGL ES and OpenGL, which means that starting with OpenGL 4.3, desktop OpenGL will have a standard texture compression format.

Practically speaking, this won’t make a huge difference to desktop developers right now. Because S3TC is a required part of the Direct3D specification and all desktop GPUs support Direct3D, S3TC has been a de-facto OpenGL standard for nearly 10 years now. And because few developers will target OpenGL 4.3 right away, that won’t change. But this means that developers targeting 4.3 do finally have a choice in texture compression, and developers doing cross-platform development with OpenGL ES can use the same texture compression format in both cases.

It’s worth noting though that just because a GPU “supports” ETC doesn’t mean it has hardware support. NVIDIA has told us that they’ll be 4.3 compliant, but they’re handling ETC by decompressing the texture in their drivers before sending it over to the GPU in an uncompressed format, and while AMD wasn’t able to get back to us in time it’s almost certainly the same story over there. For ports of OpenGL ES games this isn’t going to be a problem (dGPUs have plenty of high-bandwidth memory), but it means S3TC will remain the de-facto standard desktop OpenGL texture compression format for now.

OpenGL Compute Shaders

Moving on, while OpenGL ES 3.0 compatibility is a big deal for OpenGL, it’s actually not the biggest feature addition for OpenGL 4.3. For that we turn to compute shaders.

As a bit of background, when meaningful compute functionality was introduced for GPUs, Microsoft and Khronos went in two separate directions. Khronos of course created OpenCL, a full-featured ANSI C based API for compute. Microsoft on the other hand introduced compute shaders, which was a special class of HLSL designed for compute. OpenCL is far more flexible, but flexibility has its price. Specifically, implementing compute functionality in OpenGL games often wasn’t as easy as the equivalent functionality using a Direct3D compute shader, and the overhead of OpenCL limited performance.

The end result is that Khronos has decided to implement compute shaders in OpenGL in order to bridge this gap. OpenCL of course remains as Khronos’s premiere compute API for both stand-alone compute applications and OpenGL games/applications that need the full flexibility, for but games that don’t need that level of flexibility and only want to run compute work on a GPU there is now another option.

Like Direct3D’s compute shader functionality, OpenGL’s compute shader functionality is geared towards relatively simple pixel operations, where approaching an algorithm in a compute manner (instead of a graphics manner) allows for faster execution. The compute shaders themselves will be written in GLSL rather than C, underscoring the fact that this is an extension of OpenGL’s shading framework rather than their compute framework. The target for this functionality will be games and other applications which perform compute “close to the pixel”, taking advantage of the faster shared memory and greater thread count that compute shaders offer.

Since OpenGL’s compute shader functionality is being spurred on by Direct3D, it should come as no surprise that OpenGL’s compute shaders are going to be a very close implementation of Direct3D’s compute shaders. Specifically, OpenGL’s compute shader functionality is being advertised by Khronos as matching D3D11’s compute shader functionality. The differences between HLSL and GLSL mean that there’s no straightforward portability, but it underscores the fact that this is the OpenGL analog of D3D’s compute shader functionality.

New Texture & Buffer Features

Moving on, OpenGL 4.3 will also be introducing some new texture and buffer features. On the texture side of things, one new feature will be texture views, which allow for a texture to be “viewed” in a different way by interpreting its results in a different manner, all without needing to duplicate the texture for modification. As for buffers, 4.3 introduces support for reading and writing to very large buffers across all shader types and all stages. The idea behind this is that it’s an efficient way for those new compute shaders to communicate with the graphics pipeline, given the large amount of data that can be in flight with a compute shader.

Wrapping things up, for some time now Khronos has been working on bringing OpenGL into alignment with Direct3D in order to close the feature and developer gap that has been created between the two. As Khronos correctly notes, with the addition of compute shader functionality OpenGL is now a true superset of Direct3D. If desktop OpenGL is going to see a resurgence in the next few years, it’s now in a far better position to pull that off than it was in before.



Adaptive Scalable Texture Compression

As we’ve noted in our rundowns of OpenGL and OpenGL ES, the inclusion of ETC texture compression support as part of the core OpenGL standards has finally given OpenGL a standard texture compression format after a number of years. At the same time however, the ETC format itself is approaching several years old, and not unlike S3TC it’s only designed for a limited number of cases. So while Khronos has ETC right now, in the future they want better texture compression and are now taking the first steps to make that happen.

The reward at the end of that quest is Adaptive Scalable Texture Compression (ASTC), a new texture compression format first introduced by ARM in late in 2011. ASTC was the winning proposal in Khronos's search for a next-generation texture compression format, with the ARM/AMD bloc beating out NVIDIA and their ZIL proposal.

As the winning proposal in that search, if all goes according to plan ASTC will eventually become OpenGL and OpenGL ES’s mandatory next generation texture compression algorithm. In the meantime Khronos is introducing it as an optional feature of OpenGL ES 3.0 and OpenGL 4.3 in order to solicit feedback from hardware and software developers. Only once all parties are satisfied with ASTC to the point that it’s ready to be implemented into hardware can it meaningfully be moved into the core OpenGL specifications.

So what makes ASTC next-generation anyhow? Since the introduction of S3TC in the 90s, various parties have been attempting to improve on texture compression with limited results. In the Direct3D world where S3TC is standard, we’ve seen Microsoft add specialized formats for normal maps and other texture types that are not color maps, but only relatively recently did they add another color map compression method with BC7. BC7 in turn is a high quality but lower compression ratio algorithm that solves the gradient issues S3TC faces, but for a 24bit RGB texture it’s only a 3:1 compression ratio versus 6:1 for S3TC (32bit RGBA fares better; both are 4:1).


ASTC Image Quality Comparison: Original, 4x4 (8bpp), 6x6 (3.56bpp), & 8x8 (2bpp) block size

Meanwhile in the mobile space we’ve seen the industry’s respective GPU manufacturers create their own texture compression formats to get around the fact that S3TC is not royalty free (and as such can’t be included in OpenGL). And while Imagination Technologies in particular has an interesting method in PVRTC that unlike the other formats is not block based – and thereby can offer a 2bpp (16:1) compression ratio – it has its own pros and cons. Then of course there’s the matter trying to convince holders of these compression methods to freely license them for inclusion in OpenGL, when S3/VIA has over the years made a tidy profit off of S3TC’s inclusion in Direct3D.

The end result is that the industry is ripe for a royalty free next generation texture compression format, and ARM + NVIDIA intend to deliver on that with the backing of Khronos.

While ASTC is another block based texture compression format, it does have some very interesting functionality that pushes it beyond S3TC or any other previous texture compression format. ASTC’s primary trick is that unlike other block based texture compression formats, it is not based around a fixed size 4x4 texel block. Rather ASTC has a fixed size of 128bits (16 bytes) with a variable size block ranging from 4x4 to 12x12, in effect offering RGBA compression ratios from 8bpp (4:1) all the way up to an incredible 0.89bpp (36:1). The larger block size not only allows for higher compression ratios, but it also offers developers a much finer grained range of compression ratios to work with compared to previous texture compression formats.

Block Size Bits Per Px Comp. Ratio
4x4 8.00 4:1
5x4 6.40 5:1
5x5 5.12 6.25:1
6x5 4.27 7.5:1
6x6 3.56 9:1
8x5 3.20 10:1
8x6 2.67 12:1
10x5 2.56 12.5:1
10x6 2.13 15:1
8x8 2.00 16:1
10x8 1.60 20:1
10x10 1.28 25:1
12x10 1.07 30:1
12x12 0.89 36:1

Alongside a wide range of compression ratios for traditional color maps, ASTC would also support additional types of textures. With support for normal maps ASTC would also replace other texture compression formats as the preferred format for normal maps, and it would also be the first texture compression format with support for 3D textures. Even HDR textures are on the table, though for the time being Khronos is starting with only support for regular (LDR) textures. With any luck, ASTC will become the all-uses texture compression format for OpenGL.

As you can imagine, Khronos is rather excited about the potential for ASTC. With their strong position in the mobile graphics space they need to provide paths to improving mobile graphics quality and performance amidst the reality of Moore’s Law and the other realities of SoC manufacturing. Specifically, mobile GPU bandwidth isn’t expected to grow by leaps and bounds like shading performance, meaning Khronos and its members need to do more with what amounts to less memory bandwidth. For Khronos texture compression is key, as ASTC will allow developers to pack in smaller textures and/or improve their texture quality without using larger textures, thereby making the most of the limited memory bandwidth available.

Of course the desktop world also stands to benefit. ARM’s objective PSNR data for ASTC has it performing far better than S3TC at the same compression ratio, which would bring higher quality texture compression to the desktop at the same texture size. And since ASTC is being developed by Khronos members and released royalty free, at this point there’s no reason to believe that Direct3D couldn’t adopt it in the future, especially since all major Direct3D GPUs also support OpenGL in the first place, so the requisite hardware would already be in place.

With all of that said, there’s still quite a bit of a distance to go between where ASTC is at today and where Khronos would like it to end up. For the time being ASTC needs to prove itself as an optional extension, so that GPU vendors are willing to implement it in hardware. It’s only after it becomes a hardware feature that ASTC can be widely adopted by developers.



OpenCL Gets A CLU

Finally, on top of their OpenGL announcements Khronos is also making several smaller announcements related to some of their other initiatives. As most of these announcements fall outside of our traditional coverage areas we won’t go into great detail here, but there’s one item that falls under the domain of OpenCL: CLU.

For their SIGGRAPH OpenCL side project Khronos is announcing CLU, the Computing Language Utility (ed: Flynn Lives). Since the release of the first OpenCL specification in 2008, Khronos has been looking to expand upon OpenCL both with regards to features (resulting in additional iterations of the standard) and in improving the development tools for OpenCL. CLU falls under this latter effort.

In a nutshell, CLU is essentially a combination of a lightweight API and a template library intended to greatly simplify OpenCL development prototyping. As OpenCL was designed to be a relatively low level language (based on ANSI C), it takes quite a bit of effort to write a program from scratch due to all the facets of OpenCL that must be dealt with. CLU in turn provides a lightweight API that sits on top of a number of templates, the whole of which is designed to abstract away some of that complexity, allowing developers to get started more easily.

Ultimately developers of complex and high-performnace programs will still want to dive into the deepest layers of OpenCL. But for teaching and early prototyping Khronos believes this will significantly improve the OpenCL experience beyond the current paradigm of getting thrown into the deep end of the pool. For teaching beginners in particular, Khronos is hoping to get the process of writing their first OpenCL program down to under an hour as opposed to the much longer period of time it currently takes most beginners.

Finally, like many of their efforts, Khronos is looking to leverage the wider open source community to further improve CLU. Like the OpenGL Utility Toolkit (GLUT), the usefulness of CLU is based on how much functionality is implemented into the utility, and unlike GLUT it’s open source (under an Intel license), making it easy to fork and extend the utility.

Log in

Don't have an account? Sign up now