for Ubuntu 13.10:

```
sudo add-apt-repository ppa:henrythasler/pwnsensor
sudo apt-get update
sudo apt-get install pwnsensor
```

pwnsensor can then be found under the *system* menu.

Source code and additional information are available on GitHub.

Here's how it goes:

```
> cd /usr/src
> wget https://www.openssl.org/source/openssl-1.0.1e.tar.gz
> tar -xvzf openssl-1.0.1e.tar.gz
> cd openssl-1.0.1e/
> ./config no-threads shared --prefix=/usr
> make depend
> make
> make test
> make install
```

If you get errors during the "make depend" step complaining that gcc is missing, use yast2 to install gcc and its dependencies.

When it's finished (compiling and testing take several minutes), verify the installation with:

```
> openssl version
OpenSSL 1.0.1e 11 Feb 2013
```

Now you can update your Apache config to use the new libraries. Make sure you restart Apache afterwards.

Now let's update the OpenSSH server (sshd) as well, so that it uses the OpenSSL libraries we have just built. You may also need the "zlib-devel" and "tcpd-devel" packages; use yast2 to install them before you begin. With tcp-wrappers support you can use DenyHosts with your OpenSSH server to prevent brute-force attacks.

```
> cd /usr/src
> wget http://ftp.spline.de/pub/OpenBSD/OpenSSH/portable/openssh-6.2p2.tar.gz
> tar xzvf openssh-6.2p2.tar.gz
> cd openssh-6.2p2/
> ./configure --prefix=/usr --sysconfdir=/etc/ssh --with-tcp-wrappers
> make
```

I recommend you rename your /etc/ssh directory to something else and let the new ssh version create a fresh one with all new settings and keys. You can modify the new config file afterwards.

```
> mv /etc/ssh /etc/ssh.old
> make install
```

You can check the installed version with:

```
> sshd -?
OpenSSH_6.2p2, OpenSSL 1.0.1e 11 Feb 2013
usage: sshd [-46DdeiqTt] [-b bits] [-C connection_spec] [-c host_cert_file]
[-f config_file] [-g login_grace_time] [-h host_key_file]
[-k key_gen_time] [-o option] [-p port] [-u len]
```

I use Maperitive to create my own map, optimized for hiking/biking. Right now, Maperitive uses SRTM-1 and SRTM-3 DEM data (HGT files). A decent description of the SRTM file format can be found at vterrain.org.

The Austrian government (Land Tirol) has published high-resolution DEM data (TIRIS) derived from airborne laser scans (LIDAR) over their territory. This TIRIS data has a resolution of 10m, which is 3-4 times more accurate than other existing SRTM-1 DEM data. The most accurate SRTM-1 data for Tyrol that I have found is available on Jonathan de Ferranti's website, Viewfinder Panoramas.

Here is a comparison (note the detailed rendering of the roads/rivers on the right image):

The goal of this article is to use this data as source for all DEM-related operations like hill-shading or contour lines with Maperitive.

First of all, we need the DEM data for Tyrol. It can be obtained from this page. Download and unzip everything into one folder (skip "Bezirk Innsbruck"; it is a subset of "Bezirk Innsbruck-Land"). There should be 8 files named "*_10m_float.asc".

The TIRIS data comes in Arcinfo ASCII-Grid format, which is more or less a plain text file with height information (details later). Maperitive cannot use it directly, so we need to convert this data to an SRTM-1 HGT file. That's why we need to set up a working GDAL environment. Visit the GDAL website and follow their installation instructions for your platform. On Linux, make sure you also install the *proj* package (including development files). We will need the gdal_translate and gdalwarp tools.

To merge the TIRIS data with an existing SRTM1 tile, grab a copy from the Viewfinder Panoramas website. In this example I will use the N47E011.hgt tile, which covers the region around Innsbruck.

I use an OpenSuse installation within a VirtualBox for the GDAL-Toolkit. You can get an OpenSuse image here. You may also need a shared folder in OpenSuse for data exchange to the host computer. If you have problems setting up your environment, let me know.

That was the most boring part. Now let's have a look at the DEM data itself.

Here are some key facts about the DEM data the Austrian government of Tyrol provides:

- Data format: Arcinfo ASCII-Grid
- Datatype: Float32
- Resolution: 10m x 10m.
- Projection: Transverse Mercator (EPSG 31254).

See "Informationsblatt.txt" provided with every zip-file for more details.

To get an idea how impressive (meaning accurate) the TIRIS data is, you should have a look at it first (Scale: 1 pixel=10m):

To visualize the raw DEM-data I recommend GlobalMapper. The unregistered ("free") version can load and display most DEM formats including the Arc ASCII Grid. However, the "free" version cannot save any data. But that's no problem, since we have GDAL at hand.

You can also give VTBuilder a try. It has similar visualization functionality and is also able to save data. However, I was not able to convert the TIRIS data into SRTM format without offset errors in the resulting SRTM1-HGT file.

This is the complete shell script to convert the TIRIS DEM data into SRTM1-HGT:

```
# create backup of original SRTM1-HGT file
cp N47E011.hgt N47E011.old
# convert original SRTM1 to TIFF
gdal_translate N47E011.hgt N47E011.tif
# convert and merge all TIRIS data into one TIFF
gdalwarp -s_srs EPSG:31254 -t_srs EPSG:4326 -srcnodata -9999 -r bilinear \
  -overwrite -te 11.0 47.0 12.0 48.0 -ts 3601 3601 -order 3 -et 0.0 \
  -wt Float32 -wo SAMPLE_STEPS=100 -dstnodata none *10m_float.asc N47E011_tiris.tif
# merge TIRIS-TIFF into old SRTM1 data
gdalwarp -s_srs EPSG:4326 -t_srs EPSG:4326 -r bilinear -order 3 -et 0.0 \
  -wt Float32 -wo SAMPLE_STEPS=100 N47E011_tiris.tif N47E011.tif
# convert result back to SRTM1 (overwrite original file)
gdal_translate -of SRTMHGT N47E011.tif N47E011.hgt
```

Let's go through the script step by step:

```
cp N47E011.hgt N47E011.old
```

This creates a backup copy of the original SRTM1 data, in case you want to use the original file later for another purpose.

```
gdal_translate N47E011.hgt N47E011.tif
```

Use gdal_translate to convert the original SRTM1-HGT data into GeoTIFF format. We need this to merge the new data derived from the TIRIS DEM into the existing SRTM data.

```
gdalwarp -s_srs EPSG:31254 -t_srs EPSG:4326 -srcnodata -9999 -dstnodata none \
  -overwrite -te 11.0 47.0 12.0 48.0 -ts 3601 3601 \
  -r bilinear -order 3 -et 0.0 -wt Float32 -wo SAMPLE_STEPS=100 \
  *10m_float.asc N47E011_tiris.tif
```

With this mighty command, you grab every TIRIS file (*10m_float.asc) and convert it into a single GeoTIFF. During this process, we define the source and target projection and tell the converter to skip non-existent data, represented as -9999 m above sea level in the source files. In case there is already an existing GeoTIFF, we overwrite it. The command also needs to know which SRTM tile we are going to end up in (lower-left/upper-right window) and what target resolution we want. Note that the resolution of SRTM1 data is 3601x3601; we can't do better here. That is also the reason why we need to rescale the input data with a bilinear 3rd-order filter. The options -order 3, -et 0.0, -wt Float32 and -wo SAMPLE_STEPS=100 should set the reprojection quality to high, but I'm not sure they do anything at all.

Ok, here comes the tricky part:

```
gdalwarp -s_srs EPSG:4326 -t_srs EPSG:4326 -r bilinear -order 3 -et 0.0 \
  -wt Float32 -wo SAMPLE_STEPS=100 N47E011_tiris.tif N47E011.tif
```

It's almost the same as the command before, except it isn't: note that the -overwrite option is not set. This command merges the current SRTM1 file (the TIFF we created with the second command from the HGT file) with our converted TIRIS data. The new (= better) data is copied into the old file where available; other parts remain the same as in the original file:

The last command converts the TIFF back to an SRTM1-HGT file:

```
gdal_translate -of SRTMHGT N47E011.tif N47E011.hgt
```

That's it. We have our updated HGT file based on high-precision LIDAR DEM data.

The above script is just an example and depends on hard-coded coordinates and tiles. Therefore I will create a more complete script that covers all of the affected tiles for Tyrol (coming soon).

You can download the generated HGT file for Innsbruck area here: N47E011.zip

Today I'm going to introduce emulated quadruple floating-point precision (quad-single precision) with GLSL. I will use the well-known mandelbrot set to demonstrate the concept. The source code is available as usual.

The last posts used double precision (hardware and emulated) to calculate mandelbrot sets down to [insert] units per pixel in the complex plane. To push that limit even further, I will emulate quadruple precision (quad-precision) using four single-precision variables (quad-single). The original concept and the Fortran/C++ source code were developed by Yozo Hida, Xiaoye S. Li and David H. Bailey at Berkeley; it is available as the QD library. I just need to convert and adapt this code for GLSL...

The concept is basically the same as the one I described in one of my previous posts about double-single emulation: for quad precision you just add two more floats to represent your actual value. However, the arithmetic operations become much more difficult to perform, because you must take care of all the carry-over between the parts. This results in a lot of involved and expensive functions just to perform a simple addition. The technical details are described in this paper (PDF).

The Qt application features only minor changes. To improve precision on this end I use long doubles for all variables that eventually end up in the shader.

To make things easier in the shader, it receives two vec2 elements instead of four separate floats, set with the glUniform2fv function. That's why we need a handle for this function:

```
// define prototype
typedef void (APIENTRYP PFNGLUNIFORM2FVPROC) (GLint location, GLsizei count, const GLfloat *value);
PFNGLUNIFORM2FVPROC glUniform2fv;
// get handle from OpenGL-context
glUniform2fv = (PFNGLUNIFORM2FVPROC) GLFrame->context()->getProcAddress("glUniform2fv");
```

It is used as follows:

```
float vec2[2];
vec2[0] = (float)xpos;
vec2[1] = xpos - (double)vec2[0];
glUniform2fv(glGetUniformLocation(ShaderProgram->programId(), "qs_cx"), 1, vec2); // count = 1: one vec2 uniform
```

I guess the shader has become somewhat complex. Nevertheless, you can have a look at it and maybe adapt or extend it for your own purposes. The variable names are the same as in the original C++ code, and the comments at the end of the lines show the original code (useful for debugging/comparing).

```
#version 120
// emulated quadruple precision GLSL library
// created by Henry Thasler (thasler.org/blog)
// based on the QD library (http://crd-legacy.lbl.gov/~dhbailey/mpdist/)
uniform int iterations;
uniform float frame;
uniform float radius;
uniform vec2 qs_z;
uniform vec2 qs_w;
uniform vec2 qs_h;
uniform vec2 qs_cx;
uniform vec2 qs_cy;
// inline double quick_two_sum(double a, double b, double &err)
vec2 quick_2sum(float a, float b)
{
float s = a + b; // double s = a + b;
return vec2(s, b-(s-a)); // err = b - (s - a);
}
/* Computes fl(a+b) and err(a+b). */
// inline double two_sum(double a, double b, double &err)
vec2 two_sum(float a, float b)
{
float v,s,e;
s = a+b; // double s = a + b;
v = s-a; // double bb = s - a;
e = (a-(s-v))+(b-v); // err = (a - (s - bb)) + (b - bb);
return vec2(s,e);
}
vec2 split(float a)
{
float t, hi;
t = 8193. * a;
hi = t - (t-a);
return vec2(hi, a-hi);
}
vec3 three_sum(float a, float b, float c)
{
vec2 tmp;
vec3 res;// = vec3(0.);
float t1, t2, t3;
tmp = two_sum(a, b); // t1 = qd::two_sum(a, b, t2);
t1 = tmp.x;
t2 = tmp.y;
tmp = two_sum(c, t1); // a = qd::two_sum(c, t1, t3);
res.x = tmp.x;
t3 = tmp.y;
tmp = two_sum(t2, t3); // b = qd::two_sum(t2, t3, c);
res.y = tmp.x;
res.z = tmp.y;
return res;
}
//inline void three_sum2(double &a, double &b, double &c)
vec3 three_sum2(float a, float b, float c)
{
vec2 tmp;
vec3 res;// = vec3(0.);
float t1, t2, t3; // double t1, t2, t3;
tmp = two_sum(a, b); // t1 = qd::two_sum(a, b, t2);
t1 = tmp.x;
t2 = tmp.y;
tmp = two_sum(c, t1); // a = qd::two_sum(c, t1, t3);
res.x = tmp.x;
t3 = tmp.y;
res.y = t2 + t3; // b = t2 + t3;
return res;
}
vec2 two_prod(float a, float b)
{
float p, e;
vec2 va, vb;
p=a*b;
va = split(a);
vb = split(b);
e = ((va.x*vb.x-p) + va.x*vb.y + va.y*vb.x) + va.y*vb.y;
return vec2(p, e);
}
vec4 renorm(float c0, float c1, float c2, float c3, float c4)
{
float s0, s1, s2 = 0.0, s3 = 0.0;
vec2 tmp;
// if (QD_ISINF(c0)) return;
tmp = quick_2sum(c3,c4); // s0 = qd::quick_two_sum(c3, c4, c4);
s0 = tmp.x;
c4 = tmp.y;
tmp = quick_2sum(c2,s0); // s0 = qd::quick_two_sum(c2, s0, c3);
s0 = tmp.x;
c3 = tmp.y;
tmp = quick_2sum(c1,s0); // s0 = qd::quick_two_sum(c1, s0, c2);
s0 = tmp.x;
c2 = tmp.y;
tmp = quick_2sum(c0,s0); // c0 = qd::quick_two_sum(c0, s0, c1);
c0 = tmp.x;
c1 = tmp.y;
s0 = c0;
s1 = c1;
tmp = quick_2sum(c0,c1); // s0 = qd::quick_two_sum(c0, c1, s1);
s0 = tmp.x;
s1 = tmp.y;
if (s1 != 0.0) {
tmp = quick_2sum(s1,c2); // s1 = qd::quick_two_sum(s1, c2, s2);
s1 = tmp.x;
s2 = tmp.y;
if (s2 != 0.0) {
tmp = quick_2sum(s2,c3); // s2 = qd::quick_two_sum(s2, c3, s3);
s2 = tmp.x;
s3 = tmp.y;
if (s3 != 0.0)
s3 += c4;
else
s2 += c4;
} else {
tmp = quick_2sum(s1,c3); // s1 = qd::quick_two_sum(s1, c3, s2);
s1 = tmp.x;
s2 = tmp.y;
if (s2 != 0.0){
tmp = quick_2sum(s2,c4); // s2 = qd::quick_two_sum(s2, c4, s3);
s2 = tmp.x;
s3 = tmp.y;}
else{
tmp = quick_2sum(s1,c4); // s1 = qd::quick_two_sum(s1, c4, s2);
s1 = tmp.x;
s2 = tmp.y;}
}
} else {
tmp = quick_2sum(s0,c2); // s0 = qd::quick_two_sum(s0, c2, s1);
s0 = tmp.x;
s1 = tmp.y;
if (s1 != 0.0) {
tmp = quick_2sum(s1,c3); // s1 = qd::quick_two_sum(s1, c3, s2);
s1 = tmp.x;
s2 = tmp.y;
if (s2 != 0.0){
tmp = quick_2sum(s2,c4); // s2 = qd::quick_two_sum(s2, c4, s3);
s2 = tmp.x;
s3 = tmp.y;}
else{
tmp = quick_2sum(s1,c4); // s1 = qd::quick_two_sum(s1, c4, s2);
s1 = tmp.x;
s2 = tmp.y;}
} else {
tmp = quick_2sum(s0,c3); // s0 = qd::quick_two_sum(s0, c3, s1);
s0 = tmp.x;
s1 = tmp.y;
if (s1 != 0.0){
tmp = quick_2sum(s1,c4); // s1 = qd::quick_two_sum(s1, c4, s2);
s1 = tmp.x;
s2 = tmp.y;}
else{
tmp = quick_2sum(s0,c4); // s0 = qd::quick_two_sum(s0, c4, s1);
s0 = tmp.x;
s1 = tmp.y;}
}
}
return vec4(s0, s1, s2, s3);
}
vec4 renorm4(float c0, float c1, float c2, float c3)
{
float s0, s1, s2 = 0.0, s3 = 0.0;
vec2 tmp;
// if (QD_ISINF(c0)) return;
tmp = quick_2sum(c2,c3); // s0 = qd::quick_two_sum(c2, c3, c3);
s0 = tmp.x;
c3 = tmp.y;
tmp = quick_2sum(c1,s0); // s0 = qd::quick_two_sum(c1, s0, c2);
s0 = tmp.x;
c2 = tmp.y;
tmp = quick_2sum(c0,s0); // c0 = qd::quick_two_sum(c0, s0, c1);
c0 = tmp.x;
c1 = tmp.y;
s0 = c0;
s1 = c1;
if (s1 != 0.0) {
tmp = quick_2sum(s1,c2); // s1 = qd::quick_two_sum(s1, c2, s2);
s1 = tmp.x;
s2 = tmp.y;
if (s2 != 0.0){
tmp = quick_2sum(s2,c3); // s2 = qd::quick_two_sum(s2, c3, s3);
s2 = tmp.x;
s3 = tmp.y;}
else{
tmp = quick_2sum(s1,c3); // s1 = qd::quick_two_sum(s1, c3, s2);
s1 = tmp.x;
s2 = tmp.y;}
} else {
tmp = quick_2sum(s0,c2); // s0 = qd::quick_two_sum(s0, c2, s1);
s0 = tmp.x;
s1 = tmp.y;
if (s1 != 0.0){
tmp = quick_2sum(s1,c3); // s1 = qd::quick_two_sum(s1, c3, s2);
s1 = tmp.x;
s2 = tmp.y;}
else{
tmp = quick_2sum(s0,c3); // s0 = qd::quick_two_sum(s0, c3, s1);
s0 = tmp.x;
s1 = tmp.y;}
}
return vec4(s0, s1, s2, s3);
}
vec3 quick_three_accum(float a, float b, float c)
{
vec2 tmp;
float s;
bool za, zb;
tmp = two_sum(b, c); // s = qd::two_sum(b, c, b);
s = tmp.x;
b = tmp.y;
tmp = two_sum(a, s); // s = qd::two_sum(a, s, a);
s = tmp.x;
a = tmp.y;
za = (a != 0.0);
zb = (b != 0.0);
if (za && zb)
return vec3(a,b,s);
if (!zb) {
b = a;
a = s;
} else {
a = s;
}
return vec3(a,b,0.);
}
// inline qd_real qd_real::ieee_add(const qd_real &a, const qd_real &b)
vec4 qs_ieee_add(vec4 _a, vec4 _b)
{
vec2 tmp=vec2(0.);
vec3 tmp3=vec3(0.);
int i, j, k;
float s, t;
float u, v; // double-length accumulator
float x[4] = float[4](0.0, 0.0, 0.0, 0.0);
float a[4], b[4];
a[0] = _a.x;
a[1] = _a.y;
a[2] = _a.z;
a[3] = _a.w;
b[0] = _b.x;
b[1] = _b.y;
b[2] = _b.z;
b[3] = _b.w;
i = j = k = 0;
if (abs(a[i]) > abs(b[j]))
u = a[i++];
else
u = b[j++];
if (abs(a[i]) > abs(b[j]))
v = a[i++];
else
v = b[j++];
tmp = quick_2sum(u,v); // u = qd::quick_two_sum(u, v, v);
u = tmp.x;
v = tmp.y;
while (k < 4) {
if (i >= 4 && j >= 4) {
x[k] = u;
if (k < 3)
x[++k] = v;
break;
}
if (i >= 4)
t = b[j++];
else if (j >= 4)
t = a[i++];
else if (abs(a[i]) > abs(b[j])) {
t = a[i++];
} else
t = b[j++];
tmp3 = quick_three_accum(u,v,t) ; // s = qd::quick_three_accum(u, v, t);
u = tmp3.x;
v = tmp3.y;
s = tmp3.z;
if (s != 0.0) {
x[k++] = s;
}
}
// add the rest.
for (k = i; k < 4; k++)
x[3] += a[k];
for (k = j; k < 4; k++)
x[3] += b[k];
// qd::renorm(x[0], x[1], x[2], x[3]);
// return qd_real(x[0], x[1], x[2], x[3]);
return renorm4(x[0], x[1], x[2], x[3]);
}
// inline qd_real qd_real::sloppy_add(const qd_real &a, const qd_real &b)
vec4 qs_sloppy_add(vec4 a, vec4 b)
{
float s0, s1, s2, s3;
float t0, t1, t2, t3;
float v0, v1, v2, v3;
float u0, u1, u2, u3;
float w0, w1, w2, w3;
vec2 tmp;
vec3 tmp3;
s0 = a.x + b.x; // s0 = a[0] + b[0];
s1 = a.y + b.y; // s1 = a[1] + b[1];
s2 = a.z + b.z; // s2 = a[2] + b[2];
s3 = a.w + b.w; // s3 = a[3] + b[3];
v0 = s0 - a.x; // v0 = s0 - a[0];
v1 = s1 - a.y; // v1 = s1 - a[1];
v2 = s2 - a.z; // v2 = s2 - a[2];
v3 = s3 - a.w; // v3 = s3 - a[3];
u0 = s0 - v0;
u1 = s1 - v1;
u2 = s2 - v2;
u3 = s3 - v3;
w0 = a.x - u0; // w0 = a[0] - u0;
w1 = a.y - u1; // w1 = a[1] - u1;
w2 = a.z - u2; // w2 = a[2] - u2;
w3 = a.w - u3; // w3 = a[3] - u3;
u0 = b.x - v0; // u0 = b[0] - v0;
u1 = b.y - v1; // u1 = b[1] - v1;
u2 = b.z - v2; // u2 = b[2] - v2;
u3 = b.w - v3; // u3 = b[3] - v3;
t0 = w0 + u0;
t1 = w1 + u1;
t2 = w2 + u2;
t3 = w3 + u3;
tmp = two_sum(s1, t0); // s1 = qd::two_sum(s1, t0, t0);
s1 = tmp.x;
t0 = tmp.y;
tmp3 = three_sum(s2, t0, t1); // qd::three_sum(s2, t0, t1);
s2 = tmp3.x;
t0 = tmp3.y;
t1 = tmp3.z;
tmp3 = three_sum2(s3, t0, t2); // qd::three_sum2(s3, t0, t2);
s3 = tmp3.x;
t0 = tmp3.y;
t2 = tmp3.z;
t0 = t0 + t1 + t3;
// qd::renorm(s0, s1, s2, s3, t0);
return renorm(s0, s1, s2, s3, t0); // return qd_real(s0, s1, s2, s3);
}
vec4 qs_add(vec4 _a, vec4 _b)
{
return qs_sloppy_add(_a, _b);
// return qs_ieee_add(_a, _b);
}
vec4 qs_mul(vec4 a, vec4 b)
{
float p0, p1, p2, p3, p4, p5;
float q0, q1, q2, q3, q4, q5;
float t0, t1;
float s0, s1, s2;
vec2 tmp;
vec3 tmp3;
tmp = two_prod(a.x, b.x); // p0 = qd::two_prod(a[0], b[0], q0);
p0 = tmp.x;
q0 = tmp.y;
tmp = two_prod(a.x, b.y); // p1 = qd::two_prod(a[0], b[1], q1);
p1 = tmp.x;
q1 = tmp.y;
tmp = two_prod(a.y, b.x); // p2 = qd::two_prod(a[1], b[0], q2);
p2 = tmp.x;
q2 = tmp.y;
tmp = two_prod(a.x, b.z); // p3 = qd::two_prod(a[0], b[2], q3);
p3 = tmp.x;
q3 = tmp.y;
tmp = two_prod(a.y, b.y); // p4 = qd::two_prod(a[1], b[1], q4);
p4 = tmp.x;
q4 = tmp.y;
tmp = two_prod(a.z, b.x); // p5 = qd::two_prod(a[2], b[0], q5);
p5 = tmp.x;
q5 = tmp.y;
/* Start Accumulation */
tmp3 = three_sum(p1, p2, q0); // qd::three_sum(p1, p2, q0);
p1 = tmp3.x;
p2 = tmp3.y;
q0 = tmp3.z;
/* Six-Three Sum of p2, q1, q2, p3, p4, p5. */
tmp3 = three_sum(p2, q1, q2); // qd::three_sum(p2, q1, q2);
p2 = tmp3.x;
q1 = tmp3.y;
q2 = tmp3.z;
tmp3 = three_sum(p3, p4, p5); // qd::three_sum(p3, p4, p5);
p3 = tmp3.x;
p4 = tmp3.y;
p5 = tmp3.z;
/* compute (s0, s1, s2) = (p2, q1, q2) + (p3, p4, p5). */
tmp = two_sum(p2, p3); // s0 = qd::two_sum(p2, p3, t0);
s0 = tmp.x;
t0 = tmp.y;
tmp = two_sum(q1, p4); // s1 = qd::two_sum(q1, p4, t1);
s1 = tmp.x;
t1 = tmp.y;
s2 = q2 + p5;
tmp = two_sum(s1, t0); // s1 = qd::two_sum(s1, t0, t0);
s1 = tmp.x;
t0 = tmp.y;
s2 += (t0 + t1);
/* O(eps^3) order terms */
s1 += a.x*b.w + a.y*b.z + a.z*b.y + a.w*b.x + q0 + q3 + q4 + q5;
return renorm(p0, p1, s0, s1, s2); // qd::renorm(p0, p1, s0, s1, s2);
}
float ds_compare(vec2 dsa, vec2 dsb)
{
if (dsa.x < dsb.x) return -1.;
else if (dsa.x == dsb.x)
{
if (dsa.y < dsb.y) return -1.;
else if (dsa.y == dsb.y) return 0.;
else return 1.;
}
else return 1.;
}
float qs_compare(vec4 qsa, vec4 qsb)
{
if(ds_compare(qsa.xy, qsb.xy)<0.) return -1.; // if (dsa.x < dsb.x) return -1.;
else if (ds_compare(qsa.xy, qsb.xy) == 0.) // else if (dsa.x == dsb.x)
{
if(ds_compare(qsa.zw, qsb.zw)<0.) return -1.; // if (dsa.y < dsb.y) return -1.;
else if (ds_compare(qsa.zw, qsb.zw) == 0.) return 0.;// else if (dsa.y == dsb.y) return 0.;
else return 1.;
}
else return 1.;
}
float qs_mandel(void)
{
vec4 qs_tx = vec4(gl_TexCoord[0].x, vec3(0.)); // get position of current pixel
vec4 qs_ty = vec4(gl_TexCoord[0].y, vec3(0.));
// initialize complex variable with respect to current position, zoom, ...
vec4 cx = qs_add(qs_add(vec4(qs_cx,0.,0.),qs_mul(qs_tx,vec4(qs_z,0.,0.))),vec4(qs_w,0.,0.));
vec4 cy = qs_add(qs_add(vec4(qs_cy,0.,0.),qs_mul(qs_ty,vec4(qs_z,0.,0.))),vec4(qs_h,0.,0.));
vec4 tmp;
vec4 zx = cx;
vec4 zy = cy;
vec4 two = vec4(2.0, vec3(0.));
vec4 e_radius = vec4(radius*radius, vec3(0.)); // no sqrt available so compare with radius^2 = 2^2 = 2*2 = 4
for(int n=0; n<iterations; n++)
{
tmp = zx;
zx = qs_add(qs_add(qs_mul(zx, zx), -qs_mul(zy, zy)), cx);
zy = qs_add(qs_mul(qs_mul(zy, tmp), two), cy);
if( qs_compare(qs_add(qs_mul(zx, zx), qs_mul(zy, zy)), e_radius)>0.)
{
return(float(n) + 1. - log(log(length(vec2(zx.x, zy.x))))/log(2.)); // http://linas.org/art-gallery/escape/escape.html
}
}
return 0.;
}
void main()
{
float n = qs_mandel();
gl_FragColor = vec4((-cos(0.025*n)+1.0)/2.0,
(-cos(0.08*n)+1.0)/2.0,
(-cos(0.12*n)+1.0)/2.0,
1.0);
}
```

Please note that there are two methods in the qs_add function to add two quad-singles: "sloppy_add", which is faster but less accurate, and "ieee_add" (nice and slow). You can use either of them.

It actually works! New worlds of our mandelbrot set lie ahead. Undiscovered features can be made visible with just a blink of our GPU's eye.

Quad-Single works fine but is really slooooooooow... See for yourself:

CPU: Intel i5-2400 @ 3.1 GHz

GPU: ATI HD4870

Compared with emulated double precision (51 FPS) and hardware-accelerated double precision (154 FPS), the emulated quad precision (6 FPS) is taking its time.

As you may have noticed if you actually tried this demo, zooming and scrolling beyond zoom level 48 is a bit inaccurate. This is due to the limited precision of the variables that the main (Qt) program hands over to the shader. It uses double precision (more precisely: emulated double) and is limited to a minimal step width that is well above the quad-single precision of the shader. This is an issue I'm going to solve in another post (hopefully...).

Quadruple precision is - in terms of computational resources - very expensive to emulate. It is only suitable for real-time applications if you stick to simple calculations.

Eric Bainville has written some code for fp128 (128-bit fixed point numbers). Maybe I can try this with GLSL in the future and see how it performs.

The reduced computing precision on the Qt side (only long doubles are generally available) currently limits what the shader can show. An equal or higher precision than in the shader is required to explore the full depth of the emulated quadruple precision in GLSL. Maybe the quad-single concept can be extended to quad-double.

Download: GLSL_QuadSingleMandel

I reckon that it must be some weird optimization on NVIDIA cards that breaks the emulation code, so I searched for a way to disable these optimizations. After some googling I found an old blog entry by Cyril Crassin (check out his new blog if you are into 3D graphics) which helped me out with this problem.

To turn off the NVIDIA optimizations, I had to add the following line to the shader code:

`#pragma optionNV(fastprecision off)`

Now it should work fine with NVIDIA GPUs. Let me know if there are still some issues.

Download updated version: GLSL_EmuMandel

I have also found a handy tool called NVEmulate that lets you examine the GLSL compiler output and analyze GLSL assembly on NVIDIA GPUs.

Jethro provided the division function:

```
vec2 ds_div(vec2 dsa, vec2 dsb)
{
    vec2 dsc;
    float c11, c21, c2, e, s1, s2, t1, t2, t11, t12, t21, t22;
    float a1, a2, b1, b2, cona, conb, split = 8193.;

    s1 = dsa.x / dsb.x;
    cona = s1 * split;
    conb = dsb.x * split;
    a1 = cona - (cona - s1);
    b1 = conb - (conb - dsb.x);
    a2 = s1 - a1;
    b2 = dsb.x - b1;
    c11 = s1 * dsb.x;
    c21 = (((a1 * b1 - c11) + a1 * b2) + a2 * b1) + a2 * b2;
    c2 = s1 * dsb.y;
    t1 = c11 + c2;
    e = t1 - c11;
    t2 = ((c2 - e) + (c11 - (t1 - e))) + c21;
    t12 = t1 + t2;
    t22 = t2 - (t12 - t1);
    t11 = dsa.x - t12;
    e = t11 - dsa.x;
    t21 = ((-t12 - e) + (dsa.x - (t11 - e))) + dsa.y - t22;
    s2 = (t11 + t21) / dsb.x;
    dsc.x = s1 + s2;
    dsc.y = s2 - (dsc.x - s1);
    return dsc;
}
```

To improve the performance of our mandelbrot project a little further, I will show how to use real, hardware-accelerated double-precision arithmetic.

As we use OpenGL and GLSL for our shaders, we can use the OpenGL functions defined in the "GL_ARB_gpu_shader_fp64" extension (see specification for details). This requires OpenGL 3.2 and GLSL 1.5 but you can try anyway...

The shader code is similar to the single-float version from the first part of this series, except that I use double as the data type. I also use double-precision vector elements (dvec2) to make the code shorter. Please note that for some reason the length() function does not work with double-precision vars (at least on my GPU, an ATI HD4870):

```
uniform int iterations;
uniform int frame;
uniform float radius;
uniform dvec2 d_c;
uniform dvec2 d_s;
uniform double d_z;
float dmandel(void)
{
dvec2 c = d_c + dvec2(gl_TexCoord[0].xy)*d_z + d_s;
dvec2 z = c;
for(int n=0; n<iterations; n++)
{
z = dvec2(z.x*z.x - z.y*z.y, 2.0lf*z.x*z.y) + c;
if(length(vec2(z.x,z.y)) > radius)
{
return(float(n) + 1. - log(log(length(vec2(z.x,z.y))))/log(2.)); // http://linas.org/art-gallery/escape/escape.html
}
}
return 0.;
}
void main()
{
float n = dmandel();
gl_FragColor = vec4((-cos(0.025*n)+1.0)/2.0,
(-cos(0.08*n)+1.0)/2.0,
(-cos(0.12*n)+1.0)/2.0,
1.0);
}
```

To access the shader variables from the main program we need functions suitable for double-precision vars. OpenGL provides functions called glUniform1dv, glUniform2dv and so on. Unfortunately the Qt implementation does not (yet) support these functions in its libraries. To overcome this we have to grab the handles to these functions directly from OpenGL. So let's have a look at glext.h, where you will find the definitions of these functions. We add them to our project:

```
typedef void (APIENTRYP PFNGLUNIFORM1DVPROC) (GLint location, GLsizei count, const GLdouble *value);
PFNGLUNIFORM1DVPROC glUniform1dv;
typedef void (APIENTRYP PFNGLUNIFORM2DVPROC) (GLint location, GLsizei count, const GLdouble *value);
PFNGLUNIFORM2DVPROC glUniform2dv;
typedef GLint (APIENTRYP PFNGLGETUNIFORMLOCATIONPROC) (GLuint program, const GLchar *name);
PFNGLGETUNIFORMLOCATIONPROC glGetUniformLocation;
```

You may have noticed a third function called glGetUniformLocation. It is required by the glUniformXdv functions to address the correct shader variable.

Now we grab the handles as follows:

```
glGetUniformLocation = (PFNGLGETUNIFORMLOCATIONPROC) GLFrame->context()->getProcAddress("glGetUniformLocation");
glUniform1dv = (PFNGLUNIFORM1DVPROC) GLFrame->context()->getProcAddress("glUniform1dv");
glUniform2dv = (PFNGLUNIFORM2DVPROC) GLFrame->context()->getProcAddress("glUniform2dv");
if(glGetUniformLocation && glUniform1dv && glUniform2dv) // did we get all handles?
{
    qDebug() << "Yay! Hardware accelerated double precision enabled.";
    RenderCaps |= 0x04; // yes, we can perform double precision rendering
}
else qDebug() << "Too bad, your GPU does not support hardware accelerated double precision.";
```

After that is done, we can use these functions to feed our double-precision mandelbrot shader as follows:

```
double tmp, dvec2[2];
// snip
case 2: // double precision (FP64) shader values
    dvec2[0] = xpos;
    dvec2[1] = ypos;
    glUniform2dv(glGetUniformLocation(ShaderProgram->programId(), "d_c"), 1, dvec2); // one dvec2
    dvec2[0] = -((double)w)/2.0/zoom;
    dvec2[1] = -((double)h)/2.0/zoom;
    glUniform2dv(glGetUniformLocation(ShaderProgram->programId(), "d_s"), 1, dvec2); // one dvec2
    tmp = 1./zoom;
    glUniform1dv(glGetUniformLocation(ShaderProgram->programId(), "d_z"), 1, &tmp);
    break;
```

Benchmarking the emulated and the hardware-accelerated double precision with the same settings gives the following framerates:

- single precision: 120 FPS
- emulated double precision: 25 FPS
- hardware double precision: 90 FPS

As expected, the hardware-supported version is almost 4 times faster than the emulated one. Hopefully more GPUs will support double precision in the future.

download here: GLSL_DoubleMandel

Hardware-accelerated double-precision rendering works fine on supported GPUs. In our mandelbrot demo it is much faster than the emulated version and also gives a detail boost of about four zoom steps (47 vs. 43).

But I guess this is not the end of the line. I'm sure I can improve accuracy even more. Stay tuned.

Since I am very interested in discovering the beauty of our mandelbrot set in close detail, I will improve the existing shader with emulated double-precision variables (aka double-single) and see how far I can push it.

Since I didn't find any GLSL code samples using emulated precision, I decided to use the DSFUN90 library by David H. Bailey, which is written in Fortran. Fortran is no good for GPUs, so I had to convert the parts I needed to GLSL.

Single-precision floats can hold up to 8 significant digits and an exponent. Say you want to store the number **0.4888129819481270** in a single float variable: you will get **4.8881298e-1** (8 digits and an exponent), and the remainder is lost. On the other hand, you can store the number **0.0000000019481270** in a single float without any trouble (**1.9481270e-9**). Check out this page to convert any number to its single- or double-precision counterpart and see what happens.

You may have figured out by now, that you can store the number **0.4888129819481270** as the sum of **4.8881298e-1** and **1.9481270e-9** and that it is possible to store each of these two parts in a single floating point variable. So we just split the double precision value in two single precision variables as shown above and we are fine. Well, we're almost fine, since the functions to do the basic math stuff like addition or multiplication get a bit more complicated but that's the point where Mr. Bailey's library comes in and helps us out with his emulated double precision arithmetic.

Preparing the double-precision variables before transferring them to our shader works as described above:

- Take a double (**0.4888129819481270**) and convert it to a single float (**4.8881298e-1**). Store it.
- Convert the single float back to double (**0.4888129800000000**) and subtract it from the original value.
- Store the result (**0.0000000019481270**) as a second single float (**1.9481270e-9**).

```
vec2[0] = (float)xpos;
vec2[1] = xpos - (double)vec2[0];
ShaderProgram->setUniformValue("ds_cx0", vec2[0]);
ShaderProgram->setUniformValue("ds_cx1", vec2[1]);
```

The two parts can be seen as the high and low part of our emulated double value.

The emulated double-precision values (double-singles) can be stored as vec2 in GLSL. This keeps the code short and improves readability (vec2(ds_hi, ds_lo)).

To evaluate our mandelbrot formula (z = vec2(z.x*z.x - z.y*z.y, 2.0*z.x*z.y) + c) and do the other stuff needed to create a cool-looking image, we need the following arithmetic operations:

- Convert to/from emulated double precision (double-single)
- Addition, subtraction
- Multiplication
- Comparison

Conversion to double-single is easy: you just copy the value into the high part of the double-single (DS) variable.

```
vec2 ds_set(float a)
{
vec2 z;
z.x = a;
z.y = 0.0;
return z;
}
vec2 ds_two = ds_set(2.0);
```

To create a single float from our DS variable, we just use the high part and drop the (much smaller) low part:

```
float s_two = ds_two.x;
```

Addition is a bit more complex, since you have to take care of the carry-over from the low to the high part.

```
vec2 ds_add (vec2 dsa, vec2 dsb)
{
vec2 dsc;
float t1, t2, e;
t1 = dsa.x + dsb.x;
e = t1 - dsa.x;
t2 = ((dsb.x - e) + (dsa.x - (t1 - e))) + dsa.y + dsb.y;
dsc.x = t1 + t2;
dsc.y = t2 - (dsc.x - t1);
return dsc;
}
```
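The same two-sum trick can be verified on the CPU. A sketch in plain C++ (my own naming, mirroring the shader function above):

```cpp
#include <cassert>
#include <cmath>

// CPU re-implementation of the ds_add shader function, using the same
// two-float ("double-single") representation.
struct DS { float hi, lo; };

DS ds_set(double a)
{
    DS r;
    r.hi = (float)a;
    r.lo = (float)(a - (double)r.hi);
    return r;
}

DS ds_add(DS a, DS b)
{
    float t1 = a.hi + b.hi;
    float e  = t1 - a.hi;                        // part of b.hi absorbed by t1
    float t2 = ((b.hi - e) + (a.hi - (t1 - e)))  // rounding error of t1 (two-sum)
             + a.lo + b.lo;
    DS c;
    c.hi = t1 + t2;                              // renormalize
    c.lo = t2 - (c.hi - t1);                     // carry what did not fit
    return c;
}

double ds_value(DS a) { return (double)a.hi + (double)a.lo; }
```

The emulated sum stays within ~1e-12 of the true double result, while a plain float sum is off by around 1e-8.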

Multiplication is even more weird...

```
vec2 ds_mul (vec2 dsa, vec2 dsb)
{
vec2 dsc;
float c11, c21, c2, e, t1, t2;
float a1, a2, b1, b2, cona, conb, split = 8193.;
cona = dsa.x * split;
conb = dsb.x * split;
a1 = cona - (cona - dsa.x);
b1 = conb - (conb - dsb.x);
a2 = dsa.x - a1;
b2 = dsb.x - b1;
c11 = dsa.x * dsb.x;
c21 = a2 * b2 + (a2 * b1 + (a1 * b2 + (a1 * b1 - c11)));
c2 = dsa.x * dsb.y + dsa.y * dsb.x;
t1 = c11 + c2;
e = t1 - c11;
t2 = dsa.y * dsb.y + ((c2 - e) + (c11 - (t1 - e))) + c21;
dsc.x = t1 + t2;
dsc.y = t2 - (dsc.x - t1);
return dsc;
}
```
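The magic constant split = 8193 is 2^13 + 1: multiplying by it and subtracting (Dekker's splitting trick) cuts a 24-bit float mantissa into two short halves, so the partial products a1*b1, a1*b2, etc. are exact in float. A CPU sketch (my own naming):

```cpp
#include <cassert>

// Dekker's split, as used inside ds_mul: 8193 = 2^13 + 1 for 24-bit float
// mantissas. The two halves sum back exactly to the input and are short
// enough that their pairwise products fit exactly into a float.
void dekker_split(float a, float &hi, float &lo)
{
    float con = a * 8193.0f;  // pushes the low mantissa bits out
    hi = con - (con - a);     // upper half of the mantissa
    lo = a - hi;              // lower half, exactly
}
```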

I have also improved the coloring method to smooth, continuous coloring as described in this post by Linas Vepstas or on wikipedia.

```
if(length(z) > radius)
{
return(float(n) + 1. - log(log(length(z)))/log(2.));
}
```
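The fractional part comes from how far |z| overshoots the escape radius at the escape iteration. A scalar C++ sketch of the same formula (helper name is mine):

```cpp
#include <cassert>
#include <cmath>

// Continuous (smooth) iteration count, mirroring the GLSL snippet above:
// n + 1 - log(log|z|)/log(2), evaluated at the escape iteration n.
double smooth_iter(int n, double z_len)
{
    return (double)n + 1.0 - std::log(std::log(z_len)) / std::log(2.0);
}
```

The larger |z| is at escape, the earlier the orbit effectively diverged, so the smooth value decreases with |z| and interpolates between the integer bands.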

Don't be scared of all that logarithm stuff. Most modern GPUs can handle it very well.

Starting with our block example above, the emulation shows excellent details for zoom levels up to 42 before the precision of our emulated doubles is exhausted:

I get the following framerates in benchmark mode on my desktop ATI HD4870. They show that the emulation is roughly four times slower than single precision mode, but it still qualifies for realtime rendering.

Compared to single precision with a maximum resolution of … units in the complex plane per pixel, the emulated doubles perform well up to … units per pixel.

Emulated double precision is a cool thing and works quite well on modern GPUs. Let's see if something can be done to improve accuracy even further...

Updated version, see this post for details: GLSL_EmuMandel

]]>In my previous GLSL post I have shown how to draw a white quad. Since this is not the greatest visual experience, I will show something more interesting you can do with shaders in this post.

I have chosen the famous mandelbrot fractal. It's nice and colorful. Quite a fine eyecatcher.

Download the code: GLSL_SimpleMandel (windows version)

The code was tested on WinXP/ATI-GPU and Win7/NVidia-GPU

The final result looks like this:

Let's see what we need:

- user interface (short help, fancy settings textboxes)
- render area (the colorful thingy)
- fractal navigation (like zoom, drag, position info)
- mandelbrot algorithm/background information
- GLSL shader

The user interface can be set up with the Qt designer. There are some pitfalls when designing graphical UIs with Qt, namely:

- Make sure you set the "Horizontal Policy" of the QLineEdit-Objects to "Minimum". Otherwise they will break your layout.
- You need a QGridLayout to place a Widget. A QFrame is not sufficient.

Since we want to set the color for each pixel according to some obscure fractal formula (see below) we don't need fancy stuff like 3D projection, so our OpenGL init function is really short:

```
void QGLRenderThread::GLInit(void)
{
glClearColor(0.25f, 0.25f, 0.4f, 0.0f); // Background => dark blue
glDisable(GL_DEPTH_TEST);
glEnable(GL_TEXTURE_2D);
const GLubyte* pGPU = glGetString(GL_RENDERER);
const GLubyte* pVersion = glGetString(GL_VERSION);
const GLubyte* pShaderVersion = glGetString(GL_SHADING_LANGUAGE_VERSION);
qDebug() << "GPU: " << QString((char*)pGPU).trimmed();
qDebug() << "OpenGL: " << QString((char*)pVersion).trimmed();
qDebug() << "GLSL: " << QString((char*)pShaderVersion);
}
```

Set the background color, disable depth testing, enable textures and show some debug information about your current hardware configuration, OpenGL and GLSL version.

When resizing the window we use glOrtho instead of gluPerspective to make our OpenGL coordinate space the same size as our window:

```
void QGLRenderThread::GLResize(int width, int height)
{
glViewport(0, 0, width, height);
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrtho(0, width, 0, height, 0, 1);
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
}
```

With this setting the lower left corner represents the origin of the OpenGL coordinate system.

To navigate in our mandelbrot set we use a left-mouse-button drag function.

The zoom function is combined with a point-and-zoom feature. This gives you the possibility to zoom into the area around the mouse cursor (like on Google Maps). It looks complicated, and it is; it took me a while to figure it out...

```
void QGLFrame::wheelEvent(QWheelEvent * event)
{
RenderThread.Zoom( (event->delta()>0), event->pos(), 2.0);
updateLabels(event->pos());
RenderThread.Redraw();
}
```

In the Thread:

```
void QGLRenderThread::Zoom(bool dir, const QPoint &pos, double zfact)
{
double c;
c = xpos+(pos.x()-w/2)/zoom;
xpos = (xpos+((dir)?1.0:-1.0)*(c-xpos)*((dir)?(1.0-1.0/zfact):(zfact-1.0)));
c = ypos-(pos.y()-h/2)/zoom;
ypos = (ypos+((dir)?1.0:-1.0)*(c-ypos)*((dir)?(1.0-1.0/zfact):(zfact-1.0)));
zoom*=(dir)?zfact:(1.0/zfact);
}
```
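The dense expressions above implement one invariant: the complex coordinate under the mouse cursor must stay fixed while the scale changes. A standalone sketch of the zoom-in branch (hypothetical names, not the Qt code):

```cpp
#include <cassert>
#include <cmath>

// View state: center in the complex plane, scale, window size in pixels.
struct View { double xpos, ypos, zoom; int w, h; };

// Screen-to-complex mapping, matching the Zoom() code above
// (screen y grows downward, hence the minus sign).
double complexX(const View &v, int px) { return v.xpos + (px - v.w / 2) / v.zoom; }
double complexY(const View &v, int py) { return v.ypos - (py - v.h / 2) / v.zoom; }

void zoomIn(View &v, int px, int py, double zfact)
{
    // Move the center a fraction (1 - 1/zfact) towards the cursor point,
    // then scale -- this is the dir==true branch of Zoom() above.
    v.xpos += (complexX(v, px) - v.xpos) * (1.0 - 1.0 / zfact);
    v.ypos += (complexY(v, py) - v.ypos) * (1.0 - 1.0 / zfact);
    v.zoom *= zfact;
}
```

Zooming out works the same way with the fraction (zfact - 1.0), moving the center away from the cursor instead.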

Information about the current position is displayed in editfields on the left. We employ a signal/slot mechanism for this purpose since the editfields in Qt can handle this pretty easily:

```
void QGLFrame::updateLabels(const QPoint &pos)
{
double x,y;
RenderThread.getMousePosition(x,y, pos);
emit showRePosition(QString("%L1").arg(x,0,'f',16));
emit showImPosition(QString("%L1").arg(y,0,'f',16));
RenderThread.getZoom(y);
emit showZoomValue(QString("%L1").arg(log2(y/128.),0,'f',2));
emit showIterationsValue(QString("%1").arg(RenderThread.getIterations()));
}
```

To render a mandelbrot fractal you have to evaluate the iteration z → z² + c for every pixel, where c is the corresponding point in the complex plane for this pixel. The color of the pixel is defined by the number of iterations after which |z| > 2.

This is the vertex shader:

```
#version 120
void main(void)
{
gl_TexCoord[0] = gl_MultiTexCoord0;
gl_Position = ftransform();
}
```

The vertex shader only sets the texture coordinates of our window-sized textured rectangle, where the lower left window corner represents the texture coordinates (0/0) and the upper right corner the texture coordinates (windowwidth/windowheight). This area represents the first quadrant of the cartesian coordinate system.

The mandelbrot calculation is performed in the following fragment shader program:

```
#version 120
uniform int iterations;
uniform float frame;
uniform float radius;
uniform float f_cx, f_cy;
uniform float f_sx, f_sy;
uniform float f_z;
int fmandel(void)
{
vec2 c = vec2(f_cx, f_cy) + gl_TexCoord[0].xy*f_z + vec2(f_sx,f_sy);
vec2 z=c;
for(int n=0; n<iterations; n++)
{
z = vec2(z.x*z.x - z.y*z.y, 2.0*z.x*z.y) + c;
if((z.x*z.x + z.y*z.y) > radius) return n;
}
return 0;
}
void main()
{
int n = fmandel();
gl_FragColor = vec4((-cos(0.025*float(n))+1.0)/2.0,
(-cos(0.08*float(n))+1.0)/2.0,
(-cos(0.12*float(n))+1.0)/2.0,
1.0);
}
```
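The escape-time loop translates directly to the CPU. A plain C++ sketch of fmandel (my own naming; it keeps the shader's convention of returning 0 for points that never escape):

```cpp
#include <cassert>

// CPU version of the fmandel() shader function: iterate z = z^2 + c and
// return the iteration at which |z|^2 exceeds the radius (0 if it never does).
int mandel_iterations(double cx, double cy, int iterations, double radius)
{
    double zx = cx, zy = cy;                // z starts at c, as in the shader
    for (int n = 0; n < iterations; n++)
    {
        double t = zx * zx - zy * zy + cx;  // real part of z^2 + c
        zy = 2.0 * zx * zy + cy;            // imaginary part of z^2 + c
        zx = t;
        if (zx * zx + zy * zy > radius)     // radius is compared against |z|^2
            return n;
    }
    return 0;
}
```

Note that truly interior points (which exhaust the iteration budget) and points escaping at n = 0 both map to 0 and therefore to the same black color.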

The function fmandel is used to compute the iterations needed for the aforementioned equation. The constant c representing the current pixel in the complex plane is derived by the texture coordinates, the center and the zoom factor (given by the uniform values).

We can use the vec2 data type to simplify addition of complex numbers, but have to keep in mind that multiplying complex numbers works differently.

The color of a pixel is derived from the number of iterations and some cosine functions that give black for n=0 and otherwise a colorful gradient.

The render thread is designed to run only when needed and is put to sleep when no update of the fractal set is required. This can be done with the QWaitCondition and QMutex classes. It saves the GPU from unnecessary load and leaves the CPU free for other tasks/applications.

As the user zooms/drags or changes the window size the thread wakes up and initiates an update via the fractal shader.

To test the performance of your hardware setup, or if you want to play around with other shaders and compare their performance, you can enable the benchmark checkbox to measure the maximum framerate when rendering runs continuously.

To measure the exact framerate I use QueryPerformanceCounter. Unfortunately this is only available under Windows and has some problems. See this blog for reasons not to use it. If anyone has an idea what to use instead, please let me know.

Please also note that the framerate largely depends on the number of iterations, the center position you are looking at and of course the window size since the calculation is performed for *every* pixel in the window.

We have shown a simple mandelbrot implementation with Qt and OpenGL that shows great performance results and allows real time updates of the mandelbrot set even on mobile GPUs.

Unfortunately the zoom factor is limited by the floating point precision used to perform the calculation. Zoom levels of 18 or more just produce larger pixels instead of more detail:
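The limit can be checked numerically. Assuming 128 pixels per unit at zoom level 0 (as the zoom label computation log2(zoom/128.) above suggests), around zoom level 20 a pixel becomes smaller than the gap between adjacent floats near a typical coordinate, so neighbouring pixels collapse to the same value:

```cpp
#include <cassert>

// Why single precision runs out around zoom level 18-20: near x ~ 0.49 the
// spacing between adjacent floats is 2^-25 (about 3e-8). Once a pixel covers
// less than half of that, x + pixel_size rounds back to x.
// (Assumes 128 pixels per unit at zoom level 0, as the zoom label suggests.)
bool pixels_distinguishable(float x, int zoom_level)
{
    float pixel = 1.0f / (128.0f * (float)(1 << zoom_level));
    return (x + pixel) != x;
}
```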

A method to improve accuracy is to emulate double precision on the GPU, or even to use native double precision variables, as is possible on some graphics cards.

This will be subject of the next part in our "heavy computing" series.

]]>The point of this tutorial is to show how Qt supports shader programs and how to use them in your application.

Download the source: Hello_GLSL

Qt already provides two classes to work with shaders. We just have to add some function to use them.

The first class we need is QGLShaderProgram. It provides a container for our shaders and comes with functions to link and use the shader programs while rendering OpenGL objects. The second class is QGLShader. This class supports loading and compiling various (vertex, geometry, fragment) shaders written in GLSL. To use these classes, include the corresponding headers.

We extend the RenderThread with a function that loads a vertex and a fragment shader given in two separate files. The filenames are passed as two QString parameters.

```
void QGLRenderThread::LoadShader(QString vshader, QString fshader)
{
if(ShaderProgram)
{
ShaderProgram->release();
ShaderProgram->removeAllShaders();
}
else ShaderProgram = new QGLShaderProgram;
if(VertexShader)
{
delete VertexShader;
VertexShader = NULL;
}
if(FragmentShader)
{
delete FragmentShader;
FragmentShader = NULL;
}
// load and compile vertex shader
QFileInfo vsh(vshader);
if(vsh.exists())
{
VertexShader = new QGLShader(QGLShader::Vertex);
if(VertexShader->compileSourceFile(vshader))
ShaderProgram->addShader(VertexShader);
else qWarning() << "Vertex Shader Error" << VertexShader->log();
}
else qWarning() << "Vertex Shader source file " << vshader << " not found.";
// load and compile fragment shader
QFileInfo fsh(fshader);
if(fsh.exists())
{
FragmentShader = new QGLShader(QGLShader::Fragment);
if(FragmentShader->compileSourceFile(fshader))
ShaderProgram->addShader(FragmentShader);
else qWarning() << "Fragment Shader Error" << FragmentShader->log();
}
else qWarning() << "Fragment Shader source file " << fshader << " not found.";
if(!ShaderProgram->link())
{
qWarning() << "Shader Program Linker Error" << ShaderProgram->log();
}
else ShaderProgram->bind();
}
```

After some cleanup, in case a shader program was already active, we load and compile first the vertex and then the fragment shader. On success the shaders are linked into our ShaderProgram and that's it.

One good thing about using the Qt functions is that you get the error messages from the GLSL compiler for shader debugging (Shader->log()).

After that we can load our basic shaders

```
void QGLRenderThread::run()
{
GLFrame->makeCurrent();
GLInit();
LoadShader("./Basic.vsh", "./Basic.fsh");
//...snip...
}
```

The sample shader programs are pretty easy (=dull). I won't go into details here.

Vertex Shader:

```
void main(void)
{
gl_Position = ftransform();
}
```

Fragment Shader:

```
void main(void)
{
gl_FragColor = vec4(1.);
}
```

We have shown how to include GLSL-Shaders with Qt and render a simple white quad. More to follow.

]]>In this first tutorial we will create a simple (I mean really simple...) Qt program to render OpenGL graphics (a spinning quad). I will extend the programs you find in the Qt boxes demo and Qt OpenGL examples with a separate thread that does the actual rendering, to be independent (in terms of framerate) from any GUI interaction or other Qt events.

Sourcecode: Tut_01_OpenGL_Setup (Qt-Project files).

The result of this tutorial looks like this:

This tutorial was mainly inspired by the Qt newsletter Glimpsing the Third Dimension.

Make sure you have a working installation of Qt. I use 4.7.0 (Aug 2010), but the latest version should work as well.

- Qt Installation (qt.nokia.com/downloads)

I use the Windows version of Qt, so I will focus on that platform. In general the code should work on other platforms too. Let me know if there is a problem.

First we need to subclass the QGLWidget to get the OpenGL functionality. Then we have to create a new thread based on QThread and tell the thread to do some rendering. Finally we embed our specially designed QGLWidget derivate into our main window.

Create a new "Qt Gui Application". When you're done you will see the following default project tree.

Since we want to use OpenGL we need to add the opengl module to our project. Open the Project file "Tut_01_OpenGL_Setup.pro" with a double click and add the "opengl" keyword to the 'QT +=' statement as shown below:

```
#-------------------------------------------------
#
# Project created by QtCreator 2011-04-25T21:36:17
#
#-------------------------------------------------
QT += core gui opengl
TARGET = Tut_01_OpenGL_Setup
TEMPLATE = app
SOURCES += main.cpp \
    mainwindow.cpp \
    myglframe.cpp \
    myrenderthread.cpp
HEADERS += mainwindow.h \
    myglframe.h \
    myrenderthread.h
FORMS += mainwindow.ui
```

Add a new class (right-click project root) and choose "C++ Class". Name the new class something like "MyGLFrame". Set the Base class property to "QGLWidget". Type information is inherited from QWidget. After that 2 new files (myglframe.h and myglframe.cpp) will appear in your project tree.

You need to add one more class named "MyRenderThread" based on QThread. Type information is inherited from QObject. Another two files (myrenderthread.h and myrenderthread.cpp) will appear in your project. The final project structure should look like this:

The header file contains the prototypes of the functions used by the thread. Note that the constructor is called with a pointer to the OpenGL context.

The "resizeViewport()" is used to tell the thread that the OpenGL frame has changed its size and the viewport has to be modified as well with a "GLResize()" call. "stop()" simply stops the rendering.

The thread's main function is "run()". At first it is necessary to claim the OpenGL rendering context so the thread can do the actual rendering. The GLInit() function is used to set all OpenGL-specific properties. In this example it's just the background color, since the default settings of all other properties are fine for now.

The rendering loop is placed in a while statement that runs until rendering is disabled by a call to "stop()", which sets the doRendering variable to false. In this while loop we check whether a resize event has occurred and, if so, change the projection matrix. After that the frame is cleared, the identity matrix is loaded and our rendering function is called with "paintGL()". Make sure you swap the buffers to make the rendered contents visible.

To save some CPU/GPU power we add the following msleep() statement which does nothing but release the CPU for 16 milliseconds to do other stuff than render our dull quad. 16ms gives us about 60 FPS which is more than enough to get a smooth animation.

```
void MyRenderThread::run()
{
GLFrame->makeCurrent();
GLInit();
while (doRendering)
{
if (doResize)
{
GLResize(w, h);
doResize = false;
}
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
glLoadIdentity();
paintGL(); // render actual frame
FrameCounter++;
GLFrame->swapBuffers();
msleep(16); // wait 16ms => about 60 FPS
}
}
```

This is the code to resize our OpenGL window accordingly. Check out Nehe's OpenGL tutorials for details.

```
void MyRenderThread::GLResize(int width, int height)
{
glViewport(0, 0, width, height);
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
gluPerspective(45.,((GLfloat)width)/((GLfloat)height),0.1f,1000.0f);
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
}
```

This is our simple render function. It draws four colored (glColor3f) vertices (glVertex3f) and combines them into a quad (glBegin). The quad is moved away from the camera (glTranslatef) and rotated (glRotatef) one degree around the z-axis every frame. That's it.

```
void MyRenderThread::paintGL(void)
{
glTranslatef(0.0f, 0.0f, -5.0f);
glRotatef(FrameCounter,0.0f,0.0f,1.0f);
glBegin(GL_QUADS);
glColor3f(1.,1.,0.); glVertex3f(-1.0, -1.0,0.0);
glColor3f(1.,1.,1.); glVertex3f(1.0, -1.0,0.0);
glColor3f(1.,0.,1.); glVertex3f(1.0, 1.0,0.0);
glColor3f(1.,0.,0.); glVertex3f(-1.0, 1.0,0.0);
glEnd();
}
```

To get an OpenGL rendering context we had to subclass the QGLWidget (myglframe). This class contains just our rendering thread and does the interaction with it.

Please make sure to define the destructor (~MyGLFrame) and also the paintEvent. Otherwise you will get linker errors and error messages during runtime.

The constructor initializes the RenderThread and disables the automatic buffer swapping (we do this in our thread, remember?).

```
MyGLFrame::MyGLFrame(QWidget *parent) :
QGLWidget(parent),
RenderThread(this)
{
setAutoBufferSwap(false);
}
```

To ensure a clean shutdown of our thread, you must stop it and wait for it to finish before exiting.

```
void MyGLFrame::stopRenderThread(void)
{
RenderThread.stop();
RenderThread.wait();
}
```

In the MainWindow we create a new instance of our MyGLFrame and assign it to the CentralWidget (setCentralWidget). After initializing the thread (initRenderThread) we are good to go.

```
MainWindow::MainWindow(QWidget *parent) :
QMainWindow(parent),
ui(new Ui::MainWindow)
{
ui->setupUi(this);
GLFrame = new MyGLFrame();
setCentralWidget(GLFrame);
GLFrame->initRenderThread();
}
```

```
MainWindow::~MainWindow()
{
delete GLFrame;
delete ui;
}
void MainWindow::closeEvent(QCloseEvent *evt)
{
GLFrame->stopRenderThread();
QMainWindow::closeEvent(evt);
}
```

We have implemented a simple OpenGL render thread with Qt. This framework can be extended as you like and will also be used in further tutorials. Hope you like it!

]]>