```
docker run --rm -ti -p 8000:8000 -v $(pwd):/node-mapnik afrith/node-mapnik:latest
```

```
npm install -g mapnik
export NODE_PATH=/usr/local/lib/node_modules
cd node-mapnik/
node vector.js
```

serve:

```
python3 -m http.server
```

convert:

`docker run --rm -ti -v $(pwd)/out:/data klokantech/tippecanoe tippecanoe --maximum-zoom=4 --output=/data/map.mbtiles /data/map.geojson`

serve:

`docker run --rm -ti -v $(pwd)/out:/data -p 8080:80 klokantech/tileserver-gl map.mbtiles`

I use my own PostGIS container with the latest versions and some useful extensions.

Geospatial data will be imported with Imposm 3. It has some additional features over osm2pgsql that might be useful:

- Generalization/simplification of data to create low-resolution tables from existing ones for low zoom levels.
- Regex filtering of content/names/whatever
- Custom SQL filters

Import with Imposm requires a mapping file where you define which features are imported. You can also implement some advanced filtering mechanisms.
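To illustrate, a minimal mapping file might look like this (a sketch following the Imposm 3 documentation, not the actual mapping file from this project; table and column names are just examples):

```yaml
tables:
  buildings:
    type: polygon
    mapping:
      building: [__any__]
    columns:
      - {name: osm_id, type: id}
      - {name: geometry, type: geometry}
      - {name: name, key: name, type: string}
      - {name: type, key: building, type: mapping_value}
```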

The import script and additional files can be found here.

Import of a small sample PBF file (14 MB) takes around 10 s.

There are several tools available for serving vector tiles. Let's check the following:

You need a running PostGIS instance and have to specify the tables to be used:

Starting T-Rex requires just a Docker container:

`docker run --rm -ti --net gis -p 6767:6767 sourcepole/t-rex serve --dbconn postgresql://postgres@postgis/vector --bind=0.0.0.0`

Visiting localhost:6767/# will show a simple user interface with the tables we imported:

You can even view the data on a map by clicking the different buttons in the top-right corner:

So far so good. BUT: clicking on our buildings table leads to a ton of errors in the T-Rex console:

```
2018-12-30 14:08:00.410 ERROR Layer 'osm_buildings': Unknown geometry type GEOMETRY
2018-12-30 14:08:00.410 ERROR Row { statement: StatementInfo { name: "s18", param_types: [Type(Float8), Type(Float8), Type(Float8), Type(Float8)], columns: [Column { name: "geometry", type_: Type(Other(Other { name: "geometry", oid: 50960, kind: Simple, schema: "public" })) }, Column { name: "id", type_: Type(Int4) }, Column { name: "osm_id", type_: Type(Int8) }, Column { name: "name", type_: Type(Varchar) }, Column { name: "type", type_: Type(Varchar) }] } }
```

A hint at why this happens is shown right after startup of the T-Rex container:

`2018-12-30 14:10:15.540 WARN Multiple geometry types in "import"."osm_buildings".geometry: MULTIPOLYGON, POLYGON`

It seems that T-Rex can't handle multiple geometry types. I was able to reduce the errors with a specific config file that uses ST_MakeValid(geometry) to fix some invalid geometries, but have not yet been able to eliminate the issue altogether.
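One possible workaround (a sketch, not part of the original toolchain: it assumes Imposm's default Web Mercator SRID 3857 and the table from the error log above) is to promote all plain polygons to multipolygons, so the column ends up with a single geometry type:

```sql
-- Promote plain polygons to multipolygons; afterwards the column can be
-- typed as a single geometry type instead of the generic GEOMETRY.
ALTER TABLE import.osm_buildings
  ALTER COLUMN geometry TYPE geometry(MultiPolygon, 3857)
  USING ST_Multi(geometry);
```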

Visiting the roads-table was even worse:

This was accompanied by database restarts and my computer freezing completely... Definitely NOT what we want.

Setting up Tegola is a bit more work, as it does not provide any means to view the data it serves. Tegola itself needs a config file with layer definitions, similar to T-Rex. I roughly followed the Tegola Mapbox tutorial to set up a basic HTML viewer for our map data.
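For reference, a Tegola layer definition might look roughly like this (a sketch following the Tegola documentation; the connection settings, layer names and SQL are placeholders, not the actual config from this project):

```toml
[[providers]]
name = "osm"
type = "postgis"
host = "postgis"
port = 5432
database = "vector"
user = "postgres"

  [[providers.layers]]
  name = "buildings"
  geometry_fieldname = "geometry"
  id_fieldname = "id"
  sql = "SELECT ST_AsBinary(geometry) AS geometry, id, name FROM import.osm_buildings WHERE geometry && !BBOX!"

[[maps]]
name = "local"

  [[maps.layers]]
  provider_layer = "osm.buildings"
  min_zoom = 13
  max_zoom = 20
```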

You can find the required files here

After starting up the Tegola server with `serve.sh` and loading the index.html page in a browser, we get the following page:

Zooming in a bit more shows all buildings as defined in Tegola-config (min_zoom=13):

Tegola did not crash or show any other unexpected behavior during my tests. It has caching and simplification features, and the queries for each layer can be defined in detail via an SQL property.

When zooming out a lot (z=5), the time to generate the tiles increases dramatically, as the whole database is dumped into a single vector tile. Great care must be taken to optimize the amount of data per zoom level with generalized tables.
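Such generalized tables can be prepared ahead of time, either via Imposm's generalized-tables mechanism or manually in PostGIS. A manual sketch (table names and the 500 m tolerance are arbitrary examples, not from this project):

```sql
-- Low-zoom roads: keep only major road classes and simplify the geometry.
CREATE TABLE import.osm_roads_gen AS
  SELECT id, osm_id, name, type,
         ST_SimplifyPreserveTopology(geometry, 500) AS geometry
  FROM import.osm_roads
  WHERE type IN ('motorway', 'trunk', 'primary');
CREATE INDEX ON import.osm_roads_gen USING gist (geometry);
```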

Tegola it is for now.

As vector maps have been really successful (see the maptiler example) over the last few years, I want to upgrade my own map and toolchain to the fancy new vector tile technology. This series of blog posts serves mainly to sort out my thoughts and hopefully make life easier for the next person who tries this.

This is what the map looks like at the moment:

As presenting a vector tile map to an end user requires a whole set of tools (aka a toolchain), I will try to cover all parts of the design and implementation process.

All project-files will be stored on github.com/henrythasler/vectortiles. Create an issue there, if you have any questions, hints or whatevers.

The following goals MUST be reached by the new toolchain/setup:

- A free source for spatial data (e.g. OpenStreetMap)
- Means of pre-processing and storing the spatial data (e.g. in a database)
- Content and quality of the resulting map are roughly the same as with the current toolchain
- Use open-source software whenever possible
- Dockerize the whole toolchain for portability
- Support for high-DPI screens

The following items would be nice-to-have:

- Existing style definitions (CartoCSS, Mapnik XML) can be reused/converted
- Map design process is supported by some GUI/Editor
- Toolchain can also generate raster tiles for offline use with low-end devices (smartphone).

During my initial research I found these pages on various related topics helpful (random order):

- How to make mvt with PostGIS by Parlons Tech
- Vector tiles, PostGIS and OpenLayers by Giovanni Allegri
- awesome-vector-tiles by mapbox
- OpenMapTiles.org
- Using the new MVT function in PostGIS by Chris Whong
- Vector Tiles - Introduction & Usage with QGIS by Pirmin Kalberer
- MVT generation: Mapnik vs PostGIS by Rafa de la Torre

We have set some preliminary goals for this whole project. Some may be revised as the project proceeds. The next post will present some results of the initial research regarding tools and methods.

This post will cover various sources for geospatial data and storage of that data.

There are two main groups of providers for geospatial data:

Almost all of them provide services where you can create your own map styles and share them via their website/toolchain. There are various plans to cover the costs according to your needs/traffic. You can also add your own data-sets (like pets-per-squarekilometer) to show on top of various basemaps. This might be useful if you are just looking to provide some specific content on a basemap.

All services provide styling capabilities based on a pre-defined subset of map features (e.g. I haven't found playground POIs on Mapbox). You can upload additional data sources, but that still involves pre-processing of raw data.

As mentioned before, you can use the map service providers to host your geospatial data. This requires specific pre-processing, locally or in the cloud.

The following methods of storing data are available right now:

- cloud-based
  - format depends on map service provider
- local
  - raw format (e.g. PBF)
  - database
  - (...)

- Use up-to-date geospatial data for limited geographic areas (e.g. Europe) provided by Geofabrik.
- Use planet-wide data (e.g. administrative boundaries) from OpenStreetMapData
- Use global data from planet.osm to add some more planet-wide data if needed.

- Geospatial data will be stored in a postgres/postgis database.
- Raster data will be stored locally on disk

for Ubuntu 13.10:

```
sudo add-apt-repository ppa:henrythasler/pwnsensor
sudo apt-get update
sudo apt-get install pwnsensor
```

pwnsensor can then be found under the *system* menu.

Source code and additional information are available on GitHub.

Here's how it goes:

```
> cd /usr/src
> wget https://www.openssl.org/source/openssl-1.0.1e.tar.gz
> tar -xvzf openssl-1.0.1e.tar.gz
> cd openssl-1.0.1e/
> ./config no-threads shared --prefix=/usr
> make depend
> make
> make test
> make install
```

If you get any errors during the "make depend" step complaining about a missing gcc, use yast2 to install gcc and its dependencies.

When it's finished (compiling and testing take several minutes), check the correct installation with:

```
> openssl version
OpenSSL 1.0.1e 11 Feb 2013
```

Now you can update your apache config. Read this one also. Make sure you restart apache.

Now let's update the OpenSSH server (sshd) as well, so it uses the latest OpenSSL libs we have just built. You may also need the "zlib-devel" and "tcpd-devel" packages; use yast2 to install them before you begin. With tcp-wrappers support you can use DenyHosts with your OpenSSH server to prevent brute-force attacks.

```
> cd /usr/src
> wget http://ftp.spline.de/pub/OpenBSD/OpenSSH/portable/openssh-6.2p2.tar.gz
> tar xzvf openssh-6.2p2.tar.gz
> cd openssh-6.2p2/
> ./configure --prefix=/usr --sysconfdir=/etc/ssh --with-tcp-wrappers
> make
```

I recommend renaming your /etc/ssh directory to something else and letting the new SSH version create a new one with all-new settings and keys. You can modify the new config file afterwards.

```
> mv /etc/ssh /etc/ssh.old
> make install
```

You can check the current version with

```
> sshd -?
OpenSSH_6.2p2, OpenSSL 1.0.1e 11 Feb 2013
usage: sshd [-46DdeiqTt] [-b bits] [-C connection_spec] [-c host_cert_file]
[-f config_file] [-g login_grace_time] [-h host_key_file]
[-k key_gen_time] [-o option] [-p port] [-u len]
```

I use Maperitive to create my own map, optimized for hiking/biking. Right now, Maperitive uses SRTM-1 and SRTM-3 DEM data (HGT files). A decent description of SRTM files can be found at vterrain.org.

The Austrian government (Land Tirol) has published high-resolution DEM data (TIRIS) derived from airborne laser scans (LIDAR) over their territory. This TIRIS data has a resolution of 10m, which is 3-4 times more accurate than other existing SRTM-1 DEM data. The most accurate SRTM-1 data for Tyrol (that I have found) is available on Jonathan de Ferranti's website, Viewfinder Panoramas.

Here is a comparison (note the detailed rendering of the roads/rivers on the right image):

The goal of this article is to use this data as source for all DEM-related operations like hill-shading or contour lines with Maperitive.

First of all, we need the DEM data from Tyrol. It can be obtained from this page. Download and unzip into one folder (skip "Bezirk Innsbruck"; it's a subset of "Bezirk Innsbruck-Land"). There should be 8 files named "*_10m_float.asc".

The TIRIS data format is Arc/Info ASCII Grid, which is more or less a plain text file with height information (details later). Maperitive cannot use it, so we need to convert this data to an SRTM-1 HGT file. That's why we need a working GDAL environment. Visit the GDAL website and follow the installation instructions for your environment. In a Linux environment, make sure you also install the *proj* package (including development files). We will need the gdal_translate and gdalwarp tools.

To merge the TIRIS data with an existing SRTM1 tile, please grab a copy at the Viewfinder Panoramas website. In this example I will use the N47E011.hgt tile. It covers the region around Innsbruck.

I use an OpenSuse installation within a VirtualBox for the GDAL-Toolkit. You can get an OpenSuse image here. You may also need a shared folder in OpenSuse for data exchange to the host computer. If you have problems setting up your environment, let me know.

That was the most boring part. Now let's have a look at the DEM data itself.

Here are some key facts about the DEM data the Austrian government of Tyrol provides:

- Data format: Arcinfo ASCII-Grid
- Datatype: Float32
- Resolution: 10m x 10m.
- Projection: Transverse Mercator (EPSG 31254).

See "Informationsblatt.txt" provided with every zip-file for more details.

To get an idea how impressive (meaning accurate) the TIRIS data is, you should have a look at it first (Scale: 1 pixel=10m):

To visualize the raw DEM-data I recommend GlobalMapper. The unregistered ("free") version can load and display most DEM formats including the Arc ASCII Grid. However, the "free" version cannot save any data. But that's no problem, since we have GDAL at hand.

You can also give VTBuilder a try. It has similar visualization functionality and is also able to save data. However, I was not able to convert the TIRIS data into SRTM format without offset errors in the resulting SRTM1-HGT file.

This is the complete shell script to convert the TIRIS DEM data into SRTM1-HGT:

```
# create backup of original SRTM1-HGT file
cp N47E011.hgt N47E011.old
# convert original SRTM1 to TIFF
gdal_translate N47E011.hgt N47E011.tif
# convert and merge all TIRIS data to one TIFF
gdalwarp -s_srs EPSG:31254 -t_srs EPSG:4326 -srcnodata -9999 -r bilinear \
  -overwrite -te 11.0 47.0 12.0 48.0 -ts 3601 3601 -order 3 -et 0.0 \
  -wt Float32 -wo SAMPLE_STEPS=100 -dstnodata none *10m_float.asc N47E011_tiris.tif
# merge TIRIS-TIFF into old SRTM1 data
gdalwarp -s_srs EPSG:4326 -t_srs EPSG:4326 -r bilinear -order 3 -et 0.0 \
  -wt Float32 -wo SAMPLE_STEPS=100 N47E011_tiris.tif N47E011.tif
# convert result back to SRTM1 (overwrite original file)
gdal_translate -of SRTMHGT N47E011.tif N47E011.hgt
```

Let's go step by step through the script:

`cp N47E011.hgt N47E011.old`

Creates a backup (copy) of the original SRTM1 data, in case you want to use the original file later for another purpose.

`gdal_translate N47E011.hgt N47E011.tif`

Use gdal_translate to convert the original SRTM1-HGT data into GeoTIFF format. We need this to merge the new data derived from the TIRIS DEM into the existing SRTM data.

```
gdalwarp -s_srs EPSG:31254 -t_srs EPSG:4326 -srcnodata -9999 -dstnodata none \
  -overwrite -te 11.0 47.0 12.0 48.0 -ts 3601 3601 \
  -r bilinear -order 3 -et 0.0 -wt Float32 -wo SAMPLE_STEPS=100 \
  *10m_float.asc N47E011_tiris.tif
```

With this mighty command, you grab every TIRIS file (*10m_float.asc) and convert it into a GeoTIFF. During this process we define the source and target projections and tell the converter to skip non-existent data, represented as -9999 m above sea level in the source files. In case there is already an existing GeoTIFF, we overwrite it. The command also needs to know which SRTM tile we are targeting (as a lower-left/upper-right window) and what target resolution we want. Note that the resolution of SRTM1 data is 3601x3601; we can't do better here. That is also the reason why we need to rescale the input data with a bilinear 3rd-order filter. The quality options (-order 3 -et 0.0 -wo SAMPLE_STEPS=100) should set the reprojection quality to high, but I'm not sure they do anything at all.

Ok, here comes the tricky part:

```
gdalwarp -s_srs EPSG:4326 -t_srs EPSG:4326 -r bilinear -order 3 -et 0.0 \
  -wt Float32 -wo SAMPLE_STEPS=100 N47E011_tiris.tif N47E011.tif
```

It's almost the same as the command before, except it isn't: note that the -overwrite option is not set. This command merges the current SRTM1 file (the TIFF we created with the second command from the HGT file) with our converted TIRIS data. The new (= better) data is copied into the old file where available; other parts remain the same as in the original file:

The last command converts the tif back to a SRTM1-hgt file:

`gdal_translate -of SRTMHGT N47E011.tif N47E011.hgt`

That's it. We have our updated hgt-file based on high-precision LIDAR-DEM data.

The above script is just an example and depends on hard-coded coordinates and tiles. Therefore I will create a more complex script that covers all of the affected tiles for Tyrol (coming soon).
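As a starting point, the per-tile logic can be generated from a small script. The following Python sketch (the tile list and file locations are assumptions, not the final script) builds the gdalwarp call for each 1°x1° SRTM1 tile:

```python
import subprocess

def srtm_tile_name(lat, lon):
    """SRTM1 tiles are named after their south-west corner, e.g. N47E011.
    (Sketch: handles the northern/eastern hemisphere only.)"""
    return "N%02dE%03d" % (lat, lon)

def warp_command(lat, lon, src="*10m_float.asc"):
    """Build the gdalwarp call that reprojects the TIRIS data into the
    1x1 degree window (3601x3601 samples) of one SRTM1 tile."""
    tile = srtm_tile_name(lat, lon)
    window = "-te %.1f %.1f %.1f %.1f" % (lon, lat, lon + 1, lat + 1)
    return ("gdalwarp -s_srs EPSG:31254 -t_srs EPSG:4326 "
            "-srcnodata -9999 -dstnodata none -r bilinear -order 3 -et 0.0 "
            "-wt Float32 -wo SAMPLE_STEPS=100 -overwrite "
            "%s -ts 3601 3601 %s %s_tiris.tif" % (window, src, tile))

# Tiles roughly covering Tyrol (an assumption; adjust as needed):
for lat, lon in [(46, 10), (46, 11), (46, 12), (47, 10), (47, 11), (47, 12)]:
    print(warp_command(lat, lon))
    # subprocess.run(warp_command(lat, lon), shell=True, check=True)
```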

You can download the generated HGT file for Innsbruck area here: N47E011.zip

Today I'm going to introduce emulated quadruple floating-point precision (quad-single precision) in GLSL. I will use the well-known Mandelbrot set to demonstrate the concept. The source code is available as usual.

The last posts used double precision (hardware and emulated) to calculate Mandelbrot sets down to [insert] units per pixel in the complex plane. To push that limit even further, I will emulate quadruple precision (quad-precision) using four single-precision variables (quad-single). The original concept and the Fortran/C++ source code were developed by Yozo Hida, Xiaoye S. Li and David H. Bailey at Berkeley. It is available as the QD library. I just need to convert and modify this code for GLSL...

The concept is basically the same as the one I described in one of my previous posts about double-single emulation. For quad-precision you just add two more floats to represent your actual value. The arithmetic operations, on the other hand, are much more difficult to perform, because you must take care of all the carry-over stuff. This results in a lot of complicated and expensive functions just to perform a simple addition. The technical details are described in this paper (PDF).
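To get a feeling for the underlying trick, here is the basic error-free transformation (the same two_sum building block the shader uses) transcribed to Python; Python floats are doubles, but the principle is identical for the float pairs in the shader:

```python
def two_sum(a, b):
    """Error-free transformation (QD-library two_sum): returns (s, err)
    where s = fl(a + b) and a + b == s + err exactly."""
    s = a + b
    bb = s - a
    err = (a - (s - bb)) + (b - bb)
    return s, err

# The tiny addend is lost completely in the rounded sum,
# but is recovered exactly in the error term:
s, err = two_sum(1.0, 1e-20)
print(s, err)  # 1.0 1e-20
```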

The Qt application features only minor changes. To improve precision on this end I use long doubles for all variables that eventually end up in the shader.

To make things easier in the shader, it gets two vec2 elements instead of four floats via the glUniform2fv function. That's why we need a handle for this function:

```
// define prototype
typedef void (APIENTRYP PFNGLUNIFORM2FVPROC) (GLint location, GLsizei count, const GLfloat *value);
PFNGLUNIFORM2FVPROC glUniform2fv;
// get handle from OpenGL-context
glUniform2fv = (PFNGLUNIFORM2FVPROC) GLFrame->context()->getProcAddress("glUniform2fv");
```

It is used as follows:

```
float vec2[2];
vec2[0] = (float)xpos;
vec2[1] = xpos - (double)vec2[0];
glUniform2fv(glGetUniformLocation(ShaderProgram->programId(), "qs_cx"), 2, vec2);
```
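The splitting itself is the standard double-single trick: the first float carries the leading digits and the second float carries the rounding error. A quick Python illustration (using struct to emulate a C float; this is an aside, not part of the Qt code):

```python
import struct

def to_float32(x):
    """Round a Python double to the nearest IEEE-754 single-precision value."""
    return struct.unpack('f', struct.pack('f', x))[0]

def ds_split(x):
    """Split a double into a (hi, lo) float pair, like the Qt code above."""
    hi = to_float32(x)
    lo = to_float32(x - hi)
    return hi, lo

hi, lo = ds_split(0.1)
# hi alone is off by roughly 1.5e-9; hi + lo recovers almost the full double
print(hi, lo)
```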

I guess the shader has gotten somewhat complex. Nevertheless, you can have a look at it and maybe adapt or extend it for your own purposes. The variable names are the same as in the original C++ code. The comments behind the lines are the original code (useful for debugging/comparing).

```
#version 120
// emulated quadruple precision GLSL library
// created by Henry Thasler (thasler.org/blog)
// based on the QD library (http://crd-legacy.lbl.gov/~dhbailey/mpdist/)
uniform int iterations;
uniform float frame;
uniform float radius;
uniform vec2 qs_z;
uniform vec2 qs_w;
uniform vec2 qs_h;
uniform vec2 qs_cx;
uniform vec2 qs_cy;
// inline double quick_two_sum(double a, double b, double &err)
vec2 quick_2sum(float a, float b)
{
float s = a + b; // double s = a + b;
return vec2(s, b-(s-a)); // err = b - (s - a);
}
/* Computes fl(a+b) and err(a+b). */
// inline double two_sum(double a, double b, double &err)
vec2 two_sum(float a, float b)
{
float v,s,e;
s = a+b; // double s = a + b;
v = s-a; // double bb = s - a;
e = (a-(s-v))+(b-v); // err = (a - (s - bb)) + (b - bb);
return vec2(s,e);
}
vec2 split(float a)
{
float t, hi;
t = 8193. * a;
hi = t - (t-a);
return vec2(hi, a-hi);
}
vec3 three_sum(float a, float b, float c)
{
vec2 tmp;
vec3 res;// = vec3(0.);
float t1, t2, t3;
tmp = two_sum(a, b); // t1 = qd::two_sum(a, b, t2);
t1 = tmp.x;
t2 = tmp.y;
tmp = two_sum(c, t1); // a = qd::two_sum(c, t1, t3);
res.x = tmp.x;
t3 = tmp.y;
tmp = two_sum(t2, t3); // b = qd::two_sum(t2, t3, c);
res.y = tmp.x;
res.z = tmp.y;
return res;
}
//inline void three_sum2(double &a, double &b, double &c)
vec3 three_sum2(float a, float b, float c)
{
vec2 tmp;
vec3 res;// = vec3(0.);
float t1, t2, t3; // double t1, t2, t3;
tmp = two_sum(a, b); // t1 = qd::two_sum(a, b, t2);
t1 = tmp.x;
t2 = tmp.y;
tmp = two_sum(c, t1); // a = qd::two_sum(c, t1, t3);
res.x = tmp.x;
t3 = tmp.y;
res.y = t2 + t3; // b = t2 + t3;
return res;
}
vec2 two_prod(float a, float b)
{
float p, e;
vec2 va, vb;
p=a*b;
va = split(a);
vb = split(b);
e = ((va.x*vb.x-p) + va.x*vb.y + va.y*vb.x) + va.y*vb.y;
return vec2(p, e);
}
vec4 renorm(float c0, float c1, float c2, float c3, float c4)
{
float s0, s1, s2 = 0.0, s3 = 0.0;
vec2 tmp;
// if (QD_ISINF(c0)) return;
tmp = quick_2sum(c3,c4); // s0 = qd::quick_two_sum(c3, c4, c4);
s0 = tmp.x;
c4 = tmp.y;
tmp = quick_2sum(c2,s0); // s0 = qd::quick_two_sum(c2, s0, c3);
s0 = tmp.x;
c3 = tmp.y;
tmp = quick_2sum(c1,s0); // s0 = qd::quick_two_sum(c1, s0, c2);
s0 = tmp.x;
c2 = tmp.y;
tmp = quick_2sum(c0,s0); // c0 = qd::quick_two_sum(c0, s0, c1);
c0 = tmp.x;
c1 = tmp.y;
s0 = c0;
s1 = c1;
tmp = quick_2sum(c0,c1); // s0 = qd::quick_two_sum(c0, c1, s1);
s0 = tmp.x;
s1 = tmp.y;
if (s1 != 0.0) {
tmp = quick_2sum(s1,c2); // s1 = qd::quick_two_sum(s1, c2, s2);
s1 = tmp.x;
s2 = tmp.y;
if (s2 != 0.0) {
tmp = quick_2sum(s2,c3); // s2 = qd::quick_two_sum(s2, c3, s3);
s2 = tmp.x;
s3 = tmp.y;
if (s3 != 0.0)
s3 += c4;
else
s2 += c4;
} else {
tmp = quick_2sum(s1,c3); // s1 = qd::quick_two_sum(s1, c3, s2);
s1 = tmp.x;
s2 = tmp.y;
if (s2 != 0.0){
tmp = quick_2sum(s2,c4); // s2 = qd::quick_two_sum(s2, c4, s3);
s2 = tmp.x;
s3 = tmp.y;}
else{
tmp = quick_2sum(s1,c4); // s1 = qd::quick_two_sum(s1, c4, s2);
s1 = tmp.x;
s2 = tmp.y;}
}
} else {
tmp = quick_2sum(s0,c2); // s0 = qd::quick_two_sum(s0, c2, s1);
s0 = tmp.x;
s1 = tmp.y;
if (s1 != 0.0) {
tmp = quick_2sum(s1,c3); // s1 = qd::quick_two_sum(s1, c3, s2);
s1 = tmp.x;
s2 = tmp.y;
if (s2 != 0.0){
tmp = quick_2sum(s2,c4); // s2 = qd::quick_two_sum(s2, c4, s3);
s2 = tmp.x;
s3 = tmp.y;}
else{
tmp = quick_2sum(s1,c4); // s1 = qd::quick_two_sum(s1, c4, s2);
s1 = tmp.x;
s2 = tmp.y;}
} else {
tmp = quick_2sum(s0,c3); // s0 = qd::quick_two_sum(s0, c3, s1);
s0 = tmp.x;
s1 = tmp.y;
if (s1 != 0.0){
tmp = quick_2sum(s1,c4); // s1 = qd::quick_two_sum(s1, c4, s2);
s1 = tmp.x;
s2 = tmp.y;}
else{
tmp = quick_2sum(s0,c4); // s0 = qd::quick_two_sum(s0, c4, s1);
s0 = tmp.x;
s1 = tmp.y;}
}
}
return vec4(s0, s1, s2, s3);
}
vec4 renorm4(float c0, float c1, float c2, float c3)
{
float s0, s1, s2 = 0.0, s3 = 0.0;
vec2 tmp;
// if (QD_ISINF(c0)) return;
tmp = quick_2sum(c2,c3); // s0 = qd::quick_two_sum(c2, c3, c3);
s0 = tmp.x;
c3 = tmp.y;
tmp = quick_2sum(c1,s0); // s0 = qd::quick_two_sum(c1, s0, c2);
s0 = tmp.x;
c2 = tmp.y;
tmp = quick_2sum(c0,s0); // c0 = qd::quick_two_sum(c0, s0, c1);
c0 = tmp.x;
c1 = tmp.y;
s0 = c0;
s1 = c1;
if (s1 != 0.0) {
tmp = quick_2sum(s1,c2); // s1 = qd::quick_two_sum(s1, c2, s2);
s1 = tmp.x;
s2 = tmp.y;
if (s2 != 0.0){
tmp = quick_2sum(s2,c3); // s2 = qd::quick_two_sum(s2, c3, s3);
s2 = tmp.x;
s3 = tmp.y;}
else{
tmp = quick_2sum(s1,c3); // s1 = qd::quick_two_sum(s1, c3, s2);
s1 = tmp.x;
s2 = tmp.y;}
} else {
tmp = quick_2sum(s0,c2); // s0 = qd::quick_two_sum(s0, c2, s1);
s0 = tmp.x;
s1 = tmp.y;
if (s1 != 0.0){
tmp = quick_2sum(s1,c3); // s1 = qd::quick_two_sum(s1, c3, s2);
s1 = tmp.x;
s2 = tmp.y;}
else{
tmp = quick_2sum(s0,c3); // s0 = qd::quick_two_sum(s0, c3, s1);
s0 = tmp.x;
s1 = tmp.y;}
}
return vec4(s0, s1, s2, s3);
}
vec3 quick_three_accum(float a, float b, float c)
{
vec2 tmp;
float s;
bool za, zb;
tmp = two_sum(b, c); // s = qd::two_sum(b, c, b);
s = tmp.x;
b = tmp.y;
tmp = two_sum(a, s); // s = qd::two_sum(a, s, a);
s = tmp.x;
a = tmp.y;
za = (a != 0.0);
zb = (b != 0.0);
if (za && zb)
return vec3(a,b,s);
if (!zb) {
b = a;
a = s;
} else {
a = s;
}
return vec3(a,b,0.);
}
// inline qd_real qd_real::ieee_add(const qd_real &a, const qd_real &b)
vec4 qs_ieee_add(vec4 _a, vec4 _b)
{
vec2 tmp=vec2(0.);
vec3 tmp3=vec3(0.);
int i, j, k;
float s, t;
float u, v; // double-length accumulator
float x[4] = float[4](0.0, 0.0, 0.0, 0.0);
float a[4], b[4];
a[0] = _a.x;
a[1] = _a.y;
a[2] = _a.z;
a[3] = _a.w;
b[0] = _b.x;
b[1] = _b.y;
b[2] = _b.z;
b[3] = _b.w;
i = j = k = 0;
if (abs(a[i]) > abs(b[j]))
u = a[i++];
else
u = b[j++];
if (abs(a[i]) > abs(b[j]))
v = a[i++];
else
v = b[j++];
tmp = quick_2sum(u,v); // u = qd::quick_two_sum(u, v, v);
u = tmp.x;
v = tmp.y;
while (k < 4) {
if (i >= 4 && j >= 4) {
x[k] = u;
if (k < 3)
x[++k] = v;
break;
}
if (i >= 4)
t = b[j++];
else if (j >= 4)
t = a[i++];
else if (abs(a[i]) > abs(b[j])) {
t = a[i++];
} else
t = b[j++];
tmp3 = quick_three_accum(u,v,t) ; // s = qd::quick_three_accum(u, v, t);
u = tmp3.x;
v = tmp3.y;
s = tmp3.z;
if (s != 0.0) {
x[k++] = s;
}
}
// add the rest.
for (k = i; k < 4; k++)
x[3] += a[k];
for (k = j; k < 4; k++)
x[3] += b[k];
// qd::renorm(x[0], x[1], x[2], x[3]);
// return qd_real(x[0], x[1], x[2], x[3]);
return renorm4(x[0], x[1], x[2], x[3]);
}
// inline qd_real qd_real::sloppy_add(const qd_real &a, const qd_real &b)
vec4 qs_sloppy_add(vec4 a, vec4 b)
{
float s0, s1, s2, s3;
float t0, t1, t2, t3;
float v0, v1, v2, v3;
float u0, u1, u2, u3;
float w0, w1, w2, w3;
vec2 tmp;
vec3 tmp3;
s0 = a.x + b.x; // s0 = a[0] + b[0];
s1 = a.y + b.y; // s1 = a[1] + b[1];
s2 = a.z + b.z; // s2 = a[2] + b[2];
s3 = a.w + b.w; // s3 = a[3] + b[3];
v0 = s0 - a.x; // v0 = s0 - a[0];
v1 = s1 - a.y; // v1 = s1 - a[1];
v2 = s2 - a.z; // v2 = s2 - a[2];
v3 = s3 - a.w; // v3 = s3 - a[3];
u0 = s0 - v0;
u1 = s1 - v1;
u2 = s2 - v2;
u3 = s3 - v3;
w0 = a.x - u0; // w0 = a[0] - u0;
w1 = a.y - u1; // w1 = a[1] - u1;
w2 = a.z - u2; // w2 = a[2] - u2;
w3 = a.w - u3; // w3 = a[3] - u3;
u0 = b.x - v0; // u0 = b[0] - v0;
u1 = b.y - v1; // u1 = b[1] - v1;
u2 = b.z - v2; // u2 = b[2] - v2;
u3 = b.w - v3; // u3 = b[3] - v3;
t0 = w0 + u0;
t1 = w1 + u1;
t2 = w2 + u2;
t3 = w3 + u3;
tmp = two_sum(s1, t0); // s1 = qd::two_sum(s1, t0, t0);
s1 = tmp.x;
t0 = tmp.y;
tmp3 = three_sum(s2, t0, t1); // qd::three_sum(s2, t0, t1);
s2 = tmp3.x;
t0 = tmp3.y;
t1 = tmp3.z;
tmp3 = three_sum2(s3, t0, t2); // qd::three_sum2(s3, t0, t2);
s3 = tmp3.x;
t0 = tmp3.y;
t2 = tmp3.z;
t0 = t0 + t1 + t3;
// qd::renorm(s0, s1, s2, s3, t0);
return renorm(s0, s1, s2, s3, t0); // return qd_real(s0, s1, s2, s3);
}
vec4 qs_add(vec4 _a, vec4 _b)
{
return qs_sloppy_add(_a, _b);
// return qs_ieee_add(_a, _b);
}
vec4 qs_mul(vec4 a, vec4 b)
{
float p0, p1, p2, p3, p4, p5;
float q0, q1, q2, q3, q4, q5;
float t0, t1;
float s0, s1, s2;
vec2 tmp;
vec3 tmp3;
tmp = two_prod(a.x, b.x); // p0 = qd::two_prod(a[0], b[0], q0);
p0 = tmp.x;
q0 = tmp.y;
tmp = two_prod(a.x, b.y); // p1 = qd::two_prod(a[0], b[1], q1);
p1 = tmp.x;
q1 = tmp.y;
tmp = two_prod(a.y, b.x); // p2 = qd::two_prod(a[1], b[0], q2);
p2 = tmp.x;
q2 = tmp.y;
tmp = two_prod(a.x, b.z); // p3 = qd::two_prod(a[0], b[2], q3);
p3 = tmp.x;
q3 = tmp.y;
tmp = two_prod(a.y, b.y); // p4 = qd::two_prod(a[1], b[1], q4);
p4 = tmp.x;
q4 = tmp.y;
tmp = two_prod(a.z, b.x); // p5 = qd::two_prod(a[2], b[0], q5);
p5 = tmp.x;
q5 = tmp.y;
/* Start Accumulation */
tmp3 = three_sum(p1, p2, q0); // qd::three_sum(p1, p2, q0);
p1 = tmp3.x;
p2 = tmp3.y;
q0 = tmp3.z;
/* Six-Three Sum of p2, q1, q2, p3, p4, p5. */
tmp3 = three_sum(p2, q1, q2); // qd::three_sum(p2, q1, q2);
p2 = tmp3.x;
q1 = tmp3.y;
q2 = tmp3.z;
tmp3 = three_sum(p3, p4, p5); // qd::three_sum(p3, p4, p5);
p3 = tmp3.x;
p4 = tmp3.y;
p5 = tmp3.z;
/* compute (s0, s1, s2) = (p2, q1, q2) + (p3, p4, p5). */
tmp = two_sum(p2, p3); // s0 = qd::two_sum(p2, p3, t0);
s0 = tmp.x;
t0 = tmp.y;
tmp = two_sum(q1, p4); // s1 = qd::two_sum(q1, p4, t1);
s1 = tmp.x;
t1 = tmp.y;
s2 = q2 + p5;
tmp = two_sum(s1, t0); // s1 = qd::two_sum(s1, t0, t0);
s1 = tmp.x;
t0 = tmp.y;
s2 += (t0 + t1);
/* O(eps^3) order terms */
s1 += a.x*b.w + a.y*b.z + a.z*b.y + a.w*b.x + q0 + q3 + q4 + q5;
return renorm(p0, p1, s0, s1, s2); // qd::renorm(p0, p1, s0, s1, s2);
}
float ds_compare(vec2 dsa, vec2 dsb)
{
if (dsa.x < dsb.x) return -1.;
else if (dsa.x == dsb.x)
{
if (dsa.y < dsb.y) return -1.;
else if (dsa.y == dsb.y) return 0.;
else return 1.;
}
else return 1.;
}
float qs_compare(vec4 qsa, vec4 qsb)
{
if(ds_compare(qsa.xy, qsb.xy)<0.) return -1.; // if (dsa.x < dsb.x) return -1.;
else if (ds_compare(qsa.xy, qsb.xy) == 0.) // else if (dsa.x == dsb.x)
{
if(ds_compare(qsa.zw, qsb.zw)<0.) return -1.; // if (dsa.y < dsb.y) return -1.;
else if (ds_compare(qsa.zw, qsb.zw) == 0.) return 0.;// else if (dsa.y == dsb.y) return 0.;
else return 1.;
}
else return 1.;
}
float qs_mandel(void)
{
vec4 qs_tx = vec4(gl_TexCoord[0].x, vec3(0.)); // get position of current pixel
vec4 qs_ty = vec4(gl_TexCoord[0].y, vec3(0.));
// initialize complex variable with respect to current position, zoom, ...
vec4 cx = qs_add(qs_add(vec4(qs_cx,0.,0.),qs_mul(qs_tx,vec4(qs_z,0.,0.))),vec4(qs_w,0.,0.));
vec4 cy = qs_add(qs_add(vec4(qs_cy,0.,0.),qs_mul(qs_ty,vec4(qs_z,0.,0.))),vec4(qs_h,0.,0.));
vec4 tmp;
vec4 zx = cx;
vec4 zy = cy;
vec4 two = vec4(2.0, vec3(0.));
vec4 e_radius = vec4(radius*radius, vec3(0.)); // no sqrt available so compare with radius^2 = 2^2 = 2*2 = 4
for(int n=0; n<iterations; n++)
{
tmp = zx;
zx = qs_add(qs_add(qs_mul(zx, zx), -qs_mul(zy, zy)), cx);
zy = qs_add(qs_mul(qs_mul(zy, tmp), two), cy);
if( qs_compare(qs_add(qs_mul(zx, zx), qs_mul(zy, zy)), e_radius)>0.)
{
return(float(n) + 1. - log(log(length(vec2(zx.x, zy.x))))/log(2.)); // http://linas.org/art-gallery/escape/escape.html
}
}
return 0.;
}
void main()
{
float n = qs_mandel();
gl_FragColor = vec4((-cos(0.025*n)+1.0)/2.0,
(-cos(0.08*n)+1.0)/2.0,
(-cos(0.12*n)+1.0)/2.0,
1.0);
}
```

Please note that there are two methods in the qs_add function for adding two quad-singles: "sloppy_add", which is faster but less accurate, and "ieee_add" (nice and slow). You can use either of them.

It actually works! New worlds of our mandelbrot lay ahead. Undiscovered features can be made visible with just a blink of our GPU eye.

Quad-Single works fine but is really slooooooooow... See for yourself:

CPU: Intel i5-2400 @ 3.1 GHz

GPU: ATI HD4870

Compared with emulated double precision (51 FPS) and hardware-accelerated double precision (154 FPS), the emulated quad precision (6 FPS) is taking its time.

As you may have noticed if you actually tried this demo, zooming and scrolling beyond zoom level 48 is a bit inaccurate. This is due to the limited precision of the variables that the main (Qt) program hands over to the shader. It uses double precision (more precisely: emulated double) and is limited to a minimal step width that is well above the quad-single precision of the shader. This is an issue I'm going to solve in another post (hopefully...).
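To illustrate the limit (a Python aside; Python floats are the same IEEE-754 doubles): near 1.0 a double cannot represent relative steps much smaller than its machine epsilon, so per-pixel offsets below that simply vanish when added to the coordinate:

```python
import sys

# Machine epsilon of an IEEE-754 double: the relative spacing near 1.0
print(sys.float_info.epsilon)  # 2.220446049250313e-16 (= 2**-52)

# A step well below half an ulp vanishes when added to 1.0 ...
assert 1.0 + 2**-55 == 1.0
# ... while a step above the epsilon survives:
assert 1.0 + 2**-50 != 1.0
```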

Quadruple precision is, in terms of computational resources, very expensive to implement. It is only suitable for real-time applications if you stick with simple calculations.

Eric Bainville has written some code for fp128 (128-bit fixed point numbers). Maybe I can try this with GLSL in the future and see how it performs.

The reduced computing precision on the Qt side (only long doubles are generally available in Qt) currently limits the shader's possibilities. An equal or higher precision than in the shader is required to explore the full depth of the emulated quadruple precision in GLSL. Maybe the quad-single concept can be extended to quad-double.

Download: GLSL_QuadSingleMandel

I reckon that it must be some weird optimization on NVIDIA cards that breaks the emulation code. So I searched for a way to disable these optimizations. After some googling I found an old blog entry by Cyril Crassin (check out his new blog if you are into 3D graphics), which helped me out with this problem.

To turn off the NVIDIA optimizations I had to add the following lines to the shader code.

`#pragma optionNV(fastprecision off)`

Now it should work fine with NVIDIA GPUs. Let me know if there are still some issues.

Download updated version: GLSL_EmuMandel

I have also found a handy tool called NVEmulate to examine the GLSL compiler output and other stuff to analyze GLSL assembly on NVIDIA GPUs.

Jethro provided the division function:

```
vec2 ds_div(vec2 dsa, vec2 dsb)
{
    vec2 dsc;
    float c11, c21, c2, e, s1, s2, t1, t2, t11, t12, t21, t22;
    float a1, a2, b1, b2, cona, conb, split = 8193.;

    s1 = dsa.x / dsb.x;
    cona = s1 * split;
    conb = dsb.x * split;
    a1 = cona - (cona - s1);
    b1 = conb - (conb - dsb.x);
    a2 = s1 - a1;
    b2 = dsb.x - b1;
    c11 = s1 * dsb.x;
    c21 = (((a1 * b1 - c11) + a1 * b2) + a2 * b1) + a2 * b2;
    c2 = s1 * dsb.y;
    t1 = c11 + c2;
    e = t1 - c11;
    t2 = ((c2 - e) + (c11 - (t1 - e))) + c21;
    t12 = t1 + t2;
    t22 = t2 - (t12 - t1);
    t11 = dsa.x - t12;
    e = t11 - dsa.x;
    t21 = ((-t12 - e) + (dsa.x - (t11 - e))) + dsa.y - t22;
    s2 = (t11 + t21) / dsb.x;
    dsc.x = s1 + s2;
    dsc.y = s2 - (dsc.x - s1);
    return dsc;
}
```
