Generating a very large array of objects freezes program - opengl

When I generate a huge array of 13k+ tiles that I want to render as textures onto the screen, it crashes and I have no idea why. This is the method that is causing the issue:
public ArrayList<Tile> getNewChunk(int width, int height)
{
    int amountOfTilesY = height * 32;
    int amountOfTilesX = width * 32;
    int amountOfTiles = (amountOfTilesX + amountOfTilesY) / 32;
    ArrayList<Tile> tiles = new ArrayList<Tile>(amountOfTiles);
    for(int i = 0; i < amountOfTilesX; i += 32)
    {
        for(int j = 0; j < amountOfTilesY; j += 32)
        {
            Tile tempTile = new DirtTile(i, j, "res/tile/DirtTile.png");
            tiles.add(tempTile);
        }
    }
    return tiles;
}
So if you could please help :D
The engine I am using to render the game is LWJGL 2, using OpenGL.
I can provide more code if needed.

Based on what you provided, the issue could come from multiple things. #javec is correct that loading the texture once per object would drastically reduce performance. You should load the texture once and share its single texture ID across the whole list.
Also, it is unclear based on what you provided whether or not you are generating buffers for each tile separately here. If so, you should be generating a single buffer and sharing it in the same way.
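For illustration, here is a minimal sketch of that idea, written as C-style GL calls since your loader isn't shown; the same pattern applies through LWJGL 2's GL11 bindings. The cache and the loadTextureFromFile() helper are hypothetical names, not part of your code:
#include <map>
#include <string>
#include <GL/gl.h>

GLuint loadTextureFromFile(const std::string &path); // hypothetical: glGenTextures + glTexImage2D, done once per file

// Hypothetical cache: each image file is decoded and uploaded once,
// and every tile that uses it simply reuses the returned texture id.
static std::map<std::string, GLuint> textureCache;

GLuint getTexture(const std::string &path)
{
    std::map<std::string, GLuint>::iterator it = textureCache.find(path);
    if (it != textureCache.end())
        return it->second;                  // already loaded: hand out the same id
    GLuint id = loadTextureFromFile(path);
    textureCache[path] = id;
    return id;
}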
Depending on what you plan on using these tiles for, you may want to consider a different approach. If the tiles are connected as terrain, you can always scale your mesh appropriately and tile your texture using something like
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT);
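With GL_REPEAT set, a single quad covering the whole chunk can repeat the dirt texture instead of thousands of tile objects, because texture coordinates greater than 1.0 wrap around. A rough immediate-mode sketch, where dirtTextureId is a hypothetical already-loaded texture and width/height are the chunk dimensions in tiles as in your method:
// One quad covering the whole chunk; each repeat of the texture covers one 32x32 tile.
glBindTexture(GL_TEXTURE_2D, dirtTextureId);
glBegin(GL_QUADS);
glTexCoord2f(0.0f,         0.0f);          glVertex2f(0.0f,          0.0f);
glTexCoord2f((float)width, 0.0f);          glVertex2f(width * 32.0f, 0.0f);
glTexCoord2f((float)width, (float)height); glVertex2f(width * 32.0f, height * 32.0f);
glTexCoord2f(0.0f,         (float)height); glVertex2f(0.0f,          height * 32.0f);
glEnd();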

Related

OpenGL 4.1 internally stores my texture incorrectly, causing a garbled image

I'm trying to load a 2bpp image format into OpenGL textures. The format is just a bunch of indexed-color pixels, 4 pixels fit into one byte since it's 2 bits per pixel.
My current code works fine in all cases except if the image's width is not divisible by 4. I'm not sure if this has something to do with the data being 2bpp, as it's converted to a pixel unsigned byte array (GLubyte raw[4096]) anyway.
16x16? Displays fine.
16x18? Displays fine.
18x16? Garbled mess.
22x16? Garbled mess.
etc.
Here is what I mean by works VS. garbled mess (resized to 3x):
Here is my code:
GLubyte raw[4096];
std::ifstream bin(file, std::ios::ate | std::ios::binary | std::ios::in);
unsigned short size = bin.tellg();
bin.clear();
bin.seekg(0, std::ios::beg);
// first byte is height; width is calculated from a combination of filesize & height
// this part works correctly every time
char ch = 0;
bin.get(ch);
ubyte h = ch;
ubyte w = ((size-1)*4)/h;
printf("%dx%d Filesize: %d (%d)\n", w, h, size-1, (size-1)*4);
// fill it in with 0's which means transparent.
for (int ii = 0; ii < w*h; ++ii) {
    if (ii < 4096) {
        raw[ii] = 0x00;
    } else {
        return false;
    }
}
size_t i = 0;
while (bin.get(ch)) {
    // 2bpp mode
    // take each byte in the file, split it into 4 bytes.
    raw[i] = (ch & 0x03);
    raw[i+1] = (ch & 0x0C) >> 2;
    raw[i+2] = (ch & 0x30) >> 4;
    raw[i+3] = (ch & 0xC0) >> 6;
    i = i + 4;
}
texture_sizes[id][1] = w;
texture_sizes[id][2] = h;
glGenTextures(1, &textures[id]);
glBindTexture(GL_TEXTURE_2D, textures[id]);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
GLenum fmt = GL_RED;
GLint swizzleMask[] = { GL_RED, GL_RED, GL_RED, 255 };
glTexParameteriv(GL_TEXTURE_2D, GL_TEXTURE_SWIZZLE_RGBA, swizzleMask);
glTexImage2D(GL_TEXTURE_2D, 0, fmt, w, h, 0, fmt, GL_UNSIGNED_BYTE, raw);
glBindTexture(GL_TEXTURE_2D, 0);
What's actually happening is that, for some reason, the image is being treated as if it were 20x24; OpenGL (probably?) seems to be forcefully rounding the width up to the nearest number divisible by 4, which would be 20. This is despite the w value in my code being correct at 18; it's as if OpenGL is saying "no, I'm going to make it 20 pixels wide internally."
However, since the texture is still being rendered as an 18x24 rectangle, the last 2 pixels of each row - which should be the first 2 pixels of the next row - are just... not being rendered.
Here's what happens when I force my code's w value to always be 20, instead of 18. (I just replaced w = ((size-1)*4)/h with w = 20):
And here's when my w value is 18 again, as in the first image:
As you can see, the image is a whole 2 pixels wider; those 2 pixels at the end of every row should be on the next row, because the width is supposed to be 18, not 20!
This proves that, for whatever reason, the texture bytes were internally parsed and stored as if they were 20x24 instead of 18x24. Why that is, I can't figure out, and I've been trying to solve this specific problem for days. I've verified that the raw bytes are all the values I expect; there's nothing wrong with my data format. Is this an OpenGL bug? Why is OpenGL internally storing my texture as 20x24 when I clearly told it to store it as 18x24? The rest of my code recognizes that I told the width to be 18, not 20; it's just OpenGL itself that doesn't.
Finally, one more note: I've tried loading the exact same file, in the exact same way with the LÖVE framework (Lua), exact same size and exact same bytes as my C++ version and all. And I dumped those bytes into love.image.newImageData and it displays just fine!
That's the final proof that it's not my format's problem; it's very likely OpenGL's problem or something in the code above that I'm overlooking.
How can I solve this problem? The problem being that OpenGL is storing the texture internally with an incorrect width (20, as opposed to the 18 I gave the function) and therefore loading the raw unsigned bytes incorrectly.
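For reference, the behavior described above matches OpenGL's default pixel unpack alignment rather than a bug: GL_UNPACK_ALIGNMENT defaults to 4, so each row of the client-side data is assumed to start on a 4-byte boundary, and an 18-byte-wide GL_RED/GL_UNSIGNED_BYTE row is read as if it were padded out to 20 bytes. A minimal sketch of the change, assuming the code above is otherwise unchanged:
// Rows of 'raw' are tightly packed, so drop the default 4-byte row alignment before uploading.
glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
glTexImage2D(GL_TEXTURE_2D, 0, fmt, w, h, 0, fmt, GL_UNSIGNED_BYTE, raw);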

How do you upload texture data to a Sparse Texture using TexSubImage in OpenGL?

I am following apitest on GitHub, and I am seeing some very strange behavior in my renderer.
It seems like the virtual pages are not receiving the correct image data.
The original image is 500x311:
When I render this image using a sparse texture, I must resize the backing store to 512x384 (to be a multiple of the page size), and my result is:
As you can see, it looks like a portion of the subimage (a sub-sub-image) was loaded into each individual virtual page.
To test this, I cropped the image to the size of just one virtual page (256x128); here is the result:
As expected, the single virtual page was filled with the exact, correct, cropped image.
Lastly, I increased the crop size to be two virtual pages' worth, 256x256, one on top of the other. Here is the result:
This proves that calling TexSubImage with an amount of texel data larger than the virtual page size causes errors.
Does care need to be taken when passing data to TexSubImage that is larger than the virtual page size? I see no logic for this in apitest, so I think this could be a driver issue. Or I am missing something major.
Here is some code:
I stored the texture in a texture array, and to simplify I turned the array into just a 2D texture; both produce the exact same result. Here is the texture memory allocation:
_check_gl_error();
glGenTextures(1, &mTexId);
glBindTexture(GL_TEXTURE_2D, mTexId);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_SPARSE_ARB, GL_TRUE);
// TODO: This could be done once per internal format. For now, just do it every time.
GLint indexCount = 0,
      xSize = 0,
      ySize = 0,
      zSize = 0;
GLint bestIndex = -1,
      bestXSize = 0,
      bestYSize = 0;
glGetInternalformativ(GL_TEXTURE_2D, internalformat, GL_NUM_VIRTUAL_PAGE_SIZES_ARB, 1, &indexCount);
if(indexCount == 0) {
    fprintf(stdout, "No Virtual Page Sizes for given format");
    fflush(stdout);
}
_check_gl_error();
for (GLint i = 0; i < indexCount; ++i) {
    glTexParameteri(GL_TEXTURE_2D, GL_VIRTUAL_PAGE_SIZE_INDEX_ARB, i);
    glGetInternalformativ(GL_TEXTURE_2D, internalformat, GL_VIRTUAL_PAGE_SIZE_X_ARB, 1, &xSize);
    glGetInternalformativ(GL_TEXTURE_2D, internalformat, GL_VIRTUAL_PAGE_SIZE_Y_ARB, 1, &ySize);
    glGetInternalformativ(GL_TEXTURE_2D, internalformat, GL_VIRTUAL_PAGE_SIZE_Z_ARB, 1, &zSize);
    // For our purposes, the "best" format is the one that winds up with Z=1 and the largest x and y sizes.
    if (zSize == 1) {
        if (xSize >= bestXSize && ySize >= bestYSize) {
            bestIndex = i;
            bestXSize = xSize;
            bestYSize = ySize;
        }
    }
}
_check_gl_error();
mXTileSize = bestXSize;
glTexParameteri(GL_TEXTURE_2D, GL_VIRTUAL_PAGE_SIZE_INDEX_ARB, bestIndex);
_check_gl_error();
//Need to ensure that the texture is a multiple of the tile size.
physicalWidth = roundUpToMultiple(width, bestXSize);
physicalHeight = roundUpToMultiple(height, bestYSize);
// We've set all the necessary parameters, now it's time to create the sparse texture.
glTexStorage2D(GL_TEXTURE_2D, levels, GL_RGBA8, physicalWidth, physicalHeight);
_check_gl_error();
for (GLsizei i = 0; i < slices; ++i) {
    mFreeList.push(i);
}
_check_gl_error();
mHandle = glGetTextureHandleARB(mTexId);
_check_gl_error();
glMakeTextureHandleResidentARB(mHandle);
_check_gl_error();
mWidth = physicalWidth;
mHeight = physicalHeight;
mLevels = levels;
Here is what happens after the allocation:
glTextureSubImage2DEXT(mTexId, GL_TEXTURE_2D, level, 0, 0, width, height, GL_RGB, GL_UNSIGNED_BYTE, data);
I have tried making width and height the physical width/height of the backing store AND the width/height of the incoming image content. Neither produces the desired result. I exclude mip levels for now. When I was using mip levels and the texture array I was getting different results, but similar behavior.
Also, the image is loaded with SOIL, and before I implemented sparse textures that worked very well (before sparse I implemented bindless).
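One thing worth double-checking, since it is not in the snippets above: with ARB_sparse_texture, every region of the texture starts uncommitted, and uploads into uncommitted pages may simply be ignored, so the destination pages need to be committed before the glTextureSubImage2DEXT call. A sketch using the names from the allocation code above (whether apitest already does this elsewhere is not shown here):
// Commit the backing pages for level 0 of the whole page-aligned region, then upload.
glBindTexture(GL_TEXTURE_2D, mTexId);
glTexPageCommitmentARB(GL_TEXTURE_2D, level,
                       0, 0, 0,
                       physicalWidth, physicalHeight, 1,
                       GL_TRUE);
glTextureSubImage2DEXT(mTexId, GL_TEXTURE_2D, level, 0, 0,
                       width, height, GL_RGB, GL_UNSIGNED_BYTE, data);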

Compressed texture batching in OpenGL

I'm trying to create an atlas of compressed textures but I can't seem to get it working. Here is a code snippet:
void Texture::addImageToAtlas(ImageProperties* imageProperties)
{
    generateTexture(); // delete and regenerate an empty texture
    bindTexture();     // bind it
    atlasProperties.push_back(imageProperties);
    width = height = 0;
    for (int i = 0; i < atlasProperties.size(); i++)
    {
        width += atlasProperties[i]->width;
        height = atlasProperties[i]->height;
    }
    glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    // glCompressedTexImage2D MUST be called with valid data for the 'pixels'
    // parameter. Won't work if you use zero/null.
    glCompressedTexImage2D(GL_TEXTURE_2D, 0,
                           GL_COMPRESSED_RGBA8_ETC2_EAC,
                           width,
                           height,
                           0,
                           (GLsizei)(ceilf(width/4.f) * ceilf(height/4.f) * 16.f),
                           atlasProperties[0]->pixels);
    // Recreate the whole atlas by adding all the textures we have appended
    // to our vector so far
    int x = 0, y = 0;
    for (int i = 0; i < atlasProperties.size(); i++)
    {
        glCompressedTexSubImage2D(GL_TEXTURE_2D,
                                  0,
                                  x,
                                  y,
                                  atlasProperties[i]->width,
                                  atlasProperties[i]->height,
                                  GL_RGBA,
                                  (GLsizei)(ceilf(atlasProperties[i]->width/4.f) * ceilf(atlasProperties[i]->height/4.f) * 16.f),
                                  atlasProperties[i]->pixels);
        x += atlasProperties[i]->width;
    }
    unbindTexture(); // unbind the texture
}
I'm testing this with just 2 small KTX textures that have the same size, and as you can see from the code, I'm trying to append the second one next to the first one on the x axis.
My KTX parsing works fine, as I can render individual textures, but as soon as I try to batch (that is, as soon as I use glCompressedTexSubImage2D) I get nothing on the screen.
It might be useful to know that all of this works fine if I replace the compressed textures with PNGs and swap glCompressedTexImage2D and glCompressedTexSubImage2D for their non-compressed versions...
One of the things that I cannot find any information on is the x and y position of the textures in the atlas. How do I offset them? So if the first texture has a width of 60 pixels for example, do I just position the second one at 61?
I've seen some code online where people calculate the x and y position as follows:
x &= ~3;
y &= ~3;
Is this what I need to do and why? I've tried it but it doesn't seem to work.
Also, I'm trying the above code on an ARM i.MX6 Quad with a Vivante GPU, and I have a suspicion from what I read online that glCompressedTexSubImage2D might not be working on this board.
Can anyone please help me out?
The format you pass to glCompressedTexSubImage2D() must be the same as the one used for the corresponding glCompressedTexImage2D(). From the ES 2.0 spec:
This command does not provide for image format conversion, so an INVALID_OPERATION error results if format does not match the internal format of the texture image being modified.
Therefore, to match the glCompressedTexImage2D() call, the glCompressedTexSubImage2D() call needs to be:
glCompressedTexSubImage2D(GL_TEXTURE_2D,
                          0, x, y, atlasProperties[i]->width, atlasProperties[i]->height,
                          GL_COMPRESSED_RGBA8_ETC2_EAC,
                          (GLsizei)(ceilf(atlasProperties[i]->width/4.f) *
                                    ceilf(atlasProperties[i]->height/4.f) * 16.f),
                          atlasProperties[i]->pixels);
As for the sizes and offsets:
Your logic of determining the overall size would only work if the height of all sub-images is the same. Or more precisely, since the height is set to the height of the last sub-image, if no other height is larger than the last one. To make it more robust, you would probably want to use the maximum height of all sub-images.
I was surprised that you can't pass null as the last argument of glCompressedTexImage2D(), but it seems to be true. At least I couldn't find anything allowing it in the spec. But this being the case, I don't think it would be ok to simply pass the pointer to the data of the first sub-image. That would not be enough data, and it would read beyond the end of the memory. You may have to allocate and pass "data" that is large enough to cover the entire atlas texture. You could probably set it to anything (e.g. zero it out), since you're going to replace it anyway.
The way I read the ETC2 definition (as included in the ES 3.0 spec), the width/height of the texture do not strictly have to be multiples of 4. However, the positions for glCompressedTexSubImage2D() do have to be multiples of 4, as well as the width/height, unless they extend to the edge of the texture. This means that you have to make the width of each sub-image except the last a multiple of 4. At that point, you might as well use a multiple of 4 for everything.
Based on this, I think the size determination should look like this:
width = height = 0;
for (int i = 0; i < atlasProperties.size(); i++)
{
    width += (atlasProperties[i]->width + 3) & ~3;
    if (atlasProperties[i]->height > height)
    {
        height = atlasProperties[i]->height;
    }
}
height = (height + 3) & ~3;
uint8_t* dummyData = new uint8_t[width * height];
memset(dummyData, 0, width * height);
glCompressedTexImage2D(GL_TEXTURE_2D, 0,
                       GL_COMPRESSED_RGBA8_ETC2_EAC,
                       width, height, 0,
                       width * height,
                       dummyData);
delete[] dummyData;
Then to set the sub-images:
int xPos = 0;
for (int i = 0; i < atlasProperties.size(); i++)
{
    int w = (atlasProperties[i]->width + 3) & ~3;
    int h = (atlasProperties[i]->height + 3) & ~3;
    glCompressedTexSubImage2D(GL_TEXTURE_2D,
                              0, xPos, 0, w, h,
                              GL_COMPRESSED_RGBA8_ETC2_EAC,
                              w * h,
                              atlasProperties[i]->pixels);
    xPos += w;
}
The whole thing would get slightly simpler if you could ensure that the original texture images already had sizes that are multiples of 4. Then you can skip rounding up the sizes/positions to multiples of 4.
In the end, this was one of those mistakes that make you want to hit your head on a wall: GL_COMPRESSED_RGBA8_ETC2_EAC was actually not supported on the board.
I had copied the constant from the headers but never queried the device for its supported formats. I can use a DXT5 format just fine with this code.
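A quick way to catch this up front is to ask the driver which compressed formats it actually exposes before choosing one. A sketch (on some drivers the returned list is not exhaustive, but it catches the common case of an unsupported extension format):
#include <vector>
#include <algorithm>

// Returns true if the driver lists 'fmt' among its supported compressed texture formats.
bool isCompressedFormatSupported(GLint fmt)
{
    GLint count = 0;
    glGetIntegerv(GL_NUM_COMPRESSED_TEXTURE_FORMATS, &count);
    if (count <= 0)
        return false;
    std::vector<GLint> formats(count);
    glGetIntegerv(GL_COMPRESSED_TEXTURE_FORMATS, &formats[0]);
    return std::find(formats.begin(), formats.end(), fmt) != formats.end();
}
// e.g. isCompressedFormatSupported(GL_COMPRESSED_RGBA8_ETC2_EAC)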

How to load devIL image from raw data

I would like to create a devIL image from raw texture data, but I can't seem to find a way to do it. The proper way seems to be ilLoadL with IL_RAW, but I can't get it to work. The documentation here says that there should be a 13-byte header in the data, so I just put meaningless data there. If I pass 0 as the "size" parameter of ilLoadL,
I get a black texture, no matter what. Otherwise my program refuses to draw anything. ilIsImage returns true, and I can create an OpenGL texture from it just fine. The code works if I load the texture from a file.
It's not much, but here's my code so far:
//Loading:
ilInit();
iluInit();
ILuint ilID;
ilGenImages(1, &ilID);
ilBindImage(ilID);
ilEnable(IL_ORIGIN_SET);
ilOriginFunc(IL_ORIGIN_LOWER_LEFT);
//Generate 13-byte header and fill it with meaningless numbers
for (int i = 0; i < 13; ++i){
    data.insert(data.begin() + i, i);
}
//This fails.
if (!ilLoadL(IL_RAW, &data[0], size)){
    std::cout << "Fail" << std::endl;
}
Texture creation:
ilBindImage(ilId[i]);
ilConvertImage(IL_RGBA, IL_UNSIGNED_BYTE);
glBindTexture(textureTarget, id[i]);
glTexParameteri(textureTarget, GL_TEXTURE_WRAP_S, GL_REPEAT);
glTexParameteri(textureTarget, GL_TEXTURE_WRAP_T, GL_REPEAT);
glTexParameterf(textureTarget, GL_TEXTURE_MIN_FILTER, filters[i]);
glTexParameterf(textureTarget, GL_TEXTURE_MAG_FILTER, filters[i]);
glTexImage2D(textureTarget, 0, GL_RGBA,
             ilGetInteger(IL_IMAGE_WIDTH), ilGetInteger(IL_IMAGE_HEIGHT),
             0, GL_RGBA, GL_UNSIGNED_BYTE, ilGetData());
If an image format has a header, you can generally assume it contains some important information necessary to correctly read the rest of the file. Filling it with "meaningless data" is inadvisable at best.
Since there is no actual struct in DevIL for the .raw header, let us take a look at the implementation of iLoadRawInternal () to figure out what those first 13 bytes are supposed to be.
// Internal function to load a raw image
ILboolean iLoadRawInternal()
{
    if (iCurImage == NULL) {
        ilSetError(IL_ILLEGAL_OPERATION);
        return IL_FALSE;
    }
    iCurImage->Width  = GetLittleUInt();  /* Bytes: 0-3  {Image Width}     */
    iCurImage->Height = GetLittleUInt();  /* Bytes: 4-7  {Image Height}    */
    iCurImage->Depth  = GetLittleUInt();  /* Bytes: 8-11 {Image Depth}     */
    iCurImage->Bpp    = (ILubyte)igetc(); /* Byte:  12   {Bytes per-pixel} */
NOTE: The /* comments */ are my own
GetLittleUInt () reads a 32-bit unsigned integer in little-endian order and advances the read location appropriately. igetc () does the same for a single byte.
This is equivalent to the following C structure (minus the byte order consideration):
struct RAW_HEADER {
    uint32_t width;
    uint32_t height;
    uint32_t depth; // This is depth as in the number of 3D slices (not bit depth)
    uint8_t  bpp;   // **Bytes** per-pixel (1 = Luminance, 3 = RGB, 4 = RGBA)
};
If you read the rest of the implementation of iLoadRawInternal () in il_raw.c, you will see that without proper values in the header DevIL will not be able to calculate the correct file size. Filling in the correct values should help.
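So, rather than inserting meaningless values, the 13 bytes should hold the real dimensions. A minimal sketch, assuming a little-endian host and that data is a std::vector<unsigned char> already holding width * height * 4 RGBA bytes (width and height here are your own variables, not DevIL's):
#include <cstdint>
#include <cstring>

// Build the 13-byte .raw header that iLoadRawInternal() expects and
// prepend it to the pixel data before calling ilLoadL().
uint32_t w = width, h = height, d = 1;  // one 2D slice
uint8_t  bpp = 4;                       // 4 bytes per pixel = RGBA
unsigned char header[13];
std::memcpy(header + 0, &w, 4);         // bytes 0-3:  width  (little-endian)
std::memcpy(header + 4, &h, 4);         // bytes 4-7:  height
std::memcpy(header + 8, &d, 4);         // bytes 8-11: depth
header[12] = bpp;                       // byte 12:    bytes per pixel
data.insert(data.begin(), header, header + 13);
if (!ilLoadL(IL_RAW, &data[0], (ILuint)data.size())) {
    std::cout << "Fail" << std::endl;
}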

OpenGL calls segfault when called from OpenMP thread

Let me start by trying to specify what I want to do:
Given a grey scale image, I want to create 256 layers (assuming 8bit images), where each layer is the image thresholded with a grey scale i -- which is also the i'th layer (so, i=0:255). For all of these layers I want to compute various other things which are not very relevant to my problem, but this should explain the structure of my code.
The problem is that I need to execute the code very often, so I want to speed things up as much as possible, using a short amount of time (so, simple speedup tricks only). Therefore I figured I could use the OpenMP library, as I have a quad core, and everything is CPU-based at the moment.
This brings me to the following code, which executes fine (at least, it looks fine :) ):
#pragma omp parallel for private(i,out,tmp,cc)
for(i=0; i< numLayers; i++){
    cc = new ConnectedComponents(255);
    out = (unsigned int *) malloc(in->dimX()* in->dimY()*sizeof(int));
    tmp = (*in).dupe();
    tmp->threshold((float) i);
    if(!tmp){ printf("Could not allocate enough memory\n"); exit(-1); }
    cc->connected(tmp->data(), out, tmp->dimX(), tmp->dimY(), std::equal_to<unsigned int>(), true);
    free(out);
    delete tmp;
    delete cc;
}
ConnectedComponents is just some library which implements the 2-pass floodfill, just there for illustration, it is not really part of the problem.
This code finishes fine with 2,3,4,8 threads (didn't test any other number).
So, now the weird part. I wanted to add some visual feedback to help me debug. The object tmp has a method called saveToTexture(), which basically does all the work for me and returns the texture ID. This function works fine single-threaded, and it also works fine with 2 threads. However, as soon as I go beyond 2 threads, the method causes a segmentation fault.
Even with #pragma omp critical around it (just in case saveToTexture() is not thread-safe), or executing it only once, it still crashes. This is the code I have added to the previous loop:
if(i==100){
    #pragma omp critical
    {
        tmp->saveToTexture();
    }
}
which is only executed once, since i is the iterator and it is a critical section... Still, the code ALWAYS segfaults at the first OpenGL call (brute-force tested with printf() and fflush(stdout)).
So, just to make sure I am not leaving out relevant information, here is the saveToTexture() function:
template <class T> GLuint FIELD<T>::saveToTexture() {
    unsigned char *buf = (unsigned char*)malloc(dimX()*dimY()*3*sizeof(unsigned char));
    if(!buf){ printf("Could not allocate memory\n"); exit(-1); }
    float m,M,avg;
    minmax(m,M,avg);
    const float* d = data();
    int j=0;
    for(int i=dimY()-1; i>=0; i--) {
        for(const float *s=d+dimX()*i, *e=s+dimX(); s<e; s++) {
            float r,g,b,v = ((*s)-m)/(M-m);
            v = (v>0)?v:0;
            if (v>M) { r=g=b=1; }
            else { v = (v<1)?v:1; }
            r=g=b=v;
            buf[j++] = (unsigned char)(int)(255*r);
            buf[j++] = (unsigned char)(int)(255*g);
            buf[j++] = (unsigned char)(int)(255*b);
        }
    }
    GLuint texid;
    glPixelStorei(GL_UNPACK_ALIGNMENT,1);
    glDisable(GL_TEXTURE_3D);
    glEnable(GL_TEXTURE_2D);
    glActiveTexture(GL_TEXTURE0);
    glGenTextures(1, &texid);
    printf("TextureID: %d\n", texid);
    fflush(stdout);
    glBindTexture(GL_TEXTURE_2D, texid);
    glTexParameterf( GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST );
    glTexParameterf( GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST );
    glTexParameterf( GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT );
    glTexParameterf( GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT );
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, dimX(), dimY(), 0, GL_RGB, GL_UNSIGNED_BYTE, buf);
    glBindTexture(GL_TEXTURE_2D, 0);
    glDisable(GL_TEXTURE_2D);
    free(buf);
    return texid;
}
It is good to note here that T is ALWAYS a float in my program.
So, I do not understand why this program works fine when executed with 1 or 2 threads (executed ~25 times, 100% success), but segfaults when using more threads (executed ~25 times, 0% success). And ALWAYS at the first OpenGL call (e.g. if I remove glPixelStorei(), it segfaults at glDisable()).
Am I overlooking something really obvious, am I encountering a weird OpenMP bug, or... what is happening?
You can only make OpenGL calls from one thread at a time, and the thread has to have the current context active.
An OpenGL context can only be used by one thread at a time (limitation imposed by wglMakeCurrent/glxMakeCurrent).
However, you said you're using layers. I think you can use different contexts for different layers, with the WGL_ARB_create_context extension (I think there's one for Linux too) and setting the WGL_CONTEXT_LAYER_PLANE_ARB parameter. Then you could have a different context per thread, and things should work out.
Thank you very much for all the answers! Now that I know why it fails, I have decided to simply store everything in a big 3D texture (because this was an even easier solution) and just send all the data to the GPU at once. That works fine in this case.
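For completeness, a rough sketch of that pattern with hypothetical names (dimX/dimY hold the layer dimensions, and fillThresholdedLayer stands in for the per-layer work above): the OpenMP threads only write into a CPU-side buffer, and the single GL upload happens afterwards on the thread that owns the context.
// Each thread fills its own slice of a CPU-side volume; no GL calls inside the parallel loop.
unsigned char *volume = (unsigned char*)malloc((size_t)dimX * dimY * numLayers);
#pragma omp parallel for
for (int i = 0; i < numLayers; i++) {
    unsigned char *slice = volume + (size_t)i * dimX * dimY;
    fillThresholdedLayer(slice, i);  // hypothetical: writes dimX*dimY bytes for layer i
}
// One upload, done from the thread that owns the OpenGL context.
GLuint texid;
glGenTextures(1, &texid);
glBindTexture(GL_TEXTURE_3D, texid);
glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
glTexParameterf(GL_TEXTURE_3D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameterf(GL_TEXTURE_3D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexImage3D(GL_TEXTURE_3D, 0, GL_LUMINANCE, dimX, dimY, numLayers, 0,
             GL_LUMINANCE, GL_UNSIGNED_BYTE, volume);
glBindTexture(GL_TEXTURE_3D, 0);
free(volume);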