Can I rely on SDL_Surface::pitch being constant? - sdl

I'm working on a project that uses SDL 1.2.15. The application constructs an SDL_Surface whose frame buffer is then retrieved via getDisplaySurface()->pixels and sent over a serial line.
I learned that the pixel buffer pointed to by SDL_Surface::pixels is not necessarily contiguous. The byte sequence may be interrupted by blocks of data that are not part of the visible image area.
That means the image is of size 320×240, but the pixel buffer could be of size, let's say, 512×240. (I imagine speedups from memory alignment could be a valid reason; that's just an assumption on my part, not backed by actual knowledge, though.)
Question:
In my case, I happen to be lucky and the pixel buffer has exactly the dimensions of my image. Can I trust that the pixel buffer dimensions won't change?
That way I could just send the pixel buffer content to the serial interface and wouldn't have to write code that strips out those invalid blocks.

SDL uses 4-byte alignment for rows. It also matches OpenGL's default alignment.
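If you'd rather not rely on that, copying row by row while honouring SDL_Surface::pitch is cheap anyway. A minimal sketch (C, SDL 1.2; send_bytes() is a hypothetical stand-in for your serial-line write):

    #include <stddef.h>
    #include "SDL.h"

    /* Send the visible pixels of a surface, skipping any per-row padding. */
    void send_surface(SDL_Surface *s, void (*send_bytes)(const Uint8 *, size_t))
    {
        const size_t row_bytes = (size_t)s->w * s->format->BytesPerPixel;
        const Uint8 *row;
        int y;

        SDL_LockSurface(s);                 /* needed if the surface requires locking */
        row = (const Uint8 *)s->pixels;
        for (y = 0; y < s->h; ++y) {
            send_bytes(row, row_bytes);     /* only the visible part of the row */
            row += s->pitch;                /* pitch includes any padding bytes */
        }
        SDL_UnlockSurface(s);
    }

If pitch happens to equal w * BytesPerPixel (as in your case), this degenerates into a single pass over a dense buffer.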

Related

What is a buffer in Computer Graphics

Give me a brief, clear definition of a Buffer in Computer Graphics, then a short description of a buffer.
Most of the definitions on the internet only cover the "frame buffer", yet there are other types of buffers in computer graphics, more specifically in OpenGL.
Someone to give me a brief, clear definition of a Buffer in Computer Graphics
There isn't one. "Buffer" is a term that is overloaded and can mean different things in different contexts.
A "framebuffer" (one word) basically has no relation to many other kinds of "buffer". In OpenGL, a "framebuffer" is an object that has attached images to which you can render.
"Buffering" as a concept generally means using multiple, usually identically sized, regions of storage in order to prevent yourself from writing to a region while that region is being consumed by some other process. The default framebuffer in OpenGL may be double-buffered. This means that there is a front image and a back image. You render to the back image, then swap the images to render the next frame. When you swap them, the back image becomes the front image, which means that it is now visible. You then render to the old front image, now the back image, which is no longer visible. This prevents seeing incomplete rendering products, since you're never writing to the image that is visible.
You'll note that while a "framebuffer" may involve "buffering," the concept of "buffering" can be used with things that aren't "framebuffers". The two are orthogonal, unrelated.
The broadest definition of "buffer" might be "some memory that is used to store bulk data". But that would also include "textures", which most APIs do not consider to be "buffers".
OpenGL (and Vulkan), as APIs, have a stricter definition. A "buffer object" is an area of contiguous, unformatted memory which can be read from or written to by various GPU processes. This is distinct from a "texture" object, which has a specific format that is internal to the implementation. Because a texture's format is not known to you, you are not allowed to directly manipulate the bytes of a texture's storage. Any bytes you upload to it or read from it go through an API that allows the implementation to play with them.
For buffer objects, you can load arbitrary bytes to a buffer object's storage without the API (directly) knowing what those bytes mean. You can even map the storage and access it like a regular pointer to CPU memory.
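As a small illustration of that distinction, a buffer object really is just raw bytes you can upload to or map. A minimal sketch (C, assuming a current GL context and a loader such as GLEW; error handling omitted):

    #include <string.h>
    #include <GL/glew.h>   /* any GL function loader works; GLEW is just an assumption */

    /* Create a buffer object holding 1024 unformatted bytes, then map it and
     * write to it like ordinary CPU memory. */
    static GLuint make_scratch_buffer(void)
    {
        GLuint buf;
        void *p;

        glGenBuffers(1, &buf);
        glBindBuffer(GL_ARRAY_BUFFER, buf);
        glBufferData(GL_ARRAY_BUFFER, 1024, NULL, GL_DYNAMIC_DRAW);  /* raw, unformatted storage */

        p = glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);             /* CPU-visible pointer */
        memset(p, 0, 1024);
        glUnmapBuffer(GL_ARRAY_BUFFER);
        return buf;
    }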
"Buffer" you can simply think of it as a block of memory .
But you have to specify the context here because it means many things.
For Example In the OpenGL VBO concept. This means vertex buffer object which we use it to store vertices data in it . Like that we can do many things, we can store indices in a buffer, textures, etc.,
And For the FrameBuffer you mentioned, It is an entirely different topic. In OpenGL or Vulkan we can create custom framebuffers called frame buffer objects(FBO) apart from the default frame buffer. We can bind FBO and draw things onto it & by adding texture as an attachment we can get whatever we draw on the FBO updated to that texture.
So Buffer has so many meanings here,
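To make the FBO part concrete, here is a minimal sketch (C, desktop GL 3.x, current context assumed; not a complete program) of attaching a texture to an FBO so that draws land in the texture:

    /* Create a texture and use it as the colour attachment of an FBO. */
    GLuint tex, fbo;

    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, 512, 512, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);

    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, tex, 0);

    if (glCheckFramebufferStatus(GL_FRAMEBUFFER) == GL_FRAMEBUFFER_COMPLETE) {
        /* draw calls issued here render into 'tex' instead of the default framebuffer */
    }
    glBindFramebuffer(GL_FRAMEBUFFER, 0);   /* back to the default framebuffer */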

C++ running out of memory trying to draw large image with OpenGL

I have created a simple 2D image viewer in C++ using MFC and OpenGL. This image viewer allows a user to open an image, zoom in/out, pan around, and view the image in its different color layers (cyan, yellow, magenta, black). The program works wonderfully for reasonably sized images. However, I am doing some stress testing on some very large images and I am easily running out of memory. One such image is 16,700x15,700. My program runs out of memory before it can even draw anything, because I am dynamically creating a UCHAR[] with a size of height x width x 4. I multiply by 4 because there is one byte for each RGBA value when I feed this array to glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB8, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, (GLvoid*)myArray).
I've done some searching and have read a few things about splitting my image up into tiles, instead of one large texture on a single quad. Is this something that I should be doing? How will this help me with my memory? Or is there something better that I should be doing?
Your allocation is of size 16.7k * 15.7k * 4, which is ~1GB. The rest of the answer depends on whether you are compiling a 32-bit or a 64-bit executable, and whether you are making use of Physical Address Extension (PAE). If you are unfamiliar with PAE, chances are you aren't using it, by the way.
Assuming 32 Bit
If you have a 32-bit executable, you can address 3GB of memory, so one third of your address space is being used up by a single allocation. Now, to add to the problem, when you allocate a chunk of memory, that memory must be available as a single contiguous range of free address space. You might easily have more than 1GB of memory free, but in chunks smaller than 1GB, which is why people suggest you split your texture up into tiles. Splitting it into 32 x 32 smaller tiles means making 1024 allocations of 1MB each, for example (this is probably unnecessarily fine-grained).
Note: citation required, but some configurations of Linux allow only 2GB.
Assuming 64 Bit
It seems unlikely that you are building a 64-bit executable, but if you were, then the logically addressable memory is much higher. Typical limits are 2^42 or 2^48 bytes (4096 GB and 256 TB, respectively). This means that large allocations shouldn't fail under anything other than artificial stress tests, and you will kill your swap file before you exhaust the logical address space.
If your constraints/hardware allow, I'd suggest building for 64-bit instead of 32-bit. Otherwise, see below.
Tiling vs. Subsampling
Tiling and subsampling are not mutually exclusive. You may only need to make one change to solve your problem, but you might choose to implement a more complex solution.
Tiling is a good idea if you are in 32 bit address space. It complicates the code but removes the single 1GB contiguous block problem that you seem to be facing. If you must build a 32 bit executable, I would prefer that over sub-sampling the image.
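For illustration, a rough sketch of the tiled upload (C, desktop GL; load_tile() is a hypothetical helper that streams one tile's RGBA pixels from disk into a small reusable staging buffer):

    #include <stdlib.h>

    enum { TILE = 1024 };

    /* Upload the image as TILE x TILE textures so no single allocation
     * has to hold the whole ~1GB image. */
    unsigned char *staging = malloc((size_t)TILE * TILE * 4);   /* ~4MB, reused per tile */

    for (int ty = 0; ty < imageHeight; ty += TILE) {
        for (int tx = 0; tx < imageWidth; tx += TILE) {
            int w = (imageWidth  - tx < TILE) ? imageWidth  - tx : TILE;
            int h = (imageHeight - ty < TILE) ? imageHeight - ty : TILE;
            load_tile(tx, ty, w, h, staging);                    /* hypothetical: fill staging */

            GLuint tex;
            glGenTextures(1, &tex);
            glBindTexture(GL_TEXTURE_2D, tex);
            glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, w, h, 0,
                         GL_RGBA, GL_UNSIGNED_BYTE, staging);
            /* remember (tex, tx, ty, w, h) and draw each tile on its own quad */
        }
    }
    free(staging);

Each tile also stays comfortably under common GL_MAX_TEXTURE_SIZE limits, which a single 16,700-pixel-wide texture may exceed.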
Sub-sampling the image means that you have an additional (albeit smaller) block of memory for the subsampled image on top of the original. It might have a performance advantage inside OpenGL, but set that against the additional memory pressure.
A third way, with additional complications, is to stream the image from disk when necessary. If you zoom out to show the whole image, you will be subsampling more than 100 image pixels per screen pixel on a 1920 x 1200 monitor. You might choose to create an image that is significantly subsampled by default, and use it until you are sufficiently zoomed in that you need a higher-resolution version of a subset of the image. If you are using SSDs this can give acceptable performance, but it adds a lot by way of additional complication.

Loading an uncompressed DDS to GL texture

I've this DDS file. I wrote a simple DDS reader to read the DDS header and print its details based on the MSDN specification. It says that it's an RGB DDS with a 32-bit per-pixel depth and that the alpha is ignored, i.e. the pixel format is X8R8G8B8 (or A8R8G8B8). To verify this, I also opened the file in a hex editor, which shows the first 4 bytes (from the start of the data) as BB GG RR 00 (replace them with the first pixel's actual hex colour values). I read that OpenGL's texture copy functions act on bytes (at least conceptually), and thus from its viewpoint this data is B8G8R8A8. Please correct me if my understanding is wrong here.
Now to glTexImage2D internal format I pass RGBA8 and to external format and type I pass BGRA and UNSIGNED_BYTE. This leads to a blue tint in the rendered output. In my fragment shader, just to verify, I did a swizzle to swap R and B and it renders correctly.
I reverted the shader code and then replaced the type UNSIGNED_BYTE with UNSIGNED_INT_8_8_8_8_REV (based on this suggestion), and it still renders with the blue tint. Now, changing the external format to RGBA, with either type (UNSIGNED_BYTE or UNSIGNED_INT_8_8_8_8_REV), it renders fine!
Since OpenGL doesn't support ARGB, giving BGRA is understandable. But how come RGBA is working correctly here? This seems wrong.
Why does the type have no effect on the ordering of the channels?
Does the GL_UNPACK_ALIGNMENT have a bearing in this? I left it as the default (4). If I read the manual right, this should have no effect on how the client memory is read.
Details
OpenGL version 3.3
Intel HD Graphics that supports up to OpenGL 4.0
Used GLI to load the DDS file and get the data pointer
I finally found the answers myself! Posting them here so that they may help someone in the future.
Since OpenGL doesn't support ARGB, giving BGRA is understandable. But how come RGBA is working correctly here? This seems wrong.
By inspecting the memory pointed to by the void* data that GLI returns when a pointer to the image's binary data is requested, it can be seen that GLI had already reordered the bytes when transferring the data from the file to client memory. The memory window shows, from lower to higher addresses, data in the form RR GG BB AA. This explains why passing GL_RGBA works. What GLI gets wrong, however, is that when the external format is queried it returns GL_BGRA instead of GL_RGBA. A bug has been raised to address this.
Why does the type have no effect on the ordering of the channels?
Actually, it does have an effect. The machine that I'm running this experiment on is an Intel x86_64 little-endian machine. The OpenGL Wiki clearly states that client pixel data is always in client byte ordering. Now, when GL_UNSIGNED_BYTE or GL_UNSIGNED_INT_8_8_8_8_REV is passed, the underlying base type (not the component type) is an unsigned int for both; thus reading an int from the data on a little-endian machine means the variable in the register ends up with the bytes swapped, i.e. RR GG BB AA in RAM would reach the VRAM as AA BB GG RR; when addressed by a texture of type RGBA (RR GG BB AA), reading AA would actually give RR. To correct this, the OpenGL implementation swaps the bytes to neutralise the machine's endianness in the case of the GL_UNSIGNED_BYTE type, while for GL_UNSIGNED_INT_8_8_8_8_REV we explicitly instruct OpenGL to swap the byte order, so it renders correctly. However, if the type is passed as GL_UNSIGNED_INT_8_8_8_8 then the rendering is screwed up, since we instruct OpenGL to copy the bytes exactly as they were read on the machine.
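To make that concrete, these are the upload variants discussed above as they behave on a little-endian machine (a sketch; data is assumed to point at the RR GG BB AA bytes GLI returned, with width/height taken from the DDS header):

    /* Both of these render correctly: bytes taken in memory order, or a packed
     * int with explicitly reversed component order. */
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, data);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_INT_8_8_8_8_REV, data);

    /* This one does not: the packed int is taken in native (byte-swapped) order. */
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_INT_8_8_8_8, data);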
Does the GL_UNPACK_ALIGNMENT have a bearing in this? I left it as the default (4). If I read the manual right, this should have no effect on how the client memory is read.
It does have a bearing on the unpacking of texture data from client memory to server memory. However, that is to account for the padding bytes present in an image's rows so that the stride (pitch) is computed correctly. It has no bearing on this specific issue, since the DDS file in question has a pitch flag of 0, i.e. there are no padding bytes.
Related material: https://www.opengl.org/wiki/Pixel_Transfer#Pixel_type

OpenCL/OpenGL Interop dimensions of a renderbuffer in relation to workgroup sizes

I'm doing some tests on OpenCL/OpenGL interop. One of them consists of creating an OpenGL FBO with an attached renderbuffer. I'm trying to write pixels to an associated OpenCL memory object (a image2d_t) in a kernel, keeping everything on the GPU and blitting the FBO to the main OpenGL framebuffer every frame. The framebuffer is shown in a resizable window so the size of the renderbuffer can vary.
If I try to run the kernel I get a CL_INVALID_WORK_GROUP_SIZE error unless the width and height dimensions are multiples of the (local) work-group sizes.
Is this really necessary? I don't like having to pad extra memory, introduce width/height parameters and add additional boundary checks inside the kernel if I can avoid it...
I also wouldn't like to only use workgroup sizes of 1 ;-)
Yes; the OpenCL specification, up to 1.2, requires that the global size be a multiple of the local size. In 2.0 this was relaxed, but of course there are no 2.0 implementations yet.
The common workaround is to round up your global work size to the next multiple of the local work size, but pass the desired (real) global size in as kernel parameters, and then in your kernel check whether get_global_id(0)/get_global_id(1) are less than the real size before doing any work.
Alternatively, pass NULL as your local work size and let the runtime pick one (but a local work size tuned for the hardware is usually faster).
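A rough sketch of that workaround (the kernel name and the 16x16 local size are arbitrary assumptions):

    /* Host side (C): round the global size up to a multiple of the local size. */
    size_t local[2]  = { 16, 16 };
    size_t global[2] = {
        ((width  + local[0] - 1) / local[0]) * local[0],
        ((height + local[1] - 1) / local[1]) * local[1]
    };
    /* pass the real width/height as kernel arguments, then enqueue with global/local */

    /* Kernel side (OpenCL C): padded-out work items simply return. */
    __kernel void fill(__write_only image2d_t out, int width, int height)
    {
        int x = get_global_id(0);
        int y = get_global_id(1);
        if (x >= width || y >= height)
            return;                                   /* outside the real image */
        write_imagef(out, (int2)(x, y), (float4)(1.0f, 0.0f, 0.0f, 1.0f));
    }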

Read Framebuffer-texture like an 1D array

I am doing some gpgpu calculations with GL and want to read my results from the framebuffer.
My framebuffer texture is logically a 1D array, but I made it 2D to have a bigger area. Now I want to read from any arbitrary pixel in the framebuffer texture, with any given length.
That means all calculations are already done on the GPU side, and I only need to pass certain data to the CPU, a range that could wrap across the edge of the texture (i.e. span more than one row).
Is this possible? If yes is it slower/faster than glReadPixels on the whole image and then cutting out what I need?
EDIT
Of course I know about OpenCL/CUDA but they are not desired because I want my program to run out of the box on (almost) any platform.
Also I know that glReadPixels is very slow and one reason might be that it offers some functionality that I do not need (Operating in 2D). Therefore I asked for a more basic function that might be faster.
Reading the whole framebuffer with glReadPixels just to discard all of it except a few pixels/lines would be grossly inefficient. But glReadPixels lets you specify a rect within the framebuffer, so why not just restrict it to fetching the few rows of interest? You may end up fetching some extra data at the start and end of the first and last lines, but I suspect the overhead of that is minimal compared with making multiple calls.
Possibly writing your data to the framebuffer in tiles and/or using Morton order might help structure it so that a tighter bounding box can be found and the amount of extra data retrieved is minimised.
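If it helps, a minimal sketch (C, desktop GL) of turning a 1D (offset, length) range into the rows it spans and reading only those rows; texWidth is the width the 1D data was wrapped to, assuming 4 bytes per RGBA pixel:

    /* Read only the framebuffer rows that the 1D range touches. */
    void read_range(int offset, int length, int texWidth, unsigned char *dst)
    {
        int firstRow = offset / texWidth;
        int lastRow  = (offset + length - 1) / texWidth;
        int rows     = lastRow - firstRow + 1;

        /* dst must hold rows * texWidth * 4 bytes; the wanted data starts at
         * byte (offset - firstRow * texWidth) * 4 within it. */
        glReadPixels(0, firstRow, texWidth, rows, GL_RGBA, GL_UNSIGNED_BYTE, dst);
    }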
You can use a pixel buffer object (PBO) to transfer pixel data from the framebuffer to the PBO, then use glMapBufferARB to read the data directly:
http://www.songho.ca/opengl/gl_pbo.html
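A minimal sketch of that PBO path (C, desktop GL 2.1+ or ARB_pixel_buffer_object; error handling omitted): the read is directed into the PBO, and the bytes are fetched later by mapping the buffer, which lets the transfer overlap with other work.

    GLuint pbo;
    glGenBuffers(1, &pbo);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
    glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 4, NULL, GL_STREAM_READ);

    /* With a PBO bound to GL_PIXEL_PACK_BUFFER, the last argument is a byte
     * offset into the buffer rather than a client-memory pointer. */
    glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, 0);

    /* Later, when the data is actually needed on the CPU: */
    const unsigned char *pixels = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
    /* ... use pixels ... */
    glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);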