Are there any documented size/space limitations of QPixmap and/or QImage objects? I did not find any useful information on this. I'm currently using Qt 4.7.3 on OSX and Windows. In particular, I'm interested in:
Width/Height limits?
Limits depending on color format?
Difference between 32/64 bit machines?
Difference regarding OS?
I would naively suspect that memory is the only limitation, so one could calculate max size by
width x height x byte_per_pixel
I assume that there is a more elaborate rule of thumb; also 32 bit machines may have addressing problems when you run into GB dimensions.
In the end I want to store multiple RGBA images of about 16000x16000 pixel in size and render them using transparency onto each other within a QGraphicsScene. The workstation available can have a lot of RAM, let's say 16GB.
tl;dr: What size limits of QImage/QPixmap are you aware of, or where can I find such information?
Edit: I'm aware of the tiling approach and I'm fine with that. Still it would be great to know the things described above.

Both are limited to 32767x32767 pixels. That is, you can think of them as using a signed 16-bit value for both the X and Y resolution.
No axis can ever exceed 32767 pixels, even if the other axis is only 1 pixel. Operating system "bitness" does not affect the limitation.
The underlying system may run into other limits, such as memory as you mentioned, before such a huge image can be created.
You can see an example of this limitation in the following source code:
if (uint(w) >= 32768 || uint(h) >= 32768) {
w = h = 0;
is_null = true;

Building on the answer by @charles-burns, here is relevant source code for QImage:
QImageData *d = 0;
if (format == QImage::Format_Invalid)
return d;
const int depth = qt_depthForFormat(format);
const int calc_bytes_per_line = ((width * depth + 31)/32) * 4;
const int min_bytes_per_line = (width * depth + 7)/8;
if (bpl <= 0)
bpl = calc_bytes_per_line;
if (width <= 0 || height <= 0 || !data
|| INT_MAX/sizeof(uchar *) < uint(height)
|| INT_MAX/uint(depth) < uint(width)
|| bpl <= 0
|| height <= 0
|| bpl < min_bytes_per_line
|| INT_MAX/uint(bpl) < uint(height))
return d; // invalid parameter(s)
So here, bpl is the number of bytes per line, which is effectively width * depth_in_bytes. Using algebra on that final invalid test:
INT_MAX/uint(bpl) < uint(height)
INT_MAX < uint(height) * uint(bpl)
INT_MAX < height * width * depth_in_bytes
So the total size of your image in bytes must not exceed 2147483647 (INT_MAX for 32-bit ints).
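To make the arithmetic concrete, here is a minimal sketch that mirrors the size checks quoted above. The helper name passesQImageSizeChecks is made up for illustration and is not part of Qt; it assumes the default scanline padding (bpl rounded up to a multiple of 4 bytes) and ignores the data pointer check.

```cpp
#include <climits>
#include <cstddef>

// Hypothetical helper (not a Qt function): mirrors the overflow checks
// quoted above for a given width, height and bit depth.
bool passesQImageSizeChecks(int width, int height, int depthBits)
{
    if (width <= 0 || height <= 0 || depthBits <= 0)
        return false;
    // One scanline pointer per row must fit within INT_MAX bytes.
    if (INT_MAX / sizeof(unsigned char *) < static_cast<unsigned>(height))
        return false;
    // width * depth must not overflow an int.
    if (INT_MAX / static_cast<unsigned>(depthBits) < static_cast<unsigned>(width))
        return false;
    // Default bytes per line: rounded up to a multiple of 32 bits.
    const long long bpl =
        ((static_cast<long long>(width) * depthBits + 31) / 32) * 4;
    // Same condition as INT_MAX/uint(bpl) < uint(height), without the division.
    return bpl * height <= INT_MAX;
}
```

With these checks a 16000x16000 image at 32 bits per pixel passes (1,024,000,000 bytes), while a 32767x32767 image at 32 bits per pixel does not, even though each axis is within the per-axis limit.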

I actually had occasion to look into this at one time. Do a search in the source code of qimage.cpp for "sanity check for potential overflows" and you can see the checks that Qt is doing. Basically,
The number of bytes required (width * height * depth_for_format) must be less than INT_MAX.
It must be able to malloc those bytes at the point you are creating the QImage instance.

Are you building a 64 bit app? If not, you are going to run into memory issues very quickly. On Windows, even if the machine has 16GB of RAM, a 32 bit process will be limited to 2GB (unless it is LARGEADDRESSAWARE, then 3GB). A 16000x16000 RGBA image (16000 x 16000 x 4 bytes) will be just under 1 GB, so you'll only be able to allocate enough memory for 1, maybe 2 if you are very lucky.
With a 64 bit app you should be able to allocate enough memory for several images.

When I try to load a 6160x4120 JPEG into a QPixmap, I get this warning: "qt.gui.imageio: QImageIOHandler: Rejecting image as it exceeds the current allocation limit of 128 megabytes" and an empty QPixmap is returned.
This seems to be the most strict constraint I have found so far.
There is however an option to increase this limit with void QImageReader::setAllocationLimit(int mbLimit).


How to automatically determine the CUDA block size and grid size for a 2D array?

How can I determine the block size and grid size automatically for a 2D array (e.g. image processing) in CUDA?
CUDA has the cudaOccupancyMaxPotentialBlockSize() function to calculate the block size for CUDA kernel functions automatically. It works well for a 1D array.
For my case, I have a 640x480 image.
How to determine the block/grid size?
I use:
////image size: 640x480
int x_min_grid_size, x_grid_size, x_block_size;
int y_min_grid_size, y_grid_size, y_block_size;
cudaOccupancyMaxPotentialBlockSize(
    &x_min_grid_size, &x_block_size,
    my_cuda_kernel,
    0, image.width()
);
cudaOccupancyMaxPotentialBlockSize(
    &y_min_grid_size, &y_block_size,
    my_cuda_kernel,
    0, image.height()
);
x_grid_size = (image.width() + x_block_size - 1) / x_block_size;
y_grid_size = (image.height() + y_block_size - 1) / y_block_size;
dim3 grid_dim(x_grid_size, y_grid_size);
dim3 block_dim(x_block_size, y_block_size);
my_cuda_kernel<<<grid_dim, block_dim>>>(<arguments...>);
////check cuda kernel function launch error
cudaError_t error = cudaGetLastError();
if(cudaSuccess != error)
    std::cout<<"CUDA Error! "<<cudaGetErrorString(error)<<std::endl;
Question 1
Can I calculate block/grid size using this method?
For this code, I got an error after the kernel function launched.
CUDA Error! invalid configuration arguments
If I set x_block_size = 32; y_block_size = 32 manually, it works and has no error.
Why does CUDA report an invalid configuration arguments error? It seems that I cannot use cudaOccupancyMaxPotentialBlockSize() directly for a 2D array?
Potential Solution
I got an idea about the potential solution:
What if I calculate the total thread number first, and then use cudaOccupancyMaxPotentialBlockSize() to calculate the block size for the 2D array:
////total_thread_num = 640x480 = 307200
int total_thread_num = image.width() * image.height();
////compute block/grid size
int min_grid_size, grid_size, block_size;
cudaOccupancyMaxPotentialBlockSize(
    &min_grid_size, &block_size,
    my_cuda_kernel,
    0, total_thread_num
);
grid_size = (total_thread_num + block_size - 1) / block_size;
//launch CUDA kernel function
my_cuda_kernel<<<grid_size, block_size>>>(<arguments...>);
In my_cuda_kernel, it computes the corresponding index based on image size:
// image_width is assumed to be passed in as a kernel argument
__global__ void my_cuda_kernel(/* other arguments..., */ unsigned int image_width)
{
    //compute 2D index based on 1D index;
    unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int row_idx = idx / image_width;
    unsigned int col_idx = idx % image_width;
    /*kernel function code*/
}
Question 2
If the method in Question 1 is not feasible, can I use the method above?
Question 1 Can I calculate block/grid size using this method?
It is important to remember that these API calls provide the occupancy-maximizing number of threads per block and not the block dimensions. If you run the API twice, once in each direction, you will likely get an illegal block size when the two values are combined. For example, if the occupancy-maximizing thread count for a kernel was 256, then you could wind up with a 256 x 256 block size, which is far larger than the limit of 1024 total threads per block, hence the launch failure.
Question 2 If the method in Question 1 is not feasible, can I use the method above?
In principle, that should work, although you are taking a small performance penalty because the integer modulo operation isn't particularly fast on the GPU. Alternatively, you could calculate a 2D block size which satisfies your needs from the maximum threads per block returned by the API.
For example, if you just want blocks with 32 threads in the block dimension which you will map to the major order of your data (for memory coalescing), then just divide the thread count by 32 (noting that the API will always return a round multiple of 32 threads per block because that is the warp size). So, as an example, if the threads per block returned by the API was 384, then your block size would be 32 x 12.
If you really want some sort of tiling scheme which uses square blocks, then it is pretty easy to work out that only 64 (8 x 8), 256 (16 x 16), 576 (24 x 24) and 1024 (32 x 32) are the feasible block sizes which are both square numbers and round multiples of 32. In that case you probably want to select the largest block size which is less than or equal to the total thread count returned by the API.
Ultimately how you choose to do this will depend on the requirements of your kernel code. But it certainly is possible to devise a scheme for 2D block dimensioning which is compatible with the block sizing APIs which CUDA currently exposes.
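As a rough host-side sketch of the two schemes described above (plain C++, no CUDA calls): the struct Dim2 and the helper names are made up for illustration, and threadsPerBlock is assumed to be the block size value reported by cudaOccupancyMaxPotentialBlockSize().

```cpp
#include <algorithm>

struct Dim2 { int x; int y; };

// 32 threads along x (one warp, for coalescing), the rest along y.
// Assumes threadsPerBlock is a multiple of 32 and at most 1024.
Dim2 rowMajorBlock(int threadsPerBlock)
{
    const int warp = 32;
    return { warp, std::max(1, threadsPerBlock / warp) };
}

// Largest square block (8x8, 16x16, 24x24 or 32x32) whose thread count
// does not exceed threadsPerBlock. Returns {0, 0} if even 8x8 is too big.
Dim2 largestSquareBlock(int threadsPerBlock)
{
    int side = 0;
    for (int s = 8; s <= 32; s += 8)
        if (s * s <= threadsPerBlock)
            side = s;
    return { side, side };
}

// Number of blocks needed to cover a width x height image (rounded up).
Dim2 gridFor(int width, int height, Dim2 block)
{
    return { (width + block.x - 1) / block.x,
             (height + block.y - 1) / block.y };
}
```

For a 640x480 image and a reported thread count of 384, rowMajorBlock gives 32 x 12 and gridFor gives a 20 x 40 grid; the values would then be copied into dim3 objects for the launch. Because the grid is rounded up, the kernel still needs a bounds check on the row and column index.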

Figuring out best solution for a Maze solver, with animated output

My ultimate goal is to use Fltk to take user inputs of pixels, display a generated maze (either my own, or fetch it from the website mentioned in the details), and then show the animated solution.
This is what I've managed so far:
I'm in my first C++/algorithm class of a bachelor's in CE.
As we've been learning about graphs, Dijkstra etc. over the last weeks, I decided, after watching Computerphile's video about maze solving, to try to put the theory into "practice".
At first I wanted to output a maze from this site, http://hereandabove.com/maze/mazeorig.form.html, with the plotted solution. I chose 1x1 pixel walls and paths, to make it easier to turn the maze into a 2D vector, and then a graph.
This went well, and my program outputs a solved .png file, using Dijkstra to find the shortest path.
I then wanted to put the entire solution in an animated gif.
This also works well. For each pixel it colors green/yellow, it passes an RGBA vector to a gif library, and I end up with an animated step-by-step solution.
For each RGBA vector passed to the gif library, I also scale it up first, using this function:
// Both the buffer and resized buffer are member variables, and for each
// plotted pixel in the path it updates 'buffer', and in this function makes a
// larger version of it to 'resized_buffer'
// HEIGHT and WIDTH are the original size
// nHeight and nWidth are the new size.
bool Maze_IMG::resample(int nWidth, int nHeight)
{
    if (buffer.size() == 0) return false;
    for (int i = 0; i < nWidth * nHeight * 4; i++) resized_buffer.push_back(-1);
    double scaleWidth = (double)nWidth / (double)WIDTH;
    double scaleHeight = (double)nHeight / (double)HEIGHT;
    for (int cy = 0; cy < nHeight; cy++)
    {
        for (int cx = 0; cx < nWidth; cx++)
        {
            int pixel = (cy * (nWidth * 4)) + (cx * 4);
            int nearestMatch = (((int)(cy / scaleHeight) * (WIDTH * 4)) + ((int)(cx / scaleWidth) * 4));
            resized_buffer[pixel] = buffer[nearestMatch];
            resized_buffer[pixel + 1] = buffer[nearestMatch + 1];
            resized_buffer[pixel + 2] = buffer[nearestMatch + 2];
            resized_buffer[pixel + 3] = buffer[nearestMatch + 3];
        }
    }
    return true;
}
The problem is that it takes a looong time to do this while scaling them up, even with "small" mazes at 50x50 pixels, when trying to scale them to say 300x300. I've spent a lot of time making the code as efficient and fast as possible, but after I added the scaling, stuff that used to take 10 minutes now takes hours.
In fltk I use the Fl_Anim_Gif library to display animated gifs, but it won't load the maze gifs that have been scaled up (still troubleshooting this).
My real questions
Is it possible to improve the scaling function, so that it does not take forever? Or is this a totally wrong approach?
Is it a stupid idea to try to display it as a gif in fltk? Would it be easier to just draw it directly in fltk, or should I rather try to display the images one after another in fltk?
I'm just familiarizing myself with fltk. Would it be easier now to use something like Qt instead? Would that be more beneficial in the long run as far as learning a GUI library goes?
I'm mainly doing this for learning, and to start building some sort of portfolio for when I graduate. Is it beneficial at all to make a GUI for this, or is this a waste of time?
Any thoughts or input would be greatly appreciated.
Whatever graphics package you use, the performance will be similar. It depends on how you handle the internals. For instance,
If you write it to a buffer and BLT it to the screen, it will be faster than writing to the screen directly.
If you only BLT on the paint event, it will be faster than forcing an update every time the screen data changes.
If you preallocate the buffers, the system does not have to keep reallocating whenever the buffer space runs out.
Assuming that the space is preallocated, it can be written to without clearing first. Every cell is going to be written to, so there is no need to clear, allocate and reallocate.
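As an illustration of the preallocation point applied to the scaling function in the question, here is a minimal sketch of a nearest-neighbour upscale that sizes the destination once and overwrites every byte. The free function resampleNearest and its parameters are hypothetical, not the asker's Maze_IMG class. Note that if resized_buffer in the original is a member that is not cleared elsewhere, the push_back loop would make it grow on every call, which may be part of the slowdown.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Nearest-neighbour scaling of a tightly packed RGBA buffer.
// dst is resized to the exact output size (a no-op when it already has
// that size), and every output pixel is overwritten, so no clearing is needed.
bool resampleNearest(const std::vector<std::uint8_t>& src, int srcW, int srcH,
                     std::vector<std::uint8_t>& dst, int dstW, int dstH)
{
    if (srcW <= 0 || srcH <= 0 || dstW <= 0 || dstH <= 0) return false;
    if (src.size() != static_cast<std::size_t>(srcW) * srcH * 4) return false;

    dst.resize(static_cast<std::size_t>(dstW) * dstH * 4);

    for (int y = 0; y < dstH; ++y) {
        // Integer arithmetic picks the source row; computed once per output row.
        const int sy = static_cast<int>(static_cast<long long>(y) * srcH / dstH);
        const std::uint8_t* srcRow = src.data() + static_cast<std::size_t>(sy) * srcW * 4;
        std::uint8_t* dstRow = dst.data() + static_cast<std::size_t>(y) * dstW * 4;
        for (int x = 0; x < dstW; ++x) {
            const int sx = static_cast<int>(static_cast<long long>(x) * srcW / dstW);
            std::memcpy(dstRow + x * 4, srcRow + sx * 4, 4);  // copy one RGBA pixel
        }
    }
    return true;
}
```

Reusing the same dst vector across frames means the allocation happens only on the first call; later calls just overwrite the existing storage.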

Create directx9 texture using a portion(s) of an image

I have an image (208x8) and I would like to copy 8x8 squares from different areas of it, then join all the squares to create one IDirect3DTexture9*.
Depending on exactly what you are trying to do IDirect3DDevice9::UpdateSurface or IDirect3DDevice9::StretchRect might help you.
For simple operations on very small textures like you are describing, it can be advantageous to manipulate them using the CPU (i.e. with IDirect3DTexture9::LockRect). With D3D9 this usually implies that the texture be re-uploaded to VRAM, so it is generally only useful for small or infrequently modified textures. But sometimes if you are render-bound and you are careful about where you update the texture within your loop, it's possible to hide the cost of operations like this and get them "for free".
To avoid the VRAM upload, you can use a POOL_MANAGED resource combined with the appropriate usage and lock flags to situate the resource within the AGP aperture which allows for high-speed access from both the CPU and GPU, see: http://msdn.microsoft.com/en-us/library/windows/desktop/ee418784(v=vs.85).aspx
If you are manipulating on the CPU, be aware of the tiling and alignment restrictions for the various texture formats. The best information about this is within the documentation that comes with the SDK (includes several whitepapers), the online documentation is incomplete.
Here's a basic example:
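IDirect3DTexture9* m_tex = getYourTexture();
// Region to lock, in texels: left, top, right, bottom.
// (An 8x8 square at the origin is assumed here purely as an example.)
RECT d3dRect = { 0, 0, 8, 8 };
D3DLOCKED_RECT outRect;
if (SUCCEEDED(m_tex->LockRect(0, &outRect, &d3dRect, D3DLOCK_DISCARD)))
{
    // Stride depends on your texture format - this is the number of bytes per texel.
    // Note that this may be less than 1 for DXT1 textures in which case you'll need
    // some bit swizzling logic. Can be inferred from Pitch and width.
    int stride = 1;
    int rowPitch = outRect.Pitch;
    // Choose a pointer type that suits your stride.
    unsigned char* pixels = (unsigned char*)outRect.pBits;
    // Clear to black: zero every byte of every texel in the locked region.
    for (int y = 0; y < d3dRect.bottom - d3dRect.top; ++y)
        for (int x = 0; x < (d3dRect.right - d3dRect.left) * stride; ++x)
            pixels[x + rowPitch * y] = 0x0;
    m_tex->UnlockRect(0);
}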
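For the original question (copying 8x8 squares into one texture), the CPU-side copy between two locked surfaces could look roughly like the sketch below. The function copyTile is hypothetical and works on plain byte pointers; in a D3D9 program src/dst and the pitches would come from the pBits and Pitch members of the two D3DLOCKED_RECT structures. It assumes an uncompressed format with a whole number of bytes per texel (so not DXT).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies one tile x tile square of texels from (srcX, srcY) in the source
// to (dstX, dstY) in the destination. Pitches are in bytes per row.
void copyTile(const std::uint8_t* src, int srcPitch, int srcX, int srcY,
              std::uint8_t* dst, int dstPitch, int dstX, int dstY,
              int tile, int bytesPerTexel)
{
    const std::size_t rowBytes = static_cast<std::size_t>(tile) * bytesPerTexel;
    for (int row = 0; row < tile; ++row) {
        const std::uint8_t* s = src
            + static_cast<std::size_t>(srcY + row) * srcPitch
            + static_cast<std::size_t>(srcX) * bytesPerTexel;
        std::uint8_t* d = dst
            + static_cast<std::size_t>(dstY + row) * dstPitch
            + static_cast<std::size_t>(dstX) * bytesPerTexel;
        std::memcpy(d, s, rowBytes);  // one row of the tile at a time
    }
}
```

Calling this once per 8x8 square, with a different source offset and destination offset each time, assembles the joined texture row by row.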

C++ Dereferencing char-Pointer (image array) is very slow

I have some trouble getting fast access to an unsigned character array.
I want to actually copy a BGRABGRA....BGRABGRA.... linewise coded image array to the OpenCV version which uses three layers. The code below works fine but is really slow (around 0.5 seconds for a 640*480 image). I found that the dereferencing operator * makes it slow. Do you have any idea how to fix this? (Hint: BYTE is an unsigned char)
// run through all pixels and copy image data
for (int y = 0; y<imHeight; y++){
    BYTE* pLine= vrIm->mp_buffer + y * vrIm->m_pitch;
    for (int x = 0; x<imWidth; x++){
        BYTE* b= pLine++; // fast pointer operation
        BYTE* g= pLine++;
        BYTE* r= pLine++;
        BYTE* a= pLine++; // (alpha)
        BYTE bc = *b; // this is really slow!
        BYTE gc = *g; // this is really slow!
        BYTE rc = *r; // this is really slow!
    }
}
Shouldn't be - there is no way that is taking 0.5sec for a 640x480 unless you are doing this on an 8086. Is there some other code you aren't showing? The destination memory doesn't currently go anywhere.
P.S. Take a look at cvCvtColor(); it uses optimized SSE2/SIMD instructions to do this.
What hardware is the memory you're reading located on? Perhaps that device has limited bandwidth to the memory it uses or just has slow RAM. If the memory is shared by many devices there may also be bottlenecks on its access. Try reading the entire screen(?) to local memory using memcpy(), performing your operations on it in local RAM, then writing it back using memcpy(). This will reduce the number of times you must negotiate access to it from 640*480 to 1.
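A minimal sketch of that suggestion, assuming the source is a pitched BGRA buffer with at least pitch * height readable bytes: copy it to local memory with a single memcpy(), then split it into three planes from the local copy. The Planes struct and the splitBgra function are made up for illustration and do not use OpenCV types.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct Planes { std::vector<std::uint8_t> b, g, r; };

// buffer: interleaved BGRA rows, 'pitch' bytes apart (pitch >= width * 4).
Planes splitBgra(const std::uint8_t* buffer, int pitch, int width, int height)
{
    // One bulk read of the (possibly slow) source memory.
    std::vector<std::uint8_t> local(static_cast<std::size_t>(pitch) * height);
    std::memcpy(local.data(), buffer, local.size());

    const std::size_t count = static_cast<std::size_t>(width) * height;
    Planes out;
    out.b.resize(count);
    out.g.resize(count);
    out.r.resize(count);

    for (int y = 0; y < height; ++y) {
        const std::uint8_t* px = local.data() + static_cast<std::size_t>(y) * pitch;
        std::size_t idx = static_cast<std::size_t>(y) * width;
        for (int x = 0; x < width; ++x, px += 4, ++idx) {
            out.b[idx] = px[0];
            out.g[idx] = px[1];
            out.r[idx] = px[2];  // px[3] is alpha and is dropped
        }
    }
    return out;
}
```

If timing this against the original loop shows no difference, the source memory is probably not the bottleneck and the slowdown is elsewhere.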

GDI+ gif speed problem

I am using C++ GDI+ to open a gif,
but I find the frame interval is really strange.
It is different from when it is played by the Windows picture viewer.
The code I have written is as follows.
m_pMultiPageImg = new Bitmap(XXXXX);
int size = m_pMultiPageImg->GetPropertyItemSize(PropertyTagFrameDelay);
m_pTimeDelays = (PropertyItem*) malloc (size);
m_pMultiPageImg->GetPropertyItem(PropertyTagFrameDelay, size, m_pTimeDelays);
int frameSize = m_pMultiPageImg->GetFrameDimensionsCount();
// the interval of frame FrameNumber:
long lPause = ((long*)m_pTimeDelays->value)[FrameNumber] * 10;
However, I found that for some frames lPause <= 0.
What does this mean?
And is the code I listed the right way to get the interval?
Many thanks!
The frame duration field in the gif file is only two bytes long (an unsigned value interpreted as 100ths of a second, allowing values from 0 to 655.35 seconds).
You seem to be interpreting it as long, which is probably 4 bytes on your platform so you will be reading another field along with the duration. It is hard to tell from the code you provide, but I think this is the problem.
Frame delays should not be negative numbers. I think the error comes in during the array type conversion or "FrameNumber" goes out of bounds.
GetPropertyItem(PropertyTagFrameDelay, ...) fills in a PropertyItem whose value member points to a raw byte buffer. It is safer to read it as an array of fixed-width 32-bit integers (e.g. INT32) than as a "long" array. "long" is 4 bytes on 32-bit systems, but can be 8 bytes on some 64-bit systems.
m_pMultiPageImg->GetFrameDimensionsCount() returns the number of frame dimensions in the image, not the number of frames. The dimension of the first frame (master image) is usually used in order to get the frame count.
In your case, the code looks like
int count = m_pMultiPageImg->GetFrameDimensionsCount();
GUID* dimensionIDs = new GUID[count];
m_pMultiPageImg->GetFrameDimensionsList(dimensionIDs, count);
// The first dimension is used to query the number of frames.
int frameCount = m_pMultiPageImg->GetFrameCount(&dimensionIDs[0]);
delete[] dimensionIDs;
Hope this helps.
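To illustrate the point about fixed-width integers, here is a small sketch that reads a raw delay buffer as consecutive 32-bit values and converts them to milliseconds. The function frameDelaysMs is hypothetical and uses no GDI+ types; in the question's code the arguments would presumably be the value and length members of the PropertyItem. It assumes the buffer holds 4-byte values in units of 1/100 s.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Interprets 'value' as consecutive 32-bit integers (1/100 s each) and
// returns the delays in milliseconds. Trailing partial values are ignored.
std::vector<long long> frameDelaysMs(const void* value, std::size_t lengthBytes)
{
    std::vector<long long> delays;
    const unsigned char* bytes = static_cast<const unsigned char*>(value);
    for (std::size_t off = 0; off + sizeof(std::int32_t) <= lengthBytes;
         off += sizeof(std::int32_t)) {
        std::int32_t hundredths;
        // memcpy avoids alignment and aliasing problems with a pointer cast.
        std::memcpy(&hundredths, bytes + off, sizeof hundredths);
        delays.push_back(static_cast<long long>(hundredths) * 10);
    }
    return delays;
}
```

The size of the returned vector also gives a bound for FrameNumber, so an out-of-range index can be caught before it reads past the buffer. A stored delay of 0 is a valid value in a gif, so a result of 0 does not by itself indicate a read error.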