C++ Dereferencing char-Pointer (image array) is very slow - c++

I have some trouble getting fast access to an unsigned character array.
I want to copy a line-wise interleaved BGRABGRA....BGRABGRA.... image array into the OpenCV layout, which uses three separate layers. The code below works fine but is really slow (around 0.5 seconds for a 640*480 image). I found that the dereferencing operator * is what makes it slow. Do you have any idea how to fix this? (Hint: BYTE is an unsigned char)
// run through all pixels and copy image data
for (int y = 0; y<imHeight; y++){
BYTE* pLine= vrIm->mp_buffer + y * vrIm->m_pitch;
for (int x = 0; x<imWidth; x++){
BYTE* b= pLine++; // fast pointer operation
BYTE* g= pLine++;
BYTE* r= pLine++;
BYTE* a= pLine++; // (alpha)
BYTE bc = *b; // this is really slow!
BYTE gc = *g; // this is really slow!
BYTE rc = *r; // this is really slow!
}
}
Thanks!

It shouldn't be. There is no way this takes 0.5 s for a 640x480 image unless you are running it on an 8086. Is there some other code you aren't showing? As posted, the values you read are never stored anywhere.
P.S. Take a look at cvCvtColor(); it uses optimized SSE2/SIMD instructions to do this.

What hardware is the memory you're reading located on? Perhaps that device has limited bandwidth to the memory it uses, or just has slow RAM. If the memory is shared by many devices there may also be bottlenecks on its access. Try reading the entire screen(?) into local memory using memcpy(), performing your operations on it in local RAM, then writing it back using memcpy(). This reduces the number of times you must negotiate access to it from 640*480 to 1.
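The memcpy() suggestion above could be sketched roughly as follows. This is only an illustration under assumptions: the `Planes` struct and the `splitBgra` function are hypothetical names, the source is assumed to be 4 bytes per pixel with rows `pitch` bytes apart (as in the question), and the three output vectors stand in for whatever destination the real code writes to.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical container for the three destination planes.
struct Planes {
    std::vector<std::uint8_t> b, g, r;
};

// Copies an interleaved BGRA image into local memory with a single memcpy,
// then splits the local copy into separate B, G and R planes.
Planes splitBgra(const std::uint8_t* src, int width, int height, int pitch) {
    assert(src != nullptr && width > 0 && height > 0 && pitch >= width * 4);

    // One bulk read from the (possibly slow) source buffer. The last row is
    // only copied up to its final pixel, in case it has no trailing padding.
    const std::size_t bytes = static_cast<std::size_t>(pitch) * (height - 1)
                            + static_cast<std::size_t>(width) * 4;
    std::vector<std::uint8_t> local(bytes);
    std::memcpy(local.data(), src, bytes);

    const std::size_t pixels = static_cast<std::size_t>(width) * height;
    Planes out{std::vector<std::uint8_t>(pixels),
               std::vector<std::uint8_t>(pixels),
               std::vector<std::uint8_t>(pixels)};

    std::size_t i = 0;  // index into the tightly packed output planes
    for (int y = 0; y < height; ++y) {
        const std::uint8_t* p = local.data() + static_cast<std::size_t>(y) * pitch;
        for (int x = 0; x < width; ++x, p += 4, ++i) {
            out.b[i] = p[0];
            out.g[i] = p[1];
            out.r[i] = p[2];  // p[3] is alpha and is skipped
        }
    }
    return out;
}
```

With the names from the question, a call might look like `splitBgra(vrIm->mp_buffer, imWidth, imHeight, vrIm->m_pitch)`. If this version is fast while the original loop is slow, that points at the source memory rather than at the dereference itself.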

Related

Why can't I update an array of pixels using a for loop?

I'm working with a Uint8 array of pixels in SFML and trying to set them all to white (0xFF/255) as a test. For some reason using a for loop does nothing and I have no idea why; the logic looks correct to me.
Using a memset() to set every byte to 0xFF works perfectly, but occasionally throws EXC_BAD_ACCESS at me upon running. Setting each of the RGBA of an individual pixel in the array to 0xFF works perfectly, I get a white dot on my screen at the correct location. But using a for loop to set every pixel to 0xFF does nothing, no error, but no result, which makes no sense.
// Create buffer
sf::Uint8 *buffer = new sf::Uint8[SCREEN_WIDTH*SCREEN_HEIGHT*4];
for(int i; i < SCREEN_WIDTH*SCREEN_HEIGHT*4; i+=4) {
buffer[i] = 0xFF;
buffer[i+1] = 0xFF;
buffer[i+2] = 0xFF;
buffer[i+3] = 0xFF;
}
Logically this for loop should work perfectly, but it does not, when I run this I have a black screen with some green dots spread around the middle (garbage from the memory locations being used). If anyone can explain to me why this is happening and how to fix it I would greatly appreciate it!
for(int i; i < SCREEN_WIDTH*SCREEN_HEIGHT*4; i+=4) {
You never set an initial value for i here, so it has an indeterminate value, in practice probably one large enough that the loop never runs at all. C and C++ do not initialise local variables of primitive type by default; you must set a value.
for(int i = 0; i < SCREEN_WIDTH*SCREEN_HEIGHT*4; i+=4) {
As for EXC_BAD_ACCESS, you must have passed the wrong memory address or size to memset. Maybe another uninitialised variable?
In C/C++, accessing memory outside of an object often has no facility to catch the error (unlike many other languages, which range-check every array access and raise, say, an IndexOutOfRangeException). It will just overwrite some random bytes, and if you are lucky the address is completely invalid and the OS/processor raises an error.
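One way to sidestep both the uninitialised counter and a wrong size passed to memset is to let a container own the buffer and fill it on construction. A minimal sketch, assuming a 4-bytes-per-pixel RGBA layout as in the question; `makeWhiteBuffer` is a hypothetical helper name, not part of SFML:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns a width x height RGBA buffer with every byte set to 0xFF,
// i.e. every pixel opaque white. The vector knows its own size, so there is
// no loop counter to forget and no separate byte count to get wrong.
std::vector<std::uint8_t> makeWhiteBuffer(std::size_t width, std::size_t height) {
    assert(width > 0 && height > 0);
    return std::vector<std::uint8_t>(width * height * 4, 0xFF);
}
```

The result of `makeWhiteBuffer(SCREEN_WIDTH, SCREEN_HEIGHT)` can then be handed to code expecting a raw byte pointer via `.data()`, and the memory is released automatically when the vector goes out of scope.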

Loading a 3D byte array from a .raw file

As in my previous question, I'm interested in loading a .raw file of a volume dataset into a byte array. I think using a 3D byte array would make things easier when indexing the X,Y,Z coordinates, but I'm not sure about the read size that I should use to load the volume. Would this size declaration allow me to index the volume data correctly?
int XDIM=256, YDIM=256, ZDIM=256;
const int size = XDIM*YDIM*ZDIM;
bool LoadVolumeFromFile(const char* fileName) {
FILE *pFile = fopen(fileName,"rb");
if(NULL == pFile) {
return false;
}
GLubyte* pVolume=new GLubyte[XDIM][YDIM][ZDIM];
fread(pVolume,sizeof(GLubyte),size,pFile); // <-is this size ok?
fclose(pFile);
From the code you posted the fread() call appears to be safe, but consider if a 3D byte array is the best choice of a data structure.
I assume you are doing some kind of rendering as you are using GLubyte. And of course to do any rendering you need to access a vertex defined in 3D space. That will lead to:
pVolume[vertIndex][vertIndex][vertIndex]
This can thrash your cache. A C++ multi-dimensional array is stored contiguously in row-major order, so elements that differ only in the last index sit next to each other, while a step in the first index jumps YDIM*ZDIM bytes. Thus, each time you jump along the first or second index you may hit a cache miss and really slow performance.
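A common alternative is a single flat allocation with an explicit index function, which makes the memory layout visible and keeps the read size obvious. The sketch below is an assumption-laden illustration: `Volume` and `loadVolume` are hypothetical names, and it assumes the .raw file stores one byte per voxel with x varying fastest, then y, then z, which you would need to confirm for your dataset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <vector>

// Hypothetical flat volume: one contiguous block, x varies fastest.
struct Volume {
    std::size_t xdim, ydim, zdim;
    std::vector<std::uint8_t> data;

    Volume(std::size_t x, std::size_t y, std::size_t z)
        : xdim(x), ydim(y), zdim(z), data(x * y * z) {}

    // Maps (x, y, z) to an offset in the flat array.
    std::size_t index(std::size_t x, std::size_t y, std::size_t z) const {
        assert(x < xdim && y < ydim && z < zdim);
        return x + xdim * (y + ydim * z);
    }

    std::uint8_t& at(std::size_t x, std::size_t y, std::size_t z) {
        return data[index(x, y, z)];
    }
};

// Reads exactly xdim*ydim*zdim bytes; returns false if the file cannot be
// opened or is shorter than the volume.
bool loadVolume(const char* fileName, Volume& vol) {
    std::ifstream file(fileName, std::ios::binary);
    if (!file) {
        return false;
    }
    const std::streamsize wanted = static_cast<std::streamsize>(vol.data.size());
    file.read(reinterpret_cast<char*>(vol.data.data()), wanted);
    return file.gcount() == wanted;
}
```

Iterating with x in the innermost loop then walks memory sequentially, which is the cache-friendly order for this layout.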

random access to buffer optimisation

I have colorBuffer Color[width*height] (most likely 800*600)
and during rasterization I call:
void setPixel(int x, int y, Color & color)
{
colorBuffer[y * width + x] = color;
}
It turns out that this random access to the color buffer is really inefficient and slows my application down.
I think it is caused by the way I use it: I calculate some pixel (with rasterization algorithms) and call setPixel.
So I think my buffer is not in the cache, and this is the main problem. Writing to the whole buffer at once is much, much faster.
Is there any way to optimize this?
edit
I do not use it to fill the buffer with two nested for loops.
I use it to paint "random" pixels.
E.g. when rasterizing a line I use it like
setPixel(10,10);
calculate next point
setPixel(10,11);
calculate next point
setPixel(next point)
...
The way I see it, the access pattern to the buffer depends on the order in which your algorithm processes the pixels. Can you not simply change that order so that it creates a sequential access scheme to your buffer?
Yes, you should try to be cache-friendly,
but the first thing I would do is find out what's taking time.
It's simple enough. Just pause it several times and see what it's doing.
If it's mostly in calculate next point, you should see what it's doing in there, because that's where the time is going.
(I assume you understand that by "in" I mean "on the stack".)
If it's mostly in SetPixel, when you pause it, look at the disassembly window.
If it's spending much time in the prologue/epilogue of the routine, it should be inlined.
If it's spending much time in the actual move instruction into colorBuffer, then you're hitting the cache issue.
If it's spending much time in the code for the index calculation y * width + x, then you might want to see if you could somehow use an initialized pointer that you step along.
If you fix anything, you should do it all again, because you may have uncovered another opportunity to speed it up further.
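The "initialized pointer that you step along" idea from the last case might look like the sketch below. It is only an illustration: `Color` is assumed here to be a 32-bit value standing in for the asker's type, and `drawVerticalRun` is a hypothetical helper. The multiplication happens once, and each further pixel costs a single pointer addition.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Color = std::uint32_t;  // assumption: stand-in for the asker's Color type

// Draws `count` pixels downward from (x, y0). The index y * width + x is
// computed once; after that the pointer is advanced by one row per pixel.
void drawVerticalRun(std::vector<Color>& colorBuffer, int width,
                     int x, int y0, int count, Color color) {
    assert(width > 0 && x >= 0 && x < width && y0 >= 0 && count >= 0);
    if (count == 0) {
        return;
    }
    assert(static_cast<std::size_t>(y0 + count) * width <= colorBuffer.size());

    Color* p = colorBuffer.data() + static_cast<std::size_t>(y0) * width + x;
    *p = color;
    for (int i = 1; i < count; ++i) {
        p += width;  // step to the same column in the next row
        *p = color;
    }
}
```

Whether this helps at all depends on what the profiling above shows; if the time is in the store itself, the index arithmetic was never the problem.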
The first thing to notice is that the way you process your pixels makes a huge difference to speed. If you do
for (int x = 0; x < width;++x)
{
for (int y = 0; y < height; ++y)
{
setPixel(x,y,Color());
}
}
this will be really bad for performance because you're literally jumping around in memory width-wise (note that you do y*width + x).
If you simply change the order of processing to
for (int y = 0; y < height;++y)
{
for (int x = 0; x < width; ++x)
{
setPixel(x,y,Color());
}
}
you already should notice a performance gain as the processor now gets a chance to cache memory accesses (which it didn't before).
Furthermore, you should check whether you can determine that entire blocks of pixels will have the same color value before actually setting the memory. You can then copy those constant color values block-wise into your image array, which can also save a good deal of time.
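For the block-wise case, a horizontal span of one color can be written in a single call instead of one setPixel per pixel. A minimal sketch, again assuming a 32-bit `Color` stand-in and a hypothetical `fillSpan` helper that fills the half-open range [x0, x1) on row y:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Color = std::uint32_t;  // assumption: stand-in for the asker's Color type

// Fills pixels x0 .. x1-1 of row y with one color. The pixels are adjacent
// in memory, so this is a sequential write the cache handles well.
void fillSpan(std::vector<Color>& colorBuffer, int width,
              int x0, int x1, int y, Color color) {
    assert(width > 0 && 0 <= x0 && x0 <= x1 && x1 <= width && y >= 0);
    const std::size_t start = static_cast<std::size_t>(y) * width + x0;
    const std::size_t count = static_cast<std::size_t>(x1 - x0);
    assert(start + count <= colorBuffer.size());
    std::fill_n(colorBuffer.data() + start, count, color);
}
```

A triangle or polygon rasterizer that produces one span per scanline can call this once per row rather than once per pixel.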

Read From Media Buffer - Pointer Arithmetic C++ Syntax

This may well have come up before but the following code is taken from an MSDN example I am modifying. I want to know how I can iterate through the contents of the buffer which contains data about a bitmap and print out the colors. Each pixel is 4 bytes of data so I am assuming the R G B values account for 3 of these bytes, and possibly A is the 4th.
What is the correct C++ syntax for the pointer arithmetic required (ideally inside a loop) to store the value pointed to during each iteration in a local variable that I can use, e.g. print to the console?
Many thanks
PS. Is this safe? Or is there a safer way to read the contents of an IMFMediaBuffer? I could not find an alternative.
Here is the code:
hr = pSample->ConvertToContiguousBuffer(&pBuffer); // this is the BitmapData
// Converts a sample with multiple buffers into a sample with a single IMFMediaBuffer which we Lock in memory next...
// IMFMediaBuffer represents a block of memory that contains media data
hr = pBuffer->Lock(&pBitmapData, NULL, &cbBitmapData); // pBuffer is IMFMediaBuffer
/* Lock method gives the caller access to the memory in the buffer, for reading or writing:
pBitmapData - receives a pointer to start of buffer
NULL - receives the maximum amount of data that can be written to the buffer. This parameter can be NULL.
cbBitmapData - receives the length of the valid data in the buffer, in bytes. This parameter can be NULL.
*/
I solved the problem myself and thought it best to add the answer here so that it formats correctly and maybe others will benefit from it. Basically in this situation we use 32 bits for the image data and what is great is that we are reading raw from memory so there is not yet a Bitmap header to skip because this is just raw color information.
NOTE: Across these 4 bytes we have (from bit 0 - 31) B G R A, which we can verify by using my code:
int x = 0;
while(x < cbBitmapData){
Console::Write("B: {0}", (*(pBitmapData + x++)));
Console::Write("\tG: {0}", (*(pBitmapData + x++)));
Console::Write("\tR: {0}", (*(pBitmapData + x++)));
Console::Write("\tA: {0}\n", (*(pBitmapData + x++)));
}
From the output you will see that the A value is 0 for each pixel because there is no concept of transparency or depth here, which is what we expect.
Also to verify that all we have in the buffer is raw image data and no other data I used this calculation which you may also find of use:
Console::Write("no of pixels in buffer: {0} \nexpected no of pixels based on dimensions:{1}", (cbBitmapData/4), (m_format.imageWidthPels * m_format.imageHeightPels) );
Here we divide cbBitmapData by 4 because it counts bytes and, as noted above, each pixel is 4 bytes wide (a 32-bit DWORD, strictly speaking, since the size of a byte is apparently not uniform across all hardware). We compare this to the image width multiplied by its height. They are equal, so the buffer holds only pixel color information.
Hope this helps someone.
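For readers not using C++/CLI, the same walk over the buffer can be written in standard C++. This is a sketch under the assumptions stated in the answer above (tightly packed 32-bit pixels in B, G, R, A byte order); `Bgra` and `decodeBgra` are hypothetical names. It also stops before any trailing partial pixel, which addresses the "is this safe?" part for buffers whose length is not a multiple of 4.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One decoded pixel, in the byte order found in the buffer.
struct Bgra {
    std::uint8_t b, g, r, a;
};

// Decodes a tightly packed BGRA byte buffer into pixel structs.
std::vector<Bgra> decodeBgra(const std::uint8_t* data, std::size_t byteCount) {
    assert(data != nullptr || byteCount == 0);
    std::vector<Bgra> pixels;
    pixels.reserve(byteCount / 4);
    // i + 4 <= byteCount guarantees all four bytes of the pixel are in range.
    for (std::size_t i = 0; i + 4 <= byteCount; i += 4) {
        pixels.push_back(Bgra{data[i], data[i + 1], data[i + 2], data[i + 3]});
    }
    return pixels;
}
```

With the names from the question this would be called as `decodeBgra(pBitmapData, cbBitmapData)` while the buffer is still locked.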

GDI+ gif speed problem

I am using C++ GDI+ to open a GIF,
but I find the frame interval is really strange.
It differs from what I see when the file is played in the Windows picture viewer.
The code I have written is as follows.
pMultiPageImg = new Bitmap(XXXXX);
int size = m_pMultiPageImg->GetPropertyItemSize(PropertyTagFrameDelay);
m_pTimeDelays = (PropertyItem*) malloc (size);
m_pMultiPageImg->GetPropertyItem(PropertyTagFrameDelay, size, m_pTimeDelays);
int frameSize = m_pMultiPageImg->GetFrameDimensionsCount();
// the interval of frame FrameNumber:
long lPause = ((long*)m_pTimeDelays->value)[FrameNumber] * 10;
However, I found that for some frames lPause <= 0.
What does this mean?
And is the code I listed the right way to get the interval?
Many thanks!
The frame delay field in a GIF file is only two bytes long (an unsigned count of hundredths of a second, allowing values from 0 to 655.35 seconds).
You seem to be interpreting it as long, which is probably 4 bytes on your platform so you will be reading another field along with the duration. It is hard to tell from the code you provide, but I think this is the problem.
Frame delays should not be negative numbers. I think the error comes in during the array type conversion or "FrameNumber" goes out of bounds.
GetPropertyItem(PropertyTagFrameDelay, ...) fills a PropertyItem whose value member points to raw bytes. It is safer to read them as an array of 32-bit integers (Int32) than as an array of "long": "long" is 4 bytes under 32-bit systems, but could be 8 bytes under some 64-bit systems.
m_pMultiPageImg->GetFrameDimensionsCount() returns the number of frame dimensions in the image, not the number of frames. The dimension of the first frame (master image) is usually used in order to get the frame count.
In your case, the code looks like
int count = m_pMultiPageImg->GetFrameDimensionsCount();
GUID* dimensionIDs = new GUID[count];
m_pMultiPageImg->GetFrameDimensionsList(dimensionIDs, count);
int frameCount = m_pMultiPageImg->GetFrameCount(&dimensionIDs[0]);
Hope this helps.
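The advice about reading the delays as fixed-width 32-bit values rather than as "long" can be sketched without any GDI+ types. This is an illustration only: `frameDelaysMs` is a hypothetical helper, and it assumes the property value is an array of 32-bit integers in host byte order, each in units of 1/100 s, which you should verify against the property's type field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Interprets a raw property value buffer as 32-bit frame delays in 1/100 s
// and returns them converted to milliseconds.
std::vector<std::uint32_t> frameDelaysMs(const void* value, std::size_t lengthBytes) {
    assert(value != nullptr || lengthBytes == 0);
    const std::size_t count = lengthBytes / sizeof(std::uint32_t);
    std::vector<std::uint32_t> delays(count);
    if (count > 0) {
        // memcpy avoids alignment and aliasing problems with a pointer cast.
        std::memcpy(delays.data(), value, count * sizeof(std::uint32_t));
    }
    for (std::uint32_t& d : delays) {
        d *= 10;  // hundredths of a second to milliseconds
    }
    return delays;
}
```

With the question's variables, the call would be along the lines of `frameDelaysMs(m_pTimeDelays->value, m_pTimeDelays->length)`. A delay of 0 can legitimately appear in GIF files, and the playing code then has to choose its own default interval.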