Fastest Conversion of Row-Ordered data to Column-Ordered data - c++

I have an IplImage from openCV, which stores its data in a row-ordered format;
image data is stored in a one dimensional array char *data; the element at position x,y is given by
elem(x,y) = data[y*width + x] // see note at end
I would like to convert this image as quickly as possible to and from a second image format that stores its data in column-ordered format; that is
elem(x,y) = data[x*height + y]
Obviously, one way to do this conversion is simply element-by-element through a double for loop.
Is there a faster way?
Note for OpenCV aficionados: the actual location of elem(x,y) is data + y*widthStep + x*sizeof(element), but the formula above gives the general idea. For char data sizeof(element) = 1, and if we make widthStep = width the formula is exact.

This is called "matrix transposition".
Optimal methods try to minimise the number of cache misses by swapping small tiles
the size of one or a few cache lines. For a multi-level cache this gets difficult.
start reading here
this one is a bit more advanced
BTW, the URLs deal with "in place" transposition. Creating a transposed copy is different (it touches twice as many cache lines, duh!)
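For the transposed-copy case, a minimal sketch of the tiling idea could look like the following. The function name transposeTiled and the tile size kTile = 32 are assumptions made for illustration, not measured optima; the best tile size depends on the cache and should be benchmarked.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Tile edge length in elements. This is a tuning assumption, not a measured optimum.
constexpr std::size_t kTile = 32;

// Out-of-place transpose of a byte image.
//   src: elem(x, y) = src[y * width + x]   (row-ordered)
//   dst: elem(x, y) = dst[x * height + y]  (column-ordered)
void transposeTiled(const unsigned char* src, unsigned char* dst,
                    std::size_t width, std::size_t height)
{
    assert(src != nullptr && dst != nullptr && src != dst);
    for (std::size_t y0 = 0; y0 < height; y0 += kTile) {
        const std::size_t y1 = std::min(y0 + kTile, height);
        for (std::size_t x0 = 0; x0 < width; x0 += kTile) {
            const std::size_t x1 = std::min(x0 + kTile, width);
            // Inside one tile, both the reads and the writes stay within
            // a small window of memory.
            for (std::size_t y = y0; y < y1; ++y) {
                for (std::size_t x = x0; x < x1; ++x) {
                    dst[x * height + y] = src[y * width + x];
                }
            }
        }
    }
}
```

Calling transposeTiled(src, dst, width, height) fills dst with the column-ordered image; calling it again with width and height swapped converts back.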

Assuming you need a new array that has the elements all moved, the fastest you can manage in algorithmic speed is O(N) on the number of elements (i.e. width * height).
For actual time taken, it is possible to spawn multiple threads where each one copies some of the elements. This is only worthwhile of course if you really do have a lot of them.
If the threads are already created and they accept the tasks in queues, or whatever, this would be most efficient if you are going to process lots of these images.
Within your individual "loops" you can avoid doing the same multiplication multiple times, of course, and pointer arithmetic is likely to be a bit faster than indexed access.
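As a rough sketch of that last point, the row offset can be computed once per row and the destination index advanced by a fixed step. The name transposeRows is made up for this example, and an optimizing compiler may already do this transformation on the plain double loop, so it is worth measuring before relying on it.

```cpp
#include <cassert>
#include <cstddef>

// Row-ordered to column-ordered copy with the multiplications hoisted
// out of the inner loop.
void transposeRows(const unsigned char* src, unsigned char* dst,
                   std::size_t width, std::size_t height)
{
    assert(src != nullptr && dst != nullptr && src != dst);
    for (std::size_t y = 0; y < height; ++y) {
        const unsigned char* in = src + y * width; // start of source row y
        std::size_t out = y;                       // index of elem(0, y) in dst
        for (std::size_t x = 0; x < width; ++x) {
            dst[out] = *in++; // sequential read
            out += height;    // same y, next column
        }
    }
}
```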

You've more or less answered this yourself, just without code. I think you need something like:
#include <stdio.h>
typedef struct
{
unsigned char r;
unsigned char g;
unsigned char b;
}somePixelFormat;
#define HEIGHT 2
#define WIDTH 4
// let's say this is the original image (width=4, height=2) expressed as a one-dimensional
// array of structs that adhere to your pixel format
somePixelFormat src[ WIDTH * HEIGHT ] =
{
{0,0,0}, {1,1,1}, {2,2,2}, {3,3,3},
{4,4,4}, {5,5,5}, {6,6,6}, {7,7,7}
};
somePixelFormat dst[ WIDTH * HEIGHT ];
void printImage( void *img, int width, int height, int pixelByteCount )
{
for ( int row = 0; row < height; row++ )
{
for ( int col = 0; col < width; col++ )
{
printf( "(%02d,%02d,%02d) ", ((somePixelFormat*)img + width * row + col)->r,
((somePixelFormat*)img + width * row + col)->g,
((somePixelFormat*)img + width * row + col)->b );
}
printf ( "\n" );
}
printf("\n\n");
}
void flip( void *dstImg, void *srcImg, int srcWidth, int srcHeight, int pixelByteCount )
{
for ( int row = 0; row < srcHeight; row++ )
{
for ( int col = 0; col < srcWidth; col++ )
{
*((somePixelFormat*)dstImg + srcHeight * col + row) = *((somePixelFormat*)srcImg + srcWidth * row + col);
}
}
}
int main()
{
printImage( src, 4, 2, sizeof(somePixelFormat) );
flip( dst, src, 4, 2, sizeof(somePixelFormat) );
printImage( dst, 2, 4, sizeof(somePixelFormat) );
getchar();
return 0;
}
And here's example output:
(00,00,00) (01,01,01) (02,02,02) (03,03,03)
(04,04,04) (05,05,05) (06,06,06) (07,07,07)
(00,00,00) (04,04,04)
(01,01,01) (05,05,05)
(02,02,02) (06,06,06)
(03,03,03) (07,07,07)

Related

Optimize image buffer

Here is some code that decodes WebM frames and puts them in a buffer
image->planes[p] = pointer to the top left pixel
image->linesize[p] = stride between rows
framesArray = vector of unsigned char*
while ( videoDec->getImage(*image) == VPXDecoder::NO_ERROR)
{
const int w = image->getWidth(p);
const int h = image->getHeight(p);
int offset = 0;
for (int y = 0; y < h; y++)
{
// fwrite(image->planes[p] + offset, 1, w, pFile);
for(int i=0;i<w;i++){
framesArray.at(count)[i+(w*y)] = *(image->planes[p]+offset+ i) ;
}
offset += image->linesize[p];
}
}
.............................
How can I write into the buffer line by line rather than pixel by pixel, or otherwise optimize writing the frame into the buffer?
If the source image and the destination buffer share the same width, height and bits per pixel, and the rows are not padded (image->linesize[p] equals the width), you can use std::copy to copy the whole plane in one call:
std::copy(image->planes[p], image->planes[p] + image->getHeight(p) * image->linesize[p], framesArray.at(count));
If the bits per pixel match but the stride differs from the width, use std::copy once per line instead.
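A minimal sketch of the per-line variant follows. The function copyPlane and its parameter names are hypothetical; plane, stride, w and h stand in for image->planes[p], image->linesize[p], image->getWidth(p) and image->getHeight(p) from the question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Copies a w x h plane whose rows start `stride` bytes apart into a
// tightly packed destination buffer (w bytes per row).
void copyPlane(const unsigned char* plane, std::size_t stride,
               std::size_t w, std::size_t h, unsigned char* dst)
{
    assert(stride >= w);
    if (stride == w) {
        // No padding between rows: the plane is one contiguous block.
        std::copy(plane, plane + w * h, dst);
        return;
    }
    for (std::size_t y = 0; y < h; ++y) {
        const unsigned char* row = plane + y * stride;
        std::copy(row, row + w, dst + y * w); // one call per line
    }
}
```

In the loop from the question, the body of the while would then reduce to a single call such as copyPlane(image->planes[p], image->linesize[p], w, h, framesArray.at(count)).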

How to speed up this GSL code for selecting a submatrix?

I wrote a very simple function in GSL, to select a submatrix from an existing matrix in a struct.
EDIT: I had timed VERY INCORRECTLY and didn't notice the changed number of zeros in front. Still, I hope this can be sped up.
For 100x100 submatrices of a 10000x10000 matrix, it takes 1.2E-5 seconds. So, repeating that 1E4 times, takes 50 times longer than I need to diagonalise the 100x100 matrix.
EDIT:
I realise, it happens even if I comment out everything except return(0);
Thus, I theorize, it must be something about struct TOWER. This is how TOWER looks:
struct TOWER
{
int array_level[TOWERSIZE];
int array_window[TOWERSIZE];
gsl_matrix *matrix_ordered_covariance;
gsl_matrix *matrix_peano_covariance;
double array_angle_tw[XISTEP];
double array_correl_tw[XISTEP];
gsl_interp_accel *acc_correl; // interpolating for correlation
gsl_spline *spline_correl;
double array_all_eigenvalues[TOWERSIZE]; //contains all eiv. of whole matrix
std::vector< std::vector<double> > cropped_peano_covariance, peano_mask;
};
Below comes my function!
/* --- --- */
int monolevelsubmatrix(int i, int j, struct TOWER *tower, gsl_matrix *result) //relying on spline!! //must add auto vanishing
{
int firstrow, firstcol,mu,nu,a,b;
double aux, correl;
firstrow = helix*i;
firstcol = helix*j;
gsl_matrix_view Xi = gsl_matrix_submatrix (tower ->matrix_ordered_covariance, firstrow, firstcol, helix, helix);
gsl_matrix_memcpy (result, &(Xi.matrix));
return(0);
}
/* --- --- */
The problem is almost certainly gsl_matrix_memcpy. The source for that is in copy_source.c, with:
const size_t src_tda = src->tda ;
const size_t dest_tda = dest->tda ;
size_t i, j;
for (i = 0; i < src_size1 ; i++)
{
for (j = 0; j < MULTIPLICITY * src_size2; j++)
{
dest->data[MULTIPLICITY * dest_tda * i + j]
= src->data[MULTIPLICITY * src_tda * i + j];
}
}
This would be quite slow. Note that gsl_matrix_memcpy raises an error through GSL_ERROR if the matrices are different sizes, so it's very likely the loop could be replaced by a CRT memcpy on the data members of dest and src.
The loop is slow because each cell is dereferenced through the dest and src structs to reach the data member, and THEN indexed.
You could choose to write a replacement for the library, or write your own personal version of this matrix copy, with something like (untested suggestion code here):
size_t cellsize = sizeof( src->data[0] ); // just pseudocode here
memcpy( dest->data, src->data, cellsize * src_size1 * src_size2 * MULTIPLICITY );
Note that MULTIPLICITY is a define, usually 1 or 2, and probably depends on the library configuration; it might not matter for your usage (if it's 1).
Now, an important caveat: if the source matrix is a subview (as it is in your code, where Xi comes from gsl_matrix_submatrix), then you have to go by rows. That is, loop over the rows in i, with the CRT's memcpy copying one row at a time, not the entire matrix as shown above.
In other words, you do have to account for the geometry of the source matrix from which the subview was taken. That's probably why they index each cell (it keeps things simple).
If, however, you KNOW the geometry, you can very likely optimize this WAY above the performance you're seeing.
If all you did was take out the src/dest dereference, you'd see SOME performance gain, as in:
const size_t src_tda = src->tda ;
const size_t dest_tda = dest->tda ;
size_t i, j;
double * dest_data = dest->data; // pseudocode here (element type is double for gsl_matrix)
double * src_data = src->data; // pseudocode here
for (i = 0; i < src_size1 ; i++)
{
for (j = 0; j < MULTIPLICITY * src_size2; j++)
{
dest_data[MULTIPLICITY * dest_tda * i + j]
= src_data[MULTIPLICITY * src_tda * i + j];
}
}
We'd HOPE the compiler recognized that anyway, but...sometimes...
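The row-by-row memcpy mentioned in the caveat above could be sketched as follows. To keep the example self-contained it uses a hypothetical MatrixView struct that only mirrors the gsl_matrix fields size1, size2, tda and data; it is not the GSL type itself, and copyRows is a made-up name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the gsl_matrix fields used here:
// size1 = rows, size2 = columns, tda = distance between rows in elements.
struct MatrixView {
    std::size_t size1;
    std::size_t size2;
    std::size_t tda;
    double* data;
};

// Copies src into dest one row at a time, so it also works when src is a
// subview whose rows are not adjacent in memory (tda > size2).
// Returns false if the shapes differ.
bool copyRows(MatrixView dest, MatrixView src)
{
    if (dest.size1 != src.size1 || dest.size2 != src.size2) {
        return false;
    }
    assert(src.tda >= src.size2 && dest.tda >= dest.size2);
    for (std::size_t i = 0; i < src.size1; ++i) {
        std::memcpy(dest.data + i * dest.tda,
                    src.data + i * src.tda,
                    src.size2 * sizeof(double));
    }
    return true;
}
```

For a 100x100 block this is 100 memcpy calls of 800 bytes each instead of 10000 indexed assignments; whether that is measurably faster depends on the compiler and should be timed.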

Reading TGA files in OpenGL to create a 3D house

I have a TGA file and a library that already has everything I need to read TGA files and use them.
This class has a method called pixels() that returns a pointer to the memory area where the pixels are stored as RGBRGBRGB...
My question is: how can I get a pixel's value?
Because if I do something like this:
img.load("foo.tga");
printf ("%i", img.pixels());
it gives me back what is probably the address.
I've found this code on this site:
struct Pixel2d
{
static const int SIZE = 50;
unsigned char& operator()( int nCol, int nRow, int RGB)
{
return pixels[ ( nCol* SIZE + nRow) * 3 + RGB];
}
unsigned char pixels[SIZE * SIZE * 3 ];
};
int main()
{
Pixel2d p2darray;
glReadPixels(50, 50, Pixel2d::SIZE, Pixel2d::SIZE, GL_RGB, GL_UNSIGNED_BYTE, p2darray.pixels);
for( int i = 0; i < Pixel2d::SIZE ; ++i )
{
for( int j = 0; j < Pixel2d::SIZE ; ++j )
{
unsigned char rpixel = p2darray(i , j , 0);
unsigned char gpixel = p2darray(i , j , 1);
unsigned char bpixel = p2darray(i , j , 2);
}
}
}
I think that it could work well for me, but how can I tell the program to read from my img?
TGA supports different pixel depths, and we don't know what library you're using. But generally speaking, pixels() should return a pointer to a buffer containing the pixels. Say, for the sake of argument, that it unpacks the pixels into 8-bit-per-channel subpixels; then each pixel is represented by 3 bytes.
So to access a pixel at a given offset in the buffer:
typedef unsigned char u8; // assumed 8-bit channel type
const u8* pixelBuffer = img.pixels();
u8 red = pixelBuffer[(offset*3)+0];
u8 green = pixelBuffer[(offset*3)+1];
u8 blue = pixelBuffer[(offset*3)+2];
If you know the width of the image buffer then you can get a pixel by its x and y coordinates:
u8 red = pixelBuffer[((x+(y*width))*3)+0];
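Wrapped up as a small helper, that lookup might look like the sketch below. Rgb and pixelAt are names invented for this example, and the buffer is assumed to be tightly packed 8-bit RGB with no padding between rows, which you would need to confirm for your TGA library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Rgb {
    std::uint8_t r;
    std::uint8_t g;
    std::uint8_t b;
};

// Reads pixel (x, y) from a tightly packed RGBRGB... buffer that holds
// `width` pixels per row and `height` rows.
Rgb pixelAt(const std::uint8_t* buffer, std::size_t width, std::size_t height,
            std::size_t x, std::size_t y)
{
    assert(buffer != nullptr && x < width && y < height);
    const std::uint8_t* p = buffer + (y * width + x) * 3;
    return Rgb{p[0], p[1], p[2]};
}
```

With the class from the question this would be called as pixelAt(img.pixels(), width, height, x, y), where width and height come from whatever accessors the library provides.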

Reorganize image/picture arrays in OpenGL to fit power of 2 textures size

I am having troubles in OpenGL due to the fact that textures have to be power of 2 in OpenGL.
What I am doing is the following:
I load a PNG file into an array of unsigned char, using PNGLIB or SOIL. The idea is that I can run through this array and "select" the parts that are relevant for me. For example, imagine I've loaded a person but I just want to store the head in a separate texture. So I'm looping through the array and selecting only the necessary parts.
First question: I believe that the data in the array is stored in RGBA mode, but I'm not yet sure whether the data is filled row-wise or column-wise. Is it possible to know this?
Second question: since power-of-2 textures are always required, it can happen that I have an image 513 pixels wide, so I need a texture 1024 pixels wide. The picture then looks completely "destroyed" because the pixels are not in the places they should be: the texture has a different size than the relevant data in the array. So how can I reorganize the array in order to get the contents of the image back? I tried the following but it doesn't work:
unsigned char* new_memory = 0;
int index = 0;
int new_index = 0;
new_memory = new unsigned char[new_tex_width * new_tex_height * 4];
for(int i=0; i<picture.width; i++) // WIDTH
{
for(int j=0; j<picture.height; j++) // HEIGHT
{
for(int k=0; k<4; k++) // DEPTH
new_memory[new_index++] = picture.memory[index++];//picture.memory[i + picture.height * (j + 4 * k)];
}
new_index += new_tex_height - picture.height;
}
glGenTextures(1, &png_texture);
glBindTexture(GL_TEXTURE_2D, png_texture);
glTexImage2D(GL_TEXTURE_2D, 0, 3, new_tex_width, new_tex_height, 0 , GL_RGBA, GL_UNSIGNED_BYTE, new_memory);
Non-power-of-two textures have been supported for a good while. However, creating texture atlases and rearranging textures still has a lot of merit. The way we do it is simply to use FreeImage, which handles all of this for you and supports some of the compressed formats.
If you want to do it your way, and you know that it's just a bitmap, then I'd do it more along these lines (not tested, and it does not check inputs, but it should give you an idea):
void Blit( int xOffset, int yOffset, int targetW, int sourceW, int sourceH, unsigned char* source, unsigned char* target, unsigned int bpp )
{
for( unsigned int i = 0; i < sourceH; ++i )
{
memcpy( target + bpp * ( targetW * ( yOffset + i ) + xOffset ), source + sourceW * i * bpp, sourceW * bpp );
}
}
Basically, just take each row and memcpy it over.
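Applied to the 513-to-1024 case from the question, the same row-by-row memcpy could be sketched like this. PaddedImage, nextPow2 and padToPow2 are illustrative names, and the source is assumed to be row-ordered RGBA with 4 bytes per pixel and no row padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct PaddedImage {
    std::size_t width;   // power-of-two width
    std::size_t height;  // power-of-two height
    std::vector<unsigned char> pixels;
};

// Smallest power of two that is >= n (for n >= 1).
std::size_t nextPow2(std::size_t n)
{
    std::size_t p = 1;
    while (p < n) p *= 2;
    return p;
}

// Places a w x h RGBA image at the start of each row of a larger,
// zero-filled power-of-two buffer.
PaddedImage padToPow2(const unsigned char* src, std::size_t w, std::size_t h)
{
    assert(src != nullptr && w > 0 && h > 0);
    const std::size_t bpp = 4; // bytes per RGBA pixel
    PaddedImage out{nextPow2(w), nextPow2(h), {}};
    out.pixels.assign(out.width * out.height * bpp, 0);
    for (std::size_t y = 0; y < h; ++y) {
        std::memcpy(out.pixels.data() + y * out.width * bpp,
                    src + y * w * bpp,
                    w * bpp);
    }
    return out;
}
```

Because the image only occupies part of the texture, texture coordinates have to be scaled by w / width and h / height when drawing.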

glReadPixels store x, y values

I'm trying to store pixel data by using glReadPixels, but so far I managed to only store it one pixel at a time. I'm not sure if this is the way to go. I currently have this:
unsigned char pixels[3];
glReadPixels(50,50, 1, 1, GL_RGB, GL_UNSIGNED_BYTE, pixels);
What would be a good way to store it in an array, so that I can get the values like this:
pixels[20][50][0]; // x=20 y=50 -> R value
pixels[20][50][1]; // x=20 y=50 -> G value
pixels[20][50][2]; // x=20 y=50 -> B value
I guess I could simply put it in a loop:
for ( all pixels on Y axis )
{
for ( all pixels in X axis )
{
unsigned char pixels[width][height][3];
glReadPixels(x,y, 1, 1, GL_RGB, GL_UNSIGNED_BYTE, pixels[x][y]);
}
}
But I have the feeling that there must be a much better way to do this. But I do however need my array to be like I described above the code. So would the for loop idea be good, or is there a better way?
glReadPixels simply returns bytes in the order R, G, B, R, G, B, ... (based on your setting of GL_RGB), row by row from the bottom left of the read rectangle up to the top right. From the OpenGL documentation:
glReadPixels returns pixel data from the frame buffer, starting with
the pixel whose lower left corner is at location (x, y), into client
memory starting at location data. Several parameters control the
processing of the pixel data before it is placed into client memory.
These parameters are set with three commands: glPixelStore,
glPixelTransfer, and glPixelMap. This reference page describes the
effects on glReadPixels of most, but not all of the parameters
specified by these three commands.
The overhead of calling glReadPixels thousands of times will most likely take a noticeable amount of time (depends on the window size, I wouldn't be surprised if the loop took 1-2 seconds).
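It is recommended that you call glReadPixels only once and store the result in a byte array of size w * h * 3, where w and h are the width and height you pass to glReadPixels (width - x and height - y if you read from (x, y) to the edge of the window). You can then reference a pixel's component with data[(py * w + px) * 3 + component], where px and py are the pixel's position within the rectangle you read and component is 0, 1 or 2 for R, G or B. This assumes tightly packed rows: the default GL_PACK_ALIGNMENT of 4 pads each row to a multiple of 4 bytes, so call glPixelStorei(GL_PACK_ALIGNMENT, 1) first if w * 3 is not a multiple of 4.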
If you absolutely must have it in a 3-dimensional array, you can write some code to rearrange the 1d array after the glReadPixels call.
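One possible way to do that rearrangement is sketched below, using nested vectors so that the result can be indexed as grid[x][y][component], as asked in the question. Pixel and toGrid are names chosen for this example, and the input is assumed to be a tightly packed RGB buffer (rows of exactly w * 3 bytes) already filled by a single glReadPixels call.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Pixel = std::array<unsigned char, 3>; // {R, G, B}

// Rearranges a row-ordered RGB buffer of w x h pixels into grid[x][y],
// where y counts rows in the order they appear in the buffer
// (bottom row first for glReadPixels output).
std::vector<std::vector<Pixel>> toGrid(const unsigned char* data,
                                       std::size_t w, std::size_t h)
{
    assert(data != nullptr);
    std::vector<std::vector<Pixel>> grid(w, std::vector<Pixel>(h));
    for (std::size_t y = 0; y < h; ++y) {
        for (std::size_t x = 0; x < w; ++x) {
            const unsigned char* p = data + (y * w + x) * 3;
            grid[x][y] = Pixel{{p[0], p[1], p[2]}};
        }
    }
    return grid;
}
```

After auto grid = toGrid(data, w, h), the expression grid[20][50][0] gives the R value at x=20, y=50. Indexing the flat buffer directly avoids this extra copy, so the rearrangement is only worth it if the 3-dimensional form is really required.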
If you define the pixel array like this:
unsigned char pixels[MAX_Y][MAX_X][3];
and access it like this:
pixels[y][x][0] = r;
pixels[y][x][1] = g;
pixels[y][x][2] = b;
then you'll be able to read the pixels with one glReadPixels call (width first, then height; the origin is the lower left corner, and rows are assumed tightly packed, see GL_PACK_ALIGNMENT):
glReadPixels(left, bottom, MAX_X, MAX_Y, GL_RGB, GL_UNSIGNED_BYTE, pixels);
What you can do is declare a simple one-dimensional array in a struct and use operator overloading for convenient subscript notation:
struct Pixel2d
{
static const int SIZE = 50;
unsigned char& operator()( int nCol, int nRow, int RGB)
{
return pixels[ ( nRow * SIZE + nCol ) * 3 + RGB]; // glReadPixels stores the data row after row
}
unsigned char pixels[SIZE * SIZE * 3 ];
};
int main()
{
Pixel2d p2darray;
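glPixelStorei(GL_PACK_ALIGNMENT, 1); // rows of 50 * 3 bytes are not a multiple of 4
glReadPixels(50, 50, Pixel2d::SIZE, Pixel2d::SIZE, GL_RGB, GL_UNSIGNED_BYTE, p2darray.pixels);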
for( int i = 0; i < Pixel2d::SIZE ; ++i )
{
for( int j = 0; j < Pixel2d::SIZE ; ++j )
{
unsigned char rpixel = p2darray(i , j , 0);
unsigned char gpixel = p2darray(i , j , 1);
unsigned char bpixel = p2darray(i , j , 2);
}
}
}
Here you are reading a 50*50 pixel block in one shot, and operator()( int nCol, int nRow, int RGB) provides the needed convenience. For performance reasons you don't want to make too many glReadPixels calls.