Pixel data unpacking to smaller sections - c++

I'm trying to write a function that unpacks an image into separate quads. But for some reason the results are distorted (kinda stretched 45 degrees), so I must be reading the pixel array incorrectly, though I can't see the problem with my function...
The function takes 2 unsigned char arrays, "source" and "target" and two unsigned int values, the "width" and "height" of the source image. Width is dividable by 4, and height is dividable by 3 (both return the same value, because the texture is 600 * 450) so each face is 150*150 px. So the w/h values are correct. Then it also takes in 2 ints, "xIt" and "yIt" which determine the offset - which 150*150 block should be read.
Here's the function:
const unsigned int trgImgWidth = width / 4;
const unsigned int trgImgHeight = height / 3;
unsigned int trgBufferOffset = 0;
// Compute pixel offset to start reading from
unsigned int Yoffset = yIt * trgImgHeight * width * 3;
unsigned int Xoffset = xIt * trgImgWidth * 3;
for (unsigned int y = 0; y < trgImgHeight; y++)
{
unsigned int o = Yoffset + Xoffset; // Offset of current line of pixels
for (unsigned int x = 0; x < trgImgWidth * 3; x++) // for each pixel component (rgb) in the line
{
target[trgBufferOffset] = source[o + x];
trgBufferOffset++;
}
Yoffset += width * 3;
}
Anyone see where I might be going wrong here?

Related

Broken BMP when save bitmap by SOIL. Screenshot area

This is continuation of my last question about saving screenshot to SOIL .here Now I wonder, how to make screenshot of part of screen and eliminate the reason that strange behaviour. My code:
bool saveTexture(string path, glm::vec2 startPos, glm::vec2 endPos)
{
const char *charPath = path.c_str();
GLuint widthPart = abs(endPos.x - startPos.x);
GLuint heightPart = abs(endPos.y - startPos.y);
BITMAPINFO bmi;
auto& hdr = bmi.bmiHeader;
hdr.biSize = sizeof(bmi.bmiHeader);
hdr.biWidth = widthPart;
hdr.biHeight = -1.0 * heightPart;
hdr.biPlanes = 1;
hdr.biBitCount = 24;
hdr.biCompression = BI_RGB;
hdr.biSizeImage = 0;
hdr.biXPelsPerMeter = 0;
hdr.biYPelsPerMeter = 0;
hdr.biClrUsed = 0;
hdr.biClrImportant = 0;
unsigned char* bitmapBits = (unsigned char*)malloc(3 * widthPart * heightPart);
HDC hdc = GetDC(NULL);
HDC hBmpDc = CreateCompatibleDC(hdc);
HBITMAP hBmp = CreateDIBSection(hdc, &bmi, DIB_RGB_COLORS, (void**)&bitmapBits, nullptr, 0);
SelectObject(hBmpDc, hBmp);
BitBlt(hBmpDc, 0, 0, widthPart, heightPart, hdc, startPos.x, startPos.y, SRCCOPY);
//UPDATE:
- int bytes = widthPart * heightPart * 3;
- // invert R and B chanels
- for (unsigned i = 0; i< bytes - 2; i += 3)
- {
- int tmp = bitmapBits[i + 2];
- bitmapBits[i + 2] = bitmapBits[i];
- bitmapBits[i] = tmp;
- }
+ unsigned stride = (widthPart * (hdr.biBitCount / 8) + 3) & ~3;
+ // invert R and B chanels
+ for (unsigned row = 0; row < heightPart; ++row) {
+ for (unsigned col = 0; col < widthPart; ++col) {
+ // Calculate the pixel index into the buffer, taking the
alignment into account
+ const size_t index{ row * stride + col * hdr.biBitCount / 8 };
+ std::swap(bitmapBits[index], bitmapBits[index + 2]);
+ }
+ }
int texture = SOIL_save_image(charPath, SOIL_SAVE_TYPE_BMP, widthPart, heightPart, 3, bitmapBits);
return texture;
}
When I run this if widthPart and heightPart is even number, that works perfect. But if something from this is odd number I get this BMP's.:
I checked any converting and code twice, but it seems to me the reason is in my wrong blit functions. Function of converting RGB is not affect on problem. What can be a reason? It's the right way blitting of area in BitBlt ?
Update No difference even or odd numbers. Correct picture produces when this numbers is equal. I don't know where is a problem.((
Update2
SOIL_save_image functions check parameters for errors and send to stbi_write_bmp:
int stbi_write_bmp(char *filename, int x, int y, int comp, void *data)
{
int pad = (-x*3) & 3;
return outfile(filename,-1,-1,x,y,comp,data,0,pad,
"11 4 22 4" "4 44 22 444444",
'B', 'M', 14+40+(x*3+pad)*y, 0,0, 14+40, // file header
40, x,y, 1,24, 0,0,0,0,0,0); // bitmap header
}
outfile function:
static int outfile(char const *filename, int rgb_dir, int vdir, int x, int
y, int comp, void *data, int alpha, int pad, char *fmt, ...)
{
FILE *f = fopen(filename, "wb");
if (f) {
va_list v;
va_start(v, fmt);
writefv(f, fmt, v);
va_end(v);
write_pixels(f,rgb_dir,vdir,x,y,comp,data,alpha,pad);
fclose(f);
}
return f != NULL;
}
The broken bitmap images are the result of a disagreement of data layout between Windows bitmaps and what the SOIL library expects1. The pixel buffer returned from CreateDIBSection follows the Windows rules (see Bitmap Header Types):
The scan lines are DWORD aligned [...]. They must be padded for scan line widths, in bytes, that are not evenly divisible by four [...].
In other words: The width, in bytes, of each scanline is (biWidth * (biBitCount / 8) + 3) & ~3. The SOIL library, on the other hand, doesn't expect pixel buffers to be DWORD aligned.
To fix this, the pixel data needs to be converted before being passed to SOIL, by stripping (potential) padding and exchanging the R and B color channels. The following code does so in-place2:
unsigned stride = (widthPart * (hdr.biBitCount / 8) + 3) & ~3;
for (unsigned row = 0; row < heightPart; ++row) {
for (unsigned col = 0; col < widthPart; ++col) {
// Calculate the source pixel index, taking the alignment into account
const size_t index_src{ row * stride + col * hdr.biBitCount / 8 };
// Calculate the destination pixel index (no alignment)
const size_t index_dst{ (row * width + col) * (hdr.biBitCount / 8) };
// Read color channels
const unsigned char b{ bitmapBits[index_src] };
const unsigned char g{ bitmapBits[index_src + 1] };
const unsigned char r{ bitmapBits[index_src + 2] };
// Write color channels switching R and B, and remove padding
bitmapBits[index_dst] = r;
bitmapBits[index_dst + 1] = g;
bitmapBits[index_dst + 2] = b;
}
}
With this code, index_src is the index into the pixel buffer, which includes padding to enforce proper DWORD alignment. index_dst is the index without any padding applied. Moving pixels from index_src to index_dst removes (potential) padding.
1 The tell-tale sign is scanlines moving to the left or right by one or two pixels (or individual color channels at different speeds). This is usually a safe indication, that there is a disagreement of scanline alignment.
2 This operation is destructive, i.e. the pixel buffer can no longer be passed to Windows GDI functions once converted, although the original data can be reconstructed, even if a bit more involved.

cuda multiple image erosion not work

i'm trying to implement multiple black(0) and white(255) image erosion with cuda,i use a square (5x5)structure element.The kernel that i had implemented take an unsigned char array buffer in which are stored nImg images 200X200 px . To allow erosion of multiple image simultaneosly i make a grid with 3D structure:
each block has the dimension of the strel (5x5)
the grid has height = image_height/blockDim.y , width = image_width/blockDim.x , z = nImg
i've try to implement it extending that sample.
the problem is that if i store the pixels that a block of threads consider into a shared buffer shared between the threads of the block;
to allow fast memory access, the algorithm doesn't work properly.I try to change the bindex that for me make mistake,but i cannot found a solution.
any suggestion?
here's my code:
//strel size
#define STREL_W 5
#define STREL_H 5
// distance from the cente of strel to strel width or height
#define R (STREL_H/2)
//size of the 2D region that each block consider i.e all the neighborns that each thread in a block consider
#define BLOCK_W (STREL_W+(2*R))
#define BLOCK_H (STREL_H+(2*R))
__global__ void erode_multiple_img_SM(unsigned char * buffer_in,
unsigned char * buffer_out,
int w,
int h ){
//array stored in shared memory,that contain all pixel neighborns that each thread in a block consider
__shared__ unsigned char fast_acc_arr[BLOCK_W*BLOCK_H];
// map thread in a 3D structure
int col = blockIdx.x * STREL_W + threadIdx.x -R ;
int row = blockIdx.y * STREL_H + threadIdx.y -R ;
int plane = blockIdx.z * blockDim.z + threadIdx.z;
// check if a foreground px of strel is not contain in a region of the image with size of strel (if only one px is not contain the image is eroded)
bool is_contain = true;
// clamp to edge of image
col = max(0,col);
col = min(col,w-1);
row = max(0,row);
row = min(row,h-1);
//map each thread in one dim coord to map 3D structure(grid) with image buffer(1D)
unsigned int index = (plane * h * w) + (row * w) + col;
unsigned int bindex = threadIdx.y * blockDim.y + threadIdx.x;
//each thread copy its pixel of the block to shared memory (shared with thread of a block)
fast_acc_arr[bindex] = buffer_in[index];
__syncthreads();
//the strel must be contain in image, thread.x and thread.y are the coords of the center of the mask that correspond to strel in image, and it must be contain in image
if((threadIdx.x >= R) && (threadIdx.x < BLOCK_W-R) && (threadIdx.y >= R) && (threadIdx.y <BLOCK_H-R)){
for(int dy=-R; dy<=R; dy++){
if(is_contain == false)
break;
for (int dx = -R ; dx <= R; dx++) {
//if only one element in mask is different from the value of strel el --> the strel is not contain in the mask --> the center of the mask is eroded (and it's no necessary to consider the other el of the mask this is the motivation of the break)
if (fast_acc_arr[bindex + (dy * blockDim.x) + dx ] != 255 ){
buffer_out[index ] = 0;
is_contain = false;
break;
}
}
}
// if the strel is contain into the image the the center is not eroded
if(is_contain == true)
buffer_out[index] = 255;
}
}
that are my kernel settings:
dim3 block(5,5,1);
dim3 grid(200/(block.x),200/(block.y),nImg);
my kernel call:
erode_multiple_img_SM<<<grid,block>>>(dimage_src,dimage_dst,200,200);
my image input and output:
input: output(150 buff element):
code without shared memory(low speed):
__global__ void erode_multiple_img(unsigned char * buffer_in,
unsigned char * buffer_out,
int w,int h ){
int col = blockIdx.x * blockDim.x + threadIdx.x;
int row = blockIdx.y * blockDim.y + threadIdx.y;
int plane = blockIdx.z * blockDim.z +threadIdx.z;
bool is_contain = true;
col = max(0,col);
col = min(col,w-1);
row = max(0,row);
row = min(row,h-1);
for(int dy=-STREL_H/2; dy<=STREL_H/2; dy++){
if(is_contain == false)
break;
for (int dx = -STREL_W/2 ; dx <= STREL_W/2; dx++) {
if (buffer_in[(plane * h * w) +( row + dy) * w + (col + dx) ] !=255 ){
buffer_out[(plane * h * w) + row * w + col ] = 0;
is_contain = false;
break;
}
}
}
if(is_contain == true)
buffer_out[(plane * h * w) + row * w +col ] = 255;
}
UPDATED ALGORITHM
i try to follow that samples to do convolution.I change the input image, now has 512x512 size and i wrote that algorithm:
#define STREL_SIZE 5
#define TILE_W 16
#define TILE_H 16
#define R (STREL_H/2)
#define BLOCK_W (TILE_W+(2*R))
#define BLOCK_H (TILE_H+(2*R))
__global__ void erode_multiple_img_SM_v2(unsigned char * buffer_in,
unsigned char * buffer_out,
int w,int h ){
// Data cache: threadIdx.x , threadIdx.y
__shared__ unsigned char data[TILE_W +STREL_SIZE ][TILE_W +STREL_SIZE ];
// global mem address of this thread
int col = blockIdx.x * blockDim.x + threadIdx.x;
int row = blockIdx.y * blockDim.y + threadIdx.y;
int plane = blockIdx.z * blockDim.z +threadIdx.z;
int gLoc = (plane*h/w)+ row*w +col;
bool is_contain = true;
// load cache (32x32 shared memory, 16x16 threads blocks)
// each threads loads four values from global memory into shared mem
int x, y; // image based coordinate
if((col<w)&&(row<h)) {
data[threadIdx.x][threadIdx.y]=buffer_in[gLoc];
if (threadIdx.y > (h-STREL_SIZE))
data[threadIdx.x][threadIdx.y + STREL_SIZE]=buffer_in[gLoc + STREL_SIZE];
if (threadIdx.x >(w-STREL_SIZE))
data[threadIdx.x + STREL_SIZE][threadIdx.y]=buffer_in[gLoc+STREL_SIZE];
if ((threadIdx.x >(w-STREL_SIZE)) && (threadIdx.y > (h-STREL_SIZE)))
data[threadIdx.x+STREL_SIZE][threadIdx.y+STREL_SIZE] = buffer_in[gLoc+2*STREL_SIZE];
//wait for all threads to finish read
__syncthreads();
//buffer_out[gLoc] = data[threadIdx.x][threadIdx.y];
unsigned char min_value = 255;
for(x=0;x<STREL_SIZE;x++){
for(y=0;y<STREL_SIZE;y++){
min_value = min( (data[threadIdx.x+x][threadIdx.y+y]) , min_value);
}
}
buffer_out[gLoc]= min_value;
}
}
my kernel settings now are:
dim3 block(16,16);
dim3 grid(512/(block.x),512/(block.y),nImg);
input:
output:
seems that the pixels of the apron are not copyied in the ouput buffer
You may want to read the following links for more detailed description and better example code on how to implement an image convolution CUDA kernel function.
http://igm.univ-mlv.fr/~biri/Enseignement/MII2/Donnees/convolutionSeparable.pdf
https://www.evl.uic.edu/sjames/cs525/final.html
Basically using a convolution filter of the size (5 x 5) does not mean setting the size of the thread block to be (5 x 5).
Typically, for a non-separable convolution, you could use a thread block of the size (16 x 16), to calculate a block of (16 x 16) pixels on the output image. To achieve this you need to read a block of ((2+16+2) x (2+16+2)) pixels from the input image to the shared memory, using the (16 x 16) threads collaboratively.

How to access every pixel?

If I have a PixelBuffer object of size (200 * 200 * 3) where each pixel has three consecutive spots for the RGB colors. How can I index them so that if i am trying to implement the DDA line drawing algorithm. I have seen a lot on the web that uses PutPixel(x,y) but im not sure how I can access the pixels in this method.
The pixels will be arranged row by row, with each pixel using 3 bytes. To address a point (x, y), you basically just need to multiply the y value by the size of a row (which is the width multiplied by 3), multiply the x value by the size of a pixel (3).
With a few constants for readability, the code for the function could look like this:
const int IMG_WIDTH = 200;
const int IMG_HEIGHT = 200;
const int BYTES_PER_PIXEL = 3;
const int BYTES_PER_ROW = IMG_WIDTH * BYTES_PER_PIXEL;
void PutPixel(uint8_t* pImgData, int x, int y, const uint8_t color[3])
{
uint8_t pPixel = pImgData + y * BYTES_PER_ROW + x * BYTES_PER_PIXEL;
for (int iByte = 0; iByte < BYTES_PER_PIXEL; ++iByte)
{
pPixel[iByte] = color[iByte];
}
}
Example how this function could be used:
// Allocate image data.
uint8_t* pImgData = new uint8_t[IMG_WIDTH * IMG_HEIGHT];
// Initialize image data, unless you are planning to set all pixels.
...
// Set pixel (50, 30) to yellow.
uint8_t yellow[3] = {255, 255, 0};
PutPixel(pImgData, 50, 30, yellow);
Once you have your image built in memory, you can store the content in a pixel buffer object using glBufferData():
GLuint bufId = 0;
glGenBuffers(1, &bufId);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, bufId);
glBufferData(GL_PIXEL_UNPACK_BUFFER, IMG_HEIGHT * BYTES_PER_ROW,
pImgData, GL_STATIC_DRAW);

array, copy pixels to correct index, algorithm

I have image size is 2x2, so count pixels = 4
one pixel - 4 bytes
so I have an array of 16 bytes - mas[16] - width * height * 4 = 16
I want to make the same image, but the size is more a factor of 2, this means that instead of one will be four pixels
new array will have size of 64 bytes - newMas[16] - width*2 * height*2 * 4
problem, that i can't correct copy pixels to newMas,that with different size image correctly copy pixels
this code copy pixels to mas[16]
size_t width = CGImageGetWidth(imgRef);
size_t height = CGImageGetHeight(imgRef);
const size_t bytesPerRow = width * 4;
const size_t bitmapByteCount = bytesPerRow * height;
size_t mas[bitmapByteCount];
UInt8* data = (UInt8*)CGBitmapContextGetData(bmContext);
for (size_t i = 0; i < bitmapByteCount; i +=4)
{
UInt8 a = data[i];
UInt8 r = data[i + 1];
UInt8 g = data[i + 2];
UInt8 b = data[i + 3];
mas[i] = a;
mas[i+1] = r;
mas[i+2] = g;
mas[i+3] = b;
}
In general, using the built-in image drawing API will be faster and less error-prone than writing your own image-manipulation code. There are at least three potential errors in the code above:
It assumes that there's no padding at the end of rows (iOS seems to pad up to a multiple of 16 bytes); you need to use CGImageGetBytesPerRow().
It assumes a fixed pixel format.
It gets the width/height from a CGImage but the data from a CGBitmapContext.
Assuming you have a UIImage,
CGRect r = {{0,0},img.size};
r.size.width *= 2;
r.size.height *= 2;
UIGraphicsBeginImageContext(r.size);
// This turns off interpolation in order to do pixel-doubling.
CGContextSetInterpolationQuality(UIGraphicsGetCurrentContext(), kCGInterpolationNone);
[img drawRect:r];
UIImage * bigImg = UIGraphicsGetImageFromCurrentImageContext();
UIGraphicsEndImageContext();

Issue with writing YUV image frame in C/C++

I am trying to convert an RGB frame, which is taken from OpenGL glReadPixels(), to a YUV frame, and write the YUV frame to a file (.yuv). Later on I would like to write it to a named_pipe as an input for FFMPEG, but as for now I just want to write it to a file and view the image result using a YUV Image Viewer. So just disregard the "writing to pipe" for now.
After running my code, I encountered the following errors:
The number of frames shown in the YUV Image Viewer software is always 1/3 of the number of frames I declared in my program. When I declare fps as 10, I could only view 3 frames. When I declared fps as 30, I could only view 10 frames. However when I view the file in Text Editor, I could see that I have the correct amount of word "FRAME" printed in the file.
This is the example output that I got: http://www.bobdanani.net/image.yuv
I could not see the correct image, but just some distorted green, blue, yellow, and black pixels.
I read about YUV format from http://wiki.multimedia.cx/index.php?title=YUV4MPEG2 and http://www.fourcc.org/fccyvrgb.php#mikes_answer and http://kylecordes.com/2007/pipe-ffmpeg
Here is what I have tried so far. I know that this conversion approach is quite in-efficient, and I can optimize it later. Now I just want to get this naive approach to work and have the image shown properly.
int frameCounter = 1;
int windowWidth = 0, windowHeight = 0;
unsigned char *yuvBuffer;
unsigned long bufferLength = 0;
unsigned long frameLength = 0;
int fps = 10;
void display(void) {
/* clear the color buffers */
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
/* DRAW some OPENGL animation, i.e. cube, sphere, etc
.......
.......
*/
glutSwapBuffers();
if ((frameCounter % fps) == 1){
bufferLength = 0;
windowWidth = glutGet(GLUT_WINDOW_WIDTH);
windowHeight = glutGet (GLUT_WINDOW_HEIGHT);
frameLength = (long) (windowWidth * windowHeight * 1.5 * fps) + 100; // YUV 420 length (width*height*1.5) + header length
yuvBuffer = new unsigned char[frameLength];
write_yuv_frame_header();
}
write_yuv_frame();
frameCounter = (frameCounter % fps) + 1;
if ( (frameCounter % fps) == 1){
snprintf(filename, 100, "out/image-%d.yuv", seq_num);
ofstream out(filename, ios::out | ios::binary);
if(!out) {
cout << "Cannot open file.\n";
}
out.write (reinterpret_cast<char*> (yuvBuffer), bufferLength);
out.close();
bufferLength = 0;
delete[] yuvBuffer;
}
}
void write_yuv_frame_header (){
char *yuvHeader = new char[100];
sprintf (yuvHeader, "YUV4MPEG2 W%d H%d F%d:1 Ip A0:0 C420mpeg2 XYSCSS=420MPEG2\n", windowWidth, windowHeight, fps);
memcpy ((char*)yuvBuffer + bufferLength, yuvHeader, strlen(yuvHeader));
bufferLength += strlen (yuvHeader);
delete (yuvHeader);
}
void write_yuv_frame() {
int width = glutGet(GLUT_WINDOW_WIDTH);
int height = glutGet(GLUT_WINDOW_HEIGHT);
memcpy ((void*) (yuvBuffer+bufferLength), (void*) "FRAME\n", 6);
bufferLength +=6;
long length = windowWidth * windowHeight;
long yuv420FrameLength = (float)length * 1.5;
long lengthRGB = length * 3;
unsigned char *rgb = (unsigned char *) malloc(lengthRGB * sizeof(unsigned char));
unsigned char *yuvdest = (unsigned char *) malloc(yuv420FrameLength * sizeof(unsigned char));
glReadPixels(0, 0, windowWidth, windowHeight, GL_RGB, GL_UNSIGNED_BYTE, rgb);
int r, g, b, y, u, v, ypos, upos, vpos;
for (int j = 0; j < windowHeight; ++j){
for (int i = 0; i < windowWidth; ++i){
r = (int)rgb[(j * windowWidth + i) * 3 + 0];
g = (int)rgb[(j * windowWidth + i) * 3 + 1];
b = (int)rgb[(j * windowWidth + i) * 3 + 2];
y = (int)(r * 0.257 + g * 0.504 + b * 0.098) + 16;
u = (int)(r * 0.439 + g * -0.368 + b * -0.071) + 128;
v = (int)(r * -0.148 + g * -0.291 + b * 0.439 + 128);
ypos = j * windowWidth + i;
upos = (j/2) * (windowWidth/2) + i/2 + length;
vpos = (j/2) * (windowWidth/2) + i/2 + length + length/4;
yuvdest[ypos] = y;
yuvdest[upos] = u;
yuvdest[vpos] = v;
}
}
memcpy ((void*) (yuvBuffer + bufferLength), (void*)yuvdest, yuv420FrameLength);
bufferLength += yuv420FrameLength;
free (yuvdest);
free (rgb);
}
This is just the very basic approach, and I can optimize the conversion algorithm later.
Can anyone tell me what is wrong in my approach? My guess is that one of the issues is with the outstream.write() call, because I converted the unsigned char* data to char* data that it may lose data precision. But if I don't cast it to char* I will get a compile error. However this doesn't explain why the output frames are corrupted (only account to 1/3 of the number of total frames).
It looks to me like you have too many bytes per frame for 4:2:0 data. ACcording to the spec you linked to, the number of bytes for a 200x200 pixel 4:2:0 frame should be 200 * 200 * 3 / 2 = 60,000. But you have ~90,000 bytes. Looking at your code, I don't see where you are convert from 4:4:4 to 4:2:0. So you have 2 choices - either set the header to 4:4:4, or convert the YCbCr data to 4:2:0 before writing it out.
I compiled your code and surely there is a problem when computing upos and vpos values.
For me this worked (RGB to YUV NV12):
vpos = length + (windowWidth * (j/2)) + (i/2)*2;
upos = vpos + 1;