I am trying to extract a bitmask from a QPixmap and pass it to OpenCV. My bitmask is created by "painting" operations.
My process so far has been:
Create a QPixmap, call QPixmap::fill(QColor(0,0,0,0)), and use a QPainter with QPainter::setPen(QColor(255,0,0,255)) to QPainter::drawPoint(mouse_event->pos()).
When ready to extract the bitmask, call QPixmap::toImage() and then QImage::createAlphaMask(), which is documented to return a QImage in QImage::Format_MonoLSB.
I am now officially stuck though. I'm having trouble deciphering the documentation:
Each pixel stored in a QImage is represented by an integer. The size of the integer varies depending on the format. QImage supports several image formats described by the Format enum.
Monochrome images are stored using 1-bit indexes into a color table with at most two colors. There are two different types of monochrome images: big endian (MSB first) or little endian (LSB first) bit order.
...
The createAlphaMask() function builds and returns a 1-bpp mask from the alpha buffer in this image...
Also:
QImage::Format_MonoLSB --- 2 ---The image is stored using 1-bit per pixel. Bytes are packed with the less significant bit (LSB) first.
Could anyone help me clarify how to transfer this into a cv::Mat?
Also, should I read this as each pixel being one unsigned char, or are 8 pixels packed into each byte?
I've successfully managed to transfer a monochrome QImage to a cv::Mat. I hope the following code is helpful to others:
IMPORTANT EDIT: There was a major bug in this code. bytesPerLine() includes padding, because QImage aligns each scan line to a 32-bit boundary. The loop over pixels must therefore be bounded by width(), not by bytesPerLine().
QImage mask; //Input from wherever
cv::Mat workspace;
if(!mask.isNull() && mask.depth() == 1)
{
if(mask.width() != workspace.cols || mask.height() != workspace.rows)
workspace.create(mask.height(), mask.width(), CV_8UC1);
for(int i = 0; i < mask.height(); ++i)
{
unsigned char * cur_row = mask.scanLine(i);
//for(int cur_byte = 0, j = 0; cur_byte < mask.bytesPerLine(); ++cur_byte) wrong
for(int cur_byte = 0, j = 0; j < mask.width(); ++cur_byte)
{
unsigned char pixels = cur_row[cur_byte];
for(int cur_bit = 0; cur_bit < 8 && j < mask.width(); ++cur_bit, ++j) //stop at the row's last pixel when width() is not a multiple of 8
{
if(pixels & 0x01) //Least Significant Bit
workspace.at<unsigned char>(i, j) = 0xff;
else
workspace.at<unsigned char>(i, j) = 0x00;
pixels = pixels >> 1; //Least Significant Bit
}
}
}
}
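For anyone who wants to check the bit order without Qt or OpenCV, here is a minimal sketch of the same unpacking on a raw buffer. The function name unpackMonoLSB and the stride parameter (bytes per row including padding, what bytesPerLine() reports) are my own assumptions, and so is the choice that a set bit maps to 0xff.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Expands a 1-bpp, LSB-first bitmap into one byte per pixel (0x00 or 0xff).
// `stride` is the number of bytes per row including any padding (assumption:
// this corresponds to QImage::bytesPerLine()).
std::vector<uint8_t> unpackMonoLSB(const uint8_t* bits, int width, int height,
                                   std::size_t stride)
{
    std::vector<uint8_t> out(static_cast<std::size_t>(width) * height);
    for (int y = 0; y < height; ++y)
    {
        const uint8_t* row = bits + y * stride;
        for (int x = 0; x < width; ++x)
        {
            // Pixel x lives in byte x/8; LSB-first means bit (x % 8) of that byte.
            const bool set = ((row[x >> 3] >> (x & 7)) & 1) != 0;
            out[static_cast<std::size_t>(y) * width + x] = set ? 0xff : 0x00;
        }
    }
    return out;
}
```

Indexing by pixel rather than by byte means the padding bytes at the end of each row are never read as pixels.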
I am implementing an audio channel mixer and using Viktor T. Toth's algorithm. Trying to mix two audio channel streams.
In the code, quantization_ is the bit depth of a channel expressed in bytes. My mix function takes pointers to destination and source uint8_t buffers, mixes the two channels, and writes the result into the destination buffer. Because the data arrives in uint8_t buffers, I do the addition, multiplication, and division needed to assemble the actual 8-, 16-, or 24-bit samples, mix them, and then split the result back into bytes.
Generally, it gives the expected output sample values. However, when I look at the output in Audacity, some samples have values near 0 where they should not. In the screenshot, the bottom two signals are the two mono channels and the top one is the mixed channel. There are some very low values, especially in the middle.
Below is my mix function:
void audio_mixer::mix(uint8_t* dest, const uint8_t* source)
{
uint64_t mixed_sample = 0;
uint64_t dest_sample = 0;
uint64_t source_sample = 0;
uint64_t factor = 0;
for (int i = 0; i < channel_size_; ++i)
{
dest_sample = 0;
source_sample = 0;
factor = 1;
for (int j = 0; j < quantization_; ++j)
{
dest_sample += factor * static_cast<uint64_t>(*dest++);
source_sample += factor * static_cast<uint64_t>(*source++);
factor = factor * 256;
}
mixed_sample = (dest_sample + source_sample) - (dest_sample * source_sample / factor);
dest -= quantization_;
for (int k = 0; k < quantization_; ++k)
{
*dest++ = static_cast<uint8_t>(mixed_sample % 256);
mixed_sample = mixed_sample / 256;
}
}
}
It seems like you aren't treating the signed audio samples correctly. The horizontal line should be zero voltage from your audio signal.
If you look at the positive voltage audio samples they obey your equation correctly (except for the peak values in the center). The negative values are being compressed which makes me feel like they are being treated as small positive voltages instead of negative voltages.
In other words, maybe those unsigned ints should be signed ints so the top bit indicates the voltage polarity and you can have audio samples in the range +127 to -128.
Those peak values in the center seem to be wrapping around modulo 256 (255 being the peak value of an unsigned byte representation of your audio). I'm not sure how this would happen, but it seems related to the unsigned vs. signed signals.
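To make the signed-sample point concrete, here is a minimal sketch, assuming the samples are little-endian two's complement and 1 to 3 bytes wide (8-bit WAV data is usually unsigned, so check your format first). readSignedSample and mixClamped are hypothetical helper names, and the mixing shown is plain addition with clipping, not Viktor T. Toth's formula.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Assembles a little-endian two's-complement sample of `bytes` bytes (1..3)
// and sign-extends it to 32 bits.
int32_t readSignedSample(const uint8_t* p, std::size_t bytes)
{
    uint32_t raw = 0;
    for (std::size_t k = 0; k < bytes; ++k)
        raw |= static_cast<uint32_t>(p[k]) << (8 * k);
    // Flipping the sign bit and subtracting its weight maps the raw value
    // into the signed range without relying on shifts of negative numbers.
    const uint32_t signBit = 1u << (8 * bytes - 1);
    return static_cast<int32_t>(raw ^ signBit) - static_cast<int32_t>(signBit);
}

// Adds two signed samples and clips the sum to the range of a `bytes`-wide sample.
int32_t mixClamped(int32_t a, int32_t b, std::size_t bytes)
{
    const int32_t maxV = (int32_t(1) << (8 * bytes - 1)) - 1;
    const int32_t minV = -maxV - 1;
    return std::max(minV, std::min(maxV, a + b));
}
```

With the samples read this way, silence is 0 and negative voltages stay negative instead of turning into large positive numbers.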
Maybe you should try the other formula Viktor provided in his document:
Z = 2(A+B) - (AB/128) - 256
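As a rough sketch of how that formula could be applied to unsigned 8-bit samples (where 128 is silence): I am assuming the piecewise form from Viktor's document as I remember it, with Z = AB/128 when both inputs are below 128 and the formula above otherwise. The low branch is not quoted in this answer, so treat it as an assumption. mixToth8 is a hypothetical name.

```cpp
#include <algorithm>
#include <cstdint>

// Mixes two unsigned 8-bit samples (128 = silence).
// Assumption: low branch Z = AB/128 when both inputs are below 128,
// otherwise Z = 2(A+B) - AB/128 - 256.
uint8_t mixToth8(uint8_t a, uint8_t b)
{
    const int A = a;
    const int B = b;
    int z;
    if (A < 128 && B < 128)
        z = (A * B) / 128;
    else
        z = 2 * (A + B) - (A * B) / 128 - 256;
    // Integer rounding can push the top end to 256, so clamp to a byte.
    return static_cast<uint8_t>(std::min(255, std::max(0, z)));
}
```

For wider samples the constants 128 and 256 would have to scale with the bit depth, which is where an unsigned/signed mix-up is easy to make.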
I am accessing the image like so:
pDoc = GetDocument();
int iBitPerPixel = pDoc->_bmp->bitsperpixel; // used to see if grayscale(8 bits) or RGB (24 bits)
int iWidth = pDoc->_bmp->width;
int iHeight = pDoc->_bmp->height;
BYTE *pImg = pDoc->_bmp->point; // pointer used to point at pixels in the image
int Wp = iWidth;
const int area = iWidth * iHeight;
int r; // red pixel value
int g; // green pixel value
int b; // blue pixel value
int gray; // gray pixel value
BYTE *pImgGS = pImg; // grayscale image pixel array
and attempting to change the rgb image to gray like so:
// convert RGB values to grayscale at each pixel, then put in grayscale array
for (int i = 0; i<iHeight; i++)
for (int j = 0; j<iWidth; j++)
{
r = pImg[i*iWidth * 3 + j * 3 + 2];
g = pImg[i*iWidth * 3 + j * 3 + 1];
b = pImg[i*Wp + j * 3];
r * 0.299;
g * 0.587;
b * 0.144;
gray = std::round(r + g + b);
pImgGS[i*Wp + j] = gray;
}
finally, this is how I try to draw the image:
//draw the picture as grayscale
for (int i = 0; i < iHeight; i++) {
for (int j = 0; j < iWidth; j++) {
// this should set every corresponding grayscale picture to the current picture as grayscale
pImg[i*Wp + j] = pImgGS[i*Wp + j];
}
}
}
original image:
and the resulting image that I get is this:
First check if image type is 24 bits per pixels.
Second, allocate memory for pImgGS:
BYTE* pImgGS = (BYTE*)malloc(sizeof(BYTE)*iWidth *iHeight);
Please refer to this article to see how bmp data is saved. bmp images are saved upside down (bottom row first). Also, the first 54 bytes of a typical file are headers (the 14-byte BITMAPFILEHEADER followed by the 40-byte BITMAPINFOHEADER), not pixel data.
Hence you should access values in following way,
double r,g,b;
unsigned char gray;
for (int i = 0; i<iHeight; i++)
{
for (int j = 0; j<iWidth; j++)
{
r = (double)pImg[(i*iWidth + j)*3 + 2];
g = (double)pImg[(i*iWidth + j)*3 + 1];
b = (double)pImg[(i*iWidth + j)*3 + 0];
r= r * 0.299;
g= g * 0.587;
b= b * 0.114; // BT.601 blue weight is 0.114, so the three weights sum to 1
gray = floor((r + g + b + 0.5));
pImgGS[(iHeight-i-1)*iWidth + j] = gray;
}
}
If padding is present, first determine the pitch (the number of bytes per row including padding) and step through the rows by the pitch instead. Refer to this to understand pitch and padding.
double r,g,b;
unsigned char gray;
long index=0;
for (int i = 0; i<iHeight; i++)
{
for (int j = 0; j<iWidth; j++)
{
r = (double)pImg[index+ (j)*3 + 2];
g = (double)pImg[index+ (j)*3 + 1];
b = (double)pImg[index+ (j)*3 + 0];
r= r * 0.299;
g= g * 0.587;
b= b * 0.114; // BT.601 blue weight is 0.114, so the three weights sum to 1
gray = floor((r + g + b + 0.5));
pImgGS[(iHeight-i-1)*iWidth + j] = gray;
}
index =index +pitch;
}
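The pitch used in the loop above is not defined in the snippet. A minimal sketch of how it is usually computed for BMP data, where every row is padded to a multiple of 4 bytes (bmpPitch and bmpPadding are hypothetical helper names):

```cpp
#include <cstddef>
#include <cstdint>

// Bytes per row of BMP pixel data, rounded up to a multiple of 4.
std::size_t bmpPitch(int width, int bitsPerPixel)
{
    return ((static_cast<std::size_t>(width) * bitsPerPixel + 31) / 32) * 4;
}

// Number of unused padding bytes at the end of each row.
std::size_t bmpPadding(int width, int bitsPerPixel)
{
    const std::size_t usedBytes =
        (static_cast<std::size_t>(width) * bitsPerPixel + 7) / 8;
    return bmpPitch(width, bitsPerPixel) - usedBytes;
}
```

For a 24bpp image that is 5 pixels wide, the row uses 15 bytes and the pitch is 16, so indexing with iWidth * 3 drifts by one byte per row.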
While drawing image,
as pImg is 24bpp, you need to copy the gray value three times, once to each of the R, G, and B channels. If you ultimately want to save the grayscale image in bmp format, then you again have to write the bmp data upside down, or you can simply skip the flip when converting to gray here:
pImgGS[(iHeight-i-1)*iWidth + j] = gray;
tl; dr:
Make one common path. Convert everything to 32-bits in a well-defined manner, and do not use image dimensions or coordinates. Refactor the YCbCr conversion ( = grey value calculation) into a separate function, this is easier to read and runs at exactly the same speed.
The lengthy stuff
First, you seem to have confused strides and offsets. The artefact that you see is because you accidentally wrote out one value (and in total only one third of the data) when you should have written three values.
One can get confused by this easily, but here it happened because you do work that you did not need to do in the first place. You iterate over coordinates left to right, top to bottom and painstakingly calculate the correct byte offset in the data for each location.
However, you're doing a full-screen effect, so what you really want is iterate over the complete image. Who cares about the width and height? You know the beginning of the data, and you know the length. One loop over the complete blob will do the same, only faster, with less obscure code, and fewer opportunities of getting something wrong.
Next, 24-bit bitmaps are common as files, but they are rather unusual for in-memory representation because the format is nasty to access and unsuitable for hardware. Drawing such a bitmap will require a lot of work from the driver or the graphics hardware (it will work, but it will not work well). Therefore, 32-bit depth is usually a much better, faster, and more comfortable choice. It is much more "natural" to access program-wise.
You can rather trivially convert 24-bit to 32-bit. Iterate over the complete bitmap data and write out a complete 32-bit word for each 3 byte-tuple read. Windows bitmaps ignore the A channel (the highest-order byte), so just leave it zero, or whatever.
Also, there is no such thing as an 8-bit greyscale bitmap. This simply doesn't exist. Although there exist bitmaps that look like greyscale bitmaps, they are in reality paletted 8-bit bitmaps where (incidentally) the bmiColors member contains all greyscale values.
Therefore, unless you can guarantee that you will only ever process images that you have created yourself, you cannot just rely that e.g. the values 5 and 73 correspond to 5/255 and 73/255 greyscale intensity, respectively. That may be the case, but it is in general a wrong assumption.
In order to be on the safe side as far as correctness goes, you must convert your 8-bit greyscale bitmaps to real colors by looking up the indices (the bitmap's grey values are really indices) in the palette. Otherwise, you could be loading a greyscale image where the palette is the other way around (so 5 would mean 250 and 250 would mean 5), or a bitmap which isn't greyscale at all.
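A minimal sketch of that palette lookup, assuming a 256-entry palette. PaletteEntry is a hypothetical stand-in with the same byte order as the Windows RGBQUAD struct, and expandPaletted8 is a made-up name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for RGBQUAD (blue, green, red, reserved).
struct PaletteEntry
{
    uint8_t blue;
    uint8_t green;
    uint8_t red;
    uint8_t reserved;
};

// Converts 8-bit palette indices to 32-bit 0x00RRGGBB pixels.
// Assumes `palette` has 256 entries so every index is valid.
std::vector<uint32_t> expandPaletted8(const uint8_t* indices, std::size_t count,
                                      const PaletteEntry* palette)
{
    std::vector<uint32_t> out(count);
    for (std::size_t i = 0; i < count; ++i)
    {
        const PaletteEntry& c = palette[indices[i]];
        out[i] = (static_cast<uint32_t>(c.red) << 16) |
                 (static_cast<uint32_t>(c.green) << 8) |
                 static_cast<uint32_t>(c.blue);
    }
    return out;
}
```

With an inverted grey palette, index 5 comes out as 0x00FAFAFA (250), which is exactly the case that breaks code treating the index as the intensity.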
So... you want to convert 24-bit and you want to convert 8-bit bitmaps, both to 32-bit depth. That means you do all the annoying what-if stuff once at the beginning, and the rest is one identical common path. That's a good thing.
What you will be showing on-screen is always a 32-bit bitmap where the topmost byte is ignored, and the lower three are all the same value, resulting in what looks like a shade of grey. That's simple, and simple is good.
Note that if you do a BT.601-style YCbCr conversion (as indicated by your use of the constants 0.299, 0.587, and 0.144; the last should be 0.114), and if your 8-bit greyscale images are perceptual (this is something you must know, there is no way of telling from the file!), then for 100% correctness, you need to do the inverse transformation when converting from paletted 8-bit to RGB. Otherwise, your final result will look almost right, but not quite. If your 8-bit greyscales are linear, i.e. were created without using the above constants (again, you must know, you cannot tell from the image), you need to copy everything as-is (here, doing the conversion would make it look almost-but-not-quite right).
About the RGB-to-greyscale conversion, you do not need an extra greyscale bitmap just to hold the values that you never need again afterwards. You can read the three color values from the loaded bitmap, calculate Y, and directly build the 32-bit ARGB word, which you then write out to the final bitmap. This saves one entirely useless round-trip to memory which is not necessary.
Something like this:
uint32_t* out = (uint32_t*) output_bitmap_data;
for(int i = 0; i < inputSize; i+= 3)
{
uint8_t Y = calc_greyscale(in[i], in[i + 1], in[i + 2]); // read the i-th 3-byte tuple
*out++ = (Y<<16) | (Y<<8) | Y;
}
Alternatively, you can also do the from-whatever-to-32 conversion, and then do the to-greyscale conversion in-place there. This, in turn, introduces an extra round-trip to memory, but the code becomes much, much easier overall.
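For completeness, a minimal self-contained sketch of the first variant. calc_greyscale is not defined above, so the version here is my assumption: BT.601 luma with integer weights (77, 150, 29 out of 256, approximating 0.299, 0.587, 0.114), taking the bytes in the B, G, R order in which Windows bitmaps store them. bgr24ToGrey32 is a hypothetical name, and the input is assumed to be tightly packed with no row padding.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// BT.601 luma using integer weights that sum to 256 (assumed helper).
uint8_t calc_greyscale(uint8_t b, uint8_t g, uint8_t r)
{
    return static_cast<uint8_t>((29 * b + 150 * g + 77 * r + 128) >> 8);
}

// Converts tightly packed 24-bit BGR data to 32-bit grey pixels (0x00YYYYYY).
std::vector<uint32_t> bgr24ToGrey32(const uint8_t* in, std::size_t inputSize)
{
    std::vector<uint32_t> out;
    out.reserve(inputSize / 3);
    for (std::size_t i = 0; i + 2 < inputSize; i += 3)
    {
        const uint32_t Y = calc_greyscale(in[i], in[i + 1], in[i + 2]);
        out.push_back((Y << 16) | (Y << 8) | Y);
    }
    return out;
}
```

If the rows are padded, the padding bytes have to be skipped per row, so the single flat loop only applies once the data has been repacked.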
I'm trying to follow this pseudocode to implement a water colour filter in Open CV.
http://supercomputingblog.com/graphics/oil-painting-algorithm/
I've previously achieved the effect using this method in javascript with a canvas because I could iterate over the pixels however I'm not sure how to do that with Open CV.
Mat im = imread(...); //input image
Mat paint; // output after processing
for(int i = 0; i < im.rows; i++)
{
for (int j = 0; j < im.cols; j++) //for each pixel
{
//here I need a reference to the pixel colour from im
}
}
I tried to use:
im.at<uchar>(i,j)
However this is giving me values around 350 for the most part which suggested to me that it's the cumulation of the rgb channels (a multi-channel array I think). So I tried to split it like this:
vector<Mat> three_channels;
split(im, three_channels);
But it just gives me the same value 3 times. Any suggestions?
I ended up just accessing them, thusly:
int b = im.at<cv::Vec3b>(y,x)[0]; // imread() loads colour images in BGR order
int g = im.at<cv::Vec3b>(y,x)[1];
int r = im.at<cv::Vec3b>(y,x)[2];
as was mentioned in the answer to a previous question.
Most of the time, colors are packed into a single integer with 8 bits per channel rather than stored in an array, so you need shifts and masks to extract the channels.
short red = (color >> 16) & 0xFF;
short green = (color >> 8) & 0xFF;
short blue = (color) & 0xFF;
(via how to color mask in c)
Has anyone ever integrated FreeType with DirectX 11 for font rendering? The only article I seem to find is DirectX 11 Font Rendering. I can't seem to match the correct DXGI_FORMAT for rendering the grayscale bitmap that FreeType creates for a glyph.
There are three ways to handle greyscale textures in Direct3D 11:
Option (1): You can use an RGB format and replicate the channels. For example, you'd use DXGI_FORMAT_R8G8B8A8_UNORM and set R,G,B to the single monochrome channel and the A to all opaque (0xFF). You can handle Monochrome + Alpha (2 channel) data the same way.
This conversion is supported when loading .DDS luminance formats (D3DFMT_L8, D3DFMT_L8A8) by DirectXTex library and the texconv command-line tool with the -xlum switch.
This makes the texture up to 4 times larger in memory, but easily integrates using standard shaders.
Option (2): You keep the monochrome texture as a single channel using DXGI_FORMAT_R8_UNORM as your format. You then render using a custom shader which replicates the red channel to RGB at runtime.
This is in fact what the tutorial blog post you linked to is doing:
///////// PIXEL SHADER
float4 main(float2 uv : TEXCOORD0) : SV_Target0
{
return float4(Decal.Sample(Bilinear, uv).rrr, 1.f);
}
For Monochrome + Alpha (2-channel) you'd use DXGI_FORMAT_R8G8_UNORM and then your custom shader would use .rrrg as the swizzle.
Option (3): You can compress the monochrome data to the DXGI_FORMAT_BC2_UNORM format using a custom encoder. This is implemented in DirectX Tool Kit's MakeSpriteFont tool when using /TextureFormat:CompressedMono
// CompressBlock (16 pixels (4x4 block) stored as 16 bytes)
long alphaBits = 0;
int rgbBits = 0;
int pixelCount = 0;
for (int y = 0; y < 4; y++)
{
for (int x = 0; x < 4; x++)
{
long alpha;
int rgb;
// This is the single monochrome channel
int value = bitmapData[blockX + x, blockY + y];
if (options.NoPremultiply)
{
// If we are not premultiplied, RGB is always white and we have 4 bit alpha.
alpha = value >> 4;
rgb = 0;
}
else
{
// For premultiplied encoding, quantize the source value to 2 bit precision.
if (value < 256 / 6)
{
alpha = 0;
rgb = 1;
}
else if (value < 256 / 2)
{
alpha = 5;
rgb = 3;
}
else if (value < 256 * 5 / 6)
{
alpha = 10;
rgb = 2;
}
else
{
alpha = 15;
rgb = 0;
}
}
// Add this pixel to the alpha and RGB bit masks.
alphaBits |= alpha << (pixelCount * 4);
rgbBits |= rgb << (pixelCount * 2);
pixelCount++;
}
}
// The resulting BC2 block is:
// uint64_t = alphaBits
// uint16_t = 0xFFFF
// uint16_t = 0x0
// uint32_t = rgbBits
The resulting texture is then rendered using a standard alpha-blending shader. Since it uses 1 byte per pixel, this is effectively the same size as if you were using DXGI_FORMAT_R8_UNORM.
This technique does not work for 2-channel data, but works great for alpha-blended monochrome images like font glyphs.
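As a rough illustration of Option (1), here is a CPU-side sketch that expands an 8-bit greyscale bitmap into RGBA8 data before creating the texture. expandGreyToRgba is a hypothetical name, and pitch (bytes per source row) is assumed to be positive; a FreeType bitmap can report a negative pitch, which this sketch does not handle.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Replicates each grey value into R, G and B and sets A to opaque (0xFF).
// `pitch` is the number of bytes per source row (assumed positive).
std::vector<uint8_t> expandGreyToRgba(const uint8_t* grey, int width, int height,
                                      std::size_t pitch)
{
    std::vector<uint8_t> rgba(static_cast<std::size_t>(width) * height * 4);
    for (int y = 0; y < height; ++y)
    {
        const uint8_t* row = grey + y * pitch;
        for (int x = 0; x < width; ++x)
        {
            uint8_t* dst = &rgba[(static_cast<std::size_t>(y) * width + x) * 4];
            dst[0] = dst[1] = dst[2] = row[x];
            dst[3] = 0xFF;
        }
    }
    return rgba;
}
```

The result is tightly packed, so the row pitch to pass when uploading it would be width * 4 bytes.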
I have a binary file of image data where each pixel is exactly 4 bits. Image data is laid out as follow:
There are N images, where the first image is 1x1, the second image is 2x2, the third is 4x4, and so on (they are mipmaps if you care to know).
Given a pointer to the start of the data buffer, I want to skip to the biggest image.
Now I know how many bytes I want to skip, but there is this annoying 1x1 image at the start which is 4 bits. I am not aware of any way to increment a pointer by bits.
How can I successfully retrieve the data without everything being off by 4 bits?
Assuming you can change your file format you can do either of the following:
Add padding to the 1x1 image
Store the images in reverse order (effectively the same as above, but not ideal for mip-maps because you don't necessarily know how many images you will have)
If you can't change your format, you have these choices:
Convert the data
Accept that the buffer is offset by half a byte and work with it accordingly
You said:
How can I successfully retrieve the data without everything being off
by 4 bits?
So that means you need to convert. When you calculate your offset in bytes, you will find that the first one contains half a byte of the previous image. So in a pinch you can shuffle them like this:
for( i = start; i < end; i++ ) {
p[i] = (p[i] << 4) | (p[i+1] >> 4);
}
That's assuming the first pixel is bits 4-7 and the second pixel is bits 0-3, and so on... If it's the other way around, just invert those two shifts.
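A slightly more defensive sketch of the same conversion, working nibble by nibble so it never reads past the last pixel. It keeps the assumption above that the first pixel of each byte is in bits 4-7. nibblesBeforeLevel, nibbleAt and repackNibbles are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of 4-bit pixels stored before mip level `level`
// (level 0 is 1x1, level 1 is 2x2, and so on): 1 + 4 + 16 + ...
std::size_t nibblesBeforeLevel(unsigned level)
{
    std::size_t total = 0;
    std::size_t pixels = 1;
    for (unsigned k = 0; k < level; ++k)
    {
        total += pixels;
        pixels *= 4;
    }
    return total;
}

// Reads the 4-bit pixel at `index`, high nibble first (assumption).
uint8_t nibbleAt(const uint8_t* data, std::size_t index)
{
    const uint8_t b = data[index / 2];
    return (index % 2 == 0) ? static_cast<uint8_t>(b >> 4)
                            : static_cast<uint8_t>(b & 0x0F);
}

// Copies `count` pixels starting at pixel `first` into a byte-aligned buffer.
std::vector<uint8_t> repackNibbles(const uint8_t* data, std::size_t first,
                                   std::size_t count)
{
    std::vector<uint8_t> out((count + 1) / 2, 0);
    for (std::size_t k = 0; k < count; ++k)
    {
        const uint8_t n = nibbleAt(data, first + k);
        out[k / 2] |= (k % 2 == 0) ? static_cast<uint8_t>(n << 4) : n;
    }
    return out;
}
```

The offset of every level after the first is odd (1, 5, 21, ...), so each of them starts in the middle of a byte and needs this realignment.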
// this assumes pixels points to bytes(unsigned chars)
index = ?;// your index to the pixel
byte_t b = pixels[index / 2];
if (index % 2) pixel = b >> 4;
else pixel = b & 15;
// Or you can use
byte_t b = pixels[index >> 1];
if (index & 1) pixel = b >> 4;
else pixel = b & 15;
Either way just compute the logical index into the file. Dividing by two takes you to the start of the byte where the pixel is. And then just read the correct half of the byte.
So make a function
byte_t GetMyPixel(unsigned char* pixels, unsigned index) {
byte_t b = pixels[index >> 1];
byte_t pixel;
if (index & 1) pixel = b >> 4;
else pixel = b & 15;
return pixel;
}
To read the first images:
Image1x1 = GetMyPixel(pixels,0);
Image2x2_1 = GetMyPixel(pixels,1);// Top left pixel of second image
Image2x2_2 = GetMyPixel(pixels,2);// Top Right pixel of second image
Image2x2_3 = GetMyPixel(pixels,3);// Bottom left pixel of second image
... etc
So that is one way to go about it. You might need to take into account the nibble order within each byte (which pixel sits in the high four bits), so if the result seems wrong, switch the logic for the pixel read like this:
byte_t GetMyPixel(unsigned char* pixels, unsigned index) {
byte_t b = pixels[index >> 1];
byte_t pixel;
#if OTHER_ENDIAN
if (index & 1) pixel = b >> 4;
else pixel = b & 15;
#else
if (index & 1) pixel = b & 15;
else pixel = b >> 4;
#endif
return pixel;
}