WebP lossless format overview - c++

I am reading the official WebP lossless bitstream spec. and I have a feeling, that the document is missing some explanation.
Let me describe some fragments of the specification:
1. Introduction - clear
2. Riff header - clear
3. Transformations
The transformations are used only for the main level ARGB image: the
subresolution images have no transforms, not even the 0 bit indicating
the end-of-transforms.
Nowhere earlier was it mentioned, that the container holds some sub-resolution images. What are they? Where are they described, if not in the specification? How to they add to the final image?
Then, in the Predictor transform paragraph:
We divide the image into squares...
..what image? The main image or sub-resolution image? What if the image cannot be divided into squares (apart from pixel-size squares)?
The first 4 bits of prediction data define the block width and height
in number of bits. The number of block columns, block_xsize, is used
in indexing two-dimensionally.
Does this mean that the image width is block_xsize * block_width ?
The transform data contains the prediction mode for each block of the image.
In what way, what format?
I dont know why I am having a hard time understanding this. Maybe because I am not a native english speaker or because the description is too laconic.
I'd appreciate any help in decoding this specification :)

It was mentioned earlier. Right at the top of the document it says:
The format uses subresolution images, recursively embedded into the
format itself, for storing statistical data about the images, such as
the used entropy codes, spatial predictors, color space conversion,
and color table.
These are arrays (or a vector in the case of the color table) of data where each element applies to a block of pixels in the actual image, e.g. a 16x16 block. These "subresolution images" are not themselves subsamples of the image being compressed.
The format description calls them images because they are stored exactly like the main image is in the format. The transforms are instructions to the decoder to apply to the decompressed main image data. The entropy image is used to decompress the main image, by virtue of providing the Huffman codes for each block.

Related

OpenCV convertTo()

I came across this code:
image.convertTo(temp_image,CV_16SC3);
I saw the description of the convertTo() function from here, but what confuses me is image. How can we read the above code? What would be the relation between image and temp_image?
Thanks.
The other answers here are correct, but lack some details. Let me try.
image.convertTo(temp_image,CV_16SC3);
You have a source image image, and a destination image temp_image. You didn't specify the type of image, but probably is CV_8UC3 or CV_32FC3, i.e. a 3 channel image (since convertTo doesn't change the number of channels), where each channel has depth 8 bit (unsigned char, CV_8UC3) or 32 bit (float, CV_32FC3).
This line of code will change the depth of each channel, so that temp_image has each channel of depth 16 bit (short). Specifically it's a signed short, since the type specifier has the S: CV_16SC3.
Note that if you are narrowing down the depth, as in the case from float to signed short, then saturate_cast will make sure that all the values in temp_image will be in the correct range, i.e. in [–32768, 32767] for signed short.
Why you need to change the depth of an image?
Some OpenCV functions require input images with a specific depth.
You need a matrix to contain a different range of values. E.g. if you need to sum (or subtract) some images CV_8UC3 (tipically BGR images), you'd better store the result in a CV_16SC3 or you'll probably get wrong results due to saturations, since the range for CV_8U images is in [0,255]
You read with imread, or want to store with imwrite images with 16bit depth. This are usually used (AFAIK) in medical or graphics application to allow a wider range of colors. However, most monitors do not support 16bit image visualization.
There may be other cases, let me know if I miss the one important to you.
An image is a matrix of pixel information (i.e. a 1080p image will be a 1,920 × 1,080 matrix where each entry contains rbg values for that pixel). All you are doing is reformatting that matrix (each pixel entry, iteratively) into a new type (CV_16SC3) so it can be read by different programs.
The temp_image is a new matrix of pixel information based off of image formatted into CV_16SC3.
The first one is a source, the second one - destination. So, it takes image, converts it into type CV_16SC3 and stores in temp_image.

Distorted Image in Secondary Capture DICOM file

I want to create a secondary capture DICOM file as per the requirements.
I created one, but the image( pixel data in the tag 7FE0 0010 ) looks distorted. I am reading a JPEG image using Gdiplus::Bitmap and using API ::LockBits and 'btmpData.Scan0' to get the pixel data. The same is inserted into the pixel data tag - 7FE0,0010. But while viewing the same in a DICOM viewer, it is coming as distorted. The dicom tags Rows, Columns, PlannarConfiguration are updated properly. BitsAllocated, BitsStored and HighBit are given values 8,8 and 7 respectively.
While goggling I came to know that, instead of RGB format, the bits might be in the order BGR. Hence I tried to switch the bits in the place 'B' and 'R'.
But still the issue exist. Could anybody help me ?
Apparently you forgot to take into account Stride support from GDI+. An image being much more explicit than 1000 words here is what I mean:, the actual full article being here.

Correct display of DICOM images ITK-VTK (images too dark)

I read dicom images with ITK using itk::ImageSeriesReader and itk::GDCMImageIO after reading i flip the images with itk::FlipImageFilter (to get right orientation of the images) and convert the itkImageData to vtkImageData using itk::ImageToVTKImageFilter. I visualization images with VTK using vtkResliceImageViewer in QVTKWidget2.
I set:
(vtkResliceImageViewer)m_imageViewer[i]->SetColorWindow(windowWidthTAGvalue[0028|1051]);
(vtkResliceImageViewer)m_imageViewer[i]->SetColorLevel(windowCenterTAGvalue[0028|1050]);
and i set following blac&white LookUpTable:
vtkLookupTable* lutbw = vtkLookupTable::New();
lutbw->SetTableRange(0,1000);
lutbw->SetSaturationRange(0,0);
lutbw->SetHueRange(0,0);
lutbw->SetValueRange(0,1);
lutbw->Build();
And images shown into my software compared with the same images shown into other software are much darker, i can not get the same effect as other DICOM viewers
My software images are right other software image is left also when i use some other LookUpTable in this example Flow i can not get the same effect (2nd row images) my image on right is much darker then other.
What i am missing why my images are darker what can i do? i was research a lot into dicom and ikt/vtk can not find good solution any help is appreciate.
Please check the values for Rescale Slope (0028,1053) and Rescale Intercept(0028,1052) and apply the Modality LUT transformation before applying the Window level.
Your dataset may have VOI LUT Function (0028,1056) attribute value of "SIGMOID" instead of "LINEAR".
I extracted the image data from one of your DICOM file (brain_009.dcm) and looked at the histogram of the image data. It looks like, the minimum value stored in the image is 0 and maximum value is 960 regardless of interpreting the data is signed or unsigned. Also, the Window Width (0028:1051) has an invalid value of “0” and you cannot use that for displaying the image.
So your default display could set the Window Width to 960 and Window Center to half the window width plus the minimum value.

How to get grayscale value of pixels from grayscale image in xCode

I was wondering how to determine the equivalent of RGB values for a grayscale image. The original image is grayscale and everything I have found online is converting an RGB image pixel values to the grayscale pixel values. I already can read in the image. Ideally, this would be for xCode.
I was wondering if there was a class which would do this for me. If so, and you could point me to it, that would be great. I will read on it.
Any help is greatly appreciated.
NOTE: I am a beginner in C++ and do not have time to learn everything formally; I have to learn all of my programming on the fly.
You need more information to transform from a simple Greyscale to RGB, when you do reverse operation, the color information is "lost", as the three channels are set to same value(depending on the algorithm each channel will have a different/same weight in the final color computation).
Digital cameras, usually store more information per pixel, 12 bits per channel in 35mm and 14 bits per channel in medium format (those bits number are the average, some products offer less or even more quality).
Thanks to those additional bits per channel, the camera can compute the "real" color, or what it thinks is the real color based on some parameters.
TL;DR: You can't without more data from your source, in this case the image.
You can convert a gray value to RGB by setting each component of the RGB value to the gray value:
ColorRGB myColorRGB = ColorRGBMake(myGrayValue, myGrayValue, myGrayValue);

C++: How to interpret a byte array representation of an image?

I'm trying to work with this camera SDK, and let's say the camera has this function called CameraGetImageData(BYTE* data), which I assume takes in a byte array, modifies it with the image data, and then returns a status code based on success/failure. The SDK provides no documentation whatsoever (not even code comments) so I'm just guestimating here. Here's a code snippet on what I think works
BYTE* data = new BYTE[10000000]; // an array of an arbitrary large size, I'm not
// sure what the exact size needs to be so I
// made it large
CameraGetImageData(data);
// Do stuff here to process/output image data
I've run the code w/ breakpoints in Visual Studio and can confirm that the CameraGetImageData function does indeed modify the array. Now my question is, is there a standard way for cameras to output data? How should I start using this data and what does each byte represent? The camera captures in 8-bit color.
Take pictures of pure red, pure green and pure blue. See what comes out.
Also, I'd make the array 100 million, not 10 million if you've got the memory, at least initially. A 10 megapixel camera using 24 bits per pixel is going to use 30 million bytes, bigger than your array. If it does something crazy like store 16 bits per colour it could take up to 60 million or 80 million bytes.
You could fill this big array with data before passing it. For example fill it with '01234567' repeated. Then it's really obvious what bytes have been written and what bytes haven't, so you can work out the real size of what's returned.
I don't think there is a standard but you can try to identify which values are what by putting some solid color images in front of the camera. So all pixels would be approximately the same color. Having an idea of what color should be stored in each pixel you may understand how the color is represented in your array. I would go with black, white, reg, green, blue images.
But also consider finding a better SDK which has the documentation, because making just a big array is really bad design
You should check the documentation on your camera SDK, since there's no "standard" or "common" way for data output. It can be raw data, it can be RGB data, it can even be already compressed. If the camera vendor doesn't provide any information, you could try to find some libraries that handle most common formats, and try to pass the data you have to see what happens.
Without even knowing the type of the camera, this question is nearly impossible to answer.
If it is a scientific camera, chances are good that it adhers to the IEEE 1394 (aka IIDC or DCAM) standard. I have personally worked with such a camera made by Hamamatsu using this library to interface with the camera.
In my case the camera output was just raw data. The camera itself was monochrome and each pixel had a depth-resolution of 12 bit. Therefore, each pixel intensity was stored as 16-bit unsigned value in the result array. The size of the array was simply width * height * 2 bytes, where width and height are the image dimensions in pixels the factor 2 is for 16-bit per pixel. The width and height were known a-priori from the chosen camera mode.
If you have the dimensions of the result image, try to dump your byte array into a file and load the result either in Python or Matlab and just try to visualize the content. Another possibility is to load this raw file with an image editor such as ImageJ and hope to get anything out from it.
Good luck!
I hope this question's solution will helps you: https://stackoverflow.com/a/3340944/291372
Actually you've got an array of pixels (assume 1 byte per pixel if you camera captires in 8-bit). What you need - is just determine width and height. after that you can try to restore bitmap image from you byte array.