x265 Encoder: order of values in 'planes' array - c++

During the encoding process with the x265 encoder (https://x265.readthedocs.org/en/default/api.html) I want to write image pixel values (specifically the values of the Y channel) into a .txt file after each new image is encoded (why is not important). For that, I'm using the 'planes' member of x265_picture:
x265_picture* pic_out; // variable where encoded image is to be stored
... // encoding process
uint8_t *plane = (uint8_t*)pic_out->planes[0];
uint32_t pixelCount = x265_picturePlaneSize(pic_out->colorSpace, m_param->sourceWidth, m_param->sourceHeight, 0);
ofstream out_file("out_file.txt");
for (uint32_t j = 0; j < pixelCount; j++) // loop for all pixels
{
int pix_val = plane[j];
out_file << pix_val;
}
out_file.close();
But when I reconstruct the output data into an image, I get a striped image instead of the expected one, and a second example shows the same effect. [images omitted]
(color is not important, the "stripes" are the concern)
In the output file there seem to be intervals of data in (apparently) correct order (let's say 89,90,102,98,...), always followed by a long sequence of equal numbers (e.g. 235,235,235,235... or 65,65,65,65...) that "create" the stripes. Could someone please tell me what I'm missing?

thanks guys, just solved this...the key is using 'src += srcStride':
ofstream out_file("out_file.txt");
int srcStride = pic_out->stride[0] / sizeof(pixel);
uint8_t* src = (uint8_t*) pic_out->planes[0];
for (int y = 0; y < m_param->sourceHeight; y++, src += srcStride)
{
for (int x = 0; x < m_param->sourceWidth; x++)
out_file << (int)(src[x]) << ",";
}
out_file.close();
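The reason the stride matters can be shown without x265 at all. The following is a minimal sketch, assuming 8-bit samples (one byte per pixel) and a plane whose rows are padded; copyVisiblePixels is a hypothetical helper name, not part of the x265 API. The padding bytes at the end of each row are what showed up as runs of equal values in the output file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Copy only the visible width x height pixels out of a padded plane.
// strideBytes is the distance in bytes between the starts of two
// consecutive rows (pic_out->stride[0] in the question) and may be
// larger than width because of padding.
// Assumption: 8-bit samples, so one pixel occupies exactly one byte.
std::vector<uint8_t> copyVisiblePixels(const uint8_t* plane,
                                       int width, int height,
                                       std::ptrdiff_t strideBytes)
{
    assert(plane != nullptr && width > 0 && height > 0 && strideBytes >= width);
    std::vector<uint8_t> out;
    out.reserve(static_cast<std::size_t>(width) * height);
    for (int y = 0; y < height; ++y)
    {
        const uint8_t* row = plane + y * strideBytes; // start of row y
        out.insert(out.end(), row, row + width);      // padding is skipped
    }
    return out;
}
```

Calling it with a 3x2 image stored with a stride of 5 returns the six visible pixels and drops the two padding bytes per row.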

De-quantising audio with ffmpeg

I am using FFmpeg library to decode and (potentially) modify some audio.
I managed to use the following functions to iterate through all frames of the audio file:
avformat_open_input // Obtains formatContext
avformat_find_stream_info
av_find_best_stream // The argument AVMEDIA_TYPE_AUDIO is fed in to find the audio stream
avcodec_open2 // Obtains codecContext
av_init_packet
// The following is used to loop through the frames
av_read_frame
avcodec_decode_audio4
In the end, I have these three values available on each iteration
int dataSize; // return value of avcodec_decode_audio4
AVFrame* frame;
AVCodecContext* codecContext; // Codec context of the best stream
I supposed that a loop like this can be used to iterate over all samples:
for (int i = 0; i < frame->nb_samples; ++i)
{
// Bytes/Sample is known to be 4
// Extracts audio from Channel 1. There are in total 2 channels.
int* sample = (int*)frame->data[0] + dataSize * i;
// Now *sample is accessible
}
However, when I plotted the data using gnuplot, I did not get the expected waveform, and some of the values reached the limit of 32-bit integers. (The audio stream is not silent in the first few seconds.)
I suppose that some form of quantisation is going on to prevent the data from being interpreted mathematically. What should I do to de-quantise this?
for (int i = 0; i < frame->nb_samples; ++i)
{
// Bytes/Sample is known to be 4
// Extracts audio from Channel 1. There are in total 2 channels.
int* sample = (int*)frame->data[0] + dataSize * i;
// Now *sample is accessible
}
Well... No. So, first of all, we'll need to know the data type. Check frame->format. It's an enum AVSampleFormat, most likely flt, fltp, s16 or s16p.
So, how do you interpret frame->data[] given the format? Well, first, is it planar or not? If it's planar, it means each channel is in frame->data[n], where n is the channel number. frame->channels is the number of channels. If it's not planar, it means all data is interleaved (per sample) in frame->data[0].
Second, what is the storage type? If it's s16/s16p, it's int16_t *. If it's flt/fltp, it's float *. So the correct interpretation for fltp would be:
for (int c = 0; c < frame->channels; c++) {
float *samples = (float *)frame->data[c]; // data[] is uint8_t*, so a cast is needed
for (int i = 0; i < frame->nb_samples; i++) {
float sample = samples[i];
// now this sample is accessible, it's in the range [-1.0, 1.0]
}
}
Whereas for s16, it would be:
int16_t *samples = (int16_t *)frame->data[0]; // data[] is uint8_t*, so a cast is needed
for (int c = 0; c < frame->channels; c++) {
for (int i = 0; i < frame->nb_samples; i++) {
int sample = samples[i * frame->channels + c];
// now this sample is accessible, it's in the range [-32768,32767]
}
}
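The two layouts can also be illustrated on plain buffers, independent of FFmpeg. This is only a sketch: channelFromInterleaved and channelFromPlanar are hypothetical helper names, and the buffers stand in for frame->data[0] (interleaved s16) and frame->data[c] (planar fltp) after the appropriate cast.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Interleaved (packed) layout, as with s16: L0 R0 L1 R1 ...
// Sample i of channel c sits at index i * channels + c.
std::vector<int16_t> channelFromInterleaved(const int16_t* samples,
                                            int nbSamples, int channels, int c)
{
    assert(samples && nbSamples >= 0 && channels > 0 && c >= 0 && c < channels);
    std::vector<int16_t> out(static_cast<std::size_t>(nbSamples));
    for (int i = 0; i < nbSamples; ++i)
        out[i] = samples[static_cast<std::size_t>(i) * channels + c];
    return out;
}

// Planar layout, as with fltp: every channel has its own contiguous buffer.
std::vector<float> channelFromPlanar(const float* const* planes,
                                     int nbSamples, int c)
{
    assert(planes && planes[c] && nbSamples >= 0 && c >= 0);
    return std::vector<float>(planes[c], planes[c] + nbSamples);
}
```

With a stereo interleaved buffer {10, -10, 20, -20, 30, -30}, channel 1 comes out as {-10, -20, -30}.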

Weird but close fft and ifft of image in c++

I wrote a program that loads, saves, and performs the fft and ifft on black and white png images. After much debugging headache, I finally got some coherent output only to find that it distorted the original image.
[images omitted: input, fft output, ifft output]
As far as I have tested, the pixel data in each array is stored and converted correctly. Pixels are stored in two arrays, 'data' which contains the b/w value of each pixel and 'complex_data' which is twice as long as 'data' and stores real b/w value and imaginary parts of each pixel in alternating indices. My fft algorithm operates on an array structured like 'complex_data'. After code to read commands from the user, here's the code in question:
if (cmd == "fft")
{
if (height > width) size = height;
else size = width;
N = (int)pow(2.0, ceil(log((double)size)/log(2.0)));
temp_data = (double*) malloc(sizeof(double) * width * 2); //array to hold each row of the image for processing in FFT()
for (i = 0; i < (int) height; i++)
{
for (j = 0; j < (int) width; j++)
{
temp_data[j*2] = complex_data[(i*width*2)+(j*2)];
temp_data[j*2+1] = complex_data[(i*width*2)+(j*2)+1];
}
FFT(temp_data, N, 1);
for (j = 0; j < (int) width; j++)
{
complex_data[(i*width*2)+(j*2)] = temp_data[j*2];
complex_data[(i*width*2)+(j*2)+1] = temp_data[j*2+1];
}
}
transpose(complex_data, width, height); //tested
free(temp_data);
temp_data = (double*) malloc(sizeof(double) * height * 2);
for (i = 0; i < (int) width; i++)
{
for (j = 0; j < (int) height; j++)
{
temp_data[j*2] = complex_data[(i*height*2)+(j*2)];
temp_data[j*2+1] = complex_data[(i*height*2)+(j*2)+1];
}
FFT(temp_data, N, 1);
for (j = 0; j < (int) height; j++)
{
complex_data[(i*height*2)+(j*2)] = temp_data[j*2];
complex_data[(i*height*2)+(j*2)+1] = temp_data[j*2+1];
}
}
transpose(complex_data, height, width);
free(temp_data);
free(data);
data = complex_to_real(complex_data, image.size()/4); //tested
image = bw_data_to_vector(data, image.size()/4); //tested
cout << "*** fft success ***" << endl << endl;
}
void FFT(double* data, unsigned long nn, int f_or_b){ // f_or_b is 1 for fft, -1 for ifft
unsigned long n, mmax, m, j, istep, i;
double wtemp, w_real, wp_real, wp_imaginary, w_imaginary, theta;
double temp_real, temp_imaginary;
// reverse-binary reindexing to separate even and odd indices
// and to allow us to compute the FFT in place
n = nn<<1;
j = 1;
for (i = 1; i < n; i += 2) {
if (j > i) {
swap(data[j-1], data[i-1]);
swap(data[j], data[i]);
}
m = nn;
while (m >= 2 && j > m) {
j -= m;
m >>= 1;
}
j += m;
};
// here begins the Danielson-Lanczos section
mmax = 2;
while (n > mmax) {
istep = mmax<<1;
theta = f_or_b * (2 * M_PI/mmax);
wtemp = sin(0.5 * theta);
wp_real = -2.0 * wtemp * wtemp;
wp_imaginary = sin(theta);
w_real = 1.0;
w_imaginary = 0.0;
for (m = 1; m < mmax; m += 2) {
for (i = m; i <= n; i += istep) {
j = i + mmax;
temp_real = w_real * data[j-1] - w_imaginary * data[j];
temp_imaginary = w_real * data[j] + w_imaginary * data[j-1];
data[j-1] = data[i-1] - temp_real;
data[j] = data[i] - temp_imaginary;
data[i-1] += temp_real;
data[i] += temp_imaginary;
}
wtemp = w_real;
w_real += w_real * wp_real - w_imaginary * wp_imaginary;
w_imaginary += w_imaginary * wp_real + wtemp * wp_imaginary;
}
mmax=istep;
}}
My ifft is the same only with the f_or_b set to -1 instead of 1. My program calls FFT() on each row, transposes the image, calls FFT() on each row again, then transposes back. Is there maybe an error with my indexing?
Not an actual answer, since this is a debug-only question, so here are some hints instead:
your results are really bad
it should look like this:
first line is the actual DFFT result
Re,Im,Power is amplified by a constant otherwise you would see a black image
the last image is IDFFT of the original not amplified Re,IM result
the second line is the same, but the DFFT result is wrapped by half the image size in both x and y to match the common results in most DIP/CV texts
As you can see if you IDFFT back the wrapped results the result is not correct (checker board mask)
You have just single image as DFFT result
is it power spectrum?
or did you forget to include the imaginary part? Only for viewing, or perhaps also somewhere in the computation?
is your 1D DFFT working?
for real data the result should be symmetric
check the links from my comment and compare the results for some sample 1D array
debug/repair your 1D FFT first and only then move to the next level
do not forget to test Real and complex data ...
your IDFFT looks BW (no gray) saturated
so did you amplify the DFFT results to see the image and used that for IDFFT instead of the original DFFT result?
also check if you do not round to integers somewhere along the computation
beware of (I)DFFT overflows/underflows
If your image pixel intensities are large and the image resolution is high as well, your computation could lose precision. I have never seen this in images, but if your image is HDR then it is possible. This is a common problem with convolution computed by DFFT for big polynomials.
Thank you everyone for your opinions. All that stuff about memory corruption, while it makes a point, is not the root of the problem. The sizes of data I'm mallocing are not overly large, and I am freeing them in the right places. I had a lot of practice with this while learning c. The problem was not the fft algorithm either, nor even my 2D implementation of it.
All I missed was the scaling by 1/(M*N) at the very end of my ifft code. Because the image is 512x512, I needed to scale my ifft output by 1/(512*512). Also, my fft looks like white noise because the pixel data was not rescaled to fit between 0 and 255.
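The missing factor is easy to reproduce in one dimension. The sketch below uses a naive O(n^2) DFT rather than the FFT() routine from the question, on the assumption that both leave the transform unnormalized in each direction; dft and scaleInverse are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Naive unnormalized DFT. sign = +1 or -1 selects the sign of the
// exponent, playing the same role as f_or_b in the question.
std::vector<cd> dft(const std::vector<cd>& in, int sign)
{
    const std::size_t n = in.size();
    const double pi = std::acos(-1.0);
    std::vector<cd> out(n);
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t t = 0; t < n; ++t)
            out[k] += in[t] * std::polar(1.0, sign * 2.0 * pi
                                              * static_cast<double>((k * t) % n)
                                              / static_cast<double>(n));
    return out;
}

// Divide every value by the number of transformed points.
// For a 2D transform of an M x N image, pass total = M * N.
void scaleInverse(std::vector<cd>& data, std::size_t total)
{
    assert(total > 0);
    for (cd& v : data)
        v /= static_cast<double>(total);
}
```

Running dft forward and then backward on {1, 2, 3, 4} gives values four times too large; after scaleInverse with total = 4 the original input is recovered up to rounding error.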
Suggest you look at the article http://www.yolinux.com/TUTORIALS/C++MemoryCorruptionAndMemoryLeaks.html
Christophe has a good point but he is wrong about it not being related to the problem because it seems that in modern times using malloc instead of new()/free() does not initialise memory or select best data type which would result in all problems listed below:-
Possible causes are:
Sign of a number changing somewhere, I have seen similar issues when a platform invoke has been used on a dll and a value is passed by value instead of reference. It is caused by memory not necessarily being empty so when your image data enters it will have boolean maths performed on its values. I would suggest that you make sure memory is empty before you put your image data there.
Memory rotating right (ROR in assembly language) or left (ROL). This will occur if data types are being used which do not necessarily match, e.g. a signed value entering an unsigned data type or if the number of bits is different in one variable to another.
Data being lost due to an unsigned value entering a signed variable. Outcomes are 1 bit being lost because it will be used to determine negative or positive, or at extremes if twos complement takes place the number will become inverted in meaning, look for twos complement on wikipedia.
Also see how memory should be cleared/assigned before use. http://www.cprogramming.com/tutorial/memory_debugging_parallel_inspector.html

Convert cv::Mat to openni::VideoFrameRef

I have a kinect streaming data into a cv::Mat. I am trying to get some example code running that uses OpenNI.
Can I convert my Mat into an OpenNI format image somehow?
I just need the depth image, and after fighting with OpenNI for a long time, have given up on installing it.
I am using OpenCV 3, Visual Studio 2013, Kinect v2 for Windows.
The relevant code is:
void CDifodoCamera::loadFrame()
{
//Read the newest frame
openni::VideoFrameRef framed; //I assume I need to replace this with my Mat...
depth_ch.readFrame(&framed);
const int height = framed.getHeight();
const int width = framed.getWidth();
//Store the depth values
const openni::DepthPixel* pDepthRow = (const openni::DepthPixel*)framed.getData();
int rowSize = framed.getStrideInBytes() / sizeof(openni::DepthPixel);
for (int yc = height-1; yc >= 0; --yc)
{
const openni::DepthPixel* pDepth = pDepthRow;
for (int xc = width-1; xc >= 0; --xc, ++pDepth)
{
if (*pDepth < 4500.f)
depth_wf(yc,xc) = 0.001f*(*pDepth);
else
depth_wf(yc,xc) = 0.f;
}
pDepthRow += rowSize;
}
}
First you need to understand how your data arrives. If it is already in a cv::Mat, you should be receiving two images: one with the RGB information, usually a 3-channel uchar cv::Mat, and another with the depth information, usually stored as 16-bit values in millimeters (you cannot save a float Mat as an image, but you can save it as a yml/xml file using OpenCV).
Assuming you want to read and process the image that contains the depth information, you can change your code to:
void CDifodoCamera::loadFrame()
{
//Read the newest frame
//the depth image should be png since it is the one which supports 16 bits and it must have the ANYDEPTH flag
cv::Mat depth_im = cv::imread("img_name.png",CV_LOAD_IMAGE_ANYDEPTH);
const int height = depth_im.rows;
const int width = depth_im.cols;
for (int y = 0; y < height; y++)
{
for (int x = 0; x < width; x++)
{
if (depth_im.at<unsigned short>(y,x) < 4500)
depth_wf(y,x) = 0.001f * (float)depth_im.at<unsigned short>(y,x);
else
depth_wf(y,x) = 0.f;
}
}
}
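The conversion itself does not depend on OpenCV or OpenNI. As a rough sketch, assuming the depth image is available as a raw 16-bit buffer in millimeters with a known row stride, the thresholding and scaling can be written as below; depthToMeters and its parameters are hypothetical names, and the 4500 mm cutoff is taken from the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert 16-bit depth values in millimeters to float meters.
// Readings at or beyond maxMm are treated as invalid and set to 0.
// strideElems is the number of uint16_t elements per source row,
// which may exceed cols if the rows are padded.
std::vector<float> depthToMeters(const uint16_t* depth, int rows, int cols,
                                 std::size_t strideElems, uint16_t maxMm)
{
    assert(depth && rows > 0 && cols > 0
           && strideElems >= static_cast<std::size_t>(cols));
    std::vector<float> out;
    out.reserve(static_cast<std::size_t>(rows) * cols);
    for (int y = 0; y < rows; ++y)
    {
        const uint16_t* row = depth + y * strideElems; // start of row y
        for (int x = 0; x < cols; ++x)
        {
            const uint16_t mm = row[x];
            out.push_back(mm < maxMm ? 0.001f * mm : 0.0f);
        }
    }
    return out;
}
```

The result is a tightly packed rows x cols array in row-major order, which could then be copied into depth_wf element by element.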
I hope this helps you. If you have any question just ask :)

How to read data from buffer selectively

I have a 720x576 picture saved row by row in an unsigned char luma[414720] and I need to display a centered picture with size 640x480.
My question is:
What is the most efficient way to selectively access the data saved in one buffer using just one for loop?
Thanks for your answers.
Petr Duga
Try this:
newLuma is the new pic to be displayed.
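int i = 0;
unsigned char newLuma[640*480];
int rowStart = (576 - 480)/2; // 48 rows skipped at the top
int colStart = (720 - 640)/2; // 40 columns skipped on the left
for (i = 0; i < 480; i++)
{
// copy one 640-byte row out of the 720-byte-wide source row
memcpy(&newLuma[i*640], &luma[720*(rowStart + i) + colStart], 640);
}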
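The same idea can be wrapped in a small function with the sizes as parameters. This is a sketch under the assumption that the source is a tightly packed single-channel buffer (no row padding); cropCentered is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Copy the centered dstW x dstH window out of a srcW x srcH
// single-channel image stored row by row without padding.
std::vector<unsigned char> cropCentered(const unsigned char* src,
                                        int srcW, int srcH,
                                        int dstW, int dstH)
{
    assert(src && dstW > 0 && dstH > 0 && dstW <= srcW && dstH <= srcH);
    const int rowStart = (srcH - dstH) / 2; // rows skipped at the top
    const int colStart = (srcW - dstW) / 2; // columns skipped on the left
    std::vector<unsigned char> dst(static_cast<std::size_t>(dstW) * dstH);
    for (int i = 0; i < dstH; ++i)
    {
        // One memcpy per output row: a single loop, as asked in the question.
        std::memcpy(&dst[static_cast<std::size_t>(i) * dstW],
                    src + static_cast<std::size_t>(rowStart + i) * srcW + colStart,
                    static_cast<std::size_t>(dstW));
    }
    return dst;
}
```

For the picture in the question the call would be cropCentered(luma, 720, 576, 640, 480). On a 4x4 test image holding the values 0 to 15, a 2x2 crop returns {5, 6, 9, 10}.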

How to access image Data from a RGB image (3channel image) in opencv

I am trying to read the imageData of an image like this, where w = width of the image and h = height of the image:
for (int i = x; i < x+h; i++) //height of frame pixels
{
for (int j = y; j < y+w; j++)//width of frame pixels
{
int pos = i * w * Channels + j; //channels is 3 as rgb
// if any data exists
if (data->imageData[pos]>0) //Taking data (here is the problem how to take)
{
xPos += j;
yPos += i;
nPix++;
}
}
}
jeff7 gives you a link to a very old version of OpenCV. OpenCV 2.0 has a new C++ wrapper that is much better than the C++ wrapper mentioned in the link. I recommend that you read the C++ reference of OpenCV for information on how to access individual pixels.
Another thing to note is: you should have the outer loop being the loop in y-direction (vertical) and the inner loop be the loop in x-direction. OpenCV is in C/C++ and it stores the values in row major.
See good explanation here on multiple methods for accessing pixels in an IplImage in OpenCV.
From the code you've posted, your problem lies in your position variable. You'd want something like int pos = i*w*Channels + j*Channels; then you can access the RGB pixels at
unsigned char r = data->imageData[pos];
unsigned char g = data->imageData[pos+1];
unsigned char b = data->imageData[pos+2];
(assuming RGB, but on some platforms I think it can be stored BGR).
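One caveat with i*w*Channels is that it assumes rows are tightly packed. IplImage rows can be padded, in which case the row pitch is the widthStep field rather than w*Channels. A small sketch of the offset computation on a plain buffer follows; pixelOffset is a hypothetical helper, not an OpenCV function.

```cpp
#include <cassert>
#include <cstddef>

// Byte offset of the first channel of pixel (row, col) in an
// interleaved 8-bit image whose rows are widthStep bytes apart.
// The remaining channels follow at offset + 1, offset + 2, ...
std::size_t pixelOffset(int row, int col, int widthStep, int channels)
{
    assert(row >= 0 && col >= 0 && channels > 0 && widthStep >= 0);
    return static_cast<std::size_t>(row) * widthStep
         + static_cast<std::size_t>(col) * channels;
}
```

For a 2x2 three-channel image stored with widthStep = 8 (six data bytes plus two padding bytes per row), pixel (1, 1) starts at byte 11, whereas the tightly packed formula would give 9.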
uchar* colorImgPtr;
for(int i=0; i<colorImg->width; i++){
for(int j=0; j<colorImg->height; j++){
colorImgPtr = (uchar *)(colorImg->imageData) + (j*colorImg->widthStep + i*colorImg->nChannels);
for(int channel = 0; channel < colorImg->nChannels; channel++){
//colorImgPtr[channel] here you have each value for each pixel for each channel
}
}
}
There are quite a few methods to do this (the link provided by jeff7 is very useful).
My preferred method to access image data is the cvPtr2D method. You'll want something like:
for(int x = 0; x < width; ++x)
{
for(int y = 0; y < height; ++y)
{
uchar* ptr = cvPtr2D(img, y, x, NULL);
// blue channel can now be accessed with ptr[0]
// green channel can now be accessed with ptr[1]
// red channel can now be accessed with ptr[2]
}
}
(img is an IplImage* in the above code)
Not sure if this is the most efficient way of doing this etc. but I find it the easiest and simplest way of doing it.
You can find documentation for this method here.