Using the libsndfile library to read a WAV file in C++

I am using libsndfile in C++ to read a WAV file. There are two points I don't understand:
How can I get the "Bits per sample" value for the WAV file in question? I read the documentation at the website http://www.mega-nerd.com/libsndfile/api.html, but I didn't find a member for "Bits per sample" in the SF_INFO struct.
Using the WAV file, how can I create the data needed to draw a waveform of the sound data read by sf_readf_float() from sndfile.h? Is there a method for this?

The format field of SF_INFO will give you the bits per sample: its subformat bits (info.format & SF_FORMAT_SUBMASK) take values such as SF_FORMAT_PCM_16 for 16 bits per sample.
sf_readf_float() will convert the samples into the -1.0 to 1.0 range no matter the bits per sample of the input sound. You only have to take care about the audio channels, because samples are interleaved. If the audio has 2 channels and you read 4 floats, you will have:
sample-1 of left channel
sample-1 of right channel
sample-2 of left channel
sample-2 of right channel
Then, to draw a point, you must map the [-1.0, 1.0] range to the viewport height. For example, if the viewport starts at Y=20 and is 300px high, the formula is:
PY = (int)(20.0 + (sample_value / 2.0 + 0.5) * 300.0);
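Putting both points together, here is a minimal sketch (the filename "input.wav" and the 1024-frame buffer are arbitrary choices, not from the question) that extracts the subformat, reads interleaved floats, and maps the left channel through the formula above:

#include <sndfile.h>
#include <cstdio>
#include <vector>

int main()
{
    SF_INFO info = {};  // must be zero-initialized before sf_open()
    SNDFILE *snd = sf_open("input.wav", SFM_READ, &info);
    if (!snd) return 1;

    // The low bits of the format field encode the subformat,
    // e.g. SF_FORMAT_PCM_16 means 16 bits per sample.
    int subformat = info.format & SF_FORMAT_SUBMASK;
    std::printf("subformat: 0x%04x, channels: %d\n", subformat, info.channels);

    // sf_readf_float() reads whole frames: one float per channel, interleaved.
    std::vector<float> buf(1024 * info.channels);
    sf_count_t frames;
    while ((frames = sf_readf_float(snd, buf.data(), 1024)) > 0) {
        for (sf_count_t i = 0; i < frames; ++i) {
            float left = buf[i * info.channels];  // channel 0 of frame i
            int py = (int)(20.0 + (left / 2.0 + 0.5) * 300.0);
            (void)py;  // plot (x, py) here
        }
    }
    sf_close(snd);
    return 0;
}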

Related

opencv can't open a yuv422 image while rawpixels.net can display the image

I am trying to open a YUV format image. I can open it with rawpixels.net and display it after setting the following:
width: 1920
height: 1080
predefined format: yuv420 (nv12)
pixel format: yuv
But when I open it with OpenCV using the following code, it fails.
#include <iostream>
#include <opencv2/core.hpp>
#include <opencv2/opencv.hpp>

int main() {
    std::cout << "OpenCV version: " << CV_VERSION << std::endl;
    cv::Mat image = cv::imread("camera_capture_256_2020_10_07_11_11_02.yuv");
    if (image.empty()) {
        std::cout << "image empty" << std::endl;
        return 0;
    }
    cv::imshow("opencv_logo", image);
    cv::waitKey(0);
    return 0;
}
The program prints "image empty".
I am puzzled why I can't open the file with OpenCV.
The sample image is found here.
The yuv image opened with rawpixels.net would look like this.
Thanks,
The very first thing to do when dealing with raw (RGB, BGR, YUV, NV12 and other) images is to know the dimensions in pixels of the image - you are really quite lost without those - though you can do certain tricks to find the row width by looking for correlation, since each row is normally very similar to the one above it.
The next thing is to check the file size is correct. So if it is 8-bit RGB and 1920x1080, your file must be 1920x1080x3 bytes in size - if not, there is a problem. Your image is 1920x1080 and NV12, which is 12 bits or 1.5 bytes per pixel, so I expect your file to be 1920x1080x1.5 = 3,110,400 bytes. It is not that, so there is immediately a problem: there is either a header, or multiple frames, or trailing data, or some other issue.
So, where is the image data in the file? At the start? At the end? One way to find out is to look at the file as though it were purely a greyscale image and see if there are large blocks of black, which are zero bytes or padding. As there is no known image size, I generally take the file size in bytes, go to the Wolfram Alpha website and type in "factors of XXX" where XXX is the file size, and then choose 2 factors near the square root of the file size so I get a square-ish image. So for yours, I chose 2720x3072 and treated your file as a single greyscale image of that size. Using ImageMagick in Terminal:
magick -depth 8 -size 2720x3072 gray:camera_preview_250_2020_10_07_11_11_02.yuv image.jpg
I can see, at a glance, that the data are at the start of the file and the end of the file is zero-padding, i.e. black. If the black had been at the start of the image, I would have taken the final H x W x 1.5 bytes.
Another alternative for this step is to take the file size in bytes, divide it by the image width to get a number of lines, and see how that looks. Your file is 8355840 bytes, which is 8355840/1920 or 4,352 lines. Let's try that:
magick -depth 8 -size 1920x4352 gray:camera_preview_250_2020_10_07_11_11_02.yuv image.jpg
That is very encouraging because we can see the Y (greyscale) image at the start of the file and some lower-resolution UV channels following, and the fact that there are not 2 separate channels following probably means they are interleaved, alternating U and V samples, rather than planar U samples followed by V samples.
OK, if your data is YUV or NV12, the best tool for that is ffmpeg. We already know that the data are at the start of the file, and we know the dimensions and the format. We also know that there is padding after the image, so we need to take just the first frame, like this:
ffmpeg -s 1920x1080 -pix_fmt nv12 -i cam*yuv -frames:v 1 image.png
Now that we are confident about the dimensions and format, we need OpenCV to read the data. The normal cv::imread() cannot read it because it is just raw data; unlike JPEG, PNG or TIFF, there is no image height and width in a header - it is just pure sensor data.
So, you need to use a regular C/C++ read to get the first 1920x1080x1.5 bytes, then call cv::cvtColor() on the received buffer to convert it to a regular BGR-format Mat.
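A minimal sketch of those two steps, assuming the 1920x1080 NV12 layout established above (the filename is the one from the question, and cv::COLOR_YUV2BGR_NV12 performs the NV12-to-BGR conversion):

#include <cstdint>
#include <fstream>
#include <vector>
#include <opencv2/opencv.hpp>

int main() {
    const int width = 1920, height = 1080;
    const size_t frameSize = width * height * 3 / 2;  // NV12: 1.5 bytes per pixel

    // Read only the first frame's worth of bytes; the rest of the file is padding.
    std::ifstream file("camera_capture_256_2020_10_07_11_11_02.yuv", std::ios::binary);
    std::vector<uint8_t> buffer(frameSize);
    if (!file.read(reinterpret_cast<char *>(buffer.data()), frameSize)) return 1;

    // Wrap the raw bytes in a single-channel Mat: the Y plane on top and the
    // interleaved UV plane below it, hence height * 3/2 rows.
    cv::Mat nv12(height * 3 / 2, width, CV_8UC1, buffer.data());
    cv::Mat bgr;
    cv::cvtColor(nv12, bgr, cv::COLOR_YUV2BGR_NV12);

    cv::imshow("nv12", bgr);
    cv::waitKey(0);
    return 0;
}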

ffmpeg Get Audio Samples in a specific AVSampleFormat from AVFrame

I am looking at the example from ffmpeg docs:
Here
static int output_audio_frame(AVFrame *frame)
{
    size_t unpadded_linesize = frame->nb_samples * av_get_bytes_per_sample(frame->format);
    printf("audio_frame n:%d nb_samples:%d pts:%s\n",
           audio_frame_count++, frame->nb_samples,
           av_ts2timestr(frame->pts, &audio_dec_ctx->time_base));

    /* Write the raw audio data samples of the first plane. This works
     * fine for packed formats (e.g. AV_SAMPLE_FMT_S16). However,
     * most audio decoders output planar audio, which uses a separate
     * plane of audio samples for each channel (e.g. AV_SAMPLE_FMT_S16P).
     * In other words, this code will write only the first audio channel
     * in these cases.
     * You should use libswresample or libavfilter to convert the frame
     * to packed data. */
    fwrite(frame->extended_data[0], 1, unpadded_linesize, audio_dst_file);
    return 0;
}
The issue is that the decoder's output format can't be set, so it will give me audio samples in any of the following types:
enum AVSampleFormat {
    AV_SAMPLE_FMT_NONE = -1,
    AV_SAMPLE_FMT_U8,  AV_SAMPLE_FMT_S16,  AV_SAMPLE_FMT_S32,
    AV_SAMPLE_FMT_FLT, AV_SAMPLE_FMT_DBL,
    AV_SAMPLE_FMT_U8P, AV_SAMPLE_FMT_S16P, AV_SAMPLE_FMT_S32P,
    AV_SAMPLE_FMT_FLTP, AV_SAMPLE_FMT_DBLP,
    AV_SAMPLE_FMT_S64, AV_SAMPLE_FMT_S64P,
    AV_SAMPLE_FMT_NB
};
I am working with a sound engine, and the engine requires me to send float [-1 to 1] PCM data, so I would like to obtain the frame's audio data as float for the two channels (stereo music). How may I do that? Do I need to use libswresample? If so, can anyone send me an example for my case?
Encoding audio Example
Resampling audio Example
Transcoding Example
If you don't get the desired format from the decoder, you have to convert it with libswresample to AV_SAMPLE_FMT_FLT.
According to the enum AVSampleFormat documentation:
The floating-point formats are based on full volume being in the range [-1.0, 1.0]. Any values outside this range are beyond full volume level.
All the examples are well documented and not that complicated. The function names alone are quite self-explanatory, so it shouldn't be hard to understand.
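For your case, a minimal sketch of that conversion with libswresample - it assumes stereo output at the frame's sample rate and uses the pre-FFmpeg-5.1 channel-layout API (newer releases replace swr_alloc_set_opts() with swr_alloc_set_opts2() and AVChannelLayout); the function name frame_to_float is made up, and error handling is trimmed:

extern "C" {
#include <libswresample/swresample.h>
#include <libavutil/channel_layout.h>
#include <libavutil/frame.h>
}

// Convert one decoded frame (any sample format, planar or packed) into
// packed float stereo. Returns the number of samples per channel written
// into `out`, or -1 on error. In real code, create the SwrContext once
// and reuse it for every frame instead of rebuilding it on each call.
static int frame_to_float(AVFrame *frame, float *out, int max_samples)
{
    SwrContext *swr = swr_alloc_set_opts(
        nullptr,
        AV_CH_LAYOUT_STEREO, AV_SAMPLE_FMT_FLT, frame->sample_rate,  // output
        frame->channel_layout, static_cast<AVSampleFormat>(frame->format),
        frame->sample_rate,                                          // input
        0, nullptr);
    if (!swr || swr_init(swr) < 0)
        return -1;

    uint8_t *out_planes[1] = { reinterpret_cast<uint8_t *>(out) };
    int n = swr_convert(swr, out_planes, max_samples,
                        (const uint8_t **)frame->extended_data,
                        frame->nb_samples);
    swr_free(&swr);
    return n;
}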

Distorted Image in Secondary Capture DICOM file

I want to create a secondary capture DICOM file as per the requirements.
I created one, but the image (pixel data in the tag 7FE0,0010) looks distorted. I am reading a JPEG image using Gdiplus::Bitmap, and using the API ::LockBits and 'btmpData.Scan0' to get the pixel data. That data is inserted into the pixel data tag 7FE0,0010. But when viewing it in a DICOM viewer, it comes out distorted. The DICOM tags Rows, Columns and PlanarConfiguration are updated properly. BitsAllocated, BitsStored and HighBit are given the values 8, 8 and 7 respectively.
While googling, I came to learn that the bytes might be in BGR order instead of RGB format, so I tried swapping the 'B' and 'R' bytes.
But the issue still exists. Could anybody help me?
Apparently you forgot to take into account stride support in GDI+. An image being much more explicit than 1000 words, here is what I mean (the actual full article is here).
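In code, the fix being pointed at looks roughly like this - a sketch assuming the bitmap is locked as 24bpp and top-down (positive Stride); the helper name copyWithoutStride is made up:

#include <cstdint>
#include <cstring>
#include <vector>
#include <windows.h>
#include <gdiplus.h>

// Copy the locked pixel rows tightly, skipping the padding GDI+ adds so
// that each scan line is 4-byte aligned. That padding, if copied verbatim
// into tag 7FE0,0010, is exactly what makes the image look distorted.
std::vector<uint8_t> copyWithoutStride(Gdiplus::Bitmap &bitmap)
{
    Gdiplus::BitmapData btmpData;
    Gdiplus::Rect rect(0, 0, bitmap.GetWidth(), bitmap.GetHeight());
    bitmap.LockBits(&rect, Gdiplus::ImageLockModeRead,
                    PixelFormat24bppRGB, &btmpData);

    const size_t rowBytes = btmpData.Width * 3;  // tight row length, no padding
    std::vector<uint8_t> pixels(rowBytes * btmpData.Height);
    for (UINT y = 0; y < btmpData.Height; ++y) {
        const uint8_t *src = static_cast<const uint8_t *>(btmpData.Scan0)
                             + static_cast<size_t>(y) * btmpData.Stride;
        std::memcpy(&pixels[y * rowBytes], src, rowBytes);
    }
    bitmap.UnlockBits(&btmpData);
    // GDI+ 24bpp pixels are stored as BGR, so swap B and R afterwards if
    // your Photometric Interpretation expects RGB.
    return pixels;
}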

FFMPEG API: decode MPEG to YUV frames and change these frames

I need to save all frames from an MPEG4 or H.264 video as YUV frames using a C++ library, for example in .yuv, .y4m or .y format. Then I need to read these frames as binary files and change some samples (Y values). How can I do this without converting to RGB?
Also, how are the values in AVFrame->data stored? Where are the Y, U and V values kept?
Thanks, and sorry for my English =)
If you use libav* to decode, you will receive the frames in their native colorspace (usually YUV 4:2:0), but it is whatever was chosen at encode time. Assuming you are in YUV420 (or convert to YUV420), the planes are: Y in AVFrame->data[0], U in AVFrame->data[1], V in AVFrame->data[2].
For Y there is 1 byte per pixel, so the sample at column x, row y is AVFrame->data[0][y * AVFrame->linesize[0] + x].
For U and V there is 1 byte per 2x2 block of pixels (each plane is quarter the resolution of the Y plane), so:
AVFrame->data[1][(y/2) * AVFrame->linesize[1] + x/2], AVFrame->data[2][(y/2) * AVFrame->linesize[2] + x/2]
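As an illustration of changing Y samples in place - a sketch assuming an 8-bit planar YUV420 frame; the function name brightenLuma and the delta parameter are made up for the example:

extern "C" {
#include <libavutil/frame.h>
}

// Brighten the luma plane by adding `delta` to every Y sample, clamping
// to the valid [0, 255] range. U and V are left untouched, so no RGB
// conversion is involved at any point.
static void brightenLuma(AVFrame *frame, int delta)
{
    for (int y = 0; y < frame->height; ++y) {
        uint8_t *row = frame->data[0] + y * frame->linesize[0];
        for (int x = 0; x < frame->width; ++x) {
            int v = row[x] + delta;
            row[x] = static_cast<uint8_t>(v < 0 ? 0 : (v > 255 ? 255 : v));
        }
    }
}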

audio waveform to Integer sequence

I need to create an integer sequence from an audio file. I was checking waveform libraries, since they draw a linear graph, but I am searching for the key piece of information: what is the source of the integers used to draw the graph? Is it amplitude? Frequency? Or something else? There are libraries available, but I need to know what unit of information I have to extract to get data I can feed to a graph. However, drawing a graph is not my objective; I just want the raw integer array.
Of course, it's the amplitudes that you need to get a wave oscillogram, and that is how PCM data are stored in WAV files, for example (the data come directly after the file header). Note that there are 8-bit and 16-bit formats; the latter may be big-endian or little-endian depending on the byte order (just to keep you aware of it).
Audio is simply a curve - when you plot it with time across the X axis, the Y axis is amplitude, similar to plotting the sin math function. Each point on the curve is a number which gets stored in the audio file. In WAV format this number is typically a 16-bit signed integer, so, ignoring the 44-byte header, the rest of the file is just a sequence of these integers. When the curve varies up and down quickly over time, the frequency is higher than when it varies more slowly. If you download the audio workbench application Audacity, you can view this curve for any audio file (WAV, mp3, ...).
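A minimal sketch of that idea - it assumes a canonical 44-byte header and 16-bit little-endian PCM, which holds for simple WAV files but not all of them (a proper reader, or libsndfile as in the first answer above, should parse the chunk structure instead); the function name readPcm16 is made up:

#include <cstdint>
#include <fstream>
#include <vector>

// Skip the canonical 44-byte WAV header and read the remaining bytes
// as 16-bit signed PCM samples - the raw integer sequence asked for.
std::vector<int16_t> readPcm16(const char *path)
{
    std::ifstream file(path, std::ios::binary);
    file.seekg(44);  // past the canonical RIFF/fmt/data header
    std::vector<int16_t> samples;
    int16_t s;
    while (file.read(reinterpret_cast<char *>(&s), sizeof s))
        samples.push_back(s);  // assumes a little-endian host
    return samples;
}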