What is the format for mono channel 16-bit PCM?

How are the LSB and MSB arranged in mono channel 16-bit PCM, since it will no longer be [LEFT LSB][LEFT MSB][RIGHT LSB][RIGHT MSB]?
EDIT
I don't understand the reason for the downvote, but to better explain the question:
In stereo (left and right channel) 16-bit PCM, 4 bytes represent one sample frame, stored as [LEFT CHANNEL LSB][LEFT CHANNEL MSB][RIGHT CHANNEL LSB][RIGHT CHANNEL MSB]. But in mono 16-bit PCM there is no left or right channel, so how are the LSB and MSB stored?

Solved my confusion after hours of thinking! The confusion came from assuming that since the PCM is 16-bit, a sample will always be 4 bytes whether it is mono or stereo, but I was wrong. A stereo frame is 4 bytes, while a mono frame is 2 bytes, so each sample is stored simply as [LSB][MSB].

Related

GStreamer Exposure Compensation (Exposure Value)

I have an imaging source which has a bit depth of 12 feeding into GStreamer. The eventual output will have a bit depth of 8. The scene being imaged is very low light. I can take images captured from this source (16-bit TIFFs), and in GIMP I can adjust the Exposure (Colors -> Exposure) to make details visible which were not previously.
Currently, when GStreamer is converting from 12-bit to 8-bit, it seems like it is compressing the entire 12-bit range into the smaller 8-bit range. What I'd like to do instead is maybe chop off the upper 4 bits, or ideally apply some sort of level curve to the image (which is what I think GIMP is doing).
I've looked around, but can't seem to find any way to do this. I've tried the videobalance plugin, but it doesn't seem to do what I want.
Any ideas?

How to read raw audio data in c++?

I'm trying to do a Fourier transform on an audio file. So far I've managed to read the file's header with the help of this answer. This is the output.
The audio format is 1, which means PCM, so I should easily be able to work with the data. However, this is what I can't figure out:
Is the data binary, and should I convert it to float or something else?
Yes, it's binary. Specifically, it's signed 16-bit integers.
You may want to convert it to float or double depending on your FFT needs.
I suggest you use a mono input audio file; the sample you showed has two channels (stereo), which complicates the data slightly. For a mono PCM file the structure is: two-byte sample A, immediately followed by two-byte sample B, and so on.
In PCM, each such sample directly corresponds to a point on the analog audio curve as the microphone diaphragm (or your eardrum) wobbles. Pay attention to the endianness of your data: each sample uses all 16 bits, so unsigned integers range from 0 up to (2^16 - 1), i.e. 0 to 65535, while the signed 16-bit samples typical of WAV files range from -32768 to 32767. Confirm your samples stay inside the appropriate range.

What is the raw form of a compressed image file format (JPEG, PNG, GIF)?

As we know, JPEG, PNG and GIF are all compressed file formats. My question is: what is the original source of input we provide to these compression algorithms, and in which form is image data stored before it gets converted into one of these file formats?
That depends.
PNG is generally lossless, but it has a limit on the number of bits per pixel. GIF turns out to be lossless too, but getting a high number of colors is more complicated. These formats are still compressed, but use compression that doesn't lose data.
JPEG is lossy. If you save as a JPEG, you will not be able to revert back to another format without losing some clarity. By representing the data as equations it can get quite small, but it can start to look "blurry" as the approximations get worse.
There are other image formats, like TIFF, RAW and BMP, which generally don't do any compression; they are really more like containers and technically can contain compressed data, but usually don't.
The original, uncompressed, data depends on what generates it. A photoshop file will save as a PSD but internally may represent it differently in memory. Every digital camera may have a different way of laying out its internal memory, and the photo sensors tend to map 1 to 1 from a sensor to a memory location of a set number of bits.
The common pattern, however, is that each pixel of the image is stored as 3 (sometimes 4) color values, each one between 8 and 16 bits. The 3 values may represent Red, Green and Blue, or alternatively Hue, Saturation and Value. For design, it could be CMYK (Cyan, Magenta, Yellow and blacK). There could also be an alpha value. It's unusual to use more than 16 bits for each color channel and most common to use 8. Using 12 bits is considered by most to be full color, but that doesn't align very well on 32 bit or even 64 bit machines. Still, 12 bit is used sometimes in digital video signals since when broadcast serially the color values don't need to fit into words.
Different formats will go in a different order. Usually rows first, but some formats start at the bottom row and some start at the top.
So, the real answer is that it depends on what the particular compressor is looking for. Most software that saves as JPEG or PNG will accept multiple formats, and the most common is probably 32 bits/pixel, with 8 bits each for RGB (red, green, blue) and one byte either unused or used as alpha. It will need the width and height of the image, so the image data should be width*height*4 bytes. You generally pass in a defined constant that tells it the byte order: RGBA, ARGB, BGR, RGB, etc.

FFmpeg: How to estimate number of samples in audio stream?

I'm currently writing a small application that's making use of the FFmpeg library in order to decode audio files (especially avformat and swresample) in C++.
Now I need the total number of samples in an audio stream. I know that the exact number can only be found out by actually decoding all the frames, I just need an estimation. What is the preferred method here? How can I find out the duration of a file?
There's some good info in this question about how to get info out of ffmpeg: FFMPEG Can't Display The Duration Of a Video.
To work out the number of samples in an audio stream, you need three basic bits of info:
The duration (in seconds)
The sample rate (in samples per second)
The number of channels in the stream (e.g. 2 for stereo)
Once you have that info, the total number of samples in your stream is simply [duration] * [rate] * [channels].
Note that this is not equivalent to bytes, as the samples are likely to be at least 16 bit, and possibly 24.
I believe what you need is the formula AUDIORATE / FRAMERATE. For instance, if ar=48000 and the video frame rate is, say, 50 fps, then 48000/50 = 960 samples per frame.
The buffer calculation comes later, as samples_per_frame * nChannels * (audio_bits / 8).
The audio bit depth is usually 16 bits (24 or 32 bits are also possible). So for 8-channel audio at 16-bit 48 kHz, you'll need 960 * 8 * 2 = 15360 bytes per audio frame.
The official way to do this last calculation is to call:
av_samples_get_buffer_size(NULL, nChannels, SamplesPerFrame, audio_st->codec->sample_fmt, 0)
av_samples_get_buffer_size(NULL, 8, 960, audio_st->codec->sample_fmt, 0)
will also return 15360 (for experts: yes, I'm assuming the format is pcm_s16le).
So this answers the first part of your question. Hope that helps.

Are there any supported high bit-depth video or image formats in DirectShow

In the Microsoft DirectShow documentation there appear to be no RGB video or image formats with more than 8 bits per channel. There are some YUV formats with 10 or 16 bits per channel but I've not found much support for them by googling.
Video Subtype GUIDs
10-bit and 16-bit YUV Video Formats
Are there any supported DirectShow formats or FourCC video or image formats (third party or not) for greater definition than 8 bits per channel?
DirectShow itself, as a framework, has no problem supporting 10 bpp and more; it's a matter of the video renderers and video adapters supporting these formats. Some professional hardware from Blackmagic definitely supports 10 bpp formats, including within the DirectShow API.