ffmpeg: Endianness of audio samples - c++

I use ffmpeg's avcodec to retrieve raw audio samples from music files in my C++ application. For the files I have tested, the samples appear to be little-endian, but I wonder whether that will always be true for every file I try to decode (i.e. whether it comes from ffmpeg's implementation, or is at least architecture-specific, since my computer's architecture is little-endian). If not, I assume it depends on the particular file's encoding format. In that case, how can I check which endianness applies to each file I'm decoding? I can't find any relevant information in the docs.

Internally ffmpeg always uses native endianness for audio samples, since that makes it easier to perform various manipulations on the data (see the libavutil/samplefmt.h file for some documentation on the matter); it is the codec's task to convert to/from the endianness dictated by the file format. As a simple example, there is a family of trivial audio codecs for reading/writing raw samples called pcm_*; e.g. there are pcm_s16le and pcm_s16be. On a little-endian architecture pcm_s16le will do no conversion, while pcm_s16be will swap bytes when decoding/encoding data.
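For illustration, the byte swap a big-endian PCM codec performs on a little-endian host amounts to something like this minimal sketch (not FFmpeg's actual implementation; the buffer and sample count are assumed to come from your own code):

#include <cstdint>
#include <cstddef>

// Swap the two bytes of each 16-bit sample in place; this is effectively what
// a pcm_s16be decoder has to do on a little-endian machine so that the frames
// it hands back are already in native order.
static void swap16_samples(std::uint16_t *samples, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
        samples[i] = static_cast<std::uint16_t>((samples[i] >> 8) | (samples[i] << 8));
}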

As Andrey said, FFmpeg internally decodes to native endianness. This is mentioned in the header file libavutil/samplefmt.h:
* Audio sample formats
*
* - The data described by the sample format is always in native-endian order.
* Sample values can be expressed by native C types, hence the lack of a signed
* 24-bit sample format even though it is a common raw audio data format.
It doesn't list *le or *be variants, though. The formats available are:
enum AVSampleFormat {
    AV_SAMPLE_FMT_NONE = -1,
    AV_SAMPLE_FMT_U8,   ///< unsigned 8 bits
    AV_SAMPLE_FMT_S16,  ///< signed 16 bits
    AV_SAMPLE_FMT_S32,  ///< signed 32 bits
    AV_SAMPLE_FMT_FLT,  ///< float
    AV_SAMPLE_FMT_DBL,  ///< double
    AV_SAMPLE_FMT_U8P,  ///< unsigned 8 bits, planar
    AV_SAMPLE_FMT_S16P, ///< signed 16 bits, planar
    AV_SAMPLE_FMT_S32P, ///< signed 32 bits, planar
    AV_SAMPLE_FMT_FLTP, ///< float, planar
    AV_SAMPLE_FMT_DBLP, ///< double, planar
    AV_SAMPLE_FMT_NB    ///< Number of sample formats. DO NOT USE if linking dynamically
};
Generally you will get planar, signed 16-bit samples.
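In practice you can simply ask the decoder which of these formats it produced. Here is a minimal sketch using the public FFmpeg C API (the decoded AVFrame is assumed to come from your existing decode loop); whatever format it reports, the bytes are already in the host's native order:

extern "C" {
#include <libavcodec/avcodec.h>
#include <libavutil/samplefmt.h>
}
#include <cstdio>

// Print the sample format of a decoded frame; regardless of which format it
// is, the sample bytes are in the host's native endianness.
void report_sample_format(const AVFrame *frame)
{
    const AVSampleFormat fmt = static_cast<AVSampleFormat>(frame->format);
    std::printf("format: %s, planar: %d, bytes per sample: %d\n",
                av_get_sample_fmt_name(fmt),
                av_sample_fmt_is_planar(fmt),
                av_get_bytes_per_sample(fmt));
}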

Related

Does JPEG allow lossless encoding?

As far as I know, a modern JPEG decoder produces the same image when given the same input JPEG file.
Normally, we create JPEG files in such a way that the decoded image is an approximation of some input image.
Is the JPEG format flexible enough to allow lossless encoding of arbitrary input images with a custom encoder?
I'd imagine you'd at least have to fiddle with how quantization tables are used, to essentially disable them? Perhaps something else?
(To be clear, I don't mean the special 'lossless' mode in JPEG that many decoders don't support. I am talking about using the default, mainstream code path through the decoder.)
No. Even with no quantization, the RGB to YCbCr transformation is lossy in the low few bits. Also the chroma channels are then downsampled, but that step can be skipped. While the DCT is mathematically lossless, in reality it is lossy in the least significant bit or two in the integer representation.
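A quick way to convince yourself of the color-space loss is to round-trip pixels through the usual JFIF (full-range BT.601) RGB/YCbCr formulas with integer rounding. This is a minimal sketch, not tied to any JPEG library; it simply counts how many sampled pixels come back changed:

#include <algorithm>
#include <cmath>
#include <cstdio>

// Clamp a real value to the 0..255 range of an 8-bit channel and round it.
static int clamp8(double v) { return std::min(255, std::max(0, (int)std::lround(v))); }

int main() {
    int lossy = 0, total = 0;
    for (int r = 0; r < 256; r += 5)
        for (int g = 0; g < 256; g += 5)
            for (int b = 0; b < 256; b += 5) {
                // Forward transform (JFIF / full-range BT.601), rounded to integers.
                int y  = clamp8( 0.299    * r + 0.587    * g + 0.114    * b);
                int cb = clamp8(-0.168736 * r - 0.331264 * g + 0.5      * b + 128);
                int cr = clamp8( 0.5      * r - 0.418688 * g - 0.081312 * b + 128);
                // Inverse transform, rounded again.
                int r2 = clamp8(y + 1.402    * (cr - 128));
                int g2 = clamp8(y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128));
                int b2 = clamp8(y + 1.772    * (cb - 128));
                ++total;
                if (r2 != r || g2 != g || b2 != b) ++lossy;
            }
    std::printf("%d of %d sampled pixels do not survive the round trip\n", lossy, total);
}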

How to read raw audio data in c++?

I'm trying to do a Fourier transform on an audio file. So far I've managed to read the header of the file with the help of this answer. This is the output.
The audio format is 1, which means PCM, so I should be able to work with the data quite easily. However, this is what I can't figure out:
Is the data binary, and should I convert it to float, or to something else entirely?
Yes, it's binary. Specifically, it's signed 16-bit integers.
You may want to convert it to float or double depending on your FFT needs.
I suggest you use a mono input audio file; the sample you showed has two channels (stereo), which complicates the data slightly. For a mono PCM file the structure is
two-bytes-sample-A immediately followed by two-bytes-sample-B, etc.
In PCM each such sample directly corresponds to a point on the analog audio curve as the microphone diaphragm (or your eardrum) wobbles. Paying attention to the correct endianness of your data, each of these samples is an integer using all 16 bits, so unsigned values range from 0 up to (2^16 - 1), which is 0 to 65535; confirm your samples stay inside this range if they are unsigned.
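Assuming the usual 16-bit little-endian PCM layout described above, turning the raw bytes of the data chunk into floats for an FFT could look like this sketch (the byte buffer and channel count are assumed to come from your existing WAV-header code; only the first channel is kept so the result is mono):

#include <cstdint>
#include <cstddef>
#include <vector>

// Convert interleaved 16-bit little-endian PCM to floats in [-1, 1),
// keeping only the first (left) channel so the FFT input is mono.
std::vector<float> pcm16le_to_float(const unsigned char *data, std::size_t nbytes,
                                    unsigned channels)
{
    std::vector<float> out;
    const std::size_t frame_bytes = 2 * channels;   // 2 bytes per sample, per channel
    for (std::size_t i = 0; i + frame_bytes <= nbytes; i += frame_bytes) {
        // Assemble the sample from its two little-endian bytes, then sign-extend.
        int16_t s = static_cast<int16_t>(data[i] | (data[i + 1] << 8));
        out.push_back(s / 32768.0f);
    }
    return out;
}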

What is the raw form of a compressed image file format (JPEG, PNG, GIF)?

As we know, JPEG, PNG and GIF are all compressed file formats. My question is: what is the original source of the input we provide to these compression algorithms, and in what form is image data stored before it gets converted into one of these file formats?
That depends.
PNG is generally lossless, but it does have a limit on the number of bits per pixel. GIF turns out to be lossless too, but it is limited to a 256-color palette, which makes a high number of colors more complicated. These formats are still compressed, but use a compression that doesn't lose data.
JPEG is lossy. If you save as a JPEG, you will not be able to revert back to another format without losing some clarity. By representing the data as equations it can get quite small, but it can start to look "blurry" as the approximations get worse.
There are other image formats, like TIFF, RAW and BMP, which generally don't do any compression, although they are really more like containers and technically can contain compressed data; they usually don't.
The original, uncompressed, data depends on what generates it. A photoshop file will save as a PSD but internally may represent it differently in memory. Every digital camera may have a different way of laying out its internal memory, and the photo sensors tend to map 1 to 1 from a sensor to a memory location of a set number of bits.
The common pattern, however, is that each pixel of the image is stored as 3 (sometimes 4) color values, each one between 8 and 16 bits. The 3 values may represent Red, Green and Blue, or alternatively Hue, Saturation and Value. For design, it could be CMYK (Cyan, Magenta, Yellow and blacK). There could also be an alpha value. It's unusual to use more than 16 bits for each color channel and most common to use 8. Using 12 bits is considered by most to be full color, but that doesn't align very well on 32 bit or even 64 bit machines. Still, 12 bit is used sometimes in digital video signals since when broadcast serially the color values don't need to fit into words.
Different formats will go in a different order. Usually rows first, but some formats start at the bottom row and some start at the top.
So, the real answer is that it depends on what the particular compressor is looking for. Most software that saves as JPEG or PNG will accept multiple formats, and the most common is probably 32 bits per pixel, with 8 bits each for red, green and blue and the fourth byte either unused or used as alpha. The encoder will need the width and height of the image, so the image data should be width*height*4 bytes. You generally pass in a defined constant that tells it the byte order: RGBA, ARGB, BGR, RGB, etc.
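As a concrete illustration of that layout, addressing one pixel of a packed, top-down, row-major RGBA buffer might look like this sketch (the buffer, width and coordinates are assumed to come from elsewhere):

#include <cstdint>
#include <cstddef>

// One pixel of a packed 8-bit-per-channel RGBA image.
struct Rgba { std::uint8_t r, g, b, a; };

// Index into a width*height*4 byte buffer stored row by row, top row first.
inline Rgba get_pixel(const std::uint8_t *buf, std::size_t width,
                      std::size_t x, std::size_t y)
{
    const std::uint8_t *p = buf + (y * width + x) * 4;   // 4 bytes per pixel
    return Rgba{p[0], p[1], p[2], p[3]};                 // byte order here is R, G, B, A
}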

jpeg compression - lossy or lossless

I have few questions on JPEG Compression.
In my Windows system, I have some image processing applications. For example, Windows MS Paint, which provides an option to convert a BMP image to JPEG format.
Can anybody please tell me what kind of JPEG compression MS Paint is using here: is it lossy or lossless?
If somebody refers to "JPEG standard compression", which compression is it using internally: lossy or lossless?
Thanks in advance.
Alvin
JPEG is a family of related compression techniques. There is a lossless JPEG, but it is generally relegated to 12-bit medical applications.
Any JPEG that you are likely to use creates loss. This occurs at several steps.
The transformation from RGB to YCbCr. The two color spaces intersect but do not have the same gamut of colors; RGB colors outside the YCbCr range get clamped. Also, the transformation from RGB to YCbCr is a floating point operation that produces integer values, so there are rounding errors.
The Discrete Cosine Transform is usually performed on the data using scaled integers. This introduces small rounding errors. Even if you do this in floating point there will be some small errors and the values have to be rounded to integers for the final output.
Quantization is the big one. This divides the DCT output by integer values. You can eliminate rounding at this step by making all the quantization values 1.
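To see why quantization dominates the loss, take a single DCT coefficient and a hypothetical quantization table entry of 16; a small worked sketch (the numbers are illustrative, not from a real table):

#include <cmath>
#include <cstdio>

int main() {
    const int coeff = 183;   // hypothetical DCT coefficient
    const int q     = 16;    // hypothetical quantization table entry
    int stored  = (int)std::lround((double)coeff / q);   // value written to the file: 11
    int rebuilt = stored * q;                             // value the decoder recovers: 176
    std::printf("original %d, stored %d, reconstructed %d\n", coeff, stored, rebuilt);
}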
JPEG compression is considered a lossy compression because it is not possible to rebuild the exact original data from the compressed file when decompressing.
Even at the highest quality, JPEG works by discarding data. You control the quality to trade off what you consider an acceptable loss while still keeping a fair representation of your image. Although data is lost, what can be seen might still be identical to the untrained eye, and that is the point. It is the same as what MiniDisc used to do for audio.
The intent of JPEG is to make photographic images smaller in file size for internet transmission; you get to decide how small. But if you want absolute quality, a format like TIFF is better suited.
Incidentally, TIFF offers lossless compression, but the file sizes are still massive!
One more thing: if you take a 300 x 500 bitmap, convert it to JPEG and then convert it back, the file size will still be the same, because a bitmap stores a fixed number of bits per pixel. But the contents of the file will be quite different. In this regard it might naively be viewed as lossless, but in practical terms it is far from it.

Writing 10,12 bit TIFF files with LibTIFF C++

I'm trying to write 10- and 12-bit RGB TIFF files with LibTIFF.
The pixel data is stored locally in an unsigned short buffer (16 bits).
1) If I set TIFFTAG_BITSPERSAMPLE to 10 or 12, not enough bits are being read from the buffer, and the output is incorrect. (I understand that it is just reading 10 or 12 bits per component, instead of 16 and this is the problem)
2) I tried packing the bits in the buffer, so that it is really 12-R, 12-G, 12-B. In this case, I think the file is being written correctly but no viewer I could find could display this image properly.
3) If I set TIFFTAG_BITSPERSAMPLE to 16, viewers can display the TIFF image, but then I have a problem that I don't know if the image was originally 10 or 12 bits (If I want to later read it with LibTIFF). Also, the viewer expects the dynamic range to be 16 bits and not 10 or 12, also resulting in a bad view.
4) The most annoying part is that I couldn't find one 10, 12, or 14 bit TIFF image on the web to see what the header is supposed to look like.
So finally, what is the proper way to write 10 or 12 bit image data to a TIFF file?
The TIFF specification does not specify a way to store 10, 12 or 14 bits per channel in an image. Depending on the encoder and decoder, it may still be possible to work with such images, but it is effectively an implementation detail, as they are not required to do this.
If you want more than 8 bits of precision in a TIFF, your only choice is 16 (or floating point, but that's a different story).
I'm not aware of any image format with specific support for these bitdepths, so viewers will likely be a problem anyway if you must store the image with that specific bitdepth. The simplest workaround I can think of would be to just store as 16 bits per pixel and put the original bitdepth as metadata (e.g. in an ImageDescription tag), but it all depends on what the images will be used for and why you need this information.
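A minimal sketch of that workaround with LibTIFF, writing 16 bits per sample and recording the true depth in the ImageDescription tag (the function name, buffer layout and "bitdepth=" convention are my own assumptions, not a standard):

#include <tiffio.h>
#include <cstdint>
#include <cstdio>
#include <cstddef>

// Write a 16-bit-per-channel RGB TIFF and record the real bit depth (10 or 12)
// in the ImageDescription tag so a custom reader can rescale the samples later.
bool write_tiff16(const char *path, const std::uint16_t *pixels,
                  std::uint32_t width, std::uint32_t height, int original_depth)
{
    TIFF *tif = TIFFOpen(path, "w");
    if (!tif)
        return false;

    char desc[32];
    std::snprintf(desc, sizeof(desc), "bitdepth=%d", original_depth);

    TIFFSetField(tif, TIFFTAG_IMAGEWIDTH, width);
    TIFFSetField(tif, TIFFTAG_IMAGELENGTH, height);
    TIFFSetField(tif, TIFFTAG_SAMPLESPERPIXEL, 3);
    TIFFSetField(tif, TIFFTAG_BITSPERSAMPLE, 16);
    TIFFSetField(tif, TIFFTAG_PHOTOMETRIC, PHOTOMETRIC_RGB);
    TIFFSetField(tif, TIFFTAG_PLANARCONFIG, PLANARCONFIG_CONTIG);
    TIFFSetField(tif, TIFFTAG_IMAGEDESCRIPTION, desc);

    for (std::uint32_t row = 0; row < height; ++row) {
        // Each scanline holds width * 3 sixteen-bit samples.
        void *scanline = const_cast<std::uint16_t *>(pixels + std::size_t(row) * width * 3);
        if (TIFFWriteScanline(tif, scanline, row, 0) < 0) {
            TIFFClose(tif);
            return false;
        }
    }
    TIFFClose(tif);
    return true;
}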
You can store the image as a multi-image file. For example, with a 12-bit source, one image would be an 8-bit RGB image using the upper 8 bits, and a second would be a 16-bit grayscale image combining the low four bits with four bits of padding. This gives a TIFF that can be viewed on a monitor with standard programs, and the extra precision can be retrieved with custom software.
I disagree that 'exotic' bit depths are not worth handling. This layout would reduce the image size to 5/6 of a plain 16-bit file. You could even store the second image as a re-scaled version with the 4 bits tightly packed, without padding, for a reduction to 3/4 of the size. These savings can be significant with very large data sets, where compression is not an option due to the nature of the data; many scientific and machine-vision applications want the unadulterated bits. The ability to convert from the multi-image TIFF to a 16-bit TIFF would still allow the use of standard programs and image libraries.