I'm trying to do a Fourier Transform on an audio file. So far I've managed to read the header of the file with the help of this answer. This is the output.
The audio format is 1, which means PCM, so I should be able to work with the data fairly easily. However, this is what I can't figure out:
Is the data binary, and should I convert it to float, or is it something else that I'm not understanding?
Yes, it's binary. Specifically, it's signed 16-bit integers.
You may want to convert it to float or double depending on your FFT needs.
I suggest you use a mono channel input audio file ... the sample you showed has two channels (stereo), which complicates the data slightly ... for a mono PCM file the structure is
two-byte-sample-A immediately followed by two-byte-sample-B ... etc.
in PCM each such sample directly corresponds to a point on the analog audio curve as the microphone diaphragm (or your eardrum) wobbles ... pay attention to the endianness of your data ... each sample is an integer using all 16 bits, and in a WAV file 16-bit PCM is normally signed little-endian, so the values run from -32768 to +32767 (the unsigned range of 0 to 65535 applies only to 8-bit WAV data) ... confirm your samples stay inside the appropriate range
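To make that concrete, here is a minimal sketch of the conversion, assuming you have already read the WAV data chunk of a mono, 16-bit little-endian PCM file into a byte buffer (the function name is made up for the example):

#include <cstddef>
#include <cstdint>
#include <vector>

// Convert raw little-endian 16-bit PCM bytes into floats in [-1.0, 1.0),
// ready to feed into an FFT. 'raw' holds the bytes of the WAV data chunk.
std::vector<float> pcm16ToFloat(const std::vector<uint8_t>& raw)
{
    std::vector<float> samples(raw.size() / 2);
    for (std::size_t i = 0; i < samples.size(); ++i) {
        // assemble the signed sample from two little-endian bytes
        int16_t s = static_cast<int16_t>(raw[2 * i] | (raw[2 * i + 1] << 8));
        samples[i] = s / 32768.0f; // scale to roughly [-1, 1)
    }
    return samples;
}

Each resulting float is one point on that analog curve, normalised so the FFT input stays well scaled.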
I am doing a fun project to change the color of Philips Hue bulbs based on the sound coming from the default ALSA device.
I want to write a small C++ program that captures and analyzes the default audio stream, splits it into 3 ranges (low, mid, and high), and assigns those ranges to red, green, and blue.
I have been reading about how to create ALSA devices, but I am struggling to figure out (and Google) how to capture streams with ALSA. This is the first time I have worked with audio and ALSA. I am trying to avoid using Python for now, as I want to learn a bit more.
If you believe it is not worth writing this in C++, I will do it in Python.
This answer is broken into two parts. The first part discusses how to take the audio data and use it to represent LED "bits" for use in LED brightness setting. The second part discusses how to use C++ to read audio data from the ALSA sound card.
Part 1
One idea for splitting into RGB: work out how to convert the audio samples into a 24-bit representation in a "perceptual" manner. As we hear nonlinearly, you probably want to take the logarithm of the audio data. Because the audio data is both positive and negative, you probably want to do this on its absolute value. Finally, for each buffer read from the ADC audio input, you probably want to take the RMS first (which handles the absolute value for you).
So the steps in processing would be :
Capture the audio buffer
Take the RMS for each column of the audio buffer (each column is an audio channel).
Take the logarithm of the RMS value for each column.
Work out how to map each channel's log(RMS) value onto the LEDs. One idea is to use log base 2 (log2) of the RMS of the audio data: with 32-bit samples the RMS spans a 32-bit range, which you can divide down by shifting right by 8 bits (RMS >> 8) to get a 24-bit representation. Then work out how to map these 24 bits onto the LEDs to achieve your aim.
For example in pseudo code :
float loudness = log2(RMS(buffer));
if (loudness > pow(2., 16.))
    setTheRedLED(loudness / pow(2., 16.));
else if (loudness > pow(2., 8.))
    setTheBlueLED(loudness / pow(2., 8.));
else
    setTheGreenLED(loudness);
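For reference, a minimal sketch of the RMS step used above (the helper name and the use of std::vector are just for illustration; in practice you would feed it one channel/column of your capture buffer):

#include <cmath>
#include <vector>

// Root-mean-square of one channel's samples: square each sample,
// average the squares, then take the square root. Squaring also
// takes care of the absolute value mentioned above.
float RMS(const std::vector<float>& channel)
{
    if (channel.empty())
        return 0.0f;
    double sumSquares = 0.0;
    for (float s : channel)
        sumSquares += static_cast<double>(s) * s;
    return static_cast<float>(std::sqrt(sumSquares / channel.size()));
}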
Part 2
You can use gtkiostream to implement C++ classes for handling audio with ALSA.
For example this ALSA::Capture class allows you to capture audio for processing.
To use it, you include it in your code :
#include "ALSA/ALSA.H"
using namespace ALSA;
Then you can stream audio into a matrix (the matrix columns are audio channels). First, however, you instantiate the class in your C++ code :
Capture capture("hw:0"); // to open the device hw:0 you could use "default" or another device
// you can now reset params if you don't want to use the default, see here : https://github.com/flatmax/gtkiostream/blob/master/applications/ALSACapture.C#L82
capture.setParams(); // set the parameters
if (!capture.prepared()){
cout<<"should be prepared, but isn't"<<endl;
return -1;
}
// now define your audio buffer you want to use for signal processing
Eigen::Array<int, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor> buffer(latency, chCnt);
// start capturing
if ((res=capture.start())<0) // start the device capturing
ALSADebug().evaluateError(res);
cout<<"format "<<capture.getFormatName(format)<<endl;
cout<<"channels "<<capture.getChannels()<<endl;
cout<<"period size "<<pSize<<endl;
// now do an infinite loop capturing audio and then processing it to do what you want.
while (true){
capture>>buffer; // capture the audio to the buffer
// do something with the audio in the buffer to separate out for red blue and green
}
A more complete capture example is available here.
I use ffmpeg's avcodec to retrieve raw audio samples from music files in my C++ application. For the files I have tested, the samples' endianness appears to be little-endian, but I wonder whether that will always be true for every file I try to decode (i.e. whether it comes from ffmpeg's implementation, or is at least architecture-specific, since my computer's architecture is little-endian). If not, I assume it would depend on the particular file's encoding format. In that case, how can I check which endianness applies for each file I'm decoding? I can't find any relevant information in the docs.
Internally, ffmpeg always uses native endianness for audio samples, since that makes it easier to perform various manipulations on the data (see the libavutil/samplefmt.h file for some documentation on the matter); it is the codec's task to convert to/from the endianness dictated by the file format. As a simple example: there is a family of trivial audio codecs for reading/writing raw samples called pcm_*; e.g. there are pcm_s16le and pcm_s16be. On a little-endian architecture pcm_s16le will do no conversion, while pcm_s16be will swap bytes when decoding/encoding data.
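To make the byte swap concrete, here is a tiny sketch of what a codec like pcm_s16be effectively has to do on a little-endian machine (illustrative only, not ffmpeg's actual code):

#include <cstdint>

// Swap the two bytes of a 16-bit sample, turning a big-endian value
// into native little-endian order (and vice versa).
static inline int16_t swapBytes16(int16_t v)
{
    uint16_t u = static_cast<uint16_t>(v);
    return static_cast<int16_t>(static_cast<uint16_t>((u >> 8) | (u << 8)));
}

pcm_s16le on the same machine would simply pass the value through unchanged.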
As Andrey said, FFmpeg internally decodes to native endianness. This is mentioned in the header file libavutil/samplefmt.h:
* Audio sample formats
*
* - The data described by the sample format is always in native-endian order.
* Sample values can be expressed by native C types, hence the lack of a signed
* 24-bit sample format even though it is a common raw audio data format.
It doesn't list *le or *be variants, though. The formats available are:
enum AVSampleFormat {
    AV_SAMPLE_FMT_NONE = -1,
    AV_SAMPLE_FMT_U8,   ///< unsigned 8 bits
    AV_SAMPLE_FMT_S16,  ///< signed 16 bits
    AV_SAMPLE_FMT_S32,  ///< signed 32 bits
    AV_SAMPLE_FMT_FLT,  ///< float
    AV_SAMPLE_FMT_DBL,  ///< double
    AV_SAMPLE_FMT_U8P,  ///< unsigned 8 bits, planar
    AV_SAMPLE_FMT_S16P, ///< signed 16 bits, planar
    AV_SAMPLE_FMT_S32P, ///< signed 32 bits, planar
    AV_SAMPLE_FMT_FLTP, ///< float, planar
    AV_SAMPLE_FMT_DBLP, ///< double, planar
    AV_SAMPLE_FMT_NB    ///< Number of sample formats. DO NOT USE if linking dynamically
};
Which of these you get depends on the decoder; in practice you will commonly see planar signed 16-bit (AV_SAMPLE_FMT_S16P) or planar float (AV_SAMPLE_FMT_FLTP) samples.
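If it helps, here is a minimal sketch of how you might inspect the decoder's output format at runtime; fmt would typically be the sample_fmt field of your opened codec context:

#include <cstdio>
extern "C" {
#include <libavutil/samplefmt.h>
}

// Print what the decoder will hand us: format name, planar or interleaved,
// and bytes per sample.
void describeSampleFormat(enum AVSampleFormat fmt)
{
    std::printf("sample format : %s\n", av_get_sample_fmt_name(fmt));
    std::printf("planar        : %s\n", av_sample_fmt_is_planar(fmt) ? "yes" : "no");
    std::printf("bytes/sample  : %d\n", av_get_bytes_per_sample(fmt));
}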
Apologies if this sounds like a stupid question; I'm relatively new to VST development. I'm trying to build a plugin using the JUCE framework and I'm currently testing it with a sine wave .wav file. When I open the .wav file in Audacity it tells me it's 44100 Hz and a 32-bit float. When I load this same file into Matlab the first three samples are something like 0.00, 0.0443, 0.0884... However, when I put the same file into Ableton and Reaper and step through the code, I find the first three samples of the same file are 0.00000000, 0.00012068315, 0.00048156900... I see this when I peek into the memory in VS and look at it in 32-bit floating-point view. Why are my sample values so much smaller?
My problem is that I need the audio to have the same sample values as they are in Matlab for my algorithm to work. Obviously there's a conversion happening that I have no control of. Can anyone shed any light on this problem and how I should go about fixing it. It looks like a scaling problem maybe. Ableton is being run in 32-bit mode and my VST is being compiled as 32-bit.
I can also provide more samples if that helps.
Thanks
The problem is that Ableton and Reaper were converting the 32-bit audio to 16-bit audio. I was able to check this by loading the sine.wav into Ableton and exporting it at 16-bit. I then loaded that file into Matlab and got the smaller samples described above. My next problem is to figure out a way to convert the 16-bit audio to 32-bit audio within the VST.
What you seem to describe is a very low amplitude, so you will see small values (you could convert your sample values to dB to verify that, as in the sketch below). Usually an audio signal ranges between -1 and +1, where the extrema represent the maximum possible volume in the digital domain (aka 0 dBFS).
I believe the dilemma between 16 bits and 32 bits has nothing to do with your issue.
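For what it's worth, a minimal sketch of that dB check, assuming normalized float samples where full scale is 1.0:

#include <cmath>

// Convert a normalized sample value (full scale = 1.0) to dBFS.
// A sample like 0.00012 comes out around -78 dBFS, whereas 0.0443
// is around -27 dBFS, i.e. the Ableton/Reaper values are simply much quieter.
double sampleToDb(double sample)
{
    double magnitude = std::fabs(sample);
    if (magnitude <= 0.0)
        return -INFINITY; // digital silence
    return 20.0 * std::log10(magnitude);
}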
Can anyone tell me how a JPEG image can be divided into 8 x 8 blocks in C++?
Thanks.
Ah, the die-hard approach. My heart goes out to you. Expect to learn a lot, but be forewarned that you will lose time, blood and pain doing so.
The Compression FAQ has some details on how JPEG works. A good starting point is Part 2: Subject 75: Introduction to JPEG.
In a nutshell, for a typical JPEG file, you will have to reverse the encoding steps 6 to 4:
(6) extract the appropriate headers and image data from the JFIF container
(5) reverse the Huffman coding
(4) reverse the quantization
You should then be left with 8x8 blocks you could feed into an appropriate inverse DCT.
Wikipedia has some details on the JFIF format as well as Huffman tables and structure of the JPEG data within the JFIF.
I'm assuming you're looking to play with JPEG to learn about it? Because access to the raw encoded blocks is almost certainly not necessary if you have some practical application.
EDIT after seeing comments: If you just want to get part of a very large JPEG without reading/decompressing the whole file, you could use ImageMagick's stream command. It allows you to get a subimage without reading the whole file. Use it like e.g. stream -extract 8x8+16+16 large.jpeg block.rgb to get an 8x8 block starting at (16,16).
You have to decompress the image; use the turbojpeg library (it's very fast), which will give you an array of unsigned char in RGB (or RGBA) order. You then have an uncompressed image with one byte each for R, G and B.
From there you can write a simple for loop that walks over the data in 24-byte runs (8 pixels x 3 channels, i.e. one row of an 8x8 block) and copies them with memcpy to some other memory location, as sketched below.
Keep in mind that the array returned by the turbojpeg library is a one-dimensional linear array of bytes, so the scanlines are stored one after the other. Take this into account when creating your blocks, because depending on your needs you'll have to traverse the array differently.
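A minimal sketch of that copy, assuming an interleaved RGB buffer of width * height * 3 bytes as returned by the decompression step (the function and variable names are just for illustration):

#include <cstring>

// Copy the 8x8 RGB block whose top-left pixel is at (blockX*8, blockY*8)
// out of a linear, interleaved RGB image (3 bytes per pixel, scanline after
// scanline). The block is assumed to lie fully inside the image.
void copyBlock(const unsigned char* image, int width,
               int blockX, int blockY, unsigned char* block /* 8*8*3 bytes */)
{
    const int bytesPerPixel = 3;
    const int stride = width * bytesPerPixel;  // bytes per scanline
    for (int row = 0; row < 8; ++row) {
        const unsigned char* src = image
            + (blockY * 8 + row) * stride      // jump to the right scanline
            + blockX * 8 * bytesPerPixel;      // then to the block's first pixel
        std::memcpy(block + row * 8 * bytesPerPixel, src, 8 * bytesPerPixel);
    }
}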
I'm trying to write 10- and 12-bit RGB TIFF files with LibTIFF.
The pixel data is held locally in an unsigned short buffer (16 bits).
1) If I set TIFFTAG_BITSPERSAMPLE to 10 or 12, not enough bits are read from the buffer and the output is incorrect. (I understand that it is reading only 10 or 12 bits per component instead of 16, and that this is the problem.)
2) I tried packing the bits in the buffer so that it is really 12-R, 12-G, 12-B. In this case I think the file is written correctly, but no viewer I could find displays the image properly.
3) If I set TIFFTAG_BITSPERSAMPLE to 16, viewers can display the TIFF image, but then I have the problem that I don't know whether the image was originally 10 or 12 bits (if I want to read it later with LibTIFF). Also, the viewer expects the dynamic range to be 16 bits rather than 10 or 12, which also results in a bad view.
4) The most annoying part is that I couldn't find a single 10-, 12- or 14-bit TIFF image on the web to see what the header is supposed to look like.
So finally, what is the proper way to write 10- or 12-bit image data to a TIFF file?
The TIFF specification does not specify a way to store 10, 12 or 14 bits per channel in an image. Depending on the encoder and decoder, it may still be possible to work with such images, but it is effectively an implementation detail, as they are not required to do this.
If you want more than 8 bits of precision in a TIFF, your only choice is 16 (or floating point, but that's a different story).
I'm not aware of any image format with specific support for these bit depths, so viewers will likely be a problem anyway if you must store the image at that exact depth. The simplest workaround I can think of would be to just store 16 bits per sample and put the original bit depth in the metadata (e.g. in an ImageDescription tag), as sketched below; but it all depends on what the images will be used for and why you need this information.
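A minimal sketch of that workaround with LibTIFF, assuming interleaved 16-bit RGB data that has already been scaled up from the original 10- or 12-bit range (for example value << (16 - origBits)), with the original depth recorded in the ImageDescription tag as suggested above:

#include <tiffio.h>
#include <cstdint>
#include <cstdio>

// Write interleaved 16-bit RGB scanlines and note the original bit depth
// (10 or 12) in the ImageDescription tag so it can be recovered later.
bool writeTiff16(const char* path, uint16_t* rgb,
                 uint32_t width, uint32_t height, int origBits)
{
    TIFF* tif = TIFFOpen(path, "w");
    if (!tif)
        return false;

    char desc[64];
    std::snprintf(desc, sizeof(desc), "original bits per sample: %d", origBits);

    TIFFSetField(tif, TIFFTAG_IMAGEWIDTH, width);
    TIFFSetField(tif, TIFFTAG_IMAGELENGTH, height);
    TIFFSetField(tif, TIFFTAG_SAMPLESPERPIXEL, 3);
    TIFFSetField(tif, TIFFTAG_BITSPERSAMPLE, 16);
    TIFFSetField(tif, TIFFTAG_PHOTOMETRIC, PHOTOMETRIC_RGB);
    TIFFSetField(tif, TIFFTAG_PLANARCONFIG, PLANARCONFIG_CONTIG);
    TIFFSetField(tif, TIFFTAG_IMAGEDESCRIPTION, desc);

    for (uint32_t row = 0; row < height; ++row)
        if (TIFFWriteScanline(tif, rgb + row * width * 3, row, 0) < 0) {
            TIFFClose(tif);
            return false;
        }

    TIFFClose(tif);
    return true;
}

A reader can then query TIFFTAG_IMAGEDESCRIPTION with TIFFGetField to recover the original depth and rescale accordingly.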
You can store the image as a multi-image file. For example, with a 12-bit source, one image would be an RGB(8) image using the upper 8 bits, and a second would be a 16-bit grayscale image combining the low four bits with four bits of padding. This gives a TIFF that can be viewed on a monitor with standard programs, while the extra precision can be retrieved with custom software.
I disagree that 'exotic' bit depths are not good. This layout reduces the image size to 5/6 of a plain 16-bit-per-channel file (24 + 16 = 40 bits per pixel instead of 48). You could even store the second image as a re-scaled version with the 4 bits tightly packed and no padding, for 3/4 of the size. This saving can be significant with very large data sets where compression is not an option due to the nature of the data; i.e. many scientific and machine-vision applications may want the unadulterated bits. The ability to convert from the multi-image TIFF back to a 16-bit TIFF would allow the use of standard programs and image libraries.
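A minimal sketch of that split for a single 12-bit sample value (the helper names are made up for the example):

#include <cstdint>

// Split a 12-bit sample into the two stores described above: the upper
// 8 bits go into the RGB(8) image, the lower 4 bits are kept (zero-padded
// in a 16-bit value) for the companion grayscale image.
void splitSample12(uint16_t sample12, uint8_t& high8, uint16_t& low4)
{
    high8 = static_cast<uint8_t>(sample12 >> 4);    // upper 8 of the 12 bits
    low4  = static_cast<uint16_t>(sample12 & 0x0F); // lower 4 bits, zero-padded
}

// Reassembling later: (uint16_t(high8) << 4) | (low4 & 0x0F) recovers the 12-bit value.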