Get loudness level from raw data received from microphone in DirectShow - C++

How can I get the loudness level from raw data received from a microphone in DirectShow?
IMediaSample keeps the data as bytes. How can I read these bytes and get something meaningful out of them?

Loudness is an aural quality, not a physical formula. There are many, many definitions for it.
It is also a temporal value, so it changes over time.
The simplest implementation I remember seeing some years ago simply put a timeout on the maximum value of the amplitude (a peak hold). Taking the log of the amplitude is surely better, because it approximates the ear's sensitivity much more closely.
You can also consider the power of the signal (signal * signal), and there are further definitions that take the frequency spectrum components into account.
These are all kitchen recipes. Choose the simplest.
Edit: it seems my answer was too quick and fuzzy; I probably confused volume and loudness. This Wikipedia article states that there are units for measuring loudness: the sone and the phon.

You need to process the data to calculate loudness from the raw bytes. One such method is defined in ITU-R BS.1770, "Algorithms to measure audio programme loudness and true-peak audio level", which describes the algorithm involved.
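As a rough sketch of the simpler measures mentioned above (peak amplitude and signal power on a log scale), the following assumes the buffer holds 16-bit signed little-endian mono PCM. That format is an assumption: in DirectShow you would check the negotiated media type first, and you would obtain the bytes with IMediaSample::GetPointer and IMediaSample::GetActualDataLength. The names Level and measureLevel are made up for illustration, and this is not the BS.1770 algorithm, which adds frequency weighting and gating.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>

struct Level {
    double peakDb;  // peak amplitude, in dB relative to full scale
    double rmsDb;   // RMS level, in dB relative to full scale
};

// Assumption: bytes holds 16-bit signed little-endian PCM samples.
Level measureLevel(const std::uint8_t* bytes, std::size_t byteCount) {
    assert(bytes != nullptr || byteCount == 0);
    const double silence = -std::numeric_limits<double>::infinity();
    const std::size_t sampleCount = byteCount / 2;
    if (sampleCount == 0) return {silence, silence};

    double peak = 0.0;
    double sumSquares = 0.0;
    for (std::size_t i = 0; i < sampleCount; ++i) {
        // Assemble one sample from its low and high byte.
        const std::uint16_t raw = static_cast<std::uint16_t>(
            bytes[2 * i] | (bytes[2 * i + 1] << 8));
        // Scale to the range [-1, 1).
        const double s = static_cast<std::int16_t>(raw) / 32768.0;
        peak = std::max(peak, std::fabs(s));
        sumSquares += s * s;  // signal power
    }
    const double rms = std::sqrt(sumSquares / static_cast<double>(sampleCount));
    const auto toDb = [silence](double x) {
        return x > 0.0 ? 20.0 * std::log10(x) : silence;
    };
    return {toDb(peak), toDb(rms)};
}
```

Calling measureLevel once per received sample buffer gives a value that changes over time; smoothing or holding it for display is a separate choice.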

Related

how to get txPower to calculate distance from RSSI

I got this code from google code :
void QBluetoothDeviceDiscoveryAgent::deviceDiscovered(const QBluetoothDeviceInfo &info)
QBluetoothDeviceInfo::rssi().
But how do I get the RSSI distance from QBluetoothServiceDiscoveryAgent?
I tried with
QBluetoothServiceInfo serviceInfo;  // device() is a member of QBluetoothServiceInfo
qint16 i = serviceInfo.device().rssi();  // rssi() returns a qint16
Here i = -43.
How do I convert it to a distance?
I found the link
Understanding ibeacon distancing
but how do I get the transmitter power needed to calculate the distance according to the formula?
Make sure you understand the implications of QBluetoothDeviceInfo::rssi(). Calling this function returns immediately with the value stored when the device was last scanned. If you only receive one advertisement packet, which happens to be at e.g. -90 dB, and then immediately connect, this function will keep returning -90 until you disconnect and scan again. Connected devices usually don't send advertisement packets, so the RSSI you can read via Qt won't be updated during the connection.
As for proximity, it's not so easy to get good values. To accurately convert from RSSI to geometric distance you must know the sender's original/intended signal strength (the TX power level, i.e. the RSSI at 1 m distance). This value will differ between devices. To make things worse, in practice it can also vary by a huge margin depending on things like the sender's battery level, the physical orientation of sender and receiver relative to each other, the quality of individual parts, random interference from other RF devices, and so on.
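For illustration, a common way to turn an RSSI into a rough distance is the log-distance path-loss model. This is an assumption on my part, not something the answer or the Qt API prescribes. The function name, the txPowerAt1m parameter (the calibrated RSSI at 1 m, which you have to know or measure for each device) and the path-loss exponent (about 2 in free space, typically higher indoors) are all hypothetical inputs you would need to tune.

```cpp
#include <cassert>
#include <cmath>

// Log-distance path-loss model (an assumed model, see the note above):
//   distance = 10 ^ ((txPowerAt1m - rssi) / (10 * n))
// rssi and txPowerAt1m use the same dB scale; n is the path-loss exponent.
double estimateDistanceMeters(double rssi, double txPowerAt1m,
                              double pathLossExponent = 2.0) {
    assert(pathLossExponent > 0.0);
    return std::pow(10.0, (txPowerAt1m - rssi) / (10.0 * pathLossExponent));
}
```

With the question's reading of -43 and a purely hypothetical 1 m reference of -59, this gives roughly 0.16 m. Given the variations described above, treat such a number as a coarse estimate at best.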
The BLE folk have a blog explaining how you should do it. You can read it up here. The linked article doesn't read or assume the theoretical maximum RSSI of the sender; instead it proposes gathering multiple RSSI values over time (plus some mean/mode filtering) and comparing the current mean value with the previous one to determine whether you are approaching or moving away from the sender. Paired with some fine-tuning using real-world data you have to collect, plus documentation reading and common sense, you could probably develop a proximity calculation for many or even most sender devices that is accurate to about one meter, or even less at close proximity. In the end it's a tradeoff between how many devices you wish to 'calibrate' for and those you are okay with having shifted values due to higher or lower TX power levels.
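A minimal sketch of that idea follows. It averages RSSI readings over fixed-size windows and compares each window's mean with the previous one. The class name, the window size and the dead band (how many dB the mean must move before it counts as a change) are all assumptions for illustration and are not taken from the linked article.

```cpp
#include <cassert>
#include <cstddef>

enum class Trend { Unknown, Approaching, Receding, Steady };

class RssiTrendTracker {
public:
    explicit RssiTrendTracker(std::size_t window, double deadbandDb = 2.0)
        : window_(window), deadbandDb_(deadbandDb) {
        assert(window > 0);
    }

    // Feed one RSSI reading; returns the trend as of the last completed window.
    Trend addReading(double rssi) {
        sum_ += rssi;
        if (++count_ == window_) {
            const double mean = sum_ / static_cast<double>(window_);
            if (hasPrevious_) {
                // A rising mean RSSI suggests the sender is getting closer.
                const double delta = mean - previousMean_;
                trend_ = delta > deadbandDb_    ? Trend::Approaching
                         : delta < -deadbandDb_ ? Trend::Receding
                                                : Trend::Steady;
            }
            previousMean_ = mean;
            hasPrevious_ = true;
            sum_ = 0.0;
            count_ = 0;
        }
        return trend_;
    }

private:
    std::size_t window_;
    double deadbandDb_;
    double sum_ = 0.0;
    std::size_t count_ = 0;
    double previousMean_ = 0.0;
    bool hasPrevious_ = false;
    Trend trend_ = Trend::Unknown;
};
```

You would feed it one value per received advertisement, for example from the deviceDiscovered handler. Remember that the RSSI is not refreshed while you are connected.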
The downside is that you can't test every possible device on the market and, as I said earlier, different devices have different TX power levels. With this approach you can develop an algorithm that gets pretty good measurements for devices with approximately equal signal configurations, but others will seem far off. The article's author talks about creating different profiles for different vendors, but that's not really going to help. Consider two otherwise identical beacons ("big" and "small"), one for large and one for small indoor locations: with RSSI alone you can't reliably determine whether you're close to the small beacon or at medium range from the big one unless they identify themselves via GAP or otherwise (forget MAC addresses if you plan to deploy on macOS or iOS).
Also, prepare yourself for the joyride that is Android BLE development. Some vendors know that their BLE implementation is so terribly bad and broken that they even disabled the HCI logging feature on all their ROMs to hide it. Others can be BLE-nuked like Win98 over Ethernet, back in the day.

time-domain signal in MFCC

I have read about MFCC and speech recognition, and there is one point I don't understand. According to the document on this page, http://practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/, what is the "time-domain signal"? Is it the float numbers in the data sub-chunk that I read from a wave file?
P/s: Sorry for my poor English :D
Yes, you are right. And quoted from wiki:
Time domain is the analysis of mathematical functions, physical signals or time series of economic or environmental data, with respect to time. In the time domain, the signal or function's value is known for all real numbers, for the case of continuous time, or at various separate instants in the case of discrete time.

Does anyone here have a sample high-pass filter for PCM audio data?

Good day.
I am weak at DSP and have difficulty understanding the algorithm.
I have a C# application with a recorder function that records sound, but the recording is a mixture of all sounds. Specifically, when I receive the data I want to filter it and save only the high-frequency audio, for example with a cutoff frequency of 15 kHz.
For this filter, the inputs are:
the samples of data, with their size,
and the cutoff frequency.
C/C++ is fine.
When I receive the samples, I want to apply the high-pass filter and then save the result to a WAV file.
-thong
You need to know the sample rate and also have a reasonable idea of your filter specification before you can design a suitable filter. Just specifying a 15 kHz cut-off is not really enough, e.g. you might want something like this:
Sample rate: 44.1 kHz
Stop-band: < 12 kHz
Stop-band rejection: > 80 dB
Pass-band: > 15 kHz
Passband ripple: +/- 1 dB
You can then feed these parameters into a filter design package and this will give you all the filter coefficients etc.
Note that the complexity of the filter (i.e. filter order = number of stages or "taps") will be highly dependent on the filter specification, so ideally you want to use a filter design package which lets you play around with the spec easily so that you can trade off your design requirements against the required compute bandwidth.
You will also need to decide whether phase and/or group delay are important to you - use a linear phase FIR for a constant group delay (more expensive) or a recursive IIR if phase/delay are not critical (much cheaper to implement).
Note that there are free online filter design packages available, e.g. http://www-users.cs.york.ac.uk/~fisher/mkfilter/ looks pretty good (it can even generate a C code filter implementation for you), although it may require at least beginner-level signal-processing knowledge when it comes to selecting filter types etc.
To help understand basic filter design parameters, here is a useful diagram from http://dspguru.com. Note that this is for a low pass filter, but the same parameters apply in the high pass case.
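To make the IIR option concrete, here is a minimal sketch of a single second-order (biquad) high-pass section. It uses the widely circulated "Audio EQ Cookbook" coefficient formulas, which are my choice and not something this answer specifies. A single section like this rolls off far too gently to meet a spec such as the 80 dB stop-band example above; a designed higher-order filter would be needed for that. The class name and the default Q of about 0.707 (a Butterworth-style response) are assumptions.

```cpp
#include <cassert>
#include <cmath>

class BiquadHighPass {
public:
    BiquadHighPass(double sampleRateHz, double cutoffHz,
                   double q = 0.7071067811865476) {
        assert(cutoffHz > 0.0 && cutoffHz < sampleRateHz / 2.0 && q > 0.0);
        const double pi = 3.14159265358979323846;
        const double w0 = 2.0 * pi * cutoffHz / sampleRateHz;
        const double cosW0 = std::cos(w0);
        const double alpha = std::sin(w0) / (2.0 * q);
        const double a0 = 1.0 + alpha;
        // Coefficients normalized so that a0 == 1.
        b0_ = (1.0 + cosW0) / 2.0 / a0;
        b1_ = -(1.0 + cosW0) / a0;
        b2_ = b0_;
        a1_ = -2.0 * cosW0 / a0;
        a2_ = (1.0 - alpha) / a0;
    }

    // Filters one sample (Direct Form I). Keep one instance per audio
    // channel, because the filter carries state between calls.
    double process(double x) {
        const double y = b0_ * x + b1_ * x1_ + b2_ * x2_ - a1_ * y1_ - a2_ * y2_;
        x2_ = x1_;
        x1_ = x;
        y2_ = y1_;
        y1_ = y;
        return y;
    }

private:
    double b0_, b1_, b2_, a1_, a2_;
    double x1_ = 0.0, x2_ = 0.0, y1_ = 0.0, y2_ = 0.0;
};
```

For 16-bit PCM you would convert each sample to double, call process, then clamp and convert back before writing the WAV data.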

Is it possible to see if two MP3 files are the same song by analyzing the files' bytes?

This is to be done in C++ or C....
I know we can read the MP3s' meta data, but that information can be changed by anyone, can't it?
So is there a way to analyze a file's contents and compare it against another file and determine if it is in fact the same song?
edit
Lots of interesting things coming out that I hadn't thought of. Not at all a good idea to attempt this.
It's possible, but very hard.
Even the same original recording may well be encoded differently by different MP3 encoders or the same encoder with different settings... leading to different results when the MP3 is then decoded. You'd need to work out an aural model to "understand" how big the differences are, and make a judgement.
Then there's the matter of different recordings. If I sing "Once in Royal David's City" and Aled Jones sings it, are those the same song? What if there are two different versions of a song where one has slightly modified lyrics? The key could be different, it could be in a different vocal range - all kinds of things.
How different can two songs be but still count as "the same song"? Once you've decided that, then there's the small matter of implementing it ;)
If I really had to do this, my first attempt would be to take a Fourier transform of both songs and compare the histograms. You can use FFTW (http://www.fftw.org/) to take the Fourier transform, and then compare the histograms by summing the squares of the differences at each frequency. If the resultant sum is greater than some threshold (which you must determine by experimentation) then the songs are deemed to be different, otherwise they are the same.
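A toy sketch of that comparison is below. To stay self-contained it uses a naive O(N^2) DFT instead of FFTW, so it only suits short excerpts; with FFTW you would replace magnitudeSpectrum with a real-to-complex transform. It assumes both inputs are already decoded to PCM of the same length and sample rate. The function names are made up, and the decision threshold still has to be found by experiment.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Magnitude spectrum of a real signal (bins 0..N/2), normalized by N.
std::vector<double> magnitudeSpectrum(const std::vector<double>& signal) {
    const std::size_t n = signal.size();
    if (n == 0) return {};
    const double pi = 3.14159265358979323846;
    std::vector<double> mags(n / 2 + 1, 0.0);
    for (std::size_t k = 0; k < mags.size(); ++k) {
        std::complex<double> acc(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            const double angle =
                -2.0 * pi * static_cast<double>(k * t) / static_cast<double>(n);
            acc += signal[t] * std::complex<double>(std::cos(angle), std::sin(angle));
        }
        mags[k] = std::abs(acc) / static_cast<double>(n);
    }
    return mags;
}

// Sum of squared differences between the two magnitude spectra.
double spectralDistance(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    const std::vector<double> sa = magnitudeSpectrum(a);
    const std::vector<double> sb = magnitudeSpectrum(b);
    double sum = 0.0;
    for (std::size_t i = 0; i < sa.size(); ++i) {
        const double d = sa[i] - sb[i];
        sum += d * d;
    }
    return sum;
}
```

Two excerpts would be deemed "the same" when spectralDistance falls below your experimentally chosen threshold. Because only magnitudes are compared, the measure tolerates small time offsets, but it also ignores the order in which sounds occur.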
No. Not SO simple.
You can check they contain the same encoded data, BUT:
Could be a different bitrate
Could be the same song, just 1/100th of a second off
In both cases the bytes would not match.
Basically, if a solution looks too simple to be true, it often is.
If you mean "same song" in the iTunes sense of "same recording", it would be possible to compare two audio files, but not by byte-by-byte comparison of the encoded files, since even for the same format there are variables such as data rate and compression that are selected at the time of encoding.
Also each encoding of the same recording may include different lead-in/lead-out timings, different amplitude and equalisation, and may have come from differing original sources (vinyl, CD, original master etc.). So you need a comparison method that takes all these variables into account, and even then you will end up with a 'likelihood' of a match rather than a definitive match.
If you genuinely mean "same song", i.e. any recording by any artist of the same composition and lyrics, then you are unlikely to get a high statistical correlation in most cases since pitch, tempo, range, instrumental arrangement will be very different.
In the "same recording" scenario, relatively simple signal processing and statistical techniques could be applied, in the "same song" scenario, AI techniques would need to be deployed, and even then the results I suspect would be poor.
If you want to compare MP3 files that originated from the same MP3 but have been tagged with different metadata, it would be straightforward to just compare the actual audio data. Since it originated from the same MP3 encoding, you should be able to do a byte-by-byte comparison. You wouldn't have to compare every byte; it should be sufficient to sample just a few to get a unique key that would be statistically almost impossible to find in another song.
If the files have been produced by different encoders, you would have to extract some "fuzzy" feature keys from the data and compare those keys. In a hurry I would probably construct an algorithm like this:
Decode audio to pulse-code modulation (wave) in a standard bit rate.
Find a fixed number of feature starting points using some dynamic location algorithm. For example find top 10 highest wave peaks ordered from beginning of wave or simply spread evenly across the wave (it would be a good idea to fix the first and last position dynamically though, since different encodings might not start and end at exactly the same point). An improvement would be to select feature points at positions in the wave that are not likely to be too repetitive.
Extract a set of one-dimensional feature key scalars from the feature points. For example, for each feature normalize the following n-sample values and count the number of zero-crossings, peak to average ratio, mean zero-crossing distance, signal-energy. The goal is to extract robust features that are relatively unique, while still characteristic even if some noise and distortion is added to the signal. This can obviously be improved almost infinitely.
Compare the extracted feature keys of the two files using some accuracy measurement (e.g. 9 out of 10 feature extractions must match at least 99% on 4 out of 5 of their extracted feature keys).
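As a sketch of the feature-extraction step above, the function below normalizes one window of decoded PCM by its peak and computes three of the scalars mentioned: zero-crossing count, signal energy and peak-to-average ratio. The struct and function names are invented for illustration, and how to pick the window positions and matching tolerances is left open, as in the description.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct WindowFeatures {
    std::size_t zeroCrossings;  // sign changes inside the window
    double energy;              // mean square of the peak-normalized samples
    double peakToAverage;       // peak |x| divided by mean |x|
};

// Extracts features from pcm[start, start + length).
WindowFeatures extractFeatures(const std::vector<double>& pcm,
                               std::size_t start, std::size_t length) {
    assert(start <= pcm.size() && length <= pcm.size() - start);
    WindowFeatures f{0, 0.0, 0.0};

    double peak = 0.0;
    for (std::size_t i = 0; i < length; ++i)
        peak = std::max(peak, std::fabs(pcm[start + i]));
    if (length == 0 || peak == 0.0) return f;  // empty or silent window

    double sumAbs = 0.0;
    double sumSquares = 0.0;
    for (std::size_t i = 0; i < length; ++i) {
        // Normalizing by the peak makes the features independent of gain.
        const double x = pcm[start + i] / peak;
        sumAbs += std::fabs(x);
        sumSquares += x * x;
        if (i > 0 && (x < 0.0) != (pcm[start + i - 1] < 0.0)) ++f.zeroCrossings;
    }
    f.energy = sumSquares / static_cast<double>(length);
    // After normalization the peak is 1, so the ratio is 1 / mean(|x|).
    f.peakToAverage = static_cast<double>(length) / sumAbs;
    return f;
}
```

Two files would then be compared by extracting these features at each chosen feature point and checking how many agree within a tolerance.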
The benefit of a feature extraction approach is that you can build a database of features for all your mp3-files and for a single file ask the question: What other media files have exactly or almost exactly the same feature as this one. The feature lookup could be implemented very efficiently with R*-trees or similar, which could be used to give you a fast distance measurement between the n-dimensional feature sets.
The above technique is essentially a variant of what is used in image search algorithms such as SIFT, which is probably the basis of applications such as Photosynth and Google Goggles. In image searching you filter the image for good candidate points for relatively unique features (such as corners of shapes), then you normalize the area around each feature to get normalized color, intensity, scale and direction. Finally you extract the features, search an n-dimensional database of features of other images, and verify that the features found in other images are geometrically positioned in the same pattern as in your search image. The technique for searching audio would be the same, only simpler, since audio is one-dimensional.
Use the open source EchoPrint library to create a signature of the two audio files, and compare them with each other.
The library is very easy to use, and has clear examples on how to create the signatures.
http://echoprint.me/
You can even query their database with the signature and find matching song metadata (such as title, artist, etc).
I think the Fast Fourier-Transform (FFT) approach hinted by jstanley is pretty good for most use cases; in particular, it works for verifying that the two are the same release/ same recording by the same artist/ same bitrate / audio quality.
To be more explicit, sox and spek (via command line and GUI, respectively) can do this pretty painlessly.
Spek is pretty foolproof -- just open the software and point it to the two audio files in question.
sox can generate spectrograms (FFTs) from the command line like so:
sox "$file" -n spectrogram -o "$outfile"
The result from either is two images; if they look basically identical, then for almost all intents and purposes the two songs are equivalent.
For example, I wanted to test if these two files:
Soundtrack to an imaginary film mixtape 2011.mp3
DJRUM - Sountrack to an imaginary film mixtape 2011 (for mary-anne hobbs).mp3
were the same. diff reported a difference in the binary files (perhaps due to metadata differences or minor encoding differences), but a quick glance at their spectrograms resolved it:

Real time plotting/data logging

I'm going to write a program that plots data from a sensor connected to the computer. The sensor value is going to be plotted as a function of the time (sensor value on the y-axis, time on the x-axis). I want to be able to add new values to the plot in real time. What would be best to do this with in C++?
Edit: And by the way, the program will be running on a Linux machine
Are you particularly concerned about the C++ aspect? I've handled data at 10 Hz or so without breaking a sweat, either by putting gnuplot into a read/plot/refresh loop or with LiveGraph.
Write a function that can plot a std::deque in a way you like, then .push_back() values from the sensor onto the queue as they come available, and .pop_front() values from the queue if it becomes too long for nice plotting.
The exact nature of your plotting function depends on your platform, needs, sense of esthetics, etc.
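A minimal sketch of the std::deque approach follows, with the plotting itself left out. The Point struct, the class name and the fixed maximum length are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

struct Point {
    double timeSeconds;  // x-axis
    double value;        // y-axis (sensor reading)
};

class RollingSeries {
public:
    explicit RollingSeries(std::size_t maxPoints) : maxPoints_(maxPoints) {
        assert(maxPoints > 0);
    }

    // Appends a new reading and drops the oldest one once the series is full.
    void add(double timeSeconds, double value) {
        points_.push_back({timeSeconds, value});
        if (points_.size() > maxPoints_) points_.pop_front();
    }

    // The plotting function iterates over this, oldest point first.
    const std::deque<Point>& points() const { return points_; }

private:
    std::size_t maxPoints_;
    std::deque<Point> points_;
};
```

Each time a reading arrives you call add and then redraw from points(), which always holds at most the last maxPoints readings.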
You can use a ring buffer. Such a buffer has a read position and a write position, so one thread can write to the buffer while another reads from it and plots the graph. For efficiency you usually end up writing your own framework.
The size of such a buffer can be estimated from, e.g., the data delivery rate of the sensor (40 kHz?), the size of one sample, and the time span you would like to keep for plotting purposes.
It also depends on whether you would like to store the data uncompressed or store the rendered plot for further offline analysis. In a non-RTOS environment your "real time" depends on processing speed: how fast you can retrieve, store, process and plot the data. Usually you end up with near-real-time performance.
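For illustration, here is a sketch of such a ring buffer for exactly one writer thread and one reader thread, plus the size estimate described above. The names are hypothetical, and the design (atomic read and write indices, one slot kept free to tell "full" from "empty") is one common choice, not the only one. It assumes T is default-constructible and copyable.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Bytes needed to keep secondsToKeep of data, e.g. 40000 * 2 * 10.
constexpr std::size_t ringBufferBytes(std::size_t samplesPerSecond,
                                      std::size_t bytesPerSample,
                                      std::size_t secondsToKeep) {
    return samplesPerSecond * bytesPerSample * secondsToKeep;
}

// Single-producer, single-consumer ring buffer.
template <typename T>
class SpscRingBuffer {
public:
    explicit SpscRingBuffer(std::size_t capacity) : buffer_(capacity + 1) {
        assert(capacity > 0);
    }

    // Called by the sensor thread only. Returns false if the buffer is full.
    bool push(const T& value) {
        const std::size_t w = write_.load(std::memory_order_relaxed);
        const std::size_t next = (w + 1) % buffer_.size();
        if (next == read_.load(std::memory_order_acquire)) return false;
        buffer_[w] = value;
        write_.store(next, std::memory_order_release);
        return true;
    }

    // Called by the plotting thread only. Returns false if the buffer is empty.
    bool pop(T& out) {
        const std::size_t r = read_.load(std::memory_order_relaxed);
        if (r == write_.load(std::memory_order_acquire)) return false;
        out = buffer_[r];
        read_.store((r + 1) % buffer_.size(), std::memory_order_release);
        return true;
    }

private:
    std::vector<T> buffer_;
    std::atomic<std::size_t> write_{0};
    std::atomic<std::size_t> read_{0};
};
```

If push returns false the plotting side is not keeping up, and you have to decide whether to drop the new sample or enlarge the buffer.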
You might want to check out RRDtool to see whether it meets your requirements.
RRDtool is a high performance data logging and graphing system for time series data.
I did a similar thing for a device that had a permeability sensor attached via RS232.
package the bytes received from the sensor into packets
use a collection (typically a list) to store them
prevent the collection from growing beyond a fixed size by discarding the oldest values before new ones arrive
find a suitable graphics library to draw with (maybe SDL if you want to keep it easy and cross-platform), but this choice depends on what kind of graph you need (ncurses may be enough)
last but not least: since you are using a sensor, I suppose your approach will be multi-threaded, so think about that and use a synchronized collection or a collection that allows adding values while other threads are retrieving them (so forget iterators; maybe an array is enough)
Btw, I think there are many libraries for this; just search for them.
I assume that you will deploy this application on an RTOS. But what will the data rate be, and what are the real-time requirements? As written above, a simple solution may be more than enough. But if you have hard real-time constraints, everything changes drastically. A multi-threaded design with data pipes may solve your real-time problems.