I am trying to modify waveform data that I am getting through waveInOpen via WAVEHDR structs. I want to change the pitch of the sound.
All I have is a pointer to the raw audio data and the number of bytes used.
I am a little lost because I can't find any examples of how to do this.
I would be really thankful for a starting point on how to edit raw waveform data (or even an example of how to change the pitch would be really awesome).
Thanks!
You can change the pitch by changing the playback rate. Say, for example, you recorded a waveform at a 48 kHz sampling rate and then, when you played it back, you told the system that the sample rate was 96 kHz. The pitch of everything would double. The playback duration would also halve, which you may not want.

An alternative to changing the sample rate is to add or remove samples, which achieves basically the same effect. Contrary to the other answer, it is not as simple as arbitrarily adding or removing samples: when you remove samples you need to apply low-pass filtering to prevent aliasing, and when you insert samples you need to apply an interpolation filter. These are not trivial if you don't have a signal processing background.

Finally, if your goal is to shift the pitch but keep the original duration, you need to look at something like a phase vocoder.
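As a starting point, here is a deliberately naive sketch of the add/remove-samples approach using linear interpolation. `resamplePitch` and its signature are my own invention for illustration; a real implementation would add the low-pass/interpolation filtering described above, and note that this changes the duration along with the pitch:

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Crude pitch shift by resampling with linear interpolation.
// ratio > 1.0 raises the pitch (and shortens the clip); ratio < 1.0 lowers it.
// NOTE: no anti-aliasing filter is applied, so expect audible artifacts
// when raising the pitch of real-world material.
std::vector<int16_t> resamplePitch(const std::vector<int16_t>& in, double ratio)
{
    std::vector<int16_t> out;
    if (in.empty() || ratio <= 0.0)
        return out;
    out.reserve(static_cast<size_t>(in.size() / ratio) + 1);

    for (double pos = 0.0; pos + 1.0 < in.size(); pos += ratio) {
        size_t i = static_cast<size_t>(pos);
        double frac = pos - i;
        // Linear interpolation between the two neighbouring samples.
        double s = in[i] * (1.0 - frac) + in[i + 1] * frac;
        out.push_back(static_cast<int16_t>(std::lround(s)));
    }
    return out;
}
```

Feeding the modified samples back through the same WAVEHDR buffers would then play the pitch-shifted audio.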
Goal:
Measure the whole pipeline time that a frame needs from the stream source to the sink. The source is an IP camera, and we need to detect how long a frame takes from the camera to the sink. If the time is too high, we should show something on the display.
Can you guys explain to me how this measurement is possible in GStreamer?
Our GStreamer application is written in C++; hints or code examples are welcome.
Thank you so much, guys!
You can do this with pad probes perhaps:
https://gstreamer.freedesktop.org/documentation/application-development/advanced/pipeline-manipulation.html#using-probes
Depending on your pipeline behavior, you would choose the earliest element that can access reasonable data (not sure what the camera delivers as samples in your case), record the current system time against the sample's DTS/PTS (frame reordering may be a pitfall here), and do the same thing at the last pad you have access to.
Compare the system times of a sample with the same PTS/DTS and you have the time delta the sample spent in the pipeline. Depending on your required accuracy, this may be a good enough estimate.
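To make the bookkeeping concrete, here is a small sketch of the tracking you would drive from two buffer probes (installed with `gst_pad_add_probe`, passing `GST_BUFFER_PTS(buffer)` as the key). `PtsLatencyTracker` is a made-up helper name, the probe callbacks themselves are omitted, and plain `std::chrono` stands in for GStreamer's clock:

```cpp
#include <chrono>
#include <cstdint>
#include <unordered_map>

// Tracks how long a buffer with a given PTS spends between two probe points.
// In a real application, a buffer probe on an early pad would call
// recordEntry() and a probe on the sink pad would call measureExit().
class PtsLatencyTracker {
public:
    using Clock = std::chrono::steady_clock;

    void recordEntry(uint64_t pts) { entries_[pts] = Clock::now(); }

    // Returns the latency in milliseconds, or -1 if the PTS was never seen.
    int64_t measureExit(uint64_t pts) {
        auto it = entries_.find(pts);
        if (it == entries_.end())
            return -1;
        auto delta = std::chrono::duration_cast<std::chrono::milliseconds>(
            Clock::now() - it->second).count();
        entries_.erase(it);  // avoid unbounded growth on a live stream
        return delta;
    }

private:
    std::unordered_map<uint64_t, Clock::time_point> entries_;
};
```

Your application could then compare `measureExit()` results against a threshold and trigger the on-screen warning when it is exceeded.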
I am playing back audio files in a program, and in the audio rendering callbacks, I apply a gain multiplier to the input signal and add it to the output buffer. Here's some pseudo code to illustrate my actions:
void audioCallback(AudioOutputBuffer* ao, AudioInput* ai, int startSample, int numSamples){
    for (int i = startSample; i < startSample + numSamples; i++){
        ao[i] = ai[i] * gain;
    }
}
Basically I just multiply the data by some multiplier. In this case, gain is a float member that is being adjusted via a GUI callback. If I adjust this value while the audio is still playing, I can hear that the audio is getting softer or louder when I move the slider, but I hear lots of little pops and clicks.
Not really sure what the deal is. I know about interpolation, and I do that if the audio is pitch shifted, but I'm not sure if I need to do any extra interpolation or something if the gain is being adjusted in real time before the audio file is finished playing.
If I adjust the slider before the audio starts playing, the gain is set properly and I get no clicks.
Am I missing something here? How else is gain implemented but a multiplier on the input signal?
Question: how does the multiplication operator know which operand is the audio signal and which one is the gain? Answer: it doesn't. They're both audio signals, and anything audible in either one will be audible in the output.
A flat, unchanging signal doesn't produce any audible sounds. As long as the gain remains constant, it won't introduce any sound of its own.
A signal that changes abruptly will be very audible: it sounds like a click, containing lots of high frequencies.
As you've determined on your own, one way to reduce the high frequency content and thus the audibility is to stretch out the change over a number of samples, using a constant slope. This would certainly suffice in an application where you have lots of time to make the gain change.
Another way would be to run a low-pass filter on the gain signal and use that as the input to the multiplication.
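As an illustration of that second approach, here is a minimal sketch of a one-pole low-pass smoother applied to the gain before the multiplication. The function name, signature, and coefficient are my own choices, not from the original code:

```cpp
#include <cstddef>

// Smooths per-sample gain changes with a one-pole low-pass filter so the
// gain "signal" contains no audible high-frequency jumps.
// coeff closer to 1.0 means slower, smoother transitions.
void applySmoothedGain(float* out, const float* in, size_t numSamples,
                       float targetGain, float& smoothGain, float coeff)
{
    for (size_t i = 0; i < numSamples; ++i) {
        // Move the effective gain a small step toward the target each sample.
        smoothGain = coeff * smoothGain + (1.0f - coeff) * targetGain;
        out[i] = in[i] * smoothGain;
    }
}
```

Because `smoothGain` persists across callback invocations, the GUI thread can set `targetGain` at any time without producing a discontinuity in the output.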
I fixed it by changing the gain in increments of the amount changed. For instance, if the gain multiplier was set to 1.0 and then changed to 0.8, that's a difference of 0.2. For each sample in the callback, add difference / numSamples to the previous gain to create a gradual gain change.
I'm currently working on a small project where I'm loading 16bit wave files with a sample rate of 44100Hz. In normal playback the audio seems fine but as soon as I start to play with things like amplitude size to change the volume it starts giving a little bit of static noise.
What I'm doing is getting a sample from the buffer (in the case of this 16-bit type, a short) and converting it to a float in the range of -1 to 1 to start doing mixing and other effects. At this point I also change the volume. When I just multiply by 1, giving the same output, it's fine, but as soon as I start to change the volume I hear the static noise. It happens when going above 1.0 as well as below 1.0, and it gets worse the bigger or smaller the scale.
Does anyone have an idea how to reduce or remove the noise?
"Static", otherwise known as "clicks and pops", is the result of discontinuities in the output signal. Here is a perfect example of a discontinuity:
http://en.wikipedia.org/wiki/File:Discontinuity_jump.eps.png
If you send a buffer of audio to the system to play back, and then for the next buffer you multiply every sample by 1.1, you can create a discontinuity. For example, consider a buffer that contains a sine wave with values from [-0.5, 0.5]. You send a piece of this wave to the output device, and the last sample happens to be 0.5.
Now on your next buffer you try to adjust the volume by multiplying by 1.1. The first sample of the new buffer will be close to 0.5 (since the previous sample was 0.5). Multiply that by 1.1 and you get 0.55.
A change from one sample to the next of 0.05 will probably sound like a click or a pop. If you create enough of these, it will sound like static.
The solution is to "ramp" your volume change over the buffer. For example, if you want to apply a gain of 1.1 to a buffer of 100 samples, and the previous gain was 1.0, then you would loop over all 100 samples starting with gain 1 and smoothly increase the gain until you reach the last sample, at which point your gain should be 1.1.
If you want an example of this code look at juce::AudioSampleBuffer::applyGainRamp:
http://www.rawmaterialsoftware.com/api/classAudioSampleBuffer.html
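A minimal sketch of such a ramp, in the spirit of applyGainRamp (the names and signature here are illustrative, not JUCE's actual API):

```cpp
#include <cstddef>

// Linearly interpolates the gain from startGain to endGain across the buffer,
// so there is no audible jump between consecutive samples.
void applyGainRamp(float* buffer, size_t numSamples,
                   float startGain, float endGain)
{
    if (numSamples == 0)
        return;
    float step = (endGain - startGain) / static_cast<float>(numSamples);
    float gain = startGain;
    for (size_t i = 0; i < numSamples; ++i) {
        buffer[i] *= gain;
        gain += step;  // the next buffer should then start at endGain
    }
}
```

The key point is remembering the gain you ended the previous buffer with and using it as `startGain` for the next one.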
I found the flaw: I was abstracting different bit-depth data types by accessing their data through a char*, and I did not cast it to the correct data-type pointer before use. This meant bytes were cut off when writing data, which created the noise and the volume-changing bugs, among others.
It was a flaw in my implementation and in not thinking about this when working with the audio data. A tip for anyone doing the same kind of thing: keep a good eye on the data you modify, and check which type your data is when using abstractions.
Many thanks to the guys trying to help me; the links were really interesting and taught me more about audio programming.
I'm writing a file compressor utility in C++ that I want to support PCM WAV files; however, I want to keep them PCM-encoded and just convert them to a lower sample rate, and from stereo to mono if applicable, to yield a smaller file size.
I understand the WAV file header, however I have no experience or knowledge of how the actual sound data works. So my question is, would it be relatively easy to programmatically manipulate the "data" sub-chunk in a WAV file to convert it to another sample rate and change the channel number, or would I be much better off using an existing library for it? If it is, then how would it be done? Thanks in advance.
PCM merely means that the value of the original signal is sampled at equidistant points in time.
For stereo, there are two sequences of these values. To convert them to mono, you merely take the element-wise average of the two sequences.
Resampling the signal at lower sampling rate is a little bit more tricky -- you have to filter out high frequencies from the signal so as to prevent alias (spurious low-frequency signal) from being created.
I agree with avakar and nico, but I'd like to add a little more explanation. Lowering the sample rate of PCM audio is not trivial unless two things are true:
Your signal only contains significant frequencies lower than 1/2 the new sampling rate (the Nyquist frequency). In this case you do not need an anti-aliasing filter.
You are downsampling by an integer factor. In this case, downsampling by N just requires keeping every Nth sample and dropping the rest.
If these are true, you can just drop samples at a regular interval to downsample. However, they are both probably not true if you're dealing with anything other than a synthetic signal.
To address problem one, you will have to filter the audio samples with a low-pass filter to make sure the resulting signal only contains frequency content up to 1/2 the new sampling rate. If this is not done, high frequencies will not be accurately represented and will alias back into the frequencies that can be properly represented, causing major distortion. Check out the critical frequency section of this Wikipedia article for an explanation of aliasing. Specifically, see figure 7, which shows 3 different signals that are indistinguishable from just the samples because the sampling rate is too low.
Addressing problem two can be done in multiple ways. Sometimes it is performed in two steps: an upsample followed by a downsample, therefore achieving rational change in the sampling rate. It may also be done using interpolation or other techniques. Basically the problem that must be solved is that the samples of the new signal do not line up in time with samples of the original signal.
As you can see, resampling audio can be quite involved, so I would take nico's advice and use an existing library. Getting the filter step right will require you to learn a lot about signal processing and frequency analysis. You won't have to be an expert, but it will take some time.
I don't think there's really the need of reinventing the wheel (unless you want to do it for your personal learning).
For instance, you can try using libsndfile.
I have several chunks of PCM audio (G.711) in my C++ application. I would like to visualize the different audio volume in each of these chunks.
My first attempt was to calculate the average of the sample values for each chunk and use that as a volume indicator, but this doesn't work well. I do get 0 for chunks with silence and differing values for chunks with audio, but the values only differ slightly and don't seem to resemble the actual volume.
What would be a better algorithm to calculate the volume?
I hear G.711 audio is logarithmic PCM. How should I take that into account?
Note, I haven't worked with G.711 PCM audio myself, but I presume that you are performing the correct conversion from the encoded amplitude to an actual amplitude before processing the values.
You'd expect the average value of most samples to be approximately zero as sound waveforms oscillate either side of zero.
A crude volume calculation is RMS (root mean square): take a rolling average of the square of the samples and take the square root of that average. This gives you a positive quantity when there is some sound; the quantity is related to the power represented in the waveform.
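A per-chunk RMS can be computed in a few lines, assuming the samples have already been decoded from G.711 to floats in [-1, 1] (`rmsLevel` is a made-up name for illustration):

```cpp
#include <cmath>
#include <cstddef>

// Returns the RMS (root mean square) level of a chunk of decoded samples
// in the range [-1, 1]. 0 means silence; larger values mean more power.
double rmsLevel(const float* samples, size_t numSamples)
{
    if (numSamples == 0)
        return 0.0;
    double sumSquares = 0.0;
    for (size_t i = 0; i < numSamples; ++i)
        sumSquares += static_cast<double>(samples[i]) * samples[i];
    return std::sqrt(sumSquares / numSamples);
}
```

Unlike a plain average, the squaring means positive and negative half-waves don't cancel, which is why the simple mean stayed near zero for loud audio.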
For something better related to human perception of volume you may want to investigate the sort of techniques used in Replay Gain.
If you're feeling ambitious, you can download G.711 from the ITU web site and spend the next few weeks (or maybe more) implementing it.
If you're lazier (or more sensible) than that, you can download G.191 instead -- it includes source code to compress and decompress G.711 encoded data.
Once you've decoded it, visualizing the volume should be a whole lot easier.
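For reference, the decode step itself is short for the mu-law variant of G.711. This is the well-known formula from the Sun/CCITT reference implementation, not taken from G.191 directly, and it assumes mu-law (A-law decoding differs):

```cpp
#include <cstdint>

// Decodes one G.711 mu-law byte to a linear 16-bit sample.
// Classic approach: undo the bit inversion, rebuild the biased mantissa,
// shift by the 3-bit exponent, then remove the bias and apply the sign.
int16_t ulawToLinear(uint8_t u)
{
    u = ~u;                            // mu-law bytes are stored inverted
    int t = ((u & 0x0F) << 3) + 0x84;  // 4-bit mantissa plus bias of 132
    t <<= (u & 0x70) >> 4;             // apply the exponent
    return static_cast<int16_t>((u & 0x80) ? (0x84 - t) : (t - 0x84));
}
```

Running each chunk through this before the volume calculation accounts for the logarithmic encoding mentioned in the question.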