FFTW 3.3.3 basic usage with real data - C++

I'm a newbie in FFT and I was asked to find a way to analyse/process a particular set of data collected by oil drilling rigs.
There is a lot of noise in the collected data due to rig movements (up & down with tides and waves for example).
I was asked to clean the collected data up with FFT=>filtering=>IFFT.
I use C++ and the FFTW 3.3.3 library.
An example is better than anything else so :
I have a DB with, for example, the mudflow (liters per minute). The mudflow is collected every 5 seconds, and there is a timestamp in the DB for every measurement (e.g. 1387411235).
So my IN_data for my FFT is a series of timestamp/mudflow pairs (e.g. 1387456630/3955.94, 1387456635/3954.92, etc.).
Displaying these data really looks like a noisy sound signal, and relevant events may be masked by the noise.
Using examples found on the Internet I can manage to perform an FFT, but my lack of knowledge and understanding is a big problem, as I've never worked on signal processing or Fourier transforms.
I don't really know how to get started with this job: which FFTW routine to use (c2c, r2c, etc.), or whether there is any pre-processing and/or post-processing of the data to do.
I've read a lot of examples and tutorials on the Internet, but I'm French (sorry for my mistakes here) and they don't always make sense to me, especially regarding OUT_data units, OUT_data type, IN and OUT data array sizes, and windowing (what is that, by the way?). To put it in a nutshell: I'm lost...
I suppose that my problem would be pretty straightforward for someone used to FFTW but for me it's very complicated right now.
So my questions :
Which FFTW routine to use in both directions (FFT & IFFT), and what kind, type, and size of array to use for IN_data and OUT_data?
How to interpret the resulting array (what units does FFTW return)?
For now a short sample of what I've done is :
// "in" and "out" are fftw_complex arrays of length "size", e.g. allocated with fftw_malloc:
fftw_complex *in  = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * size);
fftw_complex *out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * size);

fftw_plan p = fftw_plan_dft_1d(size, in, out, FFTW_FORWARD, FFTW_ESTIMATE); // the cast to fftw_plan is unnecessary
fftw_execute(p);
fftw_destroy_plan(p);
with "in" and "out" as fftw_complex (the complex element of my In_data array is set to 1 for every data, don't really know why but the tutorial said to do that).
This code is based on an example found on the Internet, but my lack of knowledge/understanding is a big drag, and I was wondering if there was someone here who could give me explanations/workflow/insights/links on how to pull this off.
I'm in a trial period for my new job and I really want to implement this feature for my boss, even if it means asking around for help; I've seen a lot of FFTW-skilled posts here...

This is quite an ambitious project for someone who is completely new to DSP, but you can start by reading about the overlap-add method, which is essentially the method you need for your FFT-filter-IFFT approach to cleaning up this data. You should also check out the DSP StackExchange site dsp.stackexchange.com, where the theoretical background and application of frequency domain filtering is covered in several similar questions/answers.
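Since your input is real-valued (one mudflow reading every 5 seconds), a minimal whole-buffer sketch of the FFT => filter => IFFT round trip with FFTW's real-to-complex routines could look like the code below. This is not the overlap-add streaming version; the cutoff_bin parameter and the assumption of uniformly spaced samples are purely illustrative, and note that FFTW's inverse transform is unnormalized, so the result must be divided by N.

#include <fftw3.h>
#include <vector>

// Very rough "clean up" sketch: forward r2c FFT, zero the bins above a cutoff,
// then c2r IFFT.  "cutoff_bin" is a made-up illustrative parameter; the real
// cutoff depends on which frequencies (e.g. tide/wave motion) you want to remove.
std::vector<double> lowpass(const std::vector<double>& samples, int cutoff_bin)
{
    const int n = static_cast<int>(samples.size());
    std::vector<double> in(samples);                         // working copy (FFTW may overwrite it)
    fftw_complex* spectrum = fftw_alloc_complex(n / 2 + 1);  // r2c output has n/2+1 bins

    fftw_plan fwd = fftw_plan_dft_r2c_1d(n, in.data(), spectrum, FFTW_ESTIMATE);
    fftw_execute(fwd);
    fftw_destroy_plan(fwd);

    // Crude brick-wall filter: kill everything above the cutoff bin.
    for (int k = cutoff_bin; k <= n / 2; ++k) {
        spectrum[k][0] = 0.0;   // real part
        spectrum[k][1] = 0.0;   // imaginary part
    }

    std::vector<double> out(n);
    fftw_plan bwd = fftw_plan_dft_c2r_1d(n, spectrum, out.data(), FFTW_ESTIMATE);
    fftw_execute(bwd);
    fftw_destroy_plan(bwd);
    fftw_free(spectrum);

    for (double& v : out) v /= n;                            // FFTW's inverse is unnormalized
    return out;
}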

Related

How to add sound effects to PCM buffered audio in C++

I have an int16_t[] buffer with PCM raw audio data and I want to apply some effects (like echo, reverb, gain...) into it.
I thought that SoX or something similar could do the trick for me, but SoX only works with files, and other similar libraries that support adding sound effects seem to apply the effects only when the sound is played. My problem is that I want to apply the effects to the samples in my buffer without playing them.
I have never worked with audio, but reading about PCM data I have learned that I can apply gain by multiplying each sample value, for example. But I'm looking for a library or relatively easy algorithms that I can use directly on my buffer to get the sound effects applied.
I'm sure there are a lot of solutions to my problem out there if you know what to look for, but it's my first time with audio "processing" and I'm lost, as you can see.
For everyone like me who is interested in learning audio-related DSP with C++, I want to share my little research results and opinion, and perhaps save you some time :)
After trying several DSP libraries, I finally found The Synthesis ToolKit in C++ (STK), an open-source library that offers easy, clear interfaces and easy-to-understand code that you can dive into to learn about various basic DSP algorithms.
So I recommend that anyone who is starting out with no previous experience take a look at this library.
Your int16_t[] buffer contains a sequence of samples. They represent instantaneous amplitude levels. Think of them as the voltage to apply to the speaker at the corresponding instant in time. They are signed numbers with values in the range [-32768, 32767]. A stream of constant zeros means silence. A stream of constant -32000 (for example) also means silence, but it will eventually burn your speaker coil. The position in the array represents time, and the value of each sample represents voltage.
If you want to mix two sample streams together, for example to apply a chirp, you get yourself a sample stream with the chirp in it (record a bird or something). You then add the two sounds sample by sample.
You can do a super-cheesy reverb effect by taking your original sound buffer, lowering its volume (perhaps by dividing all the samples by a constant), and adding it back to your original stream, but shifting the samples by a tenth of a second's worth of array position.
Those are the basics of audio processing. Things get very sophisticated indeed. This field is known as "digital signal processing" and there are plenty of books on the subject.
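To make the gain, mixing, and shift-and-add ideas above concrete, here is a minimal sketch that works directly on an int16_t buffer. The 0.5 gain factor, the 44100 Hz sample rate, and the 0.1 s delay are illustrative assumptions, and the intermediate math is done in int32_t and clamped so that the additions can't overflow the sample range.

#include <cstdint>
#include <vector>
#include <algorithm>

// Clamp an int32_t intermediate result back into the int16_t sample range.
static int16_t clamp16(int32_t v)
{
    return static_cast<int16_t>(std::max<int32_t>(-32768, std::min<int32_t>(32767, v)));
}

// Gain: multiply every sample by a factor (e.g. 0.5 halves the volume).
void applyGain(std::vector<int16_t>& buf, float gain)
{
    for (auto& s : buf)
        s = clamp16(static_cast<int32_t>(s * gain));
}

// Mix: add a second stream (e.g. a recorded chirp) sample by sample.
void mix(std::vector<int16_t>& buf, const std::vector<int16_t>& other)
{
    const size_t n = std::min(buf.size(), other.size());
    for (size_t i = 0; i < n; ++i)
        buf[i] = clamp16(static_cast<int32_t>(buf[i]) + other[i]);
}

// Super-cheesy echo/reverb: add an attenuated copy of the signal delayed by
// roughly a tenth of a second (assuming a 44100 Hz mono stream).
void cheesyEcho(std::vector<int16_t>& buf, int sampleRate = 44100)
{
    const size_t delay = static_cast<size_t>(sampleRate / 10);
    for (size_t i = delay; i < buf.size(); ++i)
        buf[i] = clamp16(static_cast<int32_t>(buf[i]) + buf[i - delay] / 2);
}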
You can either hack the audio buffer and try to do effects like gain and thresholding with simple math operations, or do it properly using proper DSP algorithms. If you wish to do it properly, I would recommend using the Speex library. It's open source and well tested: www.speex.org. The code should compile on MSVC or Linux with minimal effort. This is the fastest way to get good audio code working with proper DSP techniques. Your code would look something like the following (please also read the AEC example):
#include <speex/speex_echo.h>
#include <speex/speex_preprocess.h>

SpeexEchoState *st = speex_echo_state_init(NN, TAIL);             // NN = frame size, TAIL = filter length
SpeexPreprocessState *den = speex_preprocess_state_init(NN, sampleRate);
speex_echo_ctl(st, SPEEX_ECHO_SET_SAMPLING_RATE, &sampleRate);
speex_preprocess_ctl(den, SPEEX_PREPROCESS_SET_ECHO_STATE, st);
You need to set up the states; the testecho example code includes these steps.
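As a hedged sketch of how those states are then used per frame (NN is the frame size from above, and the buffer names are assumptions modelled on the echo-cancellation example rather than your code):

// NN is the frame size used above (e.g. #define NN 160 in the testecho example).
spx_int16_t mic[NN], speaker[NN], clean[NN];

// For every captured frame:
//   mic[]     - the frame just recorded from the microphone
//   speaker[] - the frame that was just sent to the output device
speex_echo_cancellation(st, mic, speaker, clean);  // echo-cancelled frame ends up in clean[]
speex_preprocess_run(den, clean);                  // in-place denoise/AGC on the cleaned frame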

Testing ZXing - some discoveries

Just to start off, thanks for making ZXing freely available.
I have been working in the barcode field since '98, and have a fair bit of experience decoding barcodes from images.
I consult now, and one of my products is a program that tests barcode decoders. This is an extremely CPU-intensive test, with literally millions of images being tested per symbology. I test with different amounts of blur, different angles, different brightness and contrast, different sizes (pixels per module), curvature, perspective distortion, ripple, varying illumination ... the whole shebang. I am testing the C++ implementation on a Linux machine.
I have come across a few issues that may interest the developers:
1) Aztec has a bug that crashes sometimes. In AztecDecoder.cpp, in Decoder::correctBits, sometimes numECCCodewords takes on a negative value. Bad things happen after that. I was able to patch with a simple test and a "throw" to complete my testing.
2) Aztec has a huge memory leak. In no time at all, my program is taking over one GB of RAM, and the number steadily increases over time. It gets bad enough that the OS starts bogging down and reboots after a while. I don't seem to have this problem with other symbologies.
3) You don't seem to make the version number available in the code. I know, you should keep track of which version you downloaded. Life isn't always like that; sometimes you inherit code and had nothing to do with downloading it. Even a simple #define in a header file would suffice (see the sketch after this list). I like to display the version number of the decoder in my test program. I get this value directly from the decoder so that if I upgrade decoders, the reported version automatically changes. Just a nice thing to have.
4) Aztec appears to be extremely weak. I know, it's only alpha, but it hardly ever decodes. I don't mean to pick on Aztec, but it gave me the most issues.
5) The entire UPC family misdecodes (returns incorrect information for a correctly encoded barcode) extremely frequently. You may want to put some protection in there. ITF misdecodes a fair bit as well. These are all well-known weaknesses.
6) Aztec misdecodes as well. This should pretty well never happen with a Reed-Solomon error correction system. I haven't taken the time to look into this yet.
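As an illustration of point 3 above, a version header could be as small as the following sketch (the ZXING_* names and the numbers are purely hypothetical, not taken from the ZXing sources):

// Hypothetical zxing/Version.h -- names and numbers are illustrative only.
#ifndef ZXING_VERSION_H
#define ZXING_VERSION_H

#define ZXING_VERSION_MAJOR 2
#define ZXING_VERSION_MINOR 1
#define ZXING_VERSION_STRING "2.1"

#endif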
Those were the major problems, now for comments:
1) There is no support for barcodes at angles (omnidirectional decoding). This is supported by all major commercial packages.
2) Decoding appears to be weak overall in terms of handling blur and low pixels per module. Yes, I realize that it is a free package; I'm just stating the weak points.
That's about all that I found now. I'll update when I have more information.

Looking for Ideas: How would you start to write a geo-coder?

Because the open source geo-coders cannot begin to compare to Google's or even Yahoo's, I would like to start a project to create a good open source geo-coder. Just to clarify, a geo-coder takes some text (usually with some constraints) and returns one or more lat/lon pairs.
I realize that this is a difficult and gargantuan task, so I am wondering how you might get started. What would you read? What algorithms would you familiarize yourself with? What code would you review?
And also, assuming you were going to develop this very agilely, what would you want the first prototype to be able to do?
EDIT: Let's set aside the data question for now. I am going to use OpenStreetMap data, along with a database of waypoints that I have. I would later plan to include other data sets as well, and I realize the geo-coder would be inherently limited by the quality of the original data.
The first (and probably blocking) problem would be: where do you get your data from? (unless you are willing to pay thousands of dollars for proprietary sets).
You could build a geocoding API on top of OpenStreetMap (they publish their data in dumps on a regular basis), I guess, but its data was still very incomplete last time I checked.
Algorithms are easy. Good mapping data, however, is expensive. Very expensive.
Google drove their cars all over the world, collecting this data among other things.
From a .NET point of view these articles might be interesting for you:
Writing Your Own GPS Applications: Part I
Writing Your Own GPS Applications: Part 2
Writing GIS and Mapping Software for .NET
I've only glanced at the articles but they've been on CodeProject's 'Most Popular' list for a long time.
And maybe this CodePlex project which the author of the articles above made available.
I would start at the absolute beginning by figuring out how you're going to get the data that matches a street address with a geocode. Either Google had people going around with GPS units, OR they got the information from some existing source. That existing source may have been... (all guesses)
The Postal Service
Some existing (printed) maps
A bunch of enthusiastic users, early adopters of GPS technology, who were more than willing to enter street addresses and GPS coordinates
Some government entity (or entities)
Their own satellites
etc
I guess what I'm getting at is the information was either imported from somewhere or was input by someone via some interface. As my starting point I would look at how to get that information. In an open source situation, you may be able to get a bunch of enthusiastic people to enter information.
So for my first prototype, boring as it would be, I would create a form for entering information.
Then you need to know the math for figuring out the closest distance (as the crow flies). From there, try to figure out how to include roads. (My guess is you would have to have a data point for each and every curve, where you hold the geocoded location of the curve and the angle of the road on a north/south and east/west vector. You'd probably need to take incline into account, too, to get accurate road measurements.)
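For the crow-flies part, here is a minimal sketch of the standard haversine great-circle distance (the Earth-radius constant and the function name are just illustrative choices):

#include <cmath>

// Great-circle ("as the crow flies") distance between two lat/lon points in
// decimal degrees, using the haversine formula.  Returns kilometres.
double crowFliesKm(double lat1, double lon1, double lat2, double lon2)
{
    constexpr double kPi = 3.14159265358979323846;
    constexpr double kEarthRadiusKm = 6371.0;   // mean Earth radius (an approximation)
    constexpr double kDegToRad = kPi / 180.0;

    const double dLat = (lat2 - lat1) * kDegToRad;
    const double dLon = (lon2 - lon1) * kDegToRad;

    const double a = std::sin(dLat / 2) * std::sin(dLat / 2) +
                     std::cos(lat1 * kDegToRad) * std::cos(lat2 * kDegToRad) *
                     std::sin(dLon / 2) * std::sin(dLon / 2);

    return 2.0 * kEarthRadiusKm * std::atan2(std::sqrt(a), std::sqrt(1.0 - a));
}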
That's just where I'd start.
But in all honesty, I wouldn't even start on this. Other programmers have done it already, I'm more interested in what hasn't already been done.
get my free raw data from somewhere like http://ipinfodb.com/ip_database.php
load it into a database, denormalizing for fast lookups
design my API
build it out as a RESTful web service
return results in varying formats: JSON, XML, CSV, raw text
The first prototype should accept a ZIP code and return lat/lon in raw text.
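A hedged sketch of that first prototype's core lookup, assuming a hypothetical zipcodes.csv with zip,lat,lon rows (the file name and format are made up for illustration):

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <unordered_map>

int main(int argc, char** argv)
{
    if (argc < 2) { std::cerr << "usage: geocode ZIP\n"; return 1; }

    // Load the hypothetical zip,lat,lon table into memory for fast lookups.
    std::unordered_map<std::string, std::pair<double, double>> table;
    std::ifstream csv("zipcodes.csv");
    std::string line;
    while (std::getline(csv, line)) {
        std::istringstream row(line);
        std::string zip, lat, lon;
        if (std::getline(row, zip, ',') && std::getline(row, lat, ',') && std::getline(row, lon, ','))
            table[zip] = { std::stod(lat), std::stod(lon) };
    }

    // Raw-text output: "lat lon", or an error if the ZIP is unknown.
    auto it = table.find(argv[1]);
    if (it == table.end()) { std::cerr << "unknown ZIP\n"; return 1; }
    std::cout << it->second.first << " " << it->second.second << "\n";
    return 0;
}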

Help with algorithm to dynamically update text display

First, some backstory:
I'm making what may amount to a "roguelike" game so I can exercise some interesting ideas I've got floating around in my head. The gameplay isn't going to be a dungeon crawl, but in any case the display is going to be done in a similar fashion, with simple ASCII characters.
Since this is a self-exercise, I endeavor to code most of it myself.
Eventually I'd like the game to run on arbitrarily large game worlds (to the point where I envision having the game networked and spanning many monitors in a computer lab).
Right now, I've got some code that can read and write to arbitrary sections of a text console, and a simple partitioning system set up so that I can path-find efficiently.
And now the question:
I've run some benchmarks, and the biggest bottleneck is the redrawing of the text console.
Having a game world that large will require an intelligent update of the display. I don't want to have to re-push my entire game buffer every frame... I need some pointers on how to set it up so that it only redraws the sections of the game that have been updated (and not just individual characters, as I've got now).
I've been manipulating the Windows console via windows.h, but I would also be interested in getting it to run on Linux machines over a PuTTY client connected to the server.
I've tried adapting some video-processing routines, as there is nearly a 1:1 ratio between pixel and character, but I had no luck.
Really I want a simple explanation of some of the principles behind it, but some example (pseudo)code would be nice too.
Use curses, or if you need to do it yourself, read about the VTnnn control codes. Both of these should work on Windows and on *nix terminals and consoles. You can also consult the NetHack source code for hints. This will let you change characters on the screen wherever changes have happened.
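For the curses route, here is a minimal sketch using ncurses; the Cell struct and its dirty flag are illustrative assumptions about how your game buffer might be organized, not part of the curses API:

#include <curses.h>

// Hypothetical game cell: the character to show and whether it changed this frame.
struct Cell { char glyph; bool dirty; };

void drawFrame(Cell* cells, int rows, int cols)
{
    for (int y = 0; y < rows; ++y)
        for (int x = 0; x < cols; ++x)
            if (cells[y * cols + x].dirty) {
                mvaddch(y, x, cells[y * cols + x].glyph);  // write only changed cells
                cells[y * cols + x].dirty = false;
            }
    refresh();  // curses itself sends only the minimal terminal updates
}

int main()
{
    initscr();       // start curses mode
    cbreak();
    noecho();
    curs_set(0);     // hide the cursor

    // ... game loop: update cells, call drawFrame(...), read input with getch() ...

    endwin();        // restore the terminal
    return 0;
}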
I am not going to claim to understand this, but I believe this is close to the issue behind James Gosling's legendary Gosling Emacs redrawing code. See his paper, titled appropriately, "A Redisplay Algorithm", and also the general string-to-string correction problem.
"Having a game world that large will require an intelligent update of the display. I don't want to have to re-push my entire game buffer every frame... I need some pointers on how to set it up so that it only redraws the sections of the game that have been updated (and not just individual characters, as I've got now)."
The size of the game world isn't really relevant, as all you need to do is work out the visible area for each client and send that data. If you have a typical 80x25 console display then you're going to be sending just 2 or 3 kilobytes of data each time, even if you add in colour codes and the like. This is typical of most online games of this nature: update what the person can see, not everything in the world.
If you want to experiment with trying to find a way to cut down what you send, then feel free to do that for learning purposes, but we're about 10 years past the point where it was inefficient to update a console display in something approaching real time, and it would be a shame to waste time fixing a problem that doesn't need fixing. Note that the PDF linked above gives an O(ND) solution, whereas simply sending the entire console is half of O(N), where N is defined as the sum of the lengths of A and B, and D is the number of differences between them.
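As a hedged sketch of both ideas (extract only the client's visible window from the large world, then emit ANSI/VT cursor moves only for cells that differ from the previously sent frame), assuming an 80x25 view and a flat character buffer:

#include <cstdio>
#include <vector>

constexpr int kViewRows = 25, kViewCols = 80;   // assumed client view size

// Copy the client's visible window out of the (much larger) world buffer.
void extractView(const std::vector<char>& world, int worldCols,
                 int originRow, int originCol, std::vector<char>& view)
{
    view.resize(kViewRows * kViewCols);
    for (int y = 0; y < kViewRows; ++y)
        for (int x = 0; x < kViewCols; ++x)
            view[y * kViewCols + x] = world[(originRow + y) * worldCols + (originCol + x)];
}

// Emit VT100/ANSI sequences only for cells that changed since the last frame.
void sendDiff(const std::vector<char>& prev, const std::vector<char>& curr, FILE* out)
{
    for (int y = 0; y < kViewRows; ++y)
        for (int x = 0; x < kViewCols; ++x) {
            const int i = y * kViewCols + x;
            if (prev.empty() || prev[i] != curr[i])
                std::fprintf(out, "\x1b[%d;%dH%c", y + 1, x + 1, curr[i]);  // move cursor, write char
        }
    std::fflush(out);
}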

How do I write a Perl script to filter out digital pictures that have been doctored?

Last night before going to bed, I browsed through the Scalar Data section of Learning Perl again and came across the following sentence:
the ability to have any character in a string means you can create, scan, and manipulate raw binary data as strings.
An idea immediately hit me that I could actually let Perl scan the pictures that I have stored on my hard disk to check if they contain the string Adobe. It seems by doing so, I can tell which of them have been photoshopped. So I tried to implement the idea and came up with the following code:
#!perl
use autodie;
use strict;
use warnings;

{
    local $/ = "\n\n";
    my $dir = 'f:/TestPix/';
    my @pix = glob "$dir/*";
    foreach my $file (@pix) {
        open my $pic, '<', $file;
        while (<$pic>) {
            if (/Adobe/) {
                print "$file\n";
            }
        }
    }
}
Excitingly, the code seems to really work, and it does the job of filtering out the pictures that have been photoshopped. But the problem is that many pictures are edited by other utilities. I think I'm kind of stuck there. Is there some simple but universal method to tell whether a digital picture has been edited or not, something like
if ($data !~ /the original format/) {...}
Or do we simply have to add more conditions? Like
if (/Adobe|ACDSee|some other picture editor/)
Any ideas on this? Or am I oversimplifying due to my miserably limited programming knowledge?
Thanks, as always, for any guidance.
Your best bet in Perl is probably ExifTool. This gives you access to whatever non-image information is embedded into the image. However, as other people said, it's possible to strip this information out, of course.
I'm not going to say there is absolutely no way to detect alterations in an image, but the problem is extremely difficult.
The only person I know of who claims to have an answer is Dr. Neal Krawetz, who claims that digitally altered parts of an image will have different compression error rates from the original portions. He claims that re-saving a JPEG at different quality levels will highlight these differences.
I have not found this to be the case, in my investigations, but perhaps you might have better results.
No. There is no functional distinction between a perfectly edited image, and one which was the way it is from the start - it's all just a bag of pixels in the end, after all, and any other metadata you can remove or forge all you want.
The name of the graphics program used to edit the image is not part of the image data itself but part of something called metadata, which may be stored in the image file but, as others have noted, is neither required (some programs may not store it, and some may offer an option not to store it) nor reliable: if you forged an image, you might have forged the metadata as well.
So the answer to your question is: no, there's no way to universally tell whether the picture was edited or not, although some image-editing software may write its signature into the image file, and it may be left there through the carelessness of the person who edited it.
If you're inclined to learn more about image processing in Perl, you could take a look at some of the excellent modules CPAN has to offer:
Image::Magick - read, manipulate, and write a large number of image file formats
GD - create colour drawings using a large number of graphics primitives, and emit the drawings in various formats.
GD::Graph - create charts
GD::Graph3d - create 3D Graphs with GD and GD::Graph
However, there are other utilities available for identifying various image formats. It's more of a question for Super User, but for various unix distros you can use file to identify many different types of files, and for MacOSX, Graphic Converter has never let me down. (It was even able to open the bizarre multi-file X-ray of my cat's shattered pelvis that I got on a disc from the vet.)
How would you know what the original format was? I'm pretty sure there's no guaranteed way to tell if an image has been modified.
I can just open the file (with my favourite programming language and filesystem API) and just write whatever I want into that file willy-nilly. As long as I don't screw something up with the file format, you'd never know it happened.
Heck, I could print the image out and then scan it back in; how would you tell it from an original?
As others have stated, there is no way to know if the image was doctored. I'm guessing what you basically want to know is the difference between a realistic photograph and one that has been enhanced or modified.
There's always the option of running some extremely complex image-recognition algorithm that analyzes every pixel in your image and does some very complicated processing to determine whether the image was doctored or not. This solution would probably involve AI that examines millions of photos, both doctored and not, and learns from them. However, this is more of a theoretical solution and isn't very practical... you would probably only see it in movies. It would be extremely complex to develop and would probably take years. And even if you did get something like this to work, it probably still wouldn't be correct 100% of the time. I'm guessing AI technology isn't at that level yet and could take a while to get there.
A not-commonly-known feature of exiftool allows you to recognize the originating software through an analysis of the JPEG quantization tables (not relying on image metadata). It recognizes tables written by many applications. Note that some cameras may use the same quantization tables as some applications, so this isn't a 100% solution, but it is worth looking into. Here is an example of exiftool run on two images, the first was edited by photoshop.
> exiftool -jpegdigest a.jpg b.jpg
======== a.jpg
JPEG Digest : Adobe Photoshop, Quality 10
======== b.jpg
JPEG Digest : Canon EOS 30D/40D/50D/300D, Normal
2 image files read
This will work even if the metadata has been removed.
There is existing software out there which uses various techniques (compression artifacting, comparison to signature profiles in a database of cameras, etc.) to analyze the actual image data for evidence of alteration. If you have access to such software and the software available to you provides an API for external access to these analysis functions, then there's a decent chance that a Perl module exists which will interface with that API and, if no such module exists, it could probably be created rather quickly.
In theory, it would also be possible to implement the image analysis code directly in native Perl, but I'm not aware of anyone having done so and I expect that you'd be better off writing something that low-level and processor-intensive in a fully-compiled language (e.g., C/C++) rather than in Perl.
http://www.impulseadventure.com/photo/jpeg-snoop.html is a tool that does the job fairly well.
If there has been any cloning, there is a variation in pixel density or concentration that sometimes shows up on manual inspection: a Photoshop-cloned area will have a more even pixel density than the rest of a scanned image.