I am trying to read data from a .wav file and feed it to an FFT.
To read the WAV file I am using the libsndfile library.
SNDFILE* infile;
SF_INFO sfinfo;
memset(&sfinfo, 0, sizeof(sfinfo));
infile = sf_open("sound.wav", SFM_READ, &sfinfo);

double data[BUF_SIZE];
int readcount;
while ((readcount = (int) sf_readf_double(infile, data, BUF_SIZE)) > 0)
{
    for (int i = 0; i < readcount; i++)
    {
        cout << data[i] << " ";
    }
}
But all the values in this file (and in other files) are between -1 and 1.
Is this correct? Why are all the values so small? I expected to read the amplitude in the time domain (the volume of the sound).
This is the canonical format for floating point samples. With float values, you get full 32-bit precision. Clipping is also easy to represent: if a sample value is higher than 1 or lower than -1, the sample clipped. With integer values, there's no way to know that.
Floating point is also an easy sample format to apply operations to. Mixing, for example, is trivial: you just add the sample values together.
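For instance, a two-input mixer is just per-sample addition (a minimal sketch; the sum can leave [-1, 1], and that headroom is fine to carry around until the final conversion):

#include <cstddef>

// Mix two float sample streams by simple addition. Values may exceed
// [-1, 1]; that is acceptable until the final integer conversion.
void mix(const float* a, const float* b, float* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}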
So even if it looks weird at first, it is the best format for audio sample representation. Once you have applied the operations you need to the float values, you convert them to the format you want for output (such as 16-bit integers). This conversion is trivial. Here's a function that converts and clips float samples to any integer sample format in common use today:
#include <limits>

/* Convert and clip a float sample to an integer sample. This works for
 * all usual integer sample types (8-bit, 16-bit, 32-bit, signed or
 * unsigned.)
 */
template <typename T>
T floatSampleToInt(float src) noexcept
{
    if (src >= 1.f)
        return std::numeric_limits<T>::max();
    if (src < -1.f)
        return std::numeric_limits<T>::min();
    return src * (float)(1UL << (sizeof(T) * 8 - 1))
         + ((float)(1UL << (sizeof(T) * 8 - 1))
         + (float)std::numeric_limits<T>::min());
}
If you want to convert a float sample to a signed 16-bit integer sample for example, you do:
int16_t intSample = floatSampleToInt<int16_t>(floatSample);
Note that 24-bit integer samples are covered by 32-bit. A 32-bit sample is also a valid 24-bit sample; its lower 8 bits are just truncated.
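For example, a packed 24-bit sample can be produced from a 32-bit one by keeping the top three bytes (a sketch; the byte order shown is the usual WAV little-endian layout):

#include <cstdint>

// Emit a 32-bit sample as a packed little-endian 24-bit sample:
// keep the top three bytes, drop the lowest one.
void write24(int32_t s32, uint8_t out[3])
{
    out[0] = (uint8_t)(s32 >> 8);  // low byte of the 24-bit sample
    out[1] = (uint8_t)(s32 >> 16);
    out[2] = (uint8_t)(s32 >> 24); // high byte (carries the sign)
}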
I am in the process of making a simple program which loads vertices and triangles from a file (uints and floats).
They will be used in OpenGL, and I want them to be 16-bit (to conserve memory); however, I only know how to convert to 32-bit. I don't want to use assembly, because I want it to run on ARM as well.
So, is it possible to convert a string to a 16-bit int/float?
One possible answer would be something like this:
#include <cstdint>
#include <iostream>
#include <string>

std::string str1 = "345";
std::string str2 = "3.45";

int myInt(std::stoi(str1));
uint16_t myInt16(0);
if (myInt <= static_cast<int>(UINT16_MAX) && myInt >= 0) {
    myInt16 = static_cast<uint16_t>(myInt);
}
else {
    std::cout << "Error : Manage your error the way you want to\n";
}

float myFloat(std::stof(str2));
For the vertex coordinates, you have a floating point number X and you need to convert it to one of the 16 bit alternatives in OpenGL: GL_SHORT or GL_UNSIGNED_SHORT or GL_HALF_FLOAT. First, you need to decide whether you want to use integers or floating point.
If you're going with integers, I recommend unsigned integers, so that 0 maps to the minimal value and 65535 maps to the maximal value. With integers, you need to decide on the range of valid values for X.
Suppose you know that X is between Xmin and Xmax. Then, you can calculate a GL_UNSIGNED_SHORT-compatible representation by:
unsigned short convert_to_GL_UNSIGNED_SHORT(float x, float xmin, float xmax) {
    if (x <= xmin)
        return 0;
    else if (x >= xmax)
        return 65535;
    else
        return (unsigned short)((x - xmin) / (xmax - xmin) * 65535.0f + 0.5f);
}
If you go with half floats, I suggest you look at 16-bit floats and GL_HALF_FLOAT.
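If you don't want to pull in a library, a deliberately naive conversion looks like this (a sketch only: it truncates instead of rounding, flushes subnormals to zero, and maps NaN to infinity; use a tested routine or F16C hardware instructions in real code):

#include <cstdint>
#include <cstring>

// Naive float -> IEEE 754 half conversion. Illustrates the bit layout only.
uint16_t floatToHalfNaive(float f)
{
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);                        // type-pun safely
    uint16_t sign = (bits >> 16) & 0x8000;                      // sign bit
    int32_t  exp  = (int32_t)((bits >> 23) & 0xFF) - 127 + 15;  // re-bias exponent
    uint16_t mant = (bits >> 13) & 0x03FF;                      // top 10 mantissa bits
    if (exp <= 0)  return sign;                                 // underflow -> signed zero
    if (exp >= 31) return sign | 0x7C00;                        // overflow -> infinity
    return sign | (uint16_t)(exp << 10) | mant;
}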
For the face indices, you have unsigned 32-bit integers, right? If they are all below 65536, you can easily convert them to 16-bit unsigned shorts by
unsigned short i16 = (unsigned short)i32;
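A sketch of the whole pass, verifying that precondition before narrowing (the names are made up):

#include <cstdint>
#include <vector>

// Narrow 32-bit face indices to 16-bit, checking that each one fits.
// Returns false (leaving 'out' incomplete) if any index exceeds 65535.
bool narrowIndices(const std::vector<uint32_t>& in, std::vector<uint16_t>& out)
{
    out.clear();
    out.reserve(in.size());
    for (uint32_t i32 : in) {
        if (i32 > 65535)
            return false;              // does not fit in 16 bits
        out.push_back((uint16_t)i32);  // safe narrowing
    }
    return true;
}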
The real inverse FFT gives me an array full of NaNs instead of floats.
kiss_fftri(conf,complex_array,output);
The complex_array looks normal; nothing wrong with the values, I guess.
kiss_fftr_cfg conf = kiss_fftr_alloc(size,1,NULL,NULL);
The conf should be fine too as far as I know.
Anything wrong with the size? I know that the output size of the forward FFT must be N/2 + 1 and the size above should be N.
I've already made a simple working example with audio convolution in the frequency domain and everything, but I've no idea what's happening here.
The NaNs, and some samples of the complex_array, are shown above.
The size parameter in my example is always 18750. That's the number of samples. N/2 + 1 is therefore 9376.
First I have a mono channel with 450k samples. Then I split it into 24 parts. Every part is now 18750 samples. With each of those parts I do a convolution with an impulse response. So basically the numbers I'm printing above are, let's say, the first 20 samples in each of the 24 rounds of the for loop. Nothing wrong here, I guess.
I even ran kiss_fftr_next_fast_size_real(size) and the size stays the same, so it should already be optimal.
Here's my convolution:
kiss_fft_cpx convolution(kiss_fft_cpx *a, kiss_fft_cpx *b, int size)
{
    kiss_fft_cpx r[size];
    memset(r, 0, size * sizeof(kiss_fft_cpx));
    int skalar = size * 2; // for the normalisation
    for (int i = 0; i < size; ++i) {
        r[i].r = ((a[i].r/skalar) * (b[i].r)/skalar) - ((a[i].i/skalar) * (b[i].i)/skalar);
        r[i].i = ((a[i].r/skalar) * (b[i].i)/skalar) + ((a[i].i/skalar) * (b[i].r)/skalar);
    }
    return r;
}
The size I input here via the argument is the N/2 + 1.
It's not KISS that causes the problem here; it is how the result array is (mis)handled.
To really "keep it simple and stupid" (KISS), I recommend using STL containers for your data instead of raw C++ arrays. That way you can avoid the mistakes in your code, namely returning an array created on the stack.
kiss_fft_cpx convolution(kiss_fft_cpx *a, kiss_fft_cpx *b, int size)
... bears various problems: the return type is just a single complex number, not a series.
I would change the signature of the function to:
#include <vector>

typedef std::vector<kiss_fft_cpx> complex_vector;

void
convolution
    ( const kiss_fft_cpx *a
    , const kiss_fft_cpx *b
    , int size
    , complex_vector& result
    );
Then, in the body, you can resize the result vector to the necessary size and use it just like a fixed-size array as far as your convolution computation is concerned.
{
    result.resize(size);
    // ... use as you did in your code: result[i] etc.
}
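Put together, a sketch of the corrected function might look like this (keeping the original normalisation; kiss_fft.h is assumed for the kiss_fft_cpx type):

#include <vector>
#include "kiss_fft.h" // defines kiss_fft_cpx

typedef std::vector<kiss_fft_cpx> complex_vector;

// The result now lives in a caller-provided vector instead of a stack
// array that dies when the function returns.
void convolution(const kiss_fft_cpx* a, const kiss_fft_cpx* b,
                 int size, complex_vector& result)
{
    result.resize(size);
    const float skalar = size * 2.0f; // normalisation, as in the original code
    for (int i = 0; i < size; ++i) {
        // complex multiplication (a[i] * b[i]), scaled
        result[i].r = (a[i].r / skalar) * (b[i].r / skalar)
                    - (a[i].i / skalar) * (b[i].i / skalar);
        result[i].i = (a[i].r / skalar) * (b[i].i / skalar)
                    + (a[i].i / skalar) * (b[i].r / skalar);
    }
}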
I have an assembly segment of the program that does a huge malloc (typically on the order of 8 GB), populates it and does computations on it.
For debugging purposes I want to be able to treat this allocated and pre-filled memory as a 3D array in C/C++. I specifically do not want to allocate another 8 GB, because declaring unsigned char debug_arr[crystal_size][crystal_size][crystal_size] and doing an element-by-element copy would result in a stack overflow.
I would ideally love to typecast the memory pointer to a 3D array pointer ... Is it possible?
The objective is to verify the computation results produced in the assembly segment.
My C/C++ knowledge is average. I mostly use 64-bit assembly, so please give me the C++ typecasting in some detail.
Env: Intel Core i7 2600K @ 4.4 GHz with 16 GB RAM, 64-bit assembly programming on 64-bit Windows 7, Visual Studio Express 2012
Thanks...
If you want to access a single unsigned char entry as if from a 3D array, you obviously need the relevant dimensions (call them nXDim, nYDim, nZDim for the sake of argument) and you need to know what dimension order has been assumed during writing.
If we assume that z changes less frequently than y and y less frequently than x then you can access your array via a function such as this:
unsigned char* GetEntry(int nX, int nY, int nZ)
{
    return &pYourArray[(nZ * nXDim * nYDim) + (nY * nXDim) + nX];
}
First, check what ordering is used in your memory. There are two types: row-major ordering and column-major ordering.
For row major ordering
Address = Base + ((depthindex*col_size+colindex) * row_size + rowindex) * Element_Size
For column major ordering
Address = Base + ((rowindex*col_size+colindex) * depth_size + depthindex) * Element_Size
Here is an example for you to expand on:
char array[10000]; // One dimensional array
char * mat[100]; // Matrix for 2D array
for ( int i = 0; i < 100; i++ )
mat[i] = array + i * 100;
Now, you have the matrix as a 100x100 element 2D array in the same memory as the array.
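The same trick extends to three dimensions without copying the 8 GB buffer. Here is a sketch using the asker's crystal_cube pointer and a runtime side length n (crystal_size), assuming a row-major (x fastest) layout:

#include <cstddef>
#include <vector>

// Build plane/row pointer tables once over the existing allocation,
// then index with cube[z][y][x].
unsigned char* base = (unsigned char*)crystal_cube;
size_t n = crystal_size;
std::vector<unsigned char*>  rows(n * n); // one pointer per (z, y) row
std::vector<unsigned char**> cube(n);     // one pointer per z plane
for (size_t z = 0; z < n; ++z) {
    cube[z] = &rows[z * n];
    for (size_t y = 0; y < n; ++y)
        rows[z * n + y] = base + (z * n + y) * n;
}
unsigned char v = cube[2][1][0];          // same byte as base[(2*n + 1)*n + 0]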
If you know the dimensions at compile time, then something like this works:
void* crystal_cube = 0; // set by asm magic

// pointer to 2044x2044 planes of unsigned char
typedef unsigned char (*DEBUG_CUBE)[2044][2044];

DEBUG_CUBE debug_cube = (DEBUG_CUBE) crystal_cube;
// debug_cube[z][y][x] now reads straight from the existing allocation
Supposing I am given an image of 2048x2048 and I want to know the total number of colors present in the image, what is the fastest possible algorithm? I came up with two algorithms, but they are slow.
Algorithm 1:
Compare the current pixel and the next pixel, and if they are different:
Check a temporary variable, which contains all the detected colors, to see whether the color is present or not.
If not present, add it to the array (list) and increment noOfColors.
This algorithm works, but it is slow. For a 1600x1200 pixel image it takes around 3 seconds.
Algorithm 2:
The obvious method: check each pixel against all other pixels, record the number of occurrences of each color, and increment the count. This is very, very slow, almost like a hung app. So is there any better approach? I need all the pixel info.
You could use std::set (or std::unordered_set), and simply do a single loop through the pixels, adding the colors to the set. Then the number of colors is the size of the set.
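A minimal sketch of that approach, assuming the pixels are already packed as 0x00RRGGBB integers:

#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Count distinct colors with a single pass over the pixels.
std::size_t countColors(const std::vector<uint32_t>& pixels)
{
    std::unordered_set<uint32_t> colors(pixels.begin(), pixels.end());
    return colors.size();
}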
Well, this is well suited for parallelization. Split the image into several parts and execute the algorithm for each part in a separate task. To avoid syncing, each task should have its own storage for the unique colors. When all tasks are done, you aggregate the results.
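One possible shape for that, as a sketch with std::thread (per-thread sets, serial merge; pixel format assumed packed as above, and nThreads assumed to be at least 1):

#include <cstddef>
#include <cstdint>
#include <thread>
#include <unordered_set>
#include <vector>

// Each worker collects the unique colors of its own slice; the partial
// sets are merged serially once all threads have joined.
std::size_t countColorsParallel(const std::vector<uint32_t>& pixels,
                                unsigned nThreads)
{
    std::vector<std::unordered_set<uint32_t>> partial(nThreads);
    std::vector<std::thread> workers;
    const std::size_t chunk = pixels.size() / nThreads;
    for (unsigned t = 0; t < nThreads; ++t) {
        std::size_t lo = t * chunk;
        std::size_t hi = (t + 1 == nThreads) ? pixels.size() : lo + chunk;
        workers.emplace_back([&partial, &pixels, lo, hi, t] {
            partial[t].insert(pixels.begin() + lo, pixels.begin() + hi);
        });
    }
    for (auto& w : workers) w.join();

    std::unordered_set<uint32_t> all; // aggregate the results
    for (auto& s : partial) all.insert(s.begin(), s.end());
    return all.size();
}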
DRAM is dirt cheap. Use brute force: fill a table, count.
On a Core 2 Duo @ 3.0 GHz:
0.35 secs for 4096x4096 32-bit RGB
0.20 secs after some trivial parallelization (I know nothing of OMP)
However, if you were to use 64-bit RGB (one channel = 16 bits), it is another question (not enough memory).
You would probably need a good hash table.
Using random pixels, the same size takes 10 secs.
Remark: at 0.15 secs, the std::bitset<> solution is faster (it gets slower when trivially parallelized!).
Solution, C++11:
#include <vector>
#include <random>
#include <iostream>
#include <boost/chrono.hpp>
#define _16M (256*256*256)
typedef union {
struct { unsigned char r,g,b,n ; } r_g_b_n ;
unsigned char rgb[4] ;
unsigned i_rgb;
} RGB ;
RGB make_RGB(unsigned char r, unsigned char g , unsigned char b) {
RGB res;
res.r_g_b_n.r = r;
res.r_g_b_n.g = g;
res.r_g_b_n.b = b;
res.r_g_b_n.n = 0;
return res;
}
static_assert(sizeof(RGB)==4,"bad RGB size not 4");
static_assert(sizeof(unsigned)==4,"bad i_RGB size not 4");
struct Image
{
Image (unsigned M, unsigned N) : M_(M) , N_(N) , v_(M*N) {}
const RGB* tab() const {return & v_[0] ; }
RGB* tab() {return & v_[0] ; }
unsigned M_ , N_;
std::vector<RGB> v_;
};
void FillRandom(Image & im) {
std::uniform_int_distribution<unsigned> rnd(0,_16M-1);
std::mt19937 rng;
const int N = im.M_ * im.N_;
RGB* tab = im.tab();
for (int i=0; i<N; i++) {
unsigned r = rnd(rng) ;
*tab++ = make_RGB( (r & 0xFF) , (r>>8 & 0xFF), (r>>16 & 0xFF) ) ;
}
}
size_t Count(const Image & im) {
    const int N = im.M_ * im.N_;
    std::vector<char> count(_16M, 0);
    const RGB* tab = im.tab();
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        count[ tab[i].i_rgb ] = 1; // every thread writes the same value (1)
    }
    size_t nColors = 0;
    #pragma omp parallel for reduction(+:nColors)
    for (int i = 0; i < _16M; i++) nColors += count[i];
    return nColors;
}
int main() {
Image im(4096,4096);
FillRandom(im);
typedef boost::chrono::high_resolution_clock hrc;
auto start = hrc::now();
std::cout << " # colors " << Count(im) << std::endl ;
boost::chrono::duration<double> sec = hrc::now() - start;
std::cout << " took " << sec.count() << " seconds\n";
return 0;
}
The only feasible algorithm here is to build a sort of histogram of the image colors. The only difference in your case is that instead of calculating the population of each color, you need only to know whether it is zero or not.
Depending on the color space you work in, you may either use std::set to tag existing colors (as Joachim Pileborg suggested), or use something like std::bitset, which is obviously faster. This depends on how many distinct colors exist in your color space.
Also, as Marius Bancila noted, this procedure is a perfect match for parallelization. Calculate the histogram-like data for image parts, and then merge it. Naturally the image division should be based on its memory partition, not on geometric properties. In simple words: split the image vertically (by batches of scan lines), not horizontally.
And, if possible, you should either use some low-level library/code to run through pixels, or try to write your own. At least you must obtain a pointer to scan line and run on its pixels in a batch, rather than doing something like GetPixel for each pixel.
The point here is that the ideal representation of an image as a 2D array of colors is not necessarily the way the image is stored in memory (color components can be arranged in "planes", there can be "padding", etc.), so getting the pixels with a GetPixel-like function may take time.
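For instance, batch access over a 32-bpp bitmap might look like this (a sketch with hypothetical names; the pitch is the byte distance between scan lines and can exceed width*4 because of padding, and scan lines are assumed 4-byte aligned, as Windows DIBs are):

#include <cstddef>
#include <cstdint>

// Walk the image one scan line at a time instead of calling a
// GetPixel-style function per pixel.
void forEachPixel(const std::uint8_t* bits, int width, int height,
                  std::size_t pitch)
{
    for (int y = 0; y < height; ++y) {
        const std::uint32_t* line =
            reinterpret_cast<const std::uint32_t*>(bits + y * pitch);
        for (int x = 0; x < width; ++x) {
            std::uint32_t pixel = line[x];
            (void)pixel; // feed into the counting structure here
        }
    }
}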
The question may even be somewhat meaningless if the image is not the result of a "vectorial draw". Think of a photograph: between two nearby "greens" you find every shade of green, so the colors, in this case, are no more and no less than the ones supported by the encoding of the image itself (2^24, or 256, or 16, or ...). So unless you are interested in the color distribution (how differently the colors are used), just counting them makes very little sense.
A workaround can be:
Create an in-memory bitmap having pixels in a "single plane format".
Blit your image into that bitmap using BitBlt or similar (this lets the OS do the pixel conversion, on the GPU if any).
Get the bitmap bits (this gives you access to the stored values).
Run your "counting algorithm" (whatever it is) on those values.
Note that steps 1 and 2 can be avoided if you know the image is already in planar format.
If you have a multicore system, step 4 can also be split across different threads, each working on a part of the image.
You can use std::bitset, which allows you to set individual bits and has a count function.
You have a bit for each colour: there are 256 values for each of R, G and B, so that's 256*256*256 bits (16,777,216 colours). The bitset uses a byte for every 8 bits, so it will take 2 MB.
Use the pixel colour as an index into the bitset:
std::bitset<256*256*256> colours; // ~2 MB, so make it static or heap-allocated

for (int pixel : pixels) {
    colours[pixel] = true;
}
std::size_t numColours = colours.count();
This has linear complexity.
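If your pixels arrive as separate channels instead, a hypothetical helper to build the index could be:

// Pack 8-bit channels (0..255 each) into a 0 .. 256*256*256-1 index.
inline int packRGB(unsigned r, unsigned g, unsigned b)
{
    return (int)((r << 16) | (g << 8) | b);
}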
Latecomer to this answer, but I could not help it, since this algorithm is brutally fast: it was developed two or more decades ago, when it really mattered.
3-D Lookup Table Color Matching
http://www.ddj.com/cpp/184403257
Basically, it creates a 3D color lookup table, and the search is very fast. I've made some modifications to suit my purpose of image binarization: I reduced the color space from ff ff ff to f f f, and it's even 10 times faster. Even as it is, right out of the box, I haven't found anything that comes close, including hash tables.
#include <cstdlib>  // malloc, free
#include <cstring>  // std::memset

// rgb_color is assumed to be a struct with unsigned char r, g, b members
char * creatematcharray(struct rgb_color *palette, int palettesize)
{
int rval=16, gval=16, bval=16, len, r, g, b;
char *taken, *match, *same;
int i, set, sqstep, tp, maxtp, *entryr, *entryg, *entryb;
char *table;
len=rval*gval*bval;
// Prepare table buffers:
size_t size_of_table = len*sizeof(char);
table=(char *)malloc(size_of_table);
if (table==nullptr) return nullptr;
// Select colors to use for fill:
set=0;
size_t size_of_taken = (palettesize * sizeof(int) * 3) +
(palettesize*sizeof(char)) + (len * sizeof(char));
taken=(char *)malloc(size_of_taken);
same=taken + (len * sizeof(char));
entryr=(int*)(same + (palettesize * sizeof(char)));
entryg=entryr + palettesize;
entryb=entryg + palettesize;
if (taken==nullptr)
{
free((void *)table);
return nullptr;
}
std::memset((void *)taken, 0, len * sizeof(char));
// std::cout << "sizes: " << size_of_table << " " << size_of_taken << std::endl;
match=table;
for (i=0; i<palettesize; i++)
{
same[i]=0;
// Compute 3d-table coordinates of palette rgb color:
r=palette[i].r&0x0f, g=palette[i].g&0x0f, b=palette[i].b&0x0f;
// Put color in position:
if (taken[b*rval*gval+g*rval+r]==0) set++;
else same[match[b*rval*gval+g*rval+r]]=1;
match[b*rval*gval+g*rval+r]=i;
taken[b*rval*gval+g*rval+r]=1;
entryr[i]=r; entryg[i]=g; entryb[i]=b;
}
// ### Fill match_array by steps: ###
for (set=len-set, sqstep=1; set>0; sqstep++)
{
for (i=0; i<palettesize && set>0; i++)
if (same[i]==0)
{
// Fill all six sides of incremented cube (by pairs, 3 loops):
for (b=entryb[i]-sqstep; b<=entryb[i]+sqstep; b+=sqstep*2)
if (b>=0 && b<bval)
for (r=entryr[i]-sqstep; r<=entryr[i]+sqstep; r++)
if (r>=0 && r<rval)
{ // Draw one 3d line:
tp=b*rval*gval+(entryg[i]-sqstep)*rval+r;
maxtp=b*rval*gval+(entryg[i]+sqstep)*rval+r;
if (tp<b*rval*gval+0*rval+r)
tp=b*rval*gval+0*rval+r;
if (maxtp>b*rval*gval+(gval-1)*rval+r)
maxtp=b*rval*gval+(gval-1)*rval+r;
for (; tp<=maxtp; tp+=rval)
if (!taken[tp])
taken[tp]=1, match[tp]=i, set--;
}
for (g=entryg[i]-sqstep; g<=entryg[i]+sqstep; g+=sqstep*2)
if (g>=0 && g<gval)
for (b=entryb[i]-sqstep; b<=entryb[i]+sqstep; b++)
if (b>=0 && b<bval)
{ // Draw one 3d line:
tp=b*rval*gval+g*rval+(entryr[i]-sqstep);
maxtp=b*rval*gval+g*rval+(entryr[i]+sqstep);
if (tp<b*rval*gval+g*rval+0)
tp=b*rval*gval+g*rval+0;
if (maxtp>b*rval*gval+g*rval+(rval-1))
maxtp=b*rval*gval+g*rval+(rval-1);
for (; tp<=maxtp; tp++)
if (!taken[tp])
taken[tp]=1, match[tp]=i, set--;
}
for (r=entryr[i]-sqstep; r<=entryr[i]+sqstep; r+=sqstep*2)
if (r>=0 && r<rval)
for (g=entryg[i]-sqstep; g<=entryg[i]+sqstep; g++)
if (g>=0 && g<gval)
{ // Draw one 3d line:
tp=(entryb[i]-sqstep)*rval*gval+g*rval+r;
maxtp=(entryb[i]+sqstep)*rval*gval+g*rval+r;
if (tp<0*rval*gval+g*rval+r)
tp=0*rval*gval+g*rval+r;
if (maxtp>(bval-1)*rval*gval+g*rval+r)
maxtp=(bval-1)*rval*gval+g*rval+r;
for (; tp<=maxtp; tp+=rval*gval)
if (!taken[tp])
taken[tp]=1, match[tp]=i, set--;
}
}
}
free((void *)taken);
return table;
}
The answer: unordered_map
I use unordered_map, based on my testing.
You should test, because your compiler and library may exhibit different performance. Comment out #define USEHASH to use map instead.
On my machine, the vanilla unordered_map (a hash implementation) is about twice as fast as map. Since different compilers and libraries can vary enormously, you must test to see which is better. In production, I build a fake image on first start of the app, run both algorithms on it and time them, save an indication of which one is faster, and then preferentially use that for all subsequent starts on that machine. It's nit-picky, but hey, the user's time is valuable to them.
For a DSLR image with 12,106,244 pixels (about 12 megapixels, not a typo) and 11,857,131 distinct colors (also not a typo), map takes about 14 seconds, while unordered_map takes about 7 seconds:
Test Code:
#define USEHASH 1

#ifdef USEHASH
#include <unordered_map>
#else
#include <map>
#endif

size = im->xw * im->yw;

#ifdef USEHASH
// unordered_map is about twice as fast as map on my mac with qt5
// --------------------------------------------------------------
std::unordered_map<qint64, unsigned char> colors;
colors.reserve(size); // pre-allocate the hash space
#else
std::map<qint64, unsigned char> colors;
#endif
...use of either is in a loop where I build a 48-bit 0RGB value in a 64-bit variable from the 16-bit RGB values of the image pixels, like so:
for (i = 0; i < size; i++)
{
    pel = BUILDPEL(i); // macro just shovels 0RGB into 64-bit pel from im
                       // You'd do the same for your image structure
                       // in whatever way is fastest for you
    colors[pel] = 1;
}
cc = colors.size();
// time here: 14 secs for map, 7 secs for unordered_map with
// 12,106,244 pixels containing 11,857,131 colors on 12/24 core,
// 3 GHz, 64GB machine.