KISS_FFT applied to a large array with windows of 2048 samples advanced 100 samples at a time - kissfft

I have to go through a large array (86,464 samples read from an audio file) with a kiss_fft window of 2048 samples,
advancing the window 100 samples at a time, and store each result (a 2048-element array) as a column in a matrix.
The goal is to get an image of a voice from the audio file.
I did this in Matlab:
for i=0:len-1
window = x((i*100+1):(i*100)+2048); % get 2048 samples from the array
real = abs(fft(window)); % apply the FFT and take the magnitude of each bin
matrix(1:2048,i+1) = real; % add the result to the matrix as column i+1
end
It works very well, but I have problems with my C implementation:
kiss_fft_cpx* copycpx(double *mat, int nframe)
int i;
kiss_fft_cpx *mat2;
kiss_fft_scalar zero;
memset(&zero,0,sizeof(zero) );
for(i=0; i<nframe ; i++)
mat2[i].r = mat[i];
mat2[i].i = zero;
return mat2;
int main(void)
int i;
double adoublevariable ;
int dimens = 0;
char line[100];
double window[2048];
int f;
int j;
int len =0;
FILE *vector;
vector=fopen("1.txt", "r");
while( fgets( line,100,vector ))
double v[dimens];
vector=fopen("1.txt", "r");
while( fgets( line,100,vector ) )
if( 1==sscanf(line,"%lf",&adoublevariable) )
v[i] = adoublevariable;
puts("not a double variable");
len = int(floor((dimens-2048)/100+1));
double **mat;
mat = (double **)malloc(2048*sizeof(double*));
kiss_fft_cfg cfg = kiss_fft_alloc( 2048 ,0 ,0,0 );
kiss_fft_cpx out_cpx[2048],*cpx_buf;
for(i=0 ;i < len ;i++)
for(j=0 ;j < 2048 ;j++)
cpx_buf = copycpx(window,2048);
kiss_fft( cfg ,cpx_buf, out_cpx );
mat[f][i] = fabs(out_cpx[f].r);
return 0;
I get no errors or warnings, but it crashes.


FFTW Complex to Real Segmentation Fault

I am attempting to write a naive implementation of the Short-Time Fourier Transform using consecutive FFT frames in time, calculated using the FFTW library, but I am getting a Segmentation fault and cannot work out why.
My code is as below:
// load in audio
AudioFile<double> audioFile;
audioFile.load ("assets/example-audio/file_example_WAV_1MG.wav");
int N = audioFile.getNumSamplesPerChannel();
// make stereo audio mono
double fileDataMono[N];
if (audioFile.isStereo())
for (int i = 0; i < N; i++)
fileDataMono[i] = ( audioFile.samples[0][i] + audioFile.samples[1][i] ) / 2;
// setup stft
// (test transform, presently unoptimized)
int stepSize = 512;
int M = 2048; // fft size
int noOfFrames = (N-(M-stepSize))/stepSize;
// create Hamming window vector
double w[M];
for (int m = 0; m < M; m++) {
w[m] = 0.53836 - 0.46164 * cos( 2*M_PI*m / M );
double* input;
// (pads input array if necessary)
if ( (N-(M-stepSize))%stepSize != 0) {
noOfFrames += 1;
int amountOfZeroPadding = stepSize - (N-(M-stepSize))%stepSize;
double ipt[N + amountOfZeroPadding];
for (int i = 0; i < N; i++) // copy values from fileDataMono into input
ipt[i] = fileDataMono[i];
for (int i = 0; i < amountOfZeroPadding; i++)
ipt[N + i] = 0;
input = ipt;
} else {
input = fileDataMono;
// compute stft
fftw_complex* stft[noOfFrames];
double frames[noOfFrames][M];
fftw_plan fftPlan;
for (int i = 0; i < noOfFrames; i++) {
stft[i] = (fftw_complex*)fftw_malloc(sizeof(fftw_complex) * M);
for (int m = 0; m < M; m++)
frames[i][m] = input[i*stepSize + m] * w[m];
fftPlan = fftw_plan_dft_r2c_1d(M, frames[i], stft[i], FFTW_ESTIMATE);
// compute istft
double* outputFrames[noOfFrames];
double output[N];
for (int i = 0; i < noOfFrames; i++) {
outputFrames[i] = (double*)fftw_malloc(sizeof(double) * M);
fftPlan = fftw_plan_dft_c2r_1d(M, stft[i], outputFrames[i], FFTW_ESTIMATE);
for (int m = 0; m < M; m++) {
output[i*stepSize + m] += outputFrames[i][m];
for (int i = 0; i < noOfFrames; i++) {
// output audio
AudioFile<double>::AudioBuffer outputBuffer;
outputBuffer.resize (1);
outputBuffer[0].assign(output, output+N);
bool ok = audioFile.setAudioBuffer(outputBuffer);
audioFile.setAudioBufferSize (1, N);
audioFile.setBitDepth (16);
audioFile.setSampleRate (8000);
audioFile.save ("out/audioOutput.wav");
The segfault seems to be raised by the first fftw_malloc when computing the forward STFT.
Thanks in advance!
The relevant bit of code is:
double* input;
if ( (N-(M-stepSize))%stepSize != 0) {
double ipt[N + amountOfZeroPadding];
input = ipt;
input[i*stepSize + m];
Your input pointer points at memory that exists only inside the if statement. The closing brace denotes the end of the lifetime of the ipt array. When dereferencing the pointer later, you are addressing memory that no longer exists.
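One way around this is to give the padded buffer a lifetime that outlasts the if block, for example by allocating it on the heap. The sketch below assumes the same padding rule as the question (pad until the length minus the overlap is a multiple of the step size); pad_input and its parameter names are hypothetical. Heap allocation also avoids placing very large arrays on the stack, which may be a separate concern with the variable-length arrays in the question.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Returns a heap copy of src, zero-padded so that
 * (length - (fft_size - step)) is a multiple of step.
 * The caller owns the buffer and must free() it.
 */
double *pad_input(const double *src, size_t n, size_t fft_size, size_t step,
                  size_t *padded_len)
{
    if (step == 0 || fft_size < step || n < fft_size - step)
        return NULL;
    size_t rem = (n - (fft_size - step)) % step;
    size_t pad = rem ? step - rem : 0;
    double *buf = calloc(n + pad, sizeof *buf);  /* zero-filled */
    if (!buf)
        return NULL;
    memcpy(buf, src, n * sizeof *buf);
    *padded_len = n + pad;
    return buf;
}
```

The returned pointer stays valid until it is freed, so it can safely be used in the later framing loop.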

Saliency Map with openCV

I'm trying to use the code proposed here for saliency detection on colored images. The proposed code is tied to a GUI developed for Windows. In my case, I want to use it on Mac OS X, with the OpenCV library reading the initial image and writing the resulting saliency map. I therefore picked out the four main functions and modified the reading and writing blocks to use OpenCV. I got the following results, which are a bit different from what the authors obtained:
Original Image
Author saliency map
Obtained saliency map
Here are the four functions. Is there something I did wrong? I was careful to account for the fact that OpenCV stores colors as B-G-R and not R-G-B.
#include <stdio.h>
#include <opencv2/opencv.hpp>
#include <iostream>
using namespace cv;
using namespace std;
void RGB2LAB2(
const vector<vector<uint> > &ubuff,
vector<double>& lvec,
vector<double>& avec,
vector<double>& bvec){
int sz = int(ubuff.size());
cout<<"sz "<<sz<<endl;
for( int j = 0; j < sz; j++ ){
int sR = ubuff[j][2];
int sG = ubuff[j][1];
int sB = ubuff[j][0];
// sRGB to XYZ conversion
// (D65 illuminant assumption)
double R = sR/255.0;
double G = sG/255.0;
double B = sB/255.0;
double r, g, b;
if(R <= 0.04045) r = R/12.92;
else r = pow((R+0.055)/1.055,2.4);
if(G <= 0.04045) g = G/12.92;
else g = pow((G+0.055)/1.055,2.4);
if(B <= 0.04045) b = B/12.92;
else b = pow((B+0.055)/1.055,2.4);
double X = r*0.4124564 + g*0.3575761 + b*0.1804375;
double Y = r*0.2126729 + g*0.7151522 + b*0.0721750;
double Z = r*0.0193339 + g*0.1191920 + b*0.9503041;
// XYZ to LAB conversion
double epsilon = 0.008856; //actual CIE standard
double kappa = 903.3; //actual CIE standard
double Xr = 0.950456; //reference white
double Yr = 1.0; //reference white
double Zr = 1.088754; //reference white
double xr = X/Xr;
double yr = Y/Yr;
double zr = Z/Zr;
double fx, fy, fz;
if(xr > epsilon) fx = pow(xr, 1.0/3.0);
else fx = (kappa*xr + 16.0)/116.0;
if(yr > epsilon) fy = pow(yr, 1.0/3.0);
else fy = (kappa*yr + 16.0)/116.0;
if(zr > epsilon) fz = pow(zr, 1.0/3.0);
else fz = (kappa*zr + 16.0)/116.0;
lvec[j] = 116.0*fy-16.0;
avec[j] = 500.0*(fx-fy);
bvec[j] = 200.0*(fy-fz);
void GaussianSmooth(
const vector<double>& inputImg,
const int& width,
const int& height,
const vector<double>& kernel,
vector<double>& smoothImg){
int center = int(kernel.size())/2;
int sz = width*height;
vector<double> tempim(sz);
int rows = height;
int cols = width;
int index(0);
for( int r = 0; r < rows; r++ ){
for( int c = 0; c < cols; c++ ){
double kernelsum(0);
double sum(0);
for( int cc = (-center); cc <= center; cc++ ){
if(((c+cc) >= 0) && ((c+cc) < cols)){
sum += inputImg[r*cols+(c+cc)] * kernel[center+cc];
kernelsum += kernel[center+cc];
tempim[index] = sum/kernelsum;
int index = 0;
for( int r = 0; r < rows; r++ ){
for( int c = 0; c < cols; c++ ){
double kernelsum(0);
double sum(0);
for( int rr = (-center); rr <= center; rr++ ){
if(((r+rr) >= 0) && ((r+rr) < rows)){
sum += tempim[(r+rr)*cols+c] * kernel[center+rr];
kernelsum += kernel[center+rr];
smoothImg[index] = sum/kernelsum;
void GetSaliencyMap(
const vector<vector<uint> >&inputimg,
const int& width,
const int& height,
vector<double>& salmap,
const bool& normflag){
int sz = width*height;
vector<double> lvec(0), avec(0), bvec(0);
RGB2LAB2(inputimg, lvec, avec, bvec);
double avgl(0), avga(0), avgb(0);
for( int i = 0; i < sz; i++ ){
avgl += lvec[i];
avga += avec[i];
avgb += bvec[i];
avgl /= sz;
avga /= sz;
avgb /= sz;
vector<double> slvec(0), savec(0), sbvec(0);
vector<double> kernel(0);
GaussianSmooth(lvec, width, height, kernel, slvec);
GaussianSmooth(avec, width, height, kernel, savec);
GaussianSmooth(bvec, width, height, kernel, sbvec);
for( int i = 0; i < sz; i++ ){
salmap[i] = (slvec[i]-avgl)*(slvec[i]-avgl) +
(savec[i]-avga)*(savec[i]-avga) +
(sbvec[i]-avgb)*(sbvec[i]-avgb);
if( true == normflag ){
vector<double> normalized(0);
Normalize(salmap, width, height, normalized);
swap(salmap, normalized);
void Normalize(
const vector<double>& input,
const int& width,
const int& height,
vector<double>& output,
const int& normrange = 255){
double maxval(0);
double minval(DBL_MAX);
int i(0);
for( int y = 0; y < height; y++ ){
for( int x = 0; x < width; x++ ){
if( maxval < input[i] ) maxval = input[i];
if( minval > input[i] ) minval = input[i];
double range = maxval-minval;
if( 0 == range ) range = 1;
int i(0);
for( int y = 0; y < height; y++ ){
for( int x = 0; x < width; x++ ){
output[i] = ((normrange*(input[i]-minval))/range);
int main(int argc, char** argv){
Mat image;
image = imread( argv[1], 1 );
if ( !image.data ){
printf("No image data \n");
return -1;
}
// One B-G-R triple per pixel, in row-major order
vector<vector<uint> > array(image.rows*image.cols, vector<uint>(3));
for(int y=0;y<image.rows;y++){
for(int x=0;x<image.cols;x++){
Vec3b color=image.at<Vec3b>(Point(x,y));
array[image.cols*y+x][0]=color[0]; array[image.cols*y+x][1]=color[1]; array[image.cols*y+x][2]=color[2];
}
}
vector<double> salmap; bool normflag=true;
GetSaliencyMap(array, image.size().width, image.size().height, salmap, normflag);
Mat output;
output = Mat( image.rows, image.cols,CV_8UC1);
int k=0;
for(int y=0;y<image.rows;y++){
for(int x=0;x<image.cols;x++){
output.at<uchar>(Point(x,y)) = int(salmap[k]);
k++;
}
}
imwrite("test_saliency_blackAndWhite.jpg", output );
return 0;
}

C++ error: no match for call to ‘(RgbImage) (int&, int&)’

The pseudo code I'm trying to follow for this implementation:
for (int u = 0; u < uMax; u++)
{ for (int v = 0; v < vMax; v++)
{ float x = f_x(u, v);
float y = f_y(u, v);
dstImage(x, y) = srcImage(u, v);
}
}
This is the scaling function with which I'm trying to implement the code above. I iterate over each pixel just as I did when changing the r, g, b values, which works, but I'm having trouble changing the x position. I would now like to scale the image along x by a factor of 2. This attempt scales x in the same way as I changed r, g, b:
void scale()
RgbImage theTexMap( filename ); // loaded from some file name
//RgbImage destination;
double r, g, b; // variables to store the different colours
float u, v;
for (int x = 0; x < theTexMap.GetNumRows(); x++)
{ for (int y = 0; y < theTexMap.GetNumCols(); y++)
{ theTexMap.GetRgbPixel(x, y, &r, &g, &b); //this successfully allows me to change the r,g,b values
u = x * 2;
v = y;
//cout << x <<endl;
//cout << y << " " << endl;
//cout << " " <<endl;
destination.SetRgbPixelf(u, v, r, g, b); //allows me to set r,g,b values, fails with the x,y.
updateTexture(&destination, modifiedID);
According to the pseudo code I'm trying to follow, it should be more like this (I changed the inside of the for loops and iterate over u, v):
void scale()
RgbImage theTexMap( filename ); // loaded from some file name
//RgbImage destination;
double r, g, b; // variables to store the different colours
float x, y;
for (int u = 0; u < theTexMap.GetNumRows(); u++)
{ for (int v = 0; v < theTexMap.GetNumCols(); v++)
{ x = u * 2;
y = v;
//cout << x <<endl;
//cout << y << " " << endl;
//cout << " " <<endl;
theTexMap(x,y) = theTexMap(u,v);
updateTexture(&theTexMap, modifiedID);
Because I don't really understand the last line of the pseudo code, I get this error message when I use it in my implementation:
error: no match for call to ‘(RgbImage) (float&, float&)’
theTexMap(x,y) = theTexMap(u,v);
You may be thinking that you don't know what RgbImage (the type of theTexMap) is, so here is its class. Should I be using it as above in this case? Or how should I follow the pseudo code to get the factor-of-2 scaling I want?
#include "RgbImage.h"
#ifdef _WIN32
#include <windows.h>
#include "GL/gl.h"
RgbImage::RgbImage( int numRows, int numCols )
NumRows = numRows;
NumCols = numCols;
ImagePtr = new unsigned char[NumRows*GetNumBytesPerRow()];
if ( !ImagePtr ) {
fprintf(stderr, "Unable to allocate memory for %ld x %ld bitmap.\n",
NumRows, NumCols);
ErrorCode = MemoryError;
// Zero out the image
unsigned char* c = ImagePtr;
int rowLen = GetNumBytesPerRow();
for ( int i=0; i<NumRows; i++ ) {
for ( int j=0; j<rowLen; j++ ) {
*(c++) = 0;
bool RgbImage::LoadBmpFile( const char* filename )
FILE* infile = fopen( filename, "rb" ); // Open for reading binary data
if ( !infile ) {
fprintf(stderr, "Unable to open file: %s\n", filename);
ErrorCode = OpenError;
return false;
bool fileFormatOK = false;
int bChar = fgetc( infile );
int mChar = fgetc( infile );
if ( bChar=='B' && mChar=='M' ) { // If starts with "BM" for "BitMap"
skipChars( infile, 4+2+2+4+4 ); // Skip 4 fields we don't care about
NumCols = readLong( infile );
NumRows = readLong( infile );
skipChars( infile, 2 ); // Skip one field
int bitsPerPixel = readShort( infile );
skipChars( infile, 4+4+4+4+4+4 ); // Skip 6 more fields
if ( NumCols>0 && NumCols<=100000 && NumRows>0 && NumRows<=100000
&& bitsPerPixel==24 && !feof(infile) ) {
fileFormatOK = true;
if ( !fileFormatOK ) {
ErrorCode = FileFormatError;
fprintf(stderr, "Not a valid 24-bit bitmap file: %s.\n", filename);
fclose ( infile );
return false;
// Allocate memory
ImagePtr = new unsigned char[NumRows*GetNumBytesPerRow()];
if ( !ImagePtr ) {
fprintf(stderr, "Unable to allocate memory for %ld x %ld bitmap: %s.\n",
NumRows, NumCols, filename);
ErrorCode = MemoryError;
fclose ( infile );
return false;
unsigned char* cPtr = ImagePtr;
for ( int i=0; i<NumRows; i++ ) {
int j;
for ( j=0; j<NumCols; j++ ) {
*(cPtr+2) = fgetc( infile ); // Blue color value
*(cPtr+1) = fgetc( infile ); // Green color value
*cPtr = fgetc( infile ); // Red color value
cPtr += 3;
int k=3*NumCols; // Num bytes already read
for ( ; k<GetNumBytesPerRow(); k++ ) {
fgetc( infile ); // Read and ignore padding;
*(cPtr++) = 0;
if ( feof( infile ) ) {
fprintf( stderr, "Premature end of file: %s.\n", filename );
ErrorCode = ReadError;
fclose ( infile );
return false;
fclose( infile ); // Close the file
return true;
short RgbImage::readShort( FILE* infile )
// read a 16 bit integer
unsigned char lowByte, hiByte;
lowByte = fgetc(infile); // Read the low order byte (little endian form)
hiByte = fgetc(infile); // Read the high order byte
// Pack together
short ret = hiByte;
ret <<= 8;
ret |= lowByte;
return ret;
long RgbImage::readLong( FILE* infile )
// Read in 32 bit integer
unsigned char byte0, byte1, byte2, byte3;
byte0 = fgetc(infile); // Read bytes, low order to high order
byte1 = fgetc(infile);
byte2 = fgetc(infile);
byte3 = fgetc(infile);
// Pack together
long ret = byte3;
ret <<= 8;
ret |= byte2;
ret <<= 8;
ret |= byte1;
ret <<= 8;
ret |= byte0;
return ret;
void RgbImage::skipChars( FILE* infile, int numChars )
for ( int i=0; i<numChars; i++ ) {
fgetc( infile );
bool RgbImage::WriteBmpFile( const char* filename )
FILE* outfile = fopen( filename, "wb" ); // Open for writing binary data
if ( !outfile ) {
fprintf(stderr, "Unable to open file: %s\n", filename);
ErrorCode = OpenError;
return false;
int rowLen = GetNumBytesPerRow();
writeLong( 40+14+NumRows*rowLen, outfile ); // Length of file
writeShort( 0, outfile ); // Reserved for future use
writeShort( 0, outfile );
writeLong( 40+14, outfile ); // Offset to pixel data
writeLong( 40, outfile ); // header length
writeLong( NumCols, outfile ); // width in pixels
writeLong( NumRows, outfile ); // height in pixels (pos for bottom up)
writeShort( 1, outfile ); // number of planes
writeShort( 24, outfile ); // bits per pixel
writeLong( 0, outfile ); // no compression
writeLong( 0, outfile ); // not used if no compression
writeLong( 0, outfile ); // Pixels per meter
writeLong( 0, outfile ); // Pixels per meter
writeLong( 0, outfile ); // unused for 24 bits/pixel
writeLong( 0, outfile ); // unused for 24 bits/pixel
// Now write out the pixel data:
unsigned char* cPtr = ImagePtr;
for ( int i=0; i<NumRows; i++ ) {
// Write out i-th row's data
int j;
for ( j=0; j<NumCols; j++ ) {
fputc( *(cPtr+2), outfile); // Blue color value
fputc( *(cPtr+1), outfile); // Green color value
fputc( *(cPtr+0), outfile); // Red color value
// Pad row to word boundary
int k=3*NumCols; // Num bytes already written
for ( ; k<GetNumBytesPerRow(); k++ ) {
fputc( 0, outfile ); // Write a padding byte
fclose( outfile ); // Close the file
return true;
void RgbImage::writeLong( long data, FILE* outfile )
// Write out a 32 bit integer
unsigned char byte0, byte1, byte2, byte3;
byte0 = (unsigned char)(data&0x000000ff); // Write bytes, low order to high order
byte1 = (unsigned char)((data>>8)&0x000000ff);
byte2 = (unsigned char)((data>>16)&0x000000ff);
byte3 = (unsigned char)((data>>24)&0x000000ff);
fputc( byte0, outfile );
fputc( byte1, outfile );
fputc( byte2, outfile );
fputc( byte3, outfile );
void RgbImage::writeShort( short data, FILE* outfile )
// Write out a 16 bit integer
unsigned char byte0, byte1;
byte0 = data&0x000000ff; // Write bytes, low order to high order
byte1 = (data>>8)&0x000000ff;
fputc( byte0, outfile );
fputc( byte1, outfile );
/* SetRgbPixel routines allow changing the contents of the RgbImage. */
void RgbImage::SetRgbPixelf( long row, long col, double red, double green, double blue )
SetRgbPixelc( row, col, doubleToUnsignedChar(red),
doubleToUnsignedChar(green),
doubleToUnsignedChar(blue) );
void RgbImage::SetRgbPixelc( long row, long col,
unsigned char red, unsigned char green, unsigned char blue )
assert ( row<NumRows && col<NumCols );
unsigned char* thePixel = GetRgbPixel( row, col );
*(thePixel++) = red;
*(thePixel++) = green;
*(thePixel) = blue;
unsigned char RgbImage::doubleToUnsignedChar( double x )
if ( x>=1.0 ) {
return (unsigned char)255;
else if ( x<=0.0 ) {
return (unsigned char)0;
else {
return (unsigned char)(x*255.0); // Rounds down
// Bitmap file format (24 bit/pixel form) BITMAPFILEHEADER
// Header (14 bytes)
// 2 bytes: "BM"
// 4 bytes: long int, file size
// 4 bytes: reserved (actually 2 bytes twice)
// 4 bytes: long int, offset to raster data
// Info header (40 bytes) BITMAPINFOHEADER
// 4 bytes: long int, size of info header (=40)
// 4 bytes: long int, bitmap width in pixels
// 4 bytes: long int, bitmap height in pixels
// 2 bytes: short int, number of planes (=1)
// 2 bytes: short int, bits per pixel
// 4 bytes: long int, type of compression (not applicable to 24 bits/pixel)
// 4 bytes: long int, image size (not used unless compression is used)
// 4 bytes: long int, x pixels per meter
// 4 bytes: long int, y pixels per meter
// 4 bytes: colors used (not applicable to 24 bit color)
// 4 bytes: colors important (not applicable to 24 bit color)
// "long int" really means "unsigned long int"
// Pixel data: 3 bytes per pixel: RGB values (in reverse order).
// Rows padded to multiples of four.
bool RgbImage::LoadFromOpenglBuffer() // Load the bitmap from the current OpenGL buffer
int viewportData[4];
glGetIntegerv( GL_VIEWPORT, viewportData );
int& vWidth = viewportData[2];
int& vHeight = viewportData[3];
if ( ImagePtr==0 ) { // If no memory allocated
NumRows = vHeight;
NumCols = vWidth;
ImagePtr = new unsigned char[NumRows*GetNumBytesPerRow()];
if ( !ImagePtr ) {
fprintf(stderr, "Unable to allocate memory for %ld x %ld buffer.\n",
NumRows, NumCols);
ErrorCode = MemoryError;
return false;
assert ( vWidth>=NumCols && vHeight>=NumRows );
int oldGlRowLen;
if ( vWidth>=NumCols ) {
glGetIntegerv( GL_UNPACK_ROW_LENGTH, &oldGlRowLen );
glPixelStorei( GL_UNPACK_ROW_LENGTH, NumCols );
glPixelStorei(GL_UNPACK_ALIGNMENT, 4);
// Get the frame buffer data.
glReadPixels( 0, 0, NumCols, NumRows, GL_RGB, GL_UNSIGNED_BYTE, ImagePtr);
// Restore the row length in glPixelStorei (really ought to restore alignment too).
if ( vWidth>=NumCols ) {
glPixelStorei( GL_UNPACK_ROW_LENGTH, oldGlRowLen );
return true;
This expression:
theTexMap(x,y) = theTexMap(u,v);
is trying to invoke RgbImage::operator()(float, float). That operator is not defined on your type, hence the error.
I'm guessing the functions you want to call are:
theTexMap.GetRgbPixel(u, v, &r, &g, &b);
theTexMap.SetRgbPixelf(x, y, r, g, b);
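Note that SetRgbPixelc asserts row<NumRows && col<NumCols, so writing to doubled coordinates inside the same image would trip that assert as soon as the coordinate leaves the image; the result needs a larger destination. Copying each source pixel to one scaled position also leaves gaps between the written pixels. A common alternative is to loop over the destination and look up the source pixel instead. The sketch below shows that idea on a raw buffer rather than on RgbImage; scale_width is a hypothetical name, and the buffer is assumed to be tightly packed RGB without the 4-byte row padding that RgbImage uses.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Nearest-neighbour stretch of a packed RGB image to factor times its width.
 * Loops over destination pixels and reads the matching source pixel, so
 * every destination pixel gets a value. Caller frees the result.
 */
unsigned char *scale_width(const unsigned char *src, int rows, int cols,
                           int factor)
{
    if (rows <= 0 || cols <= 0 || factor <= 0)
        return NULL;
    int dst_cols = cols * factor;
    unsigned char *dst = malloc((size_t)rows * dst_cols * 3);
    if (!dst)
        return NULL;
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < dst_cols; c++) {
            const unsigned char *p = src + ((size_t)r * cols + c / factor) * 3;
            memcpy(dst + ((size_t)r * dst_cols + c) * 3, p, 3);
        }
    }
    return dst;
}
```

With RgbImage, the same loop shape would call GetRgbPixel on the source and SetRgbPixelf on a second, larger image.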

Can't read the file and place values at relative addresses of a 2D matrix in CUDA

I am allocating a 2D matrix using malloc and trying to insert values at relative addresses. I do not understand why I get a core dumped error. Please look at my code below.
#include <stdio.h>
#include <stdlib.h>
int main()
int width = 4;
FILE *fp = fopen("matB.txt", "r");
int *x;
x = (int*)malloc(width*width*sizeof(int));
int i, j;
for(i=0; i<width; i++)
for(j=0; j<width; j++)
fscanf(fp, "%d", x[i*width+j]);
for(i=0; i<width; i++)
for(j=0; j<width; j++)
printf("%d", x[i*width+j]);
return 0;
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4
I wrote the sample program above to check relative addressing, and passing &x[...] to fscanf cleared this problem.
I wrote the sample C code because I hit the same read problem in CUDA. Using the same 2D array allocation and relative addressing, the CUDA program reads the file, but when I print the values it prints 0s instead of 1, 2, 3, 4. I am still learning CUDA. I see no problem with allocating the host array or with the relative addressing, so why does the file read print 0s?
The CUDA program is below:
//Matrix multiplication using shared and non shared kernal
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#define TILE_WIDTH 2
/*matrix multiplication kernels*/
//non shared
__global__ void MatrixMul( float *Md , float *Nd , float *Pd , const int WIDTH )
// calculate thread id
unsigned int col = TILE_WIDTH*blockIdx.x + threadIdx.x ;
unsigned int row = TILE_WIDTH*blockIdx.y + threadIdx.y ;
for (int k = 0 ; k<WIDTH ; k++ )
Pd[row*WIDTH + col]+= Md[row * WIDTH + k ] * Nd[ k * WIDTH + col] ;
// shared
__global__ void MatrixMulSh( float *Md , float *Nd , float *Pd , const int WIDTH )
//Taking shared array to break the MAtrix in Tile widht and fatch them in that array per ele
__shared__ float Mds [TILE_WIDTH][TILE_WIDTH] ;
__shared__ float Nds [TILE_WIDTH][TILE_WIDTH] ;
// calculate thread id
unsigned int col = TILE_WIDTH*blockIdx.x + threadIdx.x ;
unsigned int row = TILE_WIDTH*blockIdx.y + threadIdx.y ;
for (int m = 0 ; m<WIDTH/TILE_WIDTH ; m++ ) // m indicate number of phase
Mds[threadIdx.y][threadIdx.x] = Md[row*WIDTH + (m*TILE_WIDTH + threadIdx.x)] ;
Nds[threadIdx.y][threadIdx.x] = Nd[ ( m*TILE_WIDTH + threadIdx.y) * WIDTH + col] ;
__syncthreads() ; // for syncronizeing the threads
// Do for tile
for ( int k = 0; k<TILE_WIDTH ; k++ )
Pd[row*WIDTH + col]+= Mds[threadIdx.y][k] * Nds[k][threadIdx.x] ;
__syncthreads() ; // for syncronizeing the threads
// main routine
int main (int argc, char* argv[])
const int WIDTH = 4 ;
printf("%d\n", WIDTH);
//float array1_h[WIDTH][WIDTH] ,array2_h[WIDTH][WIDTH], M_result_array_h[WIDTH][WIDTH] ;
float *array1_h, *array2_h, *M_result_array_h;
float *array1_d , *array2_d ,*result_array_d ,*M_result_array_d; // device array
int i , j ;
cudaEvent_t start_full,stop_full;
float time;
cudaEventRecord(start_full, 0);
//char *file1 = argv[2];
//char *file2 = argv[3];
//char *file3 = argv[4];
FILE *fp1 = fopen("matA.txt", "r");
FILE *fp2 = fopen("matB.txt", "r");
FILE *fp3 = fopen("matC.txt", "w");
//create device array cudaMalloc ( (void **)&array_name, sizeofmatrixinbytes) ;
cudaMallocHost((void **) &array1_h , WIDTH*WIDTH*sizeof (float) ) ;
cudaMallocHost((void **) &array2_h , WIDTH*WIDTH*sizeof (float) ) ;
cudaMallocHost((void **) &M_result_array_h , WIDTH*WIDTH*sizeof (float) ) ;
//input in host array
for ( i = 0 ; i<WIDTH ; i++ )
for (j = 0 ; j<WIDTH ; j++ )
fscanf(fp1, "%d", &array1_h[i*WIDTH + j]);
printf("%d\t", array1_h[i*WIDTH + j]);
// fscanf(fp1, "\n");
for ( i = 0 ; i<WIDTH ; i++ )
for (j = 0 ; j<WIDTH ; j++ )
printf("%d\t", array1_h[i*WIDTH+j]);
for ( i = 0 ; i<WIDTH ; i++ )
for (j = 0 ; j<WIDTH ; j++ )
fscanf(fp2, "%d", &array2_h[i*WIDTH+j]);
fscanf(fp2, "\n");
//create device array cudaMalloc ( (void **)&array_name, sizeofmatrixinbytes) ;
cudaMalloc((void **) &array1_d , WIDTH*WIDTH*sizeof (float) ) ;
cudaMalloc((void **) &array2_d , WIDTH*WIDTH*sizeof (float) ) ;
//copy host array to device array; cudaMemcpy ( dest , source , WIDTH , direction )
cudaMemcpy ( array1_d , array1_h , WIDTH*WIDTH*sizeof (float) , cudaMemcpyHostToDevice ) ;
cudaMemcpy ( array2_d , array2_h , WIDTH*WIDTH*sizeof (float) , cudaMemcpyHostToDevice ) ;
//allocating memory for resultant device array
cudaMalloc((void **) &result_array_d , WIDTH*WIDTH*sizeof (float) ) ;
cudaMalloc((void **) &M_result_array_d , WIDTH*WIDTH*sizeof (float) ) ;
//calling kernal
dim3 dimBlock( TILE_WIDTH, TILE_WIDTH, 1 ) ;
// Change if 0 to if 1 for running non shared code and make if 0 for shared memory code
#if 0
MatrixMul <<<dimGrid,dimBlock>>> ( array1_d , array2_d ,M_result_array_d , WIDTH) ;
#endif
#if 1
MatrixMulSh<<<dimGrid,dimBlock>>> ( array1_d , array2_d ,M_result_array_d , WIDTH) ;
#endif
// all GPU function blocked till kernel is working
//copy back result_array_d to result_array_h
cudaMemcpy(M_result_array_h , M_result_array_d , WIDTH*WIDTH*sizeof(int), cudaMemcpyDeviceToHost) ;
//printf the result array
for ( i = 0 ; i<WIDTH ; i++ )
for ( j = 0 ; j < WIDTH ; j++ )
fprintf (fp3, "%d\t", M_result_array_h[i*WIDTH+j]) ;
fprintf (fp3, "\n") ;
//system("pause") ;
cudaEventRecord(stop_full, 0);
cudaEventElapsedTime(&time, start_full, stop_full);
printf ("Total execution Time is : %1.5f ms\n", time);
Should be fscanf(fp, "%d", &x[i*width+j]);. The scanf family requires the address of a location in which to write the scanned value.
Also, don't cast malloc.
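For the CUDA version, one thing that may explain the zeros is that the host arrays are float, while the fscanf and printf calls use %d. The conversion specifier has to match the argument type, so a float needs %f in both. A minimal host-side sketch, assuming the matrix file holds whitespace-separated numbers (read_matrix is a hypothetical helper, not part of CUDA):

```c
#include <stdio.h>

/*
 * Reads width*width values into a row-major float matrix.
 * Returns the number of values actually read.
 */
int read_matrix(FILE *fp, float *m, int width)
{
    int count = 0;
    for (int i = 0; i < width; i++) {
        for (int j = 0; j < width; j++) {
            /* %f matches a float*; %d would expect an int* */
            if (fscanf(fp, "%f", &m[i * width + j]) != 1)
                return count;
            count++;
        }
    }
    return count;
}
```

Checking the return value against width*width also catches a missing or short file before the data is copied to the device.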

haar transformation on an image

I need some help with the Haar transformation; I have to apply it to an image.
My math is bad, my English is not great, and I find it hard to understand the articles on the internet. I found this page where the Haar transformation is applied to a 2D matrix. I suppose that if I give it the image's pixel matrix, it should work?
I'm confused about this, could someone enlighten me a bit please?
Thank you!
typedef struct {
unsigned char red,green,blue;
} PPMPixel;
typedef struct {
int x, y;
PPMPixel *data;
} PPMImage;
static PPMImage *readPPM(const char *filename)
char buff[16];
PPMImage *img;
FILE *fp;
int c, rgb_comp_color;
//open PPM file for reading
fp = fopen(filename, "rb");
if (!fp) {
fprintf(stderr, "Unable to open file '%s'\n", filename);
//read image format
if (!fgets(buff, sizeof(buff), fp)) {
//check the image format
if (buff[0] != 'P' || buff[1] != '6') {
fprintf(stderr, "Invalid image format (must be 'P6')\n");
//alloc memory form image
img = (PPMImage *)malloc(sizeof(PPMImage));
if (!img) {
fprintf(stderr, "Unable to allocate memory\n");
//check for comments
c = getc(fp);
while (c == '#') {
while (getc(fp) != '\n') ;
c = getc(fp);
ungetc(c, fp);
//read image size information
if (fscanf(fp, "%d %d", &img->x, &img->y) != 2) {
fprintf(stderr, "Invalid image size (error loading '%s')\n", filename);
//read rgb component
if (fscanf(fp, "%d", &rgb_comp_color) != 1) {
fprintf(stderr, "Invalid rgb component (error loading '%s')\n", filename);
//check rgb component depth
if (rgb_comp_color!= RGB_COMPONENT_COLOR) {
fprintf(stderr, "'%s' does not have 8-bits components\n", filename);
while (fgetc(fp) != '\n') ;
//memory allocation for pixel data
img->data = (PPMPixel*)malloc(img->x * img->y * sizeof(PPMPixel));
if (!img->data) {
fprintf(stderr, "Unable to allocate memory\n");
//read pixel data from file
if (fread(img->data, 3 * img->x, img->y, fp) != img->y) {
fprintf(stderr, "Error loading image '%s'\n", filename);
return img;
void writePPM(const char *filename, PPMImage *img)
FILE *fp;
//open file for output
fp = fopen(filename, "wb");
if (!fp) {
fprintf(stderr, "Unable to open file '%s'\n", filename);
//write the header file
//image format
fprintf(fp, "P6\n");
fprintf(fp, "# Created by %s\n",CREATOR);
//image size
fprintf(fp, "%d %d\n",img->x,img->y);
// rgb component depth
fprintf(fp, "%d\n",RGB_COMPONENT_COLOR);
// pixel data
fwrite(img->data, 3 * img->x, img->y, fp);
void imageDivide(const char *filename,PPMImage *img)
FILE *fp = fopen(filename,"rb");
FILE *filePtr;
filePtr = fopen ("floatArray.txt","w");
int width = 288;
int height = 352;
int i,j,m,k,l,i1,j1,l1,n1;
int *sum;
float *mean,*var;
unsigned char buff[(288*352)];
unsigned char image[288][352];
size_t n = fread( buff, sizeof(buff[0]), sizeof(buff), fp );
for(i =0; i < height; i++)
for(j = 0; j < width;j++)
image[j][i] = buff[(i*width)+j];
unsigned char dividedimage[(288*352)/(8*8)][8][8];
mean=(float *)malloc(sizeof(float)*1584);
var=(float *)malloc(sizeof(float)*1584);
sum=(int *)malloc(sizeof(int)*1584);
for(i = 0; i < height/8; i++)
for(j = 0; j < width/8;j++)
for(k = i*8, l = 0; k < (i*8)+8; k++,l++)
for(m = j*8, n = 0; m < (j*8)+8; m++,n++)
dividedimage[(i*(width/8))][n][l] = image[m][k];
printf("\n no of grids::%d i=%d j=%d,k=%d,m=%d n=%d",(i*(width/8)),i,j,k,m,n);
printf("\nprinting info of %dth grid::\n",i1+1);
// printf("%5d",dividedimage[i1][j1][l1]);
// printf("\n sum of intensities of grid %d is ::%d and mean is %f\n",i1+1,sum[i1],mean[i1]);
printf("\nprinting info of %dth grid::\n",i1+1);
printf("\n variance of grid %d is ::%f\n",i1+1,var[i1]);
for (i = 0; i < 1584; i++) {
// y[i] = var[i1]);
fprintf (filePtr, "%5f\n", var[i]);
/** The 1D Haar Transform **/
void haar1d(float *vec, int n)
{
int i=0;
int w=n;
FILE *filePtr;
filePtr = fopen ("1dhaarwavelet.txt","w");
float *vecp ;
vecp=(float *)malloc(sizeof(float)*n);
for(i=0;i<n;i++)
vecp[i] = 0;
while(w>1)
{
w/=2; // each pass works on the first half of the previous pass
for(i=0;i<w;i++)
{
vecp[i] = (vec[2*i] + vec[2*i+1])/sqrt(2.0); // scaled average
vecp[i+w] = (vec[2*i] - vec[2*i+1])/sqrt(2.0); // scaled difference
}
for(i=0;i<(w*2);i++)
vec[i] = vecp[i];
}
free(vecp);
printf("\nthe 1d haarwavelet transform is::");
for (i = 0; i < n; i++) {
fprintf (filePtr, "%5f\n", vec[i]);
}
fclose(filePtr);
}
/** A Modified version of 1D Haar Transform, used by the 2D Haar Transform function **/
void haar1(float *vec, int n, int w)
{
int i=0;
float *vecp = (float *)malloc(sizeof(float)*n);
for(i=0;i<n;i++)
vecp[i] = 0;
w/=2; // a single pass over the first w entries
for(i=0;i<w;i++)
{
vecp[i] = (vec[2*i] + vec[2*i+1])/sqrt(2.0);
vecp[i+w] = (vec[2*i] - vec[2*i+1])/sqrt(2.0);
}
for(i=0;i<(w*2);i++)
vec[i] = vecp[i];
free(vecp);
}
/** The 2D Haar Transform **/
void haar2(float **matrix, int rows, int cols)
float *temp_row = (float *)malloc(sizeof(float)*cols);
float *temp_col = (float *)malloc(sizeof(float)*rows);
int i=0,j=0;
int w = cols, h=rows;
while(w>1 || h>1)
temp_row[j] = matrix[i][j];
matrix[i][j] = temp_row[j];
temp_col[j] = matrix[j][i];
haar1(temp_col, rows, h);
matrix[j][i] = temp_col[j];
// delete [] temp_row;
// delete [] temp_col;
int main(){
char filein[100],fileout[100];
PPMImage *image1,*image2;
int m,i,j;
printf("\nEnter the input file name::");
image1 = readPPM(filein);
// printf("\nEnter the output file name::");
// gets(fileout);
// writePPM(fileout,image1);
float **mat = (float **)malloc(sizeof(float*)*4);
mat[m] = (float *)malloc(sizeof(float)*4);
mat[0][0] = 5; mat[0][1] = 6; mat[0][2] = 1; mat[0][3] = 2;
mat[1][0] = 4; mat[1][1] = 2; mat[1][2] = 5; mat[1][3] = 5;
mat[2][0] = 3; mat[2][1] = 1; mat[2][2] = 7; mat[2][3] = 1;
mat[3][0] = 6; mat[3][1] = 3; mat[3][2] = 5; mat[3][3] = 1;
printf("\nafter 2d haarwavelet::\n");
printf(" %f ",mat[i][j]);
printf("Press any key...");
If you're trying to do object detection using Haar features, take a look at OpenCV.
There is an example at the end of the URL you point to.
Look inside the main() function.
the 2D variant takes a float** and two parameters, height and width
the float** points to rows of grayscale pixels
each row is a float*, a pointer to the first pixel in the row
each float value is the intensity value of the pixel
in the example code, the dimensions are 4x4.
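For an image, the float** can be built from the pixel data in the same way. The sketch below assumes the pixels are packed as three bytes (red, green, blue) per pixel in row-major order, as in the PPMPixel struct of the question, and uses a plain average of the three channels as the grey intensity, which is only one possible choice. The names rgb_to_gray_rows and free_rows are hypothetical.

```c
#include <stdlib.h>

/* Frees a matrix built by rgb_to_gray_rows. */
void free_rows(float **mat, int rows)
{
    for (int r = 0; r < rows; r++)
        free(mat[r]);
    free(mat);
}

/*
 * Builds a rows x cols matrix (one float* per row) of grey intensities
 * from packed RGB bytes. Returns NULL on allocation failure.
 */
float **rgb_to_gray_rows(const unsigned char *rgb, int rows, int cols)
{
    float **mat = malloc((size_t)rows * sizeof *mat);
    if (!mat)
        return NULL;
    for (int r = 0; r < rows; r++) {
        mat[r] = malloc((size_t)cols * sizeof **mat);
        if (!mat[r]) {
            free_rows(mat, r);  /* release the rows allocated so far */
            return NULL;
        }
        for (int c = 0; c < cols; c++) {
            const unsigned char *p = rgb + ((size_t)r * cols + c) * 3;
            mat[r][c] = (p[0] + p[1] + p[2]) / 3.0f;  /* plain average */
        }
    }
    return mat;
}
```

The resulting matrix can then be passed to a function with the haar2 signature together with the image height and width.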
This is where the memory is allocated:
float **mat = new float*[4];
for(int m=0;m<4;m++)
mat[m] = new float[4];
This is where the pixel values are set:
mat[0][0] = 5; mat[0][1] = 6; mat[0][2] = 1; mat[0][3] = 2;
mat[1][0] = 4; mat[1][1] = 2; mat[1][2] = 5; mat[1][3] = 5;
mat[2][0] = 3; mat[2][1] = 1; mat[2][2] = 7; mat[2][3] = 1;
mat[3][0] = 6; mat[3][1] = 3; mat[3][2] = 5; mat[3][3] = 1;
This is where the haar2 function is called:
haar2(mat,4,4);
All you need to do is provide the data as needed by the function (float**) with the right dimensions. You probably want to store the results to an output file that you can open in an image viewing application.
Look for the PGM format for a really easy solution. Note that the results of the haar function will give you floating point values, which you may have to compress down to 8 bit to view the image.
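The compression down to 8 bit can be sketched as a linear min-max rescale, assuming that is acceptable for viewing (Haar coefficients can be negative, so a simple cast would not work). The name to_8bit is hypothetical.

```c
#include <stddef.h>

/*
 * Linearly maps n floats onto 0..255, rounding to the nearest integer.
 * A constant input maps to all zeros.
 */
void to_8bit(const float *in, unsigned char *out, size_t n)
{
    if (n == 0)
        return;
    float lo = in[0], hi = in[0];
    for (size_t i = 1; i < n; i++) {
        if (in[i] < lo) lo = in[i];
        if (in[i] > hi) hi = in[i];
    }
    float range = hi - lo;
    for (size_t i = 0; i < n; i++) {
        if (range > 0.0f)
            out[i] = (unsigned char)(255.0f * (in[i] - lo) / range + 0.5f);
        else
            out[i] = 0;
    }
}
```

The output bytes can be written directly as the pixel data of a binary PGM file after its header.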