Propose normal random variables using vdRngGaussian - mex

I want to draw n random variables from the Normal(mean, sigma2) distribution using the function vdRngGaussian.
One way to do this is with the command
vdRngGaussian(VSL_RNG_METHOD_GAUSSIAN_ICDF, stream, n, x, mean, sqrt(sigma2) )
Instead of this, I want to use a for-loop. The MEX code I wrote is:
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <mkl.h>
#include "mkl_vml.h"
#include "mex.h"
#include "matrix.h"
#include "mkl_vsl.h"
#include <time.h>
#define SEED time(NULL)
double normal(double mean, double sigma2, VSLStreamStatePtr stream);
/* main function */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
double *x, mean, sigma2;
VSLStreamStatePtr stream;
vslNewStream( &stream, VSL_BRNG_MT19937, SEED );
/* make pointers to input data */
mean = (double)mxGetScalar(prhs[0]);
sigma2= (double)mxGetScalar(prhs[1]);
/* make pointers to output data */
plhs[0] = mxCreateDoubleMatrix(1, 1, mxREAL);
x = mxGetPr(plhs[0]);
x[0] = normal( mean, sigma2, stream);
/* Deleting the stream */
vslDeleteStream( &stream );
return;
}
double normal(double mean, double sigma2, VSLStreamStatePtr stream)
{
double x[1];
vdRngGaussian(VSL_RNG_METHOD_GAUSSIAN_ICDF, stream, 1, x, mean, sqrt(sigma2) );
return(x[0]);
}
When I run the code with the command
for i=1:5
out(i) = normalc(0.0, 1.0);
end
I get the following results:
-1.1739 -1.1739 -1.1739 -1.1739 -1.1739
If I call my function 5 times without a for-loop, I get these results:
-0.2720, 2.1457, -1.2397, 0.7501, 0.1490
Could you please help me?
Thank you very much.

The stream you are creating with vslNewStream takes a seed (the initial condition of the random number generator) as an argument, and you are basing this seed on the output of the time function from the C time library. The problem is that time returns whole seconds. That resolution is too coarse: as long as the for loop finishes within a single second, every call gets the same seed, and therefore the same stream and the same draw.
Perhaps use a different seed source, such as high_resolution_clock from the C++11 <chrono> header. You still get the time since the epoch, but in finer units:
using namespace std::chrono;
high_resolution_clock::time_point now = high_resolution_clock::now();
high_resolution_clock::duration epoch = now.time_since_epoch();
long long ns  = duration_cast<nanoseconds>(epoch).count();
long long mic = duration_cast<microseconds>(epoch).count();
long long ms  = duration_cast<milliseconds>(epoch).count();
Alternatively, you can use the elapsed CPU clock count from MKL's mkl_get_cpu_clocks function, though that goes back to zero when you shut down your machine. Or just take the time in seconds as a double from MKL's dsecnd.
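As a rough sketch (assuming the MEX source is compiled as C++ so that <chrono> is available), the nanosecond count shown above can be fed straight into vslNewStream. The seed argument is an unsigned integer, so truncating the count only keeps its fast-changing low bits, which is exactly what you want here; the make_stream helper name is just for illustration:
#include <chrono>
#include "mkl_vsl.h"
/* Create a stream whose seed changes on every call, even within the same second. */
VSLStreamStatePtr make_stream()
{
    using namespace std::chrono;
    long long ns = duration_cast<nanoseconds>(
                       high_resolution_clock::now().time_since_epoch()).count();
    VSLStreamStatePtr stream;
    vslNewStream(&stream, VSL_BRNG_MT19937, (unsigned int)ns); /* seed truncated to the unsigned seed width */
    return stream;
}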

Related

No matching function for call to 'as' in Rcpp

I'm trying to write an R wrapper for the FINUFFT routines for calculating the FFT of an unevenly sampled series. I have virtually no experience with C/C++, so I'm working from an example that compares the traditional Fourier transform to the NUFFT. The example code follows.
// this is all you must include for the finufft lib...
#include "finufft.h"
#include <complex>
// also needed for this example...
#include <stdio.h>
#include <stdlib.h>
using namespace std;
int main(int argc, char* argv[])
/* Simple example of calling the FINUFFT library from C++, using plain
arrays of C++ complex numbers, with a math test. Barnett 3/10/17
Double-precision version (see example1d1f for single-precision)
Compile with:
g++ -fopenmp example1d1.cpp -I ../src ../lib-static/libfinufft.a -o example1d1 -lfftw3 -lfftw3_omp -lm
or if you have built a single-core version:
g++ example1d1.cpp -I ../src ../lib-static/libfinufft.a -o example1d1 -lfftw3 -lm
Usage: ./example1d1
*/
{
int M = 1e6; // number of nonuniform points
int N = 1e6; // number of modes
double acc = 1e-9; // desired accuracy
nufft_opts opts; finufft_default_opts(&opts);
complex<double> I = complex<double>(0.0,1.0); // the imaginary unit
// generate some random nonuniform points (x) and complex strengths (c):
double *x = (double *)malloc(sizeof(double)*M);
complex<double>* c = (complex<double>*)malloc(sizeof(complex<double>)*M);
for (int j=0; j<M; ++j) {
x[j] = M_PI*(2*((double)rand()/RAND_MAX)-1); // uniform random in [-pi,pi)
c[j] = 2*((double)rand()/RAND_MAX)-1 + I*(2*((double)rand()/RAND_MAX)-1);
}
// allocate output array for the Fourier modes:
complex<double>* F = (complex<double>*)malloc(sizeof(complex<double>)*N);
// call the NUFFT (with iflag=+1): note N and M are typecast to BIGINT
int ier = finufft1d1(M,x,c,+1,acc,N,F,opts);
int n = 142519; // check the answer just for this mode...
complex<double> Ftest = complex<double>(0,0);
for (int j=0; j<M; ++j)
Ftest += c[j] * exp(I*(double)n*x[j]);
int nout = n+N/2; // index in output array for freq mode n
double Fmax = 0.0; // compute inf norm of F
for (int m=0; m<N; ++m) {
double aF = abs(F[m]);
if (aF>Fmax) Fmax=aF;
}
double err = abs(F[nout] - Ftest)/Fmax;
printf("1D type-1 NUFFT done. ier=%d, err in F[%d] rel to max(F) is %.3g\n",ier,n,err);
free(x); free(c); free(F);
return ier;
}
Much of this I don't need, such as generating the test series and comparing to the traditional FFT. Further, I want to return the values of the transform, not just an error code indicating success. Below is my code.
#include "finufft.h"
#include <complex>
#include <Rcpp.h>
#include <stdlib.h>
using namespace Rcpp;
using namespace std;
// [[Rcpp::export]]
ComplexVector finufft(int M, NumericVector x, ComplexVector c, int N) {
// From example code for finufft, sets precision and default options
double acc = 1e-9;
nufft_opts opts; finufft_default_opts(&opts);
// allocate output array for the finufft routine:
complex<double>* F = (complex<double>*)malloc(sizeof(complex<double>*)*N);
// Change vector inputs from R types to C++ types
double* xd = as< double* >(x);
complex<double>* cd = as< complex<double>* >(c);
// call the NUFFT (with iflag=-1): note N and M are typecast to BIGINT
int ier = finufft1d1(M,xd,cd,-1,acc,N,F,opts);
ComplexVector Fd = as<ComplexVector>(*F);
return Fd;
}
When I try to source this in RStudio, I get the error "no matching function for call to 'as(std::complex<double>*&)'", pointing to the line declaring Fd towards the end. I believe the error indicates that either the function 'as' isn't defined (which I know is false) or the argument to 'as' isn't the correct type. The examples here include one using 'as' to convert to a NumericVector, so unless there's some complication with complex values I don't see why it should be a problem here.
I know there are potential problems using two namespaces, but I don't believe that's the issue here. My best guess is that there's an issue with how I'm trying to use pointers, but I lack the experience to identify it and I can't find any similar examples online to guide me.
Rcpp::as<T> converts from an R data type (SEXP) to a C++ data type, e.g. Rcpp::ComplexVector. This does not fit your situation, where you try to convert from a C-style array to C++. Fortunately Rcpp::Vector, which is the basis for Rcpp::ComplexVector, has a constructor for this task: Vector (InputIterator first, InputIterator last). For the other direction (going from C++ to C-style array) you can use vector.begin() or &vector[0].
However, one needs a reinterpret_cast to convert between Rcomplex* and std::complex<double>*. That should cause no problems, though, since Rcomplex (a.k.a. complex double in C) and std::complex<double> are compatible.
A minimal example:
#include <Rcpp.h>
#include <complex>
using namespace Rcpp;
// [[Rcpp::export]]
ComplexVector foo(ComplexVector v) {
std::complex<double>* F = reinterpret_cast<std::complex<double>*>(v.begin());
int N = v.length();
// do something with F
ComplexVector Fd(reinterpret_cast<Rcomplex*>(F),
reinterpret_cast<Rcomplex*>(F + N));
return Fd;
}
/*** R
set.seed(42)
foo(runif(4)*(1+1i))
*/
Result:
> Rcpp::sourceCpp('56675308/code.cpp')
> set.seed(42)
> foo(runif(4)*(1+1i))
[1] 0.9148060+0.9148060i 0.9370754+0.9370754i 0.2861395+0.2861395i 0.8304476+0.8304476i
BTW, you can move these reinterpret_casts out of sight by using std::vector<std::complex<double>> as argument and return types for your function. Rcpp does the rest for you. This also helps get rid of the naked malloc:
#include <Rcpp.h>
// dummy function with reduced signature
int finufft1d1(int M, double *xd, std::complex<double> *cd, int N, std::complex<double> *Fd) {
return 0;
}
// [[Rcpp::export]]
std::vector<std::complex<double>> finufft(int M,
std::vector<double> x,
std::vector<std::complex<double>> c,
int N) {
// allocate output array for the finufft routine:
std::vector<std::complex<double>> F(N);
// Change vector inputs from R types to C++ types
double* xd = x.data();
std::complex<double>* cd = c.data();
std::complex<double>* Fd = F.data();
int ier = finufft1d1(M, xd, cd, N, Fd);
return F;
}

Inconsistent MEX file Output using Armadillo Interpolation

I am trying to convert MATLAB code to a C++ MEX file in order to run a few computations more efficiently. I am using the Armadillo library with BLAS and LAPACK for a few matrix operations, which involve interpolating data to apply a delay.
However, I am receiving inconsistent output from my MEX file. If I run the same MEX file with the same input, sometimes I receive the correct output, and occasionally it outputs a HUGE number (i.e. instead of being on the order of 100, it is on the order of 10^246).
I am very new to C++ coding and have exhausted my general knowledge base. I believe the problem is in my interpolation step, because I am able to consistently output the correct delay matrix, which is the preceding step.
Does anyone have any idea what I am doing to produce this?
In MATLAB I call:
mex test.cpp -lblas -llapack
[outData] = test( squeeze(inData(:,:,ang,:)) , params, angles(ang),1);
My MEX file is roughly:
#include <math.h>
#include <mex.h>
#include <armadillo>
#include "armaMex.hpp"
using namespace std; //avoid having to scope with std:: before commands
using namespace arma; //avoid having to scope with arma:: before commands
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]){
// ============== INITIALIZE =============
// Initialize Data
const mwSize *dims;
int cDim,dDim,aDim,numDims; // Dimension variables
int m, n, a; // Loop variables
mxArray *fs_p, *f0_p, *prf_p, *pval_p, *c_p; // Parameter pointers
const double *fs,*f0,*prf,*pval, *c, *ang; // Parameter variables
const int *nthreads;
// Initialize pointers for param variables
pval_p = mxGetField(prhs[1],0,"pval"); //note that your parameters need these exact names
fs_p = mxGetField(prhs[1],0,"fs");
f0_p = mxGetField(prhs[1],0,"f0");
prf_p = mxGetField(prhs[1],0,"prf");
c_p = mxGetField(prhs[1],0,"c");
// Initialize parameters
pval = mxGetPr(pval_p);
fs = mxGetPr(fs_p);
f0 = mxGetPr(pval_p);
prf = mxGetPr(prf_p);
c = mxGetPr(c_p);
ang = (double*)mxGetData(prhs[2]);
nthreads = (int*)mxGetData(prhs[3]);
dims = mxGetDimensions(prhs[0]);
numDims = (int)mxGetNumberOfDimensions(prhs[0]);
dDim=(int)dims[0];cDim=(int)dims[1];aDim=(int)dims[2];
//Read in channel Data
cube data_in = armaGetCubePr(prhs[0]);
(....... simple calculations that look okay ... )
cube data_out(dDim, bDim, aDim);
cube delayedData(dDim, aDim, bDim);
vec delayArray(dDim); //need to define these tmp variables bc subcube fcn otherwise gives me errors idk
vec tmpIN(dDim);
vec tmpOut(dDim);
vec tmpOUTdata(dDim);
for(m=0;m<bDim;m++){
for(n=0;n<cDim;n++){
for (a=0;a<aDim;a++){
delayArray = tdelays.subcube(0,n,m,dDim-1,n,m);
tmpIN = data_in.subcube(0,n,a,dDim-1,n,a);
tmpOUTdata = data_out.subcube(0,m,a,dDim-1,m,a);
interp1(timeArray, tmpIN , delayArray, tmpOut, "linear",0);
data_out.subcube(0,m,a,dDim-1,m,a) = tmpOUTdata +tmpOut;
}
}
}
// Define output data
plhs[0] = armaCreateMxMatrix(data_out.n_rows, data_out.n_cols, data_out.n_slices);
armaSetCubePr(plhs[0], data_out);
return;
}

Why doesn't my function count?

I have a function that, if you plug a number of seconds into it, counts them out. It only does this if you call the function at the beginning of the program, which makes me think it has something to do with clock(). I added clock() to the rest of my variables, but the function doesn't count; specifically, it gets stuck at the if statement.
code:
#include <stdio.h>
#include <string>
#include <iostream>
#include <stdlib.h>
#include <windows.h>
#include "math.h"
#include <time.h>
#include <ctime>
#include <cstdlib>
#include <mmsystem.h>
void countbysec(int Seconds);
using namespace std;
int main(){
int secondsinput;
cout<<"Type how many seconds to cout \n";
cin>>secondsinput;
countbysec(secondsinput);
return 0;
}
void countbysec(int Seconds){
clock_t Timer;
Timer = clock() + Seconds * CLOCKS_PER_SEC ;
clock_t counttime = clock() + (Timer / Seconds);
clock_t secondcount = 0;
while(clock() <= Timer){
if(clock() == counttime){
counttime = counttime + CLOCKS_PER_SEC;
secondcount = secondcount + 1;
cout<<secondcount<<endl;
}
}
}
You're not calling the function with this line:
void countbysec(int Seconds);
You're forward declaring the function. The compiler needs to see the declaration of the function before you can call it; otherwise you'll see:
error: use of undeclared identifier 'countbysec'
It needs to be able to type check and generate the code for the call at the point you make the call.
You could declare and define the function in one step by moving the code block from below main() to above it in your file. This is normal C/C++ behaviour.
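For illustration, a minimal sketch (with a made-up body) showing the difference between a declaration, a call, and a definition:
#include <iostream>
// Definition placed above main(): it declares and defines the function in one step.
void countbysec(int seconds) {
    std::cout << "counting " << seconds << " seconds\n";
}
int main() {
    countbysec(5);            // this line is a call
    // void countbysec(int);  // this would only be another declaration, not a call
    return 0;
}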
First, clock is not the right tool to do what it looks like you want. It returns an approximation of how much CPU time has been used by the process. Approximation should have a few alarm bells ringing. Next, CPU time is not the same as wall time, so who can say just how much elapsed time went by while the program worked its way to ten seconds.
So
if(clock() == counttime){
The time must be EXACTLY counttime in order to do your count-incrementing. The odds of pulling this off are not too good. You might shoot past it. And if you do, nothing will ever be counted. I recommend
if(clock() >= counttime){
Next, you are unlikely to get that last second because
while(clock() <= Timer)
probably trips first and exits the loop.
If you want to count seconds, look to something like std::this_thread::sleep_until.
Set Wake time = current time + 1 second, then
Sleep until Wake time,
count,
Add a second onto Wake time,
repeat until done.
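A rough sketch of that recipe, assuming a C++11 compiler with <thread> support:
#include <chrono>
#include <iostream>
#include <thread>
// Print 1, 2, ..., total, one line per second, sleeping instead of spinning on clock().
void countbysec(int total) {
    using namespace std::chrono;
    auto wake = steady_clock::now() + seconds(1);
    for (int count = 1; count <= total; ++count) {
        std::this_thread::sleep_until(wake); // sleep until the next whole second
        std::cout << count << std::endl;
        wake += seconds(1);                  // push the wake time one second further out
    }
}
int main() {
    countbysec(5);
    return 0;
}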

Count floating-point instructions

I am trying to count the number of floating-point operations in one of my programs and I think perf could be the tool I am looking for (are there any alternatives?), but I have trouble limiting it to a certain function/block of code. Let's take the following example:
#include <complex>
#include <cstdlib>
#include <iostream>
#include <type_traits>
template <typename T>
typename std::enable_if<std::is_floating_point<T>::value, T>::type myrand()
{
return static_cast <T> (std::rand()) / static_cast <T> (RAND_MAX);
}
template <typename T>
typename std::enable_if<!std::is_floating_point<T>::value, std::complex<typename T::value_type>>::type myrand()
{
typedef typename T::value_type S;
return std::complex<S>(
static_cast <S> (std::rand()) / static_cast <S> (RAND_MAX),
static_cast <S> (std::rand()) / static_cast <S> (RAND_MAX)
);
}
int main()
{
auto const a = myrand<Type>();
auto const b = myrand<Type>();
// count here
auto const c = a * b;
// stop counting here
// prevent compiler from optimizing away c
std::cout << c << "\n";
return 0;
}
The myrand() function simply returns a random number; if the type T is complex, it returns a random complex number. I did not hardcode doubles into the program because they would be optimized away by the compiler.
You can compile the file (let's call it bench.cpp) with c++ -std=c++0x -DType=double bench.cpp.
Now I would like to count the number of floating point operations, which can be done on my processor (Nehalem architecture, x86_64 where floating point is done with scalar SSE) with the event r8010 (see Intel Manual 3B, Section 19.5). This can be done with
perf stat -e r8010 ./a.out
and works as expected; however, it counts the overall number of uops (is there a table telling how many uops e.g. a movsd counts as?) and I am only interested in the count for the multiplication (see the example above).
How can this be done?
I finally found a way to do this, although not with the perf tool itself but with the corresponding perf API. One first has to define a perf_event_open function, which wraps the corresponding syscall:
#include <cstdlib> // stdlib.h for C
#include <cstdio> // stdio.h for C
#include <cstring> // string.h for C
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/perf_event.h>
#include <asm/unistd.h>
long perf_event_open(
perf_event_attr* hw_event,
pid_t pid,
int cpu,
int group_fd,
unsigned long flags
) {
int ret = syscall(__NR_perf_event_open, hw_event, pid, cpu, group_fd, flags);
return ret;
}
Next, one selects the events one wishes to count:
perf_event_attr attr;
// select what we want to count
std::memset(&attr, 0, sizeof(perf_event_attr));
attr.size = sizeof(perf_event_attr);
attr.type = PERF_TYPE_HARDWARE;
attr.config = PERF_COUNT_HW_INSTRUCTIONS;
attr.disabled = 1;
attr.exclude_kernel = 1; // do not count the instruction the kernel executes
attr.exclude_hv = 1;
// open a file descriptor
int fd = perf_event_open(&attr, 0, -1, -1, 0);
if (fd == -1)
{
// handle error
}
In this case I want to count simply the number of instructions. Floating point instructions can be counted on my processor (Nehalem) by replacing the corresponding lines with
attr.type = PERF_TYPE_RAW;
attr.config = 0x8010; // Event Number = 10H, Umask Value = 80H
By setting the type to RAW one can basically count every event the processor offers; the number 0x8010 specifies which one. Note that this number is highly processor-dependent! One can find the right numbers in the Intel Manual 3B, Part 2, Chapter 19, by picking the right subsection.
One can then measure the code by enclosing it in
// reset and enable the counter
ioctl(fd, PERF_EVENT_IOC_RESET, 0);
ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
// perform computation that should be measured here
// disable and read out the counter
ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
long long count;
read(fd, &count, sizeof(long long));
// count now has the (approximated) result
// close the file descriptor
close(fd);
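Putting the pieces together, here is a rough Linux-only sketch that wraps a single multiplication; the raw event number 0x8010 is the one from the question and is specific to Nehalem, and error handling is kept minimal:
#include <cstdio>
#include <cstring>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/perf_event.h>
#include <asm/unistd.h>
static long perf_event_open(perf_event_attr* hw_event, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(__NR_perf_event_open, hw_event, pid, cpu, group_fd, flags);
}
int main()
{
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(perf_event_attr));
    attr.size = sizeof(perf_event_attr);
    attr.type = PERF_TYPE_RAW;
    attr.config = 0x8010;                 // event 10H, umask 80H; valid for Nehalem only
    attr.disabled = 1;
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;
    int fd = perf_event_open(&attr, 0, -1, -1, 0);
    if (fd == -1) { std::perror("perf_event_open"); return 1; }
    volatile double a = 1.5, b = 2.5, c;  // volatile so the multiply is not folded away
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    c = a * b;                            // the region of interest
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    long long count = 0;
    if (read(fd, &count, sizeof(count)) != sizeof(count)) count = -1;
    std::printf("c = %f, raw event count = %lld\n", c, count);
    close(fd);
    return 0;
}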

Definitive function for getting elapsed time in milliseconds

I have tried clock_gettime(CLOCK_REALTIME) and gettimeofday() without luck, and the most basic one, clock(), which returns 0 for me (?).
But none of them counts the time spent in sleep. I don't need a high-resolution timer, but I need something for getting the elapsed time in ms.
EDIT: Final program:
#include <iostream>
#include <string>
#include <time.h>
#include <sys/time.h>
#include <sys/resource.h>
using namespace std;
// Non-system sleep (wasting cpu)
void wait ( int seconds )
{
clock_t endwait;
endwait = clock () + seconds * CLOCKS_PER_SEC ;
while (clock() < endwait) {}
}
int show_time() {
timeval tv;
gettimeofday(&tv, 0);
time_t t = tv.tv_sec;
long sub_sec = tv.tv_usec;
cout<<"t value: "<<t<<endl;
cout<<"sub_sec value: "<<sub_sec<<endl;
}
int main() {
cout<<show_time()<<endl;
sleep(2);
cout<<show_time()<<endl;
wait(2);
cout<<show_time()<<endl;
}
You need to try gettimeofday() again; it certainly counts wall-clock time, so it counts while the process sleeps as well.
long long getmsofday()
{
struct timeval tv;
gettimeofday(&tv, NULL);
return (long long)tv.tv_sec*1000 + tv.tv_usec/1000;
}
...
long long start = getmsofday();
do_something();
long long end = getmsofday();
printf("do_something took %lld ms\n",end - start);
Your problem probably relates to integral division. You need to cast one of the division operands to float/double to avoid truncation of decimal values less than a second.
clock_t start = clock();
// do stuff
// Can cast either operand for the division result to a double.
// I chose the right-hand operand, CLOCKS_PER_SEC.
double time_passed = clock() / static_cast<double>(CLOCKS_PER_SEC);
[Edit] As pointed out, clock() measures CPU time (clock ticks/cycles) and is not well-suited for wall-clock timing tests. If you want a portable solution for that, see Boost.Timer as a possible alternative.
You actually want clock_gettime(CLOCK_MONOTONIC, ...).
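For instance, a minimal sketch of an elapsed-milliseconds helper built on CLOCK_MONOTONIC (POSIX; on older glibc you may need to link with -lrt):
#include <time.h>
#include <unistd.h>   // sleep()
#include <iostream>
// Milliseconds from a monotonic clock: it keeps counting while the process sleeps
// and is not affected by changes to the system (wall) clock.
static long long monotonic_ms()
{
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}
int main()
{
    long long start = monotonic_ms();
    sleep(2);                                  // sleeping time is included
    std::cout << "elapsed: " << (monotonic_ms() - start) << " ms\n";
    return 0;
}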