Create a new Rcpp::DateVector using given dates as string - c++

How can I create a new DateVector (a C++ class from the Rcpp package) with dates that are known at compile time?
Since a DateVector is a NumericVector, I can do this:
DateVector d = DateVector::create(14974, 14975, 15123); // TODO how to use real dates instead?
But I would prefer to use a more intuitive and human-readable representation like
DateVector d = DateVector::create("2010-12-31", "2011-01-01", "2011-05-29");
but this causes a compiler error like
Rcpp/include/Rcpp/vector/converter.h:34:27: error: no matching
function for call to ‘caster::target>(const char [11])’
return caster(input) ;
Edit: I have found a working example (also showing some variations):
DateVector d = DateVector::create(Date("2010-12-31"), Date("01.01.2011", "%d.%m.%Y"), Date(2011, 05, 29));
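For reference, a complete sourceCpp-able sketch using just the default ISO-format string constructor (the function name make_dates is only illustrative):
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
DateVector make_dates() {
    // Rcpp::Date parses "%Y-%m-%d" strings by default
    DateVector d = DateVector::create(Date("2010-12-31"),
                                      Date("2011-01-01"),
                                      Date("2011-05-29"));
    return d;   // arrives in R as a Date vector
}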

Related

Pig: Joining without nested renaming

I have two datasets
A(af1, af2, af3)
B(bf1, bf2, bf3)
When I join them in Pig as
C = Join A by af1, B by bf1
And subsequently store it as JSON (after removing the join-predicate column):
store C into 'output.son' using JsonStorage();
I see a JSON schema as
{"A::af1":val, "A::af2":val, ...., "B::bf2":val, ...}
Is there a way I can strip off the unnecessary nesting-like naming resulting from the join (I am taking care of the ambiguity already)?
Thanks in advance
We have to iterate over the relation/alias C, generate the required fields, and then store the new alias; let's say the new alias is D.
D = FOREACH C GENERATE A::af1 AS af1, A::af2 AS af2, A::af3 AS af3, B::bf2 AS bf2, B::bf3 AS bf3;
STORE D INTO 'output.son' USING JsonStorage();
Update:
If there are, say, 100 unique field names in alias A, and likewise in B, then after the join we can use the .. range operator to select the required columns. We can also access the required fields using position notation ($0..$99, $101..$200):
C = JOIN A BY af1, B BY bf1;
D = FOREACH C GENERATE af1..af100,bf2..bf100;
STORE D INTO 'output.son' USING JsonStorage();

subset Armadillo field

If I understand correctly, a field in Armadillo is like a list for arbitrary objects, for instance a set of matrices of different sizes, or a mix of matrices and vectors. In the documentation I have seen the cube type, which can be used with slices, so you can subset it that way. However, there seems to be no specific method to subset a field.
A simplified version of my code is:
arma::mat A = eye(2,2);
arma::mat B = eye(3,3)*3;
arma::mat C = eye(4,4)*4;
arma::field<arma::mat> F(3,1);
F(0,0) = A;
F(1,0) = B;
F(2,1) = C;
// to get matrices B and C
F.slices(1,2);
but I get the error:
Error: field::slices(): indicies out of bounds or incorrectly used
Firstly, there is a small error in the code you presented:
F(2,1) = C;
I assume it should be:
F(2,0) = C;
Secondly, the function slices() is only valid for 3D fields. Your field F, however, is only a 2D field because you only specify rows and columns in the constructor. To access matrices B and C, you can instead use:
arma::field<arma::mat> G=F.subfield(1,0,2,0);
or:
arma::field<arma::mat> G=F.rows(1,2);
More info on the subfield views at this page.
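Putting the pieces together, a minimal compilable sketch (assuming Armadillo headers are available and you link against the library) with the corrected index and the rows() view:
#include <armadillo>

int main() {
    arma::field<arma::mat> F(3,1);        // a 3x1 field of matrices
    F(0,0) = arma::eye(2,2);
    F(1,0) = arma::eye(3,3) * 3;
    F(2,0) = arma::eye(4,4) * 4;          // column index is 0, not 1

    // copy rows 1..2 of the field (matrices B and C) into a new 2x1 field
    arma::field<arma::mat> G = F.rows(1,2);
    G.print("G:");
    return 0;
}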

libsvm : C++ vs. MATLAB : What's With The Different Accuracies?

I have two multi-class data sets with 5 labels, one for training, and the other for cross validation. These data sets are stored as .csv files, so they act as a control in this experiment.
I have a C++ wrapper for libsvm, and the MATLAB functions for libsvm.
For both C++ and MATLAB:
Using a C-type SVM with an RBF kernel, I iterate over 2 lists of C and Gamma values. For each parameter combination, I train on the training data set and then predict the cross-validation data set. I store the accuracy of the prediction in a 2D map keyed by the C and Gamma values that yielded it.
I've recreated different training and cross validation data sets many, many times. Each time, the C++ and MATLAB accuracies are different; sometimes by a lot! Mostly MATLAB produces higher accuracies, but sometimes the C++ implementation is better.
What could be accounting for these differences? The C/Gamma values I'm trying are the same, as are the remaining SVM parameters (default).
There should be no significant differences, as both the C++ and MATLAB code use the same underlying libsvm sources. So what can be the reason?
An implementation error in your code(s); this is unfortunately the most probable one.
The wrapper you use has a bug and/or uses a different version of libsvm than your MATLAB code (libsvm is written in pure C and comes with Python, MATLAB and Java wrappers, so your C++ wrapper is "not official"), or your wrapper assumes some additional default values which are not the defaults in the C/MATLAB/Python/Java implementations.
You perform cross validation in a somewhat randomized form (shuffling the data and then folding, which is completely correct and reasonable, but will lead to different results in two different runs).
Some rounding/conversion is performed while loading the data from .csv in one (or both) of your codes, which leads to inconsistencies (not very likely, yet still possible).
I trained an SVC using scikit-learn (sklearn.svm.SVC) within a Python Jupyter notebook. I wanted to use the trained classifier in MATLAB R2022a and C++. I needed to verify that all three versions' predictions matched for each implementation of the kernel, decision, and prediction functions. I found some useful guidance from bcorso's implementation of the original libsvm C++ code.
Exporting the structure that represents the trained model is explained in bcorso's post and is required to call his prediction function implementation:
predict(params, sv, nv, a, b, cs, X)
for it to match sklearn's version for a trained classifier instance, clf:
clf.predict(X)
Once I established this match, I created MATLAB versions of bcorso's kernel,
function [k] = kernel_svm(params, sv, X)
    % evaluate the kernel between every support vector and the new point X
    k = zeros(1,length(sv));
    if strcmp(params.kernel,'linear')
        for i = 1:length(sv)
            k(i) = dot(sv(i,:),X);
        end
    elseif strcmp(params.kernel,'rbf')
        for i = 1:length(sv)
            k(i) = exp(-params.gamma*dot(sv(i,:)-X,sv(i,:)-X));
        end
    else
        uiwait(msgbox('kernel not defined','Error','modal'));
    end
    k = k';
end
decision,
function [d] = decision_svm(params, sv, nv, a, b, X)
    %% calculate the kernels
    kvalue = kernel_svm(params, sv, X);
    %% define the start and end index for support vectors for each class
    nr_class = length(nv);
    start = zeros(1,nr_class);
    start(1) = 1;
    %% First Class Loop
    for i = 1:(nr_class-1)
        start(i+1) = start(i) + nv(i) - 1;
    end
    %% Other Classes Nested Loops
    for i = 1:nr_class
        for j = i+1:nr_class
            sum = 0;
            si = start(i); %first class start
            sj = start(j); %first class end
            ci = nv(i)+1;  %next class start
            cj = ci + nv(j) - 1; %next class end
            for k = si:sj
                sum = sum + a(k) * kvalue(k);
            end
            sum1 = sum;
            sum = 0;
            for k = ci:cj
                sum = sum + a(k) * kvalue(k);
            end
            sum2 = sum;
        end
    end
    %% Add class sums and the intercept
    sumd = sum1 + sum2;
    d = -(sumd + b);
end
and predict functions.
function [class, classIndex] = predict_svm(params, sv, nv, a, b, cs, X)
    dec_value = decision_svm(params, sv, nv, a, b, X);
    if dec_value <= 0
        class = cs(1);
        classIndex = 1;
    else
        class = cs(2);
        classIndex = 0;
    end
end
Translating the Python comprehension syntax into a MATLAB/C++ equivalent of the summations required nested for loops in the decision function.
It is also necessary to account for MATLAB indexing (base 1) vs. Python/C++ indexing (base 0).
The trained classifier model is conveyed by params, sv, nv, a, b, cs, which can be gathered within a structure after having exported the sv and a matrices as .csv files from the Python notebook. I simply created a wrapper MATLAB function svcInfo that builds the structure:
svcStruct = svcInfo();
params = svcStruct.params;
sv= svcStruct.sv;
nv = svcStruct.nv;
a = svcStruct.a;
b = svcStruct.b;
cs = svcStruct.cs;
Alternatively, one can save the structure contents as a MATLAB workspace in a .mat file.
The new case for prediction is provided as a vector X,
%Classifier input feature vector
X=[x1 x2...xn];
A simplified C++ implementation that follows bcorso's Python version is fairly similar to this MATLAB implementation, in that it uses the nested for loops within the decision function, but with zero-based indexing.
Once tested, I may expand this post with the C++ version of the MATLAB code shared above.
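As a starting point, here is a rough, untested C++ sketch of that translation. The SvcModel struct and function names are placeholders (not part of libsvm or sklearn), and the two per-class loops of the MATLAB code are collapsed into one sum, since together they cover all support vectors in the two-class case:
#include <cmath>
#include <string>
#include <vector>

// illustrative container for the exported model parameters
struct SvcModel {
    std::string kernel;                   // "linear" or "rbf"
    double gamma;                         // RBF width
    std::vector<std::vector<double>> sv;  // support vectors, one per row
    std::vector<double> a;                // dual coefficients, one per support vector
    double b;                             // intercept
    std::vector<int> cs;                  // the two class labels
};

double kernel_svm(const SvcModel& m, const std::vector<double>& sv_i,
                  const std::vector<double>& x) {
    if (m.kernel == "linear") {
        double dot = 0.0;
        for (std::size_t d = 0; d < x.size(); ++d) dot += sv_i[d] * x[d];
        return dot;
    }
    // rbf: exp(-gamma * ||sv_i - x||^2)
    double sq = 0.0;
    for (std::size_t d = 0; d < x.size(); ++d) {
        const double diff = sv_i[d] - x[d];
        sq += diff * diff;
    }
    return std::exp(-m.gamma * sq);
}

double decision_svm(const SvcModel& m, const std::vector<double>& x) {
    // single sum over all support vectors (binary case)
    double sum = 0.0;
    for (std::size_t k = 0; k < m.sv.size(); ++k)
        sum += m.a[k] * kernel_svm(m, m.sv[k], x);
    return -(sum + m.b);   // same sign convention as the MATLAB version
}

int predict_svm(const SvcModel& m, const std::vector<double>& x) {
    return decision_svm(m, x) <= 0 ? m.cs[0] : m.cs[1];
}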

write an Rdata file from C++

Suppose I have a C++ program that has a vector of objects that I want to write out to an Rdata data.frame file, one observation per element of the vector. How can I do that? Here is an example. Suppose I have
vector<Student> myStudents;
And Student is a class which has two data members, name which is of type std::string and grade which is of type int.
Is my only option to write a csv file?
Note that Rdata is a binary format so I guess I would need to use a library.
A search for Rdata [r] [C++] came up empty.
I think nobody has bothered to extract a binary file writer from the R sources to be used independently from R.
Almost twenty years ago I did the same for Octave files as their format is simply: two integers for 'n' and 'k', followed by 'n * k' of data -- so you could read / write with two function calls each.
I fear that for R you would have to cover too many of R's headers -- so the easiest (?) route may be to give the data to R, maybe via Rserve ('loose' connection over tcp/ip) and RInside (tighter connection via embedding), and have R write it.
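For the RInside route, a rough sketch of what "give the data to R and have R write it" could look like (the column names, file name, and the trivial Student struct are only illustrative; it assumes RInside and Rcpp are installed):
#include <string>
#include <vector>
#include <RInside.h>            // embedded R; also pulls in Rcpp

struct Student { std::string name; int grade; };

int main(int argc, char *argv[]) {
    std::vector<Student> myStudents = { {"Alice", 1}, {"Bob", 2} };

    // split the objects into parallel columns
    std::vector<std::string> names;
    std::vector<int> grades;
    for (const Student& s : myStudents) {
        names.push_back(s.name);
        grades.push_back(s.grade);
    }

    RInside R(argc, argv);       // start the embedded R session
    R["name"]  = names;          // Rcpp::wrap() handles the std:: containers
    R["grade"] = grades;
    R.parseEvalQ("students <- data.frame(name, grade); "
                 "save(students, file='students.RData')");
    return 0;
}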
Edit: In the years since the original answer was written, one such library has been created: librdata.
Here is an example of a function that saves a list in an RData file. This example is based on the previous answer:
#include <Rcpp.h>
using namespace Rcpp;

void save_List_RData(const List &list_Data, const CharacterVector &file_Name)
{
    // put the list into a fresh environment and call R's save() from base
    Environment base("package:base");
    Environment env = new_env();
    env["list_Data"] = list_Data;
    Function save = base["save"];
    CharacterVector all(1);
    all[0] = "list_Data";
    save(Named("list", all), Named("envir", env), Named("file", file_Name));
    Rcout << "File " << file_Name << " has been saved!\n";
}
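A possible call, assuming the function above is compiled in an Rcpp context (e.g. via sourceCpp) and reusing the Student example from the question (the names are illustrative):
List list_Data = List::create(Named("name")  = CharacterVector::create("Alice", "Bob"),
                              Named("grade") = IntegerVector::create(1, 2));
save_List_RData(list_Data, CharacterVector::create("students.RData"));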
I don't know if this will fit everyone's needs (of those who are googling this question), but this way you can save individual or multiple variables:
using namespace std;
using namespace Rcpp;
using Eigen::Map;
using Eigen::MatrixXi;
using Eigen::MatrixXd;
Environment base("package:base");
Function save = base["save"];
Function saveRDS = base["saveRDS"];
MatrixXd M = MatrixXd::Identity(3,3);
NumericMatrix xx(wrap(M));
NumericMatrix xx1(wrap(M));
NumericMatrix xx2(wrap(M));
base["xx"] = xx;
base["xx1"] = xx1;
base["xx2"] = xx2;
vector<string> lst;
lst.push_back("xx");
lst.push_back("xx1");
lst.push_back("xx2");
CharacterVector all = wrap(lst);
save(Named("list", all), Named("envir", base) , Named("file","Identities.RData"));
saveRDS(xx,Named("file","Identity.RDs"));
return wrap(M);
library(inline)
library(Rcpp)
library(RcppEigen)
src <- '
#put here cpp code shown above
'
saveworkspace <- cxxfunction(signature(), src, plugin = "RcppEigen")
saveworkspace()
list.files(pattern="*.RD*")
[1] "Identity.RDs"
[2] "Identities.RData"
I'm not 100% sure if this C++ code will work in a standalone library/executable.
NB: Initially I missed the comment that the solution should be independent of R, but for those who are searching for exactly the same question and are OK with a dependency on R, this could be helpful.

Sorting based on associative arrays in D

I am trying to follow examples given in various places for D apps. Generally when learning a language I start on example apps and change them myself, purely to test stuff out.
One app that caught my eye was to count the frequency of words in a block of text passed in. As the dictionary was built up in an associative array (with the elements storing the frequency, and the keys being the words themselves), the output was not in any particular order. So, I attempted to sort the array based on examples given on the site.
Anyway, the example showed a lambda 'sort!(...)(array);' but when I attempt the code dmd won't compile it.
Here's the boiled down code:
import std.stdio;
import std.string;

void main() {
    uint[string] freqs;
    freqs["the"] = 51;
    freqs["programming"] = 3;
    freqs["hello"] = 10;
    freqs["world"] = 10;
    /*...You get the point...*/
    //This is the actual example given, but it doesn't
    //seem to work, old D version???
    //string[] words = array(freqs.keys);
    //This seemed to work
    string[] words = freqs.keys;
    //Example given for how to sort the 'words' array based on
    //external criteria (i.e. the frequency of the words from
    //another array). This is the line where the compiler craps out!
    sort!((a,b) {return freqs[a] < freqs[b];})(words);
    //Should output in frequency order now!
    foreach(word; words) {
        writefln("%s -> %s", word, freqs[word]);
    }
}
When I try to compile this code, I get the following
s1.d(24): Error: undefined identifier sort
s1.d(24): Error: function expected before (), not sort of type int
Can anyone tell me what I need to do here?
I use DMD v2.031. I've tried installing GDC, but it only seems to support the v1 language spec. I've only started looking at Dil, so I can't comment on whether it supports the code above.
Try adding this near the top of the file:
import std.algorithm;
Here's an even simpler way to take an input file (from the command line), get its lines/words, and print a table of word frequencies in descending order:
import std.algorithm;
import std.file;
import std.stdio;
import std.string;

void main(string[] args)
{
    auto contents = cast(string)read(args[1]);
    uint[string] freqs;
    foreach(i, line; splitLines(contents))
        foreach(word; split(strip(line)))
            ++freqs[word];
    string[] words = freqs.keys;
    sort!((a,b) => freqs[a] > freqs[b])(words);
    foreach(s; words)
        writefln("%s\t\t%s", s, freqs[s]);
}
Well, almost 4 years later... :-)