Vectorized log1p() in RcppArmadillo - c++

What is the appropriate way to apply log1p() to an entire arma::vec? It seems that there are vectorized versions of log() and exp(), but not log1p(). I found that there's syntactic sugar for NumericVector, so I end up converting arma::vec to NumericVector, applying log1p(), then converting back:
#include <RcppArmadillo.h>
using namespace Rcpp;
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::vec test_log1p( arma::vec v )
{
// arma::vec res = log1p(v); // results in a compilation error
NumericVector v1 = log1p( wrap(v) );
arma::vec res = as<arma::vec>(v1);
return res;
}
Is there a more elegant way of doing this?

The devil is, once again, in the detail.
For starters, RcppArmadillo does not have 'Sugar' so your reasoning is flawed--you can't just look at the Rcpp Sugar functions that are working on Rcpp::NumericVector.
Then again, one can convert as you did. But you chose an expensive conversion. Look into the advanced constructors explicitly reusing memory -- no copies needed.
A much simpler and more direct (yet local) approach would just be to add a little local inlined function. That's what I would do :) Done in a few minutes.
Lastly, we have some sibling projects that generalized Rcpp Sugar over anything that can take iterators. That is "the high road" and it could do with some fresh development. Maybe start at this repo.
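As a rough illustration of the "little local inlined function" suggested above, here is a minimal sketch. The name log1p_elementwise is hypothetical, and the helper is written against any container of double that exposes begin() and end(). It is exercised here with std::vector<double>; the assumption is that an arma::vec, which also provides begin() and end(), could be passed the same way.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical helper: returns a copy of `in` with std::log1p applied to
// every element. Works for any container of double with begin()/end().
template <typename Container>
inline Container log1p_elementwise(const Container& in) {
    Container out(in);  // copy, so the input is left untouched
    std::transform(out.begin(), out.end(), out.begin(),
                   [](double x) { return std::log1p(x); });
    return out;
}
```

Calling log1p_elementwise(v) leaves v unchanged and returns the transformed copy; an in-place variant would simply run std::transform on the argument itself.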

Use the .transform() or .for_each() facilities available for Armadillo vectors and matrices. Example:
v.transform( [](double val) { return log1p(val); } );
or
v.for_each( [](double& val) { val = log1p(val); } ); // note the & character
You may need to use the std prefix: std::log1p() instead of log1p().

Related

Is it possible / advisable to return a range?

I'm using the ranges library to help filter data in my classes, like this:
class MyClass
{
public:
MyClass(std::vector<int> v) : vec(v) {}
std::vector<int> getEvens() const
{
auto evens = vec | ranges::views::filter([](int i) { return ! (i % 2); });
return std::vector<int>(evens.begin(), evens.end());
}
private:
std::vector<int> vec;
};
In this case, a new vector is constructed in the getEvens() function. To save on this overhead, I'm wondering if it is possible / advisable to return the range directly from the function?
class MyClass
{
public:
using RangeReturnType = ???;
MyClass(std::vector<int> v) : vec(v) {}
RangeReturnType getEvens() const
{
auto evens = vec | ranges::views::filter([](int i) { return ! (i % 2); });
// ...
return evens;
}
private:
std::vector<int> vec;
};
If it is possible, are there any lifetime considerations that I need to take into account?
I am also interested to know if it is possible / advisable to pass a range in as an argument, or to store it as a member variable. Or is the ranges library more intended for use within the scope of a single function?
This was asked in the OP's comment section, but I will respond to it in the answer section:
The Ranges library seems promising, but I'm a little apprehensive about this returning auto.
Remember that even with the addition of auto, C++ is a strongly typed language. In your case, since you are returning evens, the return type will be the same as the type of evens. (Technically it will be the value type of evens, but evens was a value type anyway.)
In fact, you probably really don't want to type out the return type manually: std::ranges::filter_view<std::ranges::ref_view<const std::vector<int>>, MyClass::getEvens() const::<decltype([](int i) {return ! (i % 2);})>> (141 characters)
As mentioned by @Caleth in the comments, even this wouldn't work: the type of evens includes a lambda defined inside the function, and two different lambdas have different types even if they are written identically, so there is literally no way of spelling out the full return type here.
There may be debate about whether to use auto in various cases, but I believe most people would just use auto here. Besides, your evens was declared with auto too; typing the type out would only make it less readable.
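The point about lambda types can be checked with a small standard-library sketch (no ranges library needed; the names isEvenA, isEvenB, makeIsEven, and lambdaTypesDiffer are hypothetical). Two lambdas with identical bodies still have distinct closure types, which is why auto is the practical way to name a return type that involves one.

```cpp
#include <type_traits>

// Two lambdas with the same body: each has its own unique closure type.
auto isEvenA = [](int i) { return !(i % 2); };
auto isEvenB = [](int i) { return !(i % 2); };

static_assert(!std::is_same<decltype(isEvenA), decltype(isEvenB)>::value,
              "identical-looking lambdas still have different types");

// With auto (C++14 return type deduction) the compiler names the type for us.
auto makeIsEven() {
    return [](int i) { return !(i % 2); };
}

// Helper exposing the compile-time fact above as a runtime value.
bool lambdaTypesDiffer() {
    return !std::is_same<decltype(isEvenA), decltype(isEvenB)>::value;
}
```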
So what are my options if I want to access a subset (for instance even numbers)? Are there any other approaches I should be considering, with or without the Ranges library?
Depending on how you would access the returned data and on the type of the data, you might consider returning std::vector<T*>.
Views are really supposed to be viewed from start to end. While you could use views::drop and views::take to limit to a single element, a filtered view doesn't provide a subscript operator (yet).
There will also be computational differences. A vector needs to be computed beforehand, whereas views are computed while iterating. So when you do:
for(auto i : myObject.getEvens())
{
std::cout << i;
}
Under the hood, it is basically doing:
for(auto i : myObject.vec)
{
if(!(i % 2)) std::cout << i;
}
Depending on the amount of data and the complexity of the computations, views might be a lot faster than, or about the same as, the vector method. Plus, you can easily apply multiple filters to the same range without iterating through the data multiple times.
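To make the lazy-versus-eager difference concrete without depending on a ranges library, here is a minimal sketch using a plain callback. The function names forEachEven and collectEvens are hypothetical; forEachEven mirrors the "under the hood" loop above and allocates nothing, while collectEvens builds the vector up front.

```cpp
#include <vector>

// Lazy style: visit the even elements in place, calling `fn` on each one.
// No intermediate container is created.
template <typename Fn>
void forEachEven(const std::vector<int>& vec, Fn fn) {
    for (int i : vec) {
        if (!(i % 2)) fn(i);
    }
}

// Eager style: materialize the even elements into a new vector first.
std::vector<int> collectEvens(const std::vector<int>& vec) {
    std::vector<int> evens;
    forEachEven(vec, [&evens](int i) { evens.push_back(i); });
    return evens;
}
```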
In the end, you can always store the view in a vector:
std::vector<int> vec2(evens.begin(), evens.end());
So my suggestion is: if you have the ranges library, use it.
If not, then vector<T>, vector<T*>, or vector<index>, depending on the size and copyability of T.
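For the non-ranges alternatives, a minimal sketch of the vector<index> and vector<T*> options might look like this. The function names evenIndices and evenPointers are hypothetical. As with a view, lifetime matters: the pointers dangle if the source vector is destroyed or reallocates, and the indices go stale if elements are inserted or erased.

```cpp
#include <cstddef>
#include <vector>

// Return the positions of the even elements instead of copying them.
std::vector<std::size_t> evenIndices(const std::vector<int>& vec) {
    std::vector<std::size_t> idx;
    for (std::size_t i = 0; i < vec.size(); ++i) {
        if (vec[i] % 2 == 0) idx.push_back(i);
    }
    return idx;
}

// Return pointers to the even elements; useful when T is large or non-copyable.
std::vector<const int*> evenPointers(const std::vector<int>& vec) {
    std::vector<const int*> ptrs;
    for (const int& x : vec) {
        if (x % 2 == 0) ptrs.push_back(&x);
    }
    return ptrs;
}
```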
There are no restrictions on the usage of components of the STL in the standard. Of course, there are best practices (e.g., string_view instead of string const &).
In this case, I can foresee no problems with handling the view return type directly. That said, the best practices are yet to be decided on since the standard is so new and no compiler has a complete implementation yet.
You're fine to go with the following, in my opinion:
class MyClass
{
public:
MyClass(std::vector<int> v) : vec(std::move(v)) {}
auto getEvens() const
{
return vec | ranges::views::filter([](int i) { return ! (i % 2); });
}
private:
std::vector<int> vec;
};
As you can see here, a range is just something on which you can call begin and end. Nothing more than that.
For instance, you can use the result of begin(range), which is an iterator, to traverse the range, using the ++ operator to advance it.
In general, looking back at the concept I linked above, you can use a range whenever the surrounding code only needs to be able to call begin and end on it.
Whether this is advisable or enough depends on what you need to do with it. If your intention is to pass evens to a function which expects a std::vector (for instance, a function you cannot change that calls .push_back on it), you clearly have to make a std::vector out of filter's output, which I'd do via
auto evens = vec | ranges::views::filter(whatever) | ranges::to_vector;
but if all the function which you pass evens to does is to loop on it, then
return vec | ranges::views::filter(whatever);
is just fine.
As regards lifetime considerations, a view is to a range of values what a pointer is to the pointed-to entity: if the latter is destroyed, the former dangles, and making improper use of it is undefined behavior. This is an erroneous program:
#include <iostream>
#include <range/v3/view/filter.hpp>
#include <string>
#include <vector>
using namespace ranges;
using namespace ranges::views;
auto f() {
// a local vector here
std::vector<std::string> vec{"zero","one","two","three","four","five"};
// return a view on the local vector
return vec | filter([](auto){ return true; });
} // vec is gone ---> the view returned is dangling
int main()
{
// the following throws std::bad_alloc for me
for (auto i : f()) {
std::cout << i << std::endl;
}
}
You can use ranges::any_view as a type erasure mechanism for any range or combination of ranges.
ranges::any_view<int> getEvens() const
{
return vec | ranges::views::filter([](int i) { return ! (i % 2); });
}
I cannot see any equivalent of this in the STL ranges library; please edit the answer if you can.
EDIT: The problem with ranges::any_view is that it is very slow and inefficient. See https://github.com/ericniebler/range-v3/issues/714.
It is desirable to declare a function returning a range in a header and define it in a cpp file:
- for compilation firewalls (compilation speed)
- to stop the language server from going crazy
- for better factoring of the code
However, there are complications that make it not advisable:
How to get type of a view?
If defining it in a header is fine, use auto.
If performance is not an issue, I would recommend ranges::any_view.
Otherwise, I'd say it is not advisable.

Rcpp: confusion about the base operation of assignment

Recently, I have been trying to use the Rcpp package to improve the efficiency of computations in my work. However, I am not deeply knowledgeable about C++, and there are some strange behaviors I cannot understand. The example below shows a simple task, deriving the weights of a NumericVector, and raises several questions:
When I use WAP=rev(WAP), the output is incorrect; I have to introduce a new variable to store the result in order to get the right output. I do not know why. Should one NEVER use an 'x=f(x)' operation in C++ and Rcpp (and copy with clone instead)?
About the CharacterVector method="eq": I actually want to use a char or string type, but that does not work with the strncmp function (for now I have to use method[0]). Also, I do not know how to look up the API of Rcpp functions in RStudio.
I wonder whether there are R-style grep and tolower functions for conditions in Rcpp. I do not know which documentation to refer to, other than Rcpp sugar, to find the available base functions. Otherwise, I am thinking about calling R functions with Rcpp::Function R_grep("grep"), but I do not know whether this is the best and recommended way.
Any suggestions would be greatly appreciated.
#include <Rcpp.h>
#include <string>
#include <math.h>
#include <algorithm>
using namespace std;
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector cppweight(int N, CharacterVector method="eq", const bool reverse=false, const bool test=false){
NumericVector W(N);
NumericVector WAP(N);
NumericVector revWAP(N);
//method=tolower(method); //function not exists
if(strncmp(method[0],"eq",2)==0){
W=rep(1,N)/1.0*N;//convert int to float by multiplying 1.0
WAP=W/sum(W);
Rcout<< sum(W) << "\n";
} else if(strncmp(method[0],"ln",2)==0){
W=rev(seq(1,N))/1.0*N;
WAP=W/sum(W);
}
if(reverse){
if(test){
WAP=rev(WAP);//Why this result in incorrect result
revWAP=WAP;
}else{
revWAP=rev(WAP);
}
}else{
revWAP=WAP;
}
return(round(revWAP,3));
}
/*** R
cppweight(6,"ln",reverse=F,test=F)
cppweight(6,"ln",reverse=T,test=F)
cppweight(6,"ln",reverse=T,test=T)
*/
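One way to picture what may be going on with WAP=rev(WAP) is sketched below in plain C++, without Rcpp. It assumes (and this is an assumption about Rcpp sugar, not something stated in the question) that rev() does not build a reversed copy but returns a lazy expression that reads from its argument element by element during assignment. The LazyRev type and both functions are hypothetical stand-ins, not Rcpp's actual classes.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a lazy "reverse" expression: it stores a
// reference to the source and computes element i on demand.
struct LazyRev {
    const std::vector<double>& src;
    double operator[](std::size_t i) const { return src[src.size() - 1 - i]; }
};

// x = rev(x) evaluated lazily: the expression reads from the same buffer
// that is being overwritten, so the second half sees modified values.
void reverseAliased(std::vector<double>& x) {
    LazyRev expr{x};
    for (std::size_t i = 0; i < x.size(); ++i) x[i] = expr[i];
}

// Reading from a copy (the analogue of clone() or of a separate variable)
// avoids the aliasing.
void reverseViaCopy(std::vector<double>& x) {
    const std::vector<double> snapshot(x);
    LazyRev expr{snapshot};
    for (std::size_t i = 0; i < x.size(); ++i) x[i] = expr[i];
}
```

With x = {1, 2, 3, 4}, reverseAliased leaves {4, 3, 3, 4}, whereas reverseViaCopy gives {4, 3, 2, 1}. Under the stated assumption, that would match the observation that a separate variable such as revWAP produces the right result.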

Construct vector from vector with conversion/scale

All,
suppose I have a vector with data in cm and would like to construct another vector in mm (or mm with a shift, or ..., so it's not quite that simple).
What would be a good way to accomplish such a task?
I wrote some code using an iterator adaptor:
struct scaling_iterator_adaptor {
...
};
vector v_mm{ scaling_iterator_adaptor{v_cm.begin()}, scaling_iterator_adaptor{v_cm.end()} };
Is there a better way to do such a task? A conceptually different way?
If it is not essential to construct it with all the data contained already, you can use standard algorithms:
std::vector<double> v_cm{1, 3.14, 4.2};
std::vector<double> v_mm(v_cm.size());
std::transform(v_cm.cbegin(), v_cm.cend(), v_mm.begin(), [](double x){ return x * 10; });
You can use std::back_inserter if you don't want to prefill the target with zeroes.
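A minimal sketch of the std::back_inserter variant follows. The function name convert and its scale and offset parameters are hypothetical; the offset is there only to illustrate the "mm with a shift" case from the question.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Build a new vector where each element is x * scale + offset.
// reserve() plus back_inserter avoids prefilling the target with zeroes.
std::vector<double> convert(const std::vector<double>& in,
                            double scale, double offset = 0.0) {
    std::vector<double> out;
    out.reserve(in.size());
    std::transform(in.cbegin(), in.cend(), std::back_inserter(out),
                   [scale, offset](double x) { return x * scale + offset; });
    return out;
}
```

For the cm-to-mm case this would be called as convert(v_cm, 10.0).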

subset a vector and sort it

I'm looking into using some C++ for simple parts of my R package using the Rcpp package. I'm a C++ novice (but keen to learn!). I've implemented a few simple cpp programs using the excellent Rcpp - in fact that package has motivated me to learn C++...
Anyway, I've got stuck with a simple problem, which if I can fix would help lots. I have a NumericVector I want to subset and then sort. The code below sorts the whole vector (and would also deal with NAs, which is what I need).
My question is, say I want to extract a part of this vector, sort and have it available for other processing - how can I do that? For example, for a vector of length 10, how do I extract and sort the elements 5:10?
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
RcppExport SEXP rollP(SEXP x) {
NumericVector A(x); // the data
A = sort_unique(A);
return A;
}
which I call from R:
sourceCpp( "rollP.cpp")
rollP(10:1)
# [1] 1 2 3 4 5 6 7 8 9 10
Here are 3 variants:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector rollP(NumericVector A, int start, int end) {
NumericVector B(end-start+1) ;
std::copy( A.begin() + start-1, A.begin() + end, B.begin() ) ;
return B.sort() ;
}
// [[Rcpp::export]]
NumericVector rollP2(NumericVector A, int start, int end) {
NumericVector B( A.begin() + start-1, A.begin() + end ) ;
return B.sort() ;
}
// [[Rcpp::export]]
NumericVector rollP3(NumericVector A, int start, int end) {
NumericVector B = A[seq(start-1, end-1)] ;
return B.sort() ;
}
start and end are meant as 1-based indices, as if you were passing A[start:end] from R.
You need to look into C++ indexing, iterators and the whole bit. At a minimum, you need to change your interface (vector, fromInd, toInd) and figure out what you want to return.
One interpretation of your question would be to copy the subset from [fromInd, toInd) into a new vector, sort it and return it. All that is standard C++ fare, and a good text like the excellent (and free!!) C++ Annotations will be of help. It has a pretty strong STL section too.
You can use std::slice on a std::valarray. But if you want to use std::vector specifically then you can use std::copy to extract a portion of the vector and then use std::sort to sort the extracted slice of the vector.
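The std::valarray route mentioned above could be sketched roughly as follows. The function name sortedSlice is hypothetical, and start is a 0-based offset (so R's 5:10 on a length-10 vector corresponds to start 4, count 6).

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <valarray>

// Copy `count` consecutive elements starting at 0-based `start`, then sort
// the copy. Returns an empty valarray if the range is empty or does not fit.
std::valarray<double> sortedSlice(const std::valarray<double>& v,
                                  std::size_t start, std::size_t count) {
    if (count == 0 || start > v.size() || count > v.size() - start) {
        return std::valarray<double>();
    }
    std::valarray<double> sub(v[std::slice(start, count, 1)]);
    std::sort(std::begin(sub), std::end(sub));
    return sub;
}
```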
You can do this quite easily by using the std::sort implementation that receives two iterators:
#include <algorithm>
#include <cstddef>
#include <vector>
template <typename SeqContainer>
SeqContainer slicesort(SeqContainer const& sq, std::size_t begin, std::size_t end) {
    // validate the half-open range [begin, end) before doing any iterator arithmetic
    if (begin > end || end > sq.size()) {
        return SeqContainer();
    }
    // copy the requested slice, then sort only the copy
    SeqContainer copy(std::begin(sq) + begin, std::begin(sq) + end);
    std::sort(copy.begin(), copy.end());
    return copy;
}
Which can be invoked like
std::vector<int> v = {3,1,7,3,6,-2,-8,-7,-1,-4,2,3,9};
std::vector<int> v2 = slicesort(v,5,10);

How to return more than one value from a C++ function?

I am interested in whether I can return more than one value from a function. For example, consider the extended Euclidean algorithm. The basic step is described as follows:
Input is nonnegative integers a and b;
output is a triplet (d,i,j) such that d=gcd(a,b)=i*a+j*b.
Just to clarify my question's goal I will write a short recursive code:
if (b==0) return (a,1,0)
q=a mod b;
let r be such that a=r*b+q;
(d,k,l)=extendedeuclidean(b,q);
return (d,l,k-l*r);
How does one return a triplet?
You could create a std::tuple (or a boost::tuple, if you don't use C++0x) from your triple and return that.
As has been suggested by Tony The Tiger you can use tuple. It is included in C++11 standard and new compilers already support it. It is also implemented in boost.
For my ibm xlC compiler tuple is in std::tr1 namespace (tried it for MSVC10 — it's in std namespace).
#include <cstdio>
#include <tuple>
// for MSVC
using namespace std;
// for xlC
//using namespace std::tr1;
// for boost
// using namespace boost;
typedef tuple<int, float, char> MyTuple;
MyTuple f() {
return MyTuple(1, 2.0f, '3');
}
int main() {
MyTuple t = f();
printf("%i, %f, %c\n", get<0>(t), get<1>(t), get<2>(t));
}
xlC compilation for TR1:
xlC -D__IBMCPP_TR1__ file.cpp
xlC compilation for boost:
xlC file.cpp -I/path/to/boost/root
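Applied to the extended Euclidean algorithm from the question, the tuple approach might look like the following sketch. The function name extendedEuclid is hypothetical; the recursion follows the pseudocode in the question, with std::tie unpacking the returned triplet.

```cpp
#include <tuple>

// Returns (d, i, j) with d = gcd(a, b) = i*a + j*b, for non-negative a and b.
std::tuple<int, int, int> extendedEuclid(int a, int b) {
    if (b == 0) return std::make_tuple(a, 1, 0);
    int q = a % b;  // remainder
    int r = a / b;  // quotient, so that a = r*b + q
    int d, k, l;
    std::tie(d, k, l) = extendedEuclid(b, q);  // d = k*b + l*q
    return std::make_tuple(d, l, k - l * r);
}
```

The caller can unpack the result the same way, for example with std::tie(d, i, j) = extendedEuclid(240, 46).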
Just create an appropriate data structure holding the three values and return that.
struct extmod_t {
int d;
int i;
int j;
extmod_t(int d, int i, int j) : d(d), i(i), j(j) { }
};
…
extmod_t result = extendedeuclidean(b, q);
return extmod_t(result.d, l, k - l * r);
Either create a class that encapsulates the triplet and then return the instance of this class, or use 3 by-reference parameters.
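The by-reference alternative could be sketched like this for the extended Euclidean algorithm from the question (the function name extendedEuclidRef and the parameter names are hypothetical). The three results are written into caller-provided variables instead of being returned.

```cpp
// Writes d = gcd(a, b) and coefficients i, j with d = i*a + j*b
// into the output parameters. Assumes non-negative a and b.
void extendedEuclidRef(int a, int b, int& d, int& i, int& j) {
    if (b == 0) {
        d = a;
        i = 1;
        j = 0;
        return;
    }
    int k = 0, l = 0;
    extendedEuclidRef(b, a % b, d, k, l);  // d = k*b + l*(a % b)
    i = l;
    j = k - l * (a / b);
}
```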
I usually find that when I need to return two parameters from a function, it is useful to use the STL std::pair.
You could always stack the pairs inside one another (e.g. std::pair <int, std::pair <int, int> >) and help yourself with typedefs or defines to make it more accessible, but whenever I try doing this my code ends up messy and impractical for reuse.
For more than two parameters, however, I recommend making your own specific data structure that holds the information you need (if you are returning multiple values there's a high likelihood that they are strongly logically connected somehow and that you might end up using the same structure again).
E.g. I needed a function that returned the slope of the line (1 param) and that was fine. Then I needed to expand it to return both parameters of the parametric representation of the line (y = k*x + l). Two parameters, still fine. Then I remembered that the line can be vertical and that I should add another parameter to indicate that (no parametric representation then)... At this point, it became too complicated to try and make do with existing datatypes, so I typed up my own Line structure and ended up using the same structure all over my project later.
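As an illustration of that last point, a dedicated structure for the line example might look roughly like this. The field names and the lineThrough function are hypothetical; the text above only says that a Line structure held the two parameters of y = k*x + l plus an indication that the line is vertical.

```cpp
// Hypothetical Line structure: slope k and intercept l of y = k*x + l,
// plus a flag for vertical lines, which have no such representation.
struct Line {
    double k;
    double l;
    bool vertical;
};

// Build the line through (x1, y1) and (x2, y2); the points are assumed distinct.
Line lineThrough(double x1, double y1, double x2, double y2) {
    if (x1 == x2) {
        return Line{0.0, 0.0, true};  // k and l are unused for vertical lines
    }
    double k = (y2 - y1) / (x2 - x1);
    return Line{k, y1 - k * x1, false};
}
```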