Idiom for data aggregation and post processing in C++

A common task in programming is to process data on the fly and, when all data are collected, do some post processing. A simple example for this would be the computation of the average (and other statistics), where you can have a class like this
class Statistic {
public:
Statistic() : nr(0), sum(0.0), avg(0.0) {}
void add(double x) { sum += x; ++nr; }
void process() { avg = sum / nr; }
private:
int nr;
double sum;
double avg;
};
A disadvantage of this approach is that we always have to remember to call the process() function after adding all the data. Since C++ has things like RAII, this seems like a less than ideal solution.
In Ruby, for example, we can write code like this
class Avg
attr_reader :avg
def initialize
@nr = 0
@sum = 0.0
@avg = nil
if block_given?
yield self
process
end
end
def add(x)
@nr += 1
@sum += x.to_f
end
def process
@avg = @sum / @nr
end
end
which we then can call like this
avg = Avg.new do |a|
data.each {|x| a.add(x)}
end
and the process method is automatically called when exiting the block.
Is there an idiom in C++ that can provide something similar?
For clarification: this question is not about computing the average. It is about the following pattern: feeding data to an object and then, when all the data is fed, triggering a processing step. I am interested in context-based ways to automatically trigger the processing step - or reasons why this would not be a good idea in C++.

"Idiomatic average"
I don't know Ruby, but you can't translate idioms directly anyhow. I know that calculating the average is just an example, so let's see what we can get from that example...
The idiomatic way to calculate the sum and average of the elements in a container is std::accumulate:
std::vector<double> data;
// ... fill data ...
auto sum = std::accumulate( data.begin(), data.end(), 0.0);
auto avg = sum / data.size();
The building blocks are container, iterator and algorithms.
If the elements to be processed are not readily available in a container, you can still use the same algorithms, because algorithms only care about iterators. Writing your own iterators requires a bit of boilerplate. The following is just a toy example that calculates the average of the results of calling the same function a certain number of times:
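#include <cstddef>
#include <iostream>
#include <numeric>
template <typename F>
struct my_iter {
F f;
std::size_t count;
my_iter(std::size_t count, F f) : f(f), count(count) {}
my_iter& operator++() {
--count;
return *this;
}
auto operator*() { return f(); }
bool operator==(const my_iter& other) const { return count == other.count; }
// std::accumulate compares the iterators with !=
bool operator!=(const my_iter& other) const { return !(*this == other); }
};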
int main()
{
auto f = [](){return 1.;};
auto begin = my_iter{5,f};
auto end = my_iter{0,f};
auto sum = std::accumulate( begin, end, 0.0);
auto avg = sum / 5;
std::cout << sum << " " << avg;
}
Output is:
5 1
Suppose you have a vector of parameters for a function to be called; then calling std::accumulate is straightforward:
#include <iostream>
#include <vector>
#include <numeric>
int main()
{
auto f = [](int x){return x;};
std::vector<int> v = {1,2,5,10};
auto sum = std::accumulate( v.begin(), v.end(), 0.0, [f](double accu, int add) {
return accu + f(add);
});
auto avg = sum / v.size();
std::cout << sum << " " << avg;
}
The last argument to std::accumulate specifies how the elements are added up. Instead of adding them up directly, I add up the results of calling the function. Output is:
18 4.5
For your actual question
Taking your question more literally and to answer also the RAII part, here is one way you can make use of RAII with your statistic class:
struct StatisticCollector {
private:
Statistic& s;
public:
StatisticCollector(Statistic& s) : s(s) {}
~StatisticCollector() { s.process(); }
};
int main()
{
Statistic stat;
{
StatisticCollector sc{stat};
//for (...)
// stat.add( x );
} // <- destructor is called here
}
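A closer analogue of the Ruby block form is a helper that takes a callable, hands it the object to fill, and runs the processing step once the callable returns. The following is only a sketch: the names `collect` and `average()` are hypothetical and not part of the question's class. Unlike the destructor-based collector above, `process()` is skipped here if the callable throws.

```cpp
#include <utility>

// Same shape as the Statistic class from the question, with a getter added
// (average() is an assumed name, not part of the original class).
class Statistic {
public:
    void add(double x) { sum += x; ++nr; }
    void process() { avg = (nr > 0) ? sum / nr : 0.0; }
    double average() const { return avg; }
private:
    int nr = 0;
    double sum = 0.0;
    double avg = 0.0;
};

// Hypothetical helper: passes a fresh Statistic to `feed`, then runs the
// post-processing step once `feed` has returned.
template <typename Feed>
Statistic collect(Feed&& feed) {
    Statistic s;
    std::forward<Feed>(feed)(s);
    s.process();
    return s;
}
```

A call would then look like `auto stat = collect([&](Statistic& s) { for (double x : data) s.add(x); });`, which mirrors `Avg.new do |a| ... end`.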
PS: Last but not least there is the alternative to just keep it simple. Your class definition is kinda broken, because all results are private. Once you fix that, it is kinda obvious that you need no RAII to make sure process gets called:
class Statistic {
public:
Statistic() : nr(0), sum(0.0) {}
void add(double x) { sum += x; ++nr; }
double process() { return sum / nr; }
private:
int nr;
double sum;
};
This is the right interface in my opinion. The user cannot forget to call process because to get the result they need to call it. If the only purpose of the class is to accumulate numbers and process the result it should not encapsulate the result. The result is for the user of the class to store.

Related

How to write custom datatype (std::array filled with std::pairs) to a filestream

I am quite new to C++ and I am building a model studying certain mutations in genes. My "genes" are defined as a function of two doubles, a and b. A single gene is saved in a std::pair format. The whole genome consists of four of these genes collected in a std::array.
I perform some changes on the genes and want to write the information in a text file for analysis. The way I have currently implemented this is tedious. I have separate functions (8 in total) which collect the information like g[i].first, g[i].second etc. for every i in the array. I feel this could be done much more efficiently.
Relevant code:
Declaration of data type:
using gene = std::pair<double, double>;
using genome = std::array<gene, 4>;
Function in which I create a genome called g:
genome Individual::init_Individual()
{
double a1, a2, a3, a4 = -1.0;
double b1, b2, b3, b4 = 0.0;
gene g1{ a1,b1 };
gene g2{ a2,b2 };
gene g3{ a3,b3 };
gene g4{ a4,b4 };
genome g{g1,g2,g3,g4};
return g;
}
Example of collect function:
double get_Genome_a1() { return g[0].first; };
Function in which I write information to a text file:
void Individual::write_Statistics(unsigned int &counter)
{
//Generate output file stream
std::ofstream ofs;
ofs.open("data.txt", std::ofstream::out | std::ofstream::app);
ofs << counter << std::setw(14) << get_Genome_a1() << std::setw(14)
<< get_Genome_a2() << std::setw(14) << get_Genome_b1() <<
std::setw(14) << get_Genome_b2() << "\n";
ofs.close();
}
etc, etc. So the final result of my data file in this example looks like this:
1 a1 a2 b1 b2
2 a1 a2 b1 b2
3 a1 a2 b1 b2
etc, etc.
My question:
I am currently storing the two doubles in a std::pair, which I collect in a std::array. Is this an efficient storage mechanism or can this be improved?
Is there a way to directly reference an individual element from my custom data type "genome" using only one function to write every element away in the exact same manner as I am doing now (with fourteen spaces between every element)? Something in pseudocode like: get_Genome() {return g;};, and when you call it you can specify the element like: get_Genome([0].first) which would be the first value stored in the first pair of the array, for example.
Happy to learn, any insight is appreciated.
Your storage is good. Neither pair nor array requires indirect/dynamic allocation, so this is great for cache locality.
As for referencing elements, no, not exactly like that. You could have an enum with members FIRST, SECOND then pass that as another argument to get_Genome. But, honestly, this doesn't seem to me to be worthwhile.
Overall, your approach looks great to me. My only suggestions would be:
Re-use one ofstream
…rather than opening and closing the file for every sample. You should see substantial speed improvements from that change.
You could make one in your main or whatever, and have write_Statistics take a std::ostream&, which would also be more flexible!
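As a rough sketch of that suggestion, the writer below takes any `std::ostream&` and loops over the genome instead of calling one getter per value. The function name `write_statistics` and the column order (all a-values, then all b-values) are assumptions made for illustration; adjust them to the layout you need.

```cpp
#include <array>
#include <iomanip>
#include <ostream>
#include <utility>

using gene = std::pair<double, double>;
using genome = std::array<gene, 4>;

// Writes one line: the counter, then every a-value, then every b-value,
// each value right-aligned in a 14-character column.
void write_statistics(std::ostream& os, unsigned int counter, const genome& g) {
    os << counter;
    for (const gene& x : g) os << std::setw(14) << x.first;
    for (const gene& x : g) os << std::setw(14) << x.second;
    os << '\n';
}
```

The caller opens the `std::ofstream` once, passes it in for every sample, and lets it close when it goes out of scope. The same function also works with `std::cout` or a `std::ostringstream`.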
Initialise a bit quicker
All those declarations in init_Individual may get optimised, but why take the risk? The following is pretty expressive:
genome Individual::init_Individual()
{
const double a = -1.0;
const double b = 0.0;
return {{a, b}, {a, b}, {a, b}, {a, b}};
}
It's worth noting here that your double initialisations were wrong: you were only initialising a4 and b4; your compiler ought to have warned you about this. But, as shown, we don't need all of those anyway as they [are intended to] have the same values!
Your array looks good; however, using std::pair in this situation might make things a bit more tedious. I would create two simple classes or structures, one to represent a gene and the other to represent your genome. I'd still use array. The classes might look something like this:
#include <array>
const int genesPerGenome = 4; // change this to set how many...
struct Gene {
double a_;
double b_;
Gene() = default;
Gene(double a, double b) : a_(a), b_(b) {}
};
struct Genome {
std::array<Gene, genesPerGenome> genome_;
int geneCount_{0};
Genome() = default;
void addGene(const Gene& gene) {
if ( geneCount_ >= genesPerGenome ) return;
genome_[geneCount_++] = gene; // post increment since we added one
}
};
Then I would have a stand alone function that would generate your genome as such:
void generateGenome( Genome& genome ) {
for (int i = 0; i < 4; i++) {
// When looking at your example, I noticed that the genes were all
// initialized with [-1.0,0.0], so I used Gene's constructor to init
// them with those values.
Gene gene(-1.0, 0.0);
genome.addGene(gene);
}
}
Then, to couple these together, I'll just print them to the console for demonstration. You can take this approach and apply it to whatever calculations need to be done, and then write the results to a file.
#include <array>
#include <iostream>
int main() {
Genome genome;
generateGenome( genome );
// printing to console here is where you would do your calculations then write to file
for ( int i = 0; i < 4; i++ ) {
if ( i >= genome.geneCount_ ) break; // prevent accessing beyond array bounds
std::cout << (i+1) << " [" << genome.genome_[i].a_ << "," << genome.genome_[i].b_ << "]\n";
}
return 0;
}
-Output- - No calculations, only the initialized values:
1 [-1,0]
2 [-1,0]
3 [-1,0]
4 [-1,0]
Maybe this will help. From here you can write an operator<<() function that takes an ostream reference and a const reference to a Genome; with that you should be able to print the entire Genome to a file in a single function call.
-Edit-
User t.niese left a comment with a valid point that I had overlooked. I was using a static variable in the addGene() function. This would work as long as you only ever have a single Genome, but with more than one Genome object every call to addGene() would increase the shared value, and the condition of the if statement in addGene() would stop you from adding more than one gene to each genome.
I have modified the original code above to fix this limitation. I removed the static variable and introduced two new variables: one is a const int that holds the number of genes per genome, used both to define the size of the array and to check how many genes may be added; the other is a member variable of the Genome class itself that keeps track of how many genes each Genome object holds.
Here is an example of what I meant in my comment by overloading operator[].
#include <iostream>
#include <fstream>
#include <string>
#include <iomanip>
#include <stdexcept>
#include <utility>
class Genome {
public:
typedef std::pair<double, double> gene;
private:
double a1 = -1.0, a2 = -1.0, a3 = -1.0, a4 = -1.0;
double b1 = 0.0, b2 = 0.0, b3 = 0.0, b4 = 0.0;
gene g1{ a1,b1 };
gene g2{ a2,b2 };
gene g3{ a3,b3 };
gene g4{ a4,b4 };
public:
Genome() {}
const double operator[] (std::string l) const {
if (l == "a1") {return g1.first;}
else if (l == "b1") {return g1.second;}
else if (l == "a2") {return g2.first;}
else if (l == "b2") {return g2.second;}
else if (l == "a3") {return g3.first;}
else if (l == "b3") {return g3.second;}
else if (l == "a4") {return g4.first;}
else if (l == "b4") {return g4.second;}
else {
throw std::invalid_argument("not valid label");
}
}
void setvalue(std::string l, double x) {
if (l == "a1") {g1.first = x;}
else if (l == "b1") {g1.second = x;}
else if (l == "a2") {g2.first = x;}
else if (l == "b2") {g2.second = x;}
else if (l == "a3") {g3.first = x;}
else if (l == "b3") {g3.second = x;}
else if (l == "a4") {g4.first = x;}
else if (l == "b4") {g4.second = x;}
else {
throw std::invalid_argument("not valid label");
}
}
void write_Statistics(unsigned int counter) {
std::ofstream ofs;
ofs.open("data.txt", std::ofstream::out | std::ofstream::app);
ofs << counter
<< std::setw(14) << (*this)["a1"] << std::setw(14) << (*this)["a2"]
<< std::setw(14) << (*this)["b1"] << std::setw(14) << (*this)["b2"] << "\n";
ofs.close();
}
};
I don't know whether you will find it useful to access the individual genes by a label instead of an index, but that is what this overload does.
int main(int argc, char **argv) {
Genome a = Genome();
std::cout << a["b1"] << std::endl; // this prints 0
a.setvalue("b2", 3.0);
std::cout << a["b2"] << std::endl; // this prints 3
a.write_Statistics(0);
return 0;
}

Write a function that may return either one or more values

Suppose I want to write a function that, say, returns the sum of f(x) for x in a certain range.
double func() {
double sum = 0.;
for (int i=0; i<100; i++) {
sum += f(i);
}
return sum;
}
But sometimes, in addition to the final sum I also need the partial terms, so I can do
pair<vector<double>,double> func_terms() {
double sum = 0.;
vector<double> terms(100);
for (int i=0; i<100; i++) {
terms[i] = f(i);
sum += terms[i];
}
return {terms, sum};
}
The thing is, this is code duplication. It seems harmless in this example, but suppose the function is much larger (which it is in the situation that prompted me to ask this), and the two versions differ in just a handful of lines (in this example the logic is the same; only the latter version stores the term in a vector before adding it to sum, and returns a pair with that vector). Then I will have to write and maintain two nearly identical versions of the same function, differing only in a couple of lines and in the return statement. My question is whether there is an idiom/pattern/best practice to deal with this kind of problem, something that would let me share the common code between the two versions.
In short: I can write two functions and have to maintain two nearly-identical versions. Or I can just use the latter but that will be very wasteful whenever I just need the sum, which is unacceptable. What's the best pattern to deal with this?
I reckon that with C++17 one can do something like
template<bool partials>
double func(vector<double>* terms=nullptr) {
double sum = 0.;
if constexpr (partials)
*terms = vector<double>(100);
for (int i=0; i<100; i++) {
if constexpr (partials) {
(*terms)[i] = f(i);
sum += (*terms)[i];
} else {
sum += f(i);
}
}
return sum;
}
Which comes very close to what I intended, apart from using pointers (I can't use references because terms may be empty).
Your question title says "Write a function that may return either one or more values", but it's more than that; as your example shows, the function may also do a lot of different things long before a result is returned. There really is no general solution to such a broad problem.
However, for the specific case you've explained I'd like to offer a low-tech solution. You could simply implement both functions in terms of a third function and give that third function a parameter to determine whether the extra functionality is performed or not.
Here is a C++17 example, in which that third function is called func_impl and more or less hidden inside a namespace to make life easier for the client of func and func_terms:
namespace detail {
enum class FuncOption {
WithTerms,
WithoutTerms
};
std::tuple<std::vector<double>, double> func_impl(FuncOption option) {
auto const withTerms = option == FuncOption::WithTerms;
double sum = 0.;
std::vector<double> terms(withTerms ? 100 : 0);
for (int i = 0; i < 100; ++i) {
auto const result = f(i);
if (withTerms) {
terms[i] = result;
}
sum += result;
}
return std::make_tuple(terms, sum);
}
}
double func() {
using namespace detail;
return std::get<double>(func_impl(FuncOption::WithoutTerms));
}
std::tuple<std::vector<double>, double> func_terms() {
using namespace detail;
return func_impl(FuncOption::WithTerms);
}
Whether that's too low-tech is up to you and depends on your exact problem.
Here was a solution that suggested to pass an optional pointer to vector and to fill it only if present. I deleted it as other answers mention it as well and as the latter solution looks much more elegant.
You can abstract your calculation to iterators, so callers remain very simple and no code is copied:
auto make_transform_counting_iterator(int i) {
return boost::make_transform_iterator(
boost::make_counting_iterator(i),
f);
}
auto my_begin() {
return make_transform_counting_iterator(0);
}
auto my_end() {
return make_transform_counting_iterator(100);
}
double only_sum() {
return std::accumulate(my_begin(), my_end(), 0.0);
}
std::vector<double> fill_terms() {
std::vector<double> result;
std::copy(my_begin(), my_end(), std::back_inserter(result));
return result;
}
One simple way is to write a common function and use an input parameter to decide whether the extra work is done, like this:
double logic(vector<double>* terms) {
double sum = 0.;
for (int i=0; i<100; i++) {
const double term = f(i);
if (terms != nullptr) {
terms->push_back(term);
}
sum += term;
}
return sum;
}
double func() {
return logic(nullptr);
}
pair<vector<double>,double> func_terms() {
vector<double> terms;
double sum = logic(&terms);
return {terms, sum};
}
This method is used in many situations. The logic can be very complicated and take many input options, and you can reuse the same logic through different parameters.
In most cases, though, we do not need that many return values, just different input parameters.
If you are not in favour of this:
std::pair<std::vector<double>, double> func_terms() {
std::vector<double> terms(100);
for (int i = 0; i != 100; ++i) {
terms[i] = f(i);
}
return {terms, std::accumulate(terms.begin(), terms.end(), 0.)};
}
then maybe:
template <typename Accumulator>
// taken by value so that a temporary lambda can be passed in
Accumulator func_helper(Accumulator acc) {
for (int i=0; i<100; i++) {
acc(f(i));
}
return acc;
}
double func()
{
double sum = 0;
func_helper([&sum](double d) { sum += d; });
return sum;
}
std::pair<std::vector<double>, double> func_terms() {
double sum = 0.;
std::vector<double> terms;
func_helper([&](double d) {
sum += d;
terms.push_back(d);
});
return {terms, sum};
}
The simplest solution for this situation I think would be something like this:
double f(int x) { return x * x; }
auto terms(int count) {
auto res = vector<double>{};
generate_n(back_inserter(res), count, [i=0]() mutable {return f(i++);});
return res;
}
auto func_terms(int count) {
const auto ts = terms(count);
return make_pair(ts, accumulate(begin(ts), end(ts), 0.0));
}
auto func(int count) {
return func_terms(count).second;
}
Live version.
But this approach gives func() different performance characteristics to your original version. There are ways around this with the current STL but this highlights an area where the STL is not ideal for composability. The Ranges v3 library offers a better approach to composing algorithms for this type of problem and is in the process of standardization for a future version of C++.
In general there is often a tradeoff between composability / reuse and optimal performance. At its best C++ lets us have our cake and eat it too but this is an example where there is work underway to give standard C++ better approaches to handle this sort of situation.
I worked out an OOP solution, where a base class always computes the sum and makes the current term available to derived classes, like this:
class Func
{
public:
Func() { sum = 0.; }
void func()
{
for (int i=0; i<100; i++)
{
double term = f(i);
sum += term;
useCurrentTerm(term);
}
}
double getSum() const { return sum; }
protected:
virtual void useCurrentTerm(double) {} //do nothing
private:
double f(double d){ return d * 42;}
double sum;
};
So a derived class can implement the virtual method and expose extra properties (other than sum):
class FuncWithTerms : public Func
{
public:
FuncWithTerms() { terms.reserve(100); }
std::vector<double> getTerms() const { return terms; }
protected:
void useCurrentTerm(double t) override { terms.push_back(t); }
private:
std::vector<double> terms;
};
If one doesn't want to expose these classes, one could fall back to functions and use them as a façade (still two functions, but very manageable now):
double sum_only_func()
{
Func f;
f.func();
return f.getSum();
}
std::pair<std::vector<double>, double> with_terms_func()
{
FuncWithTerms fwt;
fwt.func();
return { fwt.getTerms(), fwt.getSum() };
}

Timing in an elegant way in c++

I am interested in timing the execution time of a free function or a member function (template or not). Call TheFunc the function in question, its call being
TheFunc(/*parameters*/);
or
ReturnType ret = TheFunc(/*parameters*/);
Of course I could wrap these function calls as follows :
double duration = 0.0 ;
std::clock_t start = std::clock();
TheFunc(/*parameters*/);
duration = static_cast<double>(std::clock() - start) / static_cast<double>(CLOCKS_PER_SEC);
or
double duration = 0.0 ;
std::clock_t start = std::clock();
ReturnType ret = TheFunc(/*parameters*/);
duration = static_cast<double>(std::clock() - start) / static_cast<double>(CLOCKS_PER_SEC);
but I would like to do something more elegant than this, namely (and from now on I will stick to the void return type) as follows :
Timer thetimer ;
double duration = 0.0;
thetimer(*TheFunc)(/*parameters*/, duration);
where Timer is some timing class that I would like to design and that would allow me to write the previous code, in such a way that after the execution of the last line the double duration will contain the execution time of
TheFunc(/*parameters*/);
but I don't see how to do this, nor if the syntax/solution I aim for is optimal...
With variadic template, you may do:
template <typename F, typename ... Ts>
double Time_function(F&& f, Ts&&...args)
{
std::clock_t start = std::clock();
std::forward<F>(f)(std::forward<Ts>(args)...);
return static_cast<double>(std::clock() - start) / static_cast<double>(CLOCKS_PER_SEC);
}
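The question also mentions calls of the form `ReturnType ret = TheFunc(...)`. One possible extension of the helper above, sketched under the assumption that C++14 is available and that the callable returns a non-void value, hands back both the result and the elapsed time. The name `time_with_result` is hypothetical, and `std::chrono::steady_clock` (wall-clock time) is swapped in for `std::clock` (CPU time).

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: runs f(args...) and returns {result, elapsed seconds}.
// Only usable with callables that return a value (not void).
template <typename F, typename... Ts>
auto time_with_result(F&& f, Ts&&... args) {
    const auto start = std::chrono::steady_clock::now();
    auto result = std::forward<F>(f)(std::forward<Ts>(args)...);
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return std::make_pair(std::move(result), elapsed.count());
}
```

A call such as `auto r = time_with_result(TheFunc, x, y);` leaves the return value in `r.first` and the duration in seconds in `r.second`.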
I really like boost::timer::auto_cpu_timer, and when I cannot use boost I simply hack my own:
#include <cmath>
#include <string>
#include <chrono>
#include <iostream>
class AutoProfiler {
public:
AutoProfiler(std::string name)
: m_name(std::move(name)),
m_beg(std::chrono::high_resolution_clock::now()) { }
~AutoProfiler() {
auto end = std::chrono::high_resolution_clock::now();
auto dur = std::chrono::duration_cast<std::chrono::microseconds>(end - m_beg);
std::cout << m_name << " : " << dur.count() << " musec\n";
}
private:
std::string m_name;
std::chrono::time_point<std::chrono::high_resolution_clock> m_beg;
};
void foo(std::size_t N) {
long double x {1.234e5};
for(std::size_t k = 0; k < N; k++) {
x += std::sqrt(x);
}
}
int main() {
{
AutoProfiler p("N = 10");
foo(10);
}
{
AutoProfiler p("N = 1,000,000");
foo(1000000);
}
}
This timer works thanks to RAII. When you build the object within a scope, you store the timepoint at that point in time. When you leave the scope (that is, at the corresponding }), the timer first stores the timepoint, then calculates the number of ticks (which you can convert to a human-readable duration), and finally prints it to the screen.
Of course, boost::timer::auto_cpu_timer is much more elaborate than my simple implementation, but I often find my implementation more than sufficient for my purposes.
Sample run in my computer:
$ g++ -o example example.cpp -std=c++14 -Wall -Wextra
$ ./example
N = 10 : 0 musec
N = 1,000,000 : 10103 musec
EDIT
I really liked the implementation suggested by @Jarod42. I modified it a little bit to offer some flexibility on the desired "units" of the output.
It defaults to returning the number of elapsed microseconds (an integer of type std::chrono::microseconds::rep), but you can request the output in any duration of your choice.
I think it is a more flexible approach than the one I suggested earlier because now I can do other stuff like taking the measurements and storing them in a container (as I do in the example).
Thanks to @Jarod42 for the inspiration.
#include <cmath>
#include <string>
#include <chrono>
#include <algorithm>
#include <iostream>
#include <numeric>
#include <utility>
#include <vector>
template<typename Duration = std::chrono::microseconds,
typename F,
typename ... Args>
typename Duration::rep profile(F&& fun, Args&&... args) {
const auto beg = std::chrono::high_resolution_clock::now();
std::forward<F>(fun)(std::forward<Args>(args)...);
const auto end = std::chrono::high_resolution_clock::now();
return std::chrono::duration_cast<Duration>(end - beg).count();
}
void foo(std::size_t N) {
long double x {1.234e5};
for(std::size_t k = 0; k < N; k++) {
x += std::sqrt(x);
}
}
int main() {
std::size_t N { 1000000 };
// profile in default mode (microseconds)
std::cout << "foo(1E6) takes " << profile(foo, N) << " microseconds" << std::endl;
// profile in custom mode (e.g, milliseconds)
std::cout << "foo(1E6) takes " << profile<std::chrono::milliseconds>(foo, N) << " milliseconds" << std::endl;
// To create an average of `M` runs we can create a vector to hold
// `M` values of the type used by the clock representation, fill
// them with the samples, and take the average
std::size_t M {100};
std::vector<std::chrono::microseconds::rep> samples(M);
for(auto & sample : samples) {
sample = profile(foo, N);
}
auto avg = std::accumulate(samples.begin(), samples.end(), 0.0L) / static_cast<long double>(M);
std::cout << "average of " << M << " runs: " << avg << " microseconds" << std::endl;
}
Output (compiled with g++ example.cpp -std=c++14 -Wall -Wextra -O3):
foo(1E6) takes 10073 microseconds
foo(1E6) takes 10 milliseconds
average of 100 runs: 10068.6 microseconds
You can do it the MATLAB way. It's very old-school, but simple is often good:
tic();
a = f(c);
toc(); //print to stdout, or
auto elapsed = toc(); //store in variable
tic() and toc() can work on a global variable. If that's not sufficient, you can create local variables with some macro-magic:
tic(A);
a = f(c);
toc(A);
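For the global-variable variant, a minimal sketch could look like the following. It assumes `std::chrono::steady_clock` as the time source and a `toc()` that returns seconds rather than printing; neither detail is specified above. A single shared start time also means it is not safe to use from several threads at once.

```cpp
#include <chrono>

namespace {
// Single shared start time, set by tic() and read by toc()
std::chrono::steady_clock::time_point g_tic_start;
}

// Records the current time as the start of the measurement
void tic() { g_tic_start = std::chrono::steady_clock::now(); }

// Returns the seconds elapsed since the most recent tic()
double toc() {
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - g_tic_start;
    return elapsed.count();
}
```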
I'm a fan of using RAII wrappers for this type of stuff.
The following example is a little verbose but it's more flexible in that it works with arbitrary scopes instead of being limited to a single function call:
#include <ctime>
#include <map>
#include <string>
class timing_context {
public:
std::map<std::string, double> timings;
};
class timer {
public:
timer(timing_context& ctx, std::string name)
: ctx(ctx),
name(name),
start(std::clock()) {}
~timer() {
ctx.timings[name] = static_cast<double>(std::clock() - start) / static_cast<double>(CLOCKS_PER_SEC);
}
timing_context& ctx;
std::string name;
std::clock_t start;
};
timing_context ctx;
int main() {
timer total(ctx, "total");
{
timer t(ctx, "foo");
// Do foo
}
{
timer t(ctx, "bar");
// Do bar
}
// Access ctx.timings
}
The downside is that you might end up with a lot of scopes that only serve to destroy the timing object.
This might or might not satisfy your requirements as your request was a little vague but it illustrates how using RAII semantics can make for some really nice reusable and clean code. It can probably be modified to look a lot better too!

Lazy transform in C++

I have the following Python snippet that I would like to reproduce using C++:
from itertools import count, imap
source = count(1)
pipe1 = imap(lambda x: 2 * x, source)
pipe2 = imap(lambda x: x + 1, pipe1)
sink = imap(lambda x: 3 * x, pipe2)
for i in sink:
    print i
I've heard of Boost Phoenix, but I couldn't find an example of a lazy transform behaving in the same way as Python's imap.
Edit: to clarify my question, the idea is not only to apply functions in sequence using a for loop, but rather to be able to use algorithms like std::transform on infinite generators. The way the functions are composed (in a dialect closer to a functional language) is also important, as the next step is function composition.
Update: thanks bradgonesurfing, David Brown, and Xeo for the amazing answers! I chose Xeo's because it's the most concise and it gets me right where I wanted to be, but David's was very important into getting the concepts through. Also, bradgonesurfing's tipped Boost::Range :).
Employing Boost.Range:
int main(){
auto map = boost::adaptors::transformed; // shorten the name
auto sink = generate(1) | map([](int x){ return 2*x; })
| map([](int x){ return x+1; })
| map([](int x){ return 3*x; });
for(auto i : sink)
std::cout << i << "\n";
}
Live example including the generate function.
I think the most idiomatic way to do this in C++ is with iterators. Here is a basic iterator class that takes an iterator and applies a function to its result:
template<class Iterator, class Function>
class LazyIterMap
{
private:
Iterator i;
Function f;
public:
LazyIterMap(Iterator i, Function f) : i(i), f(f) {}
decltype(f(*i)) operator* () { return f(*i); }
void operator++ () { ++i; }
};
template<class Iterator, class Function>
LazyIterMap<Iterator, Function> makeLazyIterMap(Iterator i, Function f)
{
return LazyIterMap<Iterator, Function>(i, f);
}
This is just a basic example and is still incomplete as it has no way to check if you've reached the end of the iterable sequence.
Here's a recreation of your example python code (also defining a simple infinite counter class).
#include <iostream>
class Counter
{
public:
Counter (int start) : value(start) {}
int operator* () { return value; }
void operator++ () { ++value; }
private:
int value;
};
int main(int argc, char const *argv[])
{
Counter source(1); // start at 1, like count(1) in the Python snippet
auto pipe1 = makeLazyIterMap(source, [](int n) { return 2 * n; });
auto pipe2 = makeLazyIterMap(pipe1, [](int n) { return n + 1; });
auto sink = makeLazyIterMap(pipe2, [](int n) { return 3 * n; });
for (int i = 0; i < 10; ++i, ++sink)
{
std::cout << *sink << std::endl;
}
}
Apart from the class definitions (which are just reproducing what the python library functions do), the code is about as long as the python version.
I think the Boost.Range library is what you are looking for. It should work nicely with the new C++ lambda syntax.
int pipe1(int val) {
return 2*val;
}
int pipe2(int val) {
return val+1;
}
int sink(int val) {
return val*3;
}
for(int i=0; i < SOME_MAX; ++i)
{
cout << sink(pipe2(pipe1(i))) << endl;
}
I know, it's not quite what you were expecting, but it certainly evaluates at the time you want it to, although not with an iterator interface. A very related article is this:
Component programming in D
Edit 6/Nov/12:
An alternative, still sticking to bare C++, is to use function pointers and construct your own piping for the above functions (vector of function pointers from SO q: How can I store function pointer in vector?):
typedef std::vector<int (*)(int)> funcVec;
int runPipe(funcVec funcs, int sinkVal) {
int running = sinkVal;
for(funcVec::iterator it = funcs.begin(); it != funcs.end(); ++it) {
running = (*it)(running); // *it is the stored function pointer; it can be called directly
}
return running;
}
This is intended to run through all the functions in a vector of such and return the resulting value. Then you can:
funcVec funcs;
funcs.push_back(&pipe1);
funcs.push_back(&pipe2);
funcs.push_back(&sink);
for(int i=0; i < SOME_MAX; ++i)
{
cout << runPipe(funcs, i) << endl;
}
Of course you could also construct a wrapper for that via a struct (I would use a closure if C++ did them...):
struct pipeWork {
funcVec funcs;
int run(int i);
};
int pipeWork::run(int i) {
//... guts as runPipe, or keep it separate and call:
return runPipe(funcs, i);
}
// later...
pipeWork kitchen;
kitchen.funcs = funcs;
// a plain function pointer cannot hold a bound member function, so call the member directly
cout << kitchen.run(5) << endl;
Or something like that. Caveat: No idea what this will do if the pointers are passed between threads.
Extra caveat: If you want to do this with varying function interfaces, you will end up needing a load of void *(*)(void *) functions so that they can take whatever and emit whatever, or lots of templating to fix the kind of pipe you have. I suppose ideally you'd construct different kinds of pipe for different interfaces between functions, so that a | b | c works even when they are passing different types between them. But I'm going to guess that that's largely what the Boost stuff is doing.
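To give an idea of the templating route, here is a small sketch of a pipe whose stages may pass different types to each other. It assumes C++14 (generic lambdas and deduced return types); `make_pipe` is a hypothetical name, and this is not a description of how Boost implements its adaptors.

```cpp
#include <utility>

// Base case: a single stage is the whole pipe
template <typename F>
auto make_pipe(F f) { return f; }

// Recursive case: apply `first`, then feed its result to the rest of the pipe
template <typename F, typename... Rest>
auto make_pipe(F first, Rest... rest) {
    auto tail = make_pipe(rest...);
    return [first, tail](auto&& x) {
        return tail(first(std::forward<decltype(x)>(x)));
    };
}
```

`make_pipe(pipe1, pipe2, sink)` then yields a single callable that applies the stages from left to right, and each stage only has to accept whatever the previous one returns.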
Depending on the simplicity of the functions:
#define pipe1(x) (2*(x))
#define pipe2(x) (pipe1(x)+1)
#define sink(x) (pipe2(x)*3)
int j = 1;
while( ++j < SOME_MAX ) // bounded, so the loop does not rely on signed overflow
{
std::cout << sink(j) << std::endl;
}
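One thing to watch with function-like macros is that they substitute text: with a body such as 2*x and no parentheses, pipe1(1+2) expands to 2*1+2, which is 4 rather than 6. As a hedged alternative, the same three stages can be written as ordinary inline functions, which a compiler is generally free to inline just as well:

```cpp
// The same three stages as inline functions: each argument is evaluated once,
// and operator precedence at the call site cannot leak into the body.
inline int pipe1(int x) { return 2 * x; }
inline int pipe2(int x) { return pipe1(x) + 1; }
inline int sink(int x)  { return pipe2(x) * 3; }
```

The call sites stay the same, for example std::cout << sink(j) << std::endl;.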

C++ Dynamically Define Function

I am using Visual C++ to write a console calculator, and I am adding a way to let the user define a custom linear function. Here is where I am stumped: once I get the user's desired name for the function, the slope, and the y-intercept, I need to use that data to create a callable function that I can pass to muParser.
In muParser, you define custom functions like this:
double func(double x)
{
return 5*x + 7; // return m*x + b;
}
MyParser.DefineFun("f", func);
MyParser.SetExpr("f(9.5) - pi");
double dResult = MyParser.Eval();
How could I dynamically create a function like this based on the user's input for the values 'm' and 'b' and pass that to the 'DefineFun()' method?
This is what I have so far:
void cb_SetFunc(void)
{
string FuncName, sM, sB;
double dM, dB;
bool GettingName = true;
bool GettingM = true;
bool GettingB = true;
regex NumPattern("[+-]?(?:0|[1-9]\\d*)(?:\\.\\d*)?(?:[eE][+\\-]?\\d+)?");
EchoLn(">>> First, enter the function's name. (Enter 'cancel' to abort)");
EchoLn(">>> Only letters, numbers, and underscores can be used.");
try
{
do // Get the function name
{
Echo(">>> Enter name: ");
FuncName = GetLn();
if (UserCanceled(FuncName)) return;
if (!ValidVarName(FuncName))
{
EchoLn(">>> Please only use letters, numbers, and underscores.");
continue;
}
GettingName = false;
} while (GettingName);
do // Get the function slope
{
Echo(">>> Enter slope (m): ");
sM = GetLn();
if (UserCanceled(sM)) return;
if (!regex_match(sM, NumPattern))
{
EchoLn(">>> Please enter any constant number.");
continue;
}
dM = atof(sM.c_str());
GettingM = false;
} while (GettingM);
do // Get the function y-intercept
{
Echo(">>> Enter y-intercept (b): ");
sB = GetLn();
if (UserCanceled(sB)) return;
if (!regex_match(sB, NumPattern))
{
EchoLn(">>> Please enter any constant number.");
continue;
}
dB = atof(sB.c_str());
GettingB = false;
} while (GettingB);
// ------------
// TODO: Create function from dM (slope) and
// dB (y-intercept) and pass to 'DefineFun()'
// ------------
}
catch (...)
{
ErrMsg("An unexpected error occurred while trying to set the function.");
}
}
I was thinking that there isn't a way to define an individual method for each user-defined-function. Would I need to make a vector<pair<double, double>> FuncArgs; to keep track of the appropriate slopes and y-intercepts then call them dynamically from the function? How would I specify which pair to use when I pass it to DefineFun(FuncStrName, FuncMethod)?
What you need (in addition to a script language interpreter) is called a "trampoline". There is no standard solution to create those, in particular since it involves creating code at runtime.
Of course, if you accept a fixed number of trampolines, you can create them at compile time. And if they're all linear, this might be even easier:
const int N = 20; // Arbitrary
double m[N] = { 0 }; // slopes, one per trampoline
double b[N] = { 0 }; // y-intercepts, one per trampoline
template<int I> double f(double x) { return m[I] * x + b[I]; }
This defines a set of 20 functions f<0>...f<19>, which use m[0]...m[19] and b[0]...b[19] respectively.
Edit:
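// Helper class template to instantiate all trampoline functions.
double (*fptr_array[N])(double) = { 0 };
template<int I> struct init_fptr {
static void init() {
fptr_array[I] = &f<I>; // instantiates f<I> and records its address
init_fptr<I-1>::init(); // recurse down to slot 0
}
};
template<> struct init_fptr<-1> { static void init() {} }; // ends the recursion
// Call init_fptr<N-1>::init() once at startup to fill fptr_array.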
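Put together, the fixed-trampoline idea might look like the sketch below. It assumes a C++14 compiler (for std::index_sequence), and lookup, defineLinear and LinearFn are hypothetical names introduced only for this illustration. The point is that the caller ends up with a plain double (*)(double), which is the kind of function the question passes to DefineFun.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

const std::size_t N = 20;   // fixed number of slots, arbitrary
double m[N] = { 0 };        // slope per slot
double b[N] = { 0 };        // y-intercept per slot

// One distinct function per slot, each reading its own coefficients.
template <std::size_t I>
double f(double x) { return m[I] * x + b[I]; }

typedef double (*LinearFn)(double);

// Expands to the table { &f<0>, &f<1>, ..., &f<N-1> } and returns one entry.
template <std::size_t... Is>
LinearFn lookup(std::size_t slot, std::index_sequence<Is...>) {
    static const LinearFn table[] = { &f<Is>... };
    return table[slot];
}

// defineLinear (hypothetical helper) stores the coefficients in a slot and
// returns the plain function pointer bound to that slot.
LinearFn defineLinear(std::size_t slot, double slope, double intercept) {
    assert(slot < N);
    m[slot] = slope;
    b[slot] = intercept;
    return lookup(slot, std::make_index_sequence<N>());
}
```

For example, LinearFn g = defineLinear(3, 5.0, 7.0); yields a pointer for which g(9.5) computes 5 * 9.5 + 7. The caller still has to keep track of which slots are in use, and the number of user-defined functions is capped at N.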
I would keep it simple:
#include <functional>
std::function<double(double)> f; // this is your dynamic function
double slope, yintercept; // populate from user input
f = [=](double x) -> double { return slope * x + yintercept; };
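Now you can pass the object f to your parser, provided its interface accepts a std::function (or another callable object), and the parser can then call f(x) at its own leisure. The function object packages the captured values of slope and yintercept. Note that a capturing lambda cannot be converted to a plain function pointer, so this does not work with an API that only takes double (*)(double).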
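To keep several user-defined functions around at once, the same idea can be combined with a lookup table keyed by name. This is a rough sketch only: makeLinear, LinearFunc and userFuncs are hypothetical names, and whether the stored objects can be handed to a given parser depends on that parser's API.

```cpp
#include <functional>
#include <map>
#include <string>

typedef std::function<double(double)> LinearFunc;

// makeLinear (hypothetical name) builds an independent callable that keeps
// its own copies of slope and yintercept.
LinearFunc makeLinear(double slope, double yintercept) {
    return [=](double x) { return slope * x + yintercept; };
}

// Hypothetical registry: one entry per function the user has defined.
std::map<std::string, LinearFunc> userFuncs;
```

After userFuncs["f"] = makeLinear(5.0, 7.0); the stored function can be called as userFuncs["f"](9.5), and defining a second function under another name does not disturb the first.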
GiNaC is a C++ library that can parse and evaluate math expressions.
Generating a fixed array of functions that can be bound to a boost::function.
Someone else has already described a similar method, but since I'd taken the time to write the code, here it is anyway.
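#include <boost/function.hpp>
#include <cstdlib> // system
#include <iostream> // std::cout
#include <tchar.h> // _tmain, _TCHAR (Visual C++)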
enum {
MAX_FUNC_SLOTS = 255
};
struct FuncSlot
{
double (*f_)(double);
boost::function<double(double)> closure_;
};
FuncSlot s_func_slots_[MAX_FUNC_SLOTS];
template <int Slot>
struct FuncSlotFunc
{
static void init() {
FuncSlotFunc<Slot-1>::init();
s_func_slots_[Slot - 1].f_ = &FuncSlotFunc<Slot>::call;
}
static double call(double v) {
return s_func_slots_[Slot - 1].closure_(v);
}
};
template <> struct FuncSlotFunc<0> {
static void init() {}
};
struct LinearTransform
{
double m_;
double c_;
LinearTransform(double m, double c)
: m_(m)
, c_(c)
{}
double operator()(double v) const {
return (v * m_) + c_;
}
};
int _tmain(int argc, _TCHAR* argv[])
{
FuncSlotFunc<MAX_FUNC_SLOTS>::init();
s_func_slots_[0].closure_ = LinearTransform(1, 0);
s_func_slots_[1].closure_ = LinearTransform(5, 1);
std::cout << s_func_slots_[0].f_(1.0) << std::endl; // should print 1
std::cout << s_func_slots_[1].f_(1.0) << std::endl; // should print 6
system("pause");
return 0;
}
So, you can get the function pointer with: s_func_slots_[xxx].f_
And set your action with s_func_slots_[xxx].closure_
Try embedding a scripting language in your application. Years ago I used Tcl for a similar purpose, but I do not know what the best choice is nowadays.
You can either start with Tcl or look for something better yourself.
See: Adding Tcl/Tk to a C application