In C++11, this:
const std::vector<int>& f() {
static const std::vector<int> x { 1, 2, 3 };
return x;
}
is thread-safe. However, is there an extra penalty for calling this function after the first time (i.e. when it is initialized) due to this extra thread-safe guarantee? I am wondering if the function will be slower than one using a global variable, because it has to acquire a mutex to check whether it's being initialized by another thread every time it is called, or something.
"The best intution to be ever had is 'I should measure this.'" So let's find out:
#include <atomic>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <numeric>
#include <vector>
namespace {
class timer {
using hrc = std::chrono::high_resolution_clock;
hrc::time_point start;
static hrc::time_point now() {
// Prevent memory operations from reordering across the
// time measurement. This is likely overkill, needs more
// research to determine the correct fencing.
std::atomic_thread_fence(std::memory_order_seq_cst);
auto t = hrc::now();
std::atomic_thread_fence(std::memory_order_seq_cst);
return t;
}
public:
timer() : start(now()) {}
hrc::duration elapsed() const {
return now() - start;
}
template <typename Duration>
typename Duration::rep elapsed() const {
return std::chrono::duration_cast<Duration>(elapsed()).count();
}
template <typename Rep, typename Period>
Rep elapsed() const {
return elapsed<std::chrono::duration<Rep,Period>>();
}
};
const std::vector<int>& f() {
static const auto x = std::vector<int>{ 1, 2, 3 };
return x;
}
static const auto y = std::vector<int>{ 1, 2, 3 };
const std::vector<int>& g() {
return y;
}
const unsigned long long n_iterations = 500000000;
template <typename F>
void test_one(const char* name, F f) {
f(); // First call outside the timer.
using value_type = typename std::decay<decltype(f()[0])>::type;
std::cout << name << ": " << std::flush;
auto t = timer{};
auto sum = uint64_t{};
for (auto i = n_iterations; i > 0; --i) {
const auto& vec = f();
sum += std::accumulate(begin(vec), end(vec), value_type{});
}
const auto elapsed = t.elapsed<std::chrono::milliseconds>();
std::cout << elapsed << " ms (" << sum << ")\n";
}
} // anonymous namespace
int main() {
test_one("local static", f);
test_one("global static", g);
}
Running on Coliru, the local version does 5e8 iterations in 4618 ms, the global version in 4392 ms; so yes, the local version is slower, by (4618 - 4392) ms / 5e8 iterations, or approximately 0.452 nanoseconds per iteration. Although the difference is measurable, it's too small to affect observed performance in most situations.
EDIT: As an interesting counterpoint, switching from clang++ to g++ reverses the ordering. The g++-compiled binary runs in 4418 ms (global) vs. 4181 ms (local), making the local version faster by 474 picoseconds per iteration. It nonetheless reaffirms the conclusion that the difference between the two methods is small.
EDIT 2: Examining the generated assembly, I decided to convert from function pointers to function objects for better inlining. Timing with indirect calls through function pointers isn't really characteristic of the code in the OP. So I used this program:
#include <atomic>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <numeric>
#include <vector>
namespace {
class timer {
using hrc = std::chrono::high_resolution_clock;
hrc::time_point start;
static hrc::time_point now() {
// Prevent memory operations from reordering across the
// time measurement. This is likely overkill.
std::atomic_thread_fence(std::memory_order_seq_cst);
auto t = hrc::now();
std::atomic_thread_fence(std::memory_order_seq_cst);
return t;
}
public:
timer() : start(now()) {}
hrc::duration elapsed() const {
return now() - start;
}
template <typename Duration>
typename Duration::rep elapsed() const {
return std::chrono::duration_cast<Duration>(elapsed()).count();
}
template <typename Rep, typename Period>
Rep elapsed() const {
return elapsed<std::chrono::duration<Rep,Period>>();
}
};
class f {
public:
const std::vector<int>& operator()() {
static const auto x = std::vector<int>{ 1, 2, 3 };
return x;
}
};
class g {
static const std::vector<int> x;
public:
const std::vector<int>& operator()() {
return x;
}
};
const std::vector<int> g::x{ 1, 2, 3 };
const unsigned long long n_iterations = 500000000;
template <typename F>
void test_one(const char* name, F f) {
f(); // First call outside the timer.
using value_type = typename std::decay<decltype(f()[0])>::type;
std::cout << name << ": " << std::flush;
auto t = timer{};
auto sum = uint64_t{};
for (auto i = n_iterations; i > 0; --i) {
const auto& vec = f();
sum += std::accumulate(begin(vec), end(vec), value_type{});
}
const auto elapsed = t.elapsed<std::chrono::milliseconds>();
std::cout << elapsed << " ms (" << sum << ")\n";
}
} // anonymous namespace
int main() {
test_one("local static", f());
test_one("global static", g());
}
Not surprisingly, runtimes were faster under both g++ (3803 ms local, 2323 ms global) and clang++ (4183 ms local, 3253 ms global). These results affirm the intuition that the global technique should be faster than the local one, with deltas of 2.96 nanoseconds (g++) and 1.86 nanoseconds (clang++) per iteration.
Yes, there will be a cost to check whether the object has been initialised. This would typically test an atomic Boolean variable, rather than lock a mutex.
I'm trying to write a function that measures the time of execution of other functions.
It should have the same return type as the measured function.
The problem is that I'm getting a compiler error, "Variable has incomplete type 'void'", when the return type is void.
Is there a workaround to solve this problem?
Help would be greatly appreciated, thanks!
#include <iostream>
#include <chrono>
template<class Func, typename... Parameters>
auto getTime(Func const &func, Parameters &&... args) {
auto begin = std::chrono::system_clock::now();
auto ret = func(std::forward<Parameters>(args)...);
auto end = std::chrono::system_clock::now();
std::cout << "The execution took " << std::chrono::duration<float>(end - begin).count() << " seconds.";
return ret;
}
int a() { return 0; }
void b() {}
int main()
{
getTime(a);
getTime(b);
return 0;
}
It's possible to solve this problem using specialization and an elaborate song-and-dance routine. But there's also a much simpler approach that takes advantage of return <void expression>; being allowed.
The trick is to fit it into this framework, by taking advantage of construction/destruction semantics.
#include <iostream>
#include <chrono>
struct measure_time {
std::chrono::time_point<std::chrono::system_clock> begin=
std::chrono::system_clock::now();
~measure_time()
{
auto end = std::chrono::system_clock::now();
std::cout << "The execution took "
<< std::chrono::duration<float>(end - begin).count()
<< " seconds.\n";
}
};
template<class Func, typename... Parameters>
auto getTime(Func const &func, Parameters &&... args) {
measure_time measure_it;
return func(std::forward<Parameters>(args)...);
}
int a() { return 0; }
void b() {}
int main()
{
getTime(a);
getTime(b);
return 0;
}
I am trying to submit a lambda function to a thread pool to defer its execution. However, I am running into a lifetime issue with variables captured by copy.
Here is my code:
#include <future>
#include "ctpl_stl.h"
ctpl::thread_pool tp(2);
class T1 //a simple class that can print a number
{
public:
void print(int number)
{
std::cout << a << " " << number << std::endl;
}
int a = 6;
};
void testPrint(T1 *t1, int number) //a function used by the lambda
{
t1->print(number);
}
template<typename F, typename... Rest>
auto submit(F && f, Rest&&... rest) ->std::future<bool>
{
return tp.push([&, rest...] (int id) {
return f(id, rest...);
});
}
void test(T1 *t1)
{
int number = 6;
submit(
[=](int threadId, int number)
{
std::cout << threadId << std::endl;
testPrint(t1, number);
return true;
},
number);
}
int main()
{
T1 *t1 = new T1();
test(t1);
using namespace std::chrono_literals;
std::this_thread::sleep_for(2000ms);//for testing purpose
return 0;
}
The code may look overly complex for what it does, but I have removed everything not needed to reproduce the error; class T1 and the submit function serve no purpose here beyond that.
When I run this code, testPrint, executed on a thread of the thread pool, crashes because t1 is no longer a valid pointer.
I don't understand why, since t1 is captured by the lambda by copy, as its
[=]
capture default specifies.
So my question is: why is t1 invalid, and when are the values captured by copy in a lambda created and destroyed?
I have been trying to measure the performance of a std::variant-based (pseudo-) static dispatch scheme using std::visit.
The idea is that instead of a
struct A {
virtual unsigned f() const noexcept = 0;
virtual ~A() noexcept {}
};
struct B0 : A {
virtual unsigned f() const noexcept { return 0; }
};
// more types like B0 ...
std::unique_ptr<A> a = std::make_unique<B0>();
I have
struct C0 {
unsigned f() const noexcept { return 0; }
};
std::variant<C0,C1,C2,/*...*/> c = C0();
and I want to measure how much faster constructing a series of such objects is and how fast the dispatch is. Note that the first example (A/Bs...) requires dynamic memory on top of dynamic dispatch while the second example (Cs...) has automatic storage.
To this end I generalised B0 and C0 into type templates:
template <unsigned X>
struct B : A {
virtual unsigned f() const noexcept override { return X; }
};
template <unsigned X>
struct C {
unsigned f() const noexcept { return X; }
};
and then wrote a (maybe slightly over-engineered) test harness to fill a std::vector and read it back. The full code is attached below. I am running this with -O1 and -O3 as C++17.
Effectively it pseudo-randomly fills pre-grown vectors bs with B<...> and cs with C<...> respectively, and then either calls bs[i]->f() or std::visit([](auto const& c) { return c.f(); },cs[i]) (for more details see the attached benchmark code).
What I expected was that a test instance of variant<C<0>> blows its dynamic counterpart unique_ptr<A> out of the water by orders of magnitude (it does), but that as I grow the variant, say to variant<C<0>,...,C<127>>, the efficiency of visit degrades significantly, to the point where dynamic dispatch becomes faster (it doesn't).
With -O3 (the -O1 results are fairly similar) I see the following results, which vary slightly across runs, but seem relatively stable (the times mostly stay within 10% deviation).
[0,0] time ctor virtual: 35.0315 ns
[0,0] time ctor variant: 2.9425 ns
[0,0] time call virtual: 14.0037 ns (L1)
[0,0] time call variant: 1.44748 ns (L2)
[0,1] time ctor virtual: 34.8007 ns
[0,1] time ctor variant: 2.95368 ns
[0,1] time call virtual: 19.6874 ns
[0,1] time call variant: 7.04521 ns
[0,7] time ctor virtual: 39.6325 ns
[0,7] time ctor variant: 2.97607 ns
[0,7] time call virtual: 30.7592 ns
[0,7] time call variant: 9.22505 ns (L4.1)
[0,31] time ctor virtual: 35.0002 ns
[0,31] time ctor variant: 2.95473 ns
[0,31] time call virtual: 24.3198 ns
[0,31] time call variant: 9.72678 ns (L4.2)
[0,127] time ctor virtual: 36.5918 ns
[0,127] time ctor variant: 2.95542 ns
[0,127] time call virtual: 26.701 ns (L3)
[0,127] time call variant: 9.88592 ns (L4.3)
Discussion
The small time for (L1) is explainable, I think, by caching and/or branch prediction. (L2) is absolutely as expected: If the variant is trivial, dispatch is extremely fast. All times for construction also make sense: ctor variant does not at any point malloc anything, so it is clear why it is that much faster than ctor virtual and that the time is roughly constant regardless of the number of dynamic types.
call virtual stays roughly the same as the number of dynamic types goes up (L3), which is expected. However, why does call variant not go up (more) between (L4.1) and (L4.3)?
Note: given the limits of template programming in my test harness, I cannot increase the range much further without g++ blowing up during compilation / exhausting my memory.
Anyway, the test function f being as simple as possible should mean that the measurement captures the incurred dispatch overhead as accurately as possible.
Validation
The questions are:
how can I validate these results so that they are representative, and
how can I make sure no relevant parts have been optimised out by the compiler?
Do other benchmarks arrive at the same conclusion, namely that std::variant dispatch is always faster by about a factor of 2-3?
Full benchmark
// g++ -Wall -Wextra -pedantic -std=c++17 -O3 a.cpp
#include <random>
#include <memory>
#include <variant>
#include <chrono>
#include <iostream>
using chronores = std::nano;
static constexpr char const resstr[] = "ns";
namespace helper {
template <template <unsigned> typename T, unsigned X, unsigned UB, typename... Args>
struct mkvar {
using type = typename mkvar<T,X+1,UB,Args...,T<X>>::type;
};
template <template <unsigned> typename T, unsigned UB, typename... Args>
struct mkvar<T,UB,UB,Args...> {
using type = std::variant<Args...,T<UB>>;
};
template <template <unsigned> typename T, unsigned LB, unsigned UB>
using mkvar_t = typename mkvar<T,LB,UB>::type;
template <unsigned X>
struct Num {
static constexpr unsigned value = X;
using inc = Num<X+1>;
};
template <typename NumX, typename NumUB, template <unsigned> typename T, bool use_variant>
struct ctor_Num {
static constexpr auto X = NumX::value;
static constexpr auto UB = NumUB::value;
template <typename Container>
static void run(unsigned x, Container& container) {
if (x == X) {
if constexpr (use_variant) {
container.emplace_back(T<X>());
} else {
container.emplace_back(std::make_unique<T<X>>());
}
} else {
ctor_Num<typename NumX::inc,NumUB,T,use_variant>::run(x,container);
}
}
};
template <typename NumX, template <unsigned> typename T, bool use_variant>
struct ctor_Num<typename NumX::inc,NumX,T,use_variant> {
template <typename Container>
static void run(unsigned, Container&) { }
};
template <unsigned X, unsigned UB, template <unsigned> typename T, bool use_variant, typename Container>
inline void ctor(unsigned x, Container& container) {
return ctor_Num<Num<X>,Num<UB>,T,use_variant>::run(x,container);
}
struct Time {
double& time;
std::chrono::time_point<std::chrono::steady_clock> start;
Time(double& time) : time(time) {
start = std::chrono::steady_clock::now();
}
~Time() {
auto const finish = std::chrono::steady_clock::now();
time += std::chrono::duration<double,chronores>(finish-start).count();
}
};
}
template <unsigned LB, unsigned UB>
struct measure {
struct A {
virtual unsigned f() const noexcept = 0;
virtual ~A() noexcept {}
};
template <unsigned X>
struct B : A {
virtual unsigned f() const noexcept override { return X; }
};
template <unsigned X>
struct C {
unsigned f() const noexcept { return X; }
};
static void main(std::size_t const N, std::size_t const R = 10, bool warmup = false) {
if (!warmup) main(N,1,true);
using namespace helper;
std::vector<std::unique_ptr<A>> bs;
bs.reserve(N);
std::vector<mkvar_t<C,LB,UB>> cs;
cs.reserve(N);
std::uniform_int_distribution<unsigned> distr(LB,UB);
double time_ctor_virtual = 0;
double time_ctor_variant = 0;
double time_call_virtual = 0;
double time_call_variant = 0;
unsigned volatile sum = 0;
std::mt19937 mt(42); mt.discard(100);
for (std::size_t r = 0; r < R; ++r) {
bs.clear();
cs.clear();
{
Time scope(time_ctor_virtual);
for (std::size_t i = 0; i < N; ++i) {
bs.emplace_back(std::make_unique<B<UB>>());
}
}
{
Time scope(time_ctor_variant);
for (std::size_t i = 0; i < N; ++i) {
cs.emplace_back(C<UB>());
}
}
bs.clear();
cs.clear();
for (std::size_t i = 0; i < N; ++i) {
auto const rn = distr(mt);
// effectively calls bs.emplace_back(std::make_unique<B<rn>>())
ctor<LB,UB,B,false>(rn,bs);
// effectively calls cs.emplace_back(C<rn>())
ctor<LB,UB,C,true >(rn,cs);
}
{
Time scope(time_call_variant);
for (std::size_t i = 0; i < N; ++i) {
sum += std::visit([](auto const& c) { return c.f(); },cs[i]);
}
}
{
Time scope(time_call_virtual);
for (std::size_t i = 0; i < N; ++i) {
sum += bs[i]->f();
}
}
}
(void)sum;
if (!warmup) {
std::cout << "[" << LB << "," << UB << "] time ctor virtual: " << (time_ctor_virtual/N/R) << " " << resstr << "\n";
std::cout << "[" << LB << "," << UB << "] time ctor variant: " << (time_ctor_variant/N/R) << " " << resstr << "\n";
std::cout << "[" << LB << "," << UB << "] time call virtual: " << (time_call_virtual/N/R) << " " << resstr << "\n";
std::cout << "[" << LB << "," << UB << "] time call variant: " << (time_call_variant/N/R) << " " << resstr << "\n";
}
}
};
int main() {
static constexpr std::size_t N = 400000;
measure<0,0>::main(N);
std::cout << "\n";
measure<0,1>::main(N);
std::cout << "\n";
measure<0,7>::main(N);
std::cout << "\n";
measure<0,31>::main(N);
std::cout << "\n";
measure<0,127>::main(N);
std::cout << "\n";
}
In the C++ coroutines TS (2017), there is an example of an awaitable object.
template <class Rep, class Period>
auto operator co_await(std::chrono::duration<Rep, Period> d) {
struct awaiter {
std::chrono::system_clock::duration duration;
...
awaiter(std::chrono::system_clock::duration d) : duration(d){}
bool await_ready() const { return duration.count() <= 0; }
void await_resume() {}
void await_suspend(std::experimental::coroutine_handle<> h){...}
};
return awaiter{d};
}
using namespace std::chrono;
my_future<int> h();
my_future<void> g() {
std::cout << "just about go to sleep...\n";
co_await 10ms;
std::cout << "resumed\n";
co_await h();
}
Like a typical StackOverflow Question, it will not compile. After cursing quietly for a while, I decided to turn it into a [MCVE] -- for learning. The code below compiles and runs on VC++17 with /await enabled. I think it probably does approximately what the TS authors intended. Alas, it employs a detached thread. It is not easy to see how that thread could be harvested via join or future::get or signal_all_at_thread_exit() or ...
For example, join cannot be added to a destructor for awaiter. In the spawned thread, h.resume() causes the awaiter object to be moved into the spawned thread, its (move) constructor running there. So the destructor is called in a different thread than the constructor.
The question, aside from "Is this what the TS intended?", is "Can this be improved, in a reasonably economical way, to tend to the dangling thread?" (And if so how?)
#include <experimental/coroutine>
#include <future>
#include <thread>
namespace xtd = std::experimental;
template <class Rep, class Period>
auto operator co_await(std::chrono::duration<Rep, Period> dur) {
struct awaiter {
using clock = std::chrono::high_resolution_clock;
clock::time_point resume_time;
awaiter(clock::duration dur) : resume_time(clock::now()+dur) {}
bool await_ready() { return resume_time <= clock::now(); }
void await_suspend(xtd::coroutine_handle<> h) {
std::thread([=]() {
std::this_thread::sleep_until(resume_time);
h.resume(); // destructs the obj, which has been std::move()'d
}).detach(); // Detach scares me.
}
void await_resume() {}
};
return awaiter{ dur };
}
using namespace std::chrono;
std::future<int> g() {
co_await 4000ms;
co_return 86;
}
template<typename R>
bool is_ready(std::future<R> const& f)
{ return f.wait_for(std::chrono::seconds(0)) == std::future_status::ready; }
int main() {
using std::cout;
auto gg = g();
cout << "Doing stuff in main, while coroutine is suspended...\n";
std::this_thread::sleep_for(1000ms);
if (!is_ready(gg)) {
cout << "La lala, lala, lala...\n";
std::this_thread::sleep_for(1500ms);
}
cout << "Whew! Done. Getting co_return now...\n";
auto ret = gg.get();
cout << "coroutine resumed and co_returned " << ret << '\n';
system("pause");
return ret;
}
Can this be improved, in a reasonably economical way, to tend to the dangling thread?
You can use "thread pool" implementation, instead of on-demand detached thread.
Here is toy example:
https://gist.github.com/yohhoy/a5ec6d4aeeb4c60d3e4f3adfd1df9ebf
I'd like to wrap the result of a std::bind() or a lambda in a helper function that tracks the execution time of calls to the function. I'd like a generalized solution that works with any number of parameters (and class methods) and is C++11-compatible.
My intent is to take the wrapped function and pass it to a boost::signals2::signal so the resulting function object needs to be identical in signature to the original function.
I'm basically looking for some magical class or function Wrapper that works like this:
std::function<void(int)> f = [](int x) {
std::cerr << x << std::endl;
};
boost::signals2::signal<void(int)> x_signal;
x_signal.connect(Wrapper<void(int)>(f));
x_signal(42);
that would time how long it took to print 42.
Thanks!
If it's about performance, I strongly suggest not wrapping functions twice.
You can do without those:
template <typename Caption, typename F>
auto timed(Caption const& task, F&& f) {
return [f=std::forward<F>(f), task](auto&&... args) {
using namespace std::chrono;
struct measure {
high_resolution_clock::time_point start;
Caption const& task;
~measure() { std::cout << " -- (" << task << " completed in " << duration_cast<microseconds>(high_resolution_clock::now() - start).count() << "µs)\n"; }
} timing { high_resolution_clock::now(), task };
return f(std::forward<decltype(args)>(args)...);
};
}
See live demo:
Live On Coliru
#include <chrono>
#include <iostream>
template <typename Caption, typename F>
auto timed(Caption const& task, F&& f) {
return [f=std::forward<F>(f), task](auto&&... args) {
using namespace std::chrono;
struct measure {
high_resolution_clock::time_point start;
Caption const& task;
~measure() { std::cout << " -- (" << task << " completed in " << duration_cast<microseconds>(high_resolution_clock::now() - start).count() << "µs)\n"; }
} timing { high_resolution_clock::now(), task };
return f(std::forward<decltype(args)>(args)...);
};
}
#include <thread>
int main() {
using namespace std;
auto f = timed("IO", [] { cout << "hello world\n"; return 42; });
auto g = timed("Sleep", [](int i) { this_thread::sleep_for(chrono::seconds(i)); });
g(1);
f();
g(2);
std::function<int()> f_wrapped = f;
return f_wrapped();
}
Prints (e.g.):
-- (Sleep completed in 1000188µs)
hello world
-- (IO completed in 2µs)
-- (Sleep completed in 2000126µs)
hello world
-- (IO completed in 1µs)
exitcode: 42
UPDATE: C++11 version
Live On Coliru
#include <chrono>
#include <iostream>
namespace detail {
template <typename F>
struct timed_impl {
std::string _caption;
F _f;
timed_impl(std::string const& task, F f)
: _caption(task), _f(std::move(f)) { }
template <typename... Args>
auto operator()(Args&&... args) const -> decltype(_f(std::forward<Args>(args)...))
{
using namespace std::chrono;
struct measure {
high_resolution_clock::time_point start;
std::string const& task;
~measure() { std::cout << " -- (" << task << " completed in " << duration_cast<microseconds>(high_resolution_clock::now() - start).count() << "µs)\n"; }
} timing { high_resolution_clock::now(), _caption };
return _f(std::forward<decltype(args)>(args)...);
}
};
}
template <typename F>
detail::timed_impl<F> timed(std::string const& task, F&& f) {
return { task, std::forward<F>(f) };
}
#include <thread>
int main() {
using namespace std;
auto f = timed("IO", [] { cout << "hello world\n"; return 42; });
auto g = timed("Sleep", [](int i) { this_thread::sleep_for(chrono::seconds(i)); });
g(1);
f();
g(2);
std::function<int()> f_wrapped = f;
return f_wrapped();
}
I believe what you want to do can be solved with variadic templates.
http://www.cplusplus.com/articles/EhvU7k9E/
You can basically "forward" the argument list from your outer std::function to the inner.
EDIT:
Below, I added a minimal working example using the variadic template concept. In main(...), a lambda function is wrapped into another std::function object, using the specified parameters for the inner lambda function. This is done by passing the function to measure as a parameter to the templated function measureTimeWrapper. It returns a function with the same signature as the function passed in (given that you properly define that lambda's parameter list in the template argument of measureTimeWrapper).
The function whose running time is measured just sits there and waits for a number of milliseconds given by its parameter. Other than that, it is not concerned with time measuring at all; that is done by the wrapper function.
Note that the return value of the inner function is lost this way; you might want to change the way values are returned (maybe as a struct, containing the measured time and the real return value) if you want to keep it.
Remember to compile your code with -std=c++11 at least.
#include <iostream>
#include <cstdlib>
#include <functional>
#include <chrono>
#include <thread>
template<typename T, typename... Args>
std::function<double(Args...)> measureTimeWrapper(std::function<T> fncFunctionToMeasure) {
return [fncFunctionToMeasure](Args... args) -> double {
auto tsStart = std::chrono::steady_clock::now();
fncFunctionToMeasure(args...);
auto tsEnd = std::chrono::steady_clock::now();
std::chrono::duration<double> durTimeTaken = tsEnd - tsStart;
return durTimeTaken.count();
};
}
int main(int argc, char** argv) {
std::function<double(int)> fncMeasured = measureTimeWrapper<void(int), int>([](int nParameter) {
std::cout << "Process function running" << std::endl;
std::chrono::milliseconds tsTime(nParameter); // Milliseconds
std::this_thread::sleep_for(tsTime);
});
std::cout << "Time taken: " << fncMeasured(500) << " sec" << std::endl;
return EXIT_SUCCESS;
}
#include <iostream>
#include <functional>
template<typename Signature>
std::function<Signature> Wrapper(std::function<Signature> func)
{
return [func](auto... args)
{
std::cout << "function tracked" << std::endl;
return func(args...);
};
}
int lol(const std::string& str)
{
std::cout << str << std::endl;
return 42;
}
int main(void)
{
auto wrapped = Wrapper<int(const std::string&)>(lol);
std::cout << wrapped("Hello") << std::endl;
}
Replace the "function tracked"part with whatever tracking logic you want (timing, cache, etc.)
This requires c++14 though