QLibrary functions work slowly on the first call - C++

I'm using QLibrary to load functions from one .dll file.
I successfully load it and successfully resolve the functions.
But when I use some function from that .dll for the first time, it runs very slowly (even if it is a very simple one). The next time I use it, the speed is just fine (it returns immediately, as it should).
What is the reason for such behaviour? I suspect some caching somewhere.
Edit 1: Code:
typedef int(*my_type)(char *t_id);

QLibrary my_lib("Path_to_lib.dll");
my_lib.load();
if (my_lib.isLoaded()) {
    my_type func = (my_type)my_lib.resolve("_func_from_dll");
    if (func) {
        char buf[50] = {0};
        char buf2[50] = {0};
        // Next line works slow
        qint32 resultSlow = func(buf);
        // Next line works fast
        qint32 resultFast = func(buf2);
    }
}

I wouldn't blame QLibrary: func simply takes a long time the first time it's invoked. I bet that you'll have identical results if you resolve its address using platform-specific code, e.g. dlopen and dlsym on Linux. QLibrary doesn't really do much besides wrapping the platform API. There's nothing specific to it that would make the first call slow.
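For comparison, here is a minimal sketch of that platform-specific lookup; the .so path and exported symbol name are placeholders mirroring the question, and the first call through the resolved pointer should be just as slow:

// Minimal sketch of the same lookup done directly through the POSIX API.
// "Path_to_lib.so" and "_func_from_dll" are placeholders from the question.
#include <dlfcn.h>

typedef int (*my_type)(char *t_id);

void call_directly()
{
    void *lib = dlopen("Path_to_lib.so", RTLD_LAZY);
    if (!lib)
        return;
    if (auto func = reinterpret_cast<my_type>(dlsym(lib, "_func_from_dll"))) {
        char buf[50] = {0};
        func(buf); // expect the same slow first call here as well
    }
    dlclose(lib);
}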
There is some code smell in doing file I/O in constructors of presumably generic classes: do the users of the class know that the constructor may block on disk I/O and thus ideally shouldn't be invoked from the GUI thread? Qt makes doing this task asynchronously fairly easy, so I'd at least try to be nice that way:
class MyClass {
    QLibrary m_lib;
    enum { my_func_idx = 0, other_func_idx = 1 };
    QFuture<QVector<QFunctionPointer>> m_functions;
    my_type my_func() {
        static my_type value;
        if (Q_UNLIKELY(!value)) {
            const auto funs = m_functions.result(); // blocks until resolution has finished
            if (funs.size() > my_func_idx)
                value = reinterpret_cast<my_type>(funs.at(my_func_idx));
        }
        return value;
    }
public:
    MyClass() {
        m_lib.setFileName("Path_to_lib.dll");
        m_functions = QtConcurrent::run([this]() -> QVector<QFunctionPointer> {
            m_lib.load();
            if (m_lib.isLoaded()) {
                QVector<QFunctionPointer> funs;
                funs.push_back(m_lib.resolve("_func_from_dll"));
                funs.push_back(m_lib.resolve("_func2_from_dll"));
                return funs;
            }
            return QVector<QFunctionPointer>();
        });
    }
    void use() {
        if (my_func()) {
            char buf1[50] = {0}, buf2[50] = {0};
            QElapsedTimer timer;
            timer.start();
            auto result1 = my_func()(buf1);
            qDebug() << "first call took" << timer.restart() << "ms";
            auto result2 = my_func()(buf2);
            qDebug() << "second call took" << timer.elapsed() << "ms";
        }
    }
};

Related

Invoke a callback in a boost asio GUI loop exactly once per frame

The following problem originates from https://github.com/cycfi/elements/issues/144, which is me struggling to find a way in the elements GUI library to invoke a callback once per frame.
So far in every library I have seen, there is some callback/explicit loop that continuously processes user input, measures time since last frame and performs the render.
In the elements library, such a loop is a platform-specific implementation detail; instead, the library-using code is given access to a boost::asio::io_context object to which any callable can be posted. poll is invoked inside the platform-specific event loop.
I had no problems changing the code from a typical waterfall update(time_since_last_frame) to posting functors that do it; however, this is where the real problem begins:
Posted functors are only invoked once. The answer from the library author is "just post again".
If I post again immediately from the functor, I create an endless busy loop because as soon as one functor from the poll is completed, boost asio runs the newly posted one. This completely freezes the thread that runs the GUI because of an infinite self-reposting callback loop. The answer from the library author is "post with a timer".
If I post with a timer, I don't fix anything:
If the time is too small, it runs out before the callback finishes, so the newly posted callback copy is invoked again ... which brings back the infinite loop.
If the time is too large to cause an infinite loop, but small enough to fit in multiple times within one frame, it is run multiple times per frame ... which is a waste because there is no point in calculating UI/animation/input state multiple times per frame.
If the time is too large, the callback is not invoked on each frame. The application renders multiple times without processing user-generated events ... which is a waste because identical state is rendered multiple times for each logic update.
There is no way to calculate FPS because library-using code does not even know how many frames have been rendered between posted callbacks (if any).
In other words:
In a typical update+input+render loop the loop runs as fast as possible, yielding as many frames as it can (or to a specified cap thanks to sleeps). If the code is slow, it's just FPS loss.
In the elements library, if the callback is too fast it is repeated multiple times per frame, because the registered timer may finish multiple times within one frame. If the code is too slow, it's a "deadlock" callback loop that never gets out of asio's poll.
I do not want my code to be invoked every X time units (or more than X because of the OS scheduler). I want my code to be invoked once per frame (preferably with a time-delta argument, but I can also measure it myself from the previous invocation).
Is such usage of asio in the elements library a bad design? I find the "post with a timer" solution to be an antipattern. It feels to me like fixing a deadlock between 2 threads by adding a sleep in one of them and hoping they will never collide after such a change - in the case of elements, I'm posting a timed callback and hoping it's not too fast (wasting CPU) but also not too slow (causing an infinite timed-callback loop). The ideal time is too hard to calculate because of the many factors that can affect it, including user actions - basically a lose-lose situation.
Extra note 1: I have tried defer instead of poll, no difference.
Extra note 2: I have already created 100+ issues/PRs for the library so it's very likely that a motivating answer will end in another PR. In other words, solutions that attempt to modify library are fine too.
Extra note 3: MCVE (here without a timer, which causes an almost-infinite loop until the counter finishes; while it counts, the GUI thread is frozen):
#include <elements.hpp>

using namespace cycfi::elements;

bool func()
{
    static int x = 0;
    if (++x == 10'000'000)
        return true;
    return false;
}

void post_func(view& v)
{
    if (!func())
        v.post([&v](){ post_func(v); });
}

int main(int argc, char* argv[])
{
    app _app(argc, argv);
    window _win(_app.name());
    _win.on_close = [&_app]() { _app.stop(); };

    view view_(_win);
    view_.content(box(rgba(35, 35, 37, 255)));
    view_.post([&view_](){ post_func(view_); });

    _app.run();
    return 0;
}
So, finally found time to look at this.
In the back-end it seems that Elements already integrates with Asio. Therefore, when you post tasks to the view, they become async tasks.
You can give them a delay, so you don't have to busy loop.
Let's do a demo
Defining A Task
Let's define a task that has fake progress and a fixed deadline for completion:
#include <utility>
#include <chrono>
using namespace std::chrono_literals;
auto now = std::chrono::high_resolution_clock::now;
struct Task {
    static constexpr auto deadline = 2.0s;
    std::chrono::high_resolution_clock::time_point _start = now();
    bool _done = false;

    void reset() { *this = {}; }
    auto elapsed() const { return now() - _start; } // fake progress
    auto done() { return std::exchange(_done, elapsed() > deadline); }
};
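Note the std::exchange in done(): it returns the previous value of _done while storing the fresh comparison, so done() only reports true on the call after the deadline has passed. A tiny standalone sketch of that behaviour (not part of the demo itself):

// Standalone sketch of the std::exchange pattern used in Task::done():
// the old flag is returned, the new comparison is stored for the next call.
#include <cassert>
#include <utility>

int main()
{
    bool done_flag = false;
    bool past_deadline = true; // pretend the deadline has just passed

    bool previous = std::exchange(done_flag, past_deadline);
    assert(previous == false); // this call still reports "not done"
    assert(done_flag == true); // the next call would report "done"
}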
How To Self-Chain?
As you noticed, this is tricky. You can stoop and just type-erase your handler:
std::function<void()> cheat;
cheat = [&cheat]() {
    // do something
    cheat(); // self-chain
};
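For instance, hooked up to the view from the question it could look roughly like this (a sketch only, reusing func() from the MCVE and assuming cheat outlives the posted handlers, e.g. by living in main):

// Rough sketch: a type-erased self-reposting handler. cheat must outlive every
// posted copy of the inner lambda, since it is captured by reference.
std::function<void()> cheat;
cheat = [&view_, &cheat] {
    view_.post(10ms, [&cheat] {
        if (!func())   // func() from the question's MCVE: true means "done"
            cheat();   // self-chain until the work is finished
    });
};
cheat();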
However, just to humor you, let me introduce what functional programming calls the Y combinator.
#include <functional>

template<class Fun> struct ycombi {
    Fun fun_;
    explicit ycombi(Fun fun): fun_(std::move(fun)) {}

    template<class ...Args> void operator()(Args &&...args) const {
        return fun_(*this, std::forward<Args>(args)...);
    }
};
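Purely as an aside (not part of the demo), here is a tiny self-recursive use of ycombi; the extra self argument is how the lambda reaches itself without having a name:

// Standalone sketch: a countdown that recurses through the "self" parameter
// supplied by ycombi instead of naming the lambda.
#include <iostream>

int main()
{
    ycombi countdown{ [](auto self, int n) -> void {
        std::cout << n << '\n';
        if (n > 0)
            self(n - 1); // self-recursion without a name
    } };
    countdown(3); // prints 3 2 1 0
}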
With that, we can create a generic handler posting chainer:
auto chain = [&view_](auto f) {
    return ycombi{ [=, &view_](auto self) {
        view_.post(10ms, [=] {
            if (f())
                self();
        });
    } };
};
I opted for 10ms delay, but you don't have to. Doing no delay means "asap" which would amount to every frame, given the resources.
A Reporter Task
Let's update a progress-bar:
auto prog_bar = share(progress_bar(rbox(colors::black), rbox(pgold)));

auto make_reporter = [=, &view_](Task& t) {
    static int s_reporter_id = 1;
    return [=, id=s_reporter_id++, &t, &view_] {
        std::clog << "reporter " << id << " task at " << (t.elapsed() / 1.0ms) << "ms " << std::endl;
        prog_bar->value(t.elapsed() / Task::deadline);
        view_.refresh(*prog_bar);

        if (t.done()) {
            std::clog << "done" << std::endl;
            return false;
        }
        return true;
    };
};
Now, let's add a button to start updating the progress bar.
auto task_btn = button("Task #1");
task_btn.on_click = [=, &task1](bool) {
    if (task1.done())
        task1.reset();

    auto progress = chain(make_reporter(task1));
    progress();
};
Let's put the button and the bar in the view and run the app:
view_.content(task_btn, prog_bar);
view_.scale(8);
_app.run();
Full Listing
Used current Elements master (a7d1348ae81f7c)
File test.cpp
#include <utility>
#include <chrono>

using namespace std::chrono_literals;
auto now = std::chrono::high_resolution_clock::now;

struct Task {
    static constexpr auto deadline = 2.0s;
    std::chrono::high_resolution_clock::time_point _start = now();
    bool _done = false;

    void reset() { *this = {}; }
    auto elapsed() const { return now() - _start; } // fake progress
    auto done() { return std::exchange(_done, elapsed() > deadline); }
};

#include <functional>

template<class Fun> struct ycombi {
    Fun fun_;
    explicit ycombi(Fun fun): fun_(std::move(fun)) {}

    template<class ...Args> void operator()(Args &&...args) const {
        return fun_(*this, std::forward<Args>(args)...);
    }
};

#include <elements.hpp>
#include <iostream>

using namespace cycfi::elements;

constexpr auto bred = colors::red.opacity(0.4);
constexpr auto bgreen = colors::green.level(0.7).opacity(0.4);
constexpr auto bblue = colors::blue.opacity(0.4);
constexpr auto brblue = colors::royal_blue.opacity(0.4);
constexpr auto pgold = colors::gold.opacity(0.8);

int main(int argc, char* argv[]) {
    app _app(argc, argv);
    window _win(_app.name());
    _win.on_close = [&_app]() { _app.stop(); };

    view view_(_win);
    Task task1;

    auto chain = [&view_](auto f) {
        return ycombi{ [=, &view_](auto self) {
            view_.post(10ms, [=] {
                if (f())
                    self();
            });
        } };
    };

    auto prog_bar = share(progress_bar(rbox(colors::black), rbox(pgold)));

    auto make_reporter = [=, &view_](Task& t) {
        static int s_reporter_id = 1;
        return [=, id=s_reporter_id++, &t, &view_] {
            std::clog << "reporter " << id << " task at " << (t.elapsed() / 1.0ms) << "ms " << std::endl;
            prog_bar->value(t.elapsed() / Task::deadline);
            view_.refresh(*prog_bar);

            if (t.done()) {
                std::clog << "done" << std::endl;
                return false;
            }
            return true;
        };
    };

    auto task_btn = button("Task #1");
    task_btn.on_click = [=, &task1](bool) {
        if (task1.done())
            task1.reset();

        auto progress = chain(make_reporter(task1));
        progress();
    };

    view_.content(task_btn, prog_bar);
    view_.scale(8);

    _app.run();
}

Using QtConcurrent::map() function on QList yields segmentation fault

I am familiarizing myself with the QtConcurrent library. I have a UI (MainWindow) where I run my functions to simulate a real-world example of multithreading.
The QtConcurrent::map() function I am using requires:
an Iterator or a Sequence; in my case I am using a QList.
Further, it requires a MapFunctor (which supports lambdas), but for this purpose I am choosing to stick to a static method for testing.
What I have tried
I attempted using both map() overloads (the first is left uncommented):
QtConcurrent::map(Sequence &sequence, MapFunctor function)
QtConcurrent::map(Iterator begin, Iterator end, MapFunctor function)
I tried searching for a Sequence and a MapFunctor, but I could only find them in templates, which did not help a lot, so I had to use my intuition to make sense of it.
The Code:
Somewhere inside my MainWindow.cpp
// counter variable stored in MainWindow
int i = 0;
// MapFunctor
void mapSumToQString(QPair<int, int> pair)
{
i++;
qDebug() << "Execute " << i << " = " << QString::number(pair.first, pair.second);;
}
and the code to start it all
// UI class decl
MainWindow::MainWindow(QWidget* parent)
: QMainWindow(parent)
, ui(new Ui::MainWindow)
{
ui->setupUi(this);
// Create list of integers to perform map function on (here I don't write back to the original sequence i.e. list)
QList<QPair<int, int>> intPairList = QList<QPair<int, int>>();
for (int i = 0; i < 1000; i++) {
int i1 = qrand();
int i2 = qrand();
intPairList.append(QPair<int, int>(i1, i2));
}
QFuture<void> future;
future = QtConcurrent::map(intPairList, mapSumToQString);
// future = QtConcurrent::map(intPairList.begin(), intPairList.end(), mapSumToQString);
}
Problem:
Running this snippet of code results in a SEGV here
namespace QtConcurrent {
// map kernel, works with both parallel-for and parallel-while
template <typename Iterator, typename MapFunctor>
class MapKernel : public IterateKernel<Iterator, void>
{
MapFunctor map;
public:
typedef void ReturnType;
MapKernel(Iterator begin, Iterator end, MapFunctor _map)
: IterateKernel<Iterator, void>(begin, end), map(_map)
{ }
bool runIteration(Iterator it, int, void *) override
{
map(*it); <--------SEGV line
return false;
}
//...
}
Stacktrace (copied from debugger)
1 QtConcurrent::MapKernel<QList<QPair<int, int>>::iterator, QtConcurrent::FunctionWrapper1<void, QPair<int, int>>>::runIteration qtconcurrentmapkernel.h 68 0x404ee8
2 QtConcurrent::MapKernel<QList<QPair<int, int>>::iterator, QtConcurrent::FunctionWrapper1<void, QPair<int, int>>>::runIterations qtconcurrentmapkernel.h 77 0x404f82
3 QtConcurrent::IterateKernel<QList<QPair<int, int>>::iterator, void>::forThreadFunction qtconcurrentiteratekernel.h 255 0x40466e
4 QtConcurrent::IterateKernel<QList<QPair<int, int>>::iterator, void>::threadFunction qtconcurrentiteratekernel.h 217 0x404486
5 QtConcurrent::ThreadEngineBase::run qtconcurrentthreadengine.cpp 302 0x6d881973
6 QThreadPoolThread::run qthreadpool.cpp 99 0x111b36a
7 QThreadPrivate::start(void *) *4 qthread_win.cpp 403 0x11163eb
8 KERNEL32!BaseThreadInitThunk 0x74d56359
9 ntdll!RtlGetAppContainerNamedObjectPath 0x77467c24
10 ntdll!RtlGetAppContainerNamedObjectPath 0x77467bf4
11 ??
For the record, there is another question related to this, but it most certainly does not provide a usable solution.
Why do I get this SEGV, what is causing this access violation?
Honestly, some parts of your question are not clear to me. However, please take the following points into account (although you may have considered some or all of them already):
The functor is highly recommended to be a static function. From your code, it seems that you may not have declared the static function properly. Please put the following in your MainWindow.h:
static void mapSumToQString(QPair<int, int> pair);
and modify the implementation as follows:
// MapFunctor
void MainWindow::mapSumToQString(QPair<int, int> pair)
{
    j++;
    qDebug() << "Execute " << j << " = " << QString::number(pair.first, pair.second);
}
mapSumToQString is static, hence you cannot use non-static members of MainWindow inside it, so the counter i must be static as well.
static int j; // in MainWindow.h ---> I changed i to j to distinguish it from your for-loop variable
int MainWindow::j = 0; // in MainWindow.cpp ----> static variables have to be initialized
Modify MainWindow.cpp as follows:
// counter variable stored in MainWindow
j = 0;
QFuture<void> future;
future = QtConcurrent::map(intPairList, mapSumToQString);
future.waitForFinished();
One thing that I cannot understand is that you are converting an integer to a string with a random base?!
qDebug() << "Execute " << j << " = " << QString::number(pair.first, pair.second);
//Herein, pair.second is random, and hence QString::number's second input (base) is always a random number. Do you really want it?
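Putting those pieces together, a minimal sketch of the constructor with the suggested changes could look like this (assuming mapSumToQString and j are declared static in MainWindow.h as shown above):

// Minimal consolidated sketch of the suggested changes.
MainWindow::MainWindow(QWidget* parent)
    : QMainWindow(parent)
    , ui(new Ui::MainWindow)
{
    ui->setupUi(this);

    QList<QPair<int, int>> intPairList;
    for (int i = 0; i < 1000; i++)
        intPairList.append(QPair<int, int>(qrand(), qrand()));

    j = 0;
    QFuture<void> future = QtConcurrent::map(intPairList, mapSumToQString);
    // Block here so the local list is not destroyed while worker threads
    // are still iterating over it.
    future.waitForFinished();
}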

how to create/pass a completion handler callback as function parameter in c++11?

There are already tons of samples and code out there showing how to pass a function as a callback into a function parameter in C++11. But then the callback gets called in a separate function rather than in its original caller function.
Let's say I have the following sample code in Objective-C:
- (void)calculateSizeWithCompletionBlock:(IPVWebImageCalculateSizeBlock)completionBlock {
dispatch_async(self.ioQueue, ^{
NSUInteger fileCount = 0;
NSUInteger totalSize = 0;
// Doing some time consuming task, that plays with some local(on this function scope) vars
if (completionBlock) {
dispatch_async(dispatch_get_main_queue(), ^{
completionBlock(fileCount, totalSize);
});
}
});
}
- (void)doSomething {
NSUInteger var1 = 0;
NSUInteger var2 = 0;
[self calculateSizeWithCompletionBlock:^(NSUInteger fileCount, NSUInteger totalSize) {
// Here, do things with fileCount, totalSize, var1, var2
NSLog(@"fileCount: %lu, totalSize: %lu, var1: %lu, var2: %lu",(unsigned long)fileCount, (unsigned long)totalSize, (unsigned long)var1, (unsigned long)var2);
}];
}
The straight question is: how can I rewrite the above code in C++11, so that my callback is called back into the caller function and can use the caller function's local vars? I am aware of C++11's lambdas, std::function, and std::bind, but I'm not sure how to achieve that.
Any help would be appreciated.
#include <functional>
#include <future>
#include <iostream>

thread_pool& get_threadpool();
void run_on_ui_thread( std::function<void()> );

std::future<void> calculateSizeWithCompletionBlock(
    std::function<void(int fileCount, int totalSize)> completion
) {
    return get_threadpool().queue(
        [completion]{
            int fileCount = 0;
            int totalSize = 0;
            // Doing some time consuming task, that plays with some local (function-scope) vars
            if (completion) {
                run_on_ui_thread( [fileCount, totalSize, completion]{
                    completion(fileCount, totalSize);
                });
            }
        }
    );
}
void doSomething() {
    int var1 = 0;
    int var2 = 0;

    calculateSizeWithCompletionBlock(
        [var1, var2](int fileCount, int totalSize) {
            // Here, do things with fileCount, totalSize, var1, var2
            std::cout <<
                "fileCount: " << fileCount <<
                ", totalSize: " << totalSize <<
                ", var1: " << var1 <<
                ", var2: " << var2 << "\n";
        }
    );
}
This is the rough equivalent of your code.
I do not include run_on_ui_thread and get_threadpool, because both will depend on what context your C++ program is running in.
This is the only method of thread_pool I use:
struct thread_pool {
    std::future<void> queue( std::function<void()> );
};
Basically, it is something that takes a function-like object and returns an object that lets you wait on that task's completion.
Unlike Objective-C, C++ runs in a myriad of different environments. The services provided by the OS, or by whatever other environment the program is running in, are not fixed.
There isn't, for example, an assumption that all C++ code runs in an interactive UI message-pumping environment. run_on_ui_thread implicitly assumes that, and would have to be written with the particular UI-thread-pump library in mind.
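If you just need something to compile against while experimenting, a minimal stand-in for that interface could be backed by std::async; this is purely a sketch under that assumption, not a real thread pool (a real pool would reuse threads rather than spawn one per task):

// Hypothetical stand-in for the thread_pool interface used above:
// every queued task simply runs on its own std::async thread.
#include <functional>
#include <future>

struct thread_pool {
    std::future<void> queue( std::function<void()> task ) {
        return std::async(std::launch::async, std::move(task));
    }
};

thread_pool& get_threadpool() {
    static thread_pool pool; // shared lazily-constructed instance
    return pool;
}

// Note: futures returned by std::async block in their destructor, so
// discarding the returned future waits for the task to finish.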
Some of the above code could be made marginally more efficient in C++14 with move-into-lambda. In particular,
run_on_ui_thread( [fileCount, totalSize, completion=std::move(completion)]{
    completion(fileCount, totalSize);
});
as in calculateSizeWithCompletionBlock we don't know how expensive completion is to copy. In C++ you deal with objects by value more often, so sometimes you have to explicitly move things around. On the plus side, this reduces the number of allocations you'll have to do compared to Objective-C.

Issue with setting speed to DifferentialWheels in Webots C++

Small community here, but hopefully somebody sees this. I'm attempting to do a pure C++ implementation of a Webots simulation for an E-puck. The C++ documentation is sorely lacking, and I can't seem to find a resolution for this issue (the C implementation is stellar, but all the function calls were changed for C++).
Essentially, I'm just trying to get a simple application up and running... I want to make the E-puck move forward. I will post the entirety of my code below... all I'm doing is instantiating a Robot entity, printing out all the IR sensor values, and attempting to move it forward.
The issue is that it does not move. I'd think that there would be some call to link the DifferentialWheels object to the E-puck (similar to the camera = getCamera("camera") call).
If I comment out my call to setSpeed, the program works perfectly (doesn't move, but prints values). If I leave it in, the simulation freezes up after a single step, once it gets to that call. I'm not exactly sure what I'm doing wrong, to be honest.
// webots
#include <webots/Robot.hpp>
#include <webots/Camera.hpp>
#include <webots/DistanceSensor.hpp>
#include <webots/DifferentialWheels.hpp>
#include <webots/LED.hpp>
// standard
#include <iostream>
using namespace webots;
#define TIME_STEP 16
class MyRobot : public Robot
{
private:
Camera *camera;
DistanceSensor *distanceSensors[8];
LED *leds[8];
DifferentialWheels *diffWheels;
public:
MyRobot() : Robot()
{
// camera
camera = getCamera("camera");
// sensors
distanceSensors[0] = getDistanceSensor("ps0");
distanceSensors[1] = getDistanceSensor("ps1");
distanceSensors[2] = getDistanceSensor("ps2");
distanceSensors[3] = getDistanceSensor("ps3");
distanceSensors[4] = getDistanceSensor("ps4");
distanceSensors[5] = getDistanceSensor("ps5");
distanceSensors[6] = getDistanceSensor("ps6");
distanceSensors[7] = getDistanceSensor("ps7");
for (unsigned int i = 0; i < 8; ++i)
distanceSensors[i]->enable(TIME_STEP);
// leds
leds[0] = getLED("led0");
leds[1] = getLED("led1");
leds[2] = getLED("led2");
leds[3] = getLED("led3");
leds[4] = getLED("led4");
leds[5] = getLED("led5");
leds[6] = getLED("led6");
leds[7] = getLED("led7");
}
virtual ~MyRobot()
{
// cleanup
}
void run()
{
double speed[2] = {20.0, 0.0};
// main loop
while (step(TIME_STEP) != -1)
{
// read sensor values
for (unsigned int i = 0; i < 8; ++i)
std::cout << " [" << distanceSensors[i]->getValue() << "]";
std::cout << std::endl;
// process data
// send actuator commands
// this call kills the simulation
// diffWheels->setSpeed(1000, 1000);
}
}
};
int main(int argc, char* argv[])
{
MyRobot *robot = new MyRobot();
robot->run();
delete robot;
return 0;
}
Now, if this were the C implementation, I would call wb_differential_wheels_set_speed(1000, 1000); however, that call isn't available in the C++ header files.
The freeze is caused by the use of the uninitialized variable diffWheels.
DifferentialWheels (as well as Robot and Supervisor) doesn't need to be initialized.
You have to change the base class of your MyRobot class to DifferentialWheels
class MyRobot : public DifferentialWheels
and then simply call
setSpeed(1000, 1000)
and not
diffWheels->setSpeed(1000, 1000)
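For reference, a trimmed-down sketch of the controller with that change applied (only the driving part; the camera, sensor, and LED setup from the question is omitted) could look like:

// Sketch of the fix: derive from DifferentialWheels and call setSpeed()
// on the robot itself instead of through an uninitialized pointer.
#include <webots/DifferentialWheels.hpp>

using namespace webots;

#define TIME_STEP 16

class MyRobot : public DifferentialWheels
{
public:
    void run()
    {
        while (step(TIME_STEP) != -1)
            setSpeed(1000, 1000); // drive both wheels forward
    }
};

int main(int argc, char* argv[])
{
    MyRobot *robot = new MyRobot();
    robot->run();
    delete robot;
    return 0;
}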
It doesn't seem as though you've initialized diffWheels, so I would imagine you're getting a segfault from dereferencing a garbage pointer. Try putting
diffWheels = new DifferentialWheels;
in the constructor of MyRobot.

D Module Name Being Printed by Module Destructor

I've recently started learning D version 1, using the Tango library. I decided to write a small class Dout that wraps tango.io.Stdout, except it overrides opShl to better match C++'s << style output. My implementation is like so:
// dout.d
module do.Dout;
import tango.io.Stdout;
class Dout
{
public static Dout opShl(T) (T arg)
{
stdout(arg);
return new Dout;
}
public static Dout newline()
{
stdout.newline;
return new Dout;
}
}
And in main, I make a simple call to Dout.opShl(), like so.
// main.d
import do.Dout;
import tango.io.Console;
int main(char[][] argv)
{
Dout << "Hello" << " world!" << Dout.newline;
Cin.get();
return 0;
}
This works, but after pressing enter and exiting main, the text "do.Dout.Dout" is printed. After stepping through the code, I found that this text is printed at the assembly instruction:
00406B5C call __moduleDtor (40626Ch)
In which do.Dout's destructor is being called.
My question is, why is the module name being printed upon exiting main, and what can I do to stop this behaviour?
the reason "do.Dout.Dout" is printed is because Dout << Dout.newline; prints a new line (in the newline property call) and then attempts to print a human readable string of a Dout object (after it is passed to opShl!Dout())
and you only see it during destruction because then the output is flushed ;)
you should have done
__gshared Doutclass Dout = new Doutclass;

class Doutclass
{
    public Doutclass opShl(T) (T arg)
    {
        static if (is(T == NLine)) {
            stdout.newline; // if NLine is passed, do a newline
        } else {
            stdout(arg);
        }
        return this;
    }

    struct NLine {} // this might need a dummy field to stop compiler complaints

    public static NLine newline()
    {
        return NLine();
    }
}
which is closer to the C++ style (Dout is a global object and doesn't get recreated on each call, and newline is a special struct that flushes the output besides adding a newline to it)