For mapping multiple custom signals to one slot in Qt, I basically have two options: QSignalMapper or casting the sender() pointer (see: http://doc.qt.digia.com/qq/qq10-signalmapper.html).
My question is: which is more efficient code?
I want to use it in a time-critical section of my program.
Should I consider using separate signals/slots to optimize the code?
Thank you in advance.
You're most likely wrong about what "time critical" means and about where your application is actually spending its CPU time. You can't make any arguments without actually measuring things. At this point I believe you're micro-optimizing and wasting your time. Don't do anything optimization-related unless you can measure the starting point and see improvements in real numbers.
If your signal-slot connection is invoked on the order of 1000 times per second, you can do pretty much anything you want - the overhead won't matter. It only starts to matter if you're in the 100k invocations/second range, and then probably you're architecting things wrongly to begin with.
A signal-slot connection without any parameters is always faster than one that sends some parameters. You can simply add a property to the sender object using the dynamic property system, and check for that property by using sender()->property("..."). Dynamic property look-up takes a bit more time than using qobject_cast<...>(sender()) and a call to a member function on a custom QObject or QWidget-derived class. But this is immaterial, because unless you can measure the difference, you don't need to worry about it. Premature optimization is truly the root of all evil.
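For illustration, here is a minimal sketch of the sender()-based approach described above, assuming the senders are QPushButtons; the Receiver class, the handleClick() slot, the "buttonId" property and the handleButton() helper are made-up names for the example, not anything from your code.

    #include <QObject>
    #include <QPushButton>

    // Hypothetical receiver; several buttons' clicked() signals are
    // connected to the single handleClick() slot elsewhere.
    class Receiver : public QObject
    {
        Q_OBJECT
    public slots:
        void handleClick()
        {
            // Find out which button emitted the signal.
            QPushButton *button = qobject_cast<QPushButton *>(sender());
            if (!button)
                return;

            // Variant 1: read a dynamic property set on the sender beforehand,
            // e.g. button->setProperty("buttonId", 3);
            int id = button->property("buttonId").toInt();

            // Variant 2: call members of the concrete sender class directly.
            handleButton(id, button->text());
        }

    private:
        void handleButton(int id, const QString &text) { /* ... */ }
    };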
I'm writing a C++ program that keeps track of dates whenever it's run, keeping a record. I want to test and debug this record-keeping functionality, but the records themselves can span years; letting time pass naturally to expose bugs would take... a while.
Is there an established way of simulating the passage of time, so I can debug this program more easily? I'm using C's ctime library to acquire dates.
If you want your system to be testable, you'll want to be able to replace your entire time engine.
There are a number of ways to replace a time engine.
The easiest would be to create a new time engine singleton that replicates the time engine API you are using (in your case, ctime).
Then sweep out every use of ctime based APIs and redirect them to your new time engine.
This new time engine can be configured to use ctime, or something else.
To be more thorough, you'll even want to change the binary layout and type of any data structures that interact with your old API, audit every case where you convert things to or from void pointers or reinterpret_cast them, and so on.
Another approach is dependency injection, where instead of using a singleton you pass the time-engine as an argument to your program. Every part of your program that needs access to time now either stores a pointer to the time engine, or takes it as a function argument.
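As a rough sketch of how this could look, assuming the program only needs "what time is it now": an abstract time engine with a real, ctime-backed implementation and a fake one for tests, passed in by injection. The TimeEngine, SystemTimeEngine, FakeTimeEngine and recordRun names are made up for the example.

    #include <ctime>

    // Hypothetical interface that replaces direct ctime calls.
    struct TimeEngine {
        virtual ~TimeEngine() = default;
        virtual std::time_t now() const = 0;
    };

    // Production implementation: defers to the real system clock.
    struct SystemTimeEngine : TimeEngine {
        std::time_t now() const override { return std::time(nullptr); }
    };

    // Test implementation: time only moves when the test says so.
    struct FakeTimeEngine : TimeEngine {
        std::time_t current = 0;
        std::time_t now() const override { return current; }
        void advanceDays(int days)
        {
            current += static_cast<std::time_t>(days) * 24 * 60 * 60;
        }
    };

    // Code that used to call std::time() directly now takes the engine as a
    // dependency; a singleton accessor would work the same way.
    void recordRun(const TimeEngine& clock)
    {
        std::time_t when = clock.now();
        // ... append 'when' to the record ...
        (void)when;
    }

In tests, you construct a FakeTimeEngine, call advanceDays() between simulated runs, and feed it to the same code path the real program uses.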
Now that you have control over time, you can arrange for it to pass faster, jump around, and so on. Depending on your system you may want to be more or less realistic in how time passes, and different choices can expose different bugs, both real and spurious.
Even after you do all of this you will not be certain that your program doesn't have time bugs. You may interact with OS time in ways you don't expect. But you can at least solve some of these problems.
Currently, I'm using a Blueprint script to generate and delete around 60 actors within a radius of a flying pawn.
This creates framerate spikes, and I've read that this is quite a heavy process.
So obviously I would like to increase the performance of this process by just placing the same logic in C++.
But I would like to know a couple of things first to make sure what I'm doing is right.
1) Is the SpawnActor function in C++, by itself, faster than the Blueprint node?
2) Which properties of the Blueprint increase the processing time of spawning?
I know that, for example, enabling physics will increase the processing time, but are there any other properties I need to take into consideration?
Thanks to everyone taking the time to read this; any kind of help is much appreciated :)
You can't say that C++'s SpawnActor is faster, since the Blueprint SpawnActor node ultimately calls the C++ SpawnActor. Of course, if you write the C++ directly, you save the time spent in the few functions that route the Blueprint node to the C++ implementation. So the spike is not caused by calling SpawnActor from Blueprint, i.e. switching to C++ won't fix it.
SpawnActor itself does have a cost, so calling it 60 times in a row will certainly cause a spike. I'm not sure exactly how much overhead is inside SpawnActor, but at the very least allocating memory for the new actors costs some time, and if your actor template has a lot of components it takes even longer. A common technique is therefore to use an actor pool: pre-spawn some actors once and reuse them.
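As a very rough sketch of that pooling idea (the FSimpleActorPool class and its members are made up for the example, and details such as resetting component state or growing the pool when it runs dry are simplified):

    #include "CoreMinimal.h"
    #include "Engine/World.h"
    #include "GameFramework/Actor.h"

    // Hypothetical pool that pre-spawns a batch of actors once and then
    // hands them out, instead of calling SpawnActor/DestroyActor during play.
    class FSimpleActorPool
    {
    public:
        // Pay the SpawnActor cost up front (e.g. at level load).
        void Prewarm(UWorld* World, UClass* ActorClass, int32 Count)
        {
            for (int32 i = 0; i < Count; ++i)
            {
                AActor* Actor = World->SpawnActor<AActor>(ActorClass);
                Deactivate(Actor);
                FreeActors.Add(Actor);
            }
        }

        // Take a pre-spawned actor out of the pool and "activate" it.
        AActor* Acquire(const FVector& Location)
        {
            if (FreeActors.Num() == 0)
            {
                return nullptr; // or spawn a new one as a fallback
            }
            AActor* Actor = FreeActors.Pop();
            Actor->SetActorLocation(Location);
            Actor->SetActorHiddenInGame(false);
            Actor->SetActorEnableCollision(true);
            Actor->SetActorTickEnabled(true);
            return Actor;
        }

        // Instead of destroying the actor, park it for reuse.
        void Release(AActor* Actor)
        {
            Deactivate(Actor);
            FreeActors.Add(Actor);
        }

    private:
        void Deactivate(AActor* Actor)
        {
            Actor->SetActorHiddenInGame(true);
            Actor->SetActorEnableCollision(false);
            Actor->SetActorTickEnabled(false);
        }

        TArray<AActor*> FreeActors;
    };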
C++ will always be faster than Blueprints, since the latter are built on top of the C++ layer; the trade-off is that compilation and bug fixing in C++ won't be as easy.
The number of characters will always affect performance because of CPU time, but that's expected. Keep in mind that access to characters goes through UE4's iterators and dedicated containers, so they are already optimised and thread-safe.
I have a typical wide column family with {rowkey --> uuid4+date, with time-series data as columns}, on which I have implemented a range-based query using pycassa xget() calls. It's not that I was plagued by poor performance with the single-threaded code; I was more curious to understand the difference in performance when the xget() calls are made in parallel rather than sequentially (from inside a for loop).
I used Python's threading library to implement a multithreaded version of the range-based query, and performance actually degraded considerably. I'm aware of the effect the Python GIL has on multithreaded code, but is there any way I can be sure this is in fact caused by the GIL? Could it be something else causing this?
Thanks in advance.
Note: I am not considering the multiprocessing library because I can't afford to have a separate ConnectionPool object for each sub-process.
One thing I would try is playing around with different values for the buffer_size kwarg for xget() (the default is 1024).
If the GIL is the problem, you'll see CPU usage somewhere between ~90% and ~120% for the process. Otherwise, you may want to adjust the size of the ConnectionPool to make sure there is at least one connection available for each thread.
If all else fails, try profiling your application: http://docs.python.org/2/library/profile.html.
I have a wizard class that gets used a lot in my program. Unfortunately, the wizard takes a while to load, mostly because the GUI framework is very slow. I have tried to redesign the wizard class multiple times (for example, making the object reusable so it only gets created once), but I always hit a brick wall somewhere. So, at this point, is it a huge ugly hack to just load 50 instances of this beast into a vector and pop them off as I use them? That way the delay will only be noticed at startup and the program will run fine thereafter. Too much of a hack? Is such a construct common?
In games, we often first allocate and construct everything needed in a game session. Then we recycle the objects if they have short life-time, trying to get 0 allocations/deallocations while the game session is running.
So no, it's not really a hack; it's just good sense to make the computer do less work at the point where speed matters. One general strategy is caching: compute your invariant data first, then work with the dynamic data at runtime. Memory allocation, object construction, etc. should be prepared before use where possible and necessary.
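As a minimal sketch of that idea applied to the wizard case (Wizard here is a stand-in for the real class, and WizardCache is a made-up name):

    #include <memory>
    #include <vector>

    // Stand-in for the expensive-to-construct wizard class.
    class Wizard { /* slow GUI setup happens in the constructor */ };

    class WizardCache
    {
    public:
        // Pay the construction cost once, at startup.
        explicit WizardCache(std::size_t count)
        {
            for (std::size_t i = 0; i < count; ++i)
                pool.push_back(std::make_unique<Wizard>());
        }

        // Hand out a pre-built instance; fall back to building a new
        // one if the cache runs dry.
        std::unique_ptr<Wizard> acquire()
        {
            if (pool.empty())
                return std::make_unique<Wizard>();
            auto wizard = std::move(pool.back());
            pool.pop_back();
            return wizard;
        }

    private:
        std::vector<std::unique_ptr<Wizard>> pool;
    };

Because acquire() falls back to constructing a fresh instance, the cache size is a tuning knob rather than a hard limit.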
Unfortunately, the wizard takes a while to load mostly because the GUI framework is very slow.
Isn't a wizard just a form-based template? Shouldn't that carry essentially no overhead? Find what's slowing the framework down (uncompressed background image?) and fix the root cause.
As a stopgap, you could create the windows in the background and not display them until the user asks. But that's obviously just moving the problem somewhere else. Even if you create them in a background thread at startup, the user's first command might ask for the last wizard and then they have to wait 50x as long… which they'll probably interpret as a crash. At the very least, anticipate and test such corner cases. Also test on a low-RAM setup.
Yes it is bad practice, it breaks RFC2549 standard.
OK ok, I was just kidding. Do whatever is best for your application.
It isn't a matter of "hacks" or "standards".
Just make sure you have proper documentation about what isn't as straightforward as it should be (such as hacks).
Trust me, if a 5k investment produced a product with lots of hacks (such as Windows), then hacks must really help at some point.
I'm working on an RTS game in C++ targeted at handheld hardware (Pandora). For reference, the Pandora has a single ARM processor at ~600Mhz and runs Linux. We're trying to settle on a good message passing system (both internal and external), and this is new territory for me.
It may help to give an example of a message we'd like to pass. A unit may make this call to load its models into memory:
sendMessage("model-loader", "load-model", my_model.path, model_id );
In return, the unit could expect some kind of message containing a model object for the particular model_id, which can then be passed to the graphics system. Please note that this sendMessage function is in no way final. It just reflects my current understanding of message passing systems, which is probably not correct :)
From what I can tell there are two pretty distinct choices. One is to pass messages in memory, and only go through the network when you need to talk to an external machine. I like this idea because the overhead seems low, but the big problem is that it seems to require extensive mutex locking on your message queues. I'd really like to avoid excess locking if possible. I've read about a few ways to implement simple queues without locking (by relying on atomic integer operations), but these assume there is only one reader and one writer per queue. That doesn't seem useful for our particular case, since an object's queue will have many writers and one reader.
The other choice is to go completely over the network layer. This has some fun advantages like getting asynchronous message passing pretty much for free. Also, we gain the ability to pass messages to other machines using the exact same calls as passing locally. However, this solution rubs me the wrong way, probably because I don't fully understand it :) Would we need a socket for every object that is going to be sending/receiving messages? If so, this seems excessive. A given game will have thousands of objects. For a somewhat underpowered device like the Pandora, I fear that abusing the network like that may end up being our bottleneck. But, I haven't run any tests yet, so this is just speculation.
MPI seems to be popular for message passing but it sure feels like overkill for what we want. This code is never going to touch a cluster or need to do heavy calculation.
Any insight into what options we have for accomplishing this is much appreciated.
The network will be using locking as well. It will just be where you cannot see it, in the OS kernel.
What I would do is create your own message queue object that you can rewrite as you need to. Start simple and make it better as needed. That way you can make it use any implementation you like behind the scenes without changing the rest of your code.
Look at several possible implementations that you might like to do in the future and design your API so that you can handle them all efficiently if you decide to implement in those terms.
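For example, a first, deliberately simple implementation could be a mutex-protected queue hidden behind a small send/receive API, which you can later swap for a lock-free or socket-backed version without touching the callers. The Message and MessageQueue names below are illustrative, not a proposed final design.

    #include <condition_variable>
    #include <mutex>
    #include <queue>
    #include <string>

    // Illustrative message type; in practice this would carry whatever
    // payload your objects exchange ("load-model", paths, ids, ...).
    struct Message {
        std::string target;
        std::string command;
        std::string payload;
    };

    // Many writers, one reader; all locking lives inside this class so
    // the implementation can be replaced without changing callers.
    class MessageQueue
    {
    public:
        void send(Message msg)
        {
            {
                std::lock_guard<std::mutex> lock(mutex_);
                queue_.push(std::move(msg));
            }
            ready_.notify_one();
        }

        Message receive() // blocks until a message arrives
        {
            std::unique_lock<std::mutex> lock(mutex_);
            ready_.wait(lock, [this] { return !queue_.empty(); });
            Message msg = std::move(queue_.front());
            queue_.pop();
            return msg;
        }

    private:
        std::mutex mutex_;
        std::condition_variable ready_;
        std::queue<Message> queue_;
    };

Because the locking is confined to send() and receive(), switching to a different mechanism later only means replacing this one class.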
If you want really efficient message passing look at some of the open source L4 microkernels. Those guys put a lot of time into fast message passing.
Since this is a small platform, it might be worth timing both approaches.
However, barring some kind of big speed issue, I'd always go for the approach that is simpler to code. That is probably going to be using the network stack, as it will be the same code no matter where the recipient is, and you won't have to manually code and debug your mutual exclusions, message buffering, allocations, etc.
If you find out it is too slow, you can always recode the local stuff using memory later. But why waste the time doing that up front if you might not have to?
I agree with Zan's recommendation to pass messages in memory whenever possible.
One reason is that you can pass complex C++ objects without needing to marshal and unmarshal (serialize and deserialize) them.
The cost of protecting your message queue with a semaphore is most likely going to be less than the cost of making networking code calls.
If you protect your message queue with some lock-free algorithm (using atomic operations, as you alluded to yourself), you can avoid a lot of context switches into and out of the kernel.