Launching all threads at exactly the same time in C++

Launching all threads at exactly the same time in C++ - c++

I have Rosbag file which contains messages on various topics, each topic has its own frequency. This data has been captured from a hardware device streaming data, and data from all topics would "reach" at the same time to be used for different algorithms.
I wish to simulate this using the rosbag file(think of it as every topic has associated an array of data) and it is imperative that this data streaming process start at the same time so that the data can be in sync.
I do this via launching different publishers on different threads (I am open to other approaches as well, this was the only one I could think of.), but the threads do not start at the same time, by the time thread 3 starts, thread 1 would be considerably ahead.
How may I achieve this?
Edit - I understand that launching at the exact same time is not possible, but maybe I can get away with a launch extremely close to each other as well. Is there any way to ensure this?
Edit2 - Since the main aim is to get the data stream in Sync, I was wondering about the warmup effect of the thread(suppose a thread1 starts from 3.3GHz and reaches to 4.2GHz by the time thread2 starts at 3.2). Would this have a significant effect (I can always warm them up before starting the publishing process, but I am curious whether it would have a pronounced effect)
TIA

As others have stated in the comments you cannot guarantee threads launch at exactly the same time. To address your overall goal: you're going about solving this problem the wrong way, from a ROS perspective. Instead of manually publishing data and trying to get it in sync, you should be using the rosbag api. This way you can actually guarantee messages have the same timestamp. Note that this doesn't guarantee they will be sent out at the exact same time, because they won't. You can put a message into a bag file directly like this
import rosbag
from std_msgs.msg import Int32, String
bag = rosbag.Bag('test.bag', 'w')
try:
s = String()
s.data = 'foo'
i = Int32()
i.data = 42
bag.write('chatter', s)
bag.write('numbers', i)
finally:
bag.close()
For more complex types that include a Header field simply edit the header.stamp portion to keep timestamps consistent

Related

Django: for loop through parallel process and store values and return after it finishes

I have a for loop in django. It will loop through a list and get the corresponding data from database and then do some calculation based on the database value and then append it another list
def getArrayList(request):
list_loop = [...set of values to loop through]
store_array = [...store values here from for loop]
for a in list_loop:
val_db = SomeModel.objects.filter(somefield=a).first()
result = perform calculation on val_db
store_array.append(result)
The list if 10,000 entries. If the user want this request he is ready to wait and will be informed that it will take time
I have tried joblib with backed=threading its not saving much time than normal loop
But when i try with backend=multiprocessing. it says "Apps aren't loaded yet"
I read multiprocessing is not possible in module based files.
So i am looking at celery now. I am not sure how can this be done in celery.
Can any one guide how can we faster the for loop calculation using mutliprocessing techniques available.

You're very likely looking for the wrong solution. But then again - this is pseudo code so we can't be sure.
In either case, your pseudo code is a self-fulfilling prophecy, since you run queries in a for loop. That means network latency, result set fetching, tying up database resources etc etc. This is never a good pattern, at best it's a last resort.
The simple solution is to get all values in one query:
list_values = [ ... ]
results = []
db_values = SomeModel.objects.filter(field__in=list_values)
for value in db_values:
results.append(calc(value))
If for some reason you need to loop, then to do this in celery, you would mark the function as a task (plenty of examples to find). It won't speed up anything. But you won't speed up anything - it will we be run in the background and so you render a "please wait" message and somehow you need to notify the user again that the job is done.
I'm saying somehow, because there isn't a really good integration package that I'm aware of that ties in all the components. There's django-notifications-hq, but if this is your only background task, it's a lot of extra baggage just for that - so you may want to change the notification part to "we will send you an email when the job is done", cause that's easy to achieve inside your function.
And thirdly, if this is simply creating a report, that doesn't need things like automatic retries on failure, then you can simply opt to use Django Channels and a browser-native websocket to start and report on the job (which also allows you to send email).

You could try concurrent.futures.ProcessPoolExecutor, which is a high level api for processing cpu bound tasks
def perform_calculation(item):
pass
# specify number of workers(default: number of processors on your machine)
with concurrent.futures.ProcessPoolExecutor(max_workers=6) as executor:
res = executor.map(perform_calculation, tasks)
EDIT
In case of IO bound operation, you could make use of ThreadPoolExecutor to open a few connections in parallel, you can wrap the pool in a contextmanager which handles the cleanup work for you(close idle connections). Here is one example but handles the connection closing manually.

MessageProducer.send() is too slow for a particular topic

I've narrowed down the area of the problem I'm facing and it turned out that MessageProducer.send() is too slow when it is created for a particular topic "replyfeserver":
auto producer = context.CreateProducerFromTopic("replyfeserver");
producer->send(textMessage); //it is slow
Here the call to send() blocks for up to 55-65 seconds occasionally — almost every after 4-5 calls, and up to 5-15 seconds in general.
However, if I use some other topic, say "feserver.action.status".
auto producer = context.CreateProducerFromTopic("feserver.action.status");
producer->send(textMessage); //it is fast!
Now the call to send() returns immediately, within a fraction of second. I've tried send() with several other topics and all of them work fast enough.
What could be the possible issues with this particular topic "replyfeserver"? What are the things I should look at in order to diagnose the issue with it? I was using this topic for last 2 months.
I'm using XMS C++ API and please assume that context object is an abstraction which creates session, destination, consumer, producer and so on.
I'd also like to know if there is any difference between these two approaches:
xms::Destination dest("topic://replyfeserver");
vs
xms::Destination dest = session.createTopic("replyfeserver");
I tried with both, it doesn't make any difference — at least I didn't notice it.

There shouldn't be any difference. Personally, I like to have my topics in a hierarchy. i.e. A.B.C
I would run an MQ trace then open a PMR with IBM and give them the trace and say please explain the delay.

CQRS, multiple write nodes for a single aggregate entry, while maintaining concurrency

Let's say I have a command to edit a single entry of an article, called ArticleEditCommand.
User 1 issues an ArticleEditCommand based on V1 of the article.
User 2 issues an ArticleEditCommand based on V1 of the same
article.
If I can ensure that my nodes process the older ArticleEditCommand commands first, I can be sure that the command from User 2 will fail because User 1's command will have changed the version of the article to V2.
However, if I have two nodes process ArticleEditCommand messages concurrently, even though the commands will be taken of the queue in the correct order, I cannot guarantee that the nodes will actually process the first command before the second command, due to a spike in CPU or something similar. I could use a sql transaction to update an article where version = expectedVersion and make note of the number of records changed, but my rules are more complex, and can't live solely in SQL. I would like my entire logic of the command processing guaranteed to be concurrent between ArticleEditCommand messages that alter that same article.
I don't want to lock the queue while I process the command, because the point of having multiple command handlers is to handle commands concurrently for scalability. With that said, I don't mind these commands being processed consecutively, but only for a single instance/id of an article. I don't expect a high volume of ArticleEditCommand messages to be sent for a single article.
With the said, here is the question.
Is there a way to handle commands consecutively across multiple nodes for a single unique object (database record), but handle all other commands (distinct database records) concurrently?
Or, is this a problem I created myself because of a lack of understanding of CQRS and concurrency?
Is this a problem that message brokers typically have solved? Such as Windows Service Bus, MSMQ/NServiceBus, etc?
EDIT: I think I know how to handle this now. When User 2 issues the ArticleEditCommand, an exception should be throw to the user letting them know that there is a current pending operation on that article that must be completed before then can queue the ArticleEditCommand. That way, there is never two ArticleEditCommand messages in the queue that effect the same article.

First let me say, if you don't expect a high volume of ArticleEditCommand messages being sent, this sounds like premature optimization.
In other solutions, this problem is usually not solved by message brokers, but by optimistic locking enforced by the persistence implementation. I don't understand why a simple version field for optimistic locking that can be trivially handled by SQL contradicts complicated business logic/updates, maybe you could elaborate more?

It's actually quite simple and I did that. Basically, it looks like this ( pseudocode)
//message handler
ModelTools.TryUpdateEntity(
()=>{
var entity= _repo.Get(myId);
entity.Do(whateverCommand);
_repo.Save(entity);
}
10); //retry 10 times until giving up
//repository
long? _version;
public MyObject Get(Guid id)
{
//query data and version
_version=data.version;
return data.ToMyObject();
}
public void Save(MyObject data)
{
//update row in db where version=_version.Value
if (rowsUpdated==0)
{
//things have changed since we've retrieved the object
throw new NewerVersionExistsException();
}
}
ModelTools.TryUpdateEntity and NewerVersionExistsException are part of my CavemanTools generic purpose library (available on Nuget).
The idea is to try doing things normally, then if the object version (rowversion/timestamp in sql) has changed we'll retry the whole operation again after waiting a couple of miliseconds. And that's exactly what the TryUpdateEntity() method does. And you can tweak how much to wait between tries or how many times it should retry the operation.
If you need to notify the user, then forget about retrying, just catch the exception directly and then tell the user to refresh or something.

Partition based solution
Achieve node stickiness by routing the incoming command based on the object's ID (eg. articleId modulo your-number-of-nodes) to make sure the commands of User1 and User2 ends up on the same node, then process the commands consecutively. You can choose to process all commands one by one or if you want to parallelize the execution, partition the commands on something like ID, odd/even, by country or similar.
Grid based solution
Use an in-memory grid (eg. Hazelcast or Coherence) and use a distributed Executor Service (http://docs.hazelcast.org/docs/2.0/manual/html/ch09.html#DistributedExecution) or similar to coordinate the command processing across the cluster.
Regardless - before adding this kind of complexity, you should of course ask yourself if it's really a problem if User2's command would be accepted and User1 got a concurrency error back. As long as User1's changes are not lost and can be re-applied after a refresh of the article it might be perfectly fine.

C++ Multi-threading with multiple machines

Well my problem is the following. I have a piece of code that runs on several virtual machines, and each virtual machine has N interfaces(a thread per each). The problem itself is receiving a message on one interface and redirect it through another interface in the fastest possible manner.
What I'm doing is, when I receive a message on one interface(Unicast), calculate which interface I want to redirect it through, save all the information about the message(Datagram, and all the extra info I want) with a function I made. Then on the next iteration, the program checks if there are new messages to redirect and if it is the correct interface reading it. And so on... But this makes the program exchange information very slowly...
Is there any mechanism that can speed things up?

Somebody has already invented this particular wheel - it's called MPI
Take a look at either openMPI or MPICH

Why don't you use queuing? As the messages come in, put them on a queue and notify each processing module to pick them up from the queue.
For example:
MSG comes in
Module 1 puts it on queue
Module 2,3 get notified
Module 2 picks it up from queue and saved it in the database
In parallel, Module 3 picks it up from queue and processes it
The key is "in parallel". Since these modules are different threads, while Module 2 is saving to the db, Module 3 can massage your message.
You could use JMS or MQ or make your own queue.

It sounds like you're trying to do parallel computing across multiple "machines" (even if virtual). You may want to look at existing protocols, such as MPI - Message Passing Interface to handle this domain, as they have quite a few features that help in this type of scenario

Architectural Suggestions in a Linux App

I've done quite a bit of programming on Windows but now I have to write my first Linux app.
I need to talk to a hardware device using UDP. I have to send 60 packets a second with a size of 40 bytes. If I send less than 60 packets within 1 second, bad things will happen.
The data for the packets may take a while to generate. But if the data isn't ready to send out on the wire, it's ok to send the same data that was sent out last time.
The computer is a command-line only setup and will only run this program.
I don't know much about Linux so I was hoping to get a general idea how you might set up an app to meet these requirements.
I was hoping for an answer like:
Make 2 threads, one for sending packets and the other for the calculations.
But I'm not sure it's that simple (maybe it is). Maybe it would be more reliable to make some sort of daemon that just sent out packets from shared memory or something and then have another app do the calculations? If it is some multiple process solution, what communication mechanism would you recommend?
Is there some way I can give my app more priority than normal or something similar?
PS: The more bulletproof the better!

I've done a similar project: a simple software on an embedded Linux computer, sending out CAN messages at a regular speed.
I would go for the two threads approach. Give the sending thread a slightly higher priority, and make it send out the same data block once again if the other thread is slow in computing those blocks.
60 UDP packets per second is pretty relaxed on most systems (including embedded ones), so I would not spend much sweat on optimizing the sharing of the data between the threads and the sending of the packets.
In fact, I would say: keep it simple! I you really are the only app in the system, and you have reasonable control over that system, you have nothing to gain from a complex IPC scheme and other tricks. Keeping it simple will help you produce better code with less defects and in less time, which actually means more time for testing.

Two threads as you've suggested would work. If you have a pipe() between them, then your calculating thread can provide packets as they are generated, while your comms thread uses select() to see if there is any new data. If not, then it just sends the last one from it's cache.
I may have over simplified the issue a little...

The suggestion to use a pair of threads sounds like it will do the trick, as long as the burden of performing the calculations is not too great.
Instead of using the pipe() as suggested by Cogsy, I would be inclined to use a mutex to lock a chunk of memory that you use to contain the output of your calculation thread - using it as a transfer area between the threads.
When your calculation thread is ready to output to the buffer it would grab the mutex, write to the transfer buffer and release the mutex.
When your transmit thread was ready to send a packet it would "try" to lock the mutex.
If it gets the lock, take a copy of the transfer buffer and send it.
If it doesn't get the lock, send the last copy.
You can control the priority of your process by using "nice" and specifying a negative adjustment figure to give it higher priority. Note that you will need to do this as superuser (either as root, or using 'sudo') to be able to specify negative values.
edit: Forgot to add - this is a good tutorial on pthreads on linux. Also describes the use of mutexes.

I didn't quite understand how hard is your 60 packets / sec requirement. Does a burst of 60 packets per second fill the requirement? Or is a sharp 1/60 second interval between each packet required?
This might go a bit out of topic, but another important issue is how you configure the Linux box. I would myself use a real-time Linux kernel and disable all unneeded services. Other wise there is a real risk that your application misses a packet at some time, regardless of what architecture you choose.
Any way, two threads should work well.

I posted this answer to illustrate a quite different approach to the "obvious" one, in the hope that someone discovers it to be exactly what they need. I didn't expect it to be selected as the best answer! Treat this solution with caution, because there are potential dangers and concurrency issues...
You can use the setitimer() system call to have a SIGALRM (alarm signal) sent to your program after a specified number of milliseconds. Signals are asynchronous events (a bit like messages) that interrupt the executing program to let a signal handler run.
A set of default signal handlers are installed by the OS when your program begins, but you can install a custom signal handler using sigaction().
So all you need is a single thread; use global variables so that the signal handler can access the necessary information and send off a new packet or repeat the last packet as appropriate.
Here's an example for your benefit:
#include <stdio.h>
#include <signal.h>
#include <sys/time.h>
int ticker = 0;
void timerTick(int dummy)
{
printf("The value of ticker is: %d\n", ticker);
}
int main()
{
int i;
struct sigaction action;
struct itimerval time;
//Here is where we specify the SIGALRM handler
action.sa_handler = &timerTick;
sigemptyset(&action.sa_mask);
action.sa_flags = 0;
//Register the handler for SIGALRM
sigaction(SIGALRM, &action, NULL);
time.it_interval.tv_sec = 1; //Timing interval in seconds
time.it_interval.tv_usec = 000000; //and microseconds
time.it_value.tv_sec = 0; //Initial timer value in seconds
time.it_value.tv_usec = 1; //and microseconds
//Set off the timer
setitimer(ITIMER_REAL, &time, NULL);
//Be busy
while(1)
for(ticker = 0; ticker < 1000; ticker++)
for(i = 0; i < 60000000; i++)
;
}

Two threads would work, you will need to make sure you lock your shared data structure through so the sending thread doesn't see it half way through an update.
60 per second doesn't sound too tricky.
If you are really concerned about scheduling, set the sending thread's scheduling policy to SCHED_FIFO and mlockall() its memory. That way, nothing will be able to stop it sending a packet (they could still go out late though if other things are being sent on the wire at the same time)
There has to be some tolerance of the device - 60 packets per second is fine, but what is the device's tolerance? 20 per second? If the device will fail if it doesn't receive one, I'd send them at three times the rate it requires.

I would stay away from threads and use processes and (maybe) signals and files. Since you say "bad things" may happen if you don't send, you need to avoid lock ups and race conditions. And that is easier to do with separate processes and data saved to files.
Something along the line of one process saving data to a file, then renaming it and starting anew. And the other process picking up the current file and sending its contents once per second.
Unlike Windows, you can copy (move) over the file while it's open.

Follow long-time Unix best practices: keep it simple and modular, decouple the actions, and let the OS do as much work for you as possible.
Many of the answers here are on the right track, but I think they can be even simpler:
Use two separate processes, one to create the data and write it to stdout, and one to read data from stdin and send it. Let the basic I/O libraries handle the data stream buffering between processes, and let the OS deal with the thread management.
Build the basic sender first using a timer loop and a buffer of bogus data and get it sending to the device at the right frequency.
Next make the sender read data from stdin - you can redirect data from a file, e.g. "sender < textdata"
Build the data producer next and pipe its output to the sender, e.g. "producer | sender".
Now you have the ability to create new producers as necessary without messing with the sender side. This answer assumes one-way communication.
Keeping the answer as simple as possible will get you more success, especially if you aren't very fluent in Linux/Unix based systems yet. This is a great opportunity to learn a new system, but don't over-do it. It is easy to jump to complex answers when the tools are available, but why use a bulldozer when a simple trowel is plenty. Mutex, semaphores, shared memory, etc, are all useful and available, but add complexity that you may not really need.

I agree with the the two thread approach. I would also have two static buffers and a shared enum. The sending thread should have this logic.
loop
wait for timer
grab mutex
check enum {0, 1}
send buffer 0 or 1 based on enum
release mutex
end loop
The other thread would have this logic:
loop
check enum
choose buffer 1 or 0 based on enum (opposite of other thread)
generate data
grab mutex
flip enum
release mutex
end loop
This way the sender always has a valid buffer for the entire time it is sending data. Only the generator thread can change the buffer pointer and it can only do that if a send is not in progress. Additionally, the enum flip should never take so many cycles as to delay the higher priority sender thread for very long.

Thanks everyone, I will be using everyones advice. I wish I could select more answers than 1!
For those that are curious. I dont have source for the device, its a propietary locked down system. I havent done enough testing to see how picky the 60 packets a second is yet. Thats all their limited docs say is "60 packets a second". Due to the nature of the device though, bursts of packets will be a bad thing. I think I will be able to get away with sending more than 60 a second to make up for the occasional missed packets..

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js