Request for suggestions on doing IPC/event capture - c++

I have a simple Python server script which forks off multiple instances (say N) of a C++ program. The C++ program generates some events that need to be captured.
The events are currently being captured in a log file (one log file per forked process). In addition, I need to periodically (every T minutes) get the rate at which the events are being produced across all child processes to either the Python server or some other program listening for these events (still not sure which). Based on the rate of these events, some "reaction" may be taken by the server (say, reducing the number of forked instances).
Some pointers I have briefly looked at:
grep log files - go through the running process log files (.running), filter those entries generated in the last T minutes, analyse the data and report
socket ipc - add code to c++ program to send the events to some server program which analyses the data after T minutes, reports and starts all over again
redis/memcache (not sure completely) - add code to c++ program to use some distributed store to capture all the generated data, analyses the data after T minutes, reports and starts all over again
Please let me know your suggestions.
Thanks

If time is not of the essence (T minutes sounds long compared to whatever events are happening in the C++ programs that are kicked off), then don't make things any more complicated than they need to be. Forget IPC (sockets, shared memory, etc.); just have each C++ program log what you need to know about time/performance, and let the Python script check the logs every T minutes when you need the data. Don't waste time overcomplicating something that you can do in a simple manner.
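A rough sketch of that log-polling approach, in Python since the server script is Python. It assumes every log line starts with a Unix epoch timestamp and that the per-process logs end in .running; both are assumptions for illustration, and the function names are made up.

```python
import glob
import time


def count_recent_events(log_paths, window_seconds, now=None):
    """Count log lines whose leading epoch timestamp falls in the last window."""
    now = time.time() if now is None else now
    cutoff = now - window_seconds
    count = 0
    for path in log_paths:
        with open(path) as log:
            for line in log:
                fields = line.split(None, 1)
                if not fields:
                    continue  # skip blank lines
                try:
                    stamp = float(fields[0])
                except ValueError:
                    continue  # skip lines that do not start with a timestamp
                if cutoff <= stamp <= now:
                    count += 1
    return count


def events_per_minute(log_dir, window_minutes, now=None):
    """Aggregate event rate across every '*.running' log in log_dir."""
    paths = glob.glob(f"{log_dir}/*.running")
    return count_recent_events(paths, window_minutes * 60, now) / window_minutes
```

The server would call events_per_minute(log_dir, T) every T minutes and decide from the result whether to change the number of forked instances. Re-reading whole files is wasteful for very large logs; remembering the last read offset per file would be the obvious refinement.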

As an alternative to your socket IPC suggestion, how about 0mq? It's a library (in C, with Python bindings available) that can do message transfer on an inter-thread, inter-process or inter-machine level. Pretty simple to get going, and pretty quick.
I'm not affiliated with it. I'm just evaluating it for other uses and thought it might be a fit for you as well.


Dead-simple POSIX C++ IPC

I am interested in a C++ Linux-only (possibly relaxed to POSIX-only) IPC solution that would behave as follows: a program called 'calculator' is started, and can listen to messages. Calculator would have a loop that periodically checks for message strings and then acts on them based on their content.
Another program called 'send_msg' can send messages to its pid (ideally a hostname/pid, through tcp or udp).
$ calculator &
// awhile later
$ send_msg <calculator pid> show calculations
Calc1: 52% complete
Calc2: 21% complete
$ send_msg <calculator pid> alter Calc2 <numeric parameters>
Ok! I'm restarting my calculations!
$
I am very versed in c++, but know nothing about network programming and am not interested in spending much time right now to learn it. Is there an easy-to-use c++ package that does the above? I would rather not have to choose things like port numbers, file locations, etc.
I think you may like zeromq (also written 0mq), or the forked crossroadsio, as they abstract away a lot of the handholding, allowing you to simply pub/sub, as well as use many other patterns. 0mq has (had?) a bunch of examples starting with simple ping-pong.
The setup you are requesting is anything but simple, I think.
You'd probably do best with either a Unix-domain socket or a TCP socket (port number) for communication between the background calculator and the front end. So, for example, you might run:
calculator -p 3456 &
The calculator is then listening on port 3456. Your send_msg program can then be used to make the calculator do things:
send_msg -p 3456 show calculations
When the calculator receives the message, it acts according to the orders, sending the answer back to the send_msg program on the socket, which then echoes it to its standard output.
Meanwhile, you have a calculator that may need to be multi-threaded. It also needs to be able to determine how much work is involved in each calculation, so that it can report on the progress of each calculation. Neither you nor I have specified how the calculation is set up, but it might be:
send_msg -p 3456 new calc.file
to indicate that the calculator should start a new calculation, reading the problem from the file calc.file. It might echo back:
Calc1: ETC = 3:15
where, by some more or less devious means, it has determined that the Estimated Time to Completion (ETC) is 3 minutes, 15 seconds. You can set up the second calculation in a similar way. To handle this, you need a controller thread that is listening for connections from send_msg. When it gets told to create a new job, it starts a new thread (or process) to do the calculation. There has to be some agreed mechanism between the master thread in the calculator and the actual calculating threads. This might be as simple as a location where each thread writes its progress and the master reads. But the calculation threads need to keep track of how much work they've done, how much there is left to do, and whether the estimates need to be changed.
Now, I might be making things too complicated, but the interface you showed suggests that something similar to that might be necessary. If you single-thread the calculator, it has to do some sort of round-robin scheduling of its work on each calculation you set it, as well as periodically checking to see whether the send_msg program has sent a new message.
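To make the single-threaded variant concrete, here is a minimal sketch in Python (the same socket and select calls exist in the POSIX C API). The Calculator class, the fake "1% per step" work and the send_msg function are all hypothetical; only the "show calculations" command and the output format come from the question.

```python
import select
import socket


def send_msg(port, text):
    """Hypothetical send_msg client: send one command line, return the reply."""
    with socket.create_connection(("127.0.0.1", port)) as conn:
        conn.sendall(text.encode() + b"\n")
        with conn.makefile() as reader:
            return reader.read()  # read until the calculator closes the socket


class Calculator:
    def __init__(self, port=0):
        # Port 0 asks the OS for any free port; a fixed one such as 3456 also works.
        self.listener = socket.create_server(("127.0.0.1", port))
        self.port = self.listener.getsockname()[1]
        self.progress = {"Calc1": 0, "Calc2": 0}  # percent complete per job

    def step(self):
        """Do one slice of (fake) work on every job, round-robin."""
        for name in self.progress:
            self.progress[name] = min(100, self.progress[name] + 1)

    def poll_messages(self):
        """Answer one pending client, if any, without waiting for one."""
        ready, _, _ = select.select([self.listener], [], [], 0)  # timeout 0: just poll
        if not ready:
            return
        conn, _ = self.listener.accept()
        with conn, conn.makefile() as reader:
            # Note: a client that connects but never sends a line would stall us here.
            command = reader.readline().strip()
            if command == "show calculations":
                reply = "".join(
                    f"{name}: {pct}% complete\n" for name, pct in self.progress.items()
                )
            else:
                reply = "unknown command\n"
            conn.sendall(reply.encode())
```

The main loop is then just "calc.step(); calc.poll_messages()" repeated, which is the round-robin plus periodic check described above. A Unix-domain socket would work the same way with a filesystem path instead of a port number.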
Have a look at RCF - it's native C++ and has publish/subscribe support which should make this pretty easy.

How to smoothly restart a C++ program without shutting down the running program?

I have a server program which should run around the clock. If I want to change some of its parameters, is there any way other than shutting it down and restarting it?
There are quite a few ways of doing this, including, but almost certainly not limited to:
(1) You can maintain the parameters in a separate file so that the program will periodically check that file and update its internal information.
(2) Similar to (1) but you can send some sort of signal to the application to get it to immediately re-read the file.
(3) You can do either (1) or (2) but using shared memory rather than a configuration file.
(4) You can have your program sit at the server end of an IPC conversation, so that a client can open up a connection to it to provide new parameters. Anything from a simple message queue to a full-blown HTTP server and associated pages.
Of course, all of these tend to need a fair amount of work in your program to get it to look for the new information.
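As an illustration of the first two options, here is a small sketch in Python; the same idea in C++ would use stat() and sigaction(). The JSON file format, the ReloadableConfig class and the function names are assumptions for the example, not something from the question.

```python
import json
import os
import signal


class ReloadableConfig:
    def __init__(self, path):
        self.path = path
        self.mtime = None
        self.values = {}
        self.reload_requested = True  # force the initial load

    def request_reload(self, signum=None, frame=None):
        """Signal handler: only set a flag; the main loop does the real work."""
        self.reload_requested = True

    def refresh(self):
        """Re-read the file if it changed or a reload was requested.

        Returns True when the parameters were (re)loaded.
        """
        mtime = os.stat(self.path).st_mtime_ns
        if not self.reload_requested and mtime == self.mtime:
            return False
        with open(self.path) as config_file:
            self.values = json.load(config_file)
        self.mtime = mtime
        self.reload_requested = False
        return True


def install_sighup_reload(config):
    """Make 'kill -HUP <pid>' request an immediate re-read (POSIX only)."""
    signal.signal(signal.SIGHUP, config.request_reload)
```

The server's main loop would call config.refresh() once per iteration (or every few seconds); the modification-time check covers the periodic-check option and the SIGHUP flag covers the signal option.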
You should take that into account when making your decision. By far the quickest solution to implement is to just (cleanly) kill off the process at something like 11:55pm then immediately restart it. It's simpler because your code probably already has the ability to load the information on startup, so this could be a simple cron one-liner.
Some people speak of laziness as a bad thing but that's not always the case :-)
If the server maintains many live connections from clients, restarting the server process is the last thing you should consider. Besides reloading configuration files, inserting a proxy process between the clients and the server can be another way.
The proxy process is responsible for two things:
a. Maintaining the connections from clients and forwarding packets to the server for handling.
b. Judging whether the current server process (Server A) is alive and, if it is not, switching to another server (Server B) automatically.
Then you can change parameters by restarting a server without worrying about interrupting clients, since there are always two (or more) servers running.

How to check if an application is in waiting

I have two applications running on my machine. One is supposed to hand in the work and the other is supposed to do the work. How can I make sure that the first application/process is in a wait state? I can check the resources it's consuming, but that does not guarantee it. What tools should I use?
Your two applications should communicate. There are a lot of ways to do that:
Send messages through sockets. This way the 2 processes can run on different machines if you use normal network sockets instead of local ones.
If you are using C you can use semaphores with semget/semop/semctl. There should be interfaces for that in other languages.
Named pipes block until there is both a read and a write operation in progress. You can use that for synchronisation.
Signals are also good for this. In C they are sent with kill() and handled with sigaction() (sendmsg/recvmsg are socket calls).
DBUS can also be used and has bindings for various languages.
Update: If you can't modify the processing application then it is harder. You have to rely on some signs that indicate the progress. (I am assuming your processing application reads a file, does some processing, then writes the result to an output file.) Do you know the final size the result should be? If so, you need to check the size repeatedly (or whenever it changes).
If you don't know the size but you know how the processing works, you may be able to use that. For example, the processing is done when the output file is closed. You can use strace to see all the system calls, including the close. You can also interpose your own close() function using the LD_PRELOAD environment variable (on Windows you have to replace DLLs). This way you can sort of modify the processing program without recompiling it or even having access to its source.
You can use named pipes: the first app will read from the pipe, but it will be empty and hence the app will keep waiting (blocked). The second app will write into it when it wants the first one to continue.
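A minimal sketch of that named-pipe handshake, in Python for brevity (mkfifo, open, read and write behave the same from C). The function names are made up, and it assumes a POSIX system and that the FIFO has already been created with os.mkfifo.

```python
import os


def wait_for_work(fifo_path):
    """First app: block until the other side writes, then return what it sent."""
    # Opening a FIFO for reading blocks until some process opens it for writing.
    with open(fifo_path) as fifo:
        return fifo.read()  # returns once the writer closes its end


def hand_in_work(fifo_path, text):
    """Second app: wake the waiting reader by writing to the FIFO."""
    # Opening a FIFO for writing blocks until a reader has it open.
    with open(fifo_path, "w") as fifo:
        fifo.write(text)
```

One side creates the pipe once with os.mkfifo(path); after that, each call to wait_for_work(path) sleeps in the kernel without polling until hand_in_work(path, ...) is called by the other process.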
Nothing can guarantee that your application is in a waiting state. You have to pass it some work and get back a response. It might be transactional or not: the application can confirm that it got the message before it starts to process it, or after it has processed it (successfully or not). If it is not waiting, passing it a piece of work should fail, either with an error when writing to a TCP/IP socket (or other transport) or with a timeout. This depends on the implementation, the kind of transport you are using and other requirements.
There is actually a way of figuring out if the process (thread) is in blocking state and waiting for data on a socket (or other source), but that means that client should be on the same computer and have access privileges required to do that, but that makes no sense other than debugging, which you can do using any debugger anyway.
Overall, the idea of making sure that the application is waiting for data before trying to pass it that data smells bad. Not to mention the race condition: what if you checked and it was OK, but when you actually tried to send the data, the application was no longer waiting (even if only microseconds later)?

Interprocess Communication in C++

I have a simple c++ application that generates reports on the back end of my web app (simple LAMP setup). The problem is the back end loads a data file that takes about 1.5GB in memory. This won't scale very well if multiple users are running it simultaneously, so my thought is to split into several programs :
Program A is the main executable that is always running on the server, and always has the data loaded, and can actually run reports.
Program B is spawned from php, and makes a simple request to program A to get the info it needs, and returns the data.
So my questions are these:
What is a good mechanism for B to ask A to do something?
How should it work when A has nothing to do? I don't really want to be polling for tasks or otherwise spinning my tires.
Use a named mutex/event. Basically, this allows one thread (process A in your case) to sit there waiting. Then process B comes along, needing something done, and signals the mutex/event; this wakes up process A, and you proceed.
If you are on Microsoft :
Mutex, Event
Ipc on linux works differently, but has the same capability:
Linux Stuff
Or alternatively, for the c++ portion you can use one of the boost IPC libraries, which are multi-platform. I'm not sure what PHP has available, but it will no doubt have something equivalent.
Use TCP sockets running on localhost.
Make the C++ application a daemon.
The PHP front-end creates a persistent connection to the daemon. pfsockopen
When a request is made, the PHP sends a request to the daemon which then processes and sends it all back. PHP Sockets C++ Sockets
EDIT
Added some links for reference. I might have some really bad C code that uses sockets of interprocess communication somewhere, but nothing handy.
IPC is easy in C++: just call the POSIX C API.
But what you're asking would be much better served by a queue manager. Make the background daemon wait for a message on the queue, and the frontend PHP just add there the specifications of the task it wants processed. Some queue managers allow the result of the task to be added to the same object, or you can define a new queue for the finish messages.
One of the best-known high-performance queue managers is RabbitMQ. Another one that is very easy to use is MemcacheQ.
Or, you could just add a table to MySQL for tasks, and have the background process periodically query it for unfinished ones. This works and can be very reliable (such setups are sometimes called Ghetto queues), but it breaks down at high tasks/second.
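A rough sketch of the table-as-queue idea. To keep it self-contained it uses Python's built-in sqlite3 as a stand-in for MySQL; the table layout, column names and function names are assumptions for illustration.

```python
import sqlite3


def create_queue(db):
    db.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        " id INTEGER PRIMARY KEY,"
        " spec TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending',"
        " result TEXT)"
    )
    db.commit()


def add_task(db, spec):
    """Front end: enqueue a task description and return its id."""
    cur = db.execute("INSERT INTO tasks (spec) VALUES (?)", (spec,))
    db.commit()
    return cur.lastrowid


def claim_next_task(db):
    """Worker: take the oldest pending task, or return None if there is none."""
    row = db.execute(
        "SELECT id, spec FROM tasks WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    # Re-checking the status in the UPDATE means that if another worker
    # claimed the row first, rowcount is 0 and this worker backs off.
    claimed = db.execute(
        "UPDATE tasks SET status = 'running' WHERE id = ? AND status = 'pending'",
        (row[0],),
    ).rowcount
    db.commit()
    return row if claimed else None


def finish_task(db, task_id, result):
    """Worker: store the result where the front end can pick it up."""
    db.execute(
        "UPDATE tasks SET status = 'done', result = ? WHERE id = ?",
        (result, task_id),
    )
    db.commit()
```

The background process calls claim_next_task in a loop and sleeps briefly whenever it returns None. That sleep is the polling cost mentioned above, and it is why this design stops scaling at high task rates.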

Web application background processes, newbie design question

I'm building my first web application after many years of desktop application development (I'm using Django/Python but maybe this is a completely generic question, I'm not sure). So please beware - this may be an ultra-newbie question...
One of my user processes involves heavy processing on the server (i.e. the user inputs something and the server needs ~10 minutes to process it). In a desktop application, what I would do is throw the user input into a queue protected by a mutex, and have a dedicated low-priority background thread blocking on the queue using that mutex.
However in the web application everything seems to be oriented towards synchronization with the HTTP requests.
Assuming I will use the database as my queue, what is best practice architecture for running a background process?
There are two schools of thought on this (at least).
1) Throw the work on a queue and have something else outside your web-stack handle it.
2) Throw the work on a queue and have something else in your web-stack handle it.
In either case, you create work units in a queue somewhere (e.g. a database table) and let some process take care of them.
I typically work with number 1 where I have a dedicated windows service that takes care of these things. You could also do this with SQL jobs or something similar.
The advantage to item 2 is that you can more easily keep all your code in one place--in the web tier. You'd still need something that triggers the execution (e.g. loading the web page that processes work units with a sufficiently high timeout), but that could be easily accomplished with various mechanisms.
Since:
1) This is a common problem,
2) You're new to your platform
-- I suggest that you look in the contributed libraries for your platform to find a solution to handle the task. In addition to queuing and processing the jobs, you'll also want to consider:
1) status communications between the worker and the web-stack. This will enable web pages that show the percentage complete number for the job, assure the human that the job is progressing, etc.
2) How to ensure that the worker process does not die.
3) If a job has an error, will the worker process automatically retry it periodically?
Will you or an operations person be notified if a job fails?
4) As the number of jobs increases, can additional workers be added to gain parallelism?
Or, even better, can workers be added on other servers?
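To make points 1) and 3) a little more concrete, here is a small sketch of a worker-side wrapper that reports status and retries a failed job. Everything in it is hypothetical: it assumes a job is written as a generator that yields its percent complete, and on_status stands in for whatever writes the status where the web-stack can read it (for example a database row).

```python
def run_with_retries(job, max_attempts=3, on_status=print):
    """Run job() until it succeeds or max_attempts is used up.

    job is a callable returning an iterator of percent-complete values.
    Returns True on success and False if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            for percent in job():
                on_status({"state": "running", "percent": percent, "attempt": attempt})
        except Exception as exc:
            # Record the failure, then fall through to the next attempt.
            on_status({"state": "error", "error": str(exc), "attempt": attempt})
            continue
        on_status({"state": "done", "percent": 100, "attempt": attempt})
        return True
    # This final status is the natural place to notify an operations person.
    on_status({"state": "failed", "attempts": max_attempts})
    return False
```

A real worker would probably wait between attempts rather than retrying immediately, and existing job libraries for your platform usually handle this for you.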
If you can't find a good solution in Django/Python, you can also consider porting a solution from another platform to yours. I use delayed_job for Ruby on Rails. The worker process is managed by runit.
Regards,
Larry
Speaking generally, I'd look at running background processes on a different server, especially if your web server has any kind of load.
Running long processes in Django: http://iraniweb.com/blog/?p=56