How to prevent procmail from crashing the platform and make it run one process at a time?

I capture incoming emails, and they arrive in large batches. Every time a batch arrives, the platform crashes. Is it possible to make procmail run the process one at a time? Most recently the procmail server filled up with many processes running at once; on top of that, the executives were working on it at the same time, and the server died. We had to reboot and delete data from procmail to get it working again.
The cause is that once a message is captured, the script keeps working and spawns subprocesses.
This is the code:
SHELL = /bin/sh
LOGFILE = /var/log/procmail.log
LOGABSTRACT = "all"
VERBOSE = "on"
:0c
| php /srv/platform/laravel/artisan platform:catchemail >> /var/log/procmail_catchemail.log 2>&1
:0:
/var/log/plaform_catchemail

If by "platform" you mean the PHP script, you can serialize access to it by using a lock file.
:0c:.catchemail.lock
| php /srv/platform/laravel/artisan platform:catchemail >> /var/log/procmail_catchemail.log 2>&1
This means, if the file .catchemail.lock does not exist in your $MAILDIR, go ahead and create it, and hold it for the duration of this recipe.
If it does exist, sleep and try again.
There is a failure scenario if the lock is held for too long; Procmail's default behavior in this case is to bounce the message (i.e. cause the delivering MTA to regard it as undeliverable, and return an error message to the sender). You probably want to avoid that, ideally by telling the MTA to attempt delivery again at a later time. (The precise mechanism will depend on your MTA; but basically, by setting a suitable exit code.) But what's feasible and scalable ultimately depends on how many messages you receive vs how many you can process under this constraint.
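As an illustration of the "hold a lock, and fail softly if it takes too long" idea, here is a minimal Python sketch of a wrapper that could sit between Procmail and the handler. It is not Procmail's own lock-file mechanism: it swaps in an flock()-based lock, and the name run_serialized, the lock path, and the timeout are all assumptions made for the example. Exit code 75 is EX_TEMPFAIL from sysexits.h, which many MTAs treat as "retry later"; check how yours handles it.

```python
import fcntl
import subprocess
import sys
import time

EX_TEMPFAIL = 75  # sysexits.h: temporary failure, ask the caller to retry


def run_serialized(lock_path, command, timeout=60.0, poll=0.1):
    """Run command while holding an exclusive lock and return its exit code.

    Returns EX_TEMPFAIL if the lock is still busy after `timeout` seconds.
    The command inherits stdin, so a piped message reaches it unchanged.
    """
    deadline = time.monotonic() + timeout
    with open(lock_path, "a") as lock_file:
        while True:
            try:
                # Non-blocking attempt, so we can enforce our own timeout.
                fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
                break
            except BlockingIOError:
                if time.monotonic() >= deadline:
                    return EX_TEMPFAIL
                time.sleep(poll)
        try:
            return subprocess.run(command).returncode
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

A wrapper script would end with something like sys.exit(run_serialized("/tmp/catchemail.lock", ["php", "/srv/platform/laravel/artisan", "platform:catchemail"])), where the lock path is hypothetical.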

Related

log4cpp stops working properly after sometime

I have a log4cpp implementation in a multi-process environment. The logger is configured once during initialization and is then shared among forked processes which serve HTTP requests.
For the first minute or so, the log rolls along fine at the query load (say, 100 qps).
After that, logging slows down dramatically. I logged the PID as well and noticed that only one process gets to write to the log for a stretch (around 10-15 seconds), then another process starts writing, and so on. The processes don't die; they just don't get a chance to write.
This is different from what happens when the server starts. At that time, every other log line is written by a different process. (I write one log line per process at the end of serving each request.)
At this point, I can't think of what could be going wrong.
This is how my log4cpp conf file looks:
log4cpp.rootCategory=DEBUG,rootAppender
log4cpp.appender.rootAppender=org.apache.log4cpp.RollingFileAppender
log4cpp.appender.rootAppender.fileName=/tmp/mylogfile.log
log4cpp.appender.rootAppender.layout=org.apache.log4cpp.PatternLayout
log4cpp.appender.rootAppender.layout.ConversionPattern=%d|%p|%m%n
log4cpp.category.http.server.main=INFO,MAIN
log4cpp.additivity.http.server.main=false
log4cpp.appender.MAIN=org.apache.log4cpp.RollingFileAppender
log4cpp.appender.MAIN.maxBackupIndex=10
log4cpp.appender.MAIN.maxFileAge=1
log4cpp.appender.MAIN.append=true
log4cpp.appender.MAIN.fileName=/tmp/mylogfile.log
log4cpp.appender.MAIN.layout=org.apache.log4cpp.PatternLayout
log4cpp.appender.MAIN.layout.ConversionPattern=%d|%p|%m%n
Edit: more updates. Thanks @Botje for your time.
I see that whenever a new child process is created, only that process gets to write to the log. That tells me that the references all the other processes were holding become invalid.
I also tried setting the additivity property to true. With that, the server starts out writing properly to /tmp/myfile.log, switches to writing to /tmp/myfile.log.1 within a minute, and then stops writing after a minute.
At that point the logs get directed to stderr, which is redirected to another log file.
Also,
I did notice that the log4cpp FileAppender uses seek to determine the file size before writing log entries. If the file handle is shared between processes, that will cause writes to end up at the start of the file instead of the end. Even if you fix that, you still have multiple processes that each think they are in charge of log file rotation.
I suggest you have all processes write to a common UDP/TCP/Unix socket and designate one process that collects all log entries and actually writes them to a file. You don't have to reinvent the wheel: you can use the syslog protocol and either the system syslog or a copy running in userspace.
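To make the "one writer, many senders" layout concrete, here is a rough Python sketch using plain UDP datagrams on the loopback interface. It is only an illustration of the shape, not log4cpp code and not the syslog protocol; the function names and the choice of UDP are assumptions for the example.

```python
import socket


def open_collector(host="127.0.0.1", port=0):
    """Bind the single collector socket; port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def collect(sock, log_path, max_messages):
    """Receive max_messages datagrams and append each one as a log line.

    Only this process ever opens the log file, so there is no shared
    file offset and only one party that could rotate the file.
    """
    with open(log_path, "a", encoding="utf-8") as log_file:
        for _ in range(max_messages):
            data, _sender = sock.recvfrom(65535)
            log_file.write(data.decode("utf-8", "replace") + "\n")
            log_file.flush()


def send_log_line(address, line):
    """Called by each worker process instead of writing to the file."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), address)
```

A real collector would loop forever rather than stop after max_messages; the bound is there only to keep the sketch easy to exercise.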

C++ executing a bash script which terminates and restarts the current process

So here is the situation, we have a C++ datafeed client program which we run ~30 instances of with different parameters, and there are 3 scripts written to run/stop them: start.sh stop.sh and restart.sh (which runs stop.sh and then start.sh).
When there is a high volume of data the client "falls behind" real time. We test this by comparing the system time to the most recent data entry times listed. If any of the clients falls behind more than 10 minutes or so, I want to call the restart script to start all the binaries fresh so our data is as close to real time as possible.
Normally I call a script using system("script.sh"). However, the restart script looks up and kills the process using kill, and calling system() makes the calling program ignore SIGINT and SIGQUIT until system() returns.
On top of this, if there are two concurrent executions with the same arguments they will conflict and the program will hang (this stems from establishing database connections), so I cannot start the new instance until the old one is killed, and I cannot kill the current one if it ignores SIGQUIT.
Is there any way around this? The current state of the binary does not matter at all once it has reached the threshold, and neither does missing some data. I also cannot just have the program restart itself, since if one of the instances falls behind, we want to restart all 30 of the instances (so gaps in the data are at uniform times). Is there a clean way to call a script from within C++ which hands over control and allows the script to restart the program from scratch?
FYI we are running on CentOS 6.3
Use exec() (that is, one of the exec family: execl, execv, execvp and so on) instead of system(). It will replace your process with the new one. Note there is a significant difference in how exec() is called and how it behaves: system() passes its string argument to the system shell to run. exec() actually executes an executable file, and you need to supply the arguments to the process one at a time, instead of letting the shell parse them apart for you.
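The difference in calling convention is easiest to see in a small example. The sketch below uses Python's os.execv, which is a thin wrapper over the same system call that C++ would reach through execv(); the helper name replace_process_with is made up for the illustration.

```python
import os


def replace_process_with(executable, args):
    """Replace the current process image; does not return on success.

    Unlike system(), no shell is involved: each argument is passed as its
    own list element and nothing is word-split or glob-expanded.
    """
    # argv[0] is conventionally the program name itself.
    os.execv(executable, [executable, *args])
```

For the question's case this might be called as replace_process_with("/bin/sh", ["restart.sh"]), with the script path adjusted to wherever restart.sh actually lives. Code after the call only runs if the exec fails, in which case os.execv raises OSError.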
Here's my two cents.
Temporary solution: Use SIGKILL.
Long-term solution: Optimize your code or the general logic of your service tree, using other system calls like exec or by rewriting it to use threads.
If you want better answers, maybe you should post some code and/or make the question more specific.

Request for suggestions on doing IPC/event capture

I have a simple Python server script which forks off multiple instances (say N) of a C++ program. The C++ program generates some events that need to be captured.
The events are currently being captured in a log file (one log file per forked process). In addition, I need to periodically (every T minutes) get the rate at which the events are being produced across all child processes, reported either to the Python server or to some other program listening for these events (still not sure). Based on the rate of these events, some "re-action" may be taken by the server (say, reducing the number of forked instances).
Some pointers I have briefly looked at:
grep log files - go through the running process log files (.running), filter those entries generated in the last T minutes, analyse the data and report
socket ipc - add code to c++ program to send the events to some server program which analyses the data after T minutes, reports and starts all over again
redis/memcache (not sure completely) - add code to c++ program to use some distributed store to capture all the generated data, analyses the data after T minutes, reports and starts all over again
Please let me know your suggestions.
Thanks
If time is not of the essence (T minutes sounds long compared to whatever events are happening in the C++ programs that are kicked off), then don't make things any more complicated than they need to be. Forget IPC (sockets, shared memory, etc.): just have each C++ program log what you need to know about time/performance, and let the Python script check the logs every T minutes, when you need the data. Don't waste time overcomplicating something that you can do in a simple manner.
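A log-scanning pass of that kind can be very small. The sketch below assumes, purely for illustration, that every event line starts with an ISO-8601 timestamp followed by a space; the real log format is not given in the question, so the parsing would need adjusting.

```python
from datetime import datetime, timedelta


def count_recent_events(lines, now, window):
    """Count lines whose leading timestamp lies within [now - window, now]."""
    cutoff = now - window
    count = 0
    for line in lines:
        stamp, _, _rest = line.partition(" ")
        try:
            when = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # not an event line in the assumed format
        if cutoff <= when <= now:
            count += 1
    return count
```

The server would call this once per log file every T minutes, for example with window=timedelta(minutes=T), and sum the results to get the overall rate.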
As an alternative to your socket IPC suggestion, how about 0mq (ZeroMQ)? It's a library (in C, with Python bindings available) that can do message transfer on an inter-thread, inter-process or inter-machine level. Pretty simple to get going, and pretty quick.
I'm not affiliated with it. I'm just evaluating it for other uses and thought it might be a fit for you as well.

Kill Bash copy child process to simulate crash

I'm trying to test a Bash script which copies files individually and does some stuff to each file. It is meant to be resumable, so I'd like to make sure to test this properly. What is an elegant solution to kill or otherwise abort the script which does the copying from the test script, making sure it does not have time to copy and process all the files?
I have the PID of the child process, I can change the source code of both scripts, and I can create arbitrarily large files to test on.
Clarification: I start the script in the background with &, get the PID as $!, then I have a loop which checks that there is at least one file in the target directory (the test script copies three files). At that point I run kill -9 $PID, but the process is not interrupted - The files are copied successfully. This happens even if the files are big enough that creating them (with dd and /dev/urandom) takes a couple seconds.
Could it be that the files are only visible to the shell when cp has finished? It would be a bit strange, but it would explain why the kill command is too late.
Also, the idea is not to test resuming the same process, but cutting off the first process (simulate a system crash) and resuming with another invocation.
Send a KILL signal to the child process:
kill -KILL $childpid
You can try to play the timing game by using large files and sleeps, but you may have an issue with the repeatability of the test.
You can add throttling code to the script you're testing and then just throttle it all the way down. You can implement throttling by passing in a value which is:
a sleep value for sleeping in the loop
the number of files to process
the number of seconds after which the script will die
a nice value to execute the script at
Some of these may work better or worse from a testing point of view. nice'ing may get you variable results, as will setting up a background process to kill your script after N seconds. You can also try more than one of these at the same time which may give you the control you want. For example, accepting both a sleep value and the kill seconds could give you fine grained throttling control.
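The question's script is Bash, but the two most deterministic knobs from the list (a per-file sleep and a file-count limit) are easy to show in a few lines of Python. This is a sketch of the idea only; copy_throttled and its parameters are invented for the example, and the same two options would be added to the Bash loop in practice.

```python
import shutil
import time
from pathlib import Path


def copy_throttled(src_dir, dst_dir, delay=0.0, max_files=None):
    """Copy files one at a time and return the names that were copied.

    delay     -- seconds to sleep after each file (widens the kill window)
    max_files -- stop after this many files (a repeatable "crash" point)
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(p for p in src.iterdir() if p.is_file()):
        if max_files is not None and len(copied) >= max_files:
            break  # simulated crash: stop before touching the next file
        shutil.copy2(path, dst / path.name)
        copied.append(path.name)
        time.sleep(delay)
    return copied
```

With max_files=1 the run always stops after exactly one file, so the resume logic can be tested without depending on when a kill signal happens to land.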

How to check if an application is in waiting

I have two applications running on my machine. One is supposed to hand in the work and the other is supposed to do the work. How can I make sure that the first application/process is in a wait state? I can check the resources it's consuming, but that does not guarantee anything. What tools should I use?
Your two applications should communicate. There are a lot of ways to do that:
Send messages through sockets. This way the 2 processes can run on different machines if you use normal network sockets instead of local ones.
If you are using C you can use semaphores with semget/semop/semctl. There should be interfaces for that in other languages.
Named pipes block until there is both a read and a write operation in progress. You can use that for synchronisation.
Signals are also good for this. In C you send one with kill() and handle it with sigaction(). (sendmsg/recvmsg, by contrast, are socket calls.)
D-Bus can also be used and has bindings for various languages.
Update: If you can't modify the processing application then it is harder. You have to rely on some signs that indicate the progress. (I am assuming your processing application reads a file, does some processing, then writes the result to an output file.) Do you know the final size the result should be? If so, you need to check the size repeatedly (or whenever it changes).
If you don't know the size but you know how the processing works, you may be able to use that. For example, the processing is done when the output file is closed. You can use strace to see all the system calls, including the close. You can also interpose your own close() function through the LD_PRELOAD environment variable (on Windows you have to replace DLLs). This way you can, in effect, modify the processing program without recompiling it or even having access to its source.
You can use named pipes: the first app will read from the pipe, but it will be empty, so the app will keep waiting (blocked). The second app will write into it when it wants the first one to continue.
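A minimal sketch of that handshake in Python, assuming a Unix-like system where os.mkfifo is available. The function names and the one-line message format are made up for the example.

```python
import os


def make_fifo(path):
    """Create the named pipe if it does not exist yet."""
    if not os.path.exists(path):
        os.mkfifo(path)


def wait_for_signal(fifo_path):
    """Block until the other side writes a line, then return it."""
    # Opening a FIFO for reading blocks until some process opens it
    # for writing, so this call alone is enough to "wait".
    with open(fifo_path, "r") as fifo:
        return fifo.readline().strip()


def send_signal(fifo_path, message):
    """Wake up the waiting side by writing one line to the FIFO."""
    # Opening for writing blocks until a reader has the FIFO open.
    with open(fifo_path, "w") as fifo:
        fifo.write(message + "\n")
```

The first application calls wait_for_signal() and the second calls send_signal() on the same path; neither needs to poll.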
Nothing can guarantee that your application is in a waiting state. You have to pass it some work and get back a response. This may or may not be transactional: the application can confirm that it got the message before it starts processing it, or after it has processed it (successfully or not). If it is not waiting, passing a piece of work should fail, whether with an error when writing to a TCP/IP socket (or other transport) or with a timeout. This depends on the implementation, the kind of transport you are using, and other requirements.
There is actually a way of figuring out whether the process (thread) is in a blocking state and waiting for data on a socket (or other source), but that means the client has to be on the same computer and have the access privileges required to do that. That makes no sense other than for debugging, which you can do using any debugger anyway.
Overall, the idea of making sure that an application is waiting for data before trying to pass it that data smells bad. Not to mention the race condition: what if you checked and it was OK, but by the time you actually tried to send the data, the application was no longer waiting (even if only microseconds later)?