Kill Bash copy child process to simulate crash

Kill Bash copy child process to simulate crash - unit-testing

I'm trying to test a Bash script which copies files individually and does some stuff to each file. It is meant to be resumable, so I'd like to make sure to test this properly. What is an elegant solution to kill or otherwise abort the script which does the copying from the test script, making sure it does not have time to copy and process all the files?
I have the PID of the child process, I can change the source code of both scripts, and I can create arbitrarily large files to test on.
Clarification: I start the script in the background with &, get the PID as $!, then I have a loop which checks that there is at least one file in the target directory (the test script copies three files). At that point I run kill -9 $PID, but the process is not interrupted - The files are copied successfully. This happens even if the files are big enough that creating them (with dd and /dev/urandom) takes a couple seconds.
Could it be that the files are only visible to the shell when cp has finished? It would be a bit strange, but it would explain why the kill command is too late.
Also, the idea is not to test resuming the same process, but cutting off the first process (simulate a system crash) and resuming with another invocation.

Send a KILL signal to the child process:
kill -KILL $childpid

You can try an play the timing game by using large files and sleeps. You may have an issue with the repeatability of the test.
You can add throttling code to the script your testing and then just throttle it all the way down. You can do throttling code by passing in a value which is:
a sleep value for sleeping in the loop
the number of files to process
the number of seconds after which the script will die
a nice value to execute the script at
Some of these may work better or worse from a testing point of view. nice'ing may get you variable results, as will setting up a background process to kill your script after N seconds. You can also try more than one of these at the same time which may give you the control you want. For example, accepting both a sleep value and the kill seconds could give you fine grained throttling control.

Related

How to prevent procmail from crashing the platform and make it run one process at a time?

I have the problem that I capture emails and they arrive in masses, the issue is that every time they arrive in masses the platform crashes, the question is how to make it go running the process 1 at a time, is it possible? because currently I filled the entire procmail server where there were multiple processes at once, plus we add the executives who were working and the server died and we had to reboot and delete data from the procmail to get it working again.
Because once we capture the data it is working and making subprocesses.
This is the code:
SHELL = /bin/sh
LOGFILE = /var/log/procmail.log
LOGABSTRACT = "all"
VERBOSE = "on"
:0c
| php /srv/platform/laravel/artisan platform:catchemail >> /var/log/procmail_catchemail.log 2>&1
:0:
/var/log/plaform_catchemail

If by "platform" you mean the PHP script, you can serialize access to it by using a lock file.
:0c:.catchemail.lock
| php /srv/platform/laravel/artisan platform:catchemail >> /var/log/procmail_catchemail.log 2>&1
This means, if the file .catchemail.lock does not exist in your $MAILDIR, go ahead and create it, and hold it for the duration of this recipe.
If it does exist, sleep and try again.
There is a failure scenario if the lock is held for too long; Procmail's default behavior in this case is to bounce the message (i.e. cause the delivering MTA to regard it as undeliverable, and return an error message to the sender). You probably want to avoid that, ideally by telling the MTA to attempt delivery again at a later time. (The precise mechanism will depend on your MTA; but basically, by setting a suitable exit code.) But what's feasible and scalable ultimately depends on how many messages you receive vs how many you can process under this constraint.

log4cpp stops working properly after sometime

I have a log4cpp implementation in a multiple process environment . Logger is configured once during initialization and then is shared among forked processes which server http requests.
During first minute or so , I see the logs rolls perfectly fine at the query per second load( say it runs at 100qps).
After that, the log slows down dramatically. So, I logged pid as well and notice that only one process gets to write to the log for a time duration ( around 10-15 seconds) and then another process starts writing and so on so forth . Processes don't die. They just don't get a chance to write.
This is different from what happens when the server starts . At that time, every other log line is written by a different process. ( Also, I write one-log-line per process at the end of serving the request. )
At this point, I can't think of what could be going wrong.
This is how my log4cpp conf file looks
log4cpp.rootCategory=DEBUG,rootAppender
log4cpp.appender.rootAppender=org.apache.log4cpp.RollingFileAppender
log4cpp.appender.rootAppender.fileName=/tmp/mylogfile.log
log4cpp.appender.rootAppender.layout=org.apache.log4cpp.PatternLayout
log4cpp.appender.rootAppender.layout.ConversionPattern=%d|%p|%m%n
log4cpp.category.http.server.main=INFO,MAIN
log4cpp.additivity.http.server.main=false
log4cpp.appender.MAIN=org.apache.log4cpp.RollingFileAppender
log4cpp.appender.MAIN.maxBackupIndex=10
log4cpp.appender.MAIN.maxFileAge=1
log4cpp.appender.MAIN.append=true
log4cpp.appender.MAIN.fileName=/tmp/mylogfile.log
log4cpp.appender.MAIN.layout=org.apache.log4cpp.PatternLayout
log4cpp.appender.MAIN.layout.ConversionPattern=%d|%p|%m%n
Edit: more updates : Thanks #Botje for your time.
I see that whenever a new child process is created , it is only that process that gets to write to the log. That tells me that all the reference other processes were holding become invalid.
I also tried setting additive property to true. With that , server starts properly writing into the /tmp/myfile.log and then switches to writing into /tmp/myfile.log.1 withing a minute . And then stops writing after a minute.
At that point logs gets directed to stderr which is directed to another log file.
Also,

I did notice that the log4cpp FileAppender uses seek to determine the file size before writing log entries. If the file handle is shared between processes that will cause writes to end up at the start of the file instead of the end. Even if you fix that, you still have multiple processes that think they are in charge of log file rotation.
I suggest you have all processes write to a common udp/tcp/Unix socket and designate one process that collects all log entries and actually writes it to a file. You don't have to reinvent the wheel, you can use the syslog protocol and either the system syslog or a copy running in userspace.

C++ pause/resume system on large operation

I have a C++ program that loads a file with few millions lines and starts processing, the same operation was done by a php script, but in order to reduce the execution time I switched to C++.
In the old script, I checked whether there is a file with the current operation id in a "pause" folder, the file is empty It is just to check if a pause is requested, the script then checks after each 5 iterations if there is such file, if so It stuck on an empty loop until the file is deleted (a.k.a resume) :
foreach($lines as $line)
{
$isFinished = $index >= $countData - 1;
if($index % 5 == 0)
{
do
{
$isPaused = file_exists("/home/pauses/".$content->{'drop-id'});
}while($isPaused);
}
// Starts processing the line here
}
But since disk accessing is relatively slow, I don't want to follow the same approach, so I was thinking of some sort of commands that simulates this :
$ kill cpp_program // C++ program returns the last index checked e.g: 37710
$ ./main 37710
$ // cpp_program escapes the first 37709 lines and continues its job
What do you think of this approach ? Is-it feasible ? Is-it non time-consuming ? Is there any better approach ?
Thank you
Edit : A clarification because this seems a little ambiguous, this task runs in the background, there is another application which starts this one, I want to be able to send command from the management app (through Linux commands) to the background task to pause/resume.

Jumping to the 37710 line of a text file sadly requires reading all 37710 lines before it on most operating systems.
On most operating systems, text files are binary files with a convention about newlines. But the OS doesn't cache where the newlines are.
So to find the newlines, you have to read every byte.
If your program saved the byte offset of the file it had reached, it could seek to that location, however.
You can save the state of your program to some config file as you are shutting down, and set it to resume by default when it starts up again. This will require catching the signal you use to shut down, making your main logic notice the signal flag being set, and then cleanly shutting down. It is a very C-esque operation.
Now, a different traditional way to make a program controllable remotely is to have it listen on a TCP port (and/or stdin) and take command line commands there.
To go that way, you'd write a REPL component, then hook that up to whatever input and output.
Either you'd do the REPL in a coroutine like way between processing files, or you'd spawn a separate thread to do REPL and have it communicate asynchronously with the processing thread.
However, this could be beyond your skill. Each step of this (writing a REPL system, having it not block the main work, responding to commands, then attaching it to a TCP port) would take some effort and learning on your part.

C++: How to set a timeout (not reading input, not threaded)?

Got a large C++ function in Linux that calls a whole lot of other functions, making up an algorithm. At various points given certain bad inputs, the algorithm can get "stuck" and go on forever. Adding a timeout seems appropriate as all potential "stuck" points cannot be predicted. But despite scouring the Internet for timeout examples I've only found how to apply timeouts when either the thing your timing is a separate thread or it's reading inputs. My code is a single thread and does not modify file descriptors, so not coming up with any luck. Do I basically have no choice but to thread it?

I am not sure about the situation, actually server applications or embedded applications often run for years in background without stopping. I think one option is to let your program run in background and log to a file(or screen) timely, and, if you really want to stop the program after certain time, you can use timeout command or a script to kill your program after that time, say, timeout 15s your-prog.

C++ executing a bash script which terminates and restarts the current process

So here is the situation, we have a C++ datafeed client program which we run ~30 instances of with different parameters, and there are 3 scripts written to run/stop them: start.sh stop.sh and restart.sh (which runs stop.sh and then start.sh).
When there is a high volume of data the client "falls behind" real time. We test this by comparing the system time to the most recent data entry times listed. If any of the clients falls behind more than 10 minutes or so, I want to call the restart script to start all the binaries fresh so our data is as close to real time as possible.
Normally I call a script using System(script.sh), however the restart script looks up and kills the process using kill, BUT calling System() also makes the current program execution ignore SIGQUIT and SIGINT until system() returns.
On top of this if there are two concurrent executions with the same arguments they will conflict and the program will hang (this stems from establishing database connections), so I can not start the new instance until the old one is killed and I can not kill the current one if it ignores SIGQUIT.
Is there any way around this? The current state of the binary and missing some data does not matter at all if it has reached the threshold, I also can not just have the program restart itself, since if one of the instances falls behind, we want to restart all 30 of the instances (so gaps in the data are at uniform times). Is there a clean way to call a script from within C++ which hands over control and allows the script to restart the program from scratch?
FYI we are running on CentOS 6.3

Use exec() instead of system(). It will replace your process with the new one. Note there is a significant different in how exec() is called and how it behaves: system() passes its string argument to the system shell to run. exec() actually executes an executable file, and you need to supply the arguments to the process one at a time, instead of letting the shell parse them apart for you.

Here's my two cents.
Temporary solution: Use SIGKILL.
Long-term solution: Optimize your code or the general logic of your service tree, using other system calls like exec or by rewritting it to use threads.
If you want better answers maybe you should post some code and or degeneralize the issue.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js