I'm reading Akka Concurrency and I'm a little confused.
Derek states that if I don't manage the children's restart during the parent's restart, there is no escape for the children: they will either die or have their state completely wiped out.
The question is: what exactly happens to the children if I don't restart them during the parent's restart?
One can do that by overriding the preRestart and postRestart methods: simply skip stopping the children in the first and skip starting them in the second.
If you want to stop child actors from being terminated when their parent is restarted, simply override the preRestart method without delegating to super.preRestart: the logic that terminates the children lives in the default implementation of that method in Actor.
This is explained in detail in the Akka docs section on "Supervision - What restarting means".
I have a systemd service which runs and does its thing. Periodically I need it to upgrade itself, which requires a shutdown and a restart of the service. For the purposes of this question, the upgrade script can be as simple as:
echo "Stopping service..."
systemctl stop myservice
echo "Doing some stuff..."
sleep 10s
echo "Starting service..."
systemctl start myservice
I want to call this within the service itself, preferably using boost::process:
boost::process::child instexe{
boost::process::search_path("bash"),
std::vector<std::string>{"installerscript.sh"},
boost::process::start_dir("/installer/folder"),
boost::process::std_out > "/some/log/file.txt"
};
instexe.detach();
The problem is that as soon as the script calls systemctl stop myservice, the installer script is killed.
Is there a way I can do what I want to do with boost::process? Or how can I do it?
If the upgrades happen at predefined times, you could use cron instead.
https://opensource.com/article/17/11/how-use-cron-linux
00 09-17 * * 1-5 /usr/local/bin/installerScript.sh
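The crontab entry above runs the installer script at the top of every hour from 9 am to 5 pm, Monday to Friday, and many other schedules can be configured the same way. Because cron starts the script itself, the script is not part of myservice, so systemctl stop myservice does not kill it.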
If you have the child process killing the parent, there's always going to be a race condition by definition.
The quick hack is to put a sleep statement at the start of the installer script, but the correct solution is to explicitly synchronize with the child:
have the installer script detect whether it's running interactively (i.e., being run manually from a terminal instead of by your service)
if it is non-interactive (your use case), have it wait for some input on stdin
connect the stdin pipe when you create the child
detach the child and then write something to tell the child it's safe to proceed
Other synchronization mechanisms are available: you could use a lockfile or a signal. You just need to make sure the child doesn't do anything until after the parent has detached it.
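The stdin handshake above can be sketched with plain POSIX calls (pipe, fork, dup2, execl) instead of boost::process, to keep it dependency-free. Boost.Process can connect a pipe to the child's stdin in a similar way, by passing an opstream with std_in <. The helper names spawn_gated and release_gate and the inline shell script (a stand-in for installerscript.sh) are hypothetical. This is a sketch under those assumptions, not a drop-in fix.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: run `sh -c script` with its stdin connected to a pipe.
// Returns the child's pid (or -1) and stores the pipe's write end in *gate_fd.
pid_t spawn_gated(const char* script, int* gate_fd) {
    int fds[2];
    if (pipe(fds) != 0) return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        // Child: the pipe's read end becomes stdin, so `[ -t 0 ]` is false
        // and the script can tell it was not started from a terminal.
        dup2(fds[0], STDIN_FILENO);
        close(fds[0]);
        close(fds[1]);
        execl("/bin/sh", "sh", "-c", script, static_cast<char*>(nullptr));
        _exit(127);  // only reached if exec failed
    }
    close(fds[0]);  // parent keeps only the write end
    *gate_fd = fds[1];
    return pid;
}

// Parent: call this only after detaching / cleaning up, to tell the child
// that it is now safe to proceed. Closing the fd would also deliver EOF.
bool release_gate(int gate_fd) {
    const char go = '\n';
    bool ok = write(gate_fd, &go, 1) == 1;
    close(gate_fd);
    return ok;
}

// Hypothetical stand-in for the installer script: wait for the go-ahead when
// non-interactive, then do the real work (here it just exits with status 7).
const char* const kGatedScript =
    "if [ ! -t 0 ]; then read -r go; fi\n"
    "exit 7\n";
```

In the service, release_gate would be called right after instexe.detach(). Only then would the script go on to systemctl stop myservice. This removes the race with detach(), but systemd may still kill the script along with the service, as the next answer explains.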
It turns out (from this question, which leads to the excellent-but-unfindable systemd.kill manpage) that systemd has four different ways of stopping a unit, controlled by the KillMode variable in your unit configuration:
control-group will send SIGTERM (by default, overridable with KillSignal) to every process in the unit's cgroup. That means both parent and child.
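mixed will send SIGTERM (or KillSignal) to your main process only, and the subsequent SIGKILL (sent if anything is still running after the stop timeout) to all remaining processes in the cgroup, including the child.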
process will kill only the main process and leave the child alone
none is not recommended; it will only run your ExecStop procedure and will not kill any process
You can probably just set KillMode=process, but note that if SendSIGKILL or SendSIGHUP are true, those signals will still be delivered to your child after TimeoutStopSec.
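A sketch of that setting, assuming the unit is named myservice.service as in the question. A drop-in file changes KillMode without editing the main unit file. The drop-in directory is the standard location, and the file name killmode.conf is an arbitrary choice.

```ini
# /etc/systemd/system/myservice.service.d/killmode.conf (file name is arbitrary)
[Service]
# On stop, signal only the main process; other processes in the unit's
# cgroup, such as the detached installer script, keep running.
KillMode=process
```

After creating the file, run systemctl daemon-reload so systemd picks it up.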
It might be simpler to restart your service and have a launch script update it at startup, or to perform the update in your ExecStop procedure, than to persuade systemd to leave the child alone until the update is complete. Those approaches also avoid the risk of a hung child updater hanging around forever.
Either way, your remaining problems are exclusively with systemd rather than with Boost.Process.
I have read the Akka docs on fault tolerance & supervision, and I think I totally get them, with one big exception (no pun intended).
Why would you ever want/need to stop a child actor???
The only clue in the docs is:
Closer to the Erlang way is the strategy to just stop children when they fail and then take corrective action in the supervisor...
But to me, stopping a child is the same as saying "don't execute this code any longer", which to me, is effectively the same as deploying new changes to the code which has that actor removed entirely:
Every Actor plays some critical role in the actor system
To simply stop the actor means that actor currently doesn't have a role any longer, and presumes the system can now somehow (magically) work without it
So again, to me, this is no different than refactoring the code to not even have the actor any more, and then deploying those changes
I'm sure I'm just not seeing the forest for the trees on this one, but I just don't see any use case where I'd have this big, complex actor system, where each actor does critical work and then hands it off to the next critical actor, but then I stop an actor and magically the whole system keeps on working perfectly.
In short: stopping an actor (to me) is like ripping the transmission out of a moving vehicle. How can this ever be a good/desirable thing?!?
The essence of the "error kernel" pattern is to delegate risky operations and protect essential state. It is common to spawn child actors for one-off operations; when such an operation has completed and its result has been sent off somewhere else, either the child actor or its parent needs to stop it (otherwise the child actor will remain active and leak).
If the child actor is running a longer process that can safely be terminated, such as video encoding or some kind of file transformation, and you have to deploy a new build, a terminate signal is useful for stopping the running processes gracefully.
Every Actor plays some critical role in the actor system
This is where you are running into trouble. I can create a child actor to do a job, for example to execute a query against a database or to maintain the state of a connected user, and this is its only purpose.
Once the database query is complete, or the user has gracefully disconnected, the child actor no longer has any role to play and should be stopped so that it releases any resources it holds.
To simply stop the actor means that actor currently doesn't have a role any longer, and presumes the system can now somehow (magically) work without it
The system is able to continue because I can create new child actors if/when they are needed.
Is there a way to start the slurmctld daemon with the execution nodes off, but make it believe that it has requested the suspension of these nodes (e.g. as if it had called the SuspendProgram)?
I am setting up a virtual cluster, so the SuspendProgram and ResumeProgram terminate and instantiate virtual machines. This way I could power on only the master node, and it would fire up nodes only when requested.
The problem is that, for the moment, when I start slurmctld I need the nodes to come up, tell it that they exist, and wait for it to shut them down. This adds unwanted costs, because I need to power on all the "supposed" instances.
I would like to instantiate only the master, the one running slurmctld, and let it think that the nodes are idle, as after SuspendProgram.
Cheers
What you can try is to set the nodes to state POWER_DOWN in slurm.conf, so that at startup slurmctld will see those nodes as powered down by SuspendProgram:
NodeName=... Sockets=... CoresPerSocket=... [etc] State=POWER_DOWN
I'm using a List of QProcess objects to keep track of some processes that need to be started/stopped at user-defined intervals.
I'm able to start and stop the processes OK, but the issue arises when I stop a process using the following method (pseudocode):
process->start("PathToProcess", QStringList() << "Some Arguments");
//Do some stuff.
process->terminate();
However, if I try to start the process again at another time, I get the error:
QProcess::start: Process is already running
I can do a ps -ef|grep processName and find that it is indeed dead, but it's sitting in a defunct state which I think is preventing me from starting it again.
What do I need to do to prevent this defunct state, or to remove the defunct process, so I can start my process again without reconstructing the object?
Figured out what was causing the error.
In qprocess_unix.cpp, you'll find a class called QProcessManager. Essentially this class has signal handlers that watch for child processes that have died. When a child dies, the QProcessManager sends a message across a pipe that lets the QProcess class know that it terminated/died.
In an unrelated part of my code, I had set up some signal handlers that I used for various purposes. However, these handlers were catching my SIGCHLD events, so the QProcessManager was never triggered to tell the QProcess, via its pipe, that the process had died.
In my case, my only options are to either watch for the death of the child manually or to remove the signal catching I'm performing in my other sections of code.
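If you do need your own SIGCHLD handler, one option is to chain to whatever handler was installed before yours. Other code, such as the QProcessManager handler described above, then still sees the signal. The poster did not try this. Below is a minimal POSIX sketch. It assumes your handler is installed after the other one, and the names install_chained_sigchld and g_my_sigchld_count are hypothetical.

```cpp
#include <signal.h>

// The SIGCHLD disposition that was in place before ours (e.g. Qt's).
static struct sigaction g_previous_sigchld;
// Our own bookkeeping; only async-signal-safe work belongs in a handler.
static volatile sig_atomic_t g_my_sigchld_count = 0;

static void chained_sigchld(int sig, siginfo_t* info, void* ctx) {
    g_my_sigchld_count = g_my_sigchld_count + 1;

    // Forward the signal so the previously installed handler still runs.
    if (g_previous_sigchld.sa_flags & SA_SIGINFO) {
        if (g_previous_sigchld.sa_sigaction != nullptr) {
            g_previous_sigchld.sa_sigaction(sig, info, ctx);
        }
    } else if (g_previous_sigchld.sa_handler != SIG_DFL &&
               g_previous_sigchld.sa_handler != SIG_IGN) {
        g_previous_sigchld.sa_handler(sig);
    }
    // Do NOT call waitpid(-1, ...) here: that would reap children owned by
    // other code (such as QProcess) and recreate the original problem.
}

// Installs chained_sigchld and remembers the previous SIGCHLD disposition.
bool install_chained_sigchld() {
    struct sigaction sa{};
    sa.sa_sigaction = chained_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO | SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, &g_previous_sigchld) == 0;
}
```

The key design choice is passing a non-null third argument to sigaction(), which is the only way to learn what you are replacing.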
For future reference, if you have this problem, you may be better off making the POSIX kill() calls yourself (for both kill and terminate) and checking their return values manually. If a call succeeds, do:
process->setProcessState(QProcess::NotRunning); // Mark the process as no longer running (setProcessState() is protected, so call it from a QProcess subclass)
waitpid(process->pid(), NULL, 0); // Block until the child has exited, then reap the defunct process (WNOHANG can return before the child is gone)
Thanks all.
Call process->waitForFinished() after calling process->terminate() in order to reap the zombie process. Then you can reuse the process object.
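The same rule applies to any Unix child outside Qt: after signalling it, the parent must wait for it, or it stays defunct. Below is a minimal POSIX sketch of roughly what terminate() followed by waitForFinished() amounts to. terminate_and_reap and demo_terminate_child are hypothetical helpers, not part of Qt.

```cpp
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: send SIGTERM (as QProcess::terminate() does on Unix),
// then block until the child has exited and reap it, so no zombie remains.
// Returns true if the child ended because of that SIGTERM.
bool terminate_and_reap(pid_t pid) {
    if (kill(pid, SIGTERM) != 0) return false;
    int status = 0;
    pid_t r;
    do {
        // Blocking wait: WNOHANG could return before the child has died.
        r = waitpid(pid, &status, 0);
    } while (r == -1 && errno == EINTR);
    return r == pid && WIFSIGNALED(status) && WTERMSIG(status) == SIGTERM;
}

// Demo: start a child that just waits for a signal, then stop it cleanly.
bool demo_terminate_child() {
    pid_t child = fork();
    if (child < 0) return false;
    if (child == 0) {
        pause();  // child: sleep until a signal arrives
        _exit(0);
    }
    return terminate_and_reap(child);
}
```

After terminate_and_reap returns, the pid is no longer a child of the caller, which is the state QProcess needs before start() can be called again.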
I will jump right in, to be brief and descriptive:
C++, Windows API
I am creating child processes using CreateProcess to run external (command-line) applications. I have built in a time-out, and if the child process has not finished normally by then, I wish to force termination on that child process.
Ideally, I would like for that child process to act the same as if it had called ExitProcess, or as if a Ctrl+C was sent to its console (which calls ExitProcess from the default console control handler).
My solution so far has been the use of TerminateProcess to kill the child forcefully. This does force the child to terminate immediately, but unfortunately if that child spawned any children of its own they are left to run until their "natural" completion.
Is there a way to tell the child process to call ExitProcess, or to force all of the child's children to also terminate when TerminateProcess is called?
These external applications are beyond my control, and as such I cannot modify them to provide a custom work-around.
Assume no knowledge of grand-child processes (names/pids/etc) that would allow me to manually call TerminateProcess on grand-child processes individually. Although this could be done by manually enumerating all processes, mapping process relationships, and tracking all processes, I do not consider this a valid solution except as the absolute last resort.
Thank you for your time.
You can use Job objects to kill all the processes as a unit. You create a job object via the CreateJobObject API, and assign a process to it with AssignProcessToJobObject. New processes created by a process in a job object belong to the same job object by default. Calling TerminateJobObject will terminate all associated processes in the job object.