Occasionally my instances get into a corrupted state (especially since min-instance=1). I would like to restart one manually. Is this possible?
I know I can go through the console to create a new version, but this messes up my Terraform state. I would like to keep the current version and just cycle the instance (the classic IT procedure of "turning it off and on again") to fix my short-term issue while I figure out the larger one.
No, you can't. However, if you have a routine that can detect the corruption, you can exit the container: the instance stops and a new one is created. There are two options for that:
Either an internal check that automatically detects the state of the container and exits in case of corruption (works for max-instance >= 1).
Or two separate endpoints (works only for max-instance = 1, since with several instances you cannot choose which one receives your request):
One that tells you the state of the container (OK or KO).
One that stops the instance, which you call in case of KO. If your container is public, this is dangerous, because anyone can restart your container!
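A minimal sketch of the first option, using only the JDK: a scheduled self-check that exits the process when it detects corruption, so the instance is stopped and a new one is created. The `corrupted` flag and the `start` helper are hypothetical. Wire the check to whatever actually detects your app's bad state.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit}

object CorruptionWatchdog {
  // Hypothetical flag: set it from wherever your app detects a corrupted state.
  @volatile var corrupted: Boolean = false

  // Runs `isCorrupted` every `period`. Whenever the check is positive it calls
  // `onCorrupted`, which by default exits with a non-zero status so the
  // instance is stopped and replaced.
  def start(
      period: Long,
      unit: TimeUnit,
      isCorrupted: () => Boolean = () => corrupted,
      onCorrupted: () => Unit = () => sys.exit(1)
  ): ScheduledExecutorService = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    scheduler.scheduleAtFixedRate(
      () => if (isCorrupted()) onCorrupted(),
      period, period, unit)
    scheduler
  }
}
```

At startup you would call something like `CorruptionWatchdog.start(30, TimeUnit.SECONDS)`. Keeping the check inside the process means no public restart endpoint is exposed, which avoids the security concern of the second option.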
I'm trying to find a workaround for the following limitation: when starting an Akka Cluster from scratch, one has to make sure that the first seed node is started. This is a problem for me. If I urgently need to restart my whole system from scratch, who knows whether the one machine everything relies on will be up and running properly? And I might not have time to change the system configuration. Hence my attempt to form the cluster manually, without relying on a static list of seed nodes.
It is easy for me to have every Akka system register itself somewhere (e.g. on a network filesystem, by periodically touching a file). Therefore, a newly starting system could:
1. Look up the list of all systems that are supposedly alive (i.e. those that touched the filesystem recently).
2. a. If there is none, the new system joins itself, i.e. starts the cluster alone.
   b. Otherwise, it tries to join the cluster with Cluster(system).joinSeedNodes, using all the other supposedly alive systems as seeds.
3. If 2.b doesn't succeed within a reasonable time, the new system tries again from step 1 (looking up the list of supposedly alive systems again, since it might have changed in the meantime; in particular, all other systems might have died, in which case we would ultimately fall into 2.a).
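Steps 1 and 2 can be sketched with plain JDK file APIs. This sketch assumes a hypothetical layout: one file per node, named like `host_port`, in a shared directory that each node touches periodically. Converting the seed names into Akka `Address` values and calling `Cluster(system).joinSeedNodes` is left out.

```scala
import java.nio.file.{Files, Path}
import java.nio.file.attribute.FileTime
import java.time.{Duration, Instant}

sealed trait JoinPlan
case object JoinSelf extends JoinPlan                            // step 2.a
final case class JoinSeeds(seeds: List[String]) extends JoinPlan // step 2.b

object FileRegistry {
  // "Touch": create the node's file if needed and bump its modification time.
  def touch(dir: Path, node: String, now: Instant = Instant.now()): Unit = {
    val f = dir.resolve(node)
    if (!Files.exists(f)) Files.createFile(f)
    Files.setLastModifiedTime(f, FileTime.from(now))
  }

  // Step 1: nodes whose file was touched within `maxAge`, excluding ourselves.
  def aliveNodes(dir: Path, self: String, maxAge: Duration,
                 now: Instant = Instant.now()): List[String] = {
    val stream = Files.list(dir)
    try {
      val it = stream.iterator()
      var alive = List.empty[String]
      while (it.hasNext) {
        val f = it.next()
        val name = f.getFileName.toString
        val age = Duration.between(Files.getLastModifiedTime(f).toInstant, now)
        if (name != self && age.compareTo(maxAge) <= 0) alive ::= name
      }
      alive.sorted
    } finally stream.close()
  }

  // Step 2: start a new cluster alone, or try to join the others.
  def plan(alive: List[String]): JoinPlan =
    if (alive.isEmpty) JoinSelf else JoinSeeds(alive)
}
```

Step 3 then amounts to calling `aliveNodes` and `plan` again after a failed join attempt. Comparing timestamps written by different machines assumes their clocks are reasonably in sync.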
I'm unsure how to implement step 3. How do I know whether joining has succeeded or failed? (Do I need to subscribe to cluster events?) And is it possible, in case of failure, to call Cluster(system).joinSeedNodes again? The official documentation is not very explicit on this point, and I'm not 100% sure how to interpret the following in my case (can I make several attempts using different seeds?):
"An actor system can only join a cluster once. Additional attempts will be ignored. When it has successfully joined it must be restarted to be able to join another cluster or to join the same cluster again."
Finally, let me clarify that I'm building a small cluster (just 10 systems for now, and it won't grow very big) and that it has to be restarted from scratch now and then (I cannot assume the cluster will stay alive forever).
Thx
I'm answering my own question to let people know how I sorted out my issues in the end. Michal Borowiecki's answer mentioned the ConstructR project and I built my answer on their code.
How do I know whether joining has succeeded or failed? After issuing Cluster(system).joinSeedNodes I subscribe to cluster events and start a timeout:
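```scala
import akka.actor.Cancellable
import akka.cluster.Cluster
import akka.cluster.ClusterEvent.{InitialStateAsEvents, MemberLeft, MemberUp}
import scala.concurrent.duration._

private case object JoinTimeout
...
// Inside the actor, after calling Cluster(context.system).joinSeedNodes(...):
import context.dispatcher // ExecutionContext required by scheduleOnce
Cluster(context.system).subscribe(self, InitialStateAsEvents, classOf[MemberUp], classOf[MemberLeft])
val joinTimeout: Cancellable = context.system.scheduler.scheduleOnce(15.seconds, self, JoinTimeout)
```

The receive block is:

```scala
val address = Cluster(context.system).selfAddress
...
case MemberUp(member) if member.address == address =>
  // Hooray, I joined the cluster! Cancel the timeout so it can't terminate us later.
  joinTimeout.cancel()
case JoinTimeout =>
  // Oops, couldn't join
  context.system.terminate()
```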
Is it possible, in case of failure, to call Cluster(system).joinSeedNodes again? Maybe, maybe not. In practice, I simply terminate the actor system if joining doesn't succeed and restart it for another try (a "let it crash" pattern at the actor-system level).
You don't need seed nodes. You only need them if you want the cluster to form automatically on startup.
You can start your applications individually and then have them "manually" join the cluster at any point in time. For example, if you have HTTP enabled, you can use the akka-management library (or implement a subset of it yourself; these are all basic cluster library functions, just nicely wrapped).
I strongly discourage the touch approach. How do you synchronize the reading and writing of the touched files between nodes? What if a node reads a transient state while another node is writing it?
I'd say either go fully automatic (with multiple seed nodes), or go fully "manual" and put another system in charge of forming the cluster from your nodes. By that I mean you start the nodes individually, and they join the cluster only when the external supervisor orders them to (which is also very helpful for managing split-brain situations).
We've started using the Constructr extension instead of a static list of seed nodes:
https://github.com/hseeberger/constructr
This doesn't have the limitation of a statically configured first seed node having to be up after a full cluster restart.
Instead, it relies on a highly available lookup service. Constructr supports etcd natively, and extensions are available for (at least) ZooKeeper and Consul. Since we already have a ZooKeeper cluster for Kafka, we went with ZooKeeper:
https://github.com/typesafehub/constructr-zookeeper
I have a small Windows application which is allowed to run only once. The single instance check is done using a Windows mutex (CreateMutex).
A second instance should bring the already running instance to the front and show an info message to the user.
The info message is created with the Windows MessageBox function. However, each time I try to start a new instance of my application, a new message box is created (allowing me to open hundreds of message boxes).
Is there a way to limit the number of message boxes to one (besides locking another mutex for the message box)?
There are three states of your program:
Normal (first instance),
Informing about the second run (second instance, currently showing the message box about it),
Extra (third or later instance, which knows that the message box is already shown).
It seems reasonable for the program in the third state to quit silently. There are many ways to do this, and since you are already using a mutex to detect the second state, it is reasonable to use the same logic to detect the third state too.
But you can do it another way:
The second instance of your program can inform the first instance about the second run and quit.
The main (first) instance then shows the message box after receiving this kind of message. In this case, your program should ignore all such messages until the current message box is closed (to avoid showing hundreds of message boxes one after another).
In my view, the first approach is better (i.e. have the second or later instance show the message box, and only afterwards bring the main instance to the front).
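The first approach boils down to a three-way decision, sketched here in Scala for illustration. `NamedLocks`, `InMemoryLocks` and the lock names are hypothetical. In a Win32 program, a successful "acquire" would correspond to calling CreateMutex on a named mutex and *not* getting GetLastError() == ERROR_ALREADY_EXISTS.

```scala
// Hypothetical abstraction over a named, system-wide lock.
trait NamedLocks {
  def tryAcquire(name: String): Boolean
  def release(name: String): Unit
}

sealed trait Role
case object MainInstance extends Role // state 1: run normally
case object Informer extends Role     // state 2: show the message box, then bring main to front
case object Silent extends Role       // state 3: quit without any UI

object SingleInstance {
  val AppLock = "MyApp.Instance" // hypothetical lock names
  val BoxLock = "MyApp.InfoBox"

  def decideRole(locks: NamedLocks): Role =
    if (locks.tryAcquire(AppLock)) MainInstance
    else if (locks.tryAcquire(BoxLock)) Informer
    else Silent
}

// In-memory stand-in so the decision logic can be exercised without Windows.
final class InMemoryLocks extends NamedLocks {
  private val held = scala.collection.mutable.Set.empty[String]
  def tryAcquire(name: String): Boolean = held.synchronized(held.add(name))
  def release(name: String): Unit = held.synchronized { held.remove(name); () }
}
```

The informing instance must release its lock (on Windows, close the mutex handle) once the message box is dismissed, so that a later start can inform the user again.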
I was writing an application in Play 2.3.7. When I try to create an actor (using Play's default Akka.system()) inside the overridden beforeStart method of the Global object, the application crashes with an infinite recursive call of beforeStart, ultimately throwing an exception because the Global object is not initialized. If I create this actor inside the onStart method, everything works.
My "intuition" was: "ok, this actor must be ready before the application receives the first request, so it must be created on beforeStart, not in onStart".
When is Akka.system() ready to use?
Akka.system returns an ActorSystem held by the AkkaPlugin. Therefore, if you want to use it, you must do so after the AkkaPlugin has been initialized. The AkkaPlugin is given priority 1000, which means it's started after most other internal plugins (database, evolutions, ...). The Global plugin has priority 10000, which means the AkkaPlugin is available there (and for any plugin with priority > 1000).
Note the warning in the docs about beforeStart:
Called before the application starts.
Resources managed by plugins, such as database connections, are likely not available at this point.
You have to start this in onStart() because beforeStart() is called too early, well before anything like Akka (which is actually a plugin) or any database connections are created. In fact, the documentation for GlobalSettings states:
Resources managed by plugins, such as database connections, are likely not available at this point.
The general guidance (confirmed by this thread) is that onStart() is the place to create your actors. And in practice, that has worked for me as well.
Processes A and B both operate on a Redis resource R.
These processes may be executed in parallel, and I need both processes to be certain of the value of R at the moment they change it.
I'm therefore using Redis transactions with the WATCH command. From the docs: "we are asking Redis to perform the transaction only if no other client modified any of the WATCHed keys. Otherwise the transaction is not entered at all."
To retry in case of failure, the suggested approach is to repeat the WATCH/MULTI/EXEC sequence until it succeeds. However, I'm worried that A and B might both start looping indefinitely (i.e. livelock).
Is this something to worry about? Better yet, what can I do about it? Would adding a random timeout before each retry solve the issue?
No need to worry: only one of A and B will succeed with its EXEC and change R (Redis is [mostly] single-threaded). The one that fails needs to retry the transaction with the new value of R. Because a failed EXEC means the other process's change went through, one of them always makes progress.
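To see why the retry loop cannot livelock, here is an in-process analogue in Scala that uses an AtomicReference compare-and-set instead of Redis. `OptimisticCell` is hypothetical: `watch` plays the role of WATCH + GET, and `exec` commits only if nothing was committed since the watched read, like EXEC. A commit can fail only because another commit succeeded, so every round makes progress.

```scala
import java.util.concurrent.atomic.AtomicReference

// Value plus a version that changes on every commit (stands in for Redis
// noticing that a WATCHed key was modified).
final case class Versioned(value: Long, version: Long)

final class OptimisticCell(initial: Long) {
  private val ref = new AtomicReference[Versioned](Versioned(initial, 0L))

  // Like WATCH + GET: remember exactly what we read.
  def watch(): Versioned = ref.get()

  // Like EXEC: commit only if the cell still holds the snapshot we watched.
  // compareAndSet compares references, so any intervening commit makes it fail.
  def exec(seen: Versioned, newValue: Long): Boolean =
    ref.compareAndSet(seen, Versioned(newValue, seen.version + 1))

  def current: Long = ref.get().value
}

object OptimisticRetry {
  // Repeat watch/compute/exec until the commit succeeds; returns the attempt count.
  def update(cell: OptimisticCell)(f: Long => Long): Int = {
    var attempts = 0
    var committed = false
    while (!committed) {
      attempts += 1
      val seen = cell.watch()
      committed = cell.exec(seen, f(seen.value))
    }
    attempts
  }
}
```

Against a real Redis server the loop has the same shape: WATCH the key, read it, MULTI, queue the write, EXEC. An aborted transaction makes EXEC return a null reply, and you loop back to WATCH.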
I'm maintaining an MFC application that, at some points, stops responding for 30 seconds, sometimes 1 minute or more. I'm supposed to fix this issue. I tried tracing the code (all methods in this class), but the issue persists. I also tried pausing the debugger during the hang, and I got nothing, as in this image.
I want to know how to track down the code that causes the application to stop responding.
Add a watch variable for each separate thread/event.
Increment each watch variable inside its thread.
A flag indicating whether the thread/event is currently executing can also be very useful (especially for events):
you must set this flag on entry
and clear it before exit.
Visualize the watch variables somehow:
either use some debug print inside your app,
or use a separate window,
or, even better, a separate app connected via any IPC method (this will work even if your app's UI hangs).
When the app hangs, just see which variables are incrementing and which are not.
With the flags, you can determine exactly what is hanging.
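A minimal sketch of the watch-variable idea, written in Scala for illustration. In an MFC app the same pattern maps to per-thread counters and flags (for example `std::atomic`) that a separate monitoring window or process reads. `Probe` and `HangMonitor` are hypothetical names.

```scala
import java.util.concurrent.atomic.{AtomicBoolean, AtomicLong}

// One probe per thread or event handler.
final class Probe(val name: String) {
  val counter = new AtomicLong(0)
  val busy = new AtomicBoolean(false)

  // Call regularly from the thread's loop (the "watch variable").
  def tick(): Unit = { counter.incrementAndGet(); () }

  // Wrap an event handler: set the flag on entry, clear it before exit.
  def busyWhile[A](body: => A): A = {
    busy.set(true)
    try body finally busy.set(false)
  }
}

final case class Snapshot(counts: Map[String, Long], busy: Set[String])

object HangMonitor {
  def snapshot(probes: Seq[Probe]): Snapshot =
    Snapshot(
      probes.map(p => p.name -> p.counter.get()).toMap,
      probes.filter(_.busy.get()).map(_.name).toSet)

  // Probes whose counter did not move between two snapshots are the suspects.
  def stalled(before: Snapshot, after: Snapshot): Set[String] =
    after.counts.collect { case (n, c) if before.counts.get(n).contains(c) => n }.toSet
}
```

Take snapshots periodically from a thread (or process) that does not depend on the UI. A probe whose counter stops moving while its busy flag stays set points at the handler that is hanging.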
Good luck with debugging.