AWS Batch jobs stuck in PENDING when they `dependsOn`

I have an issue chaining AWS Batch jobs.
There are 3 compute environments (CE_A, CE_B, CE_C), each with one associated job queue (JQ_A, JQ_B, JQ_C).
There are 6 Job definitions (JD_1, JD_2, ..., JD_6).
Let <jqce>-<jd>-<name> be a Job launched on job queue (or compute environment) <jqce> and with job definition <jd>. Example: A-1-a, C-6-z.
I want to execute sequentially about 20 jobs (launched with different environment variables): A-1-a, A-1-b, B-2-c, A-3-d, A-3-e, A-3-f, ...
For each job I specify the dependency on the previous job with:
params.dependsOn = [{ "jobId": "xxxxx-xxxx-xxxx-xxxxxx"}] in Batch.submitJob(params).
The first two jobs A-1-a and A-1-b execute successfully after waiting a few minutes for resource allocation.
The third job, B-2-c, also executes successfully, after some minutes of waiting for the compute environment CE_B to come up.
Meanwhile, the compute environment CE_A is scaled down since it has no more jobs.
HERE IS THE PROBLEM:
I expect at this point that CE_B goes down and CE_A comes back up. CE_A never comes up.
Job A-3-d is never executed; 16 hours later it is still in PENDING status.
The dependsOn is fine: its dependency finished a long time ago.
Without dependsOn the same jobs run fine, with the same environment variables and configuration.
QUESTIONS
Did you face similar problems with AWS Batch and dependsOn?
Is it possible to chain batches from different Job Queues?
Is it possible to chain batches from different Compute Environments?
Does params.dependsOn = [{ "jobId": "xxx-xxx-xxx-xxx" }] seem OK to you? It seems I do not have to set the type attribute (see array jobs).

Does params.dependsOn = [{ "jobId": "xxx-xxx-xxx-xxx" }] seem OK to you? It seems I do not have to set the type attribute (see array jobs).
Yes, type is only required when the dependency is defined as an array job. And the jobId you're providing is the one that was returned when you submitted that specific job?
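For reference, here is a minimal sketch of chaining two jobs with the AWS SDK for Java v2 (the question uses the JavaScript SDK, but the shape of dependsOn is the same; the queue and definition names are the placeholders from the question):

import software.amazon.awssdk.services.batch.BatchClient;
import software.amazon.awssdk.services.batch.model.JobDependency;
import software.amazon.awssdk.services.batch.model.SubmitJobRequest;
import software.amazon.awssdk.services.batch.model.SubmitJobResponse;

public class ChainJobs {
    public static void main(String[] args) {
        try (BatchClient batch = BatchClient.create()) {
            // Submit the first job and keep the jobId the service returns.
            SubmitJobResponse first = batch.submitJob(SubmitJobRequest.builder()
                    .jobName("A-1-a").jobQueue("JQ_A").jobDefinition("JD_1")
                    .build());

            // The second job depends on the first. No "type" attribute is needed
            // unless the dependency targets an array job (e.g. type N_TO_N).
            batch.submitJob(SubmitJobRequest.builder()
                    .jobName("A-1-b").jobQueue("JQ_A").jobDefinition("JD_1")
                    .dependsOn(JobDependency.builder().jobId(first.jobId()).build())
                    .build());
        }
    }
}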
Is it possible to chain batches from different Job Queues?
Is it possible to chain batches from different Compute Environments?
You should be able to, since a dependency only references a jobId, regardless of which queue or compute environment that job went to; I've never done it myself, though.
Meanwhile, the compute environment CE_A is turned off since no job has presented.
So CE_A was running already and ran A-1-a, A-1-b already?
As I recall, AWS checks every 10 minutes or so for certain status changes, and people have run into cases where the system seems stuck.
You could set CE_A to always keep a minimum of 1 vCPU, so the environment never scales down to zero and disappears; see the sketch below.
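A minimal sketch of that setting with the AWS SDK for Java v2 (assuming CE_A is a managed compute environment; the name is the placeholder from the question):

import software.amazon.awssdk.services.batch.BatchClient;
import software.amazon.awssdk.services.batch.model.ComputeResourceUpdate;
import software.amazon.awssdk.services.batch.model.UpdateComputeEnvironmentRequest;

public class KeepEnvironmentWarm {
    public static void main(String[] args) {
        try (BatchClient batch = BatchClient.create()) {
            batch.updateComputeEnvironment(UpdateComputeEnvironmentRequest.builder()
                    .computeEnvironment("CE_A") // name or ARN of the compute environment
                    .computeResources(ComputeResourceUpdate.builder()
                            .minvCpus(1) // keep at least 1 vCPU provisioned at all times
                            .build())
                    .build());
        }
    }
}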
Can you simplify for testing purposes? Shorter jobs, fewer queues, etc.
Consider checking the AWS forum on Batch. Not much activity there but worth an additional set of eyes.

Related

More Precise and Higher Frequency Scheduling in Google Cloud functions / Cloud Scheduler

I would like a cloud function to be called precisely every 15 seconds, exactly when the server clock is at 15-second intervals, i.e. at XX:XX:00, XX:XX:15, XX:XX:30 and XX:XX:45. I tried using Cloud Scheduler but I am having the following issues:
Using unix-cron, the highest frequency is once per minute. In addition, using * * * * * does not seem to guarantee that an event will happen at XX:XX:00, but rather every minute from when I start the task.
My cloud function can still be called by a POST request, which could really mess up my clock. The temporary solution I have for this is putting a really long hash, like a SHA-256 string, into the header and requiring it in order to run the cloud function. This can slow things down a lot; originally I wanted to use RSA or ECC, but I think that would make things even slower, so for now I just do a string compare against a long SHA-256 string. I hope there's a better / more efficient way around this.
The aforementioned issue with scheduling at XX:XX:00, XX:XX:15, XX:XX:30 and XX:XX:45 is also a real pain. The best solution I found is to have the function called once and then sleep and try to sync up to those times. This is super janky / hacky at best.
Perhaps there is another service within the Google Cloud platform that is more suited to this specific use case? Or another approach to this?
When I need a higher frequency than once per minute, I use Cloud Tasks like this (a sketch follows this list):
Create a Cloud Scheduler job which triggers function 1 every minute. I set a retry policy on this scheduler.
Function 1 creates the 4 tasks to run during the following minute (for example, when triggered at 00:00:00 it creates tasks for 00:01:00, 00:01:15, 00:01:30 and 00:01:45). Cloud Tasks can create delayed tasks.
Each task triggers function 2.
Function 2 performs the business logic.
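A minimal sketch of function 1's scheduling step with the Cloud Tasks Java client (the project, location, queue and URL are placeholders, not the answerer's actual setup):

import com.google.cloud.tasks.v2.CloudTasksClient;
import com.google.cloud.tasks.v2.HttpMethod;
import com.google.cloud.tasks.v2.HttpRequest;
import com.google.cloud.tasks.v2.QueueName;
import com.google.cloud.tasks.v2.Task;
import com.google.protobuf.Timestamp;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class ScheduleNextSlots {
    public static void main(String[] args) throws Exception {
        try (CloudTasksClient client = CloudTasksClient.create()) {
            QueueName queue = QueueName.of("my-project", "us-central1", "my-queue");
            // Start of the next minute: the first of the four slots to schedule.
            Instant nextMinute = Instant.now().plusSeconds(60).truncatedTo(ChronoUnit.MINUTES);
            for (int i = 0; i < 4; i++) {
                Instant fireAt = nextMinute.plusSeconds(15L * i); // :00, :15, :30, :45
                Task task = Task.newBuilder()
                        .setScheduleTime(Timestamp.newBuilder().setSeconds(fireAt.getEpochSecond()))
                        .setHttpRequest(HttpRequest.newBuilder()
                                .setUrl("https://example.com/function-2") // placeholder URL
                                .setHttpMethod(HttpMethod.POST))
                        .build();
                client.createTask(queue, task); // Cloud Tasks holds it until scheduleTime
            }
        }
    }
}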

Dataflow job stuck and not reading messages from PubSub

I have a Dataflow job which reads JSON from 3 Pub/Sub topics, flattens them into one collection, applies some transformations and saves to BigQuery.
I'm using a GlobalWindow with the following configuration.
.apply(Window.<PubsubMessage>into(new GlobalWindows())
    .triggering(AfterWatermark.pastEndOfWindow()
        .withEarlyFirings(AfterFirst.of(
            AfterPane.elementCountAtLeast(20000),
            AfterProcessingTime.pastFirstElementInPane().plusDelayOf(durations))))
    .discardingFiredPanes());
The job is running with the following configuration:
Max Workers : 20
Disk Size: 10GB
Machine Type : n1-standard-4
Autoscaling Algo: Throughput Based
The problem I'm facing is that after processing a few messages (approx. 80k) the job stops reading messages from Pub/Sub. There is a backlog of close to 10 million messages in one of those topics, and yet the Dataflow job is not reading the messages or autoscaling.
I also checked the CPU usage of each worker, and it is hovering in the single digits after the initial burst.
I've tried changing machine type and max worker configuration but nothing seems to work.
How should I approach this problem?
I suspect the windowing function is the culprit. GlobalWindow isn't suited to streaming jobs (which I assume this job is, due to the use of PubSub), because it won't fire the window until all elements are present, which never happens in a streaming context.
In your situation, it looks like the window will fire early once, when it hits either that element count or duration, but after that the window will get stuck waiting for all the elements to finally arrive. A quick fix to check if this is the case is to wrap the early firings in a Repeatedly.forever trigger, like so:
.triggering(AfterWatermark.pastEndOfWindow()
    .withEarlyFirings(Repeatedly.forever(
        AfterFirst.of(
            AfterPane.elementCountAtLeast(20000),
            AfterProcessingTime.pastFirstElementInPane().plusDelayOf(durations)))))
This should allow the early firing to fire repeatedly, preventing the window from getting stuck.
However, for a more permanent solution I recommend moving away from GlobalWindow in streaming pipelines. Using fixed-time windows with early firings based on element count would give you the same behavior, but without the risk of getting stuck; see the sketch below.
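For example, something like the following could replace the windowing transform above (a sketch, not tested on the asker's pipeline; the 1-minute window length is an arbitrary assumption, and Beam requires allowed lateness and an accumulation mode to be set once a custom trigger is used):

.apply(Window.<PubsubMessage>into(FixedWindows.of(Duration.standardMinutes(1)))
    .triggering(AfterWatermark.pastEndOfWindow()
        .withEarlyFirings(Repeatedly.forever(AfterPane.elementCountAtLeast(20000))))
    .withAllowedLateness(Duration.ZERO)
    .discardingFiredPanes());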

Azure Batch Job sequential execution not working

We are using an Azure WebJob for batch processing; the job triggers when there is a message in the storage queue.
We have configured the job to execute the messages one by one:
JobHostConfiguration config = new JobHostConfiguration();
config.Queues.BatchSize = 1;        // fetch at most one queue message at a time
config.Queues.MaxDequeueCount = 1;  // attempt each message once before it goes to the poison queue
Even so, the job is taking multiple messages from the storage queue and executing them in parallel.
Please help.
taking multiple messages from the storage queue and executing them in parallel
How did you judge that it takes multiple messages and executes them in parallel? Do you have multiple instances?
I tested the code in different situations:
1) The normal situation, without setting BatchSize: it pulls all the messages in the queue. I think it still runs them one by one, but judging from the results it does not wait for the previous run to finish completely.
2) With BatchSize set to 1: if you debug the code or refresh the queue frequently, you will find that it does pull one message at a time and run it.
3) With BatchSize set to 3 and debugging: it only changes the number of messages pulled; each time it pulls 3 messages, then runs as in the case without BatchSize. I also found that if you just run without debugging, the order shown in the console is very organized.
So if you don't have another instance running, I think this is working in sequential mode.
If this doesn't match your requirements or you still have questions, please let me know.

Akka Cluster manual join

I'm trying to find a workaround to the following limitation: when starting an Akka Cluster from scratch, one has to make sure that the first seed node is started. This is a problem for me, because if I have an emergency restart of my whole system from scratch, who knows whether the one machine everything relies on will be up and running properly? And I might not have the luxury of taking time to change the system configuration. Hence my attempt to create the cluster manually, without relying on a static seed node list.
Now, it's easy for me to have all Akka systems register themselves somewhere (e.g. on a network filesystem, by touching a file periodically). Therefore, when starting up, a new system could:
1. Look up the list of all systems that are supposedly alive (i.e. those that touched the file system recently).
2. a. If there is none, then the new system joins itself, i.e. starts the cluster alone.
   b. Otherwise it tries to join the cluster with Cluster(system).joinSeedNodes, using all the other supposedly alive systems as seeds.
3. If 2.b. doesn't succeed in reasonable time, the new system tries again, starting from 1 (looking up the list of supposedly alive systems again, as it might have changed in the meantime; in particular all the other systems might have died and we'd ultimately fall into 2.a.).
I'm unsure how to implement 3: how do I know whether joining has succeeded or failed? (Do I need to subscribe to cluster events?) And is it possible, in case of failure, to call Cluster(system).joinSeedNodes again? The official documentation is not very explicit on this point and I'm not 100% sure how to interpret the following in my case (can I do several attempts, using different seeds?):
An actor system can only join a cluster once. Additional attempts will be ignored. When it has successfully joined it must be restarted to be able to join another cluster or to join the same cluster again.
Finally, let me clarify that I'm building a small cluster (just 10 systems for the moment, and it won't grow very big) and that it has to be restarted from scratch now and then (I cannot assume the cluster will stay alive forever).
Thx
I'm answering my own question to let people know how I sorted out my issues in the end. Michal Borowiecki's answer mentioned the ConstructR project and I built my answer on their code.
How do I know whether joining has succeeded or failed? After issuing Cluster(system).joinSeedNodes I subscribe to cluster events and start a timeout:
private case object JoinTimeout
...
Cluster(context.system).subscribe(self, InitialStateAsEvents, classOf[MemberUp], classOf[MemberLeft])
system.scheduler.scheduleOnce(15.seconds, self, JoinTimeout)
The receive is:
val address = Cluster(system).selfAddress
...
case MemberUp(member) if member.address == address =>
  // Hooray, I joined the cluster!
case JoinTimeout =>
  // Oops, couldn't join
  system.terminate()
Is it possible in case of failure to call Cluster(system).joinSeedNodes again? Maybe, maybe not. But actually I simply terminate the actor system if joining didn't succeed and restart it for another try (so it's a "let it crash" pattern at the actor system level).
You don't strictly need seed nodes; you only need them if you want the cluster to bootstrap automatically.
You can start your individual applications and then have them "manually" join the cluster at any point in time. For example, if you have HTTP enabled, you can use the akka-management library (or implement a subset of it yourself; these are all basic cluster library functions, just nicely wrapped).
I strongly discourage the touch approach. How do you synchronize the reading and writing of the touch file between nodes? What if someone reads a transient state (while someone else is writing it)?
I'd say either go fully automatic (with multiple seed nodes), or go fully "manual" and have another system be in charge of managing the clusterization of your nodes. By that I mean you start them up individually, and they join the cluster only when ordered to do so by the external supervisor (also very helpful for managing split-brain scenarios). A minimal sketch of such a triggered join follows.
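For illustration, here is a sketch of an externally-triggered join using Akka's classic Java cluster API (the system name and address are placeholders, and the address scheme depends on your remoting transport):

import akka.actor.ActorSystem;
import akka.actor.Address;
import akka.actor.AddressFromURIString;
import akka.cluster.Cluster;

public class TriggeredJoin {
    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("MyCluster");
        // Invoked once the external supervisor has decided which node to join.
        Address seed = AddressFromURIString.parse("akka://MyCluster@10.0.0.1:2552");
        Cluster.get(system).join(seed); // or joinSeedNodes(...) with several candidates
    }
}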
We've started using the ConstructR extension instead of a static list of seed nodes:
https://github.com/hseeberger/constructr
This doesn't have the limitation of a statically-configured first seed node having to be up after a full cluster restart.
Instead, it relies on a highly-available lookup service. ConstructR supports etcd natively, and there are extensions for (at least) ZooKeeper and Consul available. Since we already had a ZooKeeper cluster for Kafka, we went with ZooKeeper:
https://github.com/typesafehub/constructr-zookeeper

How to setup ZERO-MQ architecture to deal with workers of different speed

[as a small context provider: I am new to networking and ZERO-MQ, but I did spend quite a bit of time on the guide and examples]
I have the following challenge (done in C++, but irrelevant to the question). I have a single source that generates tasks. I have multiple engines that need to process those tasks, and send back the result.
First attempt:
I created a client with a ZMQ_PUSH socket. The engines have a ZMQ_PULL socket. To get the answers back to the client, I created the reverse: a ZMQ_PUSH on the workers and a ZMQ_PULL on the client. It worked out of the box, only for me to find out that after some time the client ran out of memory, since I was pushing far more requests than the workers could process. I need some backpressure.
Second attempt:
I added a counter on the client that took care of only pushing when no more than, say, 1000 tasks were 'in progress'. The out-of-memory issue was solved, since I never had more than 1000 'in progress' tasks. But... some workers were slower than others. Since PUSH/PULL uses fair queueing, the amount of work for a slow worker kept increasing and increasing, until the slowest worker had all 1000 requests queued and the others were starved. I was not using my workers effectively.
Now, what architecture could I use that solves the issue of 'workers with different speed'? Is the 'count the number of in-progress tasks' approach a good way of balancing the number of pushed requests? Or is there a way I can PUSH tasks to the workers such that the push blocks at a predefined point? Can I do that with HWM?
I am sure this problem is of such a generic nature that I should be able to easily deal with this. Can anyone point me in the right direction?
Thanks!
We used the Paranoid Pirate Protocol (http://rfc.zeromq.org/spec:6),
but in the case of many very small jobs, where the communication overhead might be high, a credit-based flow-control pattern might be more efficient: http://unprotocols.org/blog:15
In both cases it is necessary for the requester to assign jobs directly to individual workers. This is abstracted away, of course, and depending on the use case it could be exposed as a synchronous call which returns when all tasks have been processed. A sketch of the core idea follows.
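Here is a minimal sketch of that core idea in Java with JeroMQ (the endpoints and framing are assumptions: workers connect a DEALER socket and send a READY message at startup and after finishing each task, so the broker only hands work to workers that are actually free):

import org.zeromq.SocketType;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;
import java.util.ArrayDeque;
import java.util.Queue;

// Broker that pulls a task from the source only when a worker has signalled
// it is free, so a slow worker never accumulates a private backlog.
public class LeastBusyBroker {
    public static void main(String[] args) {
        try (ZContext ctx = new ZContext()) {
            ZMQ.Socket source  = ctx.createSocket(SocketType.PULL);   // tasks from the single source
            ZMQ.Socket workers = ctx.createSocket(SocketType.ROUTER); // DEALER workers connect here
            source.bind("tcp://*:5555");
            workers.bind("tcp://*:5556");

            Queue<byte[]> ready = new ArrayDeque<>(); // identities of free workers

            while (!Thread.currentThread().isInterrupted()) {
                // Re-register each iteration: only poll the source while a worker is free.
                ZMQ.Poller poller = ctx.createPoller(2);
                poller.register(workers, ZMQ.Poller.POLLIN);
                boolean watchSource = !ready.isEmpty();
                if (watchSource)
                    poller.register(source, ZMQ.Poller.POLLIN);
                if (poller.poll(-1) < 0)
                    break;

                if (poller.pollin(0)) {
                    // Frames from a DEALER worker arrive as [identity][payload]; any
                    // payload (READY or a result to forward) means "I can take more work".
                    byte[] identity = workers.recv(0);
                    workers.recv(0); // payload, ignored in this sketch
                    ready.add(identity);
                }
                if (watchSource && poller.pollin(1)) {
                    byte[] task = source.recv(0);
                    workers.send(ready.poll(), ZMQ.SNDMORE); // route to a free worker
                    workers.send(task, 0);
                }
            }
        }
    }
}

With one outstanding task per worker this is effectively a credit of 1; giving each worker a small fixed number of credits instead lets you hide latency while still preventing the unbounded per-worker queues you saw with plain PUSH/PULL.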