Postman scheduled runs are periodically paused

Postman scheduled runs are periodically paused.
When does a run get paused, and how can I avoid it?
I cannot find the conditions for pausing in the documentation.
https://learning.postman.com/docs/running-collections/scheduling-collection-runs/
If any request fails in the scheduled run, will the schedule be paused?
One of the responses is as follows, and I think Postman seems to see this as a failure.
Could not get response
(Error: read ECONNRESET)
I want it to keep running without pausing, whether or not a request fails.
As a workaround, I added a request to the scheduled run that sends a notification to Slack.
If the notification doesn't arrive, I know the schedule has been paused and I resume it manually.

Related

AWS Beanstalk: Cannot retrieve logs in degraded state

After serving for some time, my app died and went into the "degraded" state. I have no idea what happened, because no one was using it. Maybe it was hibernated and did not wake up?
Now I am trying to check the logs, but I cannot. Requesting logs takes ages, and from time to time I get timeouts. When I click Request logs (100 lines or full logs), I get this message:
Elastic Beanstalk is updating your environment.
To cancel this operation select Abort Current Operation from the Actions dropdown.
This takes some time, and in the end nothing happens. Moreover, I cannot abort the operation as the message suggests, because:
Error
Could not abort the current environment operation for MY_APP_NAME: Environment named MY_ENV_NAME is in an invalid state for this operation. Must be pending deployment.

Impact of AWS SWF connectivity on currently executing workflows?

I would like to understand how a loss of connectivity with AWS SWF affects currently executing workflows. Could someone please help me understand?
I understand that deciders and workers would time out, but I am not sure of the exact behavior.
An activity worker that is waiting on a poll gets an error and is expected to keep retrying until connectivity is back. An activity worker that has completed a task is expected to keep retrying the completion call until the task expires.
A workflow worker that is waiting on a poll gets an error and is expected to keep retrying until connectivity is back. A workflow worker that has completed a decision task can retry completing it until the task expires. Once it expires, the decision task is automatically rescheduled and becomes available for polling as soon as connectivity is back.
A scheduled activity that is not picked up within the specified schedule-to-start timeout is automatically failed. The failure is recorded in the workflow history and a new decision task is scheduled.
A picked-up activity that is not completed within the specified start-to-close timeout is automatically failed. The failure is recorded in the workflow history and a new decision task is scheduled.
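The "keep retrying until connectivity is back" behavior described above could be sketched roughly as follows. This is a minimal illustration, not the Flow framework's own implementation. It assumes `swf` is a boto3-style SWF client; the function name, the backoff parameters, and the default `retryable` exception types are hypothetical. With boto3 you would probably pass botocore's connection error classes as `retryable`.

```python
import time


def poll_until_task(swf, domain, task_list,
                    retryable=(ConnectionError, TimeoutError),
                    base_delay=1.0, max_delay=60.0,
                    sleep=time.sleep, max_attempts=None):
    """Poll for an activity task, retrying with capped exponential backoff.

    Returns the task dict, or None after max_attempts consecutive failures.
    """
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        try:
            task = swf.poll_for_activity_task(
                domain=domain, taskList={"name": task_list})
        except retryable:
            # Connectivity is down: back off, then poll again.
            sleep(min(max_delay, base_delay * 2 ** attempt))
            attempt += 1
            continue
        attempt = 0  # a successful call resets the backoff
        if task.get("taskToken"):
            return task
        # An empty taskToken means the long poll expired without work.
    return None
```

The same loop shape would apply to a workflow worker, with `poll_for_decision_task` in place of the activity poll.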

Approach to crashed workers in amazon swf

We're currently implementing a workflow in Amazon SWF where we submit jobs/workflow executions from our web application. Everything was fairly quick and painless to get set up using the Ruby Flow framework. As long as the deciders/activity workers don't crash we seem to be able to handle most issues/exceptions gracefully.
My question is, what is common practice for the scenario where the decider process crashes midway through a workflow execution? If the task fails in that way, is it possible to push an SNS notification (I've seen no examples) or something to indicate to another process that there's been an unexpected failure/crash?
There are various types of "decider" failures.
The workflow worker crashes while processing a decision. The decision task is automatically rescheduled after the specified timeout, so make sure that the workflow type's defaultTaskStartToCloseTimeout is not set too high. If the crash is not related to code correctness, the rescheduled task is processed and the workflow execution continues normally.
The workflow worker doesn't crash, but the workflow execution itself fails. In this case you can use ListClosedWorkflowExecutions to count such failed workflows.
The workflow worker doesn't crash, but a decision task cannot complete because RespondDecisionTaskCompleted fails due to a bug in the Flow framework. Since the task is never completed from SWF's point of view, at some point it is marked as timed out and rescheduled. Because the bug is still present, the new task also never completes and is rescheduled, and so on. A workflow execution that is experiencing this issue has a history whose tail consists of repeated "decision task scheduled, decision task timed out" events. If your workflow has a known execution time limit, the best way to catch this issue is to set a reasonable executionStartToCloseTimeout and look for timed-out workflow executions. If the decision task timeout is set too low, such workflows can also hit the limit on history size before the execution timeout.
SWF metrics are now published to CloudWatch, so completed and failed workflows emit metrics there. You can create CloudWatch alarms that notify you when any workflow fails.
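Two of the checks mentioned in the answers above can be sketched in a few lines. The first looks for the repeating "decision task scheduled, decision task timed out" tail in a history. The second counts executions that closed as failed via ListClosedWorkflowExecutions. This is only an illustration: the function names are hypothetical, `events` is assumed to be the event list returned by GetWorkflowExecutionHistory, and `swf` is assumed to be a boto3-style SWF client.

```python
def count_trailing_decision_timeouts(events):
    """Count DecisionTaskTimedOut events in the unbroken run of
    decision-task events at the end of a workflow history."""
    decision_events = {
        "DecisionTaskScheduled",
        "DecisionTaskStarted",
        "DecisionTaskTimedOut",
    }
    timeouts = 0
    for event in reversed(events):
        event_type = event["eventType"]
        if event_type not in decision_events:
            break  # something else happened, so the tail ends here
        if event_type == "DecisionTaskTimedOut":
            timeouts += 1
    return timeouts


def count_failed_executions(swf, domain, closed_since):
    """Count executions that closed as FAILED since closed_since (a datetime)."""
    kwargs = {
        "domain": domain,
        "closeTimeFilter": {"oldestDate": closed_since},
        "closeStatusFilter": {"status": "FAILED"},
    }
    total = 0
    while True:
        page = swf.list_closed_workflow_executions(**kwargs)
        total += len(page.get("executionInfos", []))
        token = page.get("nextPageToken")
        if not token:
            return total
        kwargs["nextPageToken"] = token  # fetch the next page
```

A monitoring job could flag any open execution whose trailing timeout count exceeds a small threshold of your choosing.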

Break out of loop in AWS SWF activity

I'm running a permanent loop in an SWF activity, for example a web crawler crawling www.example1.com. I don't want to wait until it finishes crawling; at a certain time I want to terminate the activity and switch it to crawl www.example2.com instead.
I have tried 'try-cancel' and 'terminate' on the workflow by workflow-id. It seems that this only signals SWF, so the task shows as finished in the AWS console, but the activity process on the worker is still running.
Any solution for this?
When an activity is cancelled, the heartbeat call returns a flag that indicates this, so your activity loop should include heartbeating code to support cancellation. See the "activity heartbeat" section of the "error handling" page of the AWS Flow Framework for Java Developer Guide for an example.
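An activity loop with heartbeating might look like the sketch below. It is not the example from the Developer Guide. It assumes `swf` is a boto3-style SWF client whose `record_activity_task_heartbeat` response carries the `cancelRequested` flag; `crawl_with_heartbeat`, `pages`, and `fetch` are hypothetical names standing in for the crawler.

```python
def crawl_with_heartbeat(swf, task_token, pages, fetch):
    """Process pages one at a time, stopping early if cancellation
    was requested. Returns True if completed, False if cancelled."""
    done = 0
    for page in pages:
        # Each heartbeat reports progress and returns the cancellation flag.
        status = swf.record_activity_task_heartbeat(
            taskToken=task_token, details=str(done))
        if status.get("cancelRequested"):
            swf.respond_activity_task_canceled(
                taskToken=task_token,
                details=f"stopped after {done} pages")
            return False
        fetch(page)
        done += 1
    swf.respond_activity_task_completed(
        taskToken=task_token, result=str(done))
    return True
```

The flag is only checked between pages, so one unit of work should be short relative to how quickly the cancellation needs to take effect.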

How to kill / re-start a long running task

Is there a way to kill / re-start a long running task in AWS SWF? Sometimes some of our tasks run for a longer duration and we would like to manually kill a certain task (either via UI or programmatically) and re-start the task if possible. How to achieve this?
The console is one option for manually killing a workflow.
You can also set timeouts on the whole workflow execution or on individual activities. These can be set when you register your activity or when you start it (defaultTaskStartToCloseTimeoutSeconds).
It's not clear what language you're using.
If you're using Java, you should look into Exponential Retry in the Flow Framework. It makes the SDK restart your activity if it fails.
A long running activity is expected to heartbeat using RecordActivityTaskHeartbeat. If the activity process hangs or crashes, this produces a timeout failure after the short heartbeat interval rather than after the long task execution timeout.
The workflow code (decider) can always request activity cancellation through a RequestCancelActivityTask decision. The cancellation request is returned in the output of the RecordActivityTaskHeartbeat call. The activity implementation should then cancel itself and report back to the service using the RespondActivityTaskCanceled API call.
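On the decider side, the cancellation request described above is a single decision passed to RespondDecisionTaskCompleted. The sketch below assumes `swf` is a boto3-style SWF client; the function name and the way the decider learns `activity_id` are hypothetical.

```python
def request_activity_cancel(swf, decision_task_token, activity_id):
    """Complete a decision task with one RequestCancelActivityTask decision."""
    decision = {
        "decisionType": "RequestCancelActivityTask",
        "requestCancelActivityTaskDecisionAttributes": {
            "activityId": activity_id,
        },
    }
    swf.respond_decision_task_completed(
        taskToken=decision_task_token, decisions=[decision])
    return decision
```

The activity only notices the request on its next heartbeat, so the cancellation takes effect no faster than the activity's heartbeat interval.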
See the Error Handling section of the AWS Flow Framework Developer Guide for the AWS Flow Framework way of cancelling activities.
Sometimes an activity implementation cannot support heartbeating and self-cancellation. The solution is to execute another "kill" activity that terminates the first activity execution. For example, under Unix such a kill activity could issue a "kill -9" command against the process that implements the first one.
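The body of such a kill activity could be as small as the sketch below, which sends SIGKILL (the signal behind "kill -9") from Python. It is Unix only. How the kill activity learns the process ID is left open here; `kill_activity_process` and its `pid` argument are hypothetical, and the kill activity has to run on the same host as the process it targets.

```python
import os
import signal


def kill_activity_process(pid):
    """Send SIGKILL to pid. Returns False if no such process exists."""
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        return False  # the process has already exited
    return True
```

Because SIGKILL cannot be caught, the killed process gets no chance to report back to SWF. The first activity will therefore be closed by its heartbeat or start-to-close timeout rather than by a completion call.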