How does Airflow's built-in Time Delta Sensor behave as part of a manually triggered DAG run?
https://airflow.readthedocs.io/en/stable/_modules/airflow/sensors/time_delta_sensor.html
"Waits for a timedelta after the task's execution_date + schedule_interval"
If I trigger a DAG containing this sensor manually, the Time Delta Sensor seems to sit there indefinitely, never completing. It does complete as expected from a scheduled DAG run.
How is this sensor supposed to behave in a manual run?
Is this behaviour due to the "execution_date + schedule_interval" mentioned in the docs there? If so, what does that mean exactly?
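For reference, the linked source boils down to roughly the following poke logic (a paraphrased sketch, not the exact code), which is where the "execution_date + schedule_interval" comes from:

    # Paraphrased sketch of the linked TimeDeltaSensor.poke (not the exact source)
    from airflow.utils import timezone

    def poke(self, context):
        dag = context['dag']
        # "execution_date + schedule_interval": the next scheduled run after this one
        target_dttm = dag.following_schedule(context['execution_date'])
        # ...plus the configured timedelta on top of that
        target_dttm += self.delta
        self.log.info('Checking if the time (%s) has come', target_dttm)
        return timezone.utcnow() > target_dttm

If that reading is right, a manually triggered run's execution_date is the trigger time itself, so the target time can land a full schedule interval further in the future than you might expect, which could explain the sensor appearing to hang.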
I want to know whether a Lambda execution keeps running even if the Step Functions state that invoked it times out. If it does, how can I stop it?
There is no way to kill a running Lambda. However, you can set the function's concurrency limit to 0 to stop it from starting any further executions.
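For completeness, a minimal sketch of doing that with boto3 ("my-function" is a placeholder name):

    import boto3

    # Setting reserved concurrency to 0 prevents any new invocations of the
    # function from starting; it does not stop invocations already in flight.
    client = boto3.client("lambda")
    client.put_function_concurrency(
        FunctionName="my-function",        # placeholder name
        ReservedConcurrentExecutions=0,
    )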
Standard StepFunctions have a max timeout of 1 year. (yes! One year)
As such any individual task also has a max timeout of 1 year.
(Express StepFunctions, mind you, have a maximum duration of only 5 minutes.)
Lambdas have a max timeout of 15 minutes.
If you need your Lambda to complete within a certain amount of time, you are best served by setting your Lambda timeout to that - not your state machine's. (I see in your comments that you say you cannot pass a value for this? If you cannot change it, then you have no choice but to let it run its course.)
Consider StepFunctions and state machines to be orchestrators, but they have very little control over the individual components. They tell each component when to act, but otherwise they are stuck waiting on those components to reply before continuing.
If your Lambda times out, your StateMachine will fail that task, as it receives a Lambda service error. You can then handle that in the StepFunction without failing the entire process, see:
https://docs.aws.amazon.com/step-functions/latest/dg/concepts-error-handling.html
You could specifically use TimeoutSecondsPath in your definition to set the task's timeout from the execution input and handle the result specifically if the task times out.
But as stated: no, once a Lambda begins execution it will continue until it finishes or it times out at 15 minutes / its configured timeout.
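As a rough illustration of that approach (state and function names here are placeholders, and the definition is shown as a Python dict only for readability - it would be serialized to JSON inside the full state machine definition):

    import json

    # Hypothetical Task state: the timeout is read from the input
    # ("$.timeout_seconds" via TimeoutSecondsPath), and a States.Timeout error
    # is caught and routed to a handler state instead of failing the execution.
    invoke_state = {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke",
        "Parameters": {"FunctionName": "my-worker"},   # placeholder function name
        "TimeoutSecondsPath": "$.timeout_seconds",     # per-execution timeout from input
        "Catch": [
            {"ErrorEquals": ["States.Timeout"], "Next": "HandleTimeout"}  # placeholder state
        ],
        "Next": "NextStep",                            # placeholder state
    }

    print(json.dumps(invoke_state, indent=2))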
Since my project has so many moving parts, it's probably best to explain the symptom.
I have 1 scheduler running on 1 queue. I add scheduled jobs (to be executed within seconds of scheduling).
I keep scheduling jobs with NO RQ worker doing anything (in fact, the worker process is completely off). In other words, the queue should just be piling up.
But all of a sudden the queue gets chopped off (seemingly at random) and the first 70-80% of jobs just disappear.
Does this have anything to do with:
- the "max length" of the queue? (I don't recall setting any limits)
- the scheduler automatically discarding jobs whose start time is before the current time?
I ran my own experiment: RQ Scheduler does indeed remove jobs whose start date is earlier than now.
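For anyone who wants to reproduce that experiment, here is a minimal sketch (assuming the rq-scheduler package and a local Redis; the job function is just a placeholder):

    from datetime import datetime, timedelta, timezone

    from redis import Redis
    from rq_scheduler import Scheduler

    def dummy_job():
        print("job ran")            # placeholder job

    conn = Redis()
    scheduler = Scheduler(queue_name="default", connection=conn)

    # Schedule a few jobs whose start time is already in the past.
    past = datetime.now(timezone.utc) - timedelta(minutes=5)
    for _ in range(10):
        scheduler.enqueue_at(past, dummy_job)

    print(len(list(scheduler.get_jobs())))  # jobs currently held by the scheduler
    # Let the scheduler process run for a cycle, then check get_jobs() again to
    # see whether the overdue jobs are still there.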
I have a Dataflow job which reads JSON from 3 Pub/Sub topics, flattens them into one collection, applies some transformations, and saves to BigQuery.
I'm using a GlobalWindow with the following configuration.
.apply(Window.<PubsubMessage>into(new GlobalWindows())
    .triggering(AfterWatermark.pastEndOfWindow()
        .withEarlyFirings(AfterFirst.of(
            AfterPane.elementCountAtLeast(20000),
            AfterProcessingTime.pastFirstElementInPane().plusDelayOf(durations))))
    .discardingFiredPanes());
The job is running with the following configuration:
Max Workers : 20
Disk Size: 10GB
Machine Type : n1-standard-4
Autoscaling Algo: Throughput Based
The problem I'm facing is that after processing a few messages (approx. 80k) the job stops reading messages from Pub/Sub. There is a backlog of close to 10 million messages in one of those topics, and yet the Dataflow job is not reading the messages or autoscaling.
I also checked the CPU usage of each worker, and it is hovering in the single digits after an initial burst.
I've tried changing machine type and max worker configuration but nothing seems to work.
How should I approach this problem ?
I suspect the windowing function is the culprit. GlobalWindow isn't suited to streaming jobs (which I assume this job is, due to the use of PubSub), because it won't fire the window until all elements are present, which never happens in a streaming context.
In your situation, it looks like the window will fire early once, when it hits either that element count or duration, but after that the window will get stuck waiting for all the elements to finally arrive. A quick fix to check if this is the case is to wrap the early firings in a Repeatedly.forever trigger, like so:
withEarlyFirings(
    Repeatedly.forever(
        AfterFirst.of(
            AfterPane.elementCountAtLeast(20000),
            AfterProcessingTime.pastFirstElementInPane().plusDelayOf(durations))))
This should allow the early firing to fire repeatedly, preventing the window from getting stuck.
However, for a more permanent solution, I recommend moving away from using GlobalWindow in streaming pipelines. Using fixed-time windows with early firings based on element count would give you the same behavior, but without the risk of getting stuck.
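For illustration, here is the shape of that alternative expressed with the Beam Python SDK (your pipeline is Java, where the equivalent pieces are FixedWindows, AfterWatermark, AfterFirst, AfterPane and AfterProcessingTime; "messages" is a placeholder for the streaming PCollection, and the window/firing sizes are just examples):

    import apache_beam as beam
    from apache_beam.transforms import trigger
    from apache_beam.transforms.window import FixedWindows

    # Fixed 1-minute windows with early firings on element count or processing
    # time, discarding fired panes - similar behaviour to the global window
    # with early firings, but each window eventually closes.
    windowed = (
        messages
        | beam.WindowInto(
            FixedWindows(60),
            trigger=trigger.AfterWatermark(
                early=trigger.AfterAny(
                    trigger.AfterCount(20000),
                    trigger.AfterProcessingTime(60))),
            accumulation_mode=trigger.AccumulationMode.DISCARDING))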
We are using an Azure WebJob for batch processing; the job is triggered when there is a message in the storage queue.
We have configured the job to execute the messages one by one.
JobHostConfiguration config = new JobHostConfiguration();
config.Queues.BatchSize = 1;
config.Queues.MaxDequeueCount = 1;
Even so, the job is taking multiple messages from the storage queue and executing them in parallel.
Please help.
Regarding "taking multiple messages from the storage queue and executing them in parallel": how did you judge that it takes multiple messages and executes them in parallel? Did you have multiple instances?
I tested the code in a few different situations.
1) The normal situation, without setting the batch size: it will pull all the messages in the queue. However, I think it still runs them one by one, but from the result it doesn't wait for the previous run to finish completely. Here is the result.
2) Setting the batch size to 1: if you debug the code or refresh the queue frequently, you will find it does pull one message per run. Here is the result.
3) Setting the batch size to three and debugging: it just changes the number of messages pulled; each time it will pull 3 messages, and then it runs like the normal case without a batch size set. Here is the result. I also found that if you just run without debugging, the order shown in the console is very organized.
So if you don't have another instance running, I think this is working in sequential mode.
If this doesn't match your requirements or you still have questions, please let me know.
What happens if a function gets invoked by a TimerTrigger every 5 minutes and for some reason the code takes more than 5 minutes to complete?
Does this result in my function running twice at the same time?
Or does the interval start when the triggered code execution is completed?
I could not find an answer myself in the docs.
I have to ensure that my function is running always as singleton.
Thanks,
Alex
If your function execution takes longer than the timer interval, another execution won't be triggered until after the current invocation completes; the next execution is scheduled only once the current one finishes. You can see this in the code here. You can prove this to yourself with a simple local example: create a function that runs every 5 seconds and put a sleep of a minute in it. You won't see another function start until the first finishes.
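If it helps, here is a rough sketch of that experiment using the Azure Functions Python timer-trigger model (the schedule and sleep durations are just examples; the same idea applies to a C#/WebJobs SDK timer function):

    import logging
    import time

    import azure.functions as func

    app = func.FunctionApp()

    # Fires every 5 seconds but sleeps for 60: if invocations overlapped you
    # would see a new "started" log every 5 seconds; instead the next run only
    # starts once the previous one has finished.
    @app.timer_trigger(schedule="*/5 * * * * *", arg_name="timer")
    def slow_timer(timer: func.TimerRequest) -> None:
        logging.info("started")
        time.sleep(60)
        logging.info("finished")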
As far as running as a singleton: the above shows that only a single invocation of the function runs at a given time on the same instance (VM). The SDK further ensures that the function isn't running on other scaled-out instances either. You can read more about that here. To see this in action, you can simulate it by starting two instances of your console app locally - one will run the schedule and the other will not. However, if you kill the one running the schedule, the other one will pick it up after a short time (within a minute).