Delay job in AWS - amazon-web-services

I have messages in Amazon SQS. For some of them I need to wait six hours before I can start working on them (the delay is a given).
One solution would be to call Thread.Sleep for six hours.
I don't like this solution because I'm afraid something will happen to the thread and I'll lose the data. Another solution would be to read the message, check whether six hours have passed, and if not, return the message to the queue. I don't like that either, because this check would run many times.
Is there a better solution?

Could you create multiple queues and distribute the items across them?
Example 1:
You can have 6 queues named Queue0, Queue1, Queue2, Queue3, Queue4 and Queue5, and use a hash function such as hash(x) = current-hour % 6. This function returns values from 0 to 5, so you can put each item in Queue_hash(x) and read the queues individually based on the current time.
Example 2:
If the current time is 01:00, create a separate queue named Queue0700Hours; if the current time is 02:00, create another queue named Queue0800Hours, and so on.
This way you avoid having to wait or pause processing, and producers and consumers pick their queue independently based on the current timestamp.
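The two naming schemes above could be sketched roughly as follows. This is only an illustration: the function names bucket_queue_name and ready_queue_name and the constant DELAY_HOURS are made up for this example, and no SQS calls are made.

```python
from datetime import datetime, timedelta

DELAY_HOURS = 6  # the six-hour delay from the question


def bucket_queue_name(now: datetime) -> str:
    """Example 1: pick one of six queues from the current hour."""
    return f"Queue{now.hour % DELAY_HOURS}"


def ready_queue_name(now: datetime) -> str:
    """Example 2: name the queue after the hour six hours from now."""
    ready_at = now + timedelta(hours=DELAY_HOURS)
    return f"Queue{ready_at:%H}00Hours"


if __name__ == "__main__":
    one_am = datetime(2024, 1, 1, 1, 0)
    print(bucket_queue_name(one_am))
    print(ready_queue_name(one_am))
```

Both functions ignore the minutes, so a message written late in the hour reaches the top of its target hour after less than six hours. A consumer may still need to check each message's timestamp before processing it.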

Related

Create a parallel step function with a lambda

I have a question about AWS Step Functions.
I have a function that watches and updates data in databases. Because there can be anywhere from 1,000 to 1,000,000 items to update, I would like to process them in batches of 10,000 or 100,000 with a Lambda.
The optimal solution would be to process the batches in parallel, so that all the data is updated at the same time and the batches finish together.
For that, I would like to write a Lambda function that uses aws-sdk to create a parallel step function with X tasks, where each task handles 10,000 or 100,000 items of the database.
But when I read the aws-sdk documentation, it looks like there is no way to create a parallel step function, even from a template.
So my question is: is it possible to create a parallel step function from a Lambda function with aws-sdk? Or do you have a better solution to my problem?
Thanks in advance.
Update: To give you more information, on the first day of each month I have to update or insert an unknown amount of data in my DB, and to do so I need to call an API that takes 15 seconds to return the data (it's not our API, so I cannot improve its response time).
If I just use a Lambda function, it will time out after 15 minutes.
So I thought of using Step Functions to execute the Lambda function for each item. But if we have a lot of data, that may take more than 24 hours, and I would like a solution where I can execute my Lambda function in parallel to reduce the total time. That is why I thought of the parallel task of Step Functions.
But because the amount of data changes every month, I don't know how to dynamically increase or decrease the number of branches in my step function, and that's why I thought of generating my step function from another Lambda.
You wrote: "I have a function to watch and update data in databases."
I suppose what you need to watch is some kind of user or data event? What do you watch, and what do you update?
Can you provide more information before I give you some architectural suggestions?
By the way, it is Step Functions that orchestrates and invokes Lambda functions, not the other way around.
Updated answer:
So you seem to be facing the 15-minute hard limit on Lambda execution time. There are 3 approaches I can see:
1. Instead of using a Lambda function, use an ECS container or EC2 instance to handle the large volume of data processing and database writing. However, this requires a substantial code rewrite and an infrastructure/architectural change.
2. Figure out a way to break down the input data so you can fan out the handling to multiple Lambda function instances, i.e.: input data -> Lambda to break down the task -> SQS messages -> Lambda to handle each task. My concern is that the task of breaking down the input data may itself need substantial time.
3. Before the Lambda execution times out, record the current processed position and invoke the same Lambda function with the original event plus the position offset. The next Lambda instance then picks up the data processing from where the previous execution stopped. https://medium.com/swlh/processing-large-s3-files-with-aws-lambda-2c5840ae5c91
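The last approach (stop before the timeout and resume from a saved offset) might look roughly like the sketch below. It is not tied to any AWS API: process_with_budget, handle, and the 840-second budget (14 minutes, leaving a margin under the 15-minute limit) are assumptions made for illustration, and re-invoking the function with the returned offset is left to the caller.

```python
import time
from typing import Callable, Optional, Sequence


def process_with_budget(
    items: Sequence,
    handle: Callable[[object], None],
    start: int = 0,
    budget_seconds: float = 840.0,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[int]:
    """Process items[start:] until the time budget runs out.

    Returns the offset of the first unprocessed item, or None when
    every item has been handled.
    """
    deadline = clock() + budget_seconds
    for position in range(start, len(items)):
        if clock() >= deadline:
            return position  # caller re-invokes with this offset
        handle(items[position])
    return None
```

A caller would invoke the function again with the returned offset as start, for example by passing it in the event of the next invocation, until the result is None.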

What is the meaning of the error message "scheduler queue is full" in supercollider?

I am using a class that contains a function involving TempoClock.default.sched [I'm preparing an MWE]. If I make a new instance of the class and call the function, I get the following error message:
scheduler queue is full.
This message is repeated all the time. What does it mean?
Every clock has a queue to store scheduled events. The size of the queue is very large, but still limited (I think ~4096 items?). The "scheduler queue is full" error happens when this queue is full. This can happen when you legitimately have more than 4096 events scheduled on a given clock. A common bug, however, is accidentally queueing events far in the future, so that they hang out in the queue forever and eventually fill it up. It's easy to do this if you, e.g., call .sched(...), which takes a relative time value, but pass it an absolute time (which schedules the event far, far in the future).
If you actually need to schedule more than 4096 events at a given time, I believe the Scheduler class has a queue that can be arbitrarily large. AppClock uses this scheduler, so it shouldn't have a problem with large numbers of events. However, the timing of AppClock is less accurate than that of SystemClock, and it isn't good for fine-grained musical events. If you need highly accurate timing, you can use multiple TempoClocks, e.g. a different one for each instrument or for each kind of event.
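To make the relative-versus-absolute mistake concrete, here is a small Python simulation. It is not SuperCollider code and does not reflect how TempoClock is implemented: BoundedScheduler, its capacity, and its methods are invented for this illustration only.

```python
import heapq


class BoundedScheduler:
    """Toy clock with a fixed-size event queue (illustration only)."""

    def __init__(self, capacity: int = 4096, now: float = 0.0):
        self.capacity = capacity
        self.now = now
        self._queue = []  # heap of (due_time, name)

    def sched(self, delta: float, name: str) -> None:
        """Schedule an event `delta` seconds from now (a relative time)."""
        if len(self._queue) >= self.capacity:
            raise OverflowError("scheduler queue is full")
        heapq.heappush(self._queue, (self.now + delta, name))

    def advance(self, seconds: float) -> list:
        """Move time forward and return the names of the events that fired."""
        self.now += seconds
        fired = []
        while self._queue and self._queue[0][0] <= self.now:
            fired.append(heapq.heappop(self._queue)[1])
        return fired

    def pending(self) -> int:
        return len(self._queue)


if __name__ == "__main__":
    clock = BoundedScheduler(capacity=4, now=1000.0)
    clock.sched(1.0, "ok")                 # relative: due at 1001
    clock.sched(clock.now + 1.0, "stuck")  # absolute by mistake: due at 2001
    print(clock.advance(1.0), clock.pending())
```

The event scheduled with an absolute time stays in the queue long after the intended moment, and enough of these eventually exhaust the capacity.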

How to prevent other workers from accessing a message which is being currently processed?

I am working on a project that will require multiple workers to access the same queue to get information about a file which they will manipulate. Files range in size from mere megabytes to hundreds of gigabytes. For this reason, a visibility timeout doesn't seem to make sense, because I cannot be certain how long processing will take. I have thought of a couple of ways, but if there is a better way, please let me know.
1. The message is deleted from the original queue and put into a 'waiting' queue. When the program finishes processing the file, it deletes the message; otherwise the message is deleted from the waiting queue and put back into the original queue.
2. The message id is checked against a database. If the message id is found, the message is ignored. Otherwise the program starts processing the message and inserts the message id into the database.
Thanks in advance!
Use the default-provided SQS timeout but take advantage of ChangeMessageVisibility.
You can specify the timeout in several ways:
When the queue is created (default timeout)
When the message is retrieved
By having the worker call back to SQS and extend the timeout
If you are worried that you do not know the appropriate processing time, use a default value that is good for most situations, but don't make it so big that things become unnecessarily delayed.
Then, modify your workers to make a ChangeMessageVisibility call to SQS periodically to extend the timeout. If a worker dies, the timeout is no longer extended and the message reappears on the queue to be processed by another worker.
See: MessageVisibility documentation
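A worker that keeps extending the timeout while it works could be sketched as below. The sqs argument is assumed to be a boto3 SQS client (change_message_visibility and delete_message are its method names); it is passed in so the sketch itself does not import boto3. The function name process_with_heartbeat and the default timeout and interval values are illustrative assumptions.

```python
import threading
from typing import Callable


def process_with_heartbeat(
    sqs,
    queue_url: str,
    receipt_handle: str,
    work: Callable[[], object],
    timeout: int = 60,
    interval: float = 30.0,
):
    """Run work() while a background thread keeps the message invisible."""
    stop = threading.Event()

    def heartbeat() -> None:
        # Event.wait returns False on timeout, True once stop is set.
        while not stop.wait(interval):
            sqs.change_message_visibility(
                QueueUrl=queue_url,
                ReceiptHandle=receipt_handle,
                VisibilityTimeout=timeout,
            )

    thread = threading.Thread(target=heartbeat, daemon=True)
    thread.start()
    try:
        result = work()
    finally:
        stop.set()
        thread.join()
    # Reached only if work() did not raise: the message is done.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=receipt_handle)
    return result
```

The interval should stay well below the timeout so that each extension arrives before the previous one expires. If work() raises or the process dies, the message is not deleted and becomes visible again once the last extension runs out.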

Siddhi CEP 4.x: Multiple results per group when using time batch window

Using Siddhi 4.1.0.
Is there any way to apply window.timeBatch to upcoming events? As I understand it, the time window works on events that have already arrived.
For example, I am getting multiple results when using window.timeBatch(2 min) with a group by clause.
Within the 2 min duration I sent 50 input events periodically. The expected behavior is that all those events are batched together and returned as a single result (I used the count function to verify). But I get two results, 40 and 10. Does that mean the first 40 events fall into one time window and the rest into the next? In that case, how can I merge them, or get all those events as a single output for the 2 min?
I also want the time window to start when the first event arrives.
In my experience the time window runs in the background: if the events start arriving in the middle of the first time window, it collects events for only 1 min, and the events of the remaining minute are collected by the next time window. So I finally get 2 batched results.
Please suggest whether there is another way to achieve this.
Use case:
My use case is based on a time duration (window.timeBatch(1 min)) for monitoring switches.
The switch sends SNMP traps to the CEP. The traps are switchFanFailed and switchFanOk.
If I receive a switchFanFailed trap, I expect the next trap, switchFanOk, within 1 min. If the switchFanOk trap is not received within 1 min, the CEP should send a notification by email. Otherwise it should discard the trap.
Even though my trap generator always generates the switchFanFailed and switchFanOk traps within 1 min of each other, in some cases I do not receive both traps in the same window.
For example, if switchFanFailed arrives at the end of the window, at 0.50 sec, I should wait 1 min from that point for the switchFanOk trap.
Sorry, I am a bit confused by your use case. :)
Is your use case based on time, on length, or on both? A time batch window starts only after the first event arrives.
If you want to wait until 50 events (or any number of events) have arrived, you have to use a lengthBatch window. If you want to process based on time and batch the events, use a timeBatch window.
Do you have a fixed number of events? If not, CEP/Siddhi cannot wait or batch indefinitely. There should be something that marks the end of a batch, shouldn't there?
I had the same issue: my grouping query always created two summaries for any number of records sent to it. The cause in my case was that one value used in the grouping was different from the others. I suggest you check the grouping.
If you want to merge the two records, I suggest you use a time batch window, timeBatch(1 min), which will summarise the output of your current data set.
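The use case in the question (alert when switchFanOk does not follow switchFanFailed within one minute) depends on a timer that starts at each failure rather than on fixed batch boundaries. The plain Python sketch below shows only that logic; it is not SiddhiQL, and the function name fan_alerts and the (timestamp, switch, trap) tuple layout are assumptions made for this example.

```python
from typing import Iterable, List, Tuple

Event = Tuple[float, str, str]  # (timestamp in seconds, switch id, trap name)


def fan_alerts(events: Iterable[Event], now: float,
               timeout: float = 60.0) -> List[Tuple[str, float]]:
    """Return (switch, failure time) pairs with no switchFanOk within `timeout`.

    Events are assumed to arrive in timestamp order.
    """
    pending = {}  # switch id -> time of the unresolved failure
    alerts = []
    for timestamp, switch, trap in events:
        if trap == "switchFanFailed":
            pending.setdefault(switch, timestamp)
        elif trap == "switchFanOk" and switch in pending:
            failed_at = pending.pop(switch)
            if timestamp - failed_at > timeout:
                alerts.append((switch, failed_at))  # the OK came too late
    # Failures still unresolved and older than the timeout also alert.
    alerts.extend((s, t) for s, t in pending.items() if now - t > timeout)
    return sorted(alerts)
```

Because the one-minute wait is measured from each switchFanFailed event, a failure that arrives 50 seconds into a window is still given a full minute.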

Scheduling Task Based on Date time

I am working on a private video network where I have to schedule tasks based on the following parameters. There is a client portal, a server and a gateway.
Through the portal a user can request video streaming.
A user can also schedule streaming for some future time. Each task has a task ID.
A task is scheduled based on the following date and time parameters:
start time
end time
Repeat (every day,just once, a particular day)
start date
end date
Now at the gateway I need to add logic to implement the scheduled tasks.
I am exploring waitable timer objects and CreateWaitableTimerEx.
I am a bit confused about whether it is possible to implement the feature using these.
I am using C++ and MFC, and can't use third-party libraries.
I need suggestions on how to implement this.
There are dozens of ways to design this. It all depends on what you want to do and what the specific requirements are.
In a basic design I'd create an additional field called "next run time" which will be calculated by using start time, frequency and previous (if any) end time. Then I'd dump all the tasks in a queue sorted using this field.
The main scheduler will pick up the first queue item and create a suspended thread for that specific task. Then it calculates the time difference to the first item's 'next run time' and sleeps for that period. When it wakes up, it resumes the thread, picks the next queue item and repeats.
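The "next run time" field and the sorted queue described above could be sketched as follows. The sketch is in Python rather than C++/MFC and only shows the calculation: the Task fields, next_run_time and build_queue are illustrative names, and only the "every day" and "just once" repeat modes are modelled.

```python
import heapq
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta
from typing import List, Optional, Tuple


@dataclass
class Task:
    task_id: int
    start_time: time    # time of day at which streaming starts
    start_date: date
    end_date: date
    repeat_daily: bool  # False means "just once", on start_date


def next_run_time(task: Task, now: datetime) -> Optional[datetime]:
    """Return the next start strictly after `now`, or None if there is none."""
    day = max(task.start_date, now.date())
    candidate = datetime.combine(day, task.start_time)
    if candidate <= now:
        candidate += timedelta(days=1)
    if not task.repeat_daily and candidate.date() != task.start_date:
        return None
    if candidate.date() > task.end_date:
        return None
    return candidate


def build_queue(tasks: List[Task], now: datetime) -> List[Tuple[datetime, int]]:
    """Build a heap of (next run time, task id); the earliest entry is first."""
    heap = []
    for task in tasks:
        run_at = next_run_time(task, now)
        if run_at is not None:
            heapq.heappush(heap, (run_at, task.task_id))
    return heap
```

The scheduler loop would sleep for (heap[0][0] - now).total_seconds(), start that task, recompute its next run time, and push it back onto the heap if one remains.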
I would just create a timer thread callback loop that checks the time every minute and executes your task on the specified schedule.