I am working on a project which has a table "Games" with columns ID, Name, Start-time, End-time. Whenever an item is added to the table and its end-time is reached, I want to trigger an event (an SNS notification, but I can work my way around with anything else). The current implementation polls the table to check which "games" have expired and generates the event. Is there a technique, like SQS or SWF, that I can use to avoid polling? The problem with the current approach is that polling the DB is costing me too much money, since the complete infra is on the cloud.
AWS DynamoDB is a good fit for this. You can set the record to auto-expire with a TTL attribute, and the record will be removed automatically.
If you want to extend the auto-expiry time, you can update the TTL attribute of the record and the expiry will be extended.
Time to Live on DynamoDB:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html
You can configure a Lambda to capture the deleted record and do whatever you want from there.
DynamoDB Streams:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html
It is all event-based, with zero polling.
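As a rough sketch (Python with boto3), a stream-triggered Lambda could look like the following; the topic ARN environment variable and the idea of publishing the whole old image are assumptions for illustration, and TTL deletes are identified by the dynamodb.amazonaws.com service principal on the stream record:

```python
import json
import os

import boto3

sns = boto3.client("sns")
TOPIC_ARN = os.environ["GAME_EXPIRED_TOPIC_ARN"]  # hypothetical environment variable


def handler(event, context):
    for record in event.get("Records", []):
        # TTL expirations arrive as REMOVE events issued by the DynamoDB service principal.
        if record.get("eventName") != "REMOVE":
            continue
        if record.get("userIdentity", {}).get("principalId") != "dynamodb.amazonaws.com":
            continue  # an ordinary delete, not a TTL expiry

        old_image = record["dynamodb"].get("OldImage", {})  # the expired game's attributes
        sns.publish(
            TopicArn=TOPIC_ARN,
            Subject="Game expired",
            Message=json.dumps(old_image),
        )
```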
Hope it helps.
I want to create a DynamoDB WebAPI. It allows the creation and reading of Posts. Now I would like to implement a click counter that updates the popularity of a post each time a user requests it. For this reason, every time a GET request for a post comes in, I would change the Post object itself.
But I know that DynamoDB is optimized for reads, not for writes. So updating the object that is being fetched every time would probably be a problem.
So how can I measure the popularity of posts without slowing down the API itself? I was thinking of generating a random number for every fetch and only updating the counter if it is below 0.05 or something similar.
But is there a better solution for this?
DynamoDB isn't "optimized for reads"; it's optimized to provide "consistent, single-digit millisecond response times at any scale."
To optimize DDB for reads, you'd want to put an Amazon DynamoDB Accelerator (DAX) instance in front of it for "faster access with microsecond latency".
In actuality, DDB read/write performance isn't going to be an issue. In your case the network latency between your app and DDB will be orders of magnitude higher. By making two calls synchronously, one after the other, you'd be doubling your response time, regardless of what cloud DB you're writing to.
Assuming the data and counter are in the same record, the simple DDB solution in this case is not to make one call to GetItem() and another to UpdateItem(). Instead, make a single call to UpdateItem() with an UpdateExpression that uses the ADD action to add 1 to your counter, and set the ReturnValues parameter to either ALL_OLD or ALL_NEW.
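A minimal sketch of that single call with boto3, assuming a "Posts" table keyed by "id" and a numeric "views" attribute (both names are made up):

```python
import boto3

table = boto3.resource("dynamodb").Table("Posts")  # assumed table name


def get_post_and_count_view(post_id):
    # One round trip: increment the counter and get the full, updated item back.
    response = table.update_item(
        Key={"id": post_id},
        UpdateExpression="ADD views :inc",
        ExpressionAttributeValues={":inc": 1},
        ReturnValues="ALL_NEW",
    )
    return response["Attributes"]
```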
Other, more complex solutions:
Assuming you've already got the data for display, make an asynchronous call to UpdateItem().
At scale, you might consider decoupling the counter update from your app: your app posts an SQS message, which is processed by a Lambda that can batch the updates to DDB (see the sketch below).
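A sketch of that decoupled path, assuming a queue-triggered Lambda and messages shaped like {"postId": "..."} (all names are illustrative); aggregating the increments per post means each post needs only one UpdateItem per batch:

```python
import json
from collections import Counter

import boto3

table = boto3.resource("dynamodb").Table("Posts")  # assumed table name


def handler(event, context):
    # Sum the increments per post across the whole SQS batch.
    counts = Counter()
    for message in event["Records"]:
        body = json.loads(message["body"])  # e.g. {"postId": "abc"}
        counts[body["postId"]] += 1

    # One UpdateItem per post instead of one per click.
    for post_id, increment in counts.items():
        table.update_item(
            Key={"id": post_id},
            UpdateExpression="ADD views :inc",
            ExpressionAttributeValues={":inc": increment},
        )
```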
I have this flow in which we have to persist some items in DynamoDB for a specific time. After the items have expired, we have to call some other services to notify them that the data has expired.
I was thinking about two solutions:
1) Move expiry check to Java logic:
Retrieve the DynamoDB data in batches, check for expired items in Java, then delete the data in batches and notify the other services.
There are some limitations:
BatchGetItem lets you retrieve at most 100 items.
BatchWriteItem lets you delete at most 25 items.
2) Move expiry check to the DB logic:
Query DynamoDB to check which items have expired (and delete them), and return the IDs to the client so we can notify the other services.
Again, there are some limitations:
The result set from a Query is limited to 1 MB per call.
For both solutions there will be a job that runs periodically, or we will use an AWS Lambda that is triggered periodically and calls an endpoint in our app that deletes the items from the DB and notifies the other services.
My question is whether DynamoDB is a good fit for my case, or whether I should use a relational DB such as MySQL that doesn't have these kinds of limitations. What do you think? Thanks!
Have you considered using the DynamoDB TTL feature? This allows you to designate a time-based attribute in your table that DynamoDB will use to automatically delete items once that time is reached.
This requires no implementation on your part and avoids the polling, querying, and batching limitations entirely. You will need to populate a TTL attribute, but you may already have that information if you are rolling your own expiration logic.
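For example, populating the TTL attribute is just a matter of writing an epoch timestamp (in seconds) alongside the item; a small sketch with boto3, where the table name, key, and "expireAt" attribute name are assumptions:

```python
import time

import boto3

table = boto3.resource("dynamodb").Table("Items")  # assumed table name


def put_item_with_ttl(item_id, payload, ttl_seconds):
    table.put_item(
        Item={
            "id": item_id,
            "payload": payload,
            # DynamoDB TTL expects an epoch timestamp in seconds.
            "expireAt": int(time.time()) + ttl_seconds,
        }
    )
```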
If other services need to be notified when a TTL expiry occurs, you can create a Lambda that processes the DynamoDB stream and takes action when a TTL delete event occurs.
Basically, I have a web service that receives a small JSON payload (an event) a few times per minute, say 60. Each event must be sent to an SQS queue only after 1 year has elapsed (it's OK to have it happen a few hours sooner or later, but the day of the month should be exactly the same).
This means I'll have to store more than 31 million events somewhere before the first one should be sent to the SQS queue.
I thought about using SQS message timers, but they have a limit of only 15 minutes, and as pointed out by @Charlie Fish, it's weird to have an element lurking around in a queue for such a long time.
A better possibility could be to schedule a Lambda function using a cron expression for each event (I could end up with millions or billions of scheduled Lambda functions in a year, if I don't hit an AWS limit well before that).
Or I could store these events on DynamoDB or RDS.
What would be the recommended / most cost-effective way to handle this using AWS services? Scheduled Lambda functions? DynamoDB? PostgreSQL on RDS? Or something entirely different?
And what if I have 31 billion events per year instead of 31 million?
I cannot afford to lose ANY of those events.
DynamoDB is a reasonable option, as is RDS; SQS for long-term storage is not a good choice. However, if you want to keep your costs down, I may suggest another approach: accumulate the events for a single 24-hour period (or a smaller interval if that is desirable), and write that set of data out as an S3 object instead of keeping it in DynamoDB. You could employ DynamoDB or RDS (or just about anything else) as the place to accumulate events for the day (or hour) before writing that data out to S3 as a single set for the interval.
Each S3 object could be named appropriately, indicating either the date/time it was created or the date/time it needs to be used, i.e. 20190317-1400 to indicate that this file needs to be used on March 17th, 2019 at 2 PM.
I would imagine a Lambda function, triggered by a CloudWatch event every 60 minutes, that scans your S3 bucket looking for files that are due to be used, reads in the JSON data, puts the events into an SQS queue for further processing, and moves the processed S3 object to another 'already processed' bucket.
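Something along these lines could work for that hourly Lambda; the bucket names, the key format (YYYYMMDD-HHMM), the queue URL, and the assumption that each file holds a JSON array of events are all placeholders based on the scheme above, not a definitive implementation:

```python
import json
import os
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

PENDING_BUCKET = os.environ["PENDING_BUCKET"]      # hypothetical
PROCESSED_BUCKET = os.environ["PROCESSED_BUCKET"]  # hypothetical
QUEUE_URL = os.environ["QUEUE_URL"]                # hypothetical


def handler(event, context):
    # Keys are named by due time (e.g. 20190317-1400), so a lexicographic
    # comparison against "now" tells us which files are due.
    now_key = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M")

    for obj in s3.list_objects_v2(Bucket=PENDING_BUCKET).get("Contents", []):
        key = obj["Key"]
        if key > now_key:
            continue  # not due yet

        body = s3.get_object(Bucket=PENDING_BUCKET, Key=key)["Body"].read()
        for item in json.loads(body):  # assumes each file is a JSON array of events
            sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(item))

        # Move the processed file to the 'already processed' bucket.
        s3.copy_object(
            Bucket=PROCESSED_BUCKET,
            Key=key,
            CopySource={"Bucket": PENDING_BUCKET, "Key": key},
        )
        s3.delete_object(Bucket=PENDING_BUCKET, Key=key)
```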
Your storage costs would be minimal (especially if you batch the events up by day or hour), S3 has eleven 9s of durability, and you can archive older events off to Glacier if you want to keep them around even after they are processed.
DynamoDB is a great product; it provides redundant storage and super high performance, but I see nothing in your requirements that would warrant incurring that cost or requiring the performance of DynamoDB. Why keep millions of records in an 'always on' database when you know in advance that you won't need to use or see the records until a year from now?
I mean, you could store some form of the data in DynamoDB and run a daily Lambda task to query for all the items that are more than a year old, remove those from DynamoDB, and import them into SQS (a rough sketch follows below).
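A very rough sketch of that daily task, under heavy assumptions: the table name ("ScheduledEvents"), the "dueAt" and "payload" attributes (with the payload stored as a JSON string), and the queue URL are all made up for illustration, and a GSI or date-partitioned keys would scale better than a full scan:

```python
import time

import boto3

table = boto3.resource("dynamodb").Table("ScheduledEvents")  # assumed table name
sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/events"  # placeholder


def handler(event, context):
    now = int(time.time())
    scan_kwargs = {
        "FilterExpression": "dueAt <= :now",
        "ExpressionAttributeValues": {":now": now},
    }
    while True:
        page = table.scan(**scan_kwargs)
        for item in page.get("Items", []):
            # Assumes the original JSON payload was stored as a string attribute.
            sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=item["payload"])
            table.delete_item(Key={"id": item["id"]})
        if "LastEvaluatedKey" not in page:
            break
        scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
```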
As you mentioned, SQS doesn't have this functionality built in, so you need to store the data using some other technology. DynamoDB seems like a reasonable choice based on what you have mentioned above.
Of course, you also have to think about whether doing a cron task once per day is sufficient. Do you need it to be exactly after 1 year? Is it acceptable to have it be one year and a few days? Or one year and a few weeks? What is the acceptable window for importing into SQS?
Finally, the other question you have to think about is whether SQS is even reasonable for your application. Having a queue with a 1-year delay seems kinda strange. I could be wrong, but you might want to consider something besides SQS, because SQS is meant for much more immediate tasks. See the examples on this page (decouple live user requests from intensive background work: let users upload media while resizing or encoding it; allocate tasks to multiple worker nodes: process a high number of credit card validation requests; etc.). None of those examples are really meant for a year of wait time before executing. At the end of the day it depends on your use case, but off the top of my head I can't think of a situation where it makes sense to delay entry into an SQS queue for a year. There seem to be much better ways to handle this, but again I don't know your specific use case.
EDIT: Another question is whether your data is consistent. Is the amount of data you need to store consistent? How about the format? What about the number of events per second? You mention that you don't want to lose any data, so for sure build in error handling and backup systems. But DynamoDB doesn't scale the best if one moment you store 5 items and the next moment you want to store 5 million items. If you set your capacity to account for 5 million, then it is fine. But the question is whether the amount of data and its frequency will be consistent or not.
I have to set up a management process on AWS.
To keep things simple: I have some clients that send me heartbeats, let's say every 5 minutes, via SOAP requests to my SOAP server deployed as an Elastic Beanstalk NodeJS app. Every time I receive a heartbeat, I store the time I received it in a DynamoDB table by updating a field on that client's row.
I now need to create a process that, if I haven't received a heartbeat in the last 30 minutes, does stuff (updates other tables, calls Lambda functions, etc.). I don't know yet how many clients I will have, but the number will potentially grow over time, and they will be connected to my server 24/7.
I was hoping for something like an event that triggers a Lambda function (or posts a message to an SNS topic) once a specific row in the table has not been updated for 30 minutes, but I don't know how to get this last part to work. This check should cover every row in the table.
How would you do it?
Thank you!
You can use DynamoDB with TTL, DynamoDB Streams and AWS Lambda for this. No need for cron.
When you create a new row or update an existing row, set that row's TTL to 30 minutes in the future.
When that TTL is reached and DynamoDB deletes the row (note that TTL deletions are not instantaneous and can lag the expiry time somewhat), the deletion shows up in the DynamoDB stream, which you can use as a trigger for a Lambda function.
This Lambda function can then do the custom processing that you want (i.e. update other tables, call Lambda functions, etc.).
Take note that the original DynamoDB row will be deleted when its TTL expires. If you need to keep that record, you can have the Lambda function recreate it and set a new TTL another 30 minutes in the future.
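For the write side, a minimal sketch of the heartbeat update, assuming a "Clients" table keyed by clientId with TTL enabled on an "expiresAt" attribute (all names are illustrative):

```python
import time

import boto3

table = boto3.resource("dynamodb").Table("Clients")  # assumed table name


def record_heartbeat(client_id):
    now = int(time.time())
    table.update_item(
        Key={"clientId": client_id},
        UpdateExpression="SET lastHeartbeat = :now, expiresAt = :ttl",
        ExpressionAttributeValues={
            ":now": now,
            ":ttl": now + 30 * 60,  # push the expiry 30 minutes into the future
        },
    )
```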
References:
DynamoDB Streams and Time To Live
DynamoDB Streams and AWS Lambda Triggers
Do you need to check on a 30-minute interval across all clients? That can be done by a cron job on a server that executes a statement like Date() - Timestamp > 30 min (just stating the logic; DynamoDB only changes the syntax).
Or do you need to check 30 minutes after the last time a particular client sent a heartbeat?
In that case your cron would run every minute with the same logic as above.
If you need to run a Lambda on a schedule, refer to this link: Using Lambda in schedule.
Hope this helps. If you need further help with the syntax, I'm ready to help.
I have the following use case scenario for which I am considering AWS services, to see how I can come up with a scalable solution. Thanks in advance for your help!
Scenario:
Users can sign up to an application (named, say, 'Let's Remind' or something else) using their email and phone.
The app does one thing that is to send email and sms alerts to user.
A user can create n tasks for which he wants to be reminded. For instance, he can set up a monthly reminder for paying card dues. Currently the value of n is from 5 to 10 per user.
The notifications are flexible, meaning they can be daily, weekly, monthly, or bi-weekly. The user can also specify the start date of a notification. The end date is the date when the event is due (for instance, the day the card payment is due). Once this date has passed, the notification is rendered inactive for the current month.
For daily, weekly, and bi-weekly notifications, the notifications are deleted once the event date has passed. These are not recurring in nature.
For monthly recurring events, such as payment of apartment rent, the notification itself is not deleted but is rendered inactive after the event due date. Once the next event cycle (typically the next month's billing cycle for the payments use case) starts, the notification comes back to life and starts all over again.
A user can delete any event at any time. If an event is deleted, the notifications for that event are deleted as well.
First of all, I hope the use case is clear. Now here are my thoughts so far on solving it:
1) Use SNS, since I need to send both email and SMS; SES only supports email.
2) When a user registers for the app, create 2 subscriptions (one for his email endpoint and one for his SMS endpoint) and also create a topic for the user (maybe named with a dynamically generated random user ID).
3) Once the user creates an event (e.g. a reminder for monthly apartment rent), store the event data such as userid, startdate, duedate, frequency, and isactive in a DynamoDB table.
4) Create a Lambda function that wakes up when an entry is written to the DynamoDB table (step 3); it will do the following:
i) read the event data from the DynamoDB table,
ii) determine the next date the notification should be sent, based on the current date and the event data,
iii) for active events (check the isActive column of the DynamoDB record), create a scheduled cron expression rule in CloudWatch Events based on ii above and add the user's topic (created in step 2 above) as the target. For now, the notification message is static.
I have some doubts/queries about step iii:
Is it possible to create a CloudWatch Events cron rule dynamically and add the user's topic as a target dynamically, as I described? Or is it better to trigger a second Lambda function dedicated to sending messages to the user's topic via an SNS notification? Which approach is better for this use case?
If the user base grows large, is it recommended to create one topic per user?
Am I on the right track with my approach above, in general, from a scalability point of view?
If not, can anyone suggest a better idea for implementing this use case?
Thanks in advance!
This will not work.
For an SNS email subscriber to receive email notifications sent via SNS, it has to first confirm the subscription. You cannot just create subscriptions on the fly and send them email notifications.
I don't think SNS fits your use case. You would be better off sending email notifications using SES.
You can write your scheduling logic in Lambda, though.
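If you go the SES route, sending the reminder email itself is a small call; here is a sketch with boto3, where the sender address must be a verified SES identity and all addresses and text are placeholders:

```python
import boto3

ses = boto3.client("ses")


def send_reminder(to_address, subject, body_text):
    ses.send_email(
        Source="reminders@example.com",  # must be a verified SES identity
        Destination={"ToAddresses": [to_address]},
        Message={
            "Subject": {"Data": subject},
            "Body": {"Text": {"Data": body_text}},
        },
    )
```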