AWS Lambda best practices for Real Time Tracking


We currently run an AWS Lambda function that simply redirects the user to a different URL. The function is invoked via API Gateway.
For tracking purposes, we would like to create a widget on our dashboard that provides real-time insights into how many redirects are performed each second. The creation of the widget itself is not the problem.
My main question is which AWS service is best suited for telling our other services that an invocation took place. We plan to register the invocation in our database.
Some additional requirements:
low latency (< 5 seconds), so that the data is effectively real-time
almost no additional wait time for the user; we aim to redirect the user as fast as possible
Many thanks in advance!
Best Regards
Martin

I understand that your goal is to simply persist the information that an invocation happened somewhere with minimal impact on the response time of the Lambda.
For that purpose I'd probably use an SQS standard queue and just send a message to the queue that the invocation happened.
You can then have an asynchronous process (Lambda, Docker, EC2) process the messages from the queue and update your Dashboard.
Depending on the scalability requirements looking into Kinesis Data Analytics might also be worth it.
It's a fully managed streaming data solution and the analytics part allows you to do sliding window analyses using SQL on data in the Stream.
In that case you'd write the info that something happened to the stream, which also has a low latency.
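The SQS approach above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the function names, the message fields (`target_url`, `ts`) and the queue URL are assumptions, and `sqs_client` is expected to be something like `boto3.client("sqs")`, whose `send_message` call takes `QueueUrl` and `MessageBody`.

```python
import json
import time
from collections import Counter


def build_redirect_event(target_url, now=None):
    """Serialize one redirect as a small JSON message (field names are hypothetical)."""
    ts = time.time() if now is None else now
    return json.dumps({"target_url": target_url, "ts": ts})


def record_redirect(sqs_client, queue_url, target_url, now=None):
    """Producer side, called from the redirect Lambda.

    sqs_client is assumed to behave like boto3.client("sqs").
    """
    body = build_redirect_event(target_url, now)
    return sqs_client.send_message(QueueUrl=queue_url, MessageBody=body)


def redirects_per_second(message_bodies):
    """Consumer side: count redirects per whole second for the dashboard."""
    counts = Counter()
    for body in message_bodies:
        counts[int(json.loads(body)["ts"])] += 1
    return dict(counts)
```

The redirect Lambda only pays for one short call to the queue; the counting and the database write happen in the asynchronous consumer.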

Related

Triggering an AWS Lambda Function based on multiple (1k-10k) schedules

We have an AWS Lambda function which queries some data for a client from our DB and sends a report to the client. Some clients want daily reports, some might need weekly or monthly reports. The number of clients can go up ~1000 and each client might have ~10 such reports.
So we are looking for a way to trigger the Lambda function with different parameters based on schedules set by each client.
For Example:
Client A wants a daily report of their data to be sent to abc@clienta.com and Client B wants a weekly report of their data to be sent to xyz@clientb.com. So the Lambda function will be invoked twice on Sunday 12 AM (for both clients) and once on Monday-Saturday 12 AM (for Client A).
We found the following solutions on AWS, but both have some limitations.
Approach 1: Use CloudWatch Events
We can create a CloudWatch Events Rule for each client and each report that could trigger our Lambda function on each schedule.
Pros:
Simple setup, easy to implement.
Cons:
There is a limitation of 100 Event Rules per AWS Account. It's mentioned that we can contact AWS to get it increased, but we are not sure if it can be increased to the number we are looking for (Currently it is ~10k, but we would prefer a solution in which there is no such limit). Also, a limit of 100 per account gives an indication that this is not a suitable solution for such a use case.
Approach 2: Using Step Functions
For each client and each report, we can create one AWS State Machine. We can use the Iterator pattern in Step Functions to wait for a day/week/month and then re-invoke the Lambda Function.
Pros:
No limitations on number of State Machines, so this enables us to scale easily.
Cons:
Step Functions have a limitation that they can run for a year, at maximum. This will be a problem in our case because the users will need to get the reports for a much longer period. There is a way to overcome this in Step Functions. Just before it's about to reach the 1-year limit, we can cancel the execution and start a fresh execution. So overall, this solution looks complex.
Can someone suggest a better solution for this on AWS?

Do you really need a CloudWatch rule for each client? Why not use something like the following architecture?
Have CloudWatch kick off a Lambda that checks the schedules for all clients each day (or at whatever the most frequent report schedule you allow is). You don't want this to take a long time, so it just checks a database (e.g. DynamoDB) of schedules and drops metadata about any reports that need to be generated (type of report, client information, destination email) onto an SQS queue. In the worst case, it executes and finds nothing to schedule, but this should only take seconds, so the cost of running it every day is very low.
Then you have a Lambda that consumes the queue and actually does the report generation and emailing. This report generator Lambda will scale and spin up as many instances as it needs to handle the messages on the queue. You can set a concurrency limit for the report generator Lambda to ensure it doesn't spin up too many at a time, if that is a concern once you have thousands of clients.
The definition and deployment of all these components can easily be automated via AWS SAM.
Hope this alternate approach gives you a few more ideas.
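As a rough sketch of the dispatcher step described above, the daily Lambda could decide which reports are due like this. The schedule record layout (`frequency`, `weekday`, `day_of_month`, `client`, `report`, `email`) is an assumption made for illustration; reading the records from the database and sending the resulting messages to SQS are left out.

```python
import datetime


def is_due(schedule, today):
    """Return True if a report with this schedule should run on `today`."""
    frequency = schedule["frequency"]
    if frequency == "daily":
        return True
    if frequency == "weekly":
        # date.weekday(): Monday is 0, Sunday is 6
        return today.weekday() == schedule["weekday"]
    if frequency == "monthly":
        return today.day == schedule["day_of_month"]
    raise ValueError(f"unknown frequency: {frequency!r}")


def reports_to_enqueue(schedules, today):
    """Build one queue message (as a dict) per report that is due today."""
    return [
        {"client": s["client"], "report": s["report"], "email": s["email"]}
        for s in schedules
        if is_due(s, today)
    ]
```

With a daily schedule for one client and a weekly Sunday schedule for another, this yields two messages on a Sunday and one on every other day, matching the example in the question.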

You can combine both approaches to get the best result.
Step 1: Use a Step Functions state machine to run your Lambdas.
Step 2: Trigger the state machine from CloudWatch, based on Step Functions execution events (SUCCESS, FAILED, etc.).
This way, when step 1 fails or completes its 1-year run, a CloudWatch event can trigger it again, based on the JSON input you pass.

Concurrency Issues between Google Cloud Functions and Sheets v4 API

I have 2 problems related to managing concurrency between Google Cloud Functions.
The setup is I have a slackbot enabling use of a "checkoff" slash command. This slash command sends another Slack user yes/no buttons whether to authorize the checkoff. When the user clicks an option, it sends that response to a Google Cloud Function which 1) Sends a response back to Slack to close the buttons and 2) Records the checkoff if authorized in a Google Sheet using the Sheets v4 API (spreadsheets.values.append)
Issue #1: Users who spam the yes/no buttons trigger multiple Slack requests to the Cloud Function before the Function can acknowledge and close the buttons. This leads to multiple Cloud Functions spawning and multiple checkoffs being recorded in the sheet. If I could maintain state, I could save unique information from the request and check to make sure that request had not been already serviced. Is there a pattern to do this with Cloud Functions?
Issue #2: Sometimes multiple checkoffs are authorized at similar times by independent users. These requests spawn independent Cloud Function instances which attempt to append to the Sheet. There is a rare case where another Function writes in between the first Function's read then write causing an overwrite. I would use a read-write lock to deal with this but there's no way to share concurrency resources between Cloud Functions I'm aware of.
(Less important) Issue #3: I'd really love to batch the spreadsheet writes but it seems against the grain of serverless computing in the 1st place. Is there a way to do this?
Any help is appreciated.

I had a similar issue with Cloud Functions and Firestore. In my case I was receiving notifications about new and updated data in the form of 'order/123' and then creating a copy of the order in Firestore. The problem was that sometimes multiple notifications arrived at the same time, resulting in duplicated orders because of race conditions.
My solution was to use Google Cloud Tasks (https://console.cloud.google.com/cloudtasks). One Cloud Function receives the notification and adds a message to a queue that is processed with a concurrency of 1; another Cloud Function then takes care of the processing.
Receive notification -> Post message to queue (concurrency 1) -> Process message
In this case I have 1 queue per customer. I am sure there are better ways, but for now this is good enough. You can later route several customers to the same queue, as long as a given customer always lands on the same queue.
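One way to keep a customer pinned to the same queue when several customers share queues is a stable hash of the customer ID. The sketch below is only an illustration: the `orders-<n>` queue naming and the function name are hypothetical, and creating the queues and enqueueing tasks is not shown.

```python
import hashlib


def queue_for_customer(customer_id, num_queues):
    """Map a customer to one of num_queues queue names, deterministically."""
    # A cryptographic digest is used instead of the built-in hash(), which is
    # salted per process for strings and would not be stable across instances.
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % num_queues
    return f"orders-{index}"
```

Because each queue is processed with a concurrency of 1, all notifications for one customer are still handled one at a time, even though unrelated customers may share that queue.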

Display real time data on website that scales?

I am starting a project where I want to create a website which will display LIVE flight information and status. We have all seen this at airports. An example is given here - http://www.computronics.biz/productimages/prodairport4.jpg. As you can see, this information changes continuously. The website will talk to a backend API and this backend API will talk to a database. Now the important part is that the flight information in the database will be updated by the airline itself. There could be several airlines and they will each update their own data. I have drawn a diagram and uploaded it here - https://imgur.com/a/ssw1S.
Now those airlines will obviously have an interface (website talking to some backend API) through which they will update the database.
Now here is my attempt to solve it. We need to have some sort of trigger such that if any airline updates a flight detail in the database between current time - 1 hour to current + 4 hours (website will only display few hours of flights), we need to call the web api and then send the update to the website in the real time. The user must not refresh the page at all. At the same time the website needs to scale well i.e. if 1 million users are on the website, and there is an update in the database in the correct time range, all 1 million user's website should get updated within a decent amount of time.
I did some research and it looks like we need to have an event based approach. For example - we need to create a function (AWS lambda or Azure function) that should be called whenever there is an update in the database (Dynamo DB for example) within the correct time range. This function then should call an API which should then update the website through web socket technology for example.
I am not looking for any code but just some alternative suggestions on how this can be solved in a scalable way. Also how do we test scalability?

Don't use serverless functions (Lambda/Azure Functions)
Although I am a huge fan of serverless functions, and am currently running a full web app in Lambda, I don't think they are needed for your use case, and they don't make sense economically. As you've answered in the comments, the airlines will not write directly to the database; they'll push to an API, meaning you are explicitly told when flights have changed. When an airline has sent you new data you can simply propagate this to all the browser endpoints via websockets. This keeps the design very simple. There is no need to artificially create a database event that then triggers a function that will then tell you a flight has been updated. That's like removing your doorbell and replacing it with a motion detector that triggers a doorbell :)
Cost
Money always deserves its own section. Lambda is more of an economic breakthrough than a technological one. You have to know when it's cost effective. You pay per request, so if you're dealing with a process that handles 10,000 operations a month, or something that only fires 1,000 times a day, then Lambda is dirt cheap and practically free. You also pay for the length of time the function is executing and the memory consumed while executing. Generally, it makes sense to use Lambda functions where a dedicated server would be sitting idle most of the time. So instead of a whole EC2 instance, AWS provides you with a container on demand. There are points at which high request rates and constantly running processes make Lambda more expensive than EC2. This article discusses how it's generally cheaper to use Lambda up to a point -> https://www.trek10.com/blog/lambda-cost/ The same applies to Azure Functions and Google's equivalent. They are all just containers offered on demand.
If you're dealing with flight information I would imagine you will have thousands of flights being updated every minute so your lambda functions will be firing constantly as if you were running an EC2 instance. You will end up paying a lot more than EC2. When you have a service that needs to stay up 24/7 and run 24/7 with high activity that is most certainly a valid use case for a dedicated server or servers.
Proposed Solution
These are the components I would use:
Message Queue of some sort (RabbitMQ or AWS SQS with SNS perhaps)
Web Socket Backend (The choice will depend on programming language)
Airline input API (REST,GraphQL, or maybe AWS Kinesis Data Firehose)
The airlines publish their data to a back-end API. The updates are stored on a message queue, and the web application that actually displays the results to users, via websockets, reads from the queue.
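The fan-out step of this design can be sketched in a few lines. This is a simplified, in-process illustration only: `Broadcaster`, the `departure` field and the `send` callables (standing in for websocket connections) are assumptions, and the time window is the "current time - 1 hour to current + 4 hours" range from the question.

```python
import datetime


def in_display_window(departure, now):
    """True if the flight falls between now - 1 hour and now + 4 hours."""
    earliest = now - datetime.timedelta(hours=1)
    latest = now + datetime.timedelta(hours=4)
    return earliest <= departure <= latest


class Broadcaster:
    """Holds connected clients and pushes each relevant update to all of them."""

    def __init__(self):
        self._clients = []

    def connect(self, send):
        # `send` stands in for a websocket connection's send method.
        self._clients.append(send)

    def publish(self, update, now):
        """Forward one update read from the queue; return how many clients got it."""
        if not in_display_window(update["departure"], now):
            return 0
        for send in self._clients:
            send(update)
        return len(self._clients)
```

In a real deployment each websocket server instance would hold its own set of connections and read the same updates from the queueing service.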
Scalability
For scalability you can run the websocket application on multiple EC2 instances (all reading from the same queuing service) in an autoscaling group, so under extra load more instances will be created automatically, hence the name "autoscaling". Those instances can sit behind an elastic load balancer. There is lots of AWS documentation on how to do this, and it's their flagship design pattern. If you use AWS SQS you don't have to manage the scalability details of the queue yourself; AWS handles that. The only real components to scale are your websocket application and the flight data input endpoint. You can run the flight API in an autoscaling group as well, but AWS does offer an additional tool for high-traffic data processing. I detail that below.
Testing Scalability
It would be fairly easy to have a mock airline blast your service with thousands and thousands of fake updates and on the other end you can easily run multiple threads of selenium tests simulating browser clicks and validating that the UI is still operational.
Additional tools
If it ends up being large amounts of data, rather than using a conventional REST API for your flight update service you could consider a service AWS offers specifically for dealing with large amounts of real-time updates (Kinesis Data Firehose): https://aws.amazon.com/kinesis/data-firehose/ But I've never used it.

First, please don't over think this. This is a trivial problem to solve and doesn't require any special techniques, technologies or trendy patterns & frameworks.
You actually have three functional areas you can address almost separately.
Ingestion - Collection and normalization of the data from the various sources. For this, you'll need a process and transformation engine, LogicApps or such.
Your databases. You'll quickly learn that not all flights are the same ;). While it might seem so, the amount of data isn't that much. Instances of MySQL/SQL Server tuned for a particular function will work just fine. Hint, you don't need to have data for every movement ready to present all the time.
Presentation. The data API and UIs. This, really, is the easy part. I would suggest you use basic polling at first. For reasons you will never have any control over, the SLA for flight data is ~5 minutes so a real-time client notification system is time you should spend elsewhere at first.

AWS Lambda as a social media post scheduler? Single function with unique data per scheduled event trigger?

I would like to use AWS Lambda as a social media post scheduler, but I can't find an elegant way to do so. In our app, users create social media posts and set a time. We then post them via the social network's API at the time specified.
I need to be able to schedule a Lambda to run once at a scheduled time and with unique data (being the user's token and the body of the post) in order to accomplish it. Here's an example:
John wants to post to Twitter next Thursday at 2pm. He's scheduled a
post with the body "Hello world!" for that time via our web app. The
app will talk to AWS Lambda via the API and set a Lambda function to
fire one time next Thursday at 2pm. That function would fire a request
to the Twitter API with John's token and the body ("Hello world!").
Would love to be able to do this serverless with Lambda, but I can't find a great way. If you could pair a Cloudwatch scheduled event trigger with a unique payload, that might work, but I don't see that it's possible. Otherwise, it seems this would require creating a new Lambda function for each post with the data hard-coded or having the Lambda hit the database to look for the scheduled post. Creating potentially hundreds of bespoke Lambda functions seems like a huge mess, and hitting the database at Lambda runtime seems like undue stress on the database since we have all the data we need in-hand at the time we schedule.
Any suggestions for how I might accomplish this with Lambda? Is there another AWS service that is better suited to the task? Should I give up on serverless and just set up another EC2 instance to handle the scheduler?

You definitely don't want to be creating a function + event per scheduled task.
The scalable way to do this would be to schedule a single function to run regularly (e.g. hourly) and check a database to see if any posts were scheduled for the last hour (i.e. since the last run), and perform them if so.
The reason I am suggesting a database is because you need to manage your state (that is, the post payload/details) somewhere, and relying on CloudWatch Events for this is not the right way, for all the reasons you've listed in your question.
An alternative to a database would be to put the payload in S3, and have the scheduled function check a specific location/bucket for the payloads that need processing. Lambda to S3 communication is very fast, and you don't need to worry about load or network transfers.
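The "since the last run" check can be sketched as a simple window filter. The record layout (a `post_at` timestamp per post) and the function name are assumptions for illustration; loading the posts from the database or S3 and calling the social network's API are not shown.

```python
import datetime


def due_posts(posts, last_run, now):
    """Return posts scheduled in the half-open window (last_run, now]."""
    return [p for p in posts if last_run < p["post_at"] <= now]
```

Using a half-open window means a post scheduled exactly at the boundary between two runs is picked up by exactly one of them, provided each run passes the previous run's `now` as its `last_run`.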

You could use AWS Step Functions for this task. With these you can model a state machine which waits for the exact timestamp to trigger.
https://aws.amazon.com/step-functions/
The only drawback is that the documentation is still pretty scarce, but if you log into the AWS console, it provides some samples showing how to implement such wait processes.
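As a hedged sketch of what such a state machine could look like, the definition below uses a Wait state with `TimestampPath`, which pauses the execution until the timestamp found in the execution input. The state names, the `post_at` input field and the Lambda ARN passed in are assumptions for illustration, and the definition is built as a Python dict only so it can be serialized to JSON.

```python
import json


def build_state_machine(task_resource_arn):
    """Return a state machine definition that waits, then runs one task."""
    return {
        "Comment": "Wait until the scheduled time, then publish the post",
        "StartAt": "WaitUntilScheduled",
        "States": {
            "WaitUntilScheduled": {
                "Type": "Wait",
                # Reads an ISO 8601 timestamp from the execution input,
                # e.g. {"post_at": "2030-01-03T14:00:00Z", "body": "Hello world!"}
                "TimestampPath": "$.post_at",
                "Next": "PublishPost",
            },
            "PublishPost": {
                "Type": "Task",
                "Resource": task_resource_arn,  # hypothetical Lambda ARN
                "End": True,
            },
        },
    }


def definition_json(task_resource_arn):
    """Serialize the definition as the JSON string a state machine is created from."""
    return json.dumps(build_state_machine(task_resource_arn), indent=2)
```

Each scheduled post would then be one execution of the same state machine, started with its own input, so the unique payload travels with the execution rather than with a separate function.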

AWS CloudWatchLog limit

I am trying to find a centralized solution so that I can move my application logging out of the database (RDS).
I was thinking of using CloudWatch Logs but noticed that there is a limit for PutLogEvents requests:
The maximum rate of a PutLogEvents request is 5 requests per second
per log stream.
Even if I break my logs into many streams (based on EC2 instance and log type: error, info, warning, debug), the limit of 5 requests per second is still very restrictive for an active application.
The other solution is to somehow accumulate logs and send PutLogEvents with a batch of log records, but that means I am forced to use a database to accumulate those records.
So the questions are:
Maybe I'm wrong and the limit of 5 requests per second is not so restrictive?
Is there any other solution that I should consider, for example DynamoDB?

PutLogEvents is designed to put several events by definition (as per its name: PutLogEvent"S") :) The CloudWatch Logs agent does this on its own and you don't have to worry about it.
However, please note: I don't recommend generating too many logs (e.g. don't run debug mode in production), as CloudWatch Logs can become pretty expensive as your log volume grows.

My advice would be to use a Logstash solution on an AWS instance.
Alternatively, you can run Logstash on another existing instance or container.
https://www.elastic.co/products/logstash
It is designed for this purpose and it does it wonderfully.
CloudWatch is not primarily designed for your needs.
I hope this helps somehow.

If you are calling this API directly from your application: the short answer is that you need to batch your log events (the limit of 5 per second applies to PutLogEvents requests, not to individual events).
If you are writing the logs to disk and pushing them afterwards, there is already an agent that knows how to push the logs (http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/QuickStartEC2Instance.html)
Meta: I would suggest that you prototype this and ensure that it works for the log volume that you have. Also, keep in mind that, because of how the CloudWatch API works, only one application/user can push to a log stream at a time (see the token you have to pass in), so you probably need to use multiple streams, one per user or maybe per log type, to ensure that your applications are not competing for the log stream.
Meta Meta: think about how your application behaves if the logging subsystem fails, and whether you can live with the possibility of losing logs (i.e. is it critical for you to always have the guarantee that you will get the logs?). This will probably drive what you do and which solution you ultimately pick.
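Batching in the application can be done in memory, without a database. The sketch below groups events into batches sized for one PutLogEvents request each. The limits used (10,000 events and 1,048,576 bytes per batch, with 26 bytes of overhead counted per event) are the documented PutLogEvents limits as I recall them, so check them against the current API reference. The function name is hypothetical, and other constraints, such as the time span a single batch may cover, are not handled.

```python
MAX_EVENTS = 10_000          # assumed maximum events per PutLogEvents batch
MAX_BYTES = 1_048_576        # assumed maximum batch size in bytes
PER_EVENT_OVERHEAD = 26      # assumed bytes counted per event on top of the message


def batch_log_events(events, max_events=MAX_EVENTS, max_bytes=MAX_BYTES):
    """Split events ({"timestamp": ms, "message": str}) into request-sized batches."""
    batches, current, current_bytes = [], [], 0
    # Events in a batch are expected in chronological order.
    for event in sorted(events, key=lambda e: e["timestamp"]):
        size = len(event["message"].encode("utf-8")) + PER_EVENT_OVERHEAD
        if current and (len(current) >= max_events or current_bytes + size > max_bytes):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(event)
        current_bytes += size
    if current:
        batches.append(current)
    return batches
```

Each returned batch would then be sent as the `logEvents` of a single request, so even a busy application stays far below 5 requests per second per stream if it flushes, say, once a second.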
