I am new to AWS. I read a few articles that connect API Gateway directly to DynamoDB, and a few that used Lambda to route the requests. I want to understand in which cases we need Lambda and when we can avoid it.
It is super-convenient that API Gateway can insert directly into DynamoDB, and many times that is sufficient. There may be times when you need to do "a little more" than just insert into DynamoDB (see the sketch after this list). For example, you may need to:
Append some additional data to items going into a DynamoDB table (like a postal code)
Reformat some of the data (like split a name field into first name and last name)
Execute some other action (like also insert into an RDS database)
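As an illustration, here is a minimal sketch of such a Lambda behind API Gateway (Python, proxy integration); the table name, field names, and postal-code default are made up for the example, not anything from the question:

```python
import json
import boto3

# Hypothetical table name; adjust to your own.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Users")

def lambda_handler(event, context):
    body = json.loads(event["body"])  # API Gateway proxy integration

    # Reformat: split a single "name" field into first/last name.
    first_name, _, last_name = body["name"].partition(" ")

    item = {
        "id": body["id"],
        "firstName": first_name,
        "lastName": last_name,
        # Append extra data the client did not send (placeholder default).
        "postalCode": body.get("postalCode", "00000"),
    }

    try:
        table.put_item(Item=item)
    except Exception as exc:
        # Report the failure back through API Gateway synchronously.
        return {"statusCode": 500, "body": json.dumps({"error": str(exc)})}

    return {"statusCode": 200, "body": json.dumps({"ok": True})}
```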
Processing a DynamoDB stream is another way in which you might handle those extra actions. However, DynamoDB stream processing happens asynchronously, so you cannot immediately report a failure back through the original API Gateway endpoint. For example, if there's a problem in the Lambda, you may want the API Gateway endpoint to reply with an appropriate HTTP status code so that the caller knows about the problem.
I have a project where I am building a simple single-page app that needs to pull data from an API only once a day. I have a backend that I am thinking of building with golang, where I need to do two things:
1) Have a scheduled job that updates the DB with the new data once a day.
2) Serve that data to the frontend. Since the data would only be updated once a day, I would like to cache it after each update.
Since the number of options that AWS offers is a bit overwhelming, I am wondering what the ideal solution for this scenario would be. Should I use a Lambda that connects to the DB and updates it on a schedule? Should I then create a separate REST API Lambda that pulls that data from the DB and is called from the frontend?
I would really appreciate suggestions for this problem.
Here is my suggestion:
Create a Lambda function
It will fetch the required information from the database
You may use S3 or DynamoDB to save your content. Both may be covered by the free tier; please check the free tier offers depending on your usage
It will save the fetched content to S3 or DynamoDB (you may check DAX for DynamoDB caching)
Create an API Gateway and integrate it with your Lambda (an Elastic Load Balancer is another choice)
Create a schedule expression on CloudWatch to trigger the Lambda daily
Make a request from your frontend to API Gateway or the ELB.
You may use Route 53 for domain naming.
Your Lambda should have two separate functions: one to respond to the schedule expression, the other to serve your content by communicating with S3/DynamoDB.
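Here is a rough sketch, in Python, of how a single Lambda could branch on the event source; the bucket name, object key, and upstream fetch are placeholders:

```python
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "my-content-bucket"   # placeholder bucket name
KEY = "daily-content.json"     # placeholder object key

def fetch_from_upstream_api():
    # Placeholder for the once-a-day fetch from the external API.
    return {"updated": "today", "items": []}

def lambda_handler(event, context):
    # CloudWatch scheduled events carry "source": "aws.events".
    if event.get("source") == "aws.events":
        content = fetch_from_upstream_api()
        s3.put_object(Bucket=BUCKET, Key=KEY, Body=json.dumps(content))
        return {"status": "updated"}

    # Otherwise assume an API Gateway request: serve the cached content.
    cached = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read()
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": cached.decode("utf-8"),
    }
```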
Edit:
If the content is going to be static, you may configure an S3 bucket for static site hosting, and your daily Lambda can write the content there when it is triggered. Then you no longer need API Gateway or DynamoDB.
Here is the documentation for S3 static website hosting.
I have an API Gateway endpoint that I would like to limit access to. For anonymous users, I would like to set both daily and monthly limits (based on IP address).
AWS WAF has the ability to set rate limits, but the interval for them is a fixed 5 minutes, which is not useful in this situation.
API Gateway has the ability to add usage plans with longer term rate quotas that would suit my needs, but unfortunately they seem to be based on API keys, and I don't see a way to do it by IP.
Is there a way to accomplish what I'm trying to do using AWS Services?
Is it maybe possible to use a usage plan and automatically generate an API key for each user who wants to access the API? Or is there some other solution?
Without more context on your specific use case, or the architecture of your system, it is difficult to give a “best practice” answer.
Like most things tech, there are a few ways you could accomplish this. One way would be to use a combination of CloudWatch API logging, Lambda, DynamoDB (with Streams) and WAF.
At a high level (and regardless of this specific need) I’d protect my API using WAF and the AWS security automations quickstart, found here, and associate it with my API Gateway as guided in the docs here. Once my WAF is set up and associated with my API Gateway, I’d enable CloudWatch API logging for API Gateway, as discussed here. Now that I have things set up, I’d create two Lambdas.
The first will parse the CloudWatch API logs and write the data I’m interested in (IP address and request time) to a DynamoDB table. To avoid unnecessary storage costs, I’d set the TTL on the record I’m writing to my DynamoDB table to be twice whatever my analysis’s time window is, i.e. if I’m looking to limit it to 1000 requests per 1 month, I’d set the TTL on my DynamoDB record to be 2 months. From there, my CloudWatch API log group will have a subscription filter that sends log data to this Lambda, as described here.
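As a rough sketch, this first Lambda could look like the following in Python, assuming (for illustration only) a JSON access-log format with the caller IP in an ip field, a table named ApiRequestLog, and a TTL attribute named expiresAt:

```python
import base64
import gzip
import json
import time

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("ApiRequestLog")  # hypothetical table name

def lambda_handler(event, context):
    # CloudWatch Logs subscriptions deliver base64-encoded, gzipped payloads.
    payload = json.loads(
        gzip.decompress(base64.b64decode(event["awslogs"]["data"]))
    )

    now = int(time.time())
    ttl = now + 2 * 30 * 24 * 3600  # ~2 months, twice the 1-month window

    for log_event in payload["logEvents"]:
        # Assumes the configured access-log format puts the caller IP in a
        # JSON "ip" field; adjust the parsing to your own log format.
        record = json.loads(log_event["message"])
        table.put_item(Item={
            "ip": record["ip"],
            "requestTime": log_event["timestamp"],
            "expiresAt": ttl,  # TTL attribute configured on the table
        })
```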
My second Lambda is going to do the actual analysis and handle what happens when my metric is exceeded. This Lambda is going to be triggered by the write event to my DynamoDB table, as described here. I can have this Lambda run whatever analysis I want, but I’m going to assume that I want to limit access to 1000 requests per month for a given IP. When the new DynamoDB item triggers my Lambda, the Lambda is going to query the DynamoDB table for all records that were created in the preceding month from that moment and that contain the IP address. If the number of records returned is less than or equal to 1000, it does nothing. If it exceeds 1000, then the Lambda updates the WAF WebACL, specifically calling UpdateIPSet to reject traffic from that IP, and that’s it. Pretty simple.
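And a sketch of the second Lambda, assuming the table’s partition key is ip with a requestTime sort key (millisecond timestamps), a stream configured to emit new images, and a classic regional WAF IPSet; the IPSet id and the limit are placeholders:

```python
import time

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("ApiRequestLog")   # same hypothetical table
waf = boto3.client("waf-regional")        # classic WAF, regional for API Gateway

IP_SET_ID = "example-ip-set-id"           # placeholder IPSet id
LIMIT = 1000
MONTH_MS = 30 * 24 * 3600 * 1000

def lambda_handler(event, context):
    for record in event["Records"]:
        if record["eventName"] != "INSERT":
            continue
        # Stream images use DynamoDB's typed attribute format.
        ip = record["dynamodb"]["NewImage"]["ip"]["S"]

        one_month_ago = int(time.time() * 1000) - MONTH_MS
        resp = table.query(
            KeyConditionExpression=Key("ip").eq(ip)
            & Key("requestTime").gt(one_month_ago),
            Select="COUNT",  # pagination ignored for brevity
        )

        if resp["Count"] > LIMIT:
            token = waf.get_change_token()["ChangeToken"]
            waf.update_ip_set(
                IPSetId=IP_SET_ID,
                ChangeToken=token,
                Updates=[{
                    "Action": "INSERT",
                    "IPSetDescriptor": {"Type": "IPV4", "Value": f"{ip}/32"},
                }],
            )
```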
With the above process I have near real-time monitoring of requests to my API Gateway, in a very efficient, cost-effective, scalable manner that can be deployed entirely serverless.
This is just one way to handle this; there are definitely other ways you could accomplish it, say with Kinesis and Elasticsearch, or by analyzing CloudTrail events instead of logs, or by using a third-party solution that integrates with AWS, or something else.
I have an application which needs to read data from an AWS DynamoDB table every 5 seconds.
Currently I fetch the data using Lambda and then return the data from DynamoDB back to the user.
The problem with querying the table every 5 seconds is that it can have a performance impact, and moreover there is a pricing issue. (Most of the time the data might not even have changed at all, but when it does change I want to be notified immediately.)
An important clarification is that my app sits outside of AWS and only accesses the AWS DynamoDB table to get data (using simple HTTP requests built with C#).
Is there any way I can get a notification to my app when new data is inserted into DynamoDB?
Just to add something on top of John Rotenstein's answer:
Once you have properly configured a Lambda function to be triggered by an event from a DynamoDB Stream, you could have your Lambda function notify your Web Application via an HTTP Request.
Another option is to use Lambda to put this notification in a queue you may be using outside AWS, and then have your C# code be a consumer of this queue. There are several possibilities for notifying your application; you just need to see which one is the best / most cost-effective for your scenario.
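For example, a minimal sketch of the queue variant in Python; the queue URL is a placeholder, and the stream is assumed to emit new images:

```python
import json

import boto3

sqs = boto3.client("sqs")
# Placeholder queue URL; the external C# application polls this queue.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/app-notifications"

def lambda_handler(event, context):
    # Triggered by the DynamoDB Stream; forward each newly inserted item.
    for record in event["Records"]:
        if record["eventName"] == "INSERT":
            sqs.send_message(
                QueueUrl=QUEUE_URL,
                MessageBody=json.dumps(record["dynamodb"]["NewImage"]),
            )
```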
A data update in DynamoDB can trigger a DynamoDB Stream, which can trigger an AWS Lambda function.
The Lambda function could notify your application in some way.
See: DynamoDB Streams and AWS Lambda Triggers
Streams is the right answer in terms of engineering, but your concern about the polling option being expensive is unfounded, so if you have a working solution I would be tempted to leave it.
If you queried the table every 5 seconds, that is 17,280 requests per day, or roughly one million requests over two months; at $0.25 per million read request units, it would cost you about $0.25 every 2 months.
This assumes your table has on-demand pricing, and the query returns less than 4KB of data.
https://aws.amazon.com/dynamodb/pricing/on-demand/
I am new to AWS, and please forgive me if this question has been asked previously.
I have a REST API which returns 2 parameters (name, email). I want to load this data into Redshift.
I thought of making a Lambda function which starts every 2 minutes and calls the REST API. The API might return at most 3-4 records within this 2-minute window.
So, under this situation, is it okay to just do an INSERT operation, or do I still have to use COPY (via S3)? I am only worried about performance and error-free (robust) data inserts.
Also, the Lambda function will start asynchronously every 2 minutes, so there might be an overlap of insert operations (but there won't be an overlap in data).
In this situation, if I go with the S3 option, I am worried that the S3 file generated by the previous Lambda invocation will be overwritten and a conflict will occur.
Long story short, what is the best practice for inserting a small number of records into Redshift?
PS: I am okay with using other AWS components as well. I even looked into Firehose, which is perfect for me, but it can't load data into a Redshift cluster in a private subnet.
Thanks all in advance
Yes, it would be fine to INSERT small amounts of data.
The recommendation to always load via a COPY command is for large amounts of data because COPY loads are parallelized across multiple nodes. However, for just a few lines, you can use INSERT without feeling guilty.
If your SORTKEY is a timestamp and you are loading data in time order, there is also less need to perform a VACUUM, since the data is already sorted. However, it is good practice to still VACUUM the table regularly if rows are being deleted.
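For illustration, here is a sketch of a small INSERT from Python; psycopg2 would have to be bundled with the Lambda (e.g. as a layer), and the connection details and contacts table are placeholders:

```python
import psycopg2  # must be bundled with the Lambda, e.g. as a layer

# Connection details are placeholders; in practice read them from
# environment variables or Secrets Manager.
conn = psycopg2.connect(
    host="my-cluster.example.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="mydb",
    user="myuser",
    password="mypassword",
)

def insert_records(records):
    # A single multi-row INSERT keeps the handful of rows in one
    # statement and avoids one round trip per record.
    with conn.cursor() as cur:
        values = ",".join(
            cur.mogrify("(%s, %s)", (r["name"], r["email"])).decode()
            for r in records
        )
        cur.execute("INSERT INTO contacts (name, email) VALUES " + values)
    conn.commit()
```

Issuing one self-contained statement per invocation also sidesteps the overlap concern, since each run inserts only its own rows.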
As you don't have much data, you can use either COPY or INSERT. The COPY command is more optimized for bulk loads; it essentially gives you batch-insert capability.
Both will work equally well.
FYI, AWS now supports the Data API feature.
As described in the official documentation, you can now easily access Redshift data using HTTP requests, without a JDBC connection.
The Data API doesn't require a persistent connection to the cluster. Instead, it provides a secure HTTP endpoint and integration with AWS SDKs. You can use the endpoint to run SQL statements without managing connections. Calls to the Data API are asynchronous.
https://docs.aws.amazon.com/redshift/latest/mgmt/data-api.html
Here are the steps you need to follow to use the Redshift Data API:
Determine if you, as the caller of the Data API, are authorized. For more information about authorization, see Authorizing access to the Amazon Redshift Data API.
Determine if you plan to call the Data API with authentication credentials from Secrets Manager or temporary credentials. For more information, see Choosing authentication credentials when calling the Amazon Redshift Data API.
Set up a secret if you use Secrets Manager for authentication credentials. For more information, see Storing database credentials in AWS Secrets Manager.
Review the considerations and limitations when calling the Data API. For more information, see Considerations when calling the Amazon Redshift Data API.
Call the Data API from the AWS Command Line Interface (AWS CLI), from your own code, or using the query editor in the Amazon Redshift console. For examples of calling from the AWS CLI, see Calling the Data API with the AWS CLI.
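A minimal sketch of calling the Data API from Python with boto3; the cluster identifier, database, secret ARN, and table are placeholders:

```python
import boto3

client = boto3.client("redshift-data")

# Cluster, database and secret identifiers are placeholders.
resp = client.execute_statement(
    ClusterIdentifier="my-cluster",
    Database="mydb",
    SecretArn="arn:aws:secretsmanager:us-east-1:123456789012:secret:redshift-creds",
    Sql="INSERT INTO contacts (name, email) VALUES (:name, :email)",
    Parameters=[
        {"name": "name", "value": "Jane Doe"},
        {"name": "email", "value": "jane@example.com"},
    ],
)

# Calls to the Data API are asynchronous; poll for the outcome if needed.
status = client.describe_statement(Id=resp["Id"])["Status"]
```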
My system architecture looks as follows:
SNS -> AWS Lambda -> Dynamo Db
So, SNS publishes messages, the AWS Lambda function is the subscriber, and the Lambda then pushes the data into DynamoDB. In the Lambda I am doing some transformation of the messages. For the transformation, I have to fetch some rules from somewhere. These rules are basically the mapping between fields of the original messages and fields of the transformed messages.
E.g., say the original message looks like this:
{
  "id": 1,
  "name": "dsadas",
  "house": "dsads dsadsa",
  "speciality": "asjdsa"
}
and my mapping is something like:
{
  "id": "id",
  "house": "home",
  "speciality": "area"
}
So, basically I am saying that id should be mapped to id, house to home and so on.
So, I want to keep this mapping somewhere like DynamoDB or some config service. I do not want to keep it directly in the AWS Lambda code, as there is a chance I might have to change it. But keeping it in DynamoDB will be very costly in terms of latency, I think, because I would make a call for each message. So, can anyone suggest an AWS resource that can be used to keep these configs, that is very fast and normally used for configuration?
If you need full flexibility to modify the mapping without modifying the Lambda code, you have to rely on S3, DynamoDB, or another storage service to keep the mappings, which also adds latency and cost.
You can also keep a separate mapping.json (or .js) and upload the file along with your Lambda code. The drawback is that you need to redeploy the Lambda function for each mapping.json modification.
Another option is to use environment variables to keep just the attribute-mapping key-value pairs, and construct the mapping template inside the Lambda from these variables.
You can also base64-encode the mapping template into an environment variable and decode it in the Lambda.