My AWS Lambda function needs to access data that is updated every hour and will be called very often via an API. What is the most efficient and least expensive way to do this?
The hourly update is already handled by a scheduled Lambda batch job, but I don't know where to store the resulting data.
How about overwriting the latest data in an Amazon S3 bucket each time? Or, even though a hot partition could be a concern, how about storing it in Amazon DynamoDB for simple access? I also considered the API Gateway cache, refreshed every hour, but that comes at a cost. Please advise.
As you have mentioned the "least expensive way", I suggest Amazon DynamoDB, because 25 GB of storage is always free (not just during the free tier period). Even if your data size is more than 25 GB, you can still use DynamoDB over other services like RDS or S3 that come at a cost.
The simplest option would be to use AWS Systems Manager Parameter Store. It is secured via IAM and is a great way to share parameters between AWS Lambda functions.
If your data is too big to store in Parameter Store, then consider storing it in Amazon S3. It is easily accessible and low-cost.
If there are problems using these services, then you could look at using databases, but there is insufficient information in your question to make an appropriate recommendation.
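For the Parameter Store route, here is a minimal sketch of the reader side, assuming a parameter named /myapp/hourly-data (a hypothetical name) that your hourly batch job overwrites; the value is cached between warm invocations so most API calls never hit SSM at all:

```python
import time
import boto3

ssm = boto3.client("ssm")

# Hypothetical parameter name; the hourly batch job would overwrite this value.
PARAM_NAME = "/myapp/hourly-data"
CACHE_TTL_SECONDS = 300

_cache = {"value": None, "fetched_at": 0.0}

def get_hourly_data():
    """Return the parameter value, re-fetching at most every CACHE_TTL_SECONDS."""
    now = time.time()
    if _cache["value"] is None or now - _cache["fetched_at"] > CACHE_TTL_SECONDS:
        response = ssm.get_parameter(Name=PARAM_NAME)
        _cache["value"] = response["Parameter"]["Value"]
        _cache["fetched_at"] = now
    return _cache["value"]

def lambda_handler(event, context):
    return {"statusCode": 200, "body": get_hourly_data()}
```

The same caching pattern works if you keep the data in S3 instead; only the fetch call changes.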
Related
I am looking to trigger code every hour in AWS.
The code should: Parse through a list of zip codes, fetch data for each of the zip codes, store that data somewhere in AWS.
Is there a specific AWS service I should use to parse through the list of zip codes and call the API for each zip code? Would this be Lambda?
How could I schedule this service to run every X hours? Do I have to use another AWS service to call my Lambda function (assuming that's the right answer to #1)?
Which AWS service could I use to store this data?
I tried looking up different approaches and services in AWS. I found I could write serverless code in Lambda, which made me think it would be the answer to my first question. Then I tried to look into how that could be run every X hours, but that's where I was struggling to know whether I could still use Lambda. I also wasn't sure what my options were for storing the data. I saw that Glue may be an option, but wasn't sure.
Yes, you can use Lambda to run your code (as long as the total run time is less than 15 minutes).
You can use Amazon EventBridge Scheduler to trigger the Lambda every 1 hour.
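For illustration, a one-off sketch that creates such a schedule with boto3; the schedule name, Lambda ARN, and execution role ARN are placeholders you would replace with your own:

```python
import boto3

scheduler = boto3.client("scheduler")

# Placeholder ARNs; replace with your Lambda function and a role that
# allows EventBridge Scheduler to invoke it.
LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:fetch-zip-data"
ROLE_ARN = "arn:aws:iam::123456789012:role/scheduler-invoke-lambda"

scheduler.create_schedule(
    Name="fetch-zip-data-hourly",
    ScheduleExpression="rate(1 hour)",
    FlexibleTimeWindow={"Mode": "OFF"},
    Target={"Arn": LAMBDA_ARN, "RoleArn": ROLE_ARN},
)
```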
Which AWS service could I use to store this data?
That depends on the format of the data and how you will subsequently use it. Some options are:
Amazon DynamoDB for key-value, NoSQL data
Amazon Aurora for relational data
Amazon S3 for object storage
If you choose S3, you can still do SQL-like queries on the data using Amazon Athena
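If you go the S3 route, a rough sketch of the hourly Lambda might look like the following; the bucket name, the zip list, and fetch_data_for_zip are placeholders for your own bucket and whatever external API you call:

```python
import json
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")

BUCKET = "my-zip-data-bucket"            # placeholder bucket name
ZIP_CODES = ["10001", "94105", "60601"]  # placeholder list of zip codes

def fetch_data_for_zip(zip_code):
    """Placeholder for the call to your external API."""
    return {"zip": zip_code, "value": 42}

def lambda_handler(event, context):
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H-%M-%S")
    for zip_code in ZIP_CODES:
        record = fetch_data_for_zip(zip_code)
        # One object per zip code per run; Athena can later query this prefix.
        s3.put_object(
            Bucket=BUCKET,
            Key=f"zip-data/{zip_code}/{timestamp}.json",
            Body=json.dumps(record),
        )
    return {"written": len(ZIP_CODES)}
```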
I have 8 TB of on-premises data at present. I need to transfer it to AWS S3. Going forward, about 800 GB of data will need to be uploaded every month. What will be the cost of the different approaches?
Run a Python script on an EC2 instance.
Use AWS Lambda for the transfer.
Use AWS DMS to transfer the data.
I'm sorry that I won't do the calculations for you,
but I hope with this tool you can do it yourself :)
https://calculator.aws/#/
According to https://aws.amazon.com/s3/pricing/, data transfer IN to Amazon S3 from the internet costs $0.00 per GB.
Hope you will find your answer!
While the data is inside SQL, you need to move it out first. If your SQL database is AWS-managed RDS, that's an easy task: just back it up to S3. If it's something you manage by hand, you'll have to figure out how to move the data to S3. By the way, you are not limited to S3; you can use disk services too.
You do not need an EC2 instance to make the data transfer unless you need to run some compute on that data.
Then, to move 8 TB there are a couple of options. Cost is a tricky thing: the downtime of a slower transfer may mean losses, security risk is another cost to think about, and so is developer time, so it really depends on your situation.
Option A would be to use AWS File Gateway (https://aws.amazon.com/storagegateway/file/): mount a network drive locally with enough space and just sync your data to that drive. The drive you mount on your OS sends the data to an S3 bucket. This may be the easiest way, since File Gateway takes care of failed connections, retries, etc.
Option B would be to just send the data over the public network, which may not be possible if your connection is slow or is considered insecure by your requirements (see the upload sketch after the options below).
Option C, which is usually not used for a one-time transfer, is a private link to AWS. This would provide more security and probably more speed.
Option D would be to use the Snow family of products. The smallest, AWS Snowcone, has exactly 8 TB of capacity, so if you are really under 8 TB, it may be the more cost-effective way to transfer. If you actually have a bit more than 8 TB, you need AWS Snowball, which can handle much more than 8 TB (up to about 80 TB), which is enough in your case. Fun note: for data transfers of up to 100 PB there is Snowmobile.
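If Option B works for you, here is a rough sketch of a bulk upload script using boto3's managed transfers, which handle multipart uploads and concurrency for large files; the source directory and bucket name are placeholders:

```python
import os

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

BUCKET = "my-migration-bucket"   # placeholder bucket name
SOURCE_DIR = "/data/export"      # placeholder local directory

# Multipart settings: split files above 64 MB and upload parts in parallel.
config = TransferConfig(multipart_threshold=64 * 1024 * 1024, max_concurrency=8)

for root, _dirs, files in os.walk(SOURCE_DIR):
    for name in files:
        path = os.path.join(root, name)
        key = os.path.relpath(path, SOURCE_DIR)
        s3.upload_file(path, BUCKET, key, Config=config)
        print(f"uploaded {key}")
```

For the monthly 800 GB delta you could rerun the same script (or aws s3 sync) against only the changed files.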
I'm building a Lambda function in AWS that needs to load reference data from a MySQL database. There is no real issue right now, as it is a very limited amount of data, but what is the best practice here? Is there a way to keep this data within Lambda (or some other similar functionality) so that I don't need to request it on every invocation of the function? I'm using Node.js, though I don't think that affects this question.
Many thanks,
Marcus
There is no built-in persistent storage for Lambda. Any data that you would like to keep reliably between invocations (not counting temporary persistence due to the Lambda execution context) has to be stored outside of Lambda itself.
You already store it in MySQL, but other popular choices are:
SSM Parameter Store
S3
EFS
DynamoDB
ElastiCache if you really need fast access to the data.
Since you already get the data from MySQL, the only advantage of using SSM or DynamoDB would be that you can use the AWS API to access and update the data, or inspect/modify it in the AWS Console. You also wouldn't need to bundle a MySQL client with your function or establish any connections to the database.
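To at least avoid hitting MySQL on every warm invocation, you can load the reference data once per execution context (at module scope) and refresh it on a TTL. A minimal sketch of the idea in Python for brevity (you're on Node.js, but the pattern is identical there); load_reference_data is a placeholder for your actual MySQL query:

```python
import time

_reference_data = None
_loaded_at = 0.0
TTL_SECONDS = 600  # refresh at most every 10 minutes

def load_reference_data():
    """Placeholder: run your MySQL query here and return the rows."""
    return [{"id": 1, "name": "example"}]

def get_reference_data():
    """Reuse the data across warm invocations of the same execution context."""
    global _reference_data, _loaded_at
    now = time.time()
    if _reference_data is None or now - _loaded_at > TTL_SECONDS:
        _reference_data = load_reference_data()
        _loaded_at = now
    return _reference_data

def lambda_handler(event, context):
    data = get_reference_data()
    return {"rows": len(data)}
```

Cold starts and new execution contexts will still query the database once, so this is a cache, not persistence.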
I have an AWS Lambda function that is fronted by an API gateway for access.
I need to store the last time this was executed so I can retrieve data from an external service since the last execution.
I had planned to use DynamoDB for this purpose.
Is this the simplest option for this scenario?
DynamoDB is a really good option for that. DynamoDB and AWS Lambda work really well together. I definitely recommend DynamoDB for this scenario.
With DynamoDB, you can create database tables that store and retrieve any amount of data and serve any level of request traffic. For this use case, DynamoDB is the best option.
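Here is a minimal sketch of the pattern, assuming a hypothetical table named ExecutionState with a string partition key pk:

```python
from datetime import datetime, timezone

import boto3

table = boto3.resource("dynamodb").Table("ExecutionState")  # hypothetical table name

def get_last_execution():
    """Return the stored ISO timestamp, or None on the first run."""
    item = table.get_item(Key={"pk": "last_execution"}).get("Item")
    return item["ts"] if item else None

def set_last_execution():
    table.put_item(Item={"pk": "last_execution",
                         "ts": datetime.now(timezone.utc).isoformat()})

def lambda_handler(event, context):
    since = get_last_execution()
    # ... call the external service with `since` as the lower bound ...
    set_last_execution()
    return {"since": since}
```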
I haven't been able to find a clear answer on this from the documentation.
Is it discouraged to access DynamoDB from outside the region it is hosted in? For example, I want to do a lot of writes to a DynamoDB table in us-west-2 from a cluster in us-east-1 (or even ap-southeast-1). My writes are batched and not real-time, so I don't care so much about a small increase in latency.
Note that I am not asking about cross-region replication.
DynamoDB is a hosted solution but that doesn't mean you need to be inside AWS to use it.
There are cases, especially when storing user information, where clients make queries against DynamoDB from outside its AWS region.
So, to answer your question: best performance will be achieved when you minimize the geographic distance, but you can work with any endpoint you'd like from anywhere in the world.
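Concretely, cross-region access is just a matter of pointing the client at the table's home region. A sketch that writes to a us-west-2 table from wherever the code runs (the table name and items are placeholders):

```python
import boto3

# Point the client at the region that hosts the table, regardless of where
# this code runs (us-east-1, ap-southeast-1, on premises, ...).
dynamodb = boto3.resource("dynamodb", region_name="us-west-2")
table = dynamodb.Table("my-table")  # placeholder table name

items = [{"pk": f"item-{i}", "value": i} for i in range(100)]

# batch_writer groups the puts into BatchWriteItem calls and retries
# unprocessed items, which helps amortize the cross-region latency.
with table.batch_writer() as batch:
    for item in items:
        batch.put_item(Item=item)
```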