What AWS service is appropriate for storing a single key-value pair that is updated daily? The stored data will be retrieved by several other services throughout the day (~100 times total per day).
My current solution is to create and upload a JSON file to an S3 bucket. All the other services download the JSON and read the data. When it's time to update the data, I create a new JSON and upload it to replace the previous one. This works pretty well, but I'm wondering if there is a more appropriate way.
There are many:
AWS Systems Manager Parameter Store
AWS Secrets Manager
Dynamo
S3
^ those are some of the most common. Without knowing more I'd suggest you consider Dynamo or Param Store. Both are simple and inexpensive--although S3 is fine, too.
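If you go the Parameter Store route, the whole thing is a couple of calls. A minimal sketch (the parameter name '/myapp/daily-value' is just a placeholder):
import boto3

ssm = boto3.client('ssm')

# The daily job overwrites the single value:
ssm.put_parameter(Name='/myapp/daily-value', Value='42', Type='String', Overwrite=True)

# Each consuming service reads it back (~100 reads a day is trivial for it):
value = ssm.get_parameter(Name='/myapp/daily-value')['Parameter']['Value']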
The only reason not to use S3 is if you want governance handled automatically on the AWS side, such as expiry of the key, the way a secrets manager does it; that also makes it much harder to hand the value out to third parties.
Your solution seems very good, especially since S3 is an object store and a JSON file is exactly an object.
The system you described has such low usage that you shouldn't spend time wondering whether there is a better way :)
Just make sure you are aware that Amazon S3 provides read-after-write consistency for PUTs of new objects in your S3 bucket in all regions, with one caveat: if you make a HEAD or GET request to the key name (to check whether the object exists) before creating the object, Amazon S3 provides only eventual consistency for read-after-write.
and to refer to your comment:
The S3 way seemed a little hacky, so I am trying to see if there is a better approach
The S3 way is not hacky at all; storing objects in a key-value fashion is exactly what S3 is intended for :)
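For completeness, the pattern you already have boils down to a couple of boto3 calls. A minimal sketch (bucket and key names are placeholders):
import json
import boto3

s3 = boto3.client('s3')

# Daily update: overwrite the previous object with the new JSON.
s3.put_object(Bucket='my-config-bucket', Key='config/latest.json', Body=json.dumps({'value': 42}))

# Readers throughout the day: download and parse it.
data = json.loads(s3.get_object(Bucket='my-config-bucket', Key='config/latest.json')['Body'].read())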
Related
My AWS Lambda function needs to access data that is updated every hour and is going to be called very often via API. What is the most efficient and least expensive way?
The data is already refreshed every hour by a scheduled Lambda batch job, but I don't know where to store it.
Should I put the latest data in an Amazon S3 bucket each time? Or, even at the risk of a hot partition, store it in Amazon DynamoDB since the access pattern is simple? I also considered the API Gateway cache, refreshed every hour, but that comes at a cost. Please advise.
As you have mentioned the "least expensive way", I will suggest Amazon DynamoDB, because 25 GB of storage is always free (not just in the free tier). Even if your data size is more than 25 GB, you can still use DynamoDB over other services like RDS or S3, which come at a cost.
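A rough sketch of the DynamoDB option with a single-item table (the table name 'hourly-data' and the 'pk' key are assumptions):
import boto3

table = boto3.resource('dynamodb').Table('hourly-data')

# The hourly batch Lambda overwrites the one item:
table.put_item(Item={'pk': 'latest', 'payload': '{...your data...}'})

# The API-facing Lambda reads it back on every call:
item = table.get_item(Key={'pk': 'latest'}).get('Item')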
The simplest option would be to use AWS Systems Manager Parameter Store. It is secured via IAM and is a great way to share parameters between AWS Lambda functions.
If your data is too big to store in Parameter Store, then consider storing it in Amazon S3. It is easily accessible and low-cost.
If there are problems using these services, then you could look at using databases, but there is insufficient information in your question to make an appropriate recommendation.
I am writing a script in Python where I need to get the latest modified file in a bucket (using a prefix), but as far as I have read, I cannot do that query directly from Python (using boto3 at least), so I have to retrieve the information of every object in my bucket.
I would have to query several thousand files, and I do not want any surprises on my bill.
If I do a query where I retrieve the metadata of all the objects in my bucket to sort them later locally, will I be charged for a single request or will it count as one request per object?
Thank you all in advance
Popular
A common method is to use s3api, which consolidates multiple calls into a single LIST request for every 1000 objects, and then use --query to define your filtering operation, such as:
aws s3api list-objects-v2 --bucket your-bucket-name --query 'Contents[?contains(LastModified, `$DATE`)]'
Although please keep in mind that this isn't a good solution for two reasons:
This does not scale well, especially with large buckets, and it does not help much in minimizing outbound data.
It does not reduce the number of S3 API calls, because the --query parameter isn't evaluated server-side; it just happens to be a feature of the aws-cli command. To illustrate, this is how it would look in boto3, and as you can see we'd still need to filter on the client side:
import boto3

client = boto3.client('s3', region_name='us-east-1')
# One LIST call returns up to 1000 objects; the sorting happens client-side.
response = client.list_objects_v2(Bucket='your-bucket-name')
latest = sorted(response['Contents'], key=lambda item: item['LastModified'])[-1]
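If the bucket holds more than 1000 objects you would also need to paginate, which means several LIST requests; note that LIST requests are billed per request (each page of up to 1000 keys is one request), not per object returned. A sketch of the paginated, prefix-filtered version, with the sort still done client-side:
import boto3

client = boto3.client('s3', region_name='us-east-1')
paginator = client.get_paginator('list_objects_v2')

latest = None
for page in paginator.paginate(Bucket='your-bucket-name', Prefix='your/prefix/'):
    for obj in page.get('Contents', []):
        # Keep whichever object has the most recent LastModified timestamp.
        if latest is None or obj['LastModified'] > latest['LastModified']:
            latest = obj

print(latest['Key'] if latest else 'no objects found')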
Probably
One thing you could *probably* do, depending on your specific use case, is to utilize S3 Event Notifications to automatically publish an event to SQS, which gives you the opportunity to poll for the S3 object events along with their metadata, which is more lightweight. This is still going to cost some money, it won't help if you already have a big bucket full of existing objects, and you'll have to actively poll for the messages since they won't persist for very long.
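If you do go that route, consuming the notifications might look roughly like this (the queue URL is a placeholder, and the queue is assumed to already be subscribed to the bucket's object-created events):
import json
import boto3

sqs = boto3.client('sqs')
QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/s3-events'

resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20)
for msg in resp.get('Messages', []):
    body = json.loads(msg['Body'])
    for record in body.get('Records', []):
        # Each record carries the key, size and event time of the new object.
        print(record['s3']['object']['key'], record['eventTime'])
    sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg['ReceiptHandle'])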
Perfect (sorta)
This sounds to me like a good use case for S3 Inventory. It will deliver a daily file for you that consists of the list of objects and their metadata, based on your specifications. See https://docs.aws.amazon.com/AmazonS3/latest/user-guide/configure-inventory.html
I am planning to develop a web application which can perform some basic text edit functions (like insert and delete) on S3 files. Could anyone show me a path forward? I am currently learning Lambda, and have followed tutorial here: http://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html
I can create a Lambda function which can modify files on S3, and call the function by AWS CLI now. What else do I need to know and do to create this web application? Thank you very much.
You would need to look at AWS API Gateway. This can be the front end to your web application.
Also note that S3 is object storage, not a file system you can modify in place; if your file edits are frequent it is not well suited to your use case, because every time you want to edit the text you will have to download the entire file, modify it, and upload it back again. And be mindful of the S3 eventual consistency:
Amazon S3 Data Consistency Model
Amazon S3 provides read-after-write consistency for PUTS of new objects in your S3 bucket in all regions with one caveat. The caveat is that if you make a HEAD or GET request to the key name (to find if the object exists) before creating the object, Amazon S3 provides eventual consistency for read-after-write.
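To make the edit cycle concrete, the Lambda behind API Gateway could do something like the sketch below. The event fields (bucket, key, old, new) are just an assumed request shape, not a prescribed API:
import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    bucket, key = event['bucket'], event['key']
    # S3 has no partial update, so the whole object is downloaded and rewritten.
    text = s3.get_object(Bucket=bucket, Key=key)['Body'].read().decode('utf-8')
    edited = text.replace(event['old'], event['new'])
    s3.put_object(Bucket=bucket, Key=key, Body=edited.encode('utf-8'))
    return {'statusCode': 200, 'body': 'updated ' + key}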
I've gone through the Amazon SDK/documentation and there isn't a lot around programmatically querying/searching for documents in an S3 bucket.
Sure, I can get a document by id/name, but I want the ability to search by other metadata tags such as author.
Would appreciate some guidance and a specific example of a query being executed and not a local iteration once all documents or items have been pulled locally.
[…] there isn't a lot around programmatically querying/searching for documents in an S3 bucket.
Right. S3 is flat object storage, and doesn't provide a query interface.
[…] I want the ability to search by other metadata tags such as author.
This will need to be solved by your application logic. This is not built-in to S3.
For example, you can store the metadata about an S3 document/file in DynamoDB. You query DynamoDB for the metadata, which includes a pointer to the file in S3.
Unfortunately, if you already have a bunch of files in S3, you'll need to find a way to build that initial index of your data.
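As a sketch of that pattern (the table name, key layout, and the 'author-index' GSI are assumptions, not anything S3 or DynamoDB prescribes):
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb').Table('s3-documents')

# On upload, record the metadata plus a pointer to the object in S3:
table.put_item(Item={'doc_id': 'report-123', 'author': 'jane', 's3_key': 'docs/report-123.pdf'})

# "Search by author" then becomes a DynamoDB query instead of listing S3:
resp = table.query(IndexName='author-index', KeyConditionExpression=Key('author').eq('jane'))
keys = [item['s3_key'] for item in resp['Items']]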
Amazon just released new features for CloudSearch:
http://aws.amazon.com/about-aws/whats-new/2014/03/24/amazon-cloudsearch-introduces-powerful-new-search-and-admin-features/.
I'd like to set up a separate S3 bucket folder for each of my mobile app users for them to store their files. However, I also want to set up size limits so that they don't use up too much storage. Additionally, if they do go over the limit I'd like to offer them increased space if they sign up for a premium service.
Is there a way I can set folder file-size limits through S3 configuration or the API? If not, would I have to use the APIs somehow to calculate folder size on every upload? I know that Amazon has the DevPay feature, but it might be a hassle for users to sign up with Amazon if they just want to use a small amount of free space.
There does not appear to be a way to do this, probably at least in part because there is actually no such thing as "folders" in S3. There is only the appearance of folders.
Amazon S3 does not have concept of a folder, there are only buckets and objects. The Amazon S3 console supports the folder concept using the object key name prefixes.
— http://docs.aws.amazon.com/AmazonS3/latest/UG/FolderOperations.html
All of the keys in an S3 bucket are actually in a flat namespace, with the / delimiter used as desired to conceptually divide objects into logical groupings that look like folders, but it's only a convenient illusion. It seems impossible that S3 would have a concept of the size of a folder, when it has no actual concept of "folders" at all.
If you don't maintain an authoritative database of what's been stored by clients (which suggests that all uploads should pass through an app server rather than going directly to S3, which is the only approach that makes sense to me at all), then your only alternative is to poll S3 to discover what's there. An imperfect shortcut would be for your application to read the S3 bucket logs to discover what had been uploaded, but those are only provided on a best-effort basis. They should be reliable but are not guaranteed to be perfect.
This service provides a best effort attempt to log all access of objects within a bucket. Please note that it is possible that the actual usage report at the end of a month will slightly vary.
Your other option is to develop your own service that sits between users and Amazon S3, that monitors all requests to your buckets/objects.
— http://aws.amazon.com/articles/1109#13
Again, having your app server mediate all requests seems to be the logical approach, and would also allow you to detect immediately (as opposed to "discover later") that a user had exceeded a threshold.
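If you do end up polling S3, a "folder" size is simply the sum of the sizes of the objects under that prefix. A rough sketch (bucket and prefix are placeholders):
import boto3

s3 = boto3.client('s3')

def prefix_size_bytes(bucket, prefix):
    # Walk every page of the listing and add up the object sizes.
    total = 0
    for page in s3.get_paginator('list_objects_v2').paginate(Bucket=bucket, Prefix=prefix):
        total += sum(obj['Size'] for obj in page.get('Contents', []))
    return total

print(prefix_size_bytes('my-app-uploads', 'users/user-123/'))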
I would maintain a separate database in the cloud to hold each user's total storage usage. It's easy to keep the count up to date via S3 event notifications, which can trigger a Lambda that in turn writes to the DB.
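A sketch of that wiring, assuming object keys shaped like users/<user_id>/... and a DynamoDB table named 'usage' with a 'user_id' partition key (both are assumptions), with the Lambda triggered by object-created notifications:
import boto3

table = boto3.resource('dynamodb').Table('usage')

def lambda_handler(event, context):
    for record in event['Records']:
        key = record['s3']['object']['key']    # e.g. 'users/user-123/photo.jpg'
        size = record['s3']['object']['size']
        user_id = key.split('/')[1]
        # Atomically add the new object's size to the user's running total.
        table.update_item(
            Key={'user_id': user_id},
            UpdateExpression='ADD bytes_used :s',
            ExpressionAttributeValues={':s': size},
        )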