How do you test S3 in a sandbox environment when your code uses an AmazonS3Client?
IAmazonS3 amazonS3Client =
Amazon.AWSClientFactory.CreateAmazonS3Client(s3AccessKey, s3SecretKey, config);
var request = new PutObjectRequest()
{
    BucketName = "bucketname",
    Key = "filename",
    ContentType = "application/json",
    ContentBody = "body"
};
amazonS3Client.PutObject(request);
I've tried S3Ninja and FakeS3, but the AmazonS3Client didn't hit them, which leads me to believe that AmazonS3Client doesn't behave like a normal REST client.
The only solution I can think of is to convert the code to use a plain REST client and manually build the requests and headers, just so that I can use S3Ninja or FakeS3.
I got FakeS3 to work. My problem was that I needed to update my hosts file (/etc/hosts):
127.0.0.1 myBucketName.s3.amazonaws.com s3.amazonaws.com
The reason we have to do this is that the URL objects are put to is http://bucketname.s3.amazonaws.com, so when running locally your DNS will not know where to go for bucketname.s3.amazonaws.com.
Instead of spending time and energy getting S3 mocks and FakeS3 working, a better approach would be to create a new S3 bucket and use it.
S3 isn't costly (relatively speaking), and there are several ways you can keep the S3 cost in check, for example by setting up a lifecycle rule that deletes all objects after a day, so today's objects are removed at the close of today.
Also set up RRS (Reduced Redundancy Storage) for your S3 bucket; it reduces the cost by roughly a third of the regular price.
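If you go this route, the one-day expiration can be configured with a lifecycle rule. Here is a minimal sketch using boto3 (the bucket name is a placeholder; the same rule can be set from the console or any other SDK):

import boto3

s3 = boto3.client("s3")

# Expire every object one day after creation so the test bucket stays small
s3.put_bucket_lifecycle_configuration(
    Bucket="my-test-bucket",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-after-one-day",
                "Filter": {"Prefix": ""},  # apply to all objects
                "Status": "Enabled",
                "Expiration": {"Days": 1},
            }
        ]
    },
)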
I don't know if it is still relevant for you nowadays, but there is a fully functional local AWS cloud stack called LocalStack, and it's free.
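Pointing an SDK at LocalStack is just a matter of overriding the endpoint (the .NET SDK exposes the same idea via the ServiceURL setting on AmazonS3Config). A minimal sketch with boto3, assuming LocalStack's default edge port 4566:

import boto3

# LocalStack accepts any credentials; only the endpoint matters
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:4566",  # assumed default LocalStack port
    aws_access_key_id="test",
    aws_secret_access_key="test",
    region_name="us-east-1",
)

s3.create_bucket(Bucket="bucketname")
s3.put_object(Bucket="bucketname", Key="filename",
              Body="body", ContentType="application/json")
print(s3.get_object(Bucket="bucketname", Key="filename")["Body"].read())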
I'm trying to implement multipart upload to Google Cloud Storage, but to my surprise it does not seem to be straightforward (I could not find a Java example).
The only mention I found was in the XML API: https://cloud.google.com/storage/docs/multipart-uploads
I also found some discussion around a compose API (StorageExample.java#L446) mentioned in google-cloud-java issue 1440.
Any recommendations on how to do a multipart upload?
I got the multipart upload working with @Kolban's suggestion (for details, check the blog post).
This is how I create the S3 client and point it at Google Cloud Storage:
import com.amazonaws.ClientConfiguration
import com.amazonaws.auth.{AWSStaticCredentialsProvider, BasicAWSCredentials}
import com.amazonaws.client.builder.AwsClientBuilder.EndpointConfiguration
import com.amazonaws.services.s3.{AmazonS3, AmazonS3ClientBuilder}

def createClient(accessKey: String, secretKey: String, region: String = "us"): AmazonS3 = {
  // Point the AWS SDK S3 client at the Google Cloud Storage XML API endpoint
  val endpointConfig = new EndpointConfiguration("https://storage.googleapis.com", region)
  val credentials = new BasicAWSCredentials(accessKey, secretKey)
  val credentialsProvider = new AWSStaticCredentialsProvider(credentials)

  val clientConfig = new ClientConfiguration()
  clientConfig.setUseGzip(true)
  clientConfig.setMaxConnections(200)
  clientConfig.setMaxErrorRetry(1)

  val clientBuilder = AmazonS3ClientBuilder.standard()
  clientBuilder.setEndpointConfiguration(endpointConfig)
  clientBuilder.withCredentials(credentialsProvider)
  clientBuilder.withClientConfiguration(clientConfig)
  clientBuilder.build()
}
Because I'm doing the upload from the frontend (after I generate signed URLs for each part using the AmazonS3 client), I needed to enable CORS.
For testing, I enabled everything for now:
$ gsutil cors get gs://bucket
$ echo '[{"origin": ["*"],"responseHeader": ["Content-Type", "ETag"],"method": ["GET", "HEAD", "PUT", "DELETE", "PATCH"],"maxAgeSeconds": 3600}]' > cors-config.json
$ gsutil cors set cors-config.json gs://bucket
See https://cloud.google.com/storage/docs/configuring-cors#gsutil_1
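For the signed-URL-per-part step mentioned above, here is the rough shape of the flow, sketched in Python with boto3 for illustration (the Java SDK calls are analogous; the bucket, key, HMAC credentials, part count, and ETags are placeholders):

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://storage.googleapis.com",
    aws_access_key_id="GOOG1E_HMAC_KEY",      # placeholder GCS HMAC key
    aws_secret_access_key="HMAC_SECRET",      # placeholder GCS HMAC secret
)

upload = s3.create_multipart_upload(Bucket="my-bucket", Key="big-file.bin")
upload_id = upload["UploadId"]

# One presigned URL per part; the frontend PUTs each part body to its URL
part_urls = [
    s3.generate_presigned_url(
        "upload_part",
        Params={"Bucket": "my-bucket", "Key": "big-file.bin",
                "UploadId": upload_id, "PartNumber": n},
        ExpiresIn=3600,
    )
    for n in range(1, 4)                      # placeholder: 3 parts
]

# The frontend reports back the ETag of each uploaded part
part_etags = ["etag-1", "etag-2", "etag-3"]   # placeholders
s3.complete_multipart_upload(
    Bucket="my-bucket", Key="big-file.bin", UploadId=upload_id,
    MultipartUpload={"Parts": [{"ETag": etag, "PartNumber": n}
                               for n, etag in enumerate(part_etags, start=1)]},
)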
Currently, a Java client library for multipart upload in Cloud Storage is not available. You can raise a feature request for it at this link. As mentioned by John Hanley, the next best thing you can do is a parallel composite upload with gsutil (CLI), or JSON and XML API support / resumable uploads with the Java libraries.
With a parallel composite upload, the parallel writes can be done using the JSON or XML API for Google Cloud Storage. Specifically, you would write a number of smaller objects in parallel and then (once all of those objects have been written) call the compose request to compose them into one larger object.
If you're using the JSON API, the compose documentation is at: https://cloud.google.com/storage/docs/json_api/v1/objects/compose
If you're using the XML API, the compose documentation is at: https://cloud.google.com/storage/docs/reference-methods#putobject (see the compose query parameter).
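As an illustration of the compose approach (not the multipart API), here is a minimal sketch using the Python client library; the Java library exposes the same operation via Storage.compose. Bucket and object names are placeholders:

from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-bucket")  # placeholder bucket name

# 1. Upload the pieces (do these uploads in parallel in real code)
part_names = ["big-file.part-0", "big-file.part-1", "big-file.part-2"]
for name in part_names:
    bucket.blob(name).upload_from_filename(name)  # placeholder local files

# 2. Compose them into the final object (up to 32 sources per compose call)
final = bucket.blob("big-file")
final.compose([bucket.blob(name) for name in part_names])

# 3. Optionally delete the intermediate pieces
for name in part_names:
    bucket.blob(name).delete()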
There is also an interesting document link provided by Kolban which you can try and work through. I would also like to mention that you can do multipart uploads in Java if you use the Google Drive API (v3); there is a code example that uses the files.create method with uploadType=multipart.
My boss wants me to build an API that returns the daily currency exchange rate between USD and JPY.
For this information, my boss wants to use a specific website. This website publishes a daily exchange rate at 10 AM every day, which is available from a certain public API.
Maybe the simplest solution is to invoke this public API from my API. The catch is that this public API has a limit of 1,000 invocations daily, but we expect our customers to invoke my API far more than that.
I can run a cron job to get the latest information at 10 AM every day, but I don't know how to transfer this information to my API in the AWS environment.
A database is clearly overkill, as it would only have to store one entry for the daily info.
Can anybody suggest a better solution for this use case?
There are tons of ways to implement this. Get the data via API call and use any of the following ways to store it:
Store the data in S3 in any format (txt, csv, json, yml, etc.) and read it from that S3 bucket when your API is called.
If you're planning to use API Gateway, you can cache the API responses. Use this cache to serve the data and you won't have to persist it anywhere else; you almost certainly won't hit the 1k limit with this cache in place. https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-caching.html
DynamoDB is also a good place to store such data. It is cheap if the data is not huge, and it's super performant (a sketch follows the S3 example below).
ElastiCache (Redis) is another place to store the data for a day.
CloudFront in front of S3 is also a great option for not-so-dynamic data. Cache the data for a day and just read it from CloudFront.
SSM Parameter Store is also an option, but SSM is not meant to be a persistent database.
Storing it in S3 should be easy:
// AWS SDK for JavaScript v2; create the S3 client once (inside an async handler)
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

let xr = 5.2838498;

// write the latest rate as a small text object
await s3
  .putObject({
    Bucket: 'mybucket',
    Key: 'mydataobject',
    Body: xr.toString(),
    ContentType: 'text/plain;charset=utf-8'
  })
  .promise();

// read it back
xr = Number((await s3.getObject({
  Bucket: 'mybucket',
  Key: 'mydataobject',
}).promise()).Body?.toString('utf-8'));
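For the DynamoDB option mentioned above, a minimal sketch with boto3 (the table name and key schema are hypothetical):

import boto3
from decimal import Decimal

# Hypothetical table "exchange_rates" with partition key "pair" (string)
table = boto3.resource("dynamodb").Table("exchange_rates")

# In the 10 AM cron job: overwrite the single daily item
table.put_item(Item={"pair": "USD_JPY",
                     "rate": Decimal("148.23"),   # placeholder value
                     "as_of": "2024-01-15"})

# In your API handler: read it back
item = table.get_item(Key={"pair": "USD_JPY"})["Item"]
print(item["rate"])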
Specifically, in an origin-response-triggered function (e.g. with a 404 status), how can I read an HTML file stored in S3 and use its content for the response body?
(I would like to manually return a custom error page just as CloudFront does, but choose it based on cookies.)
NOTE: The HTML file is stored in the same S3 bucket as my website, with OAI enabled.
Thank you very much!
Lambda@Edge functions don't currently¹ have direct access to any body content from the origin.
You will need to grant your Lambda execution role the necessary privileges to read from the bucket, then use s3.getObject() from the JavaScript SDK to fetch the object from the bucket and use its body.
The SDK is already in the environment,² so you don't need to bundle it with your code. You can just require it and create the S3 client globally, outside the handler, which saves time on subsequent invocations.
'use strict';
const AWS = require('aws-sdk');
const s3 = new AWS.S3({ region: 'us-east-2' }); // use the correct region for your bucket
exports.handler ...
Note that one of the perceived hassles of updating a Lambda@Edge function is that the Lambda console gives the impression that redeploying it is annoyingly complicated... but you don't have to use the Lambda console to do this. The wording of the "enable trigger and replicate" checkbox gives you the impression that it's doing something important, but it turns out... it isn't. Changing the version number in the CloudFront configuration and saving changes accomplishes the same purpose.
After you create a new version of the function, you can simply go to the Cache Behavior in the CloudFront console and edit the trigger ARN to use the new version number, then save changes.
¹ Currently, but I have submitted this as a feature request; it could potentially allow a response trigger to receive a copy of the response body and rewrite it. It would necessarily be limited to the maximum size of the Lambda API (or smaller, as generated responses are currently limited), and might not be applicable in this case, since I assume you may be fetching a language-specific response.
² Already in the environment. If I remember right, long ago Lambda@Edge didn't include the SDK, but it is always there now.
I am trying to understand when I should use a Resource and when I should use a Client.
The definitions provided in the boto3 docs don't really make it clear when it is preferable to use one or the other.
boto3.resource is a high-level service class wrapped around boto3.client.
It is meant to attach to a connected resource, so that you can later act on related resources without specifying the original resource ID again.
import boto3

s3 = boto3.resource("s3")
bucket = s3.Bucket('mybucket')
# bucket is now "attached" to the S3 bucket named "mybucket"
print(bucket)
# s3.Bucket(name='mybucket')
print(dir(bucket))
# shows all the actions you can perform on this bucket
On the other hand, boto3.client is low level: you don't have an "entry-class object", so you must explicitly specify the exact resources you are working with for every action you perform.
Which one to use depends on your individual needs. However, boto3.resource doesn't wrap all of the boto3.client functionality, so sometimes you need to call boto3.client, or use boto3.resource.meta.client, to get the job done.
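For contrast, with the low-level client the bucket (and key) have to be spelled out on every call; a minimal sketch:

import boto3

client = boto3.client("s3")
# every call names the bucket explicitly; there is no attached Bucket object
client.put_object(Bucket="mybucket", Key="hello.txt", Body=b"hi")
client.list_objects_v2(Bucket="mybucket")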
If possible, use client over resource, especially when dealing with S3 object lists and then trying to get basic information on those objects themselves.
For 10,000 objects, client calls S3 10,000/1,000 = 10 times and gives you a lot of information on each object in each call.
Resource, I assume, calls S3 10,000 times (or maybe the same number as client?), but if you take an object and try to do something with it, that is probably another call to S3, making this about 20x slower than client.
My test reveals the following results.
import boto3

s3 = boto3.resource("s3")
s3bucket = s3.Bucket(myBucket)
s3obj_list = s3bucket.objects.filter(Prefix=key_prefix)
tmp_list = [s3obj.key for s3obj in s3obj_list]
(tmp_list = [s3obj for s3obj in s3obj_list] gives the same ~9 minute result.)
When trying to get a list of 150,000 files, this took ~9 minutes. If s3obj_list is indeed pulling 1,000 files per call and buffering them, s3obj.key is probably not part of that and makes another call.
import boto3
client = boto3.client("s3")  # client version of the same listing
keys = []
response = client.list_objects_v2(Bucket=bucket, Prefix=prefix)
while True:
    keys.extend(obj["Key"] for obj in response.get("Contents", []))
    if not response.get("IsTruncated"):
        break
    # follow the ContinuationToken from the previous response
    response = client.list_objects_v2(
        Bucket=bucket,
        Prefix=prefix,
        ContinuationToken=response["NextContinuationToken"],
    )
Client took ~30 seconds to list the 150,000 files.
I don't know if resource buffers 1,000 files at a time, but if it doesn't, that is a problem.
I also don't know if it is possible for resource to buffer the information attached to the objects, but that is another problem.
I also don't know if using pagination could make client faster or easier to use.
If anyone knows the answer to the three questions above, please share; I'd be very interested to know.
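On the pagination question: the client has built-in paginators that follow NextContinuationToken for you, so the loop above can be written as a short sketch like this (same myBucket / key_prefix as above):

import boto3

client = boto3.client("s3")
paginator = client.get_paginator("list_objects_v2")

keys = []
# each page holds up to 1,000 objects; the paginator handles the continuation tokens
for page in paginator.paginate(Bucket=myBucket, Prefix=key_prefix):
    keys.extend(obj["Key"] for obj in page.get("Contents", []))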
I need a way to allow a 3rd-party app to upload a txt file (350 KB and slowly growing) to an S3 bucket in AWS. I'm hoping for a solution involving an endpoint they can PUT to, with some authorization key or the like in the header. The bucket can't be public to all.
I've read this: http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html
and this: http://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html
but can't quite seem to find the solution I'm seeking.
I'd suggest using a combination of AWS API Gateway, a Lambda function, and finally S3.
Your clients will call the API Gateway endpoint.
The endpoint will invoke an AWS Lambda function that will then write the file out to S3.
Only the Lambda function will need rights to the bucket, so the bucket remains non-public and protected.
If you already have an EC2 instance running, you could replace the Lambda piece with custom code running on your EC2 instance, but using Lambda gives you a 'serverless' solution that scales automatically and has no minimum monthly cost.
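As a rough sketch of the Lambda piece (Python here; the bucket name, object key, and the Lambda proxy integration event shape are assumptions on my part):

import base64
import boto3

s3 = boto3.client("s3")
BUCKET = "my-upload-bucket"  # placeholder bucket name

def handler(event, context):
    # With Lambda proxy integration, API Gateway passes the PUT body in
    # event["body"], base64-encoded when isBase64Encoded is set
    body = event.get("body") or ""
    data = base64.b64decode(body) if event.get("isBase64Encoded") else body.encode("utf-8")
    s3.put_object(Bucket=BUCKET, Key="uploads/data.txt",  # placeholder key
                  Body=data, ContentType="text/plain")
    return {"statusCode": 200, "body": "stored"}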
I ended up using the AWS SDK. It's available for Java, .NET, PHP, and Ruby, so there's a very high probability the 3rd-party app is using one of those. See here: http://docs.aws.amazon.com/AmazonS3/latest/dev/UploadObjSingleOpNET.html
In that case, it's just a matter of them using the SDK to upload the file. I wrote a sample version in .NET running on my local machine. First, install the AWSSDK NuGet package. Then, here is the code (taken from the AWS sample):
C#:
var bucketName = "my-bucket";
var keyName = "what-you-want-the-name-of-S3-object-to-be";
var filePath = "C:\\Users\\scott\\Desktop\\test_upload.txt";
var client = new AmazonS3Client(Amazon.RegionEndpoint.USWest2);
try
{
    PutObjectRequest putRequest2 = new PutObjectRequest
    {
        BucketName = bucketName,
        Key = keyName,
        FilePath = filePath,
        ContentType = "text/plain"
    };
    putRequest2.Metadata.Add("x-amz-meta-title", "someTitle");
    PutObjectResponse response2 = client.PutObject(putRequest2);
}
catch (AmazonS3Exception amazonS3Exception)
{
    if (amazonS3Exception.ErrorCode != null &&
        (amazonS3Exception.ErrorCode.Equals("InvalidAccessKeyId") ||
         amazonS3Exception.ErrorCode.Equals("InvalidSecurity")))
    {
        Console.WriteLine("Check the provided AWS Credentials.");
        Console.WriteLine("For service sign up go to http://aws.amazon.com/s3");
    }
    else
    {
        Console.WriteLine("Error occurred. Message:'{0}' when writing an object",
            amazonS3Exception.Message);
    }
}
Web.config:
<add key="AWSAccessKey" value="your-access-key"/>
<add key="AWSSecretKey" value="your-secret-key"/>
You get the access key and secret key by creating a new IAM user in your AWS account. When you do so, AWS generates them for you and provides them for download. You can then attach the AmazonS3FullAccess policy to that user, and the document will be uploaded to S3.
NOTE: this was a POC. In the actual 3rd-party app using this, they won't want to hardcode the credentials in the web config, for security reasons. See here: http://docs.aws.amazon.com/AWSSdkDocsNET/latest/V2/DeveloperGuide/net-dg-config-creds.html