I have two AWS accounts, each with one S3 bucket, and I uploaded a .CSV file of the same size to each bucket.
When I use Download or Download As, the file is downloaded as a .CSV file in the first account. BUT(!!) when I try to download the file from the second account, it is downloaded as .TXT.
How can this happen? Both files were created in the same way: through a Redshift UNLOAD query, which copies selected data from Redshift to S3.
UPDATE:
Could it be because, for this object in this account, server-side encryption is set to AWS-KMS?
I noticed that the file that ends up converted from .csv to .txt has "Server side encryption: AWS-KMS", while the file that downloads as .csv has "Server side encryption: NONE".
UPDATE: tried in different browsers - same result
Check the headers for each object in the AWS S3 console and compare the Content-Type values. Content-Type provides a hint to web browsers on what data the object contains.
If Content-Type does not exist or does not contain text/csv, add or modify the header in the S3 console or via your favorite S3 application such as CloudBerry.
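For example, here is a minimal boto3 sketch (the bucket and key are placeholders) that inspects the header and, if needed, rewrites it by copying the object onto itself:

import boto3

s3 = boto3.client("s3")
bucket, key = "your-bucket", "unload/data000.csv"  # placeholder names

# Inspect the current Content-Type
head = s3.head_object(Bucket=bucket, Key=key)
print(head["ContentType"])

# If it is wrong, copy the object onto itself with a replaced Content-Type.
# Note: if the object uses SSE-KMS, you may also need to pass
# ServerSideEncryption="aws:kms" and SSEKMSKeyId so the copy keeps the same encryption.
if head["ContentType"] != "text/csv":
    s3.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key},
        ContentType="text/csv",
        MetadataDirective="REPLACE",
    )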
John is right about the Content-Type not being text/csv. Sometimes S3 will get it right and sometimes it won't. If you can't manually correct this yourself, you can run a Lambda function to do it for you every time you upload a new object. You can use the Python 2.7 template Lambda function to download the object from the bucket, use the mimetypes library's guess_type on the S3 object's key, and then re-upload the file to the same bucket. You will need to trigger this function on S3 object uploads and give it the necessary permissions (s3:GetObject and s3:PutObject).
P.S. This will work for files with any extension. If you know you are only going to upload .csv files, you can skip mimetypes and re-upload the object directly with
bucket.upload_file(filename, key, ExtraArgs={'ContentType': 'text/csv'})
If mimetypes cannot guess the type, you might need to add the types yourself; see an example here: https://www.programcreek.com/python/example/5209/mimetypes.add_type
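If you go the Lambda route, here is a rough sketch of such a handler (Python 3 rather than the Python 2.7 template mentioned above; the bucket/key parsing assumes the standard S3 notification event shape):

import mimetypes
from urllib.parse import unquote_plus

import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = unquote_plus(record["object"]["key"])

    # Guess the MIME type from the key's extension
    content_type, _ = mimetypes.guess_type(key)
    if content_type is None:
        return  # could not guess; leave the object alone

    head = s3.head_object(Bucket=bucket, Key=key)
    if head.get("ContentType") == content_type:
        return  # already correct; this also stops the re-upload from re-triggering forever

    # Download the bytes and re-upload them with the corrected Content-Type
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    s3.put_object(Bucket=bucket, Key=key, Body=body, ContentType=content_type)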
Good Luck!
Here is a Scala solution (to specify the content type):
import java.io.{ByteArrayInputStream, InputStream}
import com.amazonaws.services.s3.model.ObjectMetadata

val settingsLine: String = "csvdata1,csvdata2,csvdata3"
val settingsStream: InputStream = new ByteArrayInputStream(settingsLine.getBytes())
// Set Content-Type explicitly so browsers treat the downloaded object as CSV
val metadata: ObjectMetadata = new ObjectMetadata()
metadata.setContentType("text/csv")
s3Client.putObject(bucketName, prefix, settingsStream, metadata)
Related
We are using an S3 bucket to hold zip files that customers have created and made ready for them to download. We are using CloudFront only to handle the SSL. We have caching disabled.
The customer receives an email to download their zip file, and that works great. The S3 lifecycle removes the file after 2 weeks. Now, if they add more photos to their account and re-request their zip file, it overwrites the current zip file with the new version. So the link is exactly the same. But when they download, it's the previous zip file, not the new one.
Additionally, after the two weeks the file is removed, and if they try to download it they get an error that basically says they need to log in and re-request their photos. So they generate a new zip file, but their link still gives them the error message.
I could have the lambda that creates the zip file invalidate the file when it creates it, but I didn't think I needed to invalidate since we aren't caching?
Below is the screenshot of the caching policy I have selected in CloudFront
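For what it's worth, here is a minimal sketch of what that invalidation could look like from the zip-building Lambda, using boto3 (the distribution ID and object key are placeholders); whether it is needed at all depends on which cache behavior is actually in effect:

import time

import boto3

cloudfront = boto3.client("cloudfront")

def invalidate_zip(distribution_id, object_key):
    # Ask CloudFront to drop any cached copy of the freshly rebuilt zip
    cloudfront.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch={
            "Paths": {"Quantity": 1, "Items": ["/" + object_key]},
            "CallerReference": str(time.time()),  # must be unique per request
        },
    )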
I'm downloading data from an API and writing it to a csv file that I store in an S3 bucket. I'm then copying my file from this input bucket into an output bucket with a Lambda function. From the output bucket I'm ingesting it into a MySQL RDS instance with another Lambda function.
The copy-to-another-bucket and upload-to-RDS lambda functions both get triggered when I create a new object in a bucket. Since I'm appending to my csv file, the upload-to-RDS function gets triggered way more than it should and I end up with ~30 rows in my database instead of 6.
I thought by copying the files between S3 buckets I could avoid this, but it doesn't help. Is there any way to only upload the csv file to the database once it has been written and not while it's being updated? Can I delay the trigger maybe?
The only other solution I can think of is to skip the copy-to-another-bucket function altogether and to schedule the upload-to-RDS function.
You need to realize that S3 doesn't support updating an existing file. If you are appending a row to an existing CSV file in S3, then that operation requires uploading the entire contents of the CSV file to S3 again, which S3 sees as a new object.
If you need to store a temporary version of the CSV file in S3 while you are updating it, then you should store it in a separate path, like s3://your_bucket/tmp and then when you have completed your updates, move it to the final path like s3://your_bucket/complete and only configure the Lambda trigger on the /complete path.
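As a rough sketch (the bucket and key names are placeholders), the "move" is a copy followed by a delete, since S3 has no native move operation:

import boto3

s3 = boto3.client("s3")

def promote_csv(bucket, filename):
    tmp_key = "tmp/" + filename
    final_key = "complete/" + filename
    # Copy the finished file to the prefix that the Lambda trigger watches...
    s3.copy_object(
        Bucket=bucket,
        Key=final_key,
        CopySource={"Bucket": bucket, "Key": tmp_key},
    )
    # ...then remove the temporary copy
    s3.delete_object(Bucket=bucket, Key=tmp_key)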
Initially a CSV file is uploaded to an S3 bucket, and we often append to that file with a script when a new row is added. What we want is for the script to run only when the CSV file is modified. Is there any watcher that can notify the script to run when the CSV file is changed?
There are S3 event notifications for that; you would be interested in the s3:ObjectCreated:* events.
https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
You should also take a look at the S3 documentation and note the difference between S3 and a file system: an "update" or "append" operation on S3 actually replaces the whole object, just so you know.
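As a sketch, the notification can be wired up with boto3 roughly like this (the bucket name and Lambda ARN are placeholders; note that this call replaces the bucket's entire existing notification configuration, and the Lambda also needs permission to be invoked by S3):

import boto3

s3 = boto3.client("s3")

# Fire the Lambda whenever a new .csv object is written to the bucket
s3.put_bucket_notification_configuration(
    Bucket="your-bucket",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:run-script",
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "suffix", "Value": ".csv"}]}
                },
            }
        ]
    },
)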
Need suggestions for the best way to write database results as a CSV file to an AWS S3 bucket.
Note: the CSV data may grow from KB to GB in size.
The best way would be:
Write your data to a CSV file on your local computer (or wherever your app is running)
Upload the file to an Amazon S3 bucket using the AWS SDK for Java
Please note that it is not possible to append data to an Amazon S3 object. So you should either upload a new file each time or, if you want all the data in one file, re-upload the complete file each time.
If you want to send the data as a stream, you can use putObject():
public PutObjectResult putObject(String bucketName,
                                 String key,
                                 InputStream input,
                                 ObjectMetadata metadata)
                          throws SdkClientException,
                                 AmazonServiceException
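For comparison, here is a minimal sketch of the same streaming idea with boto3 in Python (the bucket name and key are placeholders; the answer above assumes the AWS SDK for Java, but the concept is identical):

import io

import boto3

s3 = boto3.client("s3")

csv_bytes = b"col1,col2\n1,2\n"  # placeholder for the data pulled from your database
s3.put_object(
    Bucket="your-bucket",
    Key="exports/results.csv",
    Body=io.BytesIO(csv_bytes),  # any bytes or file-like object works here
    ContentType="text/csv",
)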
I am currently using the range header for GET request on Amazon S3 but I can't find an equivalent for PUT requests.
Do I have to upload the entire file again or can I specify where in the file I want to update? Thanks
You need to upload it again. S3 does not have a concept of appending to or editing a file.
However, if it's a large file, you can do something called "Multipart Upload" and send several pieces of the file, which get merged back together at AWS:
http://docs.amazonwebservices.com/AmazonS3/latest/dev/uploadobjusingmpu.html
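As a rough sketch of that idea with boto3 in Python (the file name, bucket, and key are placeholders), a multipart upload starts an upload, sends the file in chunks, and then asks S3 to stitch the parts together:

import boto3

s3 = boto3.client("s3")
bucket, key = "your-bucket", "uploads/big-file.csv"  # placeholder names

upload = s3.create_multipart_upload(Bucket=bucket, Key=key)
parts = []
with open("big-file.csv", "rb") as f:
    part_number = 1
    while True:
        chunk = f.read(8 * 1024 * 1024)  # every part except the last must be at least 5 MB
        if not chunk:
            break
        resp = s3.upload_part(
            Bucket=bucket, Key=key, PartNumber=part_number,
            UploadId=upload["UploadId"], Body=chunk,
        )
        parts.append({"PartNumber": part_number, "ETag": resp["ETag"]})
        part_number += 1

s3.complete_multipart_upload(
    Bucket=bucket, Key=key, UploadId=upload["UploadId"],
    MultipartUpload={"Parts": parts},
)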