Writing to file in S3 Bucket - amazon-web-services

I want to create a file in my S3 bucket.
In each iteration I have a list of some 10,000 strings; I want to write those to S3 and then clear the list. In the next iteration the list is populated with another 10,000 entries. Is it possible to write these new 10,000 entries to the same file in my S3 bucket? How can I store these entries in S3 without storing them on the local machine first?

There are several ways to store objects in Amazon S3.
The simplest is to copy a local file to S3, which can be done programmatically or with the AWS Command-Line Interface (CLI). For example:
aws s3 cp foo.txt s3://my-bucket/foo.txt
The aws s3 cp command can also take its input from stdin and send its output to stdout. So, if you have a program that writes text to stdout, you could store it in S3 with:
./myapp | aws s3 cp - s3://my-bucket/foo.txt
See: Uploading a local file stream to S3 in the aws s3 cp documentation.
Alternatively, you could write objects to Amazon S3 directly from your application by using an AWS SDK for your preferred language. This could, for example, stream the data into an Amazon S3 object without having to write it to a local disk first.
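To make that concrete, here is a minimal sketch of the SDK approach in Python with boto3 (the bucket name, key, and generate_batches() helper are placeholders, not part of your code). Since an existing S3 object cannot be appended to, the sketch uses a multipart upload so that every batch of strings ends up in the same object without ever touching local disk:

import boto3

s3 = boto3.client("s3")
bucket, key = "my-bucket", "output/all-entries.txt"   # placeholders

# Start a multipart upload; each batch becomes one part of a single object.
mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
parts = []

# generate_batches() is a hypothetical stand-in for your loop that
# yields ~10,000 strings per iteration.
for part_number, batch in enumerate(generate_batches(), start=1):
    body = "\n".join(batch).encode("utf-8")
    # Note: every part except the last must be at least 5 MB, so you may
    # need to buffer several batches together before uploading a part.
    resp = s3.upload_part(
        Bucket=bucket,
        Key=key,
        UploadId=mpu["UploadId"],
        PartNumber=part_number,
        Body=body,
    )
    parts.append({"PartNumber": part_number, "ETag": resp["ETag"]})

# Combine all uploaded parts into one S3 object.
s3.complete_multipart_upload(
    Bucket=bucket,
    Key=key,
    UploadId=mpu["UploadId"],
    MultipartUpload={"Parts": parts},
)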

Related

Get files From SFTP to S3 AWS

What is the best option for getting data from a directory on an SFTP server and copying it into an S3 bucket on AWS? On the SFTP server I only have read permission, so rsync isn't an option.
My idea is to create a Glue job in Python that downloads this data and copies it into an S3 bucket. The files vary in size: one is about 600 MB, others are around 4 GB.
Assuming you are talking about an SFTP server that is not on AWS, you have a few different options that may be easier than what you have proposed (although your solution could work):
Install the AWS CLI on the SFTP server and copy the files with the aws s3 cp command.
Write a script using an AWS SDK that reads the files and copies them to S3. Given the size of your files, you may need to use multipart upload (see the sketch below).
You can create an AWS-managed SFTP server (AWS Transfer Family) that uses your S3 bucket as the backend storage for that server, then use sftp commands to copy the files over.
Be mindful that you will need the appropriate permissions in your AWS account to complete any of these solutions (including your original Glue approach).
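As an illustration of the second option, here is a rough Python sketch that streams each file from the SFTP server straight into S3 without saving it locally. It assumes the paramiko library on the SFTP side (the hostname, credentials, paths, and bucket are placeholders); boto3's upload_fileobj performs a multipart upload automatically for large files:

import boto3
import paramiko

# SFTP side (paramiko is an assumption; any client that exposes the
# remote file as a readable file-like object will work).
transport = paramiko.Transport(("sftp.example.com", 22))    # placeholder host
transport.connect(username="user", password="password")     # placeholder credentials
sftp = paramiko.SFTPClient.from_transport(transport)

# S3 side.
s3 = boto3.client("s3")

# Open the remote file as a stream and hand it to boto3, which splits
# large files into a multipart upload as it reads.
remote_file = sftp.open("/remote/path/bigfile.dat", "rb")
s3.upload_fileobj(remote_file, "my-bucket", "bigfile.dat")
remote_file.close()

sftp.close()
transport.close()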

Downloading (a batch of) files directly to AWS s3 bucket from url

I am using an API (https://scihub.copernicus.eu/userguide/OpenSearchAPI) to download a large number (100+) of large files (~5 GB each), and I want to store these files in an AWS S3 bucket.
My first iteration was to download the files locally and use the AWS CLI to move them to an S3 bucket: aws s3 cp <local file> s3://<mybucket>, and this works.
To avoid downloading locally I used an EC2 instance and basically did the same from there. The problem, however, is that the files are quite large, so I'd prefer not to store them at all and instead have my EC2 instance stream the files to my S3 bucket.
Is this possible?
You can use a byte array to populate an Amazon S3 bucket. For example, assume you are using the AWS SDK for Java V2. You can put an object into a bucket like this:
PutObjectRequest putOb = PutObjectRequest.builder()
        .bucket(bucketName)
        .key(objectKey)
        .metadata(metadata)
        .build();

PutObjectResponse response = s3.putObject(putOb,
        RequestBody.fromBytes(getObjectFile(objectPath)));
Notice the RequestBody.fromBytes method. Full example here:
https://github.com/awsdocs/aws-doc-sdk-examples/blob/master/javav2/example_code/s3/src/main/java/com/example/s3/PutObject.java
One thing to note, however: if your files are really large, you may want to consider uploading in parts. See this example:
https://github.com/awsdocs/aws-doc-sdk-examples/blob/master/javav2/example_code/s3/src/main/java/com/example/s3/S3ObjectOperations.java
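If you would rather not hold a 5 GB file in memory (as RequestBody.fromBytes would require) or on disk at all, one option is to stream the HTTP response body directly into the upload. The sketch below uses Python with requests and boto3 rather than Java, purely as an illustration (the URL, bucket, and key are placeholders); upload_fileobj performs the multipart upload for you as it reads from the stream:

import boto3
import requests

s3 = boto3.client("s3")
url = "https://example.com/path/to/product.zip"   # placeholder for the API download URL

# Stream the response so the whole file is never held in memory, and let
# boto3 break it into a multipart upload as it reads from the stream.
with requests.get(url, stream=True) as response:
    response.raise_for_status()
    s3.upload_fileobj(response.raw, "my-bucket", "product.zip")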

Bulk AWS S3 files exist verification?

What is the best/fastest approach to check if multiple files exist in AWS S3 bucket?
For example, I have metadata for 100k files in my local DB. I would like to make sure all of them exist in the S3 bucket. I can do 'aws s3 ls' for a particular file, but that would mean 100k AWS requests. Is there a better approach?
If you are just doing a general audit, you could use Amazon S3 Inventory to obtain a complete daily dump of all object keys and associated metadata.
You could then write some code to compare the contents of the Inventory file against the DB entries.
If you want to retrieve all keys in a specific bucket with one command, you can use the following (the CLI paginates through the full result set automatically; adding --no-paginate would limit it to the first 1,000 keys):
aws s3api list-objects --bucket <bucket-name>
Once you have that list, you can process it with custom code.
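For that custom code, a short Python/boto3 sketch might look like the following (load_keys_from_db() is a hypothetical stand-in for reading your 100k keys from the local DB). ListObjectsV2 returns up to 1,000 keys per request, so checking 100k objects costs on the order of 100 requests rather than 100,000:

import boto3

s3 = boto3.client("s3")

# Collect every key in the bucket, roughly 1,000 keys per request.
s3_keys = set()
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="my-bucket"):
    for obj in page.get("Contents", []):
        s3_keys.add(obj["Key"])

# load_keys_from_db() is a placeholder for your own DB query.
db_keys = load_keys_from_db()
missing = [k for k in db_keys if k not in s3_keys]
print(f"{len(missing)} of {len(db_keys)} objects are missing from S3")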
If you would like to make sure your local files are on S3, you can try the aws s3 sync command.
You can also check out which files are there currently with Commandeer, which supports S3 file browsing in a nice tree view.

AWS S3 LISTING is slow

I am trying to execute the following command using AWS CLI on an S3 bucket:
aws s3 ls s3://bucket-name/folder_name --summarize --human-readable --recursive
I am trying to get the size of the folder, but given that there are multiple levels and a huge number of files, it has been running for hours.
Is there an efficient way to quickly get the size at folder level on Amazon S3?
You can use Amazon S3 Inventory:
Amazon S3 inventory provides comma-separated values (CSV) or Apache optimized row columnar (ORC) output files that list your objects and their corresponding metadata on a daily or weekly basis for an S3 bucket or a shared prefix (that is, objects that have names that begin with a common string).
You would need to parse the file, but all information is provided.
It is only generated on a daily (or weekly) schedule, so if you need something more current you'd have to make the calls yourself.
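If you do end up making the calls yourself, a minimal Python/boto3 sketch that totals object sizes under a prefix could look like this (the bucket and prefix are placeholders). It still has to list every object, roughly 1,000 keys per request, so for very large buckets S3 Inventory remains the better option:

import boto3

s3 = boto3.client("s3")
bucket, prefix = "my-bucket", "folder_name/"   # placeholders

total_bytes = 0
total_objects = 0

# The paginator walks through every ListObjectsV2 page under the prefix.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        total_bytes += obj["Size"]
        total_objects += 1

print(f"{total_objects} objects, {total_bytes / (1024 ** 3):.2f} GiB")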

Why can I not run dynamic content from Amazon S3?

I know that Amazon S3 is a service for storing static files. But what I don't understand is, if I store some PHP files in an S3 bucket, why isn't it possible to have those files executed from an EC2 instance?
Amazon S3 is a data storage service. When a file is requested from S3, it is sent to the requester, regardless of file format. S3 does not process the file in any way, nor does it pass content to Amazon EC2 for execution.
If you want a PHP file executed by a PHP engine, you will need to run a web server on an Amazon EC2 instance.
Running the PHP directly from S3 will never work, because objects stored in S3 are not presented in a way that your local system can execute in place.
The good news, however, is that you can pull the PHP down from S3 to your local system and execute it there.
I use this method myself with an instance created by Lambda to do some file processing. Lambda creates the instance, and the bash script in the instance's UserData does an s3 copy (see below) to download both the PHP file and the data file that the PHP will process; PHP is then run against that data file.
To download a file from S3 with the CLI:
# save as file.php in the current directory
aws s3 cp s3://my-s3-bucket-name/my/s3/file.php .

# or save it as a different filename
aws s3 cp s3://my-s3-bucket-name/my/s3/file.php my-file.php

# or save it in a different folder
aws s3 cp s3://my-s3-bucket-name/my/s3/file.php some/directory/path/file.php
You would then pass this file into PHP for execution like any other file.