Editing Python scripts on an S3 bucket

Is it possible to open a Python script on an S3 bucket, make changes to it, save it, and run it?
For example, if I SSH into a server with FileZilla, I can access the scripts on it, make changes, save, and run them without having to download the script, make the changes, and upload it again every time.
Is there a way to do the same for scripts on an S3 bucket as well?

Unfortunately, this is not possible to do directly on S3. S3 is only an object-based storage solution, not a file system.
You have to download the file somewhere, edit it, and then upload the new version to S3. There are third-party tools that do this for you in the background.
One such third-party tool is s3fs-fuse:
s3fs allows Linux and macOS to mount an S3 bucket via FUSE. s3fs preserves the native object format for files, allowing use of other tools like AWS CLI.
Among many other things, s3fs-fuse allows random writes and appends.
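If you script the download-edit-upload cycle instead of mounting, a minimal boto3 sketch could look like this (the bucket and key names are placeholders):
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"            # placeholder
key = "scripts/my_script.py"    # placeholder

# Download the current version of the script
s3.download_file(bucket, key, "my_script.py")

# ... edit and/or run my_script.py locally ...

# Upload the new version back to S3 (the whole object is replaced)
s3.upload_file("my_script.py", bucket, key)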

Related

Run java -jar inside AWS Glue job

I have a relatively simple task to do, but I'm struggling to find the best mix of AWS services to accomplish it:
I have a simple Java program (provided by a third party; I can't modify it, just use it) that I can run anywhere with java -jar --target-location "path on local disc". Once executed, the program creates a CSV file on the local disk at the path defined in --target-location.
Once the file is created I need to upload it to S3.
The way I'm doing it currently is with a dedicated EC2 instance with Java installed; the first point is covered by java -jar ... and the second with the aws s3 cp ... command.
I'm looking for a better way of doing this (preferably serverless). I'm wondering if the points above can be accomplished with an AWS Glue job of type Python Shell. The second point (copying a local file to S3) I can most likely cover with boto3, but I'm not sure about the first (the java -jar execution).
Am I forced to use an EC2 instance, or do you see a smarter way with AWS Glue?
Or would it be most effective to build a Docker image (containing these two instructions), register it in ECR, and run it with AWS Batch?
I'm looking for a better way of doing this (preferably serverless).
I cannot tell whether a serverless option is better; however, an EC2 instance will do the job just fine. Assuming you have CentOS on your instance, you can do it through the
aaPanel GUI
Some useful web panels offer cron scheduled tasks, such as backing up files from a local directory to an S3 directory. I will use aaPanel as an example.
Install aaPanel
Install AWS S3 plugin
Configure the credentials in the plugin.
Cron
Add a scheduled task to back up files from "path on local disc" to AWS S3.
Rclone
A web panel goes beyond the scope of this question. Rclone is another useful tool I use to back up files from local disk to OneDrive, S3, etc.
Installation
curl https://rclone.org/install.sh | sudo bash
Sync
Sync a directory to the remote bucket, deleting any excess files in the bucket.
rclone sync -i /home/local/directory remote:bucket
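If you do want to try the plain-Python route mentioned in the question, and assuming Java is available in the environment where the script runs, a rough sketch of the two steps with subprocess and boto3 might look like this (the jar name, output path, and bucket/key are placeholders):
import subprocess
import boto3

output_path = "/tmp/output.csv"  # placeholder for --target-location

# Step 1: run the third-party program (it writes a CSV to output_path)
subprocess.run(
    ["java", "-jar", "program.jar", "--target-location", output_path],
    check=True,
)

# Step 2: upload the generated CSV to S3
s3 = boto3.client("s3")
s3.upload_file(output_path, "my-bucket", "exports/output.csv")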

On-premise file backup to AWS

Use case:
I have one directory on-premise, and I want to back it up, let's say, every midnight, and be able to restore it if something goes wrong.
It doesn't seem like a complicated task, but reading through the AWS documentation even this can be cumbersome and costly. Setting up Storage Gateway locally seems unnecessarily complex for a simple task like this, and setting it up on EC2 is costly too.
What I have done:
Reading through this + some other blog posts:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html
https://docs.aws.amazon.com/storagegateway/latest/userguide/WhatIsStorageGateway.html
What I have found:
1. Setting up a file gateway (locally or as an EC2 instance):
It just mounts the files to an S3 bucket, and that's it. My on-premise app will constantly write to this S3 bucket. The documentation doesn't mention anything about scheduled backup and recovery.
2. Setting up a volume gateway:
Here I can make a scheduled synchronization/backup to an S3 bucket, but using a whole volume for it would be a big overhead.
3. Standalone S3:
Just using a bare S3 bucket and copying my backup there via the AWS API/SDK with a manually created scheduled job.
Solutions:
Using point 1 from above: enable versioning, and the versions of the files will serve as recovery points.
Using point 3.
I think I am looking for a mix of the file and volume gateways: working at the file level and making an asynchronous scheduled snapshot of the files.
How should this be handled? Isn't there a really easy way to just send a backup of a directory to AWS?
The easiest way to back up a directory to Amazon S3 would be:
Install the AWS Command-Line Interface (CLI)
Provide credentials via the aws configure command
When required, run the aws s3 sync command
For example:
aws s3 sync folder1 s3://bucketname/folder1/
This will copy any files from the source to the destination. It will only copy files that have been added or changed since a previous sync.
Documentation: sync — AWS CLI Command Reference
If you want to be more fancy and keep multiple backups, you could copy to a different target directory, or create a zip file first and upload the zip file, or even use a backup program like Cloudberry Backup that knows how to use S3 and can do traditional-style backups.
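As a rough sketch of the "create a zip file first and upload it" idea, using Python's standard library and boto3 (the folder, bucket, and key prefix are placeholders):
import shutil
from datetime import datetime
import boto3

# Create folder1.zip next to the script (shutil adds the .zip extension)
archive = shutil.make_archive("folder1", "zip", "folder1")

# Use a dated key so multiple backups are kept side by side
key = f"backups/folder1-{datetime.now():%Y-%m-%d}.zip"
boto3.client("s3").upload_file(archive, "bucketname", key)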

What is better: mounting an S3 bucket, or copying files from the S3 bucket to a Windows EC2 instance?

I have a use case where CSV files are stored in an S3 bucket by a service. My program running on a Windows EC2 instance has to use the CSV files dumped in the S3 bucket. Mounting or copying: which approach is better for using the files, and how should I approach it?
Mounting the bucket as a local Windows drive will just cache info about the bucket and copy the files locally when you try to access them. Either way you will end up having the files copied to the Windows machine. If you don't want to program the knowledge of the S3 bucket into your application then the mounting system can be an attractive solution, but in my experience it can be very buggy. I built a system on Windows machines in the past that used an S3 bucket mounting product, but after so many bugs and failures I ended up rewriting it to simply perform an aws s3 sync operation to a local folder before the process ran.
I always suggest copying, either with the CLI or directly through the endpoints or an SDK, or whatever way AWS suggests, but not mounting.
Actually, S3 is not built to be a file system; it's an object storage system. Not saying that you cannot do it, but it is not advisable. The correct way to use Amazon S3 is to put/get files using the S3 APIs.
And if you are concerned about network latency, I would say both will be about the same. If you are thinking about directly modifying/editing a file within the file system: no, you cannot. Since Amazon S3 is designed for atomic operations, objects have to be completely replaced with their modified versions.
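If you go the copy route, a minimal boto3 sketch that downloads every CSV under a prefix to a local folder before the program runs (the bucket, prefix, and local directory are placeholders):
import os
import boto3

bucket = "my-bucket"        # placeholder
prefix = "incoming/csv/"    # placeholder
local_dir = r"C:\data\csv"  # placeholder

s3 = boto3.client("s3")
os.makedirs(local_dir, exist_ok=True)

# List everything under the prefix and download each object
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        filename = os.path.basename(obj["Key"])
        if filename:  # skip "folder" placeholder keys that end with /
            s3.download_file(bucket, obj["Key"], os.path.join(local_dir, filename))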

Uploading 1000s of files to AWS Notebook Instance

When I open AWS Notebook Instance -> Jupyter Notebook, it gives me some storage (probably called an S3 bucket). I created a folder there and tried to upload thousands of data files. However, it asks me to manually click the upload button next to every single file. Is there an easier way to upload that data?
You could use the AWS CLI or one of the AWS S3 SDKs to upload the files in bulk instead of clicking through the web UI.
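As one illustration with the Python SDK (boto3), assuming the target really is an S3 bucket; the folder and bucket names are placeholders:
import glob
import os
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"  # placeholder

# Upload every file in the local "data" folder under an "uploads/" prefix
for path in glob.glob("data/*"):
    if os.path.isfile(path):
        s3.upload_file(path, bucket, "uploads/" + os.path.basename(path))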

How can I move my media files stored on the local machine to S3?

I have a Django application running on EC2. Currently, all my media files are stored on the instance; all the documents I uploaded to the models are on the instance too. Now I want to add S3 as my default storage. What I am worried about is how I am going to move my current media files to S3 after the integration.
I am thinking of running a Python script one time, but I am looking for any built-in solution, or maybe just for opinions.
The AWS CLI should do the job:
aws s3 cp path/to/file s3://your-bucket/
or, for a whole directory:
aws s3 cp path/to/dir s3://your-bucket/ --recursive
All options can be seen here: https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html
The easiest method would be to use the AWS Command-Line Interface (CLI) aws s3 sync command. It can copy files to/from Amazon S3.
However, if there are complicated rules about where to move the files, then you can certainly use a Python script and the upload_file() method.
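As a rough sketch of such a one-time migration script using upload_file() (the media path, key prefix, and bucket name are placeholders):
import mimetypes
import os
import boto3

media_root = "/home/ubuntu/app/media"  # placeholder for Django MEDIA_ROOT
bucket = "my-media-bucket"             # placeholder

s3 = boto3.client("s3")
for root, _dirs, files in os.walk(media_root):
    for name in files:
        path = os.path.join(root, name)
        # keep the same relative layout under a "media/" prefix
        key = "media/" + os.path.relpath(path, media_root).replace(os.sep, "/")
        content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
        s3.upload_file(path, bucket, key, ExtraArgs={"ContentType": content_type})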