When I open AWS Notebook Instance -> Jupyter Notebook, it gives me some storage (probably called an S3 bucket). I created a folder there and tried to upload thousands of data files. However, it asks me to manually click the upload button next to every single file. Is there an easier way to upload that data?
You could use the AWS CLI or one of the AWS S3 SDKs to upload the files in bulk.
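For instance, here is a minimal sketch using the Python SDK (boto3), assuming credentials are already configured; the bucket name and local folder are placeholders:

import os
import boto3

s3 = boto3.client("s3")
local_dir = "data"       # local folder containing the files (placeholder)
bucket = "my-bucket"     # target bucket name (placeholder)

# Walk the folder and upload every file, using the relative path as the object key
for root, _, files in os.walk(local_dir):
    for name in files:
        path = os.path.join(root, name)
        key = os.path.relpath(path, local_dir).replace(os.sep, "/")
        s3.upload_file(path, bucket, key)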
Use case:
I have one directory on-premises, and I want to back it up, let's say every midnight, and be able to restore it if something goes wrong.
This doesn't seem like a complicated task, but reading through the AWS documentation, even this can be cumbersome and costly. Setting up Storage Gateway locally seems unnecessarily complex for a simple task like this, and setting it up on EC2 is costly as well.
What I have done:
I have read through these, plus some other blog posts:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html
https://docs.aws.amazon.com/storagegateway/latest/userguide/WhatIsStorageGateway.html
What I have found:
1. Setting up a file gateway (locally or as an EC2 instance):
It just mounts the files to an S3 bucket, and that's it. So my on-premises app would constantly write to this S3 bucket. The documentation doesn't mention anything about scheduled backup and recovery.
2. Setting up a volume gateway:
Here I can make a scheduled synchronization/backup to an S3 bucket, but using a whole volume for it would be a big overhead.
3. Standalone S3:
Just using a bare S3 bucket and copying my backup there via the AWS API/SDK with a manually created scheduled job.
Solutions:
Using point 1 from above, enable versioning, and the versions of the files will serve as recovery points.
Using point 3
I think I am looking for a mix of the file and volume gateways: working at the file level while taking asynchronous, scheduled snapshots of the files.
How should this be handled? Isn't there a really easy way to just send a backup of a directory to AWS?
The easiest way to back up a directory to Amazon S3 would be:
Install the AWS Command-Line Interface (CLI)
Provide credentials via the aws configure command
When required, run the aws s3 sync command
For example:
aws s3 sync folder1 s3://bucketname/folder1/
This will copy any files from the source to the destination. It will only copy files that have been added or changed since a previous sync.
Documentation: sync — AWS CLI Command Reference
If you want to be fancier and keep multiple backups, you could copy to a different target directory, create a zip file first and upload the zip file, or even use a backup program like CloudBerry Backup that knows how to use S3 and can do traditional-style backups.
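As an illustration of the zip-and-upload approach, here is a minimal Python sketch using boto3; the bucket name, key prefix, and source directory are placeholders, and you would run it from whatever scheduler you like (cron, for example) at midnight:

import shutil
from datetime import datetime

import boto3  # assumes credentials are already configured (aws configure)

SOURCE_DIR = "/data/folder1"   # directory to back up (placeholder)
BUCKET = "bucketname"          # target bucket (placeholder)
PREFIX = "backups"             # key prefix inside the bucket (placeholder)

def backup_directory():
    # Create a timestamped zip archive of the directory, e.g. backup-20240101-000000.zip
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    archive = shutil.make_archive(f"/tmp/backup-{stamp}", "zip", SOURCE_DIR)

    # Upload the archive; each run creates a new object, so earlier backups are kept
    s3 = boto3.client("s3")
    s3.upload_file(archive, BUCKET, f"{PREFIX}/backup-{stamp}.zip")

if __name__ == "__main__":
    backup_directory()

To restore, download the desired archive and unzip it over the original directory.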
Is it possible to open a Python script in an S3 bucket, make changes to it, save it, and run it?
For example, if I connect to a server over SSH in FileZilla, I can access the scripts on it, make changes, save, and run them without having to download the script, make the changes, and re-upload it every time I want to change something.
Is there a way to do the same for scripts in an S3 bucket?
Unfortunately, this is not possible to do directly on S3. S3 is an object-based storage service, not a file system.
You have to download the file somewhere, edit it, and then upload the new version to S3. There are third-party tools that can do this for you in the background.
One such third-party tool is s3fs-fuse:
s3fs allows Linux and macOS to mount an S3 bucket via FUSE. s3fs preserves the native object format for files, allowing use of other tools like AWS CLI.
Among many other things, s3fs-fuse:
allows random writes and appends
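If you would rather script the manual download-edit-upload round trip yourself, here is a minimal boto3 sketch; the bucket and key names are placeholders:

import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"          # placeholder bucket name
key = "scripts/myscript.py"   # placeholder object key

# Download the current version, edit it locally, then upload the new version
s3.download_file(bucket, key, "/tmp/myscript.py")
# ... edit /tmp/myscript.py in your editor of choice ...
s3.upload_file("/tmp/myscript.py", bucket, key)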
I have a previously created jupyter notebook that I'd like to run on the Google Cloud Platform.
I currently have a notebook instance running on a GCP VM and it works fine. I was also able to create a storage bucket and upload all dataset and notebook files to the bucket. However, these files don't show up in the Jupyter Notebook directory tree. I know I can access the dataset files using something like...
from io import BytesIO
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('name-of-bucket')
blob = storage.Blob('directory/to/files', bucket)
fid = BytesIO(blob.download_as_string())
But I'm not sure how to actually serve up a notebook file to use, and I really don't feel like copying and pasting all my previous work.
All help appreciated!
Very simple. You can upload directly from within the Jupyter Notebook and bypass the bucket if desired (the icon with the up arrow).
[Image: the Jupyter Notebook upload icon]
The only issue with this is you can't upload folders, so zip the folder first then upload it.
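Alternatively, since your files are already in the storage bucket, you can copy them down onto the notebook VM so they appear in the Jupyter directory tree. Here is a minimal sketch with the google-cloud-storage client, run on the instance; the bucket name and prefix are placeholders:

import os
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('name-of-bucket')   # placeholder bucket name

# Download every object under a prefix into the Jupyter working directory
for blob in bucket.list_blobs(prefix='notebooks/'):   # placeholder prefix
    if blob.name.endswith('/'):    # skip "folder" placeholder objects
        continue
    dest = os.path.join('.', blob.name)
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    blob.download_to_filename(dest)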
You can use Jupyter Lab's git extension to host your Notebooks in GitHub and pull them from there.
FYI, if you use GCP's AI Platform Notebooks you'll get a pre-configured Jupyter environment with many ML/DL libraries pre-installed. That git extension will be pre-installed as well.
I need to transfer all our files (with the folder structure) to AWS S3. I have researched a lot about how this is done.
Most places mention s3fs, but that looks a bit old. I tried to install s3fs on my existing CentOS 6 web server, but it gets stuck on the $ make command. (Yes, there is a Makefile.in.)
As per this answer, AWS S3 Transfer Acceleration is the next better option. But I would still have to write a PHP script (my application is PHP) to transfer all folders and files to S3. It works the same way as saving a file to S3 (the putObject API), just faster. Please correct me if I am wrong.
Is there any other, better solution (I would prefer FTP) to transfer 1TB of files with folders from the CentOS 6 server to AWS S3? Is there any way to use an FTP client on EC2 to transfer files from the external CentOS 6 server to AWS S3?
Use the aws s3 sync command of the AWS Command-Line Interface (CLI).
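For example (the local path and bucket name are placeholders):
aws s3 sync /local/files s3://my-bucket/files/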
This will preserve your directory structure and can be restarted in case of disconnection. Each execution will only copy new, changed or missing files.
Be aware that 1TB is a lot of data and can take significant time to copy.
An alternative is to use AWS Snowball, which is a device that AWS can send to you. It can hold 50TB or 80TB of data. Simply copy your data to the device, then ship it back to AWS and they will copy the data to Amazon S3.
I have a website hosted on an EC2 instance (Tomcat) and it has an image upload facility. My intention is to switch to CloudFront to reduce the load time of the website. Images on the website are loaded from a directory called "images" and the names of the images are stored in a database. When a page is loaded, the image name is read from the database and then the image is loaded. I can copy the images directory to an S3 bucket manually. However, when an image is uploaded, an entry is made in the database, but the "images" directory in the S3 bucket remains outdated. I need something so that the S3 directory is updated as soon as an image is uploaded. I am new to S3 and CloudFront. Please help!
You can achieve this using the AWS CLI and a cron job that runs periodically on your EC2 instance.
Install the AWS CLI on your EC2 instance
Set up a cron job that runs the command below (note that aws s3 sync takes a directory, not a wildcard)
aws s3 sync [path-to-image-directory] s3://mybucket
Your images will then be uploaded to Amazon S3 automatically.
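For example, a crontab entry that runs the sync every five minutes could look like this (the local path and bucket name are placeholders):
*/5 * * * * aws s3 sync /var/www/images s3://mybucket/images/ --quiet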