Save output file into AWS S3 Bucket

Using FFmpeg, I'm trying to output a file to an S3 bucket.
ffmpeg -i myfile.mp4 -an -crf 20 -vf crop=200:200 -s 800x600 -f mp4 pipe:1 | aws s3 cp - s3://my.test.bucket
I've already been advised that this cannot be done, since creating an MP4 file requires seeking and a pipe doesn't allow seeking. If I change the command to store the file on the local disk
ffmpeg -i myfile.mp4 -an -crf 20 -vf crop=200:200 -s 800x600 myfile.mp4
it stores the file locally under the project root folder, which is fine.
But since I'm running my app in a container and ffmpeg itself is installed via the Dockerfile, I'm trying to figure out what the possible options are here (if an MP4 cannot be written to S3 directly from the ffmpeg command).
If I need to save the output file myfile.mp4 to a server path using IWebHostEnvironment, where would it actually be saved? Is it inside the container? Can I mount an S3 bucket folder into the container and use it from the ffmpeg command?
Since my input file is in an S3 bucket and I want my output file to end up in the same bucket, is there any solution where I wouldn't need to write the output file locally and upload it again?
I guess this is a lot of questions, but I feel like I've run into a rabbit hole here.

There are really a lot of questions. :D
To make it fair, a few questions from me, to see if I understand everything.
Where is your Docker container running? Lambda, an EC2 machine, a Kubernetes cluster?
If it is on EC2, you can use https://aws.amazon.com/efs/, but...
Can you simply save the file in /tmp and then run an aws s3 cp command from the /tmp folder?
In some environments (for example Lambda), /tmp was the only place where I had programmatic access to the file system.
Although, if I understand correctly, you have write access in your environment, because you download the original file from the S3 bucket. So can you do something like this (rough sketch below)?
download the source file from S3
create the new file with ffmpeg
upload the file to S3
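A minimal sketch of that flow from a shell, assuming /tmp as the scratch directory, the bucket s3://my.test.bucket from your first command, and a hypothetical output name output.mp4:
# download the source file from S3 to local scratch space
aws s3 cp s3://my.test.bucket/myfile.mp4 /tmp/myfile.mp4
# create the new file with ffmpeg (options taken from the question)
ffmpeg -i /tmp/myfile.mp4 -an -crf 20 -vf crop=200:200 -s 800x600 /tmp/output.mp4
# upload the result back to the same bucket
aws s3 cp /tmp/output.mp4 s3://my.test.bucket/output.mp4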

Related

How exactly do I use gsutil to download a Google Cloud Storage bucket to a local disk?

I am trying to download a full bucket from my Google Cloud Storage. I am using gsutil and the CLOUD SHELL Terminal.
My current piece of code receives an error: "CommandException: Destination URL must name a directory, bucket, or bucket
subdirectory for the multiple source form of the cp command."
The code is:
gsutil -m cp -r gs://googleBucket D:\GOOGLE BACKUP
where googleBucket is the bucket and D:\GOOGLE BACKUP is my desired download directory. Am I missing something here?
Any help is appreciated.
P.S. I am in no way tech savvy, and most of this is new to me.
Download this way first:
gsutil -m cp -r gs://googleBucket .
The . downloads it to the current directory. Do an ls and you will see the download.
Then go to the three-dots menu and download locally. The three dots are to the right of "Open Editor".
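For what it's worth, the original error ("Destination URL must name a directory, bucket, or bucket subdirectory...") most likely comes from the unquoted space in D:\GOOGLE BACKUP, which makes gsutil treat it as two separate destinations. Quoting the destination path should also work, something along the lines of:
gsutil -m cp -r gs://googleBucket "D:\GOOGLE BACKUP"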

Downloaded csv.gzip from S3 bucket and it's in hieroglyphs (encoding error)

aws s3 cp s3://bucketname/path/to/file/filename.csv.gz . --content-encoding gzip
I'm just trying to download a compressed CSV file from a bucket that we don't control but have permissions to. I ran the above and the file downloads, but it is not viable: the contents show up as garbled characters.
How can I download a viable file?
The object in question still needs to be decompressed.
Try the following command instead: aws s3 cp s3://bucketname/path/to/file/filename.csv.gz - | gzip -d > filename.csv (the - destination streams the object to stdout, so it is decompressed on the way out).
You can download the file as it is. It will be downloaded as a .csv file but with compressed content, so you can rename it to a .gz file and then decompress it. That will solve the problem.
If you are using terminal commands and the downloaded file name is x.csv:
mv x.csv x.gz
gzip -d x.gz
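If you want to confirm that the downloaded file really is gzip data before renaming it, the standard file utility (where available) can check:
file x.csv
It should report something like "gzip compressed data".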

Google cloud vm stop or freeze when extracting a .7z file

Recently I have been working with Google Cloud Compute Engine to train an ML model,
so I am trying to extract a .7z file that contains the data.
But it is too big, and the machine freezes or stops with an uncaught error.
I am using the Linux command below:
!7zr 'path of the file'
Any help with extracting the file would be appreciated... Thanks in advance.
You could try doing it via GCS.
Create a directory that only has the compressed file in it and nothing else,
yourdir/myfile.7z
Create an environment variable, e.g. export MYFILE=myfile.7z
Create a bucket on GCS using the gsutil CLI (mb takes only a bucket name; the "directory" part is created implicitly when you upload):
gsutil mb gs://MYBUCKET
Next you upload the file to the bucket, like so (note that the -m flag goes before cp):
gsutil -m cp -v $MYFILE gs://MYBUCKET/MY_DIR_FOR_ZIP_FILE/
Within the VM you can now download the file, again using the gsutil CLI:
gsutil -m cp -v gs://MYBUCKET/MY_DIR_FOR_ZIP_FILE/$MYFILE /YOUR_DIR
Then extract and also remove the compressed file:
7z x $MYFILE && rm -v $MYFILE
You should now have the uncompressed file on the VM.
Make sure to use the -m flag (before cp, as shown above); it performs a parallel (multi-threaded/multi-processing) copy.
Here is the reference cp - Copy files and objects
Using the gsutil tool
The instructions above assume that your data is less than 1 TB in size and that you are using a VM with a disk large enough to accommodate it.
If your data is more than 1TB, you will need to use Transfer service for on-premises data.
The steps to follow when setting up transfer jobs are listed here
Creating a transfer job
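Putting the gsutil steps above together, a rough end-to-end sketch (reusing the placeholder names MYBUCKET, MY_DIR_FOR_ZIP_FILE, myfile.7z and /YOUR_DIR from this answer):
# on the machine that currently holds the archive
export MYFILE=myfile.7z
gsutil mb gs://MYBUCKET
gsutil -m cp -v $MYFILE gs://MYBUCKET/MY_DIR_FOR_ZIP_FILE/
# on the VM
gsutil -m cp -v gs://MYBUCKET/MY_DIR_FOR_ZIP_FILE/$MYFILE /YOUR_DIR
cd /YOUR_DIR && 7z x $MYFILE && rm -v $MYFILE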

How to copy file from bucket GCS to my local machine

I need copy files from Google Cloud Storage to my local machine:
I tried this command in the terminal of a Compute Engine instance:
$sudo gsutil cp -r gs://mirror-bf /var/www/html/mydir
That is my directory on the local machine: /var/www/html/mydir.
I get this error:
CommandException: Destination URL must name a directory, bucket, or bucket
subdirectory for the multiple source form of the cp command.
Where is the mistake?
You must first create the directory /var/www/html/mydir.
Then, you must run the gsutil command on your local machine and not in the Google Cloud Shell. The Cloud Shell runs on a remote machine and can't deal directly with your local directories.
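A minimal sketch of that, run on your local machine (not in Cloud Shell), using the bucket and path from the question:
# create the destination directory first
sudo mkdir -p /var/www/html/mydir
# then copy the bucket contents into it
sudo gsutil cp -r gs://mirror-bf /var/www/html/mydir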
I have had a similar problem and went through the painful process of having to figure it out too, so I thought I would provide my step-by-step solution (under Windows, hopefully similar for Unix users) and hope it helps others.
First (as many others have pointed out in various Stack Overflow threads), you have to run a local console (in admin mode) for this to work, i.e. do not use the Cloud Shell terminal.
Here are the steps:
Assuming you already have Python installed on your machine, you will then need to install the gsutil python package using pip from your console:
pip install gsutil
You will then be able to run the gsutil config from that same console:
gsutil config
A .boto file needs to be created. It is needed to make sure you have permissions to access your drive.
Also note that you are now provided a URL, which is needed in order to get the authorization code (prompted in the console).
Open a browser and paste this URL in, then:
Log in to your Google account (i.e. the account linked to your Google Cloud).
Google asks you to confirm that you want to give access to gsutil. Click Allow.
You will then be given an authorization code, which you can copy and paste into your console.
Finally you are asked for a project-id:
Get the project ID of interest from your Google Cloud.
In order to find these IDs, click on "My First Project" in the Google Cloud console.
Then you will be provided a list of all your projects and their ID.
Paste that ID into your console, hit Enter, and there you are! You have now created your .boto file. This should be all you need to be able to play with your cloud storage.
Console output:
Boto config file "C:\Users\xxxx\.boto" created. If you need to use a proxy to access the Internet please see the instructions in that file.
You will then be able to copy your files and folders from the cloud to your PC using the following gsutil Command:
gsutil -m cp -r gs://myCloudFolderOfInterest/ "D:\MyDestinationFolder"
Files from within "myCloudFolderOfInterest" should then get copied to the destination "MyDestinationFolder" (on your local computer).
gsutil -m cp -r gs://bucketname/ "C:\Users\test"
I put a "r" before file path, i.e., r"C:\Users\test" and got the same error. So I removed the "r" and it worked for me.
Try with '.' as in ./var:
$sudo gsutil cp -r gs://mirror-bf ./var/www/html/mydir
Or maybe it is the problem below:
gsutil cp does not support copying special file types such as sockets, device files, named pipes, or any other non-standard files intended to represent an operating system resource. You should not run gsutil cp with sources that include such files (for example, recursively copying the root directory on Linux that includes /dev ). If you do, gsutil cp may fail or hang.
Source: https://cloud.google.com/storage/docs/gsutil/commands/cp
The syntax that worked for me when downloading to a Mac was:
gsutil cp -r gs://bucketname dir Dropbox/directoryname

AWS CLI - is there a way to extract tar.gz from S3 to home without storing the tar.gz?

To elaborate,
There is a tar.gz file on my AWS S3, let's call it example.tar.gz.
So, what I want to do is download the extracted contents of example.tar.gz to /var/home/.
One way to do it is to simply download the tar.gz, extract it, then delete the tar.gz.
However, I don't want to use space downloading the tar.gz file, I just want to download the extracted version or only store the extracted version.
Is this possible?
Thanks!
What you need is the following:
aws s3 cp s3://example-bucket/file.tar.gz - | tar -xz
This will stream the file.tar.gz from S3 and extract it directly (in memory) to the current directory. No temporary files, no extra storage, and no cleanup after this one command.
Make sure you write the command exactly as above.
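If you specifically want the extracted contents under /var/home/ rather than the current directory (as in the question), tar's -C flag should handle that; a sketch reusing the example names above:
aws s3 cp s3://example-bucket/file.tar.gz - | tar -xz -C /var/home/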
Today I tested with Python boto3 and the AWS CLI, and I noticed that the tar.gz was extracted automatically when the file was downloaded.
There isn't currently a way you can do this with S3.
You could create the following script though and just run it whenever you wish to download the tar, as long as you have the IAM role / access keys set up.
#!/bin/bash
aws s3 cp s3://$1/$2 $3
tar -xvf $3
rm $3
Then just call the script using ./myScript BUCKET_NAME FILE_LOCATION OUTPUT_FILE
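Depending on how the file was created, you may also need to make the script executable first:
chmod +x myScript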