How to download a file from AWS Glacier without JSON format - amazon-web-services

In AWS Glacier, when we initiate a job, we can download the output after 4+ hours, but the download seems to be supported only in JSON format. How can I download my original files? Is it by using get-job-output or something else?
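For what it's worth, for an archive-retrieval job (as opposed to an inventory-retrieval job, whose output really is JSON), get-job-output returns the raw archive bytes; the JSON you see on the CLI is only the response metadata. A minimal boto3 sketch, with the vault name, job ID and output file name as placeholders:

import boto3

glacier = boto3.client('glacier')

response = glacier.get_job_output(
    accountId='-',              # '-' means the account of the current credentials
    vaultName='my-vault',       # hypothetical vault name
    jobId='JOB_ID_FROM_INITIATE_JOB'
)

# response['body'] is a streaming object holding the original archive bytes,
# not JSON, so write it straight to disk.
with open('restored-file.bin', 'wb') as f:
    for chunk in iter(lambda: response['body'].read(1024 * 1024), b''):
        f.write(chunk)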

Related

Convert VOB/BUP files to .mp4 and store them in S3 bucket

So as the title says, I have a couple of files in the VOB/BUP format that I need to convert to .mp4 (I also have .IFO files and I don't know what those are) and then get a public URL to display them (S3 bucket), but I don't know which service is the correct one.
I have read about MediaConvert, but I'm not quite sure this is the right service for my need.
Thanks in advance for any tips.
VOB/BUP/IFO files are typically found on a DVD where:
IFO files are an index and hold information about the disc contents
BUP files are backup versions of the IFO files
VOB files hold the video and audio content
AWS Elemental MediaConvert does not support these as an input (1).
To convert these, you can consider using a different tool that can handle them, for example FFmpeg.
Here is an example batch script you can reference that does this:
https://gist.github.com/andreasbotsikas/8bad3df5309dd0383f2e2c450b22481c
You can also potentially have this workflow run on AWS by using AWS Lambda to run FFMPeg (2).
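As a rough illustration of what the FFmpeg step looks like when driven from Python, with the file names and codec choices being assumptions rather than anything taken from the gist above:

import subprocess

# Hypothetical input/output names; libx264 + AAC is a common choice for .mp4.
subprocess.run(
    [
        'ffmpeg',
        '-i', 'VTS_01_1.VOB',   # input VOB (assumed file name)
        '-c:v', 'libx264',      # re-encode video to H.264
        '-c:a', 'aac',          # re-encode audio to AAC
        'VTS_01_1.mp4'
    ],
    check=True
)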
References:
(1) Supported input codecs and containers: https://docs.aws.amazon.com/mediaconvert/latest/ug/reference-codecs-containers-input.html
(2) Processing user-generated content using AWS Lambda and FFmpeg: https://aws.amazon.com/blogs/media/processing-user-generated-content-using-aws-lambda-and-ffmpeg/

process non csv, json and parquet files from s3 using glue

Little disclaimer: I have never used Glue.
I have files stored in S3 that I want to process using Glue, but from what I saw when I tried to start a new job from a plain graph, the only options I got were the CSV, JSON and Parquet file formats from S3, and my files are not of these types. Is there any way to process those files using Glue, or do I need to use another AWS service?
I can run a bash command to turn those files into JSON, but that command is something I would need to download to a machine. Is there any way I can do that and then use Glue on the resulting JSON?
Thanks.
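One possibility, sketched here only as an assumption about the setup: do the conversion yourself with boto3 (for example in a Lambda or a Glue Python shell job), write JSON back to S3, and let a normal Glue job or crawler pick that up. The bucket, keys and parsing logic below are placeholders:

import json
import boto3

s3 = boto3.client('s3')

def parse_custom_format(raw_bytes):
    # Placeholder: whatever the bash command does, reimplemented (or wrapped) here.
    return [{'line': line} for line in raw_bytes.decode('utf-8').splitlines()]

# Hypothetical bucket and keys.
obj = s3.get_object(Bucket='my-bucket', Key='raw/input.custom')
records = parse_custom_format(obj['Body'].read())

s3.put_object(
    Bucket='my-bucket',
    Key='converted/input.json',
    Body='\n'.join(json.dumps(r) for r in records).encode('utf-8')
)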

AWS MediaConvert: Delete original file after transcoding

I'm using AWS MediaConvert to convert my videos to other formats. I want to be able to delete the original media files as soon as the transcoding is done. How is this possible?
I'm using the Boto3 SDK.
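One common pattern (not something stated in this thread) is to have EventBridge invoke a Lambda when the MediaConvert job state changes to COMPLETE, look up the job's inputs, and delete them from S3. A rough boto3 sketch; the event wiring, bucket layout and rule configuration are assumptions to verify:

import boto3

# MediaConvert requires an account-specific endpoint.
endpoint = boto3.client('mediaconvert').describe_endpoints()['Endpoints'][0]['Url']
mediaconvert = boto3.client('mediaconvert', endpoint_url=endpoint)
s3 = boto3.client('s3')

def handler(event, context):
    # Assumes an EventBridge rule on "MediaConvert Job State Change" with status COMPLETE.
    job_id = event['detail']['jobId']
    job = mediaconvert.get_job(Id=job_id)['Job']

    for job_input in job['Settings']['Inputs']:
        file_input = job_input['FileInput']            # e.g. s3://bucket/key.mov
        bucket, key = file_input[len('s3://'):].split('/', 1)
        s3.delete_object(Bucket=bucket, Key=key)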

Spark doesn't output .crc files on S3

When I use Spark locally and write data to my local filesystem, it creates some useful .crc files.
Using the same job on AWS EMR and writing to S3, the .crc files are not written.
Is this normal? Is there a way to force the writing of .crc files on S3?
Those .crc files are created by the low-level bits of the Hadoop FS binding so that it can identify when a block is corrupt and, on HDFS, switch to another datanode's copy of the data for the read and kick off re-replication from one of the good copies.
On S3, stopping corruption is left to AWS.
What you can get from S3 is the ETag of a file, which is the MD5 sum for a small upload; on a multipart upload it is some other string, which again changes when you upload it.
You can get at this value with the Hadoop 3.1+ version of the S3A connector, though it's off by default because distcp gets very confused when uploading from HDFS. For earlier versions you can't get at it, nor does the aws s3 command show it. You'd have to try some other S3 libraries (it's just a HEAD request, after all).
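Outside of Hadoop, fetching that ETag really is just a HEAD request; for example with boto3, where the bucket and key are placeholders:

import boto3

s3 = boto3.client('s3')

# head_object issues the HEAD request; no object data is downloaded.
etag = s3.head_object(Bucket='my-bucket', Key='path/to/part-00000')['ETag']
print(etag)   # MD5 for single-part uploads, an opaque "<hash>-<parts>" string for multipart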

Decompress a zip file in AWS Glue

I have a compressed gzip file in an S3 bucket. The files will be uploaded to the S3 bucket daily by the client. When uncompressed, the gzip will contain 10 files in CSV format, all with the same schema. I need to uncompress the gzip file and, using a Glue data crawler, create a schema before running an ETL script using a dev endpoint.
Is Glue capable of decompressing the zip file and creating a data catalog? Or is there any Glue library available which we can use directly in the Python ETL script? Or should I opt for a Lambda/any other utility so that, as soon as the zip file is uploaded, I run a utility to decompress it and provide it as input to Glue?
Appreciate any replies.
Glue can do the decompression, but it wouldn't be optimal, as the gzip format is not splittable (which means only one executor will work with it). More info about that here.
You can try decompressing with a Lambda and invoking a Glue crawler on the new folder.
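A rough sketch of that Lambda idea, assuming the object is a plain .gz that fits in memory (if it is really a zip/tar bundle of 10 CSVs, swap in zipfile or tarfile) and that the prefixes and crawler name below are placeholders:

import gzip
import boto3

s3 = boto3.client('s3')
glue = boto3.client('glue')

def handler(event, context):
    # Triggered by an S3 put event for the uploaded .gz object.
    record = event['Records'][0]['s3']
    bucket = record['bucket']['name']
    key = record['object']['key']

    body = s3.get_object(Bucket=bucket, Key=key)['Body'].read()
    data = gzip.decompress(body)

    # Write the uncompressed CSV into the folder the crawler points at,
    # dropping the trailing ".gz" from the key.
    s3.put_object(Bucket=bucket, Key='uncompressed/' + key[:-3], Body=data)

    glue.start_crawler(Name='my-crawler')   # hypothetical crawler name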
Use glueContext.create_dynamic_frame.from_options and mention the compression type in the connection options. Similarly, output can also be compressed while writing to S3. The snippet below worked for bzip; please change the format to gz|gzip and try.
I tried the Target Location in the UI of the Glue console and found that bzip and gzip are supported when writing dynamic_frames to S3, and made changes to the generated code to read a compressed file from S3. In the docs this is not directly available.
Not sure about the efficiency. It took around 180 seconds of execution time to read, apply a Map transform, change to a DataFrame and back to a DynamicFrame for a 400 MB compressed CSV file in bzip format. Please note that execution time is different from the start_time and end_time shown in the console.
datasource0 = glueContext.create_dynamic_frame.from_options(
    's3',
    {
        'paths': ['s3://bucketname/folder/filename_20180218_004625.bz2'],
        'compression': 'bzip'
    },
    'csv',
    {
        'separator': ';'
    }
)
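For the write side mentioned above, compression can be set the same way in the connection options; a minimal sketch with a placeholder output path:

glueContext.write_dynamic_frame.from_options(
    frame=datasource0,
    connection_type='s3',
    connection_options={
        'path': 's3://bucketname/output/',   # hypothetical output location
        'compression': 'gzip'
    },
    format='csv',
    format_options={'separator': ';'}
)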
I've written a Glue job that can unzip S3 files and put them back in S3.
Take a look at https://stackoverflow.com/a/74657489/17369563