AWS extract tar.gz on S3 - amazon-web-services

As I'm new to AWS and a little confused by all the similar services, I would like some leads and to know whether I'm heading in the right direction.
I have tar.gz archives stored in S3 Glacier Deep Archive. When a restore is requested, I would like the archive to be automatically extracted and the folders and files it contains placed in S3 (with an expiration date).
These archives are too big to be extracted via Lambda (300 GB or more).
My idea would be to trigger a Lambda function when the restore is complete and use that Lambda function to start another AWS service that does the extraction. I was thinking either AWS Batch or Fargate. Which service do you think is the most suitable? For this kind of simple task, is it preferable to use an ARM architecture?
If someone has already done this before and has code to share, I'm interested (if not, I'll try to post my final solution here for others).
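One possible shape for the hand-off you describe, as a minimal sketch rather than a tested solution: subscribe a Lambda function to the bucket's s3:ObjectRestore:Completed event notification and have it launch a Fargate task. The cluster name, task definition, container name, and subnet below are all hypothetical placeholders.

```python
import urllib.parse

import boto3

ecs = boto3.client("ecs")

def handler(event, context):
    # Fires on the s3:ObjectRestore:Completed S3 event notification
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        ecs.run_task(
            cluster="extract-cluster",       # hypothetical cluster name
            launchType="FARGATE",
            taskDefinition="untar-task",     # hypothetical task definition
            networkConfiguration={
                "awsvpcConfiguration": {
                    "subnets": ["subnet-00000000"],  # placeholder subnet
                    "assignPublicIp": "ENABLED",
                }
            },
            overrides={
                "containerOverrides": [{
                    "name": "untar",         # container name in the task definition
                    "environment": [
                        {"name": "SRC_BUCKET", "value": bucket},
                        {"name": "SRC_KEY", "value": key},
                    ],
                }]
            },
        )
```

Inside the container, the key trick for a 300 GB archive is to stream the tarball instead of downloading it: tarfile's pipe mode reads members sequentially, so the archive never has to fit on local disk or in memory. The extracted/ prefix is an assumption; an S3 lifecycle expiration rule on that prefix would give you the expiration date you mention.

```python
import os
import tarfile

import boto3

s3 = boto3.client("s3")

def main():
    bucket = os.environ["SRC_BUCKET"]
    key = os.environ["SRC_KEY"]
    body = s3.get_object(Bucket=bucket, Key=key)["Body"]
    # mode="r|gz" treats the archive as a forward-only stream
    with tarfile.open(fileobj=body, mode="r|gz") as tar:
        for member in tar:
            if not member.isfile():
                continue
            extracted = tar.extractfile(member)
            # hypothetical output prefix; attach a lifecycle expiration rule to it
            s3.upload_fileobj(extracted, bucket, f"extracted/{member.name}")

if __name__ == "__main__":
    main()
```

On the service choice: Fargate seems the simpler fit for one long-running container per archive; AWS Batch mostly adds value if you need job queues and retries across many restores at once.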

Related

How to automate cleaning of data in AWS using Jupyter Notebook

I have a Jupyter Notebook file that cleans the data file (.csv) in S3. The cleaning process is taken care of...
However, I want to be able to automatically apply this cleaning process to every file that is uploaded to the S3 bucket. Each file will have the exact same data format. I am thinking maybe of using AWS Glue, but not sure where to start. If we can skip the upload to S3 and go straight into Glue that would be interesting to explore...
The end goal is to load the clean data in Quick Sight and also AWS Sage Maker for ML applications.
Any advice on how to approach this?
Thanks
A simple AWS Lambda function with an Amazon EventBridge rule can do this. Check out this link: https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-create-rule-schedule.html
Use the link above to set up the cron schedule or event (whichever fits your use case), write your S3 cleanup code in the Lambda function, and then let Lambda take care of it. Lambda supports multiple programming languages.
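If you trigger on S3 uploads instead of a schedule, a minimal sketch looks like this. It assumes a pandas layer is attached to the function; dropna() is just a stand-in for the notebook's real cleaning steps, and the clean/ prefix is hypothetical:

```python
import urllib.parse

import boto3
import pandas as pd  # assumes a pandas Lambda layer is attached

s3 = boto3.client("s3")

def handler(event, context):
    # Scope the S3 event notification to the raw-upload prefix so the
    # cleaned output written below does not re-trigger this function.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        df = pd.read_csv(s3.get_object(Bucket=bucket, Key=key)["Body"])
        df = df.dropna()  # stand-in for the notebook's actual cleaning logic
        s3.put_object(
            Bucket=bucket,
            Key=f"clean/{key}",  # hypothetical output prefix
            Body=df.to_csv(index=False).encode("utf-8"),
        )
```

QuickSight and SageMaker can then read from the clean/ prefix directly.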

Is there any service on AWS that can help me convert mp4 files to mp3?

I'm new to Amazon Web Services and I'm wondering if the platform offers any solution to convert media files to different formats (mp4 to mp3), or do I have to use a Lambda function with a third-party library to achieve this?
Thank you!
You can get up and running quickly with Elastic Transcoder. You will need to:
create two S3 buckets, your 'inbox' and 'outbox'
add a transcoder pipeline specifying which buckets are your in/out buckets, and what file types you want to transcode from and to
you can set up a trigger so that every time something hits the 'in' bucket the process runs, or you can place something in the 'in' bucket and use the SDK or CLI to trigger a job
Two things to note:
When you fire a job, you have to pass in the name of the file that will be created. If the file already exists in the out bucket, an error will be thrown.
As with all of AWS's fully-managed services, you get a little for free up front, then it gets expensive. Once you get the hang of it, you can save some money rolling your own in Lambda like this.
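For the SDK-triggered route, here is a minimal boto3 sketch. The pipeline ID is a placeholder, and the preset ID is what I believe is the 'Audio - MP3 128k' system preset; verify both in your console before relying on this:

```python
import boto3

transcoder = boto3.client("elastictranscoder")

response = transcoder.create_job(
    PipelineId="1111111111111-abcde1",  # placeholder: your pipeline's ID
    Input={"Key": "inbox/video.mp4"},   # key in the pipeline's 'in' bucket
    Output={
        "Key": "video.mp3",  # must not already exist in the 'out' bucket
        "PresetId": "1351620000001-300040",  # MP3 128k system preset (verify in console)
    },
)
print(response["Job"]["Id"])
```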

Extract .gz files in S3 automatically

I'm trying to find a solution to automatically extract ALB log files in .gz format when they're uploaded from the ALB to S3.
My bucket structure is like this
/log-bucket
..alb-1/AWSLogs/account-number/elasticloadbalancing/ap-northeast-1/2018/log.gz
..alb-2/AWSLogs/account-number/elasticloadbalancing/ap-northeast-1/2018/log.gz
..alb-3/AWSLogs/account-number/elasticloadbalancing/ap-northeast-1/2018/log.gz
Basically, every 5 minutes, each ALB automatically pushes logs to the corresponding S3 bucket. I'd like to extract new .gz files right at that time, in the same bucket.
Is there any way to handle this?
I noticed that we can use a Lambda function, but I'm not sure where to start. Sample code would be greatly appreciated!
Your best choice would probably be to have an AWS Lambda function subscribed to S3 events. Whenever a new object gets created, this Lambda function would be triggered. The Lambda function could then read the file from S3, extract it, write the extracted data back to S3 and delete the original one.
How that works is described in Using AWS Lambda with Amazon S3.
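A minimal sketch of that flow, assuming each 5-minute ALB log chunk is small enough to decompress in Lambda memory (they usually are):

```python
import gzip
import urllib.parse

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        if not key.endswith(".gz"):
            # only compressed logs; a .gz suffix filter on the event
            # notification also keeps the output from re-triggering us
            continue
        compressed = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        s3.put_object(Bucket=bucket, Key=key[:-3], Body=gzip.decompress(compressed))
        s3.delete_object(Bucket=bucket, Key=key)
```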
That said, you might also want to reconsider if you really need to store uncompressed logs in S3. Compressed files are not only cheaper, as they don't take as much storage space as uncompressed ones, but they are usually also faster to process, as the bottleneck in most cases is network bandwidth for transferring the data and not available CPU-resources for decompression. Most tools also support working directly with compressed files. Take Amazon Athena (Compression Formats) or Amazon EMR (How to Process Compressed Files) for example.

how to develop a Lambda based S3 file management application

I am planning to develop a web application which can perform some basic text edit functions (like insert and delete) on S3 files. Could anyone show me a path forward? I am currently learning Lambda, and have followed tutorial here: http://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html
I can create a Lambda function which can modify files on S3, and call the function by AWS CLI now. What else do I need to know and do to create this web application? Thank you very much.
You would need to look at AWS API Gateway. This can be the front end to your web application.
Also note that S3 is an object storage mechanism, not block storage, and if your file edits are frequent it is not suitable for your use case: every time you want to edit the text you will have to download the entire file, modify it, and upload it back again. Also be mindful of S3's eventual consistency:
Amazon S3 Data Consistency Model
Amazon S3 provides read-after-write consistency for PUTS of new objects in your S3 bucket in all regions with one caveat. The caveat is that if you make a HEAD or GET request to the key name (to find if the object exists) before creating the object, Amazon S3 provides eventual consistency for read-after-write.
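To make that read-modify-write cost concrete, here is a minimal sketch of a Lambda handler behind an API Gateway proxy integration; the bucket name and the find/replace request shape are hypothetical:

```python
import json

import boto3

s3 = boto3.client("s3")
BUCKET = "my-edit-bucket"  # hypothetical bucket name

def handler(event, context):
    # Expects a JSON body like {"key": ..., "find": ..., "replace": ...}
    body = json.loads(event["body"])
    text = s3.get_object(Bucket=BUCKET, Key=body["key"])["Body"].read().decode("utf-8")
    edited = text.replace(body["find"], body["replace"])
    # S3 has no partial update: every edit rewrites the whole object.
    s3.put_object(Bucket=BUCKET, Key=body["key"], Body=edited.encode("utf-8"))
    return {"statusCode": 200, "body": json.dumps({"ok": True})}
```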

S3 to RDS file management system

I'm new to AWS and have a feasibility question for a file management system I'm trying to build. I would like to set up a system where people use the Amazon S3 browser and drop either a CSV or Excel file into their specific bucket. I would then like to automate the process of taking that CSV/Excel file and inserting it into a table within RDS. This assumes the table has already been built, and that the Excel/CSV files will always be formatted the same and will be in the exact same place every single time. Is it possible to automate this process, or at least get it to a point where very minimal human interference is needed? I'm new to AWS, so I'm not exactly sure of the limits of S3 to RDS. Thank you in advance.
It's definitely possible. AWS supports notifications from S3 to SNS, which can be forwarded automatically to SQS: http://aws.amazon.com/blogs/aws/s3-event-notification/
S3 can also send notifications to AWS Lambda to run your own code directly.
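A minimal sketch of that direct Lambda leg, assuming a MySQL-flavored RDS instance, a pymysql dependency packaged with the function, and a hypothetical two-column uploads table; the connection settings come from placeholder environment variables:

```python
import csv
import os
import urllib.parse

import boto3
import pymysql  # assumes pymysql is bundled in the deployment package

s3 = boto3.client("s3")

def handler(event, context):
    conn = pymysql.connect(
        host=os.environ["DB_HOST"],  # e.g. mydb.xxxxxx.rds.amazonaws.com
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASSWORD"],
        database=os.environ["DB_NAME"],
    )
    try:
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
            rows = list(csv.reader(body.splitlines()))[1:]  # skip the header row
            with conn.cursor() as cur:
                # 'uploads' and its columns are hypothetical; match your real table
                cur.executemany(
                    "INSERT INTO uploads (col_a, col_b) VALUES (%s, %s)", rows
                )
            conn.commit()
    finally:
        conn.close()
```

Excel files would need an extra parsing step (e.g. openpyxl), so standardizing on CSV uploads keeps the Lambda simple.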