I have to write code in React Native that allows a user to upload videos to Amazon S3 to be transcoded for consumption by various devices. For the processing after the upload occurs, I am reviewing two approaches:
1) I can use Lambda with ffmpeg to handle the transcoding immediately after the upload occurs (my fear here is the amount of time transcoding might take and the effect on pricing if it takes a considerable amount of time).
2) I can have S3 pass an SNS message to a REST API after the object-created event occurs, and have the REST API generate a RabbitMQ message that will be processed by a worker that performs the transcoding using ffmpeg.
Option 1 seems preferable from a completion-time perspective. How concerned should I be about using option 1, given how long video transcoding might take, as opposed to option 2?
Also, regardless of the approach, I need a way to pass additional parameters to Lambda or along with the SNS message that would allow me to associate the user who uploaded the video with their account. Is there a way to pass additional text-based values to S3 that it passes along to Lambda or SNS when the upload completes? As a caveat, I plan to upload the video directly to S3 using the REST layer (found the relevant documentation here: http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html#RESTObjectPUT-responses-examples).
AWS provides a managed video transcoding service (Amazon Elastic Transcoder) for exactly this type of thing. If you don't want to use that for some reason, then you need to make sure you can complete your transcoding tasks within Lambda's 5-minute execution limit. I'm not sure where the second option of using RabbitMQ and workers is coming from. Why RabbitMQ instead of SQS? Would the workers be processes on EC2 servers instead of Lambda functions?
Regarding your other question, you can pass those extra parameters as metadata fields on the S3 object. In the document you linked, look at how the x-amz-meta- headers work. Then, when you later retrieve the object from S3 to transcode it, you can retrieve the metadata fields at the same time.
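As a minimal sketch of that round trip, assuming a Python worker and hypothetical bucket/key names (the same x-amz-meta- headers apply if you PUT via the raw REST API):

```python
import boto3

s3 = boto3.client("s3")

# On upload: attach the uploader's account id as object metadata.
# boto3 sends Metadata entries as x-amz-meta-* headers under the hood.
with open("video.mp4", "rb") as body:
    s3.put_object(
        Bucket="my-video-bucket",          # hypothetical bucket
        Key="uploads/video.mp4",
        Body=body,
        Metadata={"account-id": "12345"},  # becomes x-amz-meta-account-id
    )

# Later, in the transcoding worker: read the metadata back without
# downloading the object body.
head = s3.head_object(Bucket="my-video-bucket", Key="uploads/video.mp4")
account_id = head["Metadata"]["account-id"]  # keys come back lowercased
```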
I'd like to perform the following tasks on a regular basis (e.g. every day at 6 AM) using AWS:
get a new set of data using an API; this dataset is updated on a daily basis
run a Python script that processes the obtained dataset using several Python libraries such as matplotlib, pandas, and plotly
automatically send the output of the script, which would be a single PDF file or an HTML dashboard, via email to a group of specified recipients
I know how to perform all of the above locally; my goal is to automate this routine. I'm new to AWS and would appreciate some advice on how to perform these tasks in a straightforward way. Based on the reading I have done so far, it looks like the serverless approach may be able to do the job and also reduce the complexity, but I'm not sure which services exactly I should use.
For scheduling you can use Amazon EventBridge.
You can schedule AWS Lambda or AWS Step Functions; both of these are serverless :).
You can have 3 Lambdas:
One to get the data and save it in S3/DynamoDB (if you want to persist the data).
A processor Lambda that saves the report to S3.
Another Lambda to send the email using AWS SES, which reads the report from S3 and sends it (a sketch follows below).
If you don't want to use Step Functions, you can start your Lambda from an S3 put event, or you can trigger one Lambda from another Lambda using the aws-sdk.
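This is a minimal sketch of that emailing Lambda, assuming a hypothetical report bucket/key and a sender address already verified in SES; it shares the report as a time-limited presigned link rather than an attachment (a true attachment would need SES raw MIME):

```python
import boto3

s3 = boto3.client("s3")
ses = boto3.client("ses")

BUCKET = "my-reports-bucket"   # hypothetical bucket
KEY = "daily/report.pdf"       # hypothetical key

def lambda_handler(event, context):
    # Generate a download link that stays valid for 24 hours.
    url = s3.generate_presigned_url(
        "get_object", Params={"Bucket": BUCKET, "Key": KEY}, ExpiresIn=86400
    )
    ses.send_email(
        Source="reports@example.com",  # must be a verified SES identity
        Destination={"ToAddresses": ["team@example.com"]},
        Message={
            "Subject": {"Data": "Daily report"},
            "Body": {"Html": {"Data": f'<a href="{url}">Download the report</a>'}},
        },
    )
```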
So there are different approaches you can take.
First off, I would create a Lambda. You can schedule the function to run on a cron schedule.
If the message you want to send is small:
I would create an SNS topic with an email fan-out.
Inside your Lambda you can then transform the data and send it out via SNS.
Otherwise:
I would use SES and send the mail via the SES SDK.
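For the SNS fan-out option, a minimal sketch (the topic ARN is a placeholder; every confirmed email subscription on the topic receives the message, and SNS email delivery is plain text only):

```python
import boto3

sns = boto3.client("sns")

# One publish, many recipients: each email subscriber on the topic
# receives the message body, which is the "fan out".
sns.publish(
    TopicArn="arn:aws:sns:us-east-1:123456789012:daily-report",  # placeholder ARN
    Subject="Daily report",
    Message="Short plain-text summary of the day's results.",
)
```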
Hi folks.
There's a question I've recently faced, which brought some concerns and hesitations. I'm creating an "almost serverless" microservice using AWS. Here is its workflow: [diagram of the workflow options].
The thing is, the input message may be large, but AWS SQS limits message size to 256 KB. So I decided to use S3 and S3 notifications to handle inputs: the client PUTs an object; its creation triggers Lambda functions, and so on. That way the 256 KB limit is not relevant, but on the other hand I'm using a storage service as an integration service. One of the concerns is dead-letter queue handling, for example.
Maybe someone has faced similar problems. One of the goals is to stay "serverless". Are there any good solutions/improvements/advice?
Thanks in advance.
I would recommend combining the two approaches:
Write the data to Amazon S3
Create a message in the Amazon SQS queue that includes a reference to the data in S3
This way, you have the benefits of using a queue while sidestepping the 256 KB message limit.
If all the data you require is already in the file, then you can configure an Amazon S3 Event to create the SQS message directly in the queue. The message will include the name of the bucket and the key of the object. Thus, putting the file in S3 will create the SQS message and trigger the AWS Lambda function. This is more scalable than directly triggering the Lambda function from S3.
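A sketch of the consuming Lambda, assuming the SQS queue is the target of the S3 event notification (so each SQS record body is an S3 event JSON) and a hypothetical process() function:

```python
import json
from urllib.parse import unquote_plus

import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # With an SQS trigger, each record body holds an S3 event notification.
    for record in event["Records"]:
        s3_event = json.loads(record["body"])
        for rec in s3_event.get("Records", []):
            bucket = rec["s3"]["bucket"]["name"]
            # S3 URL-encodes object keys in event payloads.
            key = unquote_plus(rec["s3"]["object"]["key"])
            # The large payload lives in S3; the queue only carried a pointer.
            obj = s3.get_object(Bucket=bucket, Key=key)
            process(obj["Body"].read())

def process(payload: bytes) -> None:
    ...  # hypothetical business logic
```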
Have you considered using a Kinesis stream, then attaching your Lambda to the stream with a batch size of 1? You could also handle your dead letters with shard timestamps, etc.
Or, if you are able to manipulate the original message, place your timestamp inside the message; then you can easily utilize Kinesis Firehose and bulk-load messages. The limit on a Kinesis record is 1 MB, which gives you roughly 4x the SQS message size, and you can compress the payload to stretch it further.
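A sketch of pushing a compressed message onto a stream (the stream name is a placeholder):

```python
import gzip
import json

import boto3

kinesis = boto3.client("kinesis")

def put_large_message(message: dict) -> None:
    # Compress to stay well under the 1 MB per-record Kinesis limit.
    blob = gzip.compress(json.dumps(message).encode("utf-8"))
    kinesis.put_record(
        StreamName="inbound-messages",  # placeholder stream name
        Data=blob,
        PartitionKey=str(message.get("id", "default")),
    )
```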
I have an S3 bucket with different files. I need to read those files and publish an SQS message for each row in the file.
I cannot use S3 events directly, as the files need to be processed with a delay: put to SQS a month after upload.
I can write a scheduler to do this task (read and publish), but can I use AWS for this purpose?
AWS Batch, AWS Data Pipeline, or Lambda?
I need to pass the date (filename) of the data to be read and published.
Edit: The data volume to be dealt with is huge.
I can think of two ways to do this entirely using AWS serverless offerings without even having to write a scheduler.
You could use S3 events to start a Step Function that waits for a month before reading the S3 file and sending messages through SQS.
With a little more work, you could use S3 events to trigger a Lambda function which writes the messages to DynamoDB with a TTL of one month in the future. When the TTL expires, you can have another Lambda that listens to the DynamoDB stream, and when a REMOVE event arrives (TTL deletions appear as REMOVE records in the stream), it publishes the message to SQS. (A good introduction to this general strategy can be found here.)
While the second strategy might require more effort, you might find it less expensive than using Step Functions, depending on the overall message throughput and whether the S3 uploads occur in bursts or in a smooth distribution.
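A sketch of the second (DynamoDB TTL) approach, with placeholder table and queue names; it assumes the table's TTL attribute is expires_at and the stream is configured to include old images:

```python
import json
import time

import boto3

dynamodb = boto3.client("dynamodb")
sqs = boto3.client("sqs")

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/delayed"  # placeholder

def on_s3_upload(event, context):
    # Triggered by the S3 put event: store a pointer with a TTL ~30 days out.
    for rec in event["Records"]:
        dynamodb.put_item(
            TableName="delayed-messages",  # placeholder; TTL on "expires_at"
            Item={
                "key": {"S": rec["s3"]["object"]["key"]},
                "bucket": {"S": rec["s3"]["bucket"]["name"]},
                "expires_at": {"N": str(int(time.time()) + 30 * 24 * 3600)},
            },
        )

def on_stream_event(event, context):
    # Triggered by the DynamoDB stream: TTL expiry shows up as a REMOVE
    # record (a stricter check would inspect userIdentity to skip manual deletes).
    for rec in event["Records"]:
        if rec["eventName"] == "REMOVE":
            old = rec["dynamodb"]["OldImage"]
            sqs.send_message(
                QueueUrl=QUEUE_URL,
                MessageBody=json.dumps(
                    {"bucket": old["bucket"]["S"], "key": old["key"]["S"]}
                ),
            )
```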
At the core, you need to do two things:
Enumerate all of the objects in the S3 bucket.
Perform some action on any object uploaded more than a month ago.
Can you use Lambda or Batch to do this? Sure. A Lambda could be set to trigger once a day, enumerate the files, and post the results to SQS.
Should you? No clue. A lot depends on your scale, and what you plan to do if it takes a long time to perform this work. If your S3 bucket has hundreds of objects, it won't be a problem. If it has billions, your Lambda will need to be able to handle being interrupted and continue paging through files from where a previous run left off.
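For the daily-enumeration version, a minimal sketch with placeholder bucket and queue names (in practice you would also need to track already-processed keys, e.g. with object tags or a table, so the same file isn't enqueued every day):

```python
import json
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/old-files"  # placeholder

def lambda_handler(event, context):
    cutoff = datetime.now(timezone.utc) - timedelta(days=30)
    # Paginate so a large bucket does not overflow a single list call.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket="my-bucket"):  # placeholder bucket
        for obj in page.get("Contents", []):
            if obj["LastModified"] < cutoff:
                sqs.send_message(
                    QueueUrL=QUEUE_URL,
                    MessageBody=json.dumps({"key": obj["Key"]}),
                ) if False else sqs.send_message(
                    QueueUrl=QUEUE_URL,
                    MessageBody=json.dumps({"key": obj["Key"]}),
                )
```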
Alternatively, you could use S3 events to trigger a simple Lambda that adds a row to a database. Then, again, some Lambda could run on a cron schedule, ask the database for old rows, and publish that set to SQS for others to consume. That's slightly cleaner, maybe, and can handle scaling up to pretty big bucket sizes.
Or, you could do the paging through files, deciding what to do, and processing old files all on a t2.micro, if you just need to do some simple work on a few dozen files every day.
It all depends on your workload and needs.
I'm new to Amazon Web Services and I'm wondering if the platform offers any solution to convert media files to different formats (e.g. MP4 to MP3), or do I have to use a Lambda function with a third-party library to achieve this?
Thank you!
You can get up and running quickly with Elastic Transcoder. You will need to:
create two S3 buckets, your 'inbox' and 'outbox'
add a transcoder pipeline specifying which buckets are your in/out buckets, and what file types you want to transcode from and to
you can set up a trigger so that every time something hits the in bucket the process runs, or you can place something in the in bucket and use the SDK or CLI to trigger a job
Two things to note:
When you fire a job, you have to pass in the name of the file that will be created. If the file already exists in the out bucket, an error will be thrown.
As with all of AWS's managed services, you get a little free up front, then it gets expensive. Once you get the hang of it, you can save some money by rolling your own transcoder in Lambda.
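A sketch of firing a job via boto3; the pipeline ID and preset ID are placeholders (the pipeline is created once in the console or API, and AWS ships system preset IDs for common output formats like MP3):

```python
import boto3

transcoder = boto3.client("elastictranscoder")

response = transcoder.create_job(
    PipelineId="1111111111111-abcd11",       # placeholder pipeline id
    Input={"Key": "uploads/source-video.mp4"},
    Outputs=[
        {
            # Must not already exist in the out bucket, or the job errors.
            "Key": "converted/audio.mp3",
            "PresetId": "1351620000001-300010",  # placeholder preset id
        }
    ],
)
print(response["Job"]["Id"])
```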
Simple problem: I have a Google Cloud Storage bucket which gets content 3 times a day from an external provider. I want to fetch this content as soon as it arrives and push it to an S3 bucket. I have been able to achieve this by running my Python scripts as a cron job, but if I follow this route I have to provide high availability and so on myself.
My idea was to set this up in AWS Lambda, so I don't have to sweat the infrastructure limitations. Any pointers on this marriage between GCS and Lambda? I am not a native Node speaker, so any pointers will be really helpful.
GCS can send object notifications when an object is created or updated. You can catch the notifications (which are HTTP POST requests) with a simple web app hosted on GAE, and then handle the file transfer to S3. A highly available, event-driven solution.
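A minimal sketch of such a handler in Python (Flask on GAE; the S3 bucket name is a placeholder, and the notification body is assumed to be a JSON_API_V1 object resource):

```python
import boto3
from flask import Flask, request
from google.cloud import storage

app = Flask(__name__)
gcs = storage.Client()
s3 = boto3.client("s3")

@app.route("/gcs-notification", methods=["POST"])
def handle_notification():
    # JSON_API_V1 notifications carry the object resource in the body.
    resource = request.get_json()
    bucket_name = resource["bucket"]
    object_name = resource["name"]

    # Stream the object out of GCS and into S3. Fine for modest file sizes;
    # very large files would warrant a chunked/multipart transfer.
    blob = gcs.bucket(bucket_name).blob(object_name)
    data = blob.download_as_bytes()
    s3.put_object(Bucket="my-s3-mirror", Key=object_name, Body=data)  # placeholder

    return "", 204
```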