Sync GCS bucket with S3 bucket (Lambda style) - amazon-web-services

Simple problem: I have a Google Cloud Storage bucket that receives content three times a day from an external provider. I want to fetch this content as soon as it arrives and push it to an S3 bucket. I have been able to achieve this by running my Python scripts as a cron job, but I would have to provide high availability and so on if I follow this route.
My idea was to set this up in AWS Lambda so I don't have to sweat the infrastructure. Any pointers on this marriage between GCS and Lambda? I am not a native Node speaker, so any pointers would be really helpful.

GCS can send object notifications when an object is created or updated. You can catch the notifications (which are HTTP POST requests) with a simple web app hosted on GAE, and then handle the file transfer to S3. A highly available, event-driven solution.
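For example, a minimal Python sketch of such a GAE-hosted webhook, assuming GCS Object Change Notifications are configured to POST to the app and that the google-cloud-storage and boto3 libraries are available; the route, bucket names, and environment variables are placeholders:

```python
# Minimal sketch, not a drop-in implementation. Assumes:
#  - GCS Object Change Notifications POST JSON to /gcs-notify
#  - google-cloud-storage and boto3 are bundled with the GAE app
#  - S3_BUCKET and AWS credentials come from the environment
import os

import boto3
from flask import Flask, request
from google.cloud import storage

app = Flask(__name__)
gcs = storage.Client()
s3 = boto3.client("s3")

S3_BUCKET = os.environ.get("S3_BUCKET", "my-destination-bucket")  # placeholder name


@app.route("/gcs-notify", methods=["POST"])
def gcs_notify():
    # Ignore delete notifications; only copy when an object exists.
    if request.headers.get("X-Goog-Resource-State") == "not_exists":
        return "", 200

    event = request.get_json(force=True)
    bucket_name = event["bucket"]
    object_name = event["name"]

    # Pull the object from GCS and push it straight into S3.
    blob = gcs.bucket(bucket_name).blob(object_name)
    data = blob.download_as_bytes()
    s3.put_object(Bucket=S3_BUCKET, Key=object_name, Body=data)
    return "", 200
```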

Related

Workaround for handling a CPU-intensive task on AWS EC2?

I have created a Django application (running on AWS EC2) which converts media files from one format to another, but this process consumes CPU resources, for which I have to pay AWS.
I am trying to find a workaround where my local PC (Ubuntu) takes care of the CPU-intensive task and the final result is uploaded to an S3 bucket that I can share with the user.
Proposed solution: when a user uploads a media file (HTML upload form), it goes to an S3 bucket, and at the same time the S3 file link is sent over a socket connection to my Ubuntu machine, which downloads the file, processes it, and uploads it back to the S3 bucket.
Could anyone please suggest a better solution, as this one does not seem efficient?
Please note: I have a decent internet connection and a computer that can handle the backend well, but I am not in a position to pay extra charges to AWS.
The best solution for this is to create a separate Lambda function for the task. Trigger the Lambda whenever someone uploads a file to S3; the Lambda will process the file and store the result back in S3.
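A rough sketch of what that S3-triggered handler could look like in Python, where the output bucket and the process_media() step are placeholders for the real conversion (e.g. ffmpeg via subprocess):

```python
# Rough sketch only. OUTPUT_BUCKET and process_media() are placeholders for
# the real output bucket and the actual conversion step.
import os

import boto3

s3 = boto3.client("s3")
OUTPUT_BUCKET = os.environ.get("OUTPUT_BUCKET", "my-output-bucket")  # placeholder


def process_media(path):
    # Placeholder: run the real conversion here and return the output path.
    return path


def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        local_path = "/tmp/" + os.path.basename(key)  # Lambda's writable scratch space
        s3.download_file(bucket, key, local_path)

        result_path = process_media(local_path)
        s3.upload_file(result_path, OUTPUT_BUCKET, "processed/" + os.path.basename(key))
```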

AWS tech stack solution for a static website

I have a project where I am building a simple single-page app that needs to pull data from an API only once a day. I am thinking of building the backend with Go, where I need to do two things:
1) Have a scheduled job that updates the DB with the new data once a day.
2) Serve that data to the frontend. Since the data would only be updated once a day, I would like to cache it after each update.
Since the number of options that AWS offers is a bit overwhelming, I am wondering what the ideal solution for this scenario would be. Should I use a Lambda that connects to the DB and updates it on a schedule? Should I then create a separate REST API Lambda that pulls the data from the DB and is called from the frontend?
I would really appreciate suggestions for this problem.
Here is my suggestion:
Create a Lambda function.
It will fetch the required information from the database.
You may use S3 or DynamoDB to save your content. Both may fit in the free tier; check the free tier offers for your usage.
It will save the fetched content to S3 or DynamoDB (you may look at DAX for DynamoDB caching).
Create an API Gateway and integrate it with your Lambda (an Elastic Load Balancer is another choice).
Create a schedule expression on CloudWatch to trigger the Lambda daily.
Make a request from your frontend to API Gateway or the ELB.
You may use Route 53 for domain naming.
Your Lambda should have two separate functions: one responds to the schedule expression, and the other serves your content by communicating with S3/DynamoDB; a sketch of the serving half follows below.
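A hedged Python sketch of the serving half, assuming a Lambda proxy integration behind API Gateway and placeholder bucket/key names:

```python
# Hedged sketch of the "serve content" function, assuming a Lambda proxy
# integration behind API Gateway; the bucket and key names are placeholders.
import boto3

s3 = boto3.client("s3")


def handler(event, context):
    obj = s3.get_object(Bucket="my-content-bucket", Key="data.json")  # placeholders
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": obj["Body"].read().decode("utf-8"),
    }
```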
Edit:
Here is the architecture
Edit:
If the content is going to be static, you may configure an S3 bucket for static website serving, and your daily Lambda can write the content there when it is triggered. Then you no longer need API Gateway or DynamoDB.
Here is the documentation for S3 static content.
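A rough sketch of that daily Lambda in Python: triggered by the CloudWatch schedule, it pulls the upstream API once and writes the result into the static-site bucket. The API URL, bucket name, and object key are assumptions:

```python
# Rough sketch under assumptions: API_URL is the upstream API, SITE_BUCKET is
# the bucket configured for static website hosting, and the frontend fetches
# /data.json from it. Triggered daily by the CloudWatch schedule expression.
import json
import os
import urllib.request

import boto3

s3 = boto3.client("s3")
API_URL = os.environ.get("API_URL", "https://example.com/data")   # placeholder
BUCKET = os.environ.get("SITE_BUCKET", "my-static-site-bucket")   # placeholder


def handler(event, context):
    with urllib.request.urlopen(API_URL) as resp:
        data = json.load(resp)

    # Overwrite the cached copy that the single-page app reads directly.
    s3.put_object(
        Bucket=BUCKET,
        Key="data.json",
        Body=json.dumps(data).encode("utf-8"),
        ContentType="application/json",
    )
```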

Passing additional values to an S3 event notification for Lambda consumption

I have to write code in React Native that allows a user to upload videos to Amazon S3 to be transcoded for consumption by various devices. For the processing after the upload occurs, I am reviewing two approaches:
1) I can use Lambda with ffmpeg to handle the transcoding immediately after the upload occurs (my fear here is the amount of time required to transcode the videos and the effect on pricing if it takes a considerable amount of time).
2) I can have S3 pass an SNS message to a REST API after the object-created event occurs, and the REST API generates a RabbitMQ message that will be processed by a worker performing the transcoding with ffmpeg.
Option 1) seems preferable from a completion-time perspective. How concerned should I be about using 1), considering how long video transcoding might take, as opposed to option 2)?
Also, regardless, I need a way to pass additional parameters to Lambda or along with the SNS messaging that would allow me to associate the user who uploaded the video with their account. Is there a way to pass additional text-based values to S3 that are passed along to Lambda or SNS when the upload completes? As a caveat, I plan to upload the video directly to S3 using the REST layer (found the answer here: http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html#RESTObjectPUT-responses-examples).
AWS provides a video transcoding service for exactly this type of thing. If you don't want to use that for some reason, then you need to make sure you can complete your transcoding tasks in AWS Lambda in under 5 minutes. I'm not sure where the second option of using RabbitMQ and workers is coming from. Why RabbitMQ instead of SQS? Would the workers be processes on EC2 servers instead of Lambda functions?
Regarding your other question, you need to pass those extra parameters as metadata fields on the S3 object. In the document you linked, look at how x-amz-meta- works. Then, when you later retrieve the object from S3 to transcode it, you can retrieve the metadata fields at the same time.
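For example, a hedged boto3 snippet attaching user information as object metadata at upload time and reading it back later; the bucket, key, and field names are assumptions:

```python
# Hedged example; the bucket, key, and metadata field names are placeholders.
import boto3

s3 = boto3.client("s3")

# Upload: the metadata travels with the object as x-amz-meta-user-id.
with open("video.mp4", "rb") as f:
    s3.put_object(
        Bucket="my-video-bucket",        # placeholder
        Key="uploads/video.mp4",
        Body=f,
        Metadata={"user-id": "12345"},   # placeholder account identifier
    )

# Later, e.g. inside the Lambda or worker that transcodes the object:
head = s3.head_object(Bucket="my-video-bucket", Key="uploads/video.mp4")
user_id = head["Metadata"]["user-id"]
```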

How to develop a Lambda-based S3 file management application

I am planning to develop a web application that can perform some basic text editing functions (like insert and delete) on S3 files. Could anyone show me a path forward? I am currently learning Lambda and have followed the tutorial here: http://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html
I can now create a Lambda function that modifies files on S3 and invoke it via the AWS CLI. What else do I need to know and do to create this web application? Thank you very much.
You would need to look at AWS API Gateway. This can be the front end to your web application.
Also note that S3 is object storage, not a file system; if your edits are too frequent it is not suitable for your use case, because every time you want to edit the text you have to download the entire object, modify it, and upload it back again. Also be mindful of S3's eventual consistency:
Amazon S3 Data Consistency Model
Amazon S3 provides read-after-write consistency for PUTS of new objects in your S3 bucket in all regions with one caveat. The caveat is that if you make a HEAD or GET request to the key name (to find if the object exists) before creating the object, Amazon S3 provides eventual consistency for read-after-write.
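A minimal sketch of that download-modify-upload round trip as a Python Lambda handler; the event fields, bucket/key names, and the trivial append-a-line edit are placeholder assumptions:

```python
# Minimal sketch of the download-modify-upload cycle; the event fields,
# bucket/key, and the "append a line" edit are placeholder assumptions.
import boto3

s3 = boto3.client("s3")


def handler(event, context):
    bucket = event["bucket"]   # hypothetical fields passed in via API Gateway
    key = event["key"]
    new_text = event["text"]

    # S3 has no in-place edit: fetch the whole object, change it, put it back.
    obj = s3.get_object(Bucket=bucket, Key=key)
    body = obj["Body"].read().decode("utf-8")
    body = body + "\n" + new_text      # stand-in for a real insert/delete edit
    s3.put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))

    return {"statusCode": 200, "body": "updated"}
```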

How can I email error logs from AWS Spark

I have a process that uses AWS EMR to run a PySpark cluster.
I have an S3 location where all the process logs get stored.
I want to understand whether there is a way I can filter out the ERROR logs and get them mailed to my inbox. I do not want to save any log files on my system.
Is there any Python library that can help me monitor logs in real time? I have looked at boto3 and the EMR library, but I could not find an answer to my problem there.
The EMR logs will likely be buffered into chunks of a few minutes or some size before being written to S3 (full disclosure: that's based on experience with other AWS S3 logging systems, not EMR itself).
If I were attempting to solve this problem, I'd use an AWS Lambda function to execute Python that reads the S3 logs line by line and filters for the lines matching ERROR, and then use SNS to send the matching lines to your email address. You can use S3 events to automatically trigger the Lambda when objects are written to the S3 logging location for EMR, so this is as close to real time as you're going to get.
The architecture I am suggesting looks something like this
EMR -> S3 -> Lambda -> SNS -> email inbox
The write of each EMR log to S3 triggers a Lambda, which uses boto3 to filter the log for error messages, sending alerts to an SNS topic for distribution to subscribers.
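A minimal sketch of that Lambda in Python, assuming the SNS topic ARN comes from an environment variable and that the EMR log objects are gzipped (hence the decompression step):

```python
# Minimal sketch, assuming SNS_TOPIC_ARN is set on the function and that EMR
# log objects carry the usual .gz suffix; adjust as needed.
import gzip
import os

import boto3

s3 = boto3.client("s3")
sns = boto3.client("sns")
TOPIC_ARN = os.environ["SNS_TOPIC_ARN"]  # placeholder


def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        raw = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        if key.endswith(".gz"):
            raw = gzip.decompress(raw)
        text = raw.decode("utf-8", errors="replace")

        errors = [line for line in text.splitlines() if "ERROR" in line]
        if errors:
            sns.publish(
                TopicArn=TOPIC_ARN,
                Subject=("EMR errors in " + key)[:100],  # SNS subject length limit
                Message="\n".join(errors[:50]),          # keep the email readable
            )
```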
It may seem like a lot of moving parts but it won't require much to maintain it and should cost you only a few cents a month more than the S3 storage is already costing you. And the effort for the whole thing is actually pretty small.
Furthermore, you won't need:
a place to execute your code, servers to manage, etc.
a nontrivial deployment model for your project
any parts not shown above, for that matter
And you'll get for free:
Monitoring in the form of:
CloudWatch metrics for Lambda
S3 logs (should you enable them)
CloudWatch Logs that store your function's execution windows and stdout
Easy integration into alerting through CloudWatch Alarms (these typically integrate well with PagerDuty and the like)
Dead-simple extensibility, such as:
SNS can send SMS messages to your phone
add more parsing options in the Lambda and redeploy
expose CloudWatch metrics and add alarms for thresholds
write the summary to S3 for presigned email or SMS links, or further processing now or later
You could send the email yourself through SES or just manually with Python, but I would rather use SNS so that the subscriptions to the topic can vary independently of the Python code.
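For illustration, a hedged one-off snippet showing how a subscriber is added to the topic outside the Lambda code; the topic ARN and email address are placeholders:

```python
# One-off illustration: subscribers live on the topic, not in the Lambda code.
# The topic ARN and email address are placeholders.
import boto3

sns = boto3.client("sns")
sns.subscribe(
    TopicArn="arn:aws:sns:us-east-1:123456789012:emr-error-alerts",  # placeholder
    Protocol="email",
    Endpoint="you@example.com",  # recipient confirms via a confirmation email
)
```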
Lambdas are a little intimidating to start with, but they include the boto3 SDK by default (which should obviate the need for a zip file with pip dependencies altogether), which will simplify creation.
For that matter, you can set all this up in the AWS console if you like doing things by dragging mouse pointers around, or intend to do it only a few times, or you can express all of it in CloudFormation if you need something repeatable.
http://docs.aws.amazon.com/lambda/latest/dg/with-s3.html
http://docs.aws.amazon.com/lambda/latest/dg/python-programming-model-handler-types.html
http://docs.aws.amazon.com/sns/latest/dg/welcome.html