I have an AWS Lambda function that works fully when tested locally in Atom; it reads from and writes to my S3 bucket. However, when I upload the function to Lambda it doesn't seem to have access to S3: whenever I try to read from S3 it simply times out after 3 minutes, even on simple requests like listing buckets.
I have increased the permissions of the "Lambda Basic Execution" role to full admin access, and it still doesn't work.
Any ideas would be greatly appreciated.
From the information provided, the trouble is in the communication between the deployed Lambda and S3. When running on your local machine you have direct access to S3 and there is no problem, because you are not in an isolated environment such as a VPC.
Please check your network configuration, especially whether the Lambda runs in a VPC. For testing purposes you can disable that in the AWS console: Lambda -> yourLambdaFunction -> Network -> choose "No VPC". The image below shows the Lambda config for those tests:
Lambda console no VPC config
This has come up before; the answers in the following question should solve your problem:
Lambda Timeout while communicating with S3
Please let us know if it helps.
I am looking for a way to trigger the Jenkins job whenever the s3 bucket is updated with a particular file format.
I have tried a Lambda function with "Add trigger -> S3 bucket PUT method", following this article, but it's not working. I have explored further and found that AWS SNS and AWS SQS can also be used for this, but some say that approach is outdated. So what is the simplest way to trigger a Jenkins job when the S3 bucket is updated?
I just want a trigger whenever the zip file is updated from job A in jenkins1 to the S3 bucket named 'testbucket' in AWS environment2. Both Jenkins instances are in different AWS accounts under separate private VPCs. I have attached my Jenkins workflow as a picture; please refer to it below.
The approach you are using seems solid and a good way to go. I'm not sure what specific issue you're having, so I'll list a couple of things that could explain why it might not be working for you:
Permissions issue - Check that the Lambda can be invoked by the S3 service. If you are setting this up in the console (manually), you probably don't have to worry about it, since the permissions are set up automatically. If you're doing this through infrastructure as code, it's something you need to add.
Lambda VPC config - Your Lambda will need to run in the same VPC subnet that the Jenkins instance runs in. By default, a Lambda is not associated with any VPC and will not have access to the Jenkins instance (unless Jenkins is publicly available over the internet).
I found another Stack Overflow post that describes the SNS/SQS setup if you want to continue down that path: Trigger Jenkins job when a S3 file is updated
Those familiar with Amazon bandwidth costs know that network traffic within the Amazon cloud is not billed, on the condition that it stays in the same region*.
*: Over-simplification, but that's the general rule of thumb.
I have a Lambda function that fetches S3 signed URLs and does some processing on the S3 objects behind them. The signed URLs are passed in, along with other parameters, by my application running on EC2.
Those S3 signed URLs look like this: https://bucket.s3.eu-central-1.amazonaws.com/object/objectid.jpg?X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=XXXXXXX%2Feu-central-1%2Fs3%2Faws4_request&X-Amz-Date=XXXXX&X-Amz-SignedHeaders=host&X-Amz-Expires=1200&X-Amz-Signature=XXXX
I know that in Lambda one is supposed to use the AWS SDK APIs to access S3. Example documentation for Python 3 in my case: https://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html
However, I am quite new to Lambda, so I want to avoid rebuilding my code and the attendant risk of vendor lock-in.
So, to come back to my question: is calling signed S3 URLs sufficient to avoid being billed for outside traffic, or do I really need to use the AWS SDK to fetch the S3 objects?
My Lambda function and S3 bucket are both in the same region.
Thanks in advance!
There are some images already uploaded to an AWS S3 bucket, and of course there are a lot of them. I want to edit and replace those images, and I want to do it on an AWS server; specifically, I want to use AWS Lambda.
I can already do this job from my local PC, but it takes a very long time, so I want to do it on the server.
Is it possible?
Unfortunately, directly editing a file in S3 is not supported; check out this thread. To work around that, you need to download the file to a server/local machine, edit it, and re-upload it to the S3 bucket. You can also enable versioning so previous copies are kept.
For Node.js you can use Jimp
For Java: ImageIO
For Python: Pillow
Or you can use any technology you like to edit the image and then upload it again using the aws-sdk.
For the Lambda function itself you can use the Serverless Framework - https://serverless.com/
I made some YouTube videos a while back about how to get started with aws-lambda and serverless:
https://www.youtube.com/watch?v=uXZCNnzSMkI
You can trigger a Lambda using the AWS SDK.
Write a Lambda to process a single image and deploy it.
Then, locally, use the AWS SDK to list the images in the bucket and invoke the Lambda asynchronously for each file using invoke. I would also record somewhere which files have been processed, so you can resume if something fails.
Note that the default limit for Lambda is 1,000 concurrent executions, so to avoid hitting the limit you can send messages to an SQS queue (which then triggers the Lambda) or simply retry when invoke throws a throttling error.
After reading about AWS Lambda I've taken quite an interest in it, although there is one thing I couldn't really find any info on. So my question is: is it possible to have Lambda work with services outside Amazon? Say I have a database from some other provider; would it be possible to perform operations on it through AWS Lambda?
Yes.
AWS Lambda functions run code just like any other application, so you can write a Lambda function that calls any service on the Internet, just like any computer can. Of course, you'd need network access to the external service across the Internet, plus the appropriate access permissions.
There are some limitations on Lambda functions, such as a maximum execution time (15 minutes) and only 512 MB of local disk space in /tmp by default.
So when should you use Lambda? Typically, when you want some code to execute in response to an event, such as a file being uploaded to Amazon S3, data arriving via Amazon Kinesis, or a skill being activated on an Alexa device. If that fits your use case, go for it!
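As a minimal illustration, here is a handler that talks to a non-AWS HTTP service using only the standard library (the URL is a placeholder; a third-party database driver bundled into the deployment package would be used the same way):

```python
import json
import urllib.request

def handler(event, context):
    # Plain stdlib HTTP POST to an external endpoint; a database client
    # (psycopg2, pymysql, ...) packaged with the function works identically.
    req = urllib.request.Request(
        "https://example.com/api/records",   # placeholder external service
        data=json.dumps(event).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return {"statusCode": resp.status, "body": resp.read().decode()}
```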
If I create a function to fetch webpages, will it execute on a different IP per execution so that my scraping requests don't get blocked?
I would use this AWS pipeline:
At the source on the left you would have an EC2 instance running Jaunt, which then feeds the URLs or HTML pages into a Kinesis stream. The Lambda does your HTML parsing and, via Firehose, loads everything into S3 or Redshift.
Jaunt can run via a standard web-proxy service with a rotating IP.
Yes, Lambda executes from effectively random IPs by default. You can trigger it using something like EventBridge, giving you a schedule that executes the script every hour or so. Others may recommend using API Gateway; however, it is highly insecure to expose API endpoints for anyone to trigger, so you would have to write additional logic to protect them, either with hard-coded headers or, say, OAuth.
AWS Lambda doesn't have a fixed source IP, as mentioned here; however, I suspect the IP changes when the execution environment is recycled ("cooled down"), not during a single invocation.
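If you want to observe the rotation yourself, a tiny handler can report the public IP of its execution environment (checkip.amazonaws.com is an Amazon-run echo service):

```python
import urllib.request

def handler(event, context):
    # Returns the public egress IP the web server would see for this invocation.
    with urllib.request.urlopen("https://checkip.amazonaws.com", timeout=5) as resp:
        return {"ip": resp.read().decode().strip()}
```

Invoking it several times over a few hours shows how often the IP actually changes.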
The web server doesn't see your internal private IP address; it sees the public IP address from AWS. Just make sure you set your headers appropriately and signal to the web server that your application is a bot. See this link for some best practices on building a web scraper:
https://www.blog.datahut.co/post/web-scraping-best-practices-tips
Just be respectful of how much data you pull and how often; that's the main thing in my opinion.
Lambda is triggered when a file is placed in S3, or when data is added to Kinesis or DynamoDB. That is often backwards from what a web scraper needs, though certainly something like S3 could serve as a queue/job runner.
Scraping on different IPs? Certainly Lambda is deployed across many machines, though that doesn't actually help you, since you can't control the machines or their IPs.