I am implementing an API using Amazon lambda function and API Gateway this lambda function will, in turn, call another 3rd party API and will transform that data into a specific format and will return it.
The 3rd part API that I am using to fetch records has pagination and throttling enabled but the API I am building using lambda and API Gateway I don't want to implement pagination in it rather I want this API to get all the pages one by one transform them in the specific format and return at once. The Client of this API should not have to call it with different pagination parameters.
Now as Lambda function has a maximum of 15-minute limit and the 3rd part API also has a max request per minute limit what is the best way to implement this.
This is how I am doing it right now, in my lambda function I push a specific number of requests in promises and when the max number is reached I stop pushing more and execute the pending promises and set a timeout function for one minute meanwhile the pending promises executes and I generate the response from them but don't send it back as there are pending requests to be made. When the timeout completes I again push a specific amount of requests in promises and repeat the process.
Once all the pages are completed I return the data.
Now the problem is this may exceed more than 15 minutes and the lambda function would terminate.
Is there a better approach for this, even by using some other amazon services.
Related
I have AWS lambda function that gets details using multiple ids via rest API. The problem is the API only accept 1 id at a time/per call. Per my observation, the job can only cater around 30 ids else the job won’t finish or would max my 10 mins time limit. Currently, my ids can go as high as 200 ids per job process so I’m thinking of a way how I can resolve this issue.
So far I’m thinking of using step function so I can asynchronously run the job and just chunked my ids into multiple payload but I’m not sure how I can pass ids/payload from lambda to step function. Another solution I’m thinking is I can invoke the same lambda with chunked ids but i’m afraid that recursive would happen.
Any other suggestions or AWS services I can use to fix this?
I would have a process that dumps all the IDs into an SQS queue. Then have a Lambda function that uses the SQS queue as an event source. Lambda will then automatically spin up multiple instances of your Lambda function, passing each one a batch of IDs to process.
I have a use-case with lambda concurrent.
My system uses the API Gateway + Lambda function. I requested AWS for more Limit Concurrent and I get 10000 CCU. But my system is going to solve about 30000 concurrent users.
First, I tried using lambda async by this guide [https://docs.aws.amazon.com/apigateway/latest/developerguide/set-up-lambda-integration-async.html]
But my application use REST API and the result get 500 code of the response. The document said
"In this case, the backend Lambda function is invoked asynchronously,
and the front-end REST API method doesn't return the result"
Then, I search for one more solution that use Lambda Async. It used lambda async invoke from another sync Lambda. But this way didn't use for API GET of some methods that required validation.
Now, we go back with the solution that uses the provision concurrent, but this solution may not be useful. I am concerned about the concurrent that not enough.
Please, help me. Sorry about my English skill
My problem every 20minutes I want to execute the curl request which is around 25000 or more than that and save the curl response in database. In PHP it is not handled properly which is the best AWS services I can use except lambda.
A common technique for processing large number of similar calls is:
Create an Amazon Simple Queue Service (SQS) queue and push each request into the queue as a separate message. In your case, the message would contain the URL that you wish to retrieve.
Create an AWS Lambda function that performs the download and stores the data in the database.
Configure the Lambda function to trigger off the SQS queue
This way, the SQS queue can trigger hundreds of Lambda functions running parallel. The default concurrency limit is 1000 Lambda functions, but you can request for this to be increased.
You would then need a separate process that, every 20 minutes, queries the database for the URLs and pushes the messages into the SQS queue.
The complete process is:
Schedule -> Lambda pusher -> messages into SQS -> Lambda workers -> database
The beauty of this design is that it can scale to handle large workloads and operates in parallel, rather than each curl request having to wait. If a message cannot be processed, it Lambda will automatically try again. Repeated failures will send the message to a Dead Letter Queue for later analysis and reprocessing.
If you wish to perform 25,000 queries every 20 minutes (1200 seconds), this would need a query to complete every 0.05 seconds. That's why it is important to work in parallel.
By the way, if you are attempting to scrape this information from a single website, I suggest you investigate whether they provide an API otherwise you might be violating the Terms & Conditions of the website, which I strongly advise against.
I'm working on an application and our Back-end is written in .NET Web APi Core and Front-end in React. I make an endpoint which gets a JSON list and the size of the list is almost 83 MB. When I, deploy my back-end into AWS Lambda and call my endpoint it gives me an error (Error converting the response object of type Amazon.Lambda.APIGatewayEvents.APIGatewayProxyResponse from the Lambda function to JSON: Unable to expand the length of this stream beyond its capacity.: JsonSerializerException). I already check the Lambda payload response limit is 6 MB, (storing data into S3 from lambda endpoint and then call into Front-End will not work for me), so is there any way I can get that much data through Lambda.
Not with a single call, sorry, you cannot. As described in this link, the payload limit (for synchronous call) is, as you say, 6MB. Asynchronous calls have even lower limits.
I'd suggest you modify your UI/API to narrow your results first, or trap this error and alert the user (or calling service) that the payload is too large (and hence should be aborted, narrowed, or split into multiple calls).
In a single call we cannot retrieve more than 6.2 mb data from AWS lambda or through AWS API.Im also facing the same issue,Either we have to filter data or should do multiple calls
When I try to invoke a method that has a HTTP event it results in 500 Internal server error.
On CloudWatch logs it shows Recoverable error occurred (Rate Exceeded.)
When I try invoke a function without lambda it executes with response.
Here is my serverless config:
You have set your Lambda's reservedConcurrency to 0. This will prevent your Lambda from ever being invoked. Setting it to 0 is usually useful when your functions are getting invoked but you're not sure why and you want to stop it right away.
If you want to have it invoked, change reservedConcurrency to a positive integer (by default, it can be a positive integer <= 1000, but you can increase this limit by contacting AWS) or simply remove the reservedConcurrency attribute from your .yml file as it will use the default values.
Why would one ever use reservedConcurrency anyways? Well, let's say your Lambda functions are triggered by requests from API Gateway. Let's say you get 400 (peak hours) requests/second and, upon every request, two other Lambda functions are triggered, one to generate a thumbnail for a given image and one to insert some metadata in DynamoDB. You'd have, in theory, 1200 Lambda functions running at the same time (given all of your Lambda functions finish their execution in less than a second). This would lead to throttling as the default concurrent execution for Lambda functions is 1000. But is the thumbnail generation as important as the requests coming from API Gateway? Very likely not as it's naturally an eventually consistent task, so you could set reservedConcurrency on the thumbnail Lambda to only 200, so you wouldn't use up your concurrency, meaning other functions would be able to spin up to do something more useful at a given point in time (in our example, receiving HTTP requests is more important than generating thumbnails). The other 800 left concurrency could then be split between the function triggered from API Gateway and the one that inserts data into DynamoDB, thus preventing throttling for the important stuff and keeping the not-so-important-stuff eventually consistent.