Lambda Function / Stage Scoping - amazon-web-services

When running functions in AWS Lambda, it's common to use environment variables to control settings. However, when invoking Lambda via API Gateway, you also have 'stage variables' to contend with.
My question is this: is an AWS Lambda instance scoped to a particular API Gateway stage when invoked from API Gateway, such that I can rely on the stage not changing between calls? In effect, does each API 'stage' get its own pool of instances to work with, which are recycled in accordance with stage variables?
Examples of where I might want to depend on this behaviour:
Creating connections to tables - the table name will be different per stage, so if I create the connection on first usage I'd end up using the first caller's stage context. What happens when I make a call on a different API Gateway stage?
Varying JWT keys for environments.
My gut feeling on this is that if API Gateway has two versions/stages of the deployment referencing the exact same function version, the Lambda-managed function instances can receive calls from the two stages interchangeably, and I shouldn't cache the context and request-derived information (stage variables) in process.
There's a lot of AWS API Gateway / Lambda material out there, but I couldn't find a clear answer to this issue.

You're right, a single Lambda function version will have a pool of instances that are totally independent. Different API Gateway stages and even different APIs can call the same function and this has no impact on the instance pool in Lambda.
So any in-function caching you're doing should not use the assumption that only a specific API and/or stage will access the cached data.

Related

Caching results of a lambda function

We are developing a serverless application. The application has "users" that get added, and "groups" that have "permissions" on different "resources". To check if a user has permission to do an action on a resource, we need to do some calculations. (We are using DynamoDB.)
Basically, before every action, we will need to check if the user has permission to do that particular action on the given resource. I was thinking we could have a Lambda function that checks this against a cache and, if it is not in the cache, hits the DB, does the calculation, writes the result to the cache, and returns.
What kind of cache would be best to use here? We are going to be calling this internally from the backend itself.
Is API Gateway still the way to go?
How about ElastiCache for this purpose? Can we use it without having to configure a VPC? We are trying not to have to use a VPC in our application.
Any better ways?

They are all good options!
Elasticache is designed for caching data. API Gateway can also cache results.
An alternative is to keep the data "inside" the AWS Lambda function by using global variables. The values remain present the next time the Lambda function is invoked in the same container, so you could cache results along with an expiry time. Note, however, that Lambda might launch multiple containers if the function is invoked frequently (especially in parallel), and might discard a container that has not run for some time. Therefore, you might end up with multiple independent caches, or lose a cache entirely.
I'd say the simplest option would be API Gateway's cache.
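The global-variable approach can be sketched as follows, assuming Python. check_permission_uncached is a hypothetical placeholder for the DynamoDB lookups, and the 60-second TTL is an arbitrary assumption. The cache lives only in the current execution environment, so concurrent or recycled environments each start with their own, possibly empty, copy.

```python
import time

# Module-level (global) state: it survives across invocations that reuse the
# same execution environment, but every concurrent environment has its own
# copy, and a recycled environment starts empty.
_cache = {}
stats = {"misses": 0}
CACHE_TTL_SECONDS = 60  # assumption: how stale a permission answer may be


def check_permission_uncached(user_id, action, resource_id):
    # Hypothetical placeholder for the DynamoDB reads and permission calculation.
    stats["misses"] += 1
    return action == "read"


def check_permission(user_id, action, resource_id, now=None):
    now = time.time() if now is None else now
    key = (user_id, action, resource_id)
    entry = _cache.get(key)
    if entry is not None and entry[1] > now:
        return entry[0]  # cached answer that has not expired yet
    allowed = check_permission_uncached(user_id, action, resource_id)
    _cache[key] = (allowed, now + CACHE_TTL_SECONDS)
    return allowed
```

The now parameter exists only so the expiry logic can be exercised deterministically; in a handler you would call check_permission(user_id, action, resource_id) and let it read the clock.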

Where is the permissions map (user <-> resource) stored?
This AWS blog post might be interesting (it's about caching in the Lambda execution environment's memory), because you could use a DynamoDB table for that.

Organising stacks and shared resources in AWS CloudFormation and Serverless

I have an architectural question about the design and organisation of AWS Serverless resources using CloudFormation.
Currently I have multiple stacks organised by domain-specific purpose, and this works well. Most of the stacks that contain Lambdas have to be transformed using the Serverless transform (using SAM for all). The async communication is facilitated using a combination of EventBridge and S3+Events and works well. The issue I have is with synchronous communication.
I don't want to reference Lambdas from other stacks using their exported names from other stacks and invoke them directly as this causes issues with updating and versions (if output exports are referenced in other stacks, I cannot change the resource unless the reference is removed first, not ideal for CI/CD and keeping the concerns separate).
I have been using API Gateway as an abstraction, but that feels rather heavy-handed. It is nice to have that separation, but having to set up a domain and DNS resolution, plus having the API GW exposed externally, doesn't feel right. Maybe there is a better way to configure API GW to be internal only. If you have had success with this, could you please point me in the right direction?
Is there a better way to abstract invocation of Lambda functions from different stacks in a synchronous way? (Common template patterns for CF or something along those lines?)

I see two questions:
Alternatives for invoking Lambda functions synchronously:
API Gateway is one easy way, with IAM authentication to make it secure. An HTTP API is a much simpler and cheaper option than a REST API. With a REST API, we can choose the Private endpoint type rather than Regional/Edge, which is not exposed outside the VPC, to make it even more secure.
We can have a private (internal) ALB with Lambda functions as targets, for a simple use case that doesn't need any API Gateway features (this will cost some amount every month).
We can always call Lambda functions directly with the AWS SDK's Invoke API.
Alternatives for sharing resources between templates:
Exporting and importing will be a bit of a problem if we need to delete and recreate the resource; it shouldn't be a problem if we are just updating it, though.
We can always store the ARN of the Lambda function in an SSM parameter in the source template and resolve the ARN from that SSM parameter in the destination template. This is completely decoupled, and better than simply hard-coding the ARN.
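The answer resolves the SSM parameter in the destination template (CloudFormation supports this through an AWS::SSM::Parameter::Value<String> template parameter or a {{resolve:ssm:...}} dynamic reference). As a variation on that idea, the sketch below reads the ARN at runtime and then calls the function directly through the SDK. It assumes boto3-style ssm and lambda clients are passed in; the helper names are hypothetical, while get_parameter and invoke (with InvocationType="RequestResponse" for a synchronous call) are the real SDK operations.

```python
import json


def resolve_function_arn(ssm_client, parameter_name):
    # Read the ARN that the source stack wrote to an SSM parameter.
    response = ssm_client.get_parameter(Name=parameter_name)
    return response["Parameter"]["Value"]


def invoke_sync(lambda_client, function_arn, payload):
    # "RequestResponse" is the synchronous invocation type: the call blocks
    # until the target function returns.
    response = lambda_client.invoke(
        FunctionName=function_arn,
        InvocationType="RequestResponse",
        Payload=json.dumps(payload).encode("utf-8"),
    )
    # Payload is a stream; read() returns the function's JSON response bytes.
    body = json.loads(response["Payload"].read())
    # FunctionError is present when the invoked function raised an error.
    if "FunctionError" in response:
        raise RuntimeError(f"Invoked function failed: {body}")
    return body
```

With real clients this might be called as invoke_sync(boto3.client("lambda"), resolve_function_arn(boto3.client("ssm"), "/shared/orders/function-arn"), {"orderId": "123"}), where the parameter name is made up for illustration. Caching the resolved ARN in a module-level variable would avoid an SSM call on every invocation.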

How To Prevent AWS Lambda Abuse by 3rd-party apps

Very interested in getting hands-on with Serverless in 2018. I'm already looking to implement usage of AWS Lambda in several decentralized app projects. However, I don't yet understand how you can prevent a 3rd-party app (perhaps even a competitor) from abusing your endpoint and driving up your usage costs.
I'm not talking about a DDoS, or where all the traffic is coming from a single IP, which can happen on any network, but specifically having a 3rd-party app's customers directly make the REST calls, which cause your usage costs to rise, because their app is piggy-backing on your "open" endpoints.
For example:
I wish to create an endpoint on AWS Lambda to give me the current price of Ethereum ETH/USD. What would prevent another (or every) dapp developer from using MY lambda endpoint and causing excessive billing charges to my account?

When you deploy an endpoint that is open to the world, you're opening it to be used, but also to be abused.
AWS provides services to mitigate common abuse methods, such as AWS Shield, which mitigates DDoS attacks; however, they do not know what is or is not abuse of your Lambda function, which is what you are asking about.
If your Lambda function is private, then you should use one of the API Gateway security mechanisms to prevent abuse:
IAM security
API key security
Custom security authorization
With one of these in place, your Lambda function can only be called by authorized users. Without one of these in place, there is no way to prevent the type of abuse you're concerned about.
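Of the three mechanisms, custom authorization is the one that runs your own code. A minimal sketch of a Lambda TOKEN authorizer, assuming Python, with the token in event["authorizationToken"] and the method ARN in event["methodArn"] (the fields API Gateway passes to TOKEN authorizers). The returned principalId plus an IAM policy allowing or denying execute-api:Invoke is the documented authorizer output shape; KNOWN_TOKENS and the partner name are hypothetical, and a real authorizer would more likely validate a signed JWT than compare against a hard-coded list.

```python
import hmac

# Hypothetical allow-list mapping tokens to caller identities; do not
# hard-code secrets like this in a real deployment.
KNOWN_TOKENS = {"token-for-partner-a": "partner-a"}


def _policy(principal_id, effect, resource):
    # The IAM policy document that API Gateway evaluates for this request.
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {"Action": "execute-api:Invoke", "Effect": effect, "Resource": resource}
            ],
        },
    }


def authorizer_handler(event, context):
    token = event.get("authorizationToken", "")
    for known, principal in KNOWN_TOKENS.items():
        # Constant-time comparison avoids leaking token prefixes via timing.
        if hmac.compare_digest(token.encode(), known.encode()):
            return _policy(principal, "Allow", event["methodArn"])
    return _policy("anonymous", "Deny", event["methodArn"])
```

A Deny policy makes API Gateway reject the request before it reaches the protected Lambda function, so unauthenticated callers do not run (or bill) that function.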

Unlimited access to your public Lambda functions, either by bad actors or by bad software developed by legitimate 3rd parties, can result in unwanted usage of billable corporate resources and can degrade application performance. It is important to consider ways of limiting and restricting access by your Lambda clients as part of your system's security design, to prevent runaway function invocations and uncontrolled costs.
Consider using the following approach to prevent execution "abuse" of your Lambda endpoint by 3rd-party apps:
One factor you want to control is concurrency, the number of concurrent executions that are supported per account and per function. You are billed per request plus compute time (the memory allocated multiplied by the execution duration), so this is the unit you want to control. To prevent runaway costs, you prevent runaway executions, whether caused by bad actors or by bad software written by legitimate 3rd parties.
From Managing Concurrency:
The unit of scale for AWS Lambda is a concurrent execution (see Understanding Scaling Behavior for more details). However, scaling indefinitely is not desirable in all scenarios. For example, you may want to control your concurrency for cost reasons, or to regulate how long it takes you to process a batch of events, or to simply match it with a downstream resource. To assist with this, Lambda provides a concurrent execution limit control at both the account level and the function level.
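The function-level limit described in this quote can also be set programmatically. A minimal sketch, assuming Python and a boto3-style Lambda client passed in; set_reserved_concurrency is a hypothetical helper name, while put_function_concurrency and its ReservedConcurrentExecutions parameter belong to the Lambda API itself.

```python
def set_reserved_concurrency(lambda_client, function_name, limit):
    """Cap how many instances of one function may run at the same time."""
    if limit < 0:
        raise ValueError("limit must be >= 0")
    # Reserved concurrency both sets aside and caps this function's share of
    # the account's concurrency pool; a value of 0 throttles every invocation.
    response = lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=limit,
    )
    return response["ReservedConcurrentExecutions"]
```

With a real client this would be called as set_reserved_concurrency(boto3.client("lambda"), "my-function", 10), where the function name and limit are placeholders. Invocations beyond the limit are throttled rather than billed as additional concurrent executions.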
In addition to per-account and per-function limits, you can also control Lambda exposure by wrapping Lambda calls in an AWS API Gateway, and Create and Use API Gateway Usage Plans:
After you create, test, and deploy your APIs, you can use API Gateway usage plans to extend them as product offerings for your customers. You can provide usage plans to allow specified customers to access selected APIs at agreed-upon request rates and quotas that can meet their business requirements and budget constraints.
What Is a Usage Plan? A usage plan prescribes who can access one or more deployed API stages, and also how much and how fast the caller can access the APIs. The plan uses an API key to identify an API client and meters access to an API stage with the configurable throttling and quota limits that are enforced on individual client API keys.
The throttling prescribes the request rate limits that are applied to each API key. The quotas are the maximum number of requests with a given API key submitted within a specified time interval. You can configure individual API methods to require API key authorization based on usage plan configuration. An API stage is identified by an API identifier and a stage name.
Using API Gateway limits to create usage plans per customer, you can control API and Lambda access to prevent uncontrolled account billing.
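To make the throttling and quota semantics concrete, here is a hypothetical in-process model, assuming Python. API Gateway enforces usage plans for you (its throttling is documented as a token bucket algorithm), so this only illustrates how a per-key rate, burst, and quota interact; it is not code you would deploy, and UsagePlanLimiter and its parameters are invented for the example.

```python
class UsagePlanLimiter:
    """Conceptual per-API-key throttle (token bucket) plus a request quota."""

    def __init__(self, rate_per_sec, burst, quota):
        self.rate = rate_per_sec  # steady-state requests per second
        self.burst = burst        # bucket capacity (max short burst)
        self.quota = quota        # max requests per quota period
        self._buckets = {}        # api_key -> [tokens, last_refill_time]
        self._used = {}           # api_key -> requests counted this period

    def allow(self, api_key, now):
        if self._used.get(api_key, 0) >= self.quota:
            return False  # quota for the current period is exhausted
        tokens, last = self._buckets.get(api_key, [float(self.burst), now])
        # Refill tokens for the time elapsed, never beyond the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens < 1:
            self._buckets[api_key] = [tokens, now]
            return False  # throttled: requests are arriving too fast
        self._buckets[api_key] = [tokens - 1, now]
        self._used[api_key] = self._used.get(api_key, 0) + 1
        return True

    def reset_quota(self):
        # Call at the start of each quota period (for example, daily).
        self._used.clear()
```

Each API key gets its own bucket and its own quota counter, which mirrors the point above: limits are enforced on individual client API keys, so one noisy client cannot consume another client's allowance.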

@Matt's answer is correct, yet incomplete.
Adding a security layer is a necessary step towards security, but it doesn't protect you from authenticated callers, as @Rodrigo's answer states.
I actually just encountered, and solved, this issue on one of my lambdas, thanks to this article: https://itnext.io/the-everything-guide-to-lambda-throttling-reserved-concurrency-and-execution-limits-d64f144129e5
Basically, I added a single line to my serverless.yml file, in the function that gets called by the said authorized 3rd party:
reservedConcurrency: 1
And here is the whole function:
refresh-cache:
  handler: src/functions/refresh-cache.refreshCache
  # XXX Ensures the lambda always has one slot available, and never uses more than one lambda instance at once.
  # Prevents GraphCMS webhooks from abusing our lambda (GCMS triggers the webhook once per create/update/delete operation)
  # Makes sure only one instance of that lambda can run at once, to avoid refreshing the cache with parallel runs
  # Avoids spawning tons of API calls (most of them would time out anyway, around 80%)
  # See https://itnext.io/the-everything-guide-to-lambda-throttling-reserved-concurrency-and-execution-limits-d64f144129e5
  reservedConcurrency: 1
  events:
    - http:
        method: POST
        path: /refresh-cache
        cors: true
The refresh-cache lambda was invoked by a webhook triggered by a third-party service whenever any data changed. When importing a dataset, for instance, it would trigger as many as 100 calls to refresh-cache. This behaviour was completely spamming my API, which in turn was sending requests to other services in order to perform a cache invalidation.
Adding this single line improved the situation a lot: because only one instance of the lambda was running at once (no concurrent runs), the number of calls was divided by ~10. Instead of 50 calls to refresh-cache, it only triggered 3-4, and all of those calls worked (200 instead of 500 due to timeouts).
Overall, pretty good. Not yet perfect for my workflow, but a step forward.
Not related, but I used https://epsagon.com/, which tremendously helped me figure out what was happening on AWS Lambda. Here is what I observed:
Before applying the reservedConcurrency limit to the lambda:
Most calls failed with a timeout (30000 ms); only the first few succeeded because the lambda wasn't overloaded yet.
After applying the reservedConcurrency limit to the lambda:
All calls succeeded, and they were much faster. No timeouts.
Saves both money, and time.
Using reservedConcurrency is not the only way to deal with this issue; there are many others, as @Rodrigo stated in his answer. But it's a working one that may fit your workflow. It's applied at the Lambda level, not on API Gateway (if I understand the docs correctly).
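The effect of reservedConcurrency: 1 can be illustrated with a small in-process simulation, assuming Python. This is a hypothetical model, not Lambda's implementation: when the single reserved slot is busy, an overlapping call is rejected as throttled instead of starting a second instance, which is consistent with far fewer refresh-cache runs actually executing. ReservedConcurrency and the status strings are invented for the example.

```python
import threading


class ReservedConcurrency:
    """Simulates a function whose concurrent executions are capped."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)

    def invoke(self, fn, *args):
        # Non-blocking acquire: if every slot is busy, reject immediately
        # (modeling a throttled invocation) instead of waiting.
        if not self._slots.acquire(blocking=False):
            return {"status": "throttled"}
        try:
            return {"status": "ok", "body": fn(*args)}
        finally:
            self._slots.release()
```

In the usage below, a second webhook call arrives while the first refresh is still running and is throttled; once the first finishes, the slot is free again.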

AWS Gateway map path for lambda

I am working in AWS with API Gateway together with a Lambda function. I read about how to pass parameters over to the Lambda function, and that is fine. But I want to pass the whole path over to Lambda. Does anyone know how that would be done? In particular, I want to pass the stage of the API Gateway. The Lambda function should connect to either the test server or the prod server based on the stage. In the following example it would be test:
https://skjdfsdj.execute-api.us-east-1.amazonaws.com/test/name/name2
In next example it would be prod:
https://skjdfsdj.execute-api.us-east-1.amazonaws.com/prod/name/name2
Any information how that would work?
Thanks,
Benni

We can configure/deploy API Gateway with respect to the stages and the HTTP methods that are required (see the docs).
There may be two cases:
You may have two different AWS Lambda functions implemented. In this scenario it's pretty simple: you can just create another stage and map the Lambda function and the respective methods accordingly.
If you have to access the same Lambda function and take action according to the stage, you can add, remove, and edit stage variables and their values. You can use stage variables in your API configuration to parametrize the integration of a request. Stage variables are also available in mapping templates (through $stageVariables, alongside $context values such as $context.stage), and once you have mapped the particular stage variable into the incoming request, you can use it to configure which server to call. See the API Gateway documentation on context and stage variables.
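For the single-function case, a minimal sketch assuming Python and Lambda proxy integration, where the event already carries event["requestContext"]["stage"], event["stageVariables"] (null when a stage has none), and event["path"] with the request path, so no mapping template is needed. The stage variable name backendUrl and the example.com URLs are hypothetical.

```python
import json

# Hypothetical fallback mapping; setting a "backendUrl" stage variable on
# each stage avoids hard-coding server URLs in the function.
DEFAULT_BACKENDS = {
    "test": "https://test.example.com",
    "prod": "https://prod.example.com",
}


def pick_backend(event):
    stage = event["requestContext"]["stage"]
    # stageVariables is null in the event when the stage defines none.
    stage_variables = event.get("stageVariables") or {}
    return stage_variables.get("backendUrl") or DEFAULT_BACKENDS[stage]


def handler(event, context):
    backend = pick_backend(event)
    # event["path"] carries the request path, so the whole path is available.
    body = {"backend": backend, "path": event["path"]}
    return {"statusCode": 200, "body": json.dumps(body)}
```

Preferring the stage variable over the built-in mapping keeps the server choice in the API Gateway stage configuration, which is the approach the answer describes.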

What is the best way to work with environments in AWS API Gateway?

I am using AWS to build an API, and deploy this to multiple stages.
When a call is made to a specific environment, I need to get a stage variable in Lambda and then data is recorded in a DynamoDB table such as "environment-Table".
Is this the best way to work with environments (like development, production etc) using AWS API Gateway, Lambda and DynamoDB?

It is difficult to say what the best approach is for your specific situation, given the limited data in your post. Managing multiple environments such as development and production was one of the intended uses of stages and stage variables. I don't see any obvious problems with what you are proposing.
Depending on your use case, you can call a Lambda function to record data in DynamoDB, or you may be able to skip the Lambda function and record the data in DynamoDB directly using an AWS service integration (the AWS integration type).
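The Lambda variant of the question's design can be sketched as follows, assuming Python, Lambda proxy integration, and a boto3-style DynamoDB client passed in. The stage variable name environment, the fallback to the stage name, and the id/payload attribute names are hypothetical; put_item with TableName and a typed Item map is the real DynamoDB client call.

```python
import uuid


def table_for_environment(environment):
    # Naming convention taken from the question: "<environment>-Table".
    return f"{environment}-Table"


def record_event(dynamodb_client, event):
    # Prefer a hypothetical "environment" stage variable; fall back to the
    # stage name itself when the stage defines no variables.
    stage_variables = event.get("stageVariables") or {}
    environment = stage_variables.get("environment", event["requestContext"]["stage"])
    table_name = table_for_environment(environment)
    item = {
        "id": {"S": str(uuid.uuid4())},  # hypothetical partition key name
        "payload": {"S": event.get("body") or ""},
    }
    dynamodb_client.put_item(TableName=table_name, Item=item)
    return table_name
```

Passing the client in keeps the function testable; in a deployed handler you would typically create boto3.client("dynamodb") once at module level and reuse it.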
