I am using Serverless and have a service with two functions.
I have set up API keys to restrict access to the service.
One function performs a basic task.
The second function is invoked at the end of the first function to return a more thorough result.
I would like all API keys to be able to access the first function, but only some API keys to access the second function, kind of like a premium feature that is activated for some users.
The only solution I can think of that works is having two separate services, but this seems like a waste of resources as I have to call one API from the other. Is there a better way to do this?
You can look up the usage plans associated with an API key and act on that information; see:
https://docs.aws.amazon.com/apigateway/api-reference/link-relation/usageplan-by-id/
https://docs.aws.amazon.com/apigateway/api-reference/link-relation/apikey-usageplans
So instead of requiring the mere presence of an API key, you can inspect the properties associated with it and decide to either allow or deny the use of your function based on that. For the greatest flexibility, you will have to implement that logic inside the function itself.
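A minimal sketch of that idea in Node.js, assuming a Lambda proxy integration (where the caller's key id is exposed as event.requestContext.identity.apiKeyId) and a hypothetical usage plan named "premium":

const AWS = require('aws-sdk');
const apigateway = new AWS.APIGateway();

exports.handler = async (event) => {
  // Look up the usage plans the calling API key belongs to
  const plans = await apigateway.getUsagePlans({
    keyId: event.requestContext.identity.apiKeyId
  }).promise();

  const isPremium = plans.items.some(plan => plan.name === 'premium');
  if (!isPremium) {
    return { statusCode: 403, body: 'This feature requires a premium plan' };
  }
  // ... run the premium logic here ...
  return { statusCode: 200, body: 'premium result' };
};

Note that the function's execution role needs permission to call the API Gateway management API (apigateway:GET) for this to work.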
For more info see:
https://docs.aws.amazon.com/apigateway/api-reference/resource/api-key/
especially the links pointing to information on usage plans and the marketplace.
Seems a little inefficient the way it currently is:
response.body = {
  user: await userService(userID),      // calls a user service to get info on the user
  friends: await friendsService(userID) // calls a friends service to get info on the user's friends
};
Let's say the userService and friendsService are configured on different API Gateway endpoints.
Then wouldn't that make the network request take longer than if I were to just package my entire backend into one zip file that's uploaded to AWS Lambda?
Seems like this is very inefficient.
Is there a way to call other Lambdas without having to make a network request? I understand I could put the Lambdas/gateway in the same VPC as the main Gateway endpoint exposed to the internet, but isn't that expensive?
Is there any way to do this more efficiently?
You can call a Lambda function by using the AWS SDK (a LambdaClient object). So, for example, assume you wrote two Lambda functions, funA and funB.
Next assume:
you want to call funB from funA
you wrote your Lambda function by using the Lambda Java runtime
You can use the Lambda Java API to invoke funB. There is no need to wrap either one in a RESTful call using API Gateway; you can use the AWS SDK. Here is a Java example that shows you how to invoke a Lambda function:
https://github.com/awsdocs/aws-doc-sdk-examples/blob/master/javav2/example_code/lambda/src/main/java/com/example/lambda/LambdaInvoke.java
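If your functions run on the Node.js runtime instead, a roughly equivalent sketch with the AWS SDK for JavaScript might look like this (funB and the payload shape are assumptions; the calling function's role also needs lambda:InvokeFunction permission):

const AWS = require('aws-sdk');
const lambda = new AWS.Lambda();

exports.handler = async (event) => {
  // Synchronously invoke funB and wait for its result
  const result = await lambda.invoke({
    FunctionName: 'funB',
    InvocationType: 'RequestResponse', // wait for the response
    Payload: JSON.stringify({ userID: event.userID })
  }).promise();
  return JSON.parse(result.Payload);
};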
@smac2020 writes in the answer about using the SDK; that is of course a network call too. It just skips API Gateway and calls the AWS API directly.
I think the key point about Lambda is whether it scales well. Try to think about the algorithms in a different way. For example, you can create a pipeline where in each step your "state object" is enriched with additional data. You can use Step Functions or SQS to send the requests between the steps, or you can make the client responsible for managing the data. You should try to avoid one function waiting for another function; you are then paying for two Lambdas running, the caller and the called one.
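If the caller does not need the callee's result, a fire-and-forget invocation is one way to avoid that double billing (a sketch, reusing the assumed funB from above):

const AWS = require('aws-sdk');
const lambda = new AWS.Lambda();

async function triggerFunB(userID) {
  // InvocationType 'Event' queues the invocation and returns immediately,
  // so the caller does not sit idle while funB runs
  await lambda.invoke({
    FunctionName: 'funB',
    InvocationType: 'Event',
    Payload: JSON.stringify({ userID })
  }).promise();
}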
If you are thinking "but microservices..." - look at the design of the AWS API itself. You do not need the output of one service as an input to another one. It takes some time to adapt and to look at problems from different perspectives. In your case I would consider whether the friends list can, for example, live in the user object so the calls can be merged (look at some NoSQL database design principles).
Customers (around 1000) sign up to my service and receive a customer-unique API key. They then use the key when calling an AWS Lambda function through AWS API Gateway to access data in DynamoDB.
Requirement 1: The customers get billed by the number of API calls, so I have to be able to count those. AWS only provides metrics for the total number of API calls per Lambda, so I have a few options:
1. At every API hit, increment a counter in DynamoDB.
2. At every API hit, enqueue a message in SQS, receive it in a "hit counter" Lambda and increment a counter in DynamoDB.
3. Deploy a separate Lambda for each customer and use the AWS built-in call counter.
Requirement 2: The data that the Lambda can access is unique for each customer and thus dependent on the API key provided.
To enable this I also have a number of options:
1. Store the required API key together with the data that the customer has the right to access.
2. Deploy a separate Lambda for each customer and use API Gateway to protect it with a key.
3. Create a separate endpoint in API Gateway for each customer and protect it with the API key.
None of the options above seem like a good way to design the solution. Is there a canonical way of doing this? If not, which of the options above is the best? Have I missed an obvious solution due to my unfamiliarity with AWS?
I will try to break your problems down based on my experience, but maybe Michael - Sqlbot or John Rotenstein will be able to give more appropriate answers.
Requirement 1
1) This sounds like a good approach. I don't see anything critical here.
2) This, IMHO, is the best of the 3. It will decouple data access from the billing service, which is a great thing in a microservices world (a sketch of such a counter follows below).
3) This is not scalable. Imagine your system grows and you end up with 10K Lambda functions. Not only will you have to build a very reliable mechanism to automate this process, but you'll also need to monitor 10K different things (imagine CloudWatch Logs, API Gateway, etc.), not to mention you'll have 10 thousand functions with exactly the same code (client-specific parameters apart). I wouldn't even think about this one.
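For what it's worth, the "hit counter" Lambda in option 2 can stay very small. A sketch (the ApiCallCounts table name and the SQS message shape are assumptions):

const AWS = require('aws-sdk');
const dynamo = new AWS.DynamoDB.DocumentClient();

exports.handler = async (event) => {
  // One SQS trigger can deliver a batch of hits; count each per API key
  for (const record of event.Records) {
    const { apiKey } = JSON.parse(record.body);
    // ADD is an atomic counter update, so concurrent batches don't clash
    await dynamo.update({
      TableName: 'ApiCallCounts',
      Key: { apiKey },
      UpdateExpression: 'ADD callCount :inc',
      ExpressionAttributeValues: { ':inc': 1 }
    }).promise();
  }
};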
Requirement 2
1) It could work, and it fits nicely into the DynamoDB way of doing things: store as much data as you can in a single table, so you can fetch everything in one go (see the sketch after this list). From what I see, you could even use this ApiKey as your partition key and, for the sake of simplicity in this answer, store the client's data as JSON in an attribute named data. Since your query only needs to look up by the ApiKey, storing a JSON blob in DynamoDB won't hurt (do keep in mind, however, that if you need to query by any of its JSON attributes then you're in a bad spot, since DynamoDB's query capabilities are very limited).
2) No, because of Requirement 1.3
3) No, because of the above.
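The lookup in option 1 then becomes a single read keyed by the ApiKey, something like this (a sketch; the table and attribute names are assumptions):

const AWS = require('aws-sdk');
const dynamo = new AWS.DynamoDB.DocumentClient();

async function getClientData(callerApiKey) {
  // One read fetches everything the caller may access,
  // keyed directly by the API key as partition key
  const { Item } = await dynamo.get({
    TableName: 'ClientData',
    Key: { apiKey: callerApiKey }
  }).promise();
  return Item; // e.g. { apiKey: '...', data: { ... } }
}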
If you still need to store the ApiKey in a different table so you can run different analyses and keep finer-grained control over the client's calls, access, billing, etc., that's not a problem either; just make sure you duplicate your ApiKey in your ClientData table instead of creating a FK (DynamoDB doesn't support FKs, so you'd need to manage these constraints yourself). Duplication is just fine in a NoSQL world.
Your use case is clearly a multi-tenancy one, so I'd also recommend reading Multi-Tenant Storage with Amazon DynamoDB, which will give you some more insights and broaden your options a little bit. Multi-tenancy is not an easy task and can give you lots of headaches if not implemented correctly. I think this is why AWS has also prepared this nice read for us :)
Happy to continue this in the comments section in case you have more info to share.
Hope this helps!
I'm looking through the Google Cloud Functions docs and I wonder if it is possible to restrict access to an HTTP cloud function to a given network? I would like to prevent anyone from exhausting the free quota.
Are there any firewall rules or a similar mechanism for Cloud Functions?
I don't believe there are any built-in security restrictions at the moment.
In terms of avoiding quota exhaustion, you could pass a header or parameter with some kind of shared secret. Even a fixed string value would help avoid this problem.
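A minimal sketch of that idea for a Node.js HTTP function (the header name and the secret value are of course placeholders):

// Reject any request that doesn't carry the shared secret
exports.guardedFunction = (req, res) => {
  if (req.get('x-shared-secret') !== 'some-fixed-string') {
    return res.status(403).send('Forbidden');
  }
  res.status(200).send('Hello!');
};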
You can add authentication to a cloud function by using Firebase Authentication. Here's a GitHub example of how to do it: https://github.com/firebase/functions-samples/tree/master/authorized-https-endpoint
Note however that the authentication code is executed by your function, so rejecting unauthorized access would still consume a small portion of your free resource allowance.
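The core of the linked sample boils down to verifying a Firebase ID token before doing any work; a condensed sketch (assuming the firebase-admin package and an Authorization: Bearer <token> header):

const admin = require('firebase-admin');
admin.initializeApp();

exports.securedFunction = async (req, res) => {
  const header = req.get('Authorization') || '';
  if (!header.startsWith('Bearer ')) {
    return res.status(403).send('Unauthorized');
  }
  try {
    // Throws if the token is missing, expired or forged
    const decoded = await admin.auth().verifyIdToken(header.slice(7));
    res.status(200).send(`Hello ${decoded.uid}`);
  } catch (err) {
    res.status(403).send('Unauthorized');
  }
};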
The Google Function Authorizer module might be what you're looking for. It provides "a simple user authentication and management system for Google Cloud HTTP Functions." It doesn't seem to have a lot of users yet, but the project seems simple enough that you could at least use it as a basis to modify or implement your own solution if you prefer.
This article was helpful for me.
https://cloud.google.com/solutions/authentication-in-http-cloud-functions
Anyone can still invoke the function, but the request must contain credentials from a user that has access to the resources accessed by the function.
Before that I was doing something very simple that is probably not great for production but does provide a little more security than just leaving it open publicly: I call my function with a password in the payload, and if it doesn't match one of the passwords I hardcoded in the function, it just fails with a 403.
If you need to restrict to IP range then you can follow instructions here: https://sukantamaikap.com/posts/load-balancing-cloud-functions
The Google Cloud UI has unfortunately changed, so you need to do some searching before you get it all done, but I managed to set it up. Note, however, that the related services will cost roughly 25 EUR per month at minimum.
You can estimate the pricing here:
https://cloudpricingcalculator.appspot.com/
You need to search for "Cloud Load Balancing and Network Services" and then enable "Cloud Load Balancing", "Google Cloud Armor", and "IP addresses".
Alternatively, in some cases it might be sufficient to make the name of the function, or some suffix to the name, complex enough that it effectively acts as a sort of password, something like MyGoogleCloudFunc-abracadabra. This does not restrict the network, but outsiders presumably would not know the secret name anyway.
Is it appropriate to design a RESTful service that has a single, generic endpoint that can be reused for arbitrary business operations? For example, one web app might need data relating to vendors. It could call the REST API, passing the name of the class method that the API should call internally to get vendor data. Another app could use the same generic API endpoint and pass the name of another business entity to get different data back.
My motivation is that we have a large set of business-related objects and data, and making API endpoints for each individual one seems like overkill. Can't the REST API be used instead as a kind of dumb pass-through?
It can, but in that case it has nothing to do with "REST" anymore. You'll be simply doing RPC-over-HTTP.
Background
I have a backoffice that manages information from various sources. Part of the information is in a database that the backoffice can access directly, and part of it is managed by accessing web services. Such services usually provide CRUD operations plus paged searches.
There is an access control system that determines what actions a user is allowed to perform. Whether the user can perform some action is decided by authorization rules that depend on the underlying data model. For example, there is a rule that allows a user to edit a resource if she is the owner of that resource, where the owner is a column in the resources table. There are other rules such as "a user can edit a resource if that resource belongs to an organization and the user is a member of that organization".
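For illustration, the two rules above could be expressed as a predicate along these lines (a hypothetical sketch; the user and resource shapes are assumptions):

function canEdit(user, resource) {
  // Rule 1: the user owns the resource (owner is a column in the resources table)
  if (resource.owner === user.id) return true;
  // Rule 2: the resource belongs to an organization the user is a member of
  return resource.organizationId != null
    && user.organizationIds.includes(resource.organizationId);
}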
This approach works well when the domain model is directly available to the access control system. Its main advantage is that it avoids replicating information that is already present in the domain model.
When the data to be manipulated comes from a Web service, this approach starts causing problems. I can see various approaches that I will discuss below.
Implementing the access control in the service
This approach seems natural, because otherwise someone could bypass access control by calling the service directly. The problem is that the backoffice has no way to know what actions are available to the user on a particular entity. Because of that, it is not possible to disable options that are unavailable to the user, such as an "edit" button.
One could add additional operations to the service to retrieve the authorized actions on a particular entity, but it seems that we would be handing multiple responsibilities to the service.
Implementing the access control in the backoffice
Assuming that the service trusts the backoffice application, one could decide to implement the access control in the backoffice. This seems to solve the issue of knowing which actions are available to the user. The main issue with this approach is that it is no longer possible to perform paged searches because the service will now return every entity that matches, instead of entities that match and that the user is also authorized to see.
Implementing a centralized access control service
If access control was centralized in a single service, everybody would be able to use it to consult access rights on specific entities. However, we would lose the ability to use the domain model to implement the access control rules. There is also a performance issue with this approach, because in order to return lists of search results that contain only the authorized results, there is no way to filter the database query with the access control rules. One has to perform the filtering in memory after retrieving all of the search results.
Conclusion
I am now stuck because none of the above solutions is satisfactory. What other approaches can be used to solve this problem? Are there ways to work around the limitations of the approaches I proposed?
One could add additional operations to the service to retrieve the authorized actions on a particular entity, but it seems that we would be handing multiple responsibilities to the service.
Not really. Return a flags field/property from the web service for each record/object, which can then be used to tidy up the UI based on what the user can do. The flags are based on the same information that the service already uses for access control. This also enables the service to support a browser-based AJAX access method and skip the backoffice part in the future, for added flexibility.
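For example, each record returned by the service might carry something like this (a hypothetical response shape, not the actual service contract):

{
  "id": 42,
  "owner": "alice",
  "allowedActions": { "view": true, "edit": true, "delete": false }
}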
Distinguish between the components of your access control system and implement each where it makes sense.
Access to specific search results in a list should be implemented by the service that reads the results, so the user interface never needs to know about the results the user doesn't have access to. If the user may or may not edit or interact in other ways with data the user is allowed to see, the service should return that data with flags indicating what the user may do, and the user interface should reflect those flags. Services implementing those interactions should not trust the user interface; they should validate that the user has access when the service is called. You may have to implement the access control logic in multiple database queries.
Access to general functionality the user may or may not have, independent of data, should again be controlled by the service implementing that functionality. That service should compute access through a module that is also exposed as a service, so that the UI can respect the access rules and not try to call services the user does not have access to.
I understand my response is very late - 3 years late. Still, it's worth shedding some new light on an age-old problem. Back in 2011, access control was not as mature as it is today. In particular, there is a newer model, ABAC, along with a standard implementation, XACML, which make centralized authorization possible.
In the OP's question, the OP writes the following regarding centralized access control:
Implementing a centralized access control service
If access control was centralized in a single service, everybody would be able to use it to consult access rights on specific entities. However, we would lose the ability to use the domain model to implement the access control rules. There is also a performance issue with this approach, because in order to return lists of search results that contain only the authorized results, there is no way to filter the database query with the access control rules. One has to perform the filtering in memory after retrieving all of the search results.
The drawbacks that the OP mentions may have been true in a home-grown access control system, in RBAC, or in ACLs. But they are no longer true in ABAC and XACML. Let's take them one by one.
The ability to use the domain model to implement the access control rules
With attribute-based access control (ABAC) and the eXtensible Access Control Markup Language (XACML), it is possible to use the domain model and its properties (or attributes) to write access control policies. For instance, if the use case is that of a doctor wishing to view medical records, the domain model would define the Doctor entity with its properties (location, unit, and so on) as well as the Medical Record entity. Rules in XACML could look as follows:
- A user with role == doctor can do the action == view on an object of type == medical record if and only if doctor.location == medicalRecord.location.
- A user with role == doctor can do the action == edit on an object of type == medical record if and only if doctor.id == medicalRecord.assignedDoctor.id.
One of the key benefits of XACML is precisely that it closely mirrors the business logic and the domain model of your applications.
Performance issue - the ability to filter items from a db
In the past, it was indeed impossible to create filter expressions. This meant that, as the OP points out, one would have to retrieve all the data first and then filter it, which would be an expensive task. Now, with XACML, it is possible to do reverse querying. Running a reverse query means asking a question of the type "Which medical records can Alice view?" instead of the traditional binary question "Can Alice view medical record #123?".
The response to a reverse query is a filter condition that can be converted into a SQL statement, for instance in this scenario SELECT id FROM medicalRecords WHERE location = 'Chicago', assuming of course that the doctor is based in Chicago.
What does the architecture look like?
One of the key benefits of a centralized access control service (also known as externalized authorization) is that you can apply the same consistent authorization logic to your presentation tier, business tier, APIs, web services, and even databases.