AWS service for managing state data - DynamoDB/Step Functions/SQS?

I am building a Desktop-on-Demand solution using the AWS WorkSpaces product, and I am trying to understand which AWS service best fits my requirements for managing state data for new users.
In a nutshell, the solution will create a new AWS WorkSpace (virtual desktop instance) for a user when multiple conditions are met and checks are satisfied. These tasks would be handled by multiple Lambda functions.
DynamoDB would be used as a central point for storing configuration details such as user data, user group data and deployed virtual desktop data.
The logic for desktop creation would be implemented using Step Functions, roughly as below:
An event hook comes from the Identity Management system, firing a Lambda function that checks whether the user's desktop already exists in the DynamoDB table
If it does not exist, another Lambda creates an AWS AD Connector
Once this is done, another Lambda builds a custom image for the new desktop if needed
Another Lambda pulls the latest data from the Identity Management system and updates the DynamoDB table for users and groups
Other Lambda functions may be fired up as dependencies
To ensure we have a transactional mechanism, we only deploy a new desktop when all conditions are met. I can think of a few ways of implementing this check:
Use a DynamoDB table for keeping state data. When all attributes in the item are in the expected state, the desktop can be created. If any Lambda fails or produces data that does not fit, don't create the desktop.
Just use Step Functions and design its logic flow so that all conditions must be satisfied before the desktop is created
Someone suggested using an SQS queue, but I don't see how it can be used for my purpose.
What is the best way to keep this data?

Step Functions is the method I would use for this. The DynamoDB solution would also work, but this seems like exactly the sort of thing Step Functions was designed to handle.
I agree that SQS would not be a correct solution.
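As a rough illustration, the whole flow can be expressed as a single state machine. Below is a minimal sketch in CDK (Python), assumed to live inside a Stack's __init__; the Lambda function variables and the choice condition are illustrative assumptions, not part of the original design:

# Minimal sketch, assuming CDK v2 (Python) and that check_desktop_fn, ad_connector_fn,
# build_image_fn and create_workspace_fn are existing IFunction objects in this stack.
from aws_cdk import aws_stepfunctions as sfn
from aws_cdk import aws_stepfunctions_tasks as tasks

check_desktop = tasks.LambdaInvoke(self, "CheckDesktopExists",
    lambda_function=check_desktop_fn, output_path="$.Payload")
create_ad_connector = tasks.LambdaInvoke(self, "CreateAdConnector",
    lambda_function=ad_connector_fn, output_path="$.Payload")
build_image = tasks.LambdaInvoke(self, "BuildCustomImage",
    lambda_function=build_image_fn, output_path="$.Payload")
create_workspace = tasks.LambdaInvoke(self, "CreateWorkspace",
    lambda_function=create_workspace_fn)

# The desktop is only created when every preceding step has succeeded;
# any task failure fails the execution, so no WorkSpace gets deployed.
definition = check_desktop.next(
    sfn.Choice(self, "DesktopAlreadyExists?")
    .when(sfn.Condition.boolean_equals("$.exists", True),
          sfn.Succeed(self, "NothingToDo"))
    .otherwise(create_ad_connector.next(build_image).next(create_workspace)))

sfn.StateMachine(self, "DesktopProvisioning", definition=definition)

The same structure can also be written directly in Amazon States Language; the point is that Step Functions gives you the "all conditions met" gate plus retry and failure handling without keeping interim state in a table yourself.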

Related

AWS CDK: What is the best way to implement multiple Stacks?

I have a few things to clear up, specifically regarding modelling the architecture of a serverless application using AWS CDK.
I'm currently working on a serverless application developed using AWS CDK in TypeScript. As a convention, we follow the rules below.
A stack should only have one table (dynamo)
A stack should only have one REST API (api-gateway)
A stack should not depend on any other stack (no cross-references), unless it's the Event-Stack (a stack dedicated to managing EventBridge operations)
The reason for that is so that each stack can be deployed independently without any interference from other stacks. In a way, our stacks are equivalent to micro-services in a micro-service architecture.
At the moment all the REST APIs are public, and now we have decided to make them private by attaching custom Lambda authorizers to each API Gateway resource. In this custom Lambda authorizer, we have to do certain operations (apart from token validation) in order to allow the user's request to proceed further. Those operations are:
Get the user’s role from DB using the user ID in the token
Get the user’s subscription plan (paid, free, etc.) from DB using the user ID in the token.
Get the user’s current payment status (due, no due, fully paid, etc.) from DB using the user ID in the token.
Get the scopes allowed for this user based on 1, 2 and 3.
Check whether the user can access this scope (the resource the user is currently requesting) based on 4.
This authorizer Lambda function needs to be used by all the other stacks to make their APIs private. But the problem is that roles, scopes, subscriptions, payments & user data live in different stacks, in their dedicated DynamoDB tables. Because of the rules I have explained before (especially rule number 3) we cannot depend on the resources defined in other stacks. Hence we are unable to create the authorizer we want.
Solutions we could think of and their problems:
Since EventBridge isn't bi-directional we cannot use it to fetch data from a different stack resource.
We can invoke a Lambda in a different stack using its ARN and get the required data from its response, but AWS discourages this as a CDK anti-pattern.
We cannot use technology like gRPC because it requires a continuously running server, which is out of scope for a serverless architecture.
There was also a proposal to re-design the CDK layout of our application. The main feature of this layout is moving from no cross-references to a fully cross-referenced pattern. (Inspired by the layered architecture described in this AWS best practice.)
Based on that article, we came up with a layout like this.
Presentation Layer
Stack for deploying the consumer web app
Stack for deploying admin portal web app
Application Layer
Stack for REST API definitions using API Gateway
Stack for Lambda functions running business-specific operations (Ex: CRUDs)
Stack for Lambda functions that run on event triggers
Stack for Authorisation (Custom Lambda authorizer(s))
Stack for Authentication implementation (Cognito user pool and client)
Stack for Events (EventBuses)
Stack for storage (S3)
Data Layer
Stack containing all the database definitions
There could be another stack for reporting, data engineering, etc.
As you can see, stacks are now going to have multiple dependencies on other stacks' resources (but no circular dependencies, as shown in the attached image). While this pattern unblocks us from writing an effective custom Lambda authorizer, we are not sure whether it will become a problem in the long run, as the application's scope increases.
I highly appreciate any help you could give us to resolve this problem. Thanks!
Multiple options:
Use Parameter Store rather than CloudFormation exports
Split stacks into a layered architecture like you described in your answer and import things between stacks using SSM Parameter Store, like the other answer describes. This is the most obvious choice for breaking inter-stack dependencies. I use it all the time.
Use fixed resource names, easily referenceable and importable
Stack A creates S3 bucket "myapp-users"; Stack B imports the S3 bucket by its fixed name using Bucket.fromBucketName(this, 'Users', 'myapp-users'). Fixed resource names have their own downsides, so this should be used only for resources that are indeed shared between stacks. They prevent easy replacement of the resource, for example. Also, you need to enforce the correct stack deployment order; CDK will not help you with that anymore, since there are no cross-stack dependencies to enforce it.
Combine the app into a single stack
This sounds extreme and counter-intuitive, but I found that most real-life teams don't actually have a pressing need for multi-stack deployment. If your only concern is separating code owners of different parts of the application, you can get away with splitting the stack into multiple Constructs, composed into a single stack, where each team takes care of their Construct and its children. Think of it as combining multiple Git repos into a monorepo. A lot of projects are doing that.
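A minimal sketch of that composition (shown here in CDK Python; the construct names, table and function are made-up examples, not your actual services):

# Minimal sketch: one Stack composed of per-team Constructs; all names are illustrative.
from aws_cdk import Stack
from aws_cdk import aws_dynamodb as dynamodb
from aws_cdk import aws_lambda as _lambda
from constructs import Construct

class UsersService(Construct):
    def __init__(self, scope: Construct, id: str) -> None:
        super().__init__(scope, id)
        self.table = dynamodb.Table(self, "Users",
            partition_key=dynamodb.Attribute(
                name="userId", type=dynamodb.AttributeType.STRING))

class BillingService(Construct):
    def __init__(self, scope: Construct, id: str, users_table: dynamodb.ITable) -> None:
        super().__init__(scope, id)
        handler = _lambda.Function(self, "Handler",
            runtime=_lambda.Runtime.PYTHON_3_11,
            handler="index.handler",
            code=_lambda.Code.from_asset("billing"))  # assumed asset directory
        users_table.grant_read_data(handler)  # plain in-process reference, no exports needed

class AppStack(Stack):
    def __init__(self, scope: Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)
        users = UsersService(self, "Users")
        BillingService(self, "Billing", users_table=users.table)

Each team owns its Construct, but everything deploys as one stack, so references between them are ordinary object references rather than cross-stack exports.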
A strategy I use to avoid hard cross-references involves storing shared resource values in AWS Systems Manager.
In the exporting stack, we can save the name of an S3 Bucket for instance:
# requires: from aws_cdk import aws_ssm as ssm
ssm.StringParameter(
    scope=self,
    id="/example_stack/example_bucket_name",
    string_value=self.example_bucket.bucket_name,
    parameter_name="/example_stack/example_bucket_name",
)
and then in the importing stack, retrieve the name and create an IBucket by using a .from_ method.
# requires: from aws_cdk import aws_s3 as s3, aws_ssm as ssm
example_bucket_name = ssm.StringParameter.value_for_string_parameter(
    scope=self,
    parameter_name="/example_stack/example_bucket_name",
)
example_bucket = s3.Bucket.from_bucket_name(
    scope=self,
    id="example_bucket_from_ssm",
    bucket_name=example_bucket_name,
)
You'll have to figure out the right order to deploy your stacks but otherwise, I've found this to be a good strategy to avoid the issues encountered with stack dependencies.

Is there any equivalent feature to BPMN UserTask available in AWS Step Functions?

We have our old 'Camunda-based Spring Boot application', which we currently deploy into Kubernetes running on an AWS EC2 instance. This application acts as a backend for an Angular-based UI application.
Now we need to develop a new application similar to the above, which needs to interact with the UI.
Our process flow will contain some UserTasks (BPMN) which will wait until a manual interaction is performed by a human user via the Angular UI.
We are evaluating the possibility of using AWS Step Functions instead of Camunda.
I googled but was unable to find a concrete answer.
Does AWS Step Functions have any feature similar to BPMN/Camunda's UserTask?
Short answer: No.
====================================
Long answer:
After a whole day of study, I decided to continue with Camunda BPM for the reasons below.
AWS Step Functions doesn't have an equivalent of the BPMN UserTask.
Step Functions supports minimal human intervention by sending emails/messages using AWS SQS (Simple Queue Service) and AWS SNS (Simple Notification Service).
Refer to this link for a full example. This manual interaction is also based on a 'task token' (a callback sketch follows after these points), so it is limited to a basic conversational style.
Step Functions does NOT come with built-in database & data-management support; the developer has to take care of designing the database schema, creating tables, their relationships, etc.
On the other hand, Camunda takes care of creating tables, their relationships, and saving & fetching data.
No GUI modeler is available in Step Functions; instead you need to define the workflow in a JSON-like language. This becomes very difficult if your workflow is complex.
Drawing a workflow in Camunda is just drag-and-drop using its modeler.
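For reference, the task-token interaction works roughly like this: the state machine pauses at a task started with the waitForTaskToken pattern, hands the token to an external system (a queue, an email link, a UI backend), and resumes only when something calls back with that token. A minimal sketch of the callback side; the function name, decision flag and output shape are assumptions:

# Minimal sketch of the Step Functions task-token callback; payload shape is an assumption.
import json
import boto3

sfn = boto3.client("stepfunctions")

def handle_user_decision(task_token: str, approved: bool) -> None:
    """Called by your backend once the human user acts in the UI."""
    if approved:
        sfn.send_task_success(
            taskToken=task_token,
            output=json.dumps({"approved": True}),
        )
    else:
        sfn.send_task_failure(
            taskToken=task_token,
            error="Rejected",
            cause="User rejected the task in the UI",
        )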

Counting AWS Lambda calls and segmenting data per API key

Customers (around 1000) sign up to my service and receive a customer-unique API key. They then use the key when calling an AWS Lambda function through AWS API Gateway to access data in DynamoDB.
Requirement 1: The customers get billed by the number of API calls, so I have to be able to count those. AWS only provides metrics for the total number of API calls per Lambda, so I have a few options:
At every API hit, increment a counter in DynamoDB.
At every API hit, enqueue a message in SQS, receive it in a "hit counter" Lambda and increment a counter in DynamoDB.
Deploy a separate Lambda for each customer. Use the AWS built-in call counter.
Requirement 2: The data that the Lambda can access is unique for each customer and thus dependent on the API key provided.
To enable this I also have a number of options:
Store the required API key together with the data that the customer has the right to access.
Deploy a separate Lambda for each customer. Use API Gateway to protect it with a key.
Create a separate endpoint in API Gateway for each customer, and protect it with the API key.
None of the options above seems like a good way to design the solution. Is there a canonical way of doing this? If not, which of the options above is the best? Have I missed an obvious solution due to my unfamiliarity with AWS?
I will try to break your problems down with my experience, but maybe Michael - Sqlbot or John Rotenstein may be able to give more appropriate answers.
Requirement 1
1) This sounds like a good approach. I don't see anything critical here (a minimal counter sketch follows after these points).
2) This, IMHO, is the best out of the 3. It will decouple data access from the billing service, which is a great thing in a Microservices world.
3) This is not scalable. Imagine your system grows and you end up with 10K Lambda functions. Not only will you have to build a very reliable mechanism to automate this process, but you'll also need to monitor 10K different things (imagine CloudWatch logs, API Gateway, etc.), not to mention you'll have 10 thousand functions with exactly the same code (client-specific parameters apart). I wouldn't even think about this one.
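For option 1, the per-customer counter can be incremented atomically with a DynamoDB UpdateItem ADD expression, so concurrent calls don't lose counts. A minimal sketch; the table and attribute names are assumptions:

# Minimal sketch: atomic per-API-key call counter; table/attribute names are assumptions.
import boto3

counters = boto3.resource("dynamodb").Table("ApiCallCounters")

def count_call(api_key: str) -> None:
    # ADD creates the attribute (and item) if it doesn't exist yet, then increments it.
    counters.update_item(
        Key={"apiKey": api_key},
        UpdateExpression="ADD callCount :one",
        ExpressionAttributeValues={":one": 1},
    )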
Requirement 2
1) It could work, and it fits nicely into the DynamoDB way of doing things: store as much data as you can in a single table so you can fetch everything in one go. From what I see, you could even use this ApiKey as your partition key and, for the sake of simplicity in this answer, store the client's data as JSON in a column named data (a small sketch follows at the end of this answer). Since your query only needs to look up by the ApiKey, storing a JSON blob in DynamoDB won't hurt (do keep in mind, however, that if you need to query by any of its JSON attributes then you're in bad shape, since DynamoDB's query capabilities are very limited).
2) No, because of Requirement 1.3
3) No, because of the above.
If you still need to store the ApiKey in a different table so you can run different analyses and keep finer-grained control over the client's calls, access, billing, etc., that's not a problem either; just make sure you duplicate your ApiKey on your ClientData table instead of creating a FK (DynamoDB doesn't support FKs, so you'd need to manage these constraints yourself). Duplication is just fine in a NoSQL world.
Your use case is clearly a Multi-Tenancy one, so I'd also recommend you read Multi-Tenant Storage with Amazon DynamoDB, which will give you some more insights and broaden your options a little bit. Multi-Tenancy is not an easy task and can give you lots of headaches if not implemented correctly. I think this is why AWS has also prepared this nice read for us :)
Happy to continue this in the comments section in case you have more info to share.
Hope this helps!
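As a rough illustration of the single-table idea from Requirement 2, option 1; the table, key and attribute names are assumptions:

# Minimal sketch: fetch a customer's data by API key (the partition key); names are assumptions.
import boto3

client_data = boto3.resource("dynamodb").Table("ClientData")

def get_client_data(api_key: str) -> dict:
    response = client_data.get_item(Key={"apiKey": api_key})
    item = response.get("Item", {})
    return item.get("data", {})  # the client's payload stored under the "data" attribute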

Implementing a simple Restful service to store and retrieve data using AWS API Gateway/Lambda

I'm new to AWS, so apologies in advance if this question is missing some important considerations, or has incorrect assumptions.
But basically I want to implement a service on AWS to store and retrieve data from multiple clients, which may be Android apps, Windows applications, websites etc. The way I've considered doing this is via a RESTful service using API Gateway front end, with a Lambda back end and maybe an S3 bucket to hold the data.
The basic requirements are:
(1) Clients can publish data to the server, where it is stored, perhaps with some kind of key/value structure.
(2) Clients can retrieve said data by key.
(3) If it is possible, clients to be able to subscribe to events from the service, so that they are notified if the value of a piece of data changes. This would avoid the need to poll the service, which would presumably start racking up unnecessary charges if the data doesn't change often.
Any pointers on how to get started with this welcome!
Creating a RESTful API on top of Lambda and API Gateway is one of the main use cases for this architecture. You can think of Lambda functions as controllers with methods and API Gateway as a router that forwards requests to functions based on the URL pattern. There are many frameworks and approaches that can help out here if you don't want to write everything from scratch (a bare-bones handler sketch follows after this list):
Lambdasync
https://medium.com/#fredrikanderzon/create-a-rest-api-on-aws-lambda-using-lambdasync-e46c68f8043f
Serverless
https://serverless.com/framework/docs/providers/aws/events/apigateway/
Swagger
https://cloudonaut.io/create-a-serverless-restful-api-with-api-gateway-swagger-lambda-and-dynamodb/
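If you do start from scratch, a key/value handler for requirements (1) and (2) behind an API Gateway Lambda proxy integration could look roughly like this; the table name, key names and routing are assumptions for illustration:

# Minimal sketch: key/value store behind API Gateway (Lambda proxy integration).
# Table name, attribute names and routing are assumptions.
import json
import boto3

table = boto3.resource("dynamodb").Table("ClientData")

def handler(event, context):
    method = event["httpMethod"]
    key = event["pathParameters"]["key"]

    if method == "PUT":
        # Store the raw request body as the value for this key.
        table.put_item(Item={"dataKey": key, "value": event["body"]})
        return {"statusCode": 204, "body": ""}

    if method == "GET":
        item = table.get_item(Key={"dataKey": key}).get("Item")
        if item is None:
            return {"statusCode": 404, "body": json.dumps({"error": "not found"})}
        return {"statusCode": 200, "body": item["value"]}

    return {"statusCode": 405, "body": json.dumps({"error": "method not allowed"})}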
As far as event subscriptions go (requirement #3) you can model this in many datastores, certainly in a relational/SQL database, with a table like this:
Subscription (key_of_interest, user_id, events_of_interest)
I'm leaving out data types for you to figure out, but you get the idea, hopefully. After each data modification on a particular key, see if that key is of interest in the subscription table, then wire up a response to the users who indicated interest. The details of this of course depend on your particular requirements. A caution though: this approach will increase the cost of data modifications because of the additional overhead needed to process subscriptions.
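A sketch of that post-write check, assuming key_of_interest is the Subscription table's partition key and SNS as one possible notification channel (the table name, attribute names and topic ARN are placeholders):

# Minimal sketch: after a write, notify subscribers interested in that key; names are assumptions.
import json
import boto3
from boto3.dynamodb.conditions import Key

subscriptions = boto3.resource("dynamodb").Table("Subscription")
sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:data-changes"  # placeholder ARN

def notify_subscribers(changed_key: str, new_value: str) -> None:
    # Find everyone who registered interest in this key.
    result = subscriptions.query(
        KeyConditionExpression=Key("key_of_interest").eq(changed_key))
    for sub in result.get("Items", []):
        sns.publish(
            TopicArn=TOPIC_ARN,
            Message=json.dumps({"key": changed_key, "value": new_value,
                                "user_id": sub["user_id"]}),
        )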
EDIT: One other thing I forgot. S3 is better suited for unstructured data (think 'files'). For relational databases, check out RDS. For a simple NoSQL database you might use DynamoDB, or host your own NoSQL database of choice on an EC2 instance.

What is the "proper" way to use DynamoDB for an iOS app?

I've just started messing around with AWS DynamoDB in my iOS app and I have a few questions.
Currently, I have my app communicating directly to my DynamoDB database. I've been reading around lately and people are saying this isn't the proper way to go about getting data from my database.
By this I mean I just have a function in my code querying my DynamoDB database and returning the result.
The way I do it works, but is there a better way I should be going about this?
Amazon DynamoDB itself is a highly scalable service, and standing up another server in front of it means that server must also scale in line with the RCU/WCU configured for your tables, which we can and should avoid.
If your mobile application doesn't need a backend server and you can perform all the business functions from the mobile device, then you should probably think about
Using the AWS DynamoDB SDK for iOS devices to write your client application that runs on the mobile device
Use the AWS Token Vending Machine to authenticate your mobile users and grant them credentials to be used for running operations on DynamoDB tables.
Control access (i.e. which operations should be allowed on which tables, etc.) using IAM policies.
HTH.
From what you say, I can guess that you are talking about a way to distribute data to many clients (iOS apps).
There are a few integration patterns (a very good book on this: Enterprise Integration Patterns), one of which is called shared database. It is essentially about using a common database that multiple clients share. The main drawback of that pattern (in your case) is that every client makes assumptions about what the database schema looks like, which can bring you some headaches supporting the schema in the future if your business logic changes.
The more advanced approach would be sending events on every change in your data instead of writing changes to the database directly from the client apps. This way you can add additional processing to the events before the data they carry is written to the database. For example, you may want to change the event format in a new version of your app but still support legacy users, so you add a translation procedure that transforms both types of events into the format that fits the database schema. It's basically a question of whether to work with diffs vs snapshots.
You should be aware of the added complexity of working with events; it can be overkill if your app is simple and changes to the schema are unlikely.
Also consider that you can do data preprocessing using DynamoDB Streams, which gives you some of the advantages of using events while keeping it simple to implement.
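To make that last point concrete, a DynamoDB Stream delivers batches of INSERT/MODIFY/REMOVE records to a Lambda function, where you can validate, transform or fan out the change before (or instead of) clients reading it directly. A minimal sketch of such a handler; what you do with each record is up to your application:

# Minimal sketch: Lambda triggered by a DynamoDB Stream; downstream processing is an assumption.
def handler(event, context):
    for record in event["Records"]:
        event_name = record["eventName"]                 # INSERT, MODIFY or REMOVE
        keys = record["dynamodb"]["Keys"]
        new_image = record["dynamodb"].get("NewImage")   # present for INSERT/MODIFY
        print(f"{event_name} on {keys}: {new_image}")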