Running Lambda functions for server-side validation with AppSync and DynamoDB

Running Lambda functions for server-side validation with AppSync and DynamoDB - amazon-web-services

I've enjoyed working with AWS Amplify a lot lately, its code generation for GraphQL queries based on defined schema is outstanding.
I came across one complication for defining custom logic / validation server-side. Out of the bag AppSync (part responsible for GraphQL api in Amplify) generates resolvers and DynamoDB tables for your schema. Resolvers are created using Apache Velocity templating language and if you are new to it, its a bit of a learning curve in my opinion.
Furthermore, these resolvers are auto generated by Amplify cli. I'm not sure if editing them makes sense either in AppSync console or locally, as every time we push api changes they will be auto generated again?
To add to this, these resolvers that are auto generated actually achieve a lot in terms of linking type models together, enabling search and authentication checks, I really don't want to touch them since development velocity enabled by automatic generation is insane.
Hence only other solution to introduce my custom logic seems to be Lambda functions that listen for create / update events of associated DynamoDB tables.
I think I can set this up in a way thats demonstrated below, essentially allowing users to use GraphQL api normally and when action that requires server validation is made react to it in lambda?
For example player adds item to their inventory, we fire lambda function to check if player had this item before, if not it was purchased, we validate item data and subtract gold of its cost from player table. I think this works fine but my concerns are
We allow to write unvalidated data to database first (although it is validated by graphql type system and auth check prior.)
Additional costs for involving Lambda (in my opinion worth it for time saving and ability to use NodeJS instead of Apache Velocity to define language)
Am I missing something else?
So lambda will do validation behind the scenes, we assume majority of users are good actors here and data they pass to GraphQL api is correct since they use our client.
In case data is unexpected (bad actor) lambda will react and ban the user.
Is this solution viable / common, is there other alternative?

Related

What are the limitations of using AWS appsync api(GraphQL) through Amplify?

I just want to avoid use of custom/manual resolvers in appsync completely. So I'm using Amplify to setup GraphQL appsync API in my app. I'm doing all the stuffs by changing schema.graphql and amplify push.
I have 2 questions :
1. What are the limitations and what problems I'm going to face in future?
2. Can graphql subscriptions get update when app is not running(like user should be notified)?

tons of business logic will be exposed on the client side code.
I think for push notifications you would still have to go via external integrations like FCM/APNS. Multiple integration options are available in SNS

Just to preamble these answers, the fact that you use an amplify generated graphQL and resolvers doesn't stop you from later including custom resolvers and pipeline functions - it's just that you need to learn quite a bit about where to include them in the backend file structure of amplify.
1. What are the limitations and what problems I'm going to face in future?
This depends on how well your applications use-case matches the graphQL schema design and if your application is relatively self-contained. Amplify becomes more complex when your application needs to talk to other back-end systems, you'll need to start using DynamoDB triggers to notify other state machines/event bridge/SNS or similar services.
As mentioned none of these problems are crippling, you can deal with them later but it will be a step up in the AWS knowledge required to implement them.
For small high-volume/availability apps Amplify and DynamoDB as-it-comes is great. If your application matures into many micro-services and sites then you'll need to learn quite a bit more AWS to make them play together well. Amplify does determine your DynamoDB on a table per object basis and you'll probably be stuck with (paying for) that. Think hard about if you ever might want to go to a different optimised data source (RDS or single dynamo table) to reduce the number of queries required to fulfil your graphQL requests.
2. Can graphql subscriptions get update when app is not running(like user should be notified)?
No. Anurag mentions SNS which would be a good option to out-app notify users, best to blend subscriptions and another service.

AWS service for managing state data - dynamodb/step functions/sqs?

I am building a Desktop-on-Demand solution using AWS Workspaces product and I am trying to understand what is the best AWS service to fit my requirements for managing state data for new users.
In a nutshell, solution will create a new AWS Workspace (virtual desktop instance) for a user when multiple conditions are met and checks are satisfied. These tasks would be satisfied by multiple lambda functions.
DynamoDB would be used as a central point for storing confguration data details like user data, user groups data and deployed virtual desktops data.
Logic for Desktops creation would be implemented using Step Functions like below:
Event hook comes from Identity Management system firing a lambda function that checks if user desktop already exists in DynamoDB table
If it does not exist, another lambda creates AWS AD connector
Once this is done, another lambda builds custom image for new desktop if needed
Another lambda pulls latest data from Identity Management system and updates DynamoDB table for users and groups.
Other lambda functions that may be fired up as a dependency
To ensure we have transactional mechanism, we only deploy new desktop when all conditions are met. I can think about few ways of implementing this check:
Use DynamoDB table for keeping State data. When all attributes in item are in expected state, desktop can be created. If any lambda fails or produces data that does not fit, dont' create desktop.
Just use Step Functions and design it's logic flow that all conditions must satisfy before desktop is created
Someone suggested using SQS queue but I don't see how this can be used for my purpose.
What is the best way to keep this data?

Step Functions is the method I would use for this. The DynamoDB solution would also work, but this seems like exactly the sort of thing Step Functions was designed to handle.
I agree that SQS would not be a correct solution.

Counting AWS lambda calls and segmenting data per api key

Customers (around 1000) sign up to my service and receive a customer unique api key. They then use the key when calling a AWS lambda function through AWS api gateway in to access data in DynamoDb.
Requirement 1: The customers get billed by the number of api calls, so I have to be able to count those. AWS only provides metrics for total number of api calls per lambda so I have a few options:
At every api hit increment a counter in DynamoDB.
At every api hit enqueue a message in SQS, receive it in "hit
counter" lambda and increment a counter in DynamoDB.
Deploy a separate lambda for each customer. Use AWS built-in call
counter.
Requirement 2: The data that the lambda can access is unique for each customer and thus dependent on the api key provided.
To enable this I also have a number of options:
Store the required api key together with the data that the customer
has the right to access.
Deploy a separate lambda for each customer. Use api gateway to
protect it with a key.
Create a separate endpoint in api gateway for each customer, protect
it with the api key.
None of the options above seem like a good way to design the solution. Is there a canonical way of doing this? If not, which of the options above is the best? Have I missed an obvious solution due to my unfamiliarity with AWS?

I will try to break your problems down with my experience, but maybe Michael - Sqlbot or John Rotenstein may be able to give more appropriate answers.
Requirement 1
1) This sounds like a good approach. I don't see anything critical here.
2) This, IMHO, is the best out of the 3. It will decouple data access from the billing service, which is a great thing in a Microservices world.
3) This is not scalable. Imagine your system grows and you end up with 10K Lambda functions. Not only you'll have to build a very reliable mechanism to automate this process, but also you'll need to monitor 10K different things (imagine CloudWatch logs, API Gateway, etc), not to mention you'll have 10 thousand functions with exactly the same code (client specific parameters apart). I wouldn't even think about this one.
Requirement 2
1) It could work and it fits nicely in the DynamoDB model of doing things: store as much data as you can in a unique table, so you can fetch everything in one go. From what I see, you could even use this ApiKey as your partition key and, for the sake of simplicity for this answer, store the client's data as JSON in a column named data. Since your query only needs to query by the ApiKey, storing a JSON in DynamoDB won't hurt (do keep in mind, however, that if you need to query by any of its JSON attributes than you're in bad shoes, since DynamoDB's query capabilities are very limited)
2) No, because of Requirement 1.3
3) No, because of the above.
If you still need to store the ApiKey in a different table so you can run different analysis and keep a finer grained control over the client's calls, access, billing and etc., that's not a problem either, just make sure you duplicate your ApiKey on your ClientData table instead of creating a FK (DynamoDB doesn't support FKs, so you'd need to manage these constraints yourself). Duplication is just fine in a NoSQL world.
Your use case is clearly a Multi-Tenancy one, so I'd also recommend you to read Multi-Tenant Storage with Amazon DynamoDB which will give you some more insights and broaden your options a little bit. Multi-Tenancy is not an easy task and can give you lots of headaches if not implemented correctly. I think this is why AWS has also prepared this nice read for us :)
Happy to continue this on the comments section in case you have more info to share
Hope this helps!

Implementing a simple Restful service to store and retrieve data using AWS API Gateway/Lambda

I'm new to AWS, so apologies in advance if this question is missing some important considerations, or has incorrect assumptions.
But basically I want to implement a service on AWS to store and retrieve data from multiple clients, which may be Android apps, Windows applications, websites etc. The way I've considered doing this is via a RESTful service using API Gateway front end, with a Lambda back end and maybe an S3 bucket to hold the data.
The basic requirements are:
(1) Clients can publish data to the server, where it is stored, perhaps with some kind of key/value structure.
(2) Clients can retrieve said data by key.
(3) If it is possible, clients to be able to subscribe to events from the service, so that they are notified if the value of a piece of data changes. This would avoid the need to poll the service, which would presumably start racking up unnecessary charges if the data doesn't change often.
Any pointers on how to get started with this welcome!

Creating a RESTful API on top of Lambda and API Gateway is one of the main use cases for this architecture. You can think of Lambda functions as controllers with methods and API Gateway as a router that forwards requests to functions based on the URL pattern. There are many frameworks and approaches that can help out here if you don't want to write from scratch:
Lambdasync
https://medium.com/#fredrikanderzon/create-a-rest-api-on-aws-lambda-using-lambdasync-e46c68f8043f
Serverless
https://serverless.com/framework/docs/providers/aws/events/apigateway/
Swagger
https://cloudonaut.io/create-a-serverless-restful-api-with-api-gateway-swagger-lambda-and-dynamodb/
As far as event subscriptions go (requirement #3) you can model this in many datastores, certainly in a relational/SQL database, with a table like this:
Subscription (key_of_interest, user_id, events_of_interest)
I'm leaving out data types for you to figure out, but you get the idea hopefully. After each data modification on a particular key, see if that key is of interest in the subscription table, then wire up a response to the user's who indicated interest. The details of this of course depend on your particular requirements. A caution though: this approach will increase the cost of data modifications because of the additional overhead needed to process subscriptions.
EDIT: One other thing I forgot. S3 is better suited for non-structured data (think 'files'). For relational databases, checkout RDS. For a simple NoSQL database you might use DynamoDB, or host your own NoSQL database of choice on an EC2 instance.

What is the "proper" way to use DynamoDB for an iOS app?

I've just started messing around with AWS DynamoDB in my iOS app and I have a few questions.
Currently, I have my app communicating directly to my DynamoDB database. I've been reading around lately and people are saying this isn't the proper way to go about getting data from my database.
By this I mean is I just have a function in my code querying my Dynamo database and returning the result.
How I do it works but is there a better way I should be going about this?

Amazon DynamoDB itself is a highly-scalable service and standing up another server in front of it requires scaling the service also in line with the RCU/WCU configured for your tables, which we can and should avoid.
If your mobile application doesn't need a backend server and you can perform all the business functions from the mobile device, then you should probably think about
Using the AWS DynamoDB SDK for iOS devices to write your client application that runs on the mobile device
Use AWS Token Vending Machine to authenticate your mobile users to grant them credentials to be used to run operations on DynamoDB tables.
Control access (i.e what operations should be allowed on tables etc.,) using IAM policies.
HTH.

From what you say, I can guess that you are talking about a way you can distribute data to many clients (ios apps).
There are few integration patterns (a very good book on this: Enterprise Integration Patterns), one of which is called shared database. It is essentially about using a common database for multiple clients to share the data. Main drawback for that pattern (in your case) is that you are doing assumption about how the database schema looks like. It can potentially bring you some headache supporting the schema in the future, if your business logic changes.
The more advanced approach would be sending events on every change in your data instead of directly writing changes to the database from client apps. This way you can add additional processing to the events before the data they carry is written to the database. For example, you may want to change the event format in the new version of your app, but still want to support legacy users, so you add translation procedure which transforms both types of events to the format which fits the database schema. It's basically a question of whether to work with diffs vs snapshots.
You should be aware of added complexity of working with events, and it can be an overkill if your app is simple and changes in schema are unlikely.
Also consider that you can do data preprocessing using DynamoDB Streams, which gives you some advantages of using events still keeping it simple to implement.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js