Implementing a simple Restful service to store and retrieve data using AWS API Gateway/Lambda - amazon-web-services

I'm new to AWS, so apologies in advance if this question is missing some important considerations, or has incorrect assumptions.
But basically I want to implement a service on AWS to store and retrieve data from multiple clients, which may be Android apps, Windows applications, websites etc. The way I've considered doing this is via a RESTful service using API Gateway front end, with a Lambda back end and maybe an S3 bucket to hold the data.
The basic requirements are:
(1) Clients can publish data to the server, where it is stored, perhaps with some kind of key/value structure.
(2) Clients can retrieve said data by key.
(3) If it is possible, clients to be able to subscribe to events from the service, so that they are notified if the value of a piece of data changes. This would avoid the need to poll the service, which would presumably start racking up unnecessary charges if the data doesn't change often.
Any pointers on how to get started with this welcome!

Creating a RESTful API on top of Lambda and API Gateway is one of the main use cases for this architecture. You can think of Lambda functions as controllers with methods and API Gateway as a router that forwards requests to functions based on the URL pattern. There are many frameworks and approaches that can help out here if you don't want to write from scratch:
Lambdasync
https://medium.com/#fredrikanderzon/create-a-rest-api-on-aws-lambda-using-lambdasync-e46c68f8043f
Serverless
https://serverless.com/framework/docs/providers/aws/events/apigateway/
Swagger
https://cloudonaut.io/create-a-serverless-restful-api-with-api-gateway-swagger-lambda-and-dynamodb/
As far as event subscriptions go (requirement #3) you can model this in many datastores, certainly in a relational/SQL database, with a table like this:
Subscription (key_of_interest, user_id, events_of_interest)
I'm leaving out data types for you to figure out, but you get the idea hopefully. After each data modification on a particular key, see if that key is of interest in the subscription table, then wire up a response to the user's who indicated interest. The details of this of course depend on your particular requirements. A caution though: this approach will increase the cost of data modifications because of the additional overhead needed to process subscriptions.
EDIT: One other thing I forgot. S3 is better suited for non-structured data (think 'files'). For relational databases, checkout RDS. For a simple NoSQL database you might use DynamoDB, or host your own NoSQL database of choice on an EC2 instance.

Related

Does the KDB+ gateway have to hold all the data?

I am trying to implement a gateway design to access/abstract the api to my database, which is simply a single HDB and RDB on the same server. Reading through documentation https://code.kx.com/q/wp/gateway-design/ the most basic gateways act as at least one man in the middle. Without aggregation this doubles the data transfer needed and with aggregation it seems that to be generic (implement "select" for example) it would need to pull all data to the gateway anyway (for example to perform an average that required data from both). Is there something I am missing in the design of the gateway so as not to copy the data through it, a simple and elegant solution for a small setup would be ideal. I guess this is the map reduce problem in general but in a KDB+ HDB/RDB setting.
You might want to take a look on https://www.aquaq.co.uk/kdb-gateways/
aquaq torq has implement a good gw setup. If you are setting a new kdb project, torq will be a boost starter for you.
Asyn gw dosent hold any data
Sync gw does hold data and processing can be done on sync gw itself and then return to user/client
The gateway in general doesn't hold the data. Instead it uses Inter-process Communication (IPC) to send requests to the RDB and HDB. The queries are calculated on the RDB and HDB side and the result is then sent to the gateway which is then sent to the client, but the gateway usually doesn't store the data.

Counting AWS lambda calls and segmenting data per api key

Customers (around 1000) sign up to my service and receive a customer unique api key. They then use the key when calling a AWS lambda function through AWS api gateway in to access data in DynamoDb.
Requirement 1: The customers get billed by the number of api calls, so I have to be able to count those. AWS only provides metrics for total number of api calls per lambda so I have a few options:
At every api hit increment a counter in DynamoDB.
At every api hit enqueue a message in SQS, receive it in "hit
counter" lambda and increment a counter in DynamoDB.
Deploy a separate lambda for each customer. Use AWS built-in call
counter.
Requirement 2: The data that the lambda can access is unique for each customer and thus dependent on the api key provided.
To enable this I also have a number of options:
Store the required api key together with the data that the customer
has the right to access.
Deploy a separate lambda for each customer. Use api gateway to
protect it with a key.
Create a separate endpoint in api gateway for each customer, protect
it with the api key.
None of the options above seem like a good way to design the solution. Is there a canonical way of doing this? If not, which of the options above is the best? Have I missed an obvious solution due to my unfamiliarity with AWS?
I will try to break your problems down with my experience, but maybe Michael - Sqlbot or John Rotenstein may be able to give more appropriate answers.
Requirement 1
1) This sounds like a good approach. I don't see anything critical here.
2) This, IMHO, is the best out of the 3. It will decouple data access from the billing service, which is a great thing in a Microservices world.
3) This is not scalable. Imagine your system grows and you end up with 10K Lambda functions. Not only you'll have to build a very reliable mechanism to automate this process, but also you'll need to monitor 10K different things (imagine CloudWatch logs, API Gateway, etc), not to mention you'll have 10 thousand functions with exactly the same code (client specific parameters apart). I wouldn't even think about this one.
Requirement 2
1) It could work and it fits nicely in the DynamoDB model of doing things: store as much data as you can in a unique table, so you can fetch everything in one go. From what I see, you could even use this ApiKey as your partition key and, for the sake of simplicity for this answer, store the client's data as JSON in a column named data. Since your query only needs to query by the ApiKey, storing a JSON in DynamoDB won't hurt (do keep in mind, however, that if you need to query by any of its JSON attributes than you're in bad shoes, since DynamoDB's query capabilities are very limited)
2) No, because of Requirement 1.3
3) No, because of the above.
If you still need to store the ApiKey in a different table so you can run different analysis and keep a finer grained control over the client's calls, access, billing and etc., that's not a problem either, just make sure you duplicate your ApiKey on your ClientData table instead of creating a FK (DynamoDB doesn't support FKs, so you'd need to manage these constraints yourself). Duplication is just fine in a NoSQL world.
Your use case is clearly a Multi-Tenancy one, so I'd also recommend you to read Multi-Tenant Storage with Amazon DynamoDB which will give you some more insights and broaden your options a little bit. Multi-Tenancy is not an easy task and can give you lots of headaches if not implemented correctly. I think this is why AWS has also prepared this nice read for us :)
Happy to continue this on the comments section in case you have more info to share
Hope this helps!

AWS API Gateway: When to create another API?

This conceptual question has crept into my mind after becoming more familiar with AWS. In general, I’m curious if there is a best-practice and/or convention as to when an API provider should group endpoints into a new, separate API (vs. lumping the endpoints into an existing API).
To illustrate, let’s say a Service creates digital wallet coupons on behalf of Manufacturers, to be redeemed by Consumers at a bunch of Mom & pop stores — some of the activities the Service might engage in include:
Receiving data from the Manufacturers (in order to build the digital coupons)
Providing a mechanism for Consumers to find and download coupons
Providing a way for the Mom & pop stores’ payment terminals to validate the coupons
And, oh by the way, the Service might also be required to ...
Implement a variety of endpoints, based on technologies involved (e.g., PassKit with Apple Wallet)
So?
With AWS, it’s easy to modularize one’s backend (e.g., have an RDS instance for the database, run a few lambda functions for microservices, etc.) and load balance it all. API Gateway adds to this in that each endpoint can point to different things (lambda functions, EC2 instances via HTTP proxy, etc.).
Consequently, one approach might be to define one API in AWS API Gateway and have all the endpoints underneath it:
API: “Master”
/coupon
POST = create a new one (for Manufacturers)
PUT = update an existing one (for Manufacturers)
GET = retrieve one (for Consumers)
/coupon/validate
POST = verify it’s still valid (Mom & Pop store use-case)
/apple-wallet
/{version}
/passes
... per documentation
/devices
... per documentation
But would it make more sense for the Service to shave off the /apple-wallet endpoint and create an entirely new, separate API?
Alternatively, if the Service was going to publish documentation for public developers to use, would it make sense to move the Manufacturer-relevant endpoints into a separate API altogether?
Since AWS makes the effort of splitting endpoints so simple via API Gateway, are there any standard practices for when you should (or should not)?
Thank you for any insights / opinions!
My two cents. Think about your end-user for your APIs. You will have different developer end-users for each API set.
Your ideal situation will have each developer end-user only seeing the APIs that are relevant to them. So you should split your APIs into different Gateways according to the end-users
In the theoretical situation you describe:
Create an API for Manufacturers so they can integrate with you to create coupons. If you do the integration internally it will be the corporate sales and presales people who talk to the manufacturers
The users for the Service and End User coupons might end up being the
same app developers that create an interface for both stores and
users. So create a coupon API for them
Separating both should also give you security benefits as you will protect the knowledge of your Manufacturer API from the users who might try to hack it

Sharing data between isolated microservices

I'd like to use the microservices architectural pattern for a new system, but I'm having trouble figuring out how to share and merge data between the services when the services are isolated from each other. In particular, I'm thinking of returning consolidated data to populate a web app UI over HTTP.
For context, I'm intending to deploy each service to its own isolated environment (Heroku) where I won't be able to communicate internally between services (e.g. via //localhost:PORT. I plan to use RabbitMQ for inter-service communication, and Postgres for the database.
The decoupling of services makes sense for CREATE operations:
Authenticated user with UserId submits 'Join group' webform on the frontend
A new GroupJoinRequest including the UserId is added to the RabbitMQ queue
The Groups service picks up the event and processes it, referencing the user's UserId
However, READ operations are much harder if I want to merge data across tables/schemas. Let's say I want to get details for all the users in a certain group. In a monolithic design, I'd just do a SQL JOIN across the Users and the Groups tables, but that loses the isolation benefits of microservices.
My options seem to be as follows:
Database per service, public API per service
To view all the Users in a Group, a site visitor gets a list of UserIDs associated with a group from the Groups service, then queries the Users service separately to get their names.
Pros:
very clear separation of concerns
each service is entirely responsible for its own data
Cons:
requires multiple HTTP requests
a lot of postprocessing has to be done client-side
multiple SQL queries can't be optimized
Database-per-service, services share data over HTTP, single public API
A public API server handles request endpoints. Application logic in the API server makes requests to each service over a HTTP channel that is only accessible to other services in the system.
Pros:
good separation of concerns
each service is responsible for an API contract but can do whatever it wants with schema and data store, so long as API responses don't change
Cons:
non-performant
HTTP seems a weird transport mechanism to be using for internal comms
ends up exposing multiple services to the public internet (even if they're notionally locked down), so security threats grow from greater attack surface
Database-per-service, services share data through message broker
Given I've already got RabbitMQ running, I could just use it to queue requests for data and then to send the data itself. So for example:
client requests all Users in a Group
the public API service sends a GetUsersInGroup event with a RequestID
the Groups service picks this up, and adds the UserIDs to the queue
The `Users service picks this up, and adds the User data onto the queue
the API service listens for events with the RequestID, waits for the responses, merges the data into the correct format, and sends back to the client
Pros:
Using existing infrastructure
good decoupling
inter-service requests remain internal (no public APIs)
Cons:
Multiple SQL queries
Lots of data processing at the application layer
harder to reason about
Seems strange to pass large quantities around data via event system
Latency?
Services share a database, separated by schema, other services read from VIEWs
Services are isolated into database schemas. Schemas can only be written to by their respective services. Services expose a SQL VIEW layer on their schemas that can be queried by other services.
The VIEW functions as an API contract; even if the underlying schema or service application logic changes, the VIEW exposes the same data, so that
Pros:
Presumably much more performant (single SQL query can get all relevant data)
Foreign key management much easier
Less infrastructure to maintain
Easier to run reports that span multiple services
Cons:
tighter coupling between services
breaks the idea of fundamentally atomic services that don't know about each other
adds a monolithic component (database) that may be hard to scale (in contrast to atomic services which can scale databases independently as required)
Locks all services into using the same system of record (Postgres might not be the best database for all services)
I'm leaning towards the last option, but would appreciate any thoughts on other approaches.
To evaluate the pros and cons I think you should focus on what microservices architecture is aiming to achieve. In my opinion Microservices is architectural style aiming to build loosely couple applications. It is not designed to build high performance application so scarification of performance and data redundancy are something we are ready accept when we decided to build applications in a microservices way.
I don't think you services should share database. Tighter coupling scarify the main objective of the microservices architecture. My suggestion is to create a consolidated data service which pick up the data changes events from all the other services and update the database behind it. You might want to design the database behind the consolidated data service in a way that is optimised for query (like a data warehouse) because that's all this service will be used for. You might want to consider using a NoSQL database to support your consolidated data service.

What is the "proper" way to use DynamoDB for an iOS app?

I've just started messing around with AWS DynamoDB in my iOS app and I have a few questions.
Currently, I have my app communicating directly to my DynamoDB database. I've been reading around lately and people are saying this isn't the proper way to go about getting data from my database.
By this I mean is I just have a function in my code querying my Dynamo database and returning the result.
How I do it works but is there a better way I should be going about this?
Amazon DynamoDB itself is a highly-scalable service and standing up another server in front of it requires scaling the service also in line with the RCU/WCU configured for your tables, which we can and should avoid.
If your mobile application doesn't need a backend server and you can perform all the business functions from the mobile device, then you should probably think about
Using the AWS DynamoDB SDK for iOS devices to write your client application that runs on the mobile device
Use AWS Token Vending Machine to authenticate your mobile users to grant them credentials to be used to run operations on DynamoDB tables.
Control access (i.e what operations should be allowed on tables etc.,) using IAM policies.
HTH.
From what you say, I can guess that you are talking about a way you can distribute data to many clients (ios apps).
There are few integration patterns (a very good book on this: Enterprise Integration Patterns), one of which is called shared database. It is essentially about using a common database for multiple clients to share the data. Main drawback for that pattern (in your case) is that you are doing assumption about how the database schema looks like. It can potentially bring you some headache supporting the schema in the future, if your business logic changes.
The more advanced approach would be sending events on every change in your data instead of directly writing changes to the database from client apps. This way you can add additional processing to the events before the data they carry is written to the database. For example, you may want to change the event format in the new version of your app, but still want to support legacy users, so you add translation procedure which transforms both types of events to the format which fits the database schema. It's basically a question of whether to work with diffs vs snapshots.
You should be aware of added complexity of working with events, and it can be an overkill if your app is simple and changes in schema are unlikely.
Also consider that you can do data preprocessing using DynamoDB Streams, which gives you some advantages of using events still keeping it simple to implement.