I am part of a team currently developing a Proof of Concept architecture/application for a communication service between governmental offices and the public (narrowed down to the health-sector for now). The customer has specifically requested a mainly serverless approach through AWS services, and I am in need of advice for how to set up this architecture, namely the Lambda to Database relationship.
Roughly, the architecture would make use of API Gateway to handle requests, which would invoke different Lambdas, as micro-services, that access the DB.
The following image depicts a quick relationship schema. Basically, a Patient inputs a description of his Condition, which forms the basis for a Case. That Case is handled during one or many Sessions by one or many Nurses, who take Notes related to the Case. (DB schema image omitted: not enough reputation to embed it.)
From my research, I've gathered that in the case of RDS there is a trade-off between security (keeping the Lambdas outside of a public VPC containing an RDS instance, forgoing security best practices, a no-no for the public sector) and performance (putting the Lambdas in a private VPC with the RDS instance and incurring heavy cold-start times due to the provisioning of ENIs). The cold-start times can, however, be negated by pinging the functions with CloudWatch, which may or may not be optimal.
In the case of DynamoDB, I am personally very inexperienced (more so than in MySQL) and unsure of whether the data is applicable to a NoSQL model. If it is, DynamoDB seems like the better approach. From my understanding though, NoSQL has less support for complex queries that involve JOINs etc. which might eliminate it as an option.
It feels as if SQL/RDS is more appropriate for the data/relations, but DynamoDB causes fewer problems for Lambda/AWS services if a decent data model can be found. So my question is: would it be preferable to go for a private RDS instance and try to negate the cold starts by warming up the most critical Lambdas, or is there a NoSQL model that wouldn't cause headaches for complex queries, among other things? Am I missing any key aspects that could tip the scale?
Let's start by clearing up some rather drastic misconceptions on your part:
From my research, I've gathered that in the case of RDS, there is a trade-off between security (keeping the Lambdas outside of a public RDS instance, foregoing security best-practices, a no-no for public sector) and performance (putting the Lambda in a private RDS instance, and incurring heavy cold-start times). The cold-start times can however be negated by pinging them with CloudWatch, which may or may not be optimal
RDS is a database server. You don't run anything inside or outside of it.
You may be thinking of a VPC, or Virtual Private Cloud. This is an isolated network in which you can run your RDS instances and Lambdas.
Running inside or outside of a VPC has no impact on cold start times. You pay the cold start penalty when AWS has to start a new container to run your Lambda. This can happen either because it hasn't been running recently, or because it needs to scale to meet concurrent requests. The actual cold start time will depend on your language: Java is significantly slower than Python, for example, because it needs to start the JVM and load classes before doing anything.
Now for your actual question
Basically, a Patient inputs a description of his Condition which forms the basis for a Case. That Case is handled during one or many Sessions by one or many Nurses that take Notes related to the Case.
This could be implemented in a NoSQL database such as DynamoDB. Without more information, I would probably make the Session the base document, using case ID as partition key and session ID as the sort key. If you don't understand what those terms mean, and how you would structure a document based around that key, then you probably shouldn't use DynamoDB.
A bigger reason to not use DynamoDB has to do with access patterns. Will you ever want to find all cases worked by a given nurse? Or related to a given patient? Those types of queries are what a relational database is designed for.
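To make that key design concrete, here is a minimal boto3 sketch under those assumptions; the Sessions table, caseId/sessionId keys and the nurseId-index GSI are illustrative names, not anything prescribed above. Querying by case is cheap, while finding everything a given nurse worked on only stays efficient if you also maintain a Global Secondary Index for that access pattern.

    import boto3
    from boto3.dynamodb.conditions import Key

    dynamodb = boto3.resource("dynamodb")
    sessions = dynamodb.Table("Sessions")  # illustrative table name

    def get_sessions_for_case(case_id):
        # Cheap: a Query against the partition key returns every session of one case
        response = sessions.query(KeyConditionExpression=Key("caseId").eq(case_id))
        return response["Items"]

    def get_cases_for_nurse(nurse_id):
        # Only efficient if a Global Secondary Index (assumed name "nurseId-index")
        # exists on the nurseId attribute; without it this access pattern needs a Scan
        response = sessions.query(
            IndexName="nurseId-index",
            KeyConditionExpression=Key("nurseId").eq(nurse_id),
        )
        return response["Items"]

If you find yourself adding an index per relationship just to mimic JOINs, that is usually a signal that the relational model fits better.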
In the case of DynamoDB, I am personally very inexperienced (more so than in MySQL)
Do you have anyone on your team who is familiar with NoSQL databases? If not, then I think you should stick with MySQL. You will have enough challenges learning how to use Lambda.
Related
I am going to describe my needs and what I currently have in place, so bear with me. Firstly, there is a Lambda function, say F1, which when invoked fetches 100 links from a site. Most of these links, say about 95, are the same as when F1 was invoked the previous time, so further processing must be done only with those 5 "new" links. One solution was to write the already-processed links to a DynamoDB table and, each time F1 is invoked, query the table and skip those links. But I found that the database read, although it takes only milliseconds, doubles the Lambda runtime, and this can add up, especially if F1 is called frequently and if there are, say, a million processed links. So I decided to use ElastiCache with Redis.
I quickly found that Redis can only be accessed when F1 runs in the same VPC, and because F1 also needs internet access, you need a NAT. (I don't know much about networking.) So I followed the guidelines, set up the VPC and NAT, and got everything to work. I was delighted with the performance improvements, which almost cut the expected Lambda cost in half, to $30 per month. But then I found that NAT is not included in the free tier, and I have to pay almost $30 per month just for the NAT. This is not ideal for me, as this project can be in development for months, and I feel like I am paying as much for internet access as for compute.
I would like to know if I am making any fundamental mistakes. Am I using ElastiCache in the right way? Is there a better way to access both Redis and the internet? Is there any way to structure my stack differently so that I retain the performance without essentially paying twice the amount after the free tier ends? Maybe add another Lambda function? I don't have any ideas. Any minute improvements are much appreciated. Thank you.
There are many ways to accomplish this, and all of them have some trade-offs. A few other ideas for you to consider:
Run F1 without a VPC. It will have connectivity directly to DynamoDB without the need for a NAT, saving you the cost of the NAT gateway (a sketch of this lookup follows the list below).
Run your function on a micro EC2 instance rather than in Lambda, and persist your link lookups to a file on local disk, or even a local Redis. With all the serverless hype, I think people sometimes overestimate the difficulty (and underestimate the stability) of simply running an OS. It's not that hard to manage, it's easy to set up backups, and it may be an option depending on your availability requirements and other needs.
Save your link data to S3 and set up an S3 gateway VPC endpoint so the function can reach it without a NAT. I'm not sure it will be fast enough for your needs.
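As a rough illustration of the first option, a Lambda running outside a VPC can filter out the already-processed links with a single batch_get_item call against DynamoDB; the ProcessedLinks table name and the url key attribute are assumptions:

    import boto3

    # Reachable without a NAT when the Lambda is not attached to a VPC
    dynamodb = boto3.resource("dynamodb")

    def filter_new_links(links, table_name="ProcessedLinks"):
        # batch_get_item accepts up to 100 keys per call, so 100 links fit in one request
        response = dynamodb.batch_get_item(
            RequestItems={
                table_name: {
                    "Keys": [{"url": link} for link in links],
                    "ProjectionExpression": "#u",
                    "ExpressionAttributeNames": {"#u": "url"},
                }
            }
        )
        already_processed = {item["url"] for item in response["Responses"].get(table_name, [])}
        return [link for link in links if link not in already_processed]

One batched read per invocation keeps the added runtime to a single round trip instead of one read per link.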
I’m trying to build an application on aws that is 100% serverless (minus the database for now) and what I’m running into is that the database is the bottleneck. My application can scale very well but my database has a finite number of connections it can accommodate and at some point, my lambdas will run into that limit. I can do connection pooling outside of the handler in my lambdas so that there is a database connection per lambda container instead of per invocation and while that does increase the number of concurrent invocations before I hit my connection limit, the limit still exists.
I have two questions.
1. Does Serverless Aurora solve this by autoscaling to increase the number of instances to meet the need for more connections?
2. Are there any other solutions to this problem?
Also, from other developers interested in serverless: am I trying to do something that's not worth doing? I love how easy deployment is with the Serverless Framework, but is it better just to work with microservices in something like Kubernetes instead?
I believe there are two potential solutions to that problem:
The first and simplest option is to take advantage of the "Lambda hot state", the situation in which Lambda reuses the execution context for subsequent invocations. As the AWS documentation puts it:
Any declarations in your Lambda function code (outside the handler code, see Programming Model) remains initialized, providing additional optimization when the function is invoked again. For example, if your Lambda function establishes a database connection, instead of reestablishing the connection, the original connection is used in subsequent invocations. We suggest adding logic in your code to check if a connection exists before creating one.
Basically, while the Lambda function is in this hot state it might/should reuse the opened connection(s); a minimal sketch follows the limitations listed below.
The limitations of this approach:
you only reuse a connection within a single Lambda function, so if you have 5 Lambda functions invoked all the time you will still be using 5 connections
when you have a spike in Lambda invocations, including parallel executions, this approach becomes less effective, since Lambda will execute the majority of requests in new execution contexts
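As a minimal sketch of that hot-state reuse (PyMySQL and the placeholder environment variables are assumptions, not anything prescribed above), the connection lives at module scope and is only re-established when it has been dropped:

    import os
    import pymysql

    # Created once per execution context; reused while the container stays warm
    connection = None

    def get_connection():
        global connection
        if connection is None or not connection.open:
            connection = pymysql.connect(
                host=os.environ["DB_HOST"],        # placeholder settings
                user=os.environ["DB_USER"],
                password=os.environ["DB_PASSWORD"],
                database=os.environ["DB_NAME"],
                connect_timeout=5,
            )
        return connection

    def handler(event, context):
        with get_connection().cursor() as cursor:
            cursor.execute("SELECT 1")
            return cursor.fetchone()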
The second option would be to use a connection pool: a set of established database connections that can be reused when future requests to the database arrive.
While the second option provides a more consistent solution, it requires much more infrastructure:
you would be required to run a separate instance for the pool, and if you want to do things properly, probably at least two instances and a load balancer (unless you use containers).
While it might seem overwhelming to provision that much additional infrastructure for a connection pooler, it can still be a valid option depending on the scale of the project, your existing infrastructure (maybe you are already using containers) and the cost benefits.
AWS best practices recommend taking advantage of the hot start. You can read more about it here: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.Lambda.BestPracticesWithDynamoDB.html
I'm trying to find a good architecture for connecting to the database. It is required that the connection to the database is not re-established in each Lambda function; besides, that approach creates many connections for the individual Lambdas instead of one shared connection. Can I implement the structure as in the figure below, so that one Lambda connects to the database and all the others use its connection in their code?
Your proposed architecture will not work: unless your invocations of the DB Lambda are frequent enough to always keep it warm, and you cache the connection for reuse on subsequent invocations, your DB Lambda will create a new connection for each invocation. Moreover, if your invocations of the DB Lambda spin up multiple containers to serve simultaneous requests, you will end up with that many connections anyway instead of just one.
The ideal solution would be to replace the DB Lambda with a tiny EC2 instance.
The DB connection can be cached in your "DB Lambda" while the Lambda stays warm. If the Lambda does not stay warm, then invoking it will incur the price of a cold start, which could be dominated by having to recreate the DB connection, or by whatever other work "DB Lambda" does.
How frequently you expect your Lambdas to go cold is something to take into consideration; it depends on the statistics of your incoming traffic. Whether you are willing to suffer the latency of recreating a DB connection once in a while is another consideration.
Managing a tiny EC2 instance, as someone else suggested, could be a lot of extra work, depending on whether your cloud service is a complex set of backend services and whether that service shuts down during periods of inactivity. Managing EC2 instances is more work than managing Lambdas.
I do see one potential problem with your architecture. If for whatever reason your "DB Lambda" fails, the calling Lambda won't know. That could be a problem if you need to handle that situation and do cleanup.
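If you do go down the shared "DB Lambda" route, the calling Lambda can at least detect a failed synchronous invocation by checking the FunctionError field returned by the Invoke API. This is only a sketch; the function name and payload shape are assumptions:

    import json
    import boto3

    lambda_client = boto3.client("lambda")

    def call_db_lambda(query_payload):
        response = lambda_client.invoke(
            FunctionName="db-lambda",          # assumed function name
            InvocationType="RequestResponse",  # synchronous call
            Payload=json.dumps(query_payload),
        )
        body = json.loads(response["Payload"].read())
        if "FunctionError" in response:
            # The DB Lambda raised an error; decide here whether to retry or clean up
            raise RuntimeError(f"DB Lambda failed: {body}")
        return body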
We have a completely serverless application, with only lambdas and DynamoDB.
The lambdas are running in two regions, and the originals are stored in Cloud9.
DynamoDB is configured with all tables global (bidirectional multi-master replication across the two regions), and the schema definitions are stored in Cloud9.
The only data loss we need to worry about is DynamoDB, which even if it crashed in both regions is presumably diligently backed up by AWS.
Given all of that, what is the point of classic backups? If both regions were completely obliterated, we'd likely be out of business anyway, and anything short of that would be recoverable from AWS.
Not all AWS regions support backup and restore functionality. You'll need to roll your own solution for backups in unsupported regions.
If all the regions your application runs in support the backup functionality, you probably don't need to do it yourself. That is the point of going serverless: you let the platform handle simple DevOps tasks.
Redundancy through regional or cross-regional replication for DynamoDB mainly provides durability, availability and fault tolerance for your data storage. However, alongside these built-in capabilities, there can still be a need for backups.
For instance, if there is data corruption due to an external threat (such as an attack) or an application malfunction, you might still want to restore the data. This is one place where backups are useful: restoring the data to a recent point in time.
There can also be compliance-related requirements that mandate taking backups of your database system.
Another use case is when you need to create new DynamoDB tables for your build pipeline and quality assurance; it is more practical to reuse an existing snapshot of data from a backup rather than taking a copy from the live database (since the copy would consume the provisioned IOPS and affect the application's behavior).
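For reference, an on-demand backup in a supported region is a single API call, and restoring it into a fresh table for a pipeline works the same way; the table and backup names below are placeholders:

    import boto3

    dynamodb = boto3.client("dynamodb")

    # Take an on-demand backup of the production table
    backup = dynamodb.create_backup(TableName="Cases", BackupName="cases-pre-release")

    # Restore the backup into a new table for testing, without touching the
    # live table's provisioned throughput
    dynamodb.restore_table_from_backup(
        TargetTableName="Cases-qa",
        BackupArn=backup["BackupDetails"]["BackupArn"],
    )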
We have a service that runs in 6 AWS regions, and there are some requirements that must be met:
The latency of querying the database must be very low
It must support a high throughput of queries
It has been observed that the database update process is IO-intensive, so it increases query latency due to DB locks.
Delays on the order of seconds between an update and a read are acceptable
The architecture that we discussed was having one service that updates the master db and one slave in each region (6 slaves total).
We found some problems and some possible solutions with that:
There is a limit of 5 read replicas per source instance on AWS infrastructure.
To solve this issue we thought of creating read replicas of read replicas. That would give us 25 instances.
There is a limitation in AWS that you cannot create a read replica of a read replica in another region.
To solve this issue we thought of updating 2 master databases from inside the application.
This approach creates the problem that, for a period of time, the databases can be inconsistent.
In the service implementation we can always recreate the data, so there is a job re-updating the data from time to time (that is one of the reasons the update is IO-intensive).
Anyone has a similar problem? How do you handle it? Can we avoid creating and maintaining databases by ourselves?
We are using MySQL but we are pretty open to use other compatible DBs.
Unfortunately, there is no magical solution when it comes to inter-region setups: you pay for it in latency.
I think you have explored pretty much all the options from an RDS point of view with what you propose, e.g. read replicas of read replicas (I confirm you cannot do this across regions, but that restriction is there to save you from excessive replica lag).
Another solution would be to run the databases on EC2 instances, but you would lose all the benefits of RDS (you could protect this traffic with an inter-region VPN between VPCs). Bear in mind, however, that too many read replicas will impact your performance.
My advice in your case would be:
to massively use caching at every possible level: ElastiCache between the DB and the servers, Varnish for HTTP pages, CloudFront for content delivery. If you want that many read replicas, it means you are heavily dependent on reads; this way you would save a lot of reads from hitting your database and gain latency significantly, and maybe 5 read replicas would then be enough (a read-through cache sketch follows this list).
to consider sharding or using several databases; this is not always a good solution, however, depending on your use case...
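As a rough sketch of the first suggestion, a read-through cache checks ElastiCache before the database and only falls through to MySQL on a miss; the endpoints, credentials and the items table are placeholders:

    import json
    import pymysql
    import redis

    cache = redis.Redis(host="my-elasticache-endpoint", port=6379)   # placeholder endpoint
    db = pymysql.connect(host="my-read-replica-endpoint", user="app",
                         password="secret", database="app")          # placeholder settings

    def get_item(item_id, ttl_seconds=60):
        key = f"item:{item_id}"
        cached = cache.get(key)
        if cached is not None:
            return json.loads(cached)        # cache hit: no database read
        with db.cursor(pymysql.cursors.DictCursor) as cursor:
            cursor.execute("SELECT * FROM items WHERE id = %s", (item_id,))
            row = cursor.fetchone()
        if row is not None:
            cache.set(key, json.dumps(row, default=str), ex=ttl_seconds)
        return row

Even a short TTL on hot rows can absorb most of the read traffic, since a delay of a few seconds between update and read is acceptable here.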
You can request an increase in the number of RDS for MySQL Read Replicas using the form at https://aws.amazon.com/contact-us/request-to-increase-the-amazon-rds-db-instance-limit/
Once the limit has been increased you'll want to test to make sure that the performance of having a large number of Read Replicas is acceptable to your application.
Hal