Keycloak cross data center partial synchronisation of user data - amazon-web-services

I'm working for a German company, therefore we're bound to GDPR. We're selling our product as a software-as-a-service offering and are hosting the systems in AWS. Our customers are spread over Europe, the USA and Asia, so we're running multiple VPCs in AWS in the regions EU-West, US-Northeast and APAC. Our plan is to implement Keycloak as our SSO backend.
Up to this point our initial idea was to implement Keycloak with the so-called Cross-Datacenter Replication. This would mean one Keycloak cluster per VPC with a load balancer in front, an Infinispan cluster for inter-VPC caching/communication, and an Aurora RDS cluster as the centralised database, but we are not pinned to that. The problem is, as mentioned above, that we're bound to the GDPR, so the data of European users must not leave the EU unless the customer orders us to do so. Everything I've read says that Keycloak expects all data to be synced across the database cluster.
Information about our topology and the issue itself:
Every customer has dedicated EC2 instances in the best-suited geographical region. Additionally, there are centralised services hosted in the EU. So users from the USA or APAC need to have access to systems in the EU, but EU users don't need access to instances/services outside of the EU unless the customer explicitly orders it.
So how do we achieve this?
My only idea at the moment would be to build a database cluster (likely not AWS Aurora RDS) and configure the database itself not to sync all of the data. But this feels very hacky to me, and I don't think Keycloak would cope well with it. Any ideas or tips would be appreciated!

OK, in case someone is interested in our solution:
It seems that we will accept the fact that non-EU customers will see some extra latency under certain circumstances. The Infinispan cluster will serve as a cache, so these users will only hit this latency once in a while. The DB will reside in the EU.
I'm not sure whether the Infinispan servers will still act as a cache if the connection to the nodes in the other data centers is lost, but maybe I'll find something about that.
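To put a number on that occasional latency, here is a minimal Python sketch that times requests against the token endpoint of an EU-hosted Keycloak from a given region. The URL, realm and client credentials are hypothetical placeholders, and the endpoint path assumes a recent Keycloak without the legacy /auth prefix:

```python
# Time round trips to Keycloak's OIDC token endpoint; even an error
# response (e.g. for an unknown client) still measures the round trip.
import time
import requests

KEYCLOAK_URL = "https://sso.example.com"  # assumption: EU-hosted Keycloak
REALM = "example-realm"                   # hypothetical realm

token_endpoint = f"{KEYCLOAK_URL}/realms/{REALM}/protocol/openid-connect/token"

samples = []
for _ in range(10):
    start = time.perf_counter()
    requests.post(
        token_endpoint,
        data={
            "grant_type": "client_credentials",
            "client_id": "probe-client",      # hypothetical client
            "client_secret": "probe-secret",  # hypothetical secret
        },
        timeout=10,
    )
    samples.append(time.perf_counter() - start)

print(f"median latency: {sorted(samples)[len(samples) // 2] * 1000:.1f} ms")
```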

Related

AWS Cloud Practitioner: Does RDS require less administration work than Route 53?

I am preparing for the AWS Cloud Practitioner exam, and one of the resources I am using is the Test Bank that comes along with the book AWS Certified Cloud Practitioner Study Guide: CLF-C01 Exam, and there is a question in the Test Bank that I think is wrong.
I tried to contact Wiley Efficient Learning, the provider who offers and maintains the Test Bank, but their customer service is really bad (side note: the book is fairly good, the training provider is awful).
Maybe someone can help me here.
The question is:
Which of the following AWS services would require the customer (i.e., you) to assume the least responsibility for administration? (Select TWO).
A. Elastic Beanstalk
B. Elastic Compute Cloud
C. Relational Database Service
D. Route 53
According to the Test Bank, the right answers are A (Elastic Beanstalk) and C (Relational Database Service).
However, I think that the right answers are A (Elastic Beanstalk) and D (Route 53).
Why?
Well, with RDS you have to pick the right SQL engine, instance type, memory, vCPU, IOPS, ... You have to set up a maintenance window and backup retention, decide on the level of reliability you want by enabling/disabling Multi-AZ, enable access for resources through security groups and, once it is running, you have to monitor it and right-size it if you under- or over-provisioned (for example, whether you have enough connection slots to support your workload), plus monitor for potential malicious access.
Also, if the engine you selected is reaching EOL, you are responsible for updating it.
On top of that, you have the administration of the databases you create, but that is using the service, not administrating the service.
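To make that monitoring responsibility concrete, here is a minimal sketch (assuming boto3 and configured AWS credentials; the instance identifier and region are hypothetical placeholders) that pulls the DatabaseConnections metric from CloudWatch to check connection-slot headroom:

```python
# Check peak database connections per hour over the last day.
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="eu-west-1")

now = datetime.now(timezone.utc)
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/RDS",
    MetricName="DatabaseConnections",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "my-db"}],  # hypothetical
    StartTime=now - timedelta(hours=24),
    EndTime=now,
    Period=3600,            # one data point per hour
    Statistics=["Maximum"],
)

for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Maximum"])
```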
On the contrary, with Route 53 you either register a domain name or you manage the resolution of domain names. But that is using the service, not administrating it.
With Route 53 you don't have to care about reliability, capacity, elasticity, auto-scaling or backups... AWS will provide the domain name registration and domain name resolution for you.
You have to configure your hosted zones, but that is using the service, not administrating it.
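For example, "configuring a hosted zone" boils down to record upserts like the following minimal boto3 sketch (the zone ID, domain and IP are hypothetical placeholders):

```python
# Upsert an A record in a Route 53 hosted zone.
import boto3

route53 = boto3.client("route53")

route53.change_resource_record_sets(
    HostedZoneId="Z0000000EXAMPLE",  # hypothetical hosted zone
    ChangeBatch={
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "app.example.com",
                    "Type": "A",
                    "TTL": 300,
                    "ResourceRecords": [{"Value": "203.0.113.10"}],
                },
            }
        ]
    },
)
```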
What do you think?
Interesting question, and perhaps the language could be clearer.
My answer would be A & C having the least responsibility for administration. My reasoning would be:
A - You only have to configure the service. Yes, there are a lot of options available to configure, but you never have to deal with security patches on the underlying instances, closing ports on your ALB, etc.
B - You have access to the machine itself which, apart from the Security Group, is effectively an open IP on the public internet, and you must perform all security and maintenance yourself.
C - Same as A; only a huge amount of configuration, but the "deep" parts are still hidden away.
D - There are fewer configuration options in general, so it might appear "easier"; however, those options are very fundamental to how it operates. You have access to the SOA records, DNS publishing, etc. There aren't really any "hidden" aspects to R53.

Benefits of separate AWS RDS Instances for each customer

I'm working on a project for a web app where each enterprise customer will have separate resources independent of the others (i.e. users and user data unique to a single paying customer, content unique to each paying customer, etc). It's somewhat similar to a normal CRM. We're building on AWS and planning to use RDS for the database. Given the fact that no data is shared between customers or across regions, would it be most effective to:
Upon enterprise customer sign-up, create a new VPC with new RDS and EC2 instances, so that each customer has siloed databases and services
Have a single VPC with RDS and EC2 instances shared across all customers and use an indicator field in the database to determine which data belongs to each customer
We don't anticipate having over 10,000 users for any given enterprise customer, and right now the project is small with only a single enterprise customer.
This all depends on how you anticipate your application growing (not just in terms of number of customers but also performance); however, personally I would say that if you have a single enterprise customer at the moment, create a single VPC.
Perhaps give each customer a separate database on the one RDS instance initially, assuming that cost would become a concern, with a separate DB user for each customer to limit exposure.
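A minimal sketch of that per-customer isolation on a shared RDS MySQL instance (the endpoint, admin credentials and tenant name are hypothetical; assumes the mysql-connector-python package, and a real version would validate the tenant name before interpolating it into SQL):

```python
# One schema per tenant, plus a user whose privileges stop at that
# schema's boundary.
import mysql.connector

admin = mysql.connector.connect(
    host="shared-rds.example.com",  # hypothetical RDS endpoint
    user="admin",
    password="admin-password",      # hypothetical credentials
)
cursor = admin.cursor()

tenant = "acme_corp"                # hypothetical customer identifier
tenant_password = "generated-secret"

cursor.execute(f"CREATE DATABASE IF NOT EXISTS `{tenant}`")
# '%%' escapes the literal '%' host wildcard, because a parameter (%s)
# is being substituted into this statement.
cursor.execute(
    f"CREATE USER IF NOT EXISTS '{tenant}'@'%%' IDENTIFIED BY %s",
    (tenant_password,),
)
cursor.execute(f"GRANT ALL PRIVILEGES ON `{tenant}`.* TO '{tenant}'@'%'")

cursor.close()
admin.close()
```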
If the financials allow, then you could look at separating customers out to their own RDS instances where the performance demands it (maybe you have tiers for SLAs etc).
If you get larger customers (who may have bespoke needs, or stringent requirements such as no multi-tenant hosts) I would suggest you look at an AWS Organizations setup with an account per large customer to separate out the resources, and a single account for any shared or multi-customer resources.
The key is to plan ahead for how you would want to grow, but if you're still in the early midst of onboarding clients then do not over-complicate your setup yet.

Shouldn't I use Direct Connect to deliver the solution of collecting info from multiple regions in AWS?

I came across the following question during my AWS practice, and I have a different opinion, so I want to post it here for more discussion, as it addresses a very common need. Thanks.
http://jayendrapatil.com/aws-rds-replication-multi-az-read-replica/?unapproved=227863&moderation-hash=c9a071a3758c183b1cf03e51c44d2373#comment-227863
Your company has HQ in Tokyo and branch offices all over the world, and is using logistics software with a multi-regional deployment on AWS in Japan, Europe and the US. The logistics software has a 3-tier architecture and currently uses MySQL 5.6 for data persistence. Each region has deployed its own database. In the HQ region you run an hourly batch process reading data from every region to compute cross-regional reports that are sent by email to all offices; this batch process must be completed as fast as possible to quickly optimize logistics. How do you build the database architecture in order to meet the requirements?
A. For each regional deployment, use RDS MySQL with a master in the region and a read replica in the HQ region
B. For each regional deployment, use MySQL on EC2 with a master in the region and send hourly EBS snapshots to the HQ region
C. For each regional deployment, use RDS MySQL with a master in the region and send hourly RDS snapshots to the HQ region
D. For each regional deployment, use MySQL on EC2 with a master in the region and use S3 to copy data files hourly to the HQ region
E. Use Direct Connect to connect all regional MySQL deployments to the HQ region and reduce network latency for the batch process
I lean towards E, for these reasons:
Direct Connect provides bandwidth that bypasses the ISP and is more private and, if needed, faster.
The question doesn't factor in cost here.
The initial setup time could be longer compared to the other options; however, initial setup time should not be the point here. What is being asked is "this batch process must be completed as fast as possible to quickly optimize logistics", so it is not about the initial setup, it is about how to implement the right solution to deliver the "as fast as possible" service AFTER the setup.
And hence I believe E is the best option for the need.
I am open to discussion; please correct me if my understanding is wrong. Thank you.
E is not applicable. You cannot use Direct Connect to connect two VPCs. Direct Connect is used to connect a VPC and your premises, and the question asks about multi-regional AWS infrastructure without mentioning anything about the HQ not being hosted on AWS.
The easiest solution is A in my opinion.
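For what option A would look like in practice, here is a minimal boto3 sketch of creating a cross-region read replica in the HQ region; all identifiers and the source ARN are hypothetical placeholders:

```python
# Create a read replica of a regional master in the HQ (Tokyo) region.
import boto3

# Client in the destination (HQ) region.
rds = boto3.client("rds", region_name="ap-northeast-1")

rds.create_db_instance_read_replica(
    DBInstanceIdentifier="logistics-eu-replica",  # hypothetical name
    # Cross-region replicas reference the source by ARN.
    SourceDBInstanceIdentifier=(
        "arn:aws:rds:eu-west-1:123456789012:db:logistics-eu-master"
    ),
    DBInstanceClass="db.r5.large",
    SourceRegion="eu-west-1",  # lets boto3 handle the pre-signed URL
)
```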

Distributed Database Access in AWS CloudFront

I have a MySQL database in AWS RDS and a web application in the Mumbai region. I want to access the web application from the USA with good latency/speed. I have used AWS CloudFront, but the application is still very slow.
Any suggestions?
Best,
Syed
How about a cross-region read replica of your MySQL database in the USA? If the majority of your database operations are reads rather than writes, this will give you a significant improvement in response time.
As standard practice, it is recommended to keep databases and apps in the same region (eventually even in the same zone), located where the majority of your end users are.
As of now, you can create a cross-region replica, but you need to be ready for replica lag and data transfer charges. In the long term, plan to move your setup to N. Virginia or another USA region.
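Once such a replica exists, the application has to split its traffic: reads go to the nearby replica, writes still go to the Mumbai primary. A minimal sketch (endpoints, credentials and schema are hypothetical; assumes the mysql-connector-python package):

```python
# Read/write splitting between a primary and a cross-region replica.
import mysql.connector

PRIMARY = "app-db.xxxx.ap-south-1.rds.amazonaws.com"    # Mumbai primary (hypothetical)
REPLICA = "app-db-us.xxxx.us-east-1.rds.amazonaws.com"  # US replica (hypothetical)

def connect(host):
    return mysql.connector.connect(
        host=host, user="app", password="secret", database="app"
    )

def fetch_profile(user_id):
    # Reads tolerate slight replica lag, so use the local replica.
    conn = connect(REPLICA)
    cur = conn.cursor()
    cur.execute("SELECT name FROM users WHERE id = %s", (user_id,))
    row = cur.fetchone()
    conn.close()
    return row

def rename_user(user_id, name):
    # Writes must go to the primary; the replica is read-only.
    conn = connect(PRIMARY)
    cur = conn.cursor()
    cur.execute("UPDATE users SET name = %s WHERE id = %s", (name, user_id))
    conn.commit()
    conn.close()
```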

Understanding Amazon offerings

I am working on a project and am at a point where the POC is done, and I now want to move towards a real product. I am trying to understand the Amazon cloud offerings just to see if I need to be aware of them at development time. I have a bunch of questions that I cannot get answered from the Amazon site. It's probably because I am new to the whole web services thing and have never hosted a site before. I am hoping someone out here will explain this to me like I am a C programmer :)
I see Amazon has a bunch of offerings -
EC2
Elastic Block Store
Simple DB
AutoScaling
Elastic Load Balancing
I understand EC2 is virtual server instances that I can use, and these could come pre-loaded with what I want (say Apache + Python). I have the following questions -
If I want a custom instance of something (like, say, a custom Apache module I wrote for my project), can I create a server instance using the exact modules and make it the default the next time I create a new instance or in AutoScaling?
Do I get an IP address to access this? Can I set my own hostname for it? I mean, do I get a DNS record? Or is that what Elastic IP is?
How do I access it from the outside? SSH? Remote Desktop? Or is it entirely up to how I configure the instance?
What do they mean by inter-region or intra-region data transfer? What is data transfer to begin with? Is it just people using my instance? So if I go live with it, is that the cost I have to pay for people using it?
What is the difference between AutoScaling and Elastic Load Balancing?
What is Elastic Block Store? Is it storage? If so, do I have to worry about backups or do they take care of that?
About the Simple DB -
It looks like the interface for using this is different from my regular SQL calls. Am I correct?
If so, the whole development needs to be tailored specifically for Amazon, which kind of sucks. Is there a better alternative?
Do I get data backups, or do I have to worry about that myself?
Will I be able to connect to the DB using regular tools to inspect it (during or after development)? Or do I get other tools made by Amazon for it?
What about security? The DB is obviously somewhere in the cloud farm, away from the EC2 instance. My DB password is going over the wire, and so is all my data, totally unencrypted. Don't I have to worry about that? The question comes up only because I don't own any of the hardware.
I really hope some one points me in the right direction here.
Thanks for taking the time to read.
P
I just went through the questions, and here I have tried to answer a few of them:
1) AWS doesn't itself publish pre-configured EC2 instances; in fact, images are configured by developers and made publicly available so that other users can use them. You can use one of those images, or you can just opt for whatever raw OS you want, provision it accordingly, and create a snapshot of it so that you can use it for autoscaling. That snapshot becomes the base AMI in your case.
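A minimal boto3 sketch of baking such a base AMI from a configured instance (the instance ID, names and region are hypothetical placeholders):

```python
# Bake a reusable AMI from a configured instance; the resulting image
# can back an Auto Scaling launch configuration or launch template.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

image = ec2.create_image(
    InstanceId="i-0123456789abcdef0",  # hypothetical configured instance
    Name="apache-custom-module-v1",    # becomes the base AMI name
    Description="Apache with custom module pre-installed",
)
print("AMI:", image["ImageId"])
```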
2) Every instance you boot will have a public DNS name attached to it; you can use the public DNS to connect to that instance, using ssh if you are a Linux user or PuTTY if you are a Windows user. Apart from that, you can also attach an Elastic IP (which comes at a cost that is peanuts) to the instance and access your instance through the Elastic IP, and you can map either the public DNS or the Elastic IP to a website by adding an A record or a CNAME respectively.
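Allocating and attaching an Elastic IP is a two-call affair in boto3; a minimal sketch with a hypothetical instance ID and region:

```python
# Allocate an Elastic IP and associate it with a running instance.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

allocation = ec2.allocate_address(Domain="vpc")
ec2.associate_address(
    InstanceId="i-0123456789abcdef0",        # hypothetical instance
    AllocationId=allocation["AllocationId"],
)
print("Elastic IP:", allocation["PublicIp"])
```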
3) AWS owns data centres in different parts of the world, and you deploy your application depending upon your customer base. For example, if your target customers are based out of India, the nearest region available is Singapore, which AWS calls ap-southeast-1. Each region has multiple availability zones, for example ap-southeast-1a and ap-southeast-1b, which are two different data centres, geographically apart. Intra-region means from ap-southeast-1a to ap-southeast-1b; inter-region means from ap-southeast-1 to us-east-1, which is the Northern Virginia data centre. AWS charges for incoming and outgoing bandwidth, and trust me, it's nothing.
They charge 1/8th of a cent per GB. It's not even a thing to think about.
4) An Elastic Load Balancer is a cluster which divides the load equally across your instances in all availability zones (if you are running multi-AZ). The ELB sits on top of the AWS EC2 instances, monitors instance health periodically, and enables auto scaling.
5) To help you understand what autoscaling is, please go through this document: http://aws.amazon.com/autoscaling/
6) Elastic Block Store, or EBS, is like a hard disk: persistent data storage which can be attached to your instance. Regarding backups: yes, it depends upon your use case. I take backups of my EBS volumes periodically.
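Such a backup is just a snapshot call; a minimal boto3 sketch (the volume ID and region are hypothetical placeholders, and in practice you would run this on a schedule, e.g. cron or EventBridge):

```python
# Snapshot an EBS volume as a point-in-time backup.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

snapshot = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",  # hypothetical EBS volume
    Description="nightly backup",
)
print("Snapshot:", snapshot["SnapshotId"])
```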
7) SimpleDB, now effectively superseded by DynamoDB, is a NoSQL DB, i.e. a non-RDBMS database system. Please read some documentation to understand what a NoSQL DB is.
8) If you have a MySQL or Oracle DB you can opt for RDS; please read the documents.
9) I personally feel you are a newbie to the entire cloud ecosystem; you need to understand what exactly the cloud does first.
10) You don't have to make a large number of changes to your development as such; just make sure it works fine on your local box, and it can be deployed to the cloud without much ado.
11) You don't have to use any extra tool for that: change the database endpoint to RDS (if you use it), or else install MySQL on your EC2 instance and connect to the local DB which resides on the EC2 instance, which is as simple as your development mode.
12) You don't have to worry about security issues with AWS; it is secure. Don't follow the myths. I have been using AWS for 3 years, running I don't even remember how many applications (e-commerce, m-commerce, social media apps), and I have never faced any kind of security issue. AWS also allows you to set up your security however you want.
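As one example of setting your security however you want, a minimal boto3 sketch that opens only MySQL's port on a database security group, and only to members of the app tier's security group (both group IDs and the region are hypothetical placeholders):

```python
# Restrict DB ingress to port 3306 from the app tier only.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.authorize_security_group_ingress(
    GroupId="sg-0db0000000000000a",  # hypothetical DB security group
    IpPermissions=[
        {
            "IpProtocol": "tcp",
            "FromPort": 3306,
            "ToPort": 3306,
            # Allow only members of the app tier's security group.
            "UserIdGroupPairs": [{"GroupId": "sg-0app000000000000b"}],
        }
    ],
)
```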
Go ahead, happy coding. Contact me if you have any problems.
The answer above is a good summary of AWS. I just wanted to add:
AWS offers a full data center, so it depends what you are trying to achieve. For starters you will need:
EC2 - This is your server; it comes with instance storage, which will be lost on restart
EBS - Your mounted storage; the data is persisted across reboots
S3 - Provides storage (RESTful APIs on top; the cost is usage-based rather than "provisioned" as in EBS)
Databases - You can start with Amazon RDS, which provides managed database services; you can choose between various available databases. You can also install your own database using EC2 + EBS, but you will have to take care of managing the database yourself.
Elastic IP: A public-facing IP address; you can point your DNS record to this.
One great tool to calculate pricing:
http://calculator.s3.amazonaws.com/calc5.html
Some other services to take in account are:
VPC (Virtual Private Cloud). This is your own private network. You can define subnets, route tables and internet gateways there. I would strongly recommend using a VPC for any serious deployment of more than one instance.
Glacier - this can replace your tape library for storing backups.
CloudFormation - a great tool for deployment and automation of instances.