How does GCP flush static routes pointing to unresponsive VMs? - google-cloud-platform

Hello Community Members,
Topology: I have a Shared VPC with two NGFW VMs. Each has its own IPsec tunnel, with BGP running on top of it.
Northbound, the on-prem DC device receives the same routes from both VMs (primary and backup). However, while advertising the routes over BGP I add a higher MED to the routes advertised via the backup VM so that path is not preferred.
Southbound, I have a client VM, and per the static route configuration in the Shared VPC, the route with priority 10 points to the primary VM instance and the route with priority 15 points to the backup VM, so that the primary always serves traffic and the backup VM only takes over if the primary fails.
Problem: When I reboot the primary VM, the on-prem DC immediately prefers the BGP routes from the backup VM. On the GCP side, however, the route still points to the primary VM, at least on the control plane (I am not sure how to check the data plane). The GCP static route is not flushed, which causes asymmetric routing, and my client VM's traffic is impacted for ~10 minutes (basically the time the primary VM takes to come back up).
Questions:
1) Does GCP routing not flush a static route that points to an unresponsive VM? If so, how can I check the routes on the control and data planes?
2) What is an alternative way to fix this? Does an active-passive ILB help? I can't use an active-active ILB due to asymmetric routing scenarios.
I expect traffic to be taken over by the backup VM immediately once the primary VM is unresponsive, so GCP should flush the static route.
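For the control-plane side of question 1, one way to inspect what GCP has programmed is to list the project's routes via the Compute API. This is a minimal sketch using google-api-python-client, assuming application-default credentials and a placeholder project ID; note it shows static routes, while dynamic (Cloud Router) routes have to be inspected separately.

```python
# Hedged sketch: dump the static routes GCP has on the control plane,
# including which instance each route points at. "my-project" is a placeholder.
from googleapiclient import discovery

compute = discovery.build("compute", "v1")
request = compute.routes().list(project="my-project")
while request is not None:
    response = request.execute()
    for route in response.get("items", []):
        next_hop = route.get("nextHopInstance") or route.get("nextHopIp", "-")
        print(route["name"], route["destRange"], route["priority"], next_hop)
    request = compute.routes().list_next(previous_request=request,
                                         previous_response=response)
```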

Related

Securing physical servers with an NSX-T gateway firewall in active/active edge routing scenario

So my plan is to secure a set of physical servers in a private network against the entire NSX-T workload domain, without buying an additional hardware firewall, since we have massive edge capacity left, but no money. :/
So intuitively I would just add an NSX gateway firewall, just like it's described in this blog post:
https://blogs.vmware.com/networkvirtualization/2020/08/the-nsx-t-gateway-firewall-secures-physical-servers.html/
But that covers only the easy case, where the firewall is just added to the default T0, which I can't do due to our active/active setup. So I would have to add an additional active/passive T0 and connect it to the existing T0.
But how do I now force traffic to my private network through the additional T0 including the gateway firewall?
Apparently this is impossible to achieve without bridging only the second T0 to the private network's VLAN and omitting the route via the physical BGP router. Or is there a chance?
So if the private network is routed via the physical BGP uplink router, there is indeed no way but to hide this route on the physical BGP router. That wouldn't make much sense anyway, so let's consider the case where it isn't.
Then there are apparently two solutions to this task, with the first probably being the more straightforward one:
Deploy an additional service edge (active/passive) and then add a service gateway and the gateway firewall to any T1. The corresponding T1 service router will then be deployed on the service edge (you have to pick one in the deployment wizard). Then we might only have to add a prefix filter on the NSX BGP uplink if we want to hide the private network from the external uplink network. (A hedged API sketch for the gateway firewall rule follows below.)
Configure L2 bridging (see the VMware docs): create an L2 bridge in any segment and add the gateway firewall to this segment's T1 uplink, or add a bridge firewall to the bridge's VDS. Then optionally apply the prefix filter for the bridged LAN on the BGP uplink.
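If you prefer to script the gateway firewall part, the rule can be created through the NSX-T Policy REST API. The sketch below is a rough illustration only: the manager hostname, credentials, group and T1 paths, and the policy/rule IDs are all my assumptions (based on the NSX-T 3.x policy object model), so verify the exact endpoint and fields against the API docs for your version.

```python
# Hypothetical sketch: create a gateway firewall rule on a T1 via the
# NSX-T Policy API. All names and paths below are placeholders.
import requests

NSX = "https://nsx-manager.example.local"   # placeholder manager
AUTH = ("admin", "password")                # placeholder credentials

rule = {
    "action": "DROP",
    "sequence_number": 10,
    "source_groups": ["ANY"],
    "destination_groups": ["/infra/domains/default/groups/private-servers"],
    "services": ["ANY"],
    # Apply the rule on the additional active/passive T1's uplink:
    "scope": ["/infra/tier-1s/private-t1"],
}

resp = requests.patch(
    f"{NSX}/policy/api/v1/infra/domains/default"
    "/gateway-policies/private-net-policy/rules/block-all",
    json=rule, auth=AUTH, verify=False)  # verify=False only for lab setups
resp.raise_for_status()
```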
An upcoming update will let you use A/A together with a stateful firewall; VMware is working on it. It will simply allow stateful services to be used even with an A/A routing setup.

ELB cross-AZ balancing DNS resolution with Sticky sessions

I am preparing for AWS certification and came across a question about an ELB with sticky sessions enabled for instances in 2 AZs. The problem is that requests from a software-based load tester in one of the AZs end up only on the instances in that AZ, instead of being distributed across AZs. At the same time, regular requests from customers are evenly distributed across AZs.
The correct answers to fix the load-tester issue are:
Force the software-based load tester to re-resolve DNS before every request;
Use a third-party load-testing service to send requests from globally distributed clients.
I'm not sure I understand this scenario. What is the default behaviour of Route 53 when it comes to ELB IP resolution? In any case, those DNS records have a 60-second TTL. Isn't it redundant to re-resolve DNS on every request? Besides, DNS resolution is the responsibility of the DNS service itself, not the load-testing software, isn't it?
I can understand that requests from the same instance, with the load-testing software on it, will go to the same load-balanced EC2 instance, but why does it have to be an instance in the same AZ? That could only be achieved by geolocation- or latency-based routing, but I can't find anything in the docs saying whether those are the defaults.
When an ELB is in more than one availability zone, it always has more than one public IP address -- at least one per zone.
When you request these records in a DNS lookup, you get all of these records (assuming there are not very many) or a subset of them (if there are a large number, which would be the case in an active cluster with significant traffic) but they are unordered.
If the load-testing software resolves the IP address of the endpoint and holds onto exactly one of the IP addresses -- as is a likely outcome -- then all of the traffic will go to one node of the balancer, which is in one zone, and will send traffic to instances in that zone.
But what about...
Cross-Zone Load Balancing
The nodes for your load balancer distribute requests from clients to registered targets. When cross-zone load balancing is enabled, each load balancer node distributes traffic across the registered targets in all enabled Availability Zones. When cross-zone load balancing is disabled, each load balancer node distributes traffic across the registered targets in its Availability Zone only.
https://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/how-elastic-load-balancing-works.html
If stickiness is configured, those sessions will initially land in one AZ and then stick to that AZ, because they stick to the initial instance where they landed. If cross-zone is enabled, the outcome is not quite as clear: either balancer nodes may prefer instances in their own zone when first establishing stickiness, or this wasn't really the point of the question. Stickiness requires coordination, and cross-AZ traffic takes a non-zero amount of time due to distance (typically <10 ms), but it would make sense for a balancer to prefer instances in its local zone for sessions with no established affinity.
In fact, configuring the load-test software to re-resolve the endpoint for each request is not really the focus of the solution -- the point is to ensure that (1) the load-test software uses all of the addresses and does not latch onto exactly one, and (2) if more addresses become available because the balancer scales out under load, the load-test software expands its pool of targets.
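To make the intent concrete, here is a minimal sketch of what "re-resolve before every request" means in practice. It is plain Python with the requests library; the ELB hostname is a placeholder, and it assumes plain HTTP (with HTTPS you would also have to deal with SNI and certificate names).

```python
# Hedged sketch: look up all A records for the ELB name on every request
# and pick one at random, instead of letting the client latch onto one IP.
import random
import socket

import requests

ELB_HOST = "my-elb-123456.us-east-1.elb.amazonaws.com"  # placeholder

def fetch():
    infos = socket.getaddrinfo(ELB_HOST, 80, proto=socket.IPPROTO_TCP)
    ip = random.choice([info[4][0] for info in infos])  # one of the ELB nodes
    # Connect to the chosen node but keep the Host header, so the request
    # still looks like it was addressed to the balancer's hostname.
    return requests.get(f"http://{ip}/", headers={"Host": ELB_HOST})
```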
In any case, those DNS records have a 60-second TTL. Isn't it redundant to re-resolve DNS on every request?
The software may not see the TTL, may not honor the TTL and, as noted above, may stick to one answer even if multiple are available, because it only needs one in order to make the connection. Re-resolving on every request is not strictly necessary, but it does solve the problem.
Besides, DNS resolution is the responsibility of the DNS service itself, not the load-testing software, isn't it?
To "resolve DNS" in this context simply means to do a DNS lookup, whatever that means in the specific instance, whether using the OS's DNS resolver or making a direct query to a recursive DNS server. When software establishes a connection to a hostname, it "resolves" (looks up) the associated IP address.
The other solution, "use third party load-testing service to send requests from globally distributed clients," solves the problem by accident, since the distributed clients -- even if they stick to the first address they see -- are more likely to see all of the available addresses. The "global" distribution aspect is a distraction.
ELB relies on random arrival of requests across its external-facing nodes as part of the balancing strategy. Load testing software whose design overlooks this is not properly testing the ELB. Both solutions mitigate the problem in different ways.
Stickiness is the issue; see here: https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-sticky-sessions.html
The load balancer uses a special cookie to associate the session with the instance that handled the initial request, but follows the lifetime of the application cookie specified in the policy configuration. The load balancer only inserts a new stickiness cookie if the application response includes a new application cookie. The load balancer stickiness cookie does not update with each request. If the application cookie is explicitly removed or expires, the session stops being sticky until a new application cookie is issued.
The first solution, re-resolving DNS, creates new sessions and thereby breaks the stickiness of the ELB. The second solution is to use multiple clients; stickiness is not an issue if the number of globally distributed clients is large.
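A small sketch of the difference, assuming the classic ELB's AWSELB stickiness cookie and a placeholder hostname: a shared session replays the cookie and keeps hitting the same target, while a cookie-less request starts a fresh session each time.

```python
# Hedged sketch: stickiness depends on replaying the AWSELB cookie.
import requests

URL = "http://my-elb-123456.us-east-1.elb.amazonaws.com/"  # placeholder

sticky = requests.Session()
for _ in range(5):
    sticky.get(URL)    # cookie jar replays AWSELB -> same backend instance

for _ in range(5):
    requests.get(URL)  # no cookie jar -> new session, stickiness broken
```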
PART 2 (could not add this as a comment, it is too long):
Yes, my answer was too simple and incomplete.
What we know is that the ELB spans 2 AZs and will have 2 nodes with different IPs. It is not clear how many IPs there are; it depends on the number of requests and the number of servers in each AZ. Route 53 rotates the IPs for every new request: the first time it returns NodeA-IP, NodeB-IP; the second time NodeB-IP, NodeA-IP. The load-testing application takes the first IP with every new request, balancing between the 2 AZs. Because a node can route only inside its own AZ, if the sticky cookie is for NodeA and the request arrives at NodeB, NodeB will send it to one of its servers in AZ 2, ignoring the cookie that points to a server in AZ 1.
I need to run some tests. I quickly tested Route 53 with a classic ELB and 2 AZs, and it does rotate the IPs every time. What I still want to test is whether, when I have a sticky cookie for AZ 1 and reach Node 2, it will forward me to Node 1 (the docs describe this interesting flow for the case of no available servers). I hope to have updates shortly.
Just found another piece of evidence that Route 53 returns multiple IPs and rotates them for ELB scaling scenarios:
By default, Elastic Load Balancing will return multiple IP addresses when clients perform a DNS resolution, with the records being randomly ordered on each DNS resolution request. As the traffic profile changes, the controller service will scale the load balancers to handle more requests, scaling equally in all Availability Zones.
And then:
To ensure that clients are taking advantage of the increased capacity, Elastic Load Balancing uses a TTL setting on the DNS record of 60 seconds. It is critical that you factor this changing DNS record into your tests. If you do not ensure that DNS is re-resolved or use multiple test clients to simulate increased load, the test may continue to hit a single IP address when Elastic Load Balancing has actually allocated many more IP addresses.
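You can watch this rotation and the 60-second TTL yourself. A quick sketch with the third-party dnspython library (the hostname is a placeholder); running it twice in a row typically shows the same records in a different order.

```python
# Hedged sketch: show the TTL and the current set of A records for an ELB name.
import dns.resolver  # pip install dnspython

answers = dns.resolver.resolve("my-elb-123456.us-east-1.elb.amazonaws.com", "A")
print("TTL:", answers.rrset.ttl)
for rr in answers:
    print(rr.address)
```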
What I didn't realize at first is that even if regular traffic is distributed evenly across AZs, it doesn't mean that cross-zone load balancing is enabled. As Michael pointed out, regular traffic will naturally come through various locations and end up in different AZs.
And as it was not explicitly mentioned in the test, Cross-AZ balancing might not have been in place.
https://aws.amazon.com/articles/best-practices-in-evaluating-elastic-load-balancing/

How to Join Local Windows Machine to AWS Active Directory

Hi, my goal is to create Active Directory in AWS. I used Simple AD with 2 public and 2 private subnets within the same VPC, the private ones being for the domain controllers. I created an EC2 instance with Windows Server within the same VPC so that I can manage the AD. My EC2 instance joins the domain with no problem. My problem, however, is that I cannot get the local machines on my network to join the AD: the DCs of course have private IPs, and I can't point my machines' DNS at those IPs unless they are on the same network.
I'm guessing I need a VPN to join my local network to the network in the AWS cloud.
Is there a way to have AD in AWS without a VPN, such as using an Elastic IP with NAT to communicate with the DCs? Or maybe even promoting my EC2 instance to a DC and then pointing the local machines' DNS at the EC2 instance's Elastic IP?
Any help is much appreciated, and let me know if I am missing any information or not explaining the goal clearly enough.
Your question mentions Simple AD. My comments will be for Active Directory in AWS.
Setting up Active Directory in AWS and on-premises is not as easy as I would like it to be. This topic could fill a small book or, as Amazon does it, multiple hour-long videos. Watch a few while thinking through your solution.
1) Simple AD is not real Active Directory. It is Samba 4, which is very good, but is an Active Directory clone.
2) Do not, and I repeat, do not think about putting Active Directory on a public IP address to serve your on-premises users. The number of ports you would need to open makes the risk just not worth it.
3) Most, if not all, real solutions for configuring Active Directory on-premises and in AWS involve VPNs: either Direct Connect (DX), hardware routers (Cisco), or site-to-site VPNs built with OpenSwan or Windows Server.
Note: OpenSwan is very easy to set up, so this is the route I would recommend if cost is a factor. Otherwise, look at Cisco ASA-type routers (lots of vendors here) for your office and set up a VPN with IPsec. If cost is not a factor, absolutely go with Direct Connect (DX).
Note: I also use OpenVPN to connect to AD in AWS from home. This setup routes my workstation to a VPC in AWS and is very easy to set up and use. You could start with this to get comfortable with networking to a VPC. There are preconfigured OpenVPN setups in the AWS Marketplace that are free (user-limited).

Using Redis behind an AWS load balancer

We're using Redis to collect events from our web application (pub/sub based) behind an AWS ELB.
We're looking for a solution that will let us scale up and give us high availability across the different servers. We do not wish to put these two servers in a Redis cluster; our plan is to monitor them using CloudWatch and switch between them if necessary.
We tried a simple test of placing two Redis servers behind the ELB, telnetting to the ELB DNS name, and watching what happens using 'redis-cli monitor', but we see nothing (when trying the same without the ELB, it works fine).
Any suggestions?
Thanks
I came across this while looking for a similar question, but disagree with the accepted answer. Even though this is pretty old, hopefully it will help someone in the future.
It's more appropriate for your case to use DNS failover with a Redis replication auto-failover configuration. DNS failover provides groups of availability (if you need that level of scale), and the replication group provides cache uptime.
http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/dns-failover-configuring.html
Active-passive failover should provide the high availability solution you want:
Active-passive failover: Use this failover configuration when you want a primary group of resources to be available the majority of the time and you want a secondary group of resources to be on standby in case all of the primary resources become unavailable. When responding to queries, Amazon Route 53 includes only the healthy primary resources. If all of the primary resources are unhealthy, Amazon Route 53 begins to include only the healthy secondary resources in response to DNS queries.
After you set up the DNS, point it at the ElastiCache Redis failover group's URL, and add multiple groups for higher availability during a failover operation.
However, you might need to setup your application to write and read from different endpoints to maximize the architecture's scalability.
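For reference, a hedged boto3 sketch of the Route 53 side: an active-passive CNAME pair pointing at two ElastiCache endpoints. The zone ID, record name, endpoints, and health check ID are all placeholders; the primary record needs an existing health check.

```python
# Hypothetical sketch: UPSERT a PRIMARY/SECONDARY failover record pair.
import boto3

r53 = boto3.client("route53")
records = [
    ("primary", "PRIMARY",
     "redis-primary.abc123.use1.cache.amazonaws.com",
     {"HealthCheckId": "11111111-2222-3333-4444-555555555555"}),
    ("standby", "SECONDARY",
     "redis-standby.abc123.use1.cache.amazonaws.com", {}),
]
for set_id, role, endpoint, extra in records:
    r53.change_resource_record_sets(
        HostedZoneId="Z123EXAMPLE",  # placeholder hosted zone
        ChangeBatch={"Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "redis.internal.example.com",
                "Type": "CNAME",
                "TTL": 30,
                "SetIdentifier": set_id,
                "Failover": role,
                "ResourceRecords": [{"Value": endpoint}],
                **extra,
            },
        }]},
    )
```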
Sources:
http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/Replication.html
http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/AutoFailover.html
Placing a pair of independent Redis nodes behind an LB will likely not give you what you want. What will happen is that the ELB will try to balance connections to each instance, splitting half to one and half to the other. This means that commands issued over one connection may not be seen by another. It also means no data is shared. So client A could publish a message, and client B, being subscribed to the other server, won't see the message.
For pub/sub behind an ELB you have a secondary problem: the ELB will close an idle connection. So if you subscribe to a channel that isn't busy, the ELB will close your connection. As I recall, the maximum you can set this to is 60s, meaning that if you don't publish a message every single minute your clients will be disconnected.
How much of a problem that is depends on your client library, and frankly, in my experience, most don't handle it well: they are unaware of the need to re-subscribe upon re-establishing the connection, meaning you would have to code that yourself, along the lines of the sketch below.
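A minimal sketch of such a reconnect-and-resubscribe loop with the redis-py client (host and channel are placeholders):

```python
# Hedged sketch: if the connection drops (e.g. an LB idle timeout),
# reconnect and subscribe again, since the server forgets subscribers.
import time
import redis

def listen_forever(host, channel):
    while True:
        try:
            r = redis.Redis(host=host)
            p = r.pubsub()
            p.subscribe(channel)          # must be repeated after reconnect
            for message in p.listen():
                if message["type"] == "message":
                    print(message["data"])
        except redis.ConnectionError:
            time.sleep(1)                 # back off, then loop and re-subscribe

listen_forever("redis.internal.example.com", "events")  # placeholders
```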
That said, a Sentinel + Redis solution would be quite ideal if your client has proper Sentinel support. In this scenario, your client asks the sentinels for the master to talk to, and on a connection failure it repeats this process. This would handle the setup you describe without the problems of being behind an ELB.
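For illustration, this is roughly what that looks like with redis-py's Sentinel support; the sentinel hosts and the master name "mymaster" are placeholders:

```python
# Hedged sketch: ask the sentinels for the current master instead of
# going through a load balancer; failover is handled by re-asking them.
from redis.sentinel import Sentinel

sentinel = Sentinel([("sentinel-1", 26379), ("sentinel-2", 26379)],
                    socket_timeout=0.5)

master = sentinel.master_for("mymaster", socket_timeout=0.5)
master.publish("events", "hello")        # writes/publishes go to the master

replica = sentinel.slave_for("mymaster", socket_timeout=0.5)
print(replica.get("some-key"))           # reads may go to a replica
```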
Assuming you are running in a VPC:
did you register the EC2 instances with the ELB?
did you add the correct security group settings to the ELB (allowing inbound traffic on the Redis port, 6379 by default, rather than telnet's port 23)?
did you add an ELB listener that maps the Redis port on the ELB to the Redis port on the instances?
did you set sensible ELB health checks (e.g. TCP on port 6379) so that the ELB considers the EC2 instances healthy?
If the ELB thinks the servers behind it are not healthy, the ELB will not send them any traffic.

How to connect to HornetQ in an AWS VPC from another VM on AWS

I have 2 VMs on AWS. On the first VM I have HornetQ and an application that sends messages to it. On the other VM I have an application that is a HornetQ consumer.
The consumer fails to pull messages from HornetQ, and I can't understand why. HornetQ is running, and I opened the ports to any IP.
I tried to connect to HornetQ with JConsole (from my local computer) and failed, so I can't see whether HornetQ has any consumers/producers.
I've tried changing the 'bind' settings to 0.0.0.0, but when I restarted HornetQ they were automatically changed back to what I have as the server IP in config.properties.
Any suggestions as to why I am failing to connect my application to HornetQ?
Thanks!
These are the things you need to check for connectivity between VMs in a VPC.
The security group of the instance has both ingress and egress configuration settings, unlike the traditional EC2 security group [now EC2-Classic]. Check the egress from your consumer and the ingress to the server.
If the instances are in different subnets, you need to check the network ACLs as well; however, the default setting is allow.
Check whether iptables or an OS-level firewall is blocking traffic (a quick probe sketch follows below).
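A quick way to test the raw TCP path from the consumer VM, with a placeholder private IP and HornetQ's common Netty acceptor port 5445 (adjust to whatever port your acceptor actually uses):

```python
# Hedged sketch: can we open a TCP connection to the HornetQ acceptor?
import socket

def can_connect(host, port, timeout=3):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(can_connect("10.0.1.25", 5445))  # placeholder IP and port
```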
With respect to the failed connectivity from your local machine to HornetQ: you would need to place the instance in a public subnet and configure the instance's SG accordingly, so that only the app/VM is accessible from the public internet.
I have assumed that both instances are in the same VPC. However, the title of the post sounds slightly misleading: if they are in 2 different VPCs altogether, then the concept of VPC peering also comes in.