HDFS Balancer only balances blocks within the same rack of nodes

When I run the HDFS balancer, it balances blocks only within the same rack; it does not move blocks to other racks where nodes have free space.
Is there any property I can set to move blocks across racks? I am using HDFS Federation in the cluster.
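For context, a standard balancer invocation looks like the sketch below; the flags are stock Apache Hadoop options (the blockpool policy exists specifically for federated clusters), not the poster's original command or a confirmed fix for the rack issue:

    # Sketch: standard balancer invocation on a federated cluster.
    #   -policy blockpool  balance each block pool separately (federation-aware);
    #                      the default policy is "datanode"
    #   -threshold 10      allow per-node utilization to deviate up to 10%
    #                      from the cluster-wide average
    hdfs balancer -policy blockpool -threshold 10

    # Optionally raise the bandwidth each DataNode may spend on block moves (bytes/sec):
    hdfs dfsadmin -setBalancerBandwidth 104857600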

Related

Load Balancing of 2 instances in AWS

I have two VMs (in the AWS cloud) connected to a single DB. Each VM runs the same application. I want to load balance the two VMs and route based on traffic (e.g., if one VM instance is handling more traffic, requests should switch to the other).
Currently I access the two instances at two different IP addresses over HTTP. Now I want to access both VMs over HTTPS under the same DNS name, like https://<dns-name>/service1/ and https://<dns-name>/service2/.
How can I do this load balancing using an nginx ingress?
I am new to the AWS cloud. Can someone guide me or suggest some appropriate references for getting to a solution?
AWS offers an Elastic Load Balancing service.
From What is Elastic Load Balancing? - Elastic Load Balancing:
Elastic Load Balancing automatically distributes your incoming traffic across multiple targets, such as EC2 instances, containers, and IP addresses, in one or more Availability Zones. It monitors the health of its registered targets, and routes traffic only to the healthy targets. Elastic Load Balancing scales your load balancer as your incoming traffic changes over time. It can automatically scale to the vast majority of workloads.
You can use this ELB service instead of running another Amazon EC2 instance with nginx. (Charges apply.)
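As a rough sketch of what that could look like with the AWS CLI, assuming an Application Load Balancer with path-based routing; every name, ID, and ARN below is a placeholder:

    # Sketch: one ALB terminating HTTPS and routing by path to two target
    # groups (one per VM). All names, IDs, and ARNs are placeholders.
    aws elbv2 create-load-balancer --name web-alb \
        --subnets subnet-aaaa subnet-bbbb --security-groups sg-cccc

    aws elbv2 create-target-group --name service1-tg \
        --protocol HTTP --port 80 --vpc-id vpc-dddd
    aws elbv2 register-targets --target-group-arn <service1-tg-arn> \
        --targets Id=<instance-1-id>

    # HTTPS listener terminating TLS with an ACM certificate:
    aws elbv2 create-listener --load-balancer-arn <alb-arn> \
        --protocol HTTPS --port 443 \
        --certificates CertificateArn=<acm-cert-arn> \
        --default-actions Type=forward,TargetGroupArn=<service1-tg-arn>

    # Send /service2/* to the second VM's target group:
    aws elbv2 create-rule --listener-arn <listener-arn> --priority 10 \
        --conditions Field=path-pattern,Values='/service2/*' \
        --actions Type=forward,TargetGroupArn=<service2-tg-arn>

This keeps the https://<dns-name>/service1/ and /service2/ scheme from the question on a single DNS name pointed at the ALB.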
Alternatively, you could configure your domain name on Amazon Route 53 to use Weighted routing:
Weighted routing lets you associate multiple resources with a single domain name (example.com) or subdomain name (acme.example.com) and choose how much traffic is routed to each resource. This can be useful for a variety of purposes, including load balancing and testing new versions of software.
This would distribute the traffic when resolving the DNS Name rather than using a Load Balancer. It's not quite the same because DNS information is cached, so the same client would continue to be redirected to the same server until the cache is cleared. However, it is practically free to use.
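A minimal sketch of the weighted records via the AWS CLI; the hosted zone ID, record name, and IP addresses are placeholders, and the low TTL is there to soften the caching issue mentioned above:

    # Sketch: two weighted A records for the same name; Route 53 answers
    # queries in proportion to the weights (here 50/50).
    aws route53 change-resource-record-sets --hosted-zone-id <zone-id> \
        --change-batch '{
          "Changes": [
            {"Action": "UPSERT", "ResourceRecordSet": {
              "Name": "app.example.com", "Type": "A",
              "SetIdentifier": "vm-1", "Weight": 50, "TTL": 60,
              "ResourceRecords": [{"Value": "203.0.113.10"}]}},
            {"Action": "UPSERT", "ResourceRecordSet": {
              "Name": "app.example.com", "Type": "A",
              "SetIdentifier": "vm-2", "Weight": 50, "TTL": 60,
              "ResourceRecords": [{"Value": "203.0.113.20"}]}}
          ]
        }'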

How does location matter while creating a Global HTTPS Google Cloud Load Balancer?

I am creating a global HTTPS Load Balancer in Google Cloud and am wondering how location affects a global load balancer.
I have been unable to find much detail about it on the internet.
There are price differences based on location:
https://cloud.google.com/vpc/network-pricing#lb
but I have no idea how it affects the routing of HTTPS requests:
https://storage.googleapis.com/gweb-cloudblog-publish/images/global_lb.max-1800x1800.png
For example: if my website receives most of its traffic from the USA, India, and Europe, what would be the best location to choose when setting up the global HTTPS Load Balancer, and what difference will it make?
The primary difference is that global load balancing is for Layer 7 (application layer) traffic, while regional load balancing is for Layer 4 (transport layer) traffic and uses Maglev for traffic routing.
If you want to do your own SSL termination and operate your own Layer 7 reverse proxies, I believe regional is the correct option. For the vast majority of users, I would wager that global is the better choice.
Have a look at the documentation Cloud Load Balancing overview section Global versus regional load balancing:
Use global load balancing when your backends are distributed across multiple regions, your users need access to the same applications and content, and you want to provide access by using a single anycast IP address. Global load balancing can also provide IPv6 termination.
Use regional load balancing when your backends are in one region, and you only require IPv4 termination.
and at section External HTTP(S) Load Balancing:
HTTP(S) Load Balancing is implemented on GFEs. GFEs are distributed globally and operate together using Google's global network and control plane. In Premium Tier, GFEs offer cross-regional load balancing, directing traffic to the closest healthy backend that has capacity and terminating HTTP(S) traffic as close as possible to your users.
More information about Network Service Tiers can be found in this article.
For more details, have a look at the documentation External HTTP(S) Load Balancing overview:
HTTP(S) Load Balancing is a global service when the Premium Network Service Tier is used.
and
When a user request comes in, the load balancing service determines the approximate origin of the request from the source IP address.
The load balancing service knows the locations of the instances owned by the backend service, their overall capacity, and their overall current usage.
If the closest instances to the user have available capacity, the request is forwarded to that closest set of instances.
Incoming requests to the given region are distributed evenly across all available backend services and instances in that region. However, at very small loads, the distribution may appear to be uneven.
If there are no healthy instances with available capacity in a given region, the load balancer instead sends the request to the next closest region with available capacity.
also
HTTP(S) Load Balancing is a regional service when the Standard Network Service Tier is used. Its backend instance groups or NEGs must all be located in the region used by the load balancer's external IP address and forwarding rule.
Meanwhile, Maglev is a distributed system for Network Load Balancing.
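To make the tier distinction concrete, here is a hedged gcloud sketch; the resource names (lb-ip, web-backend, the instance groups, and http-basic-check) are placeholders:

    # Sketch: the network tier is chosen on the load balancer's external IP;
    # PREMIUM makes the HTTP(S) LB global, STANDARD pins it to one region.
    gcloud compute addresses create lb-ip --global --network-tier=PREMIUM

    # With Premium Tier, one global backend service can front instance
    # groups in several regions (e.g., near the USA, India, and Europe):
    gcloud compute backend-services create web-backend \
        --protocol=HTTP --health-checks=http-basic-check --global
    gcloud compute backend-services add-backend web-backend --global \
        --instance-group=us-ig --instance-group-zone=us-central1-b
    gcloud compute backend-services add-backend web-backend --global \
        --instance-group=in-ig --instance-group-zone=asia-south1-a
    gcloud compute backend-services add-backend web-backend --global \
        --instance-group=eu-ig --instance-group-zone=europe-west1-b

Under this setup there is no single "best location" to pick; the anycast IP routes each user to the closest healthy backend with capacity, as the quoted documentation describes.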

Consolidating multiple AWS Classic Load Balancers into a single load balancer

We currently use one AWS Classic Load Balancer per EC2 instance. This was cost-effective with only a few EC2 instances, but now that we're a growing project, we have 8 Classic Load Balancers, which is starting to cost more than we'd like.
What could I do to consolidate these multiple load balancers into a single load balancer?
The current load balancers are only used to forward HTTP/HTTPS traffic to the single EC2 instance registered against each of them.
I have DNS A records set up to route to the load balancers.
Without knowing all the details, you might be better off creating a single Application Load Balancer with multiple target groups; that way there is only one load balancer, and the segregation happens at the target-group level rather than the load-balancer level.
If you need HTTP/S access to some pieces of infrastructure and app access to others, then you might consider one Network LB and one Application LB.
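A hedged sketch of the target-group approach with the AWS CLI, assuming host-based routing since each app already has its own DNS record; all names, IDs, and ARNs are placeholders:

    # Sketch: fold several Classic ELBs into one ALB. Each app gets a target
    # group plus a host-header rule on the shared HTTPS listener.
    aws elbv2 create-target-group --name app1-tg \
        --protocol HTTP --port 80 --vpc-id <vpc-id>
    aws elbv2 register-targets --target-group-arn <app1-tg-arn> \
        --targets Id=<app1-instance-id>
    aws elbv2 create-rule --listener-arn <https-listener-arn> --priority 1 \
        --conditions Field=host-header,Values='app1.example.com' \
        --actions Type=forward,TargetGroupArn=<app1-tg-arn>
    # Repeat per app, then point each DNS record (as an alias) at the one ALB
    # instead of its old Classic ELB.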

How to synchronize data between backend instances in load balancing?

This is my first time trying load balancing in GCE, or at all, to be honest. I followed the GCE documentation for creating cross-region load balancing and successfully created a working HTTP(S) load balancer over 2 instances in us-central1-b and 2 instances in europe-west1-b.
After following the tutorial for creating a load balancer with an unmanaged instance group, I was desperately looking for ways to synchronize data between the instances around the world. By synchronizing data, I mean sharing the same website files and database information. I know I can manually upload files to each instance, but that would take quite a while if I had instances in more than two locations.
I've heard of using Cloud Storage to share the same (static) data across the instances, but I am not sure whether that is applicable to cross-region load balancing (I've only found the content-based load balancing document for that). My concern with this method is that I would need to create multiple Cloud Storage buckets in multiple regions (where the instances are located) to keep latency down. Otherwise, for example, instances in Singapore would have to request data from Cloud Storage in the United States, which would increase latency and potentially defeat the purpose of cross-region load balancing (or am I wrong?).
I've also heard of creating a master Cloud SQL instance with an external (replica) MySQL on each instance for database synchronization, but are there any other recommended methods (potentially better in terms of performance)?

CPU and memory utilization discrepancies for ejabberd and Riak clusters on AWS

I'm running a 2-node ejabberd cluster (behind an Elastic Load Balancer) that in turn connects to a 3-node Riak cluster (again, via an ELB) on AWS. When I load-test the platform with Tsung (creating 0.5 million user registrations), I notice that CPU utilization differs between the ejabberd nodes by around 10%. For the Riak nodes, CPU and memory utilization differ between nodes by around 5%.
The nodes have identical configurations, so I'm wondering what could be leading to these non-trivial differences in utilization. Can anyone throw some light on this, please, or educate me?
Is it due to the load balancer? Or a network effect? I would expect that once a cluster is formed (whether of ejabberd or of Riak KV), the nodes all behave identically, especially for ejabberd, where the entire database is replicated across the cluster.
Not that these differences are a problem, but would be good to understand the inner workings of the clusters here...
Many thanks.
Elastic Load Balancing mechanism
1. The DNS server uses DNS round robin to determine which load balancer node in a specific Availability Zone will receive the request.
2. The selected load balancer checks for a "sticky session" cookie.
3. The selected load balancer then sends the request to the least-loaded instance.
And in greater details:
Availability Zones (unlikely in your case)
By default, the load balancer node routes traffic to back-end instances within the same Availability Zone. To ensure that your back-end instances are able to handle the request load in each Availability Zone, it is important to have approximately equivalent numbers of instances in each zone. For example, if you have ten instances in Availability Zone us-east-1a and two instances in us-east-1b, the traffic will still be equally distributed between the two Availability Zones. As a result, the two instances in us-east-1b will have to serve the same amount of traffic as the ten instances in us-east-1a.
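If the instance counts per zone are uneven, one mitigation on a Classic ELB is cross-zone load balancing, sketched below with a placeholder load balancer name:

    # Sketch: cross-zone load balancing spreads requests across all registered
    # instances regardless of Availability Zone, smoothing this imbalance.
    aws elb modify-load-balancer-attributes \
        --load-balancer-name <my-loadbalancer> \
        --load-balancer-attributes '{"CrossZoneLoadBalancing":{"Enabled":true}}'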
Sessions (most likely in your case)
By default a load balancer routes each request independently to the server instance with the smallest load. By comparison, a sticky session binds a user's session to a specific server instance so that all requests coming from the user during the session will be sent to the same server instance.
AWS Elastic Beanstalk uses load balancer-generated HTTP cookies when sticky sessions are enabled for an application. The load balancer uses a special load balancer–generated cookie to track the application instance for each request. When the load balancer receives a request, it first checks to see if this cookie is present in the request. If so, the request is sent to the application instance specified in the cookie. If there is no cookie, the load balancer chooses an application instance based on the existing load balancing algorithm. A cookie is inserted into the response for binding subsequent requests from the same user to that application instance. The policy configuration defines a cookie expiry, which establishes the duration of validity for each cookie.
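For reference, this duration-based stickiness can be enabled on a Classic ELB roughly as follows; the load balancer name, policy name, and 300-second expiry are placeholders:

    # Sketch: create a load balancer-generated cookie stickiness policy and
    # attach it to the port-80 listener.
    aws elb create-lb-cookie-stickiness-policy \
        --load-balancer-name <my-loadbalancer> \
        --policy-name lb-sticky-policy \
        --cookie-expiration-period 300
    aws elb set-load-balancer-policies-of-listener \
        --load-balancer-name <my-loadbalancer> \
        --load-balancer-port 80 \
        --policy-names lb-sticky-policy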
Routing Algorithm (less likely in your case)
The load balancer node sends the request to healthy instances within the same Availability Zone using the leastconns routing algorithm. The leastconns routing algorithm favors back-end instances with the fewest connections or outstanding requests.
Source: Elastic Load Balancing Terminology And Key Concepts
Hope it helps.