Openshift roundrobin request across all pods

Openshift roundrobin request across all pods - cookies

We want requests to be distributed round-robin across all pods deployed in OpenShift.
I have configured the annotations below in the Route config, but the sequence of calls to the pods is random:
haproxy.router.openshift.io/balance : roundrobin
haproxy.router.openshift.io/disable_cookies: 'true'
We have spun up 3 pods. We want requests to follow the sequence
pod1,pod2,pod3,pod1,pod2,pod3,pod1....
But the actual behaviour after setting the above annotations is random, like:
pod1,pod1,pod2,pod2,pod3,pod1,pod2,pod2.... which is incorrect.
Do we need to change any OpenShift configuration to get perfect round-robin?

If you want to access pod1, pod2, pod3 in order, then you should use leastconn on the same pod group.
leastconn The server with the lowest number of connections receives the
connection. Round-robin is performed within groups of servers
of the same load to ensure that all servers will be used. Use
of this algorithm is recommended where very long sessions are
expected, such as LDAP, SQL, TSE, etc... but is not very well
suited for protocols using short sessions such as HTTP. This
algorithm is dynamic, which means that server weights may be
adjusted on the fly for slow starts for instance.
HAProxy's roundrobin distributes requests equally, but it might not preserve the order in which the servers in the group are accessed.
roundrobin Each server is used in turns, according to their weights.
This is the smoothest and fairest algorithm when the server's
processing time remains equally distributed. This algorithm
is dynamic, which means that server weights may be adjusted
on the fly for slow starts for instance. It is limited by
design to 4095 active servers per backend. Note that in some
large farms, when a server becomes up after having been down
for a very short time, it may sometimes take a few hundreds
requests for it to be re-integrated into the farm and start
receiving traffic. This is normal, though very rare. It is
indicated here in case you would have the chance to observe
it, so that you don't worry.
Refer to HAProxy balance (algorithm) for details of the balance algorithm options.
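To make the two quoted descriptions concrete, here is a minimal Python sketch of the selection rules. It is a simplified model, not HAProxy's implementation: weights are ignored, connections are assumed to stay open, and ties in `least_conn` are broken by dictionary order rather than by round-robin within the tied group. The function names and the starting connection counts are made up for illustration.

```python
from itertools import cycle

def round_robin(servers, n_requests):
    """Return the server chosen for each request, taking servers in turn."""
    picker = cycle(servers)
    return [next(picker) for _ in range(n_requests)]

def least_conn(active, n_requests):
    """Pick the server with the fewest active connections for each request.

    `active` maps server name -> current connection count (hypothetical data).
    Ties go to the first server in dict order; connections never close here.
    """
    active = dict(active)  # work on a copy so the caller's dict is untouched
    chosen = []
    for _ in range(n_requests):
        server = min(active, key=active.get)
        active[server] += 1
        chosen.append(server)
    return chosen

rr = round_robin(["pod1", "pod2", "pod3"], 6)
lc = least_conn({"pod1": 2, "pod2": 0, "pod3": 1}, 4)
print(rr)
print(lc)
```

In this model `round_robin` always yields a fixed order from a single picker, while `least_conn` follows the current load, so its order depends on how many connections each pod already holds.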

Related

Using Istio circuit breakers in production

I'd like to start using the http2MaxRequests circuit breaker in my Kubernetes based Istio service mesh, but the semantics are befuddling to me:
on the destination side, it does what I would expect: each Envoy will accept at most the specified number of concurrent requests, and will then reject new requests. Importantly, the configuration is based on the concurrency supported by each instance of my application server; when I scale the number of instances up or down, I don't need to change this configuration.
on the source side, things are a lot more messy. The http2MaxRequests setting specifies the number of concurrent requests from each source Envoy to the entire set of destination Envoys. I simply don't understand how this can possibly work in production!
First off, different source applications have wildly different concurrency models; we have an Nginx proxy that supports a massive amount of concurrent requests per instance, and we have internal services that support a handful. Those source applications will themselves be scaled very differently, with e.g. Nginx having very few instances that make lots of requests. I would need a DestinationRule for each source application pertaining to each destination service.
Even then, the http2MaxRequests setting limits the concurrent requests to the entire destination service, not to each individual instance (endpoint). This fundamentally breaks the notion of autoscaling, the entire point of which is to vary the capacity as needed.
What I would love to see is a way to enforce at the source side a destination-instance concurrency limit; that is, to tie the limit not to the entire Envoy cluster, but to each Envoy endpoint instead. That would make it actually useful for me, as I could then base the configuration on the concurrency supported by each instance of my service, which is a constant factor.
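The scaling problem described above can be put into numbers. The sketch below is plain arithmetic under two assumptions that are not from the question: every source Envoy saturates its limit at the same time, and the load spreads evenly across destination instances. The function name and all figures are hypothetical.

```python
def worst_case_per_instance(source_instances, max_requests_per_source,
                            destination_instances):
    """Upper bound on concurrent requests one destination instance may see.

    Assumes every source Envoy is at its source-side limit and that requests
    are spread evenly over the destination instances.
    """
    total = source_instances * max_requests_per_source
    return total / destination_instances

# Hypothetical numbers, chosen only for illustration.
few_big_sources = worst_case_per_instance(2, 100, 10)      # e.g. a small proxy tier
many_small_sources = worst_case_per_instance(50, 100, 10)  # e.g. a widely scaled service
after_scale_up = worst_case_per_instance(50, 100, 20)      # destination doubled

print(few_big_sources, many_small_sources, after_scale_up)
```

With the same source-side setting, the bound per destination instance changes whenever either side scales, which is why a limit tied to each endpoint would be easier to reason about.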

Coordinating master and worker machines

If this question seems basic to more IT-oriented folks, then I apologize in advance. I'm not sure it falls under the ServerFault domain, but correct me if I'm wrong...
This question concerns some backend operations of a web application, hosted in a cloud environment (Google). I'm trying to assess options for coordinating our various virtual machines. I'll describe what we currently have, and those "in the know" can maybe suggest a better way (I hope!).
In our application there are a number of different analyses that can be run, each of which has different hardware requirements. They are typically very large, and we do NOT want these to be run on the application server (referred to as app_server below).
To that end, when we start one of these analyses, app_server will start a new VM (call this VM1). For some of these analyses, we only need VM1; it performs the analysis and sends a HTTP POST request back to app_server to let it know the work is complete.
For other analyses, VM1 will in turn launch a number of worker machines (worker-1,...,worker-N), which run very similar tasks in parallel. Once the task on a single worker (e.g. worker-K) is complete, it should communicate back to VM1: "hey, this is worker-K and I am done!". Once all the workers (worker-1,...,worker-N) are complete, VM1 does some merging operations, and finally communicates back to app_server.
My question is:
Aside from starting a web server on VM1 which listens for POST requests from the workers (worker-1,..), what are the potential mechanisms for having those workers communicate back to VM1? Are there non-webserver ways to listen for HTTP POST requests and do something with the request?
I should note that all of my VMs are operating within the same region/zone on GCE, so they are able to communicate via internal IPs without any special firewall rules, etc. (e.g. running $ ping <other VM's IP addr> works). I obviously do not want any of these VMs (VM1, worker-1, ..., worker-N) to be exposed to the internet.
Thanks!

Sounds like the right use-case for Cloud Pub/Sub. https://cloud.google.com/pubsub
In your case workers would be publishing events to the queue and VM1 would be subscribing to them.
It is hard to tell from your high-level overview whether it would be a match, but take a look at Cloud Composer too: https://cloud.google.com/composer/
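The publish/subscribe shape suggested above can be sketched without any cloud dependency. In the Python sketch below, a standard-library `queue.Queue` stands in for the Pub/Sub topic and threads stand in for the worker VMs; this is an assumption made so the example runs locally, and it does not use the Cloud Pub/Sub client library. The function names and message format are hypothetical.

```python
import queue
import threading

def worker(worker_id, events):
    """Run the task, then publish a completion event (stand-in for a publish call)."""
    # ... the real analysis would run here ...
    events.put({"worker": worker_id, "status": "done"})

def coordinate(n_workers):
    """VM1's role: start workers, collect one completion message each, then merge."""
    events = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(f"worker-{k}", events))
        for k in range(1, n_workers + 1)
    ]
    for t in threads:
        t.start()

    finished = set()
    while len(finished) < n_workers:
        message = events.get(timeout=10)  # blocks until some worker reports in
        finished.add(message["worker"])

    for t in threads:
        t.join()
    # The merging step and the final POST to app_server would go here.
    return sorted(finished)

done = coordinate(3)
print(done)
```

The point of the pattern is that VM1 only waits on a queue of messages, so it does not need to run a web server to learn that worker-K has finished.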

Optimizing Jetty for heartbeat detection of thousands of machines?

I have a large number of machines (thousands and more) that every X seconds perform an HTTP request to a Jetty server to notify it that they are alive. For what value of X should I use persistent HTTP connections (which limits the number of monitored machines to the number of concurrent connections), and for what value of X should the client re-establish a TCP connection each time (which in theory would allow monitoring more machines with the same Jetty server)?
How would the answer change for HTTPS connections? (Assuming CPU is not a constraint)
This question ignores scaling-out with multiple Jetty web servers on purpose.
Update: Basically the question can be reduced to the smallest recommended value of lowResourcesMaxIdleTime.

I would say that this is less of a jetty scaling issue and more of a network scaling issue, in which case 'it depends' on your network infrastructure. Only you really know how your network is laid out and what sort of latencies are involved in order to come up with a value of X.
From an overhead perspective the persistent HTTP connections will of course have some minor effect (well I say minor but depends on your network) and the HTTPS will again have a larger impact....but only from a volume of traffic perspective since you are assuming CPU is not a constraint.
So Jetty doesn't really need to be involved in the question. You seem to ultimately be asking for help optimizing bytes of traffic on the wire, so really you are looking for the best protocol at this point. Since with HTTP you have to deal with headers for each request, you may be well served by looking at something like SPDY or WebSocket, which give you persistent connections but are optimized for low round-trip network overhead. But...they seem sort of overkill for a heartbeat. :)

How about just making them send requests at different times? When the first machine sends a request, the server picks a time for that machine's next heartbeat and returns it in the response (also keeping the id/time on the Jetty server). When the second machine sends a request, the server picks a different time for it.
This way, each machine performs its heartbeat request at a different time, so there is no concurrency issue.
You can also use a random delay for the first heartbeat if all machines might start up at the same time.
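One way to read this suggestion is as slot assignment: the server spreads the machines evenly over the heartbeat interval, and each client adds a random delay before its first request. The Python sketch below illustrates that reading; the function names, the even-slot policy, and the numbers are assumptions, not something the answer specifies.

```python
import random

def next_heartbeat_offset(machine_index, n_machines, interval_s):
    """Server side: give each machine its own slot inside the interval.

    Returns the offset in seconds into each interval at which the machine
    with the given index should send its heartbeat.
    """
    slot_width = interval_s / n_machines
    return machine_index * slot_width

def first_heartbeat_delay(interval_s, rng=random):
    """Client side: random delay before the first heartbeat, to avoid a
    burst when many machines start at the same time."""
    return rng.uniform(0, interval_s)

# Hypothetical example: 4 machines sharing a 60-second interval.
offsets = [next_heartbeat_offset(i, 4, 60) for i in range(4)]
print(offsets)
print(first_heartbeat_delay(60))
```

With thousands of machines the slots become very narrow, so in practice the server would only be smoothing the arrival rate rather than guaranteeing that no two requests overlap.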

How big of an impact are local LAN HTTP connections to underlying APIs of web infrastructures?

When deploying web applications, a common approach is to implement the actual application logic as a series of services and expose them via HTTP, then put some front ends between the services and the users. Typically those front ends will take care of things like SSL, session data, load balancing, routing requests, possibly caching, and so on.
AFAIK, that usually means that every time a new request comes in, one or more new requests must be made to one or more of the backends: each one with its TCP handshake, HTTP overhead, etc.
Don't those additional connections add measurable latency and/or a performance hit?
If so, what techniques are in common practice to get the best performance from those kinds of deployments?

Latency on a local connection will be minimal, probably single-digit milliseconds at most. There will be some occupancy overhead for the extra HTTP sessions, but it's spread out among different apps.
The advantage of the approach you describe is that it distributes the load amongst different apps, so you can have lots of front-end bits doing heavy lifting like SSL and fewer backend apps that handle more sessions. And you can pick and mix what apps you need.
A single monolithic app will probably be a bit faster until it runs out of capacity, at which point you have a problem because it's hard to scale up.

How heavy for the server to transmit data over HTTPS?

I am trying to implement web service and web client applications using Ruby on Rails 3. For that I am considering using SSL, but I would like to know: how "heavy" is it for servers to handle a lot of HTTPS connections instead of HTTP? What is the difference in response time and in overall performance?

The cost of the SSL/TLS handshake (which accounts for most of the overall "slowdown" SSL/TLS adds) is nowadays much less than the cost of TCP connection establishment and the other actions associated with session establishment (logging, user lookup etc). And if you worry about speed and want to save every ns of time, there are hardware SSL accelerators that you can install in your server.

It is several times slower to go with HTTPS; however, most of the time that's not what is actually going to slow your app down. Especially if you're running on Rails, your performance scaling is going to be bottlenecked elsewhere in the system. If you are doing anything that requires passing secrets of any kind over the wire (including a shared session cookie), SSL is the only way to go and you probably won't notice the cost. If you happen to scale up to the point where you do start to see a performance hit from encryption, there are hardware acceleration appliances out there that help tremendously. However, Rails is likely to fall over long before that point.
