I'd like to start using the http2MaxRequests circuit breaker in my Kubernetes-based Istio service mesh, but the semantics are befuddling to me:
On the destination side, it does what I would expect: each Envoy will accept at most the specified number of concurrent requests and will then reject new ones. Importantly, the configuration is based on the concurrency supported by each instance of my application server; when I scale the number of instances up or down, I don't need to change this configuration.
On the source side, things are a lot messier. The http2MaxRequests setting specifies the number of concurrent requests from each source Envoy to the entire set of destination Envoys. I simply don't understand how this can possibly work in production!
First off, different source applications have wildly different concurrency models: we have an Nginx proxy that supports a massive number of concurrent requests per instance, and we have internal services that support a handful. Those source applications will themselves be scaled very differently, with e.g. Nginx having very few instances that each make lots of requests. I would need a DestinationRule for each source application pertaining to each destination service.
Even then, the http2MaxRequests setting limits the concurrent requests to the entire destination service, not to each individual instance (endpoint). This fundamentally breaks the notion of autoscaling, the entire point of which is to vary the capacity as needed.
What I would love to see is a way to enforce at the source side a destination-instance concurrency limit; that is, to tie the limit not to the entire Envoy cluster, but to each Envoy endpoint instead. That would make it actually useful for me, as I could then base the configuration on the concurrency supported by each instance of my service, which is a constant factor.
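For concreteness, here is a minimal sketch of the kind of DestinationRule I'm talking about (the host and names are placeholders, not my real services):

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-service-circuit-breaker                    # hypothetical name
spec:
  host: my-service.my-namespace.svc.cluster.local     # placeholder destination service
  trafficPolicy:
    connectionPool:
      http:
        # On the inbound (destination) side this applies per destination Envoy;
        # on the outbound (source) side it applies per source Envoy towards the
        # whole destination service.
        http2MaxRequests: 100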
Related
I want to know how I can have different implementations of the same service and switch traffic from one to the other when failures start to occur (active/passive), or have traffic go from a 50%/50% split to a 0%/100% split when service implementation A is not responding. I would expect the 50/50 split to be restored once implementation A starts working again.
For example, I want to have a payment service, with one implementation backed by Cybersource and the other by Stripe (or whatever other provider makes sense). My implementations will start returning 504 when they detect that response times on one of the providers are above a certain threshold, or good old 500 because a bug occurred. At that point, I want the clients to only connect to the fastest (properly working) implementation for a while, and to gradually retry the failed implementation once the health probe gives it a green light.
Similarly, for an active/passive scenario: perhaps I have a search API and I want all traffic to go to implementation A. However, when that implementation starts returning 5XX, I want traffic to be routed to implementation B, which perhaps offers a degraded experience but can be used as a backup.
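The closest I can piece together (all names here are purely illustrative) is a weighted split between subsets combined with outlier detection, but as far as I can tell that only ejects individual unhealthy endpoints within a subset; it does not shift the 50/50 split to 0/100 by itself:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: payments                  # hypothetical service
spec:
  hosts:
  - payments
  http:
  - route:
    - destination:
        host: payments
        subset: cybersource
      weight: 50
    - destination:
        host: payments
        subset: stripe
      weight: 50
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: payments
spec:
  host: payments
  subsets:
  - name: cybersource
    labels:
      provider: cybersource       # hypothetical pod labels
  - name: stripe
    labels:
      provider: stripe
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 5     # eject an endpoint after repeated 5xx responses
      interval: 30s
      baseEjectionTime: 60s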
When I read the Istio documentation, blogs, etc., I don't see the scenarios above. Perhaps Istio is not the right choice for that?
I am currently doing research on the Istio service mesh, version 1.6. The data plane (the Envoy proxies) is configured by the control plane; in particular, Pilot (part of istiod) is responsible for propagating routing rules and configs to the Envoys. I am wondering how this communication works.
Is it a single gRPC stream that is opened when the sidecar container starts for the first time and stays open for the sidecar's whole lifecycle, so that when the mesh changes, Pilot uses this stream to inform Envoy about the changes via the xDS API? In other words, are updates pushed? Or does the sidecar poll for new configs at a defined interval?
What is the role of the Istio agent (the former Pilot and Citadel agents) in the sidecar container, especially the former Pilot agent (I know the Citadel agent is part of the CSR process)? Does it poll for new configs, or does it only bootstrap Envoy? And if the latter, why is it always running?
Thanks in advance!
The best explanation of how the Istio-managed Envoy works is in the Envoy documentation. It is actually a lot more complicated than it seems:
Initialization
How Envoy initializes itself when it starts up is complex. This section explains at a high level how the process works. All of the following happens before any listeners start listening and accepting new connections.
During startup, the cluster manager goes through a multi-phase initialization where it first initializes static/DNS clusters, then predefined EDS clusters. Then it initializes CDS if applicable, waits for one response (or failure) for a bounded period of time, and does the same primary/secondary initialization of CDS provided clusters.
If clusters use active health checking, Envoy also does a single active health check round.
Once cluster manager initialization is done, RDS and LDS initialize (if applicable). The server waits for a bounded period of time for at least one response (or failure) for LDS/RDS requests. After which, it starts accepting connections.
If LDS itself returns a listener that needs an RDS response, Envoy further waits for a bounded period of time until an RDS response (or failure) is received. Note that this process takes place on every future listener addition via LDS and is known as listener warming.
After all of the previous steps have taken place, the listeners start accepting new connections. This flow ensures that during hot restart the new process is fully capable of accepting and processing new connections before the draining of the old process begins.
A key design principle of initialization is that an Envoy is always guaranteed to initialize within initial_fetch_timeout, with a best effort made to obtain the complete set of xDS configuration within that subject to the management server availability.
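As a rough illustration only (this is not the exact bootstrap Istio generates; the cluster name and timeout values are placeholders), the bound mentioned above is the initial_fetch_timeout set on each xDS config source in the Envoy bootstrap:

dynamic_resources:
  ads_config:
    api_type: GRPC
    transport_api_version: V3
    grpc_services:
    - envoy_grpc:
        cluster_name: xds_cluster        # placeholder; must match a statically defined cluster
  cds_config:
    ads: {}
    initial_fetch_timeout: 15s           # how long Envoy waits for the first CDS response
  lds_config:
    ads: {}
    initial_fetch_timeout: 15s           # same bound for the first LDS response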
As for updating the Envoy config:
Runtime configuration
Envoy supports “runtime” configuration (also known as “feature flags” and “decider”). Configuration settings can be altered that will affect operation without needing to restart Envoy or change the primary configuration. The currently supported implementation uses a tree of file system files. Envoy watches for a symbolic link swap in a configured directory and reloads the tree when that happens. This type of system is very commonly deployed in large distributed systems. Other implementations would not be difficult to implement. Supported runtime configuration settings are documented in the relevant sections of the operations guide. Envoy will operate correctly with default runtime values and a “null” provider so it is not required that such a system exists to run Envoy.
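For illustration (the paths here are placeholders, and a stock Istio install does not normally configure this), such a file-system runtime layer is declared in the Envoy bootstrap roughly like this:

layered_runtime:
  layers:
  - name: disk
    disk_layer:
      symlink_root: /srv/runtime/current   # Envoy watches this symlink for swaps and reloads the tree
      subdirectory: envoy
  - name: admin
    admin_layer: {}                        # also allow overrides via the admin endpoint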
More information about how the Envoy proxy works can be found here.
According to the Istio documentation:
The benefit of consolidation: introducing istiod
Having established that many of the common benefits of microservices didn’t apply to the Istio control plane, we decided to unify them into a single binary: istiod (the ’d’ is for daemon).
Let’s look at the benefits of the new packaging:
Installation becomes easier. Fewer Kubernetes deployments and associated configurations are required, so the set of configuration options and flags for Istio is reduced significantly. In the simplest case, you can start the Istio control plane, with all features enabled, by starting a single Pod.
Configuration becomes easier. Many of the configuration options that Istio has today are ways to orchestrate the control plane components, and so are no longer needed. You also no longer need to change cluster-wide PodSecurityPolicy to deploy Istio.
Using VMs becomes easier. To add a workload to a mesh, you now just need to install one agent and the generated certificates. That agent connects back to only a single service.
Maintenance becomes easier. Installing, upgrading, and removing Istio no longer require a complicated dance of version dependencies and startup orders. For example: To upgrade, you only need to start a new istiod version alongside your existing control plane, canary it, and then move all traffic over to it.
Scalability becomes easier. There is now only one component to scale.
Debugging becomes easier. Fewer components means less cross-component environmental debugging.
Startup time goes down. Components no longer need to wait for each other to start in a defined order.
Resource usage goes down and responsiveness goes up. Communication between components becomes guaranteed, and not subject to gRPC size limits. Caches can be shared safely, which decreases the resource footprint as a result.
istiod unifies functionality that Pilot, Galley, Citadel and the sidecar injector previously performed, into a single binary.
A separate component, the istio-agent, helps each sidecar connect to the mesh by securely passing configuration and secrets to the Envoy proxies. While the agent, strictly speaking, is still part of the control plane, it runs on a per-pod basis. We’ve further simplified by rolling per-node functionality that used to run as a DaemonSet, into that per-pod agent.
Hope it helps.
We want round-robin of requests across all pods deployed in OpenShift.
I have configured the below annotations in the Route config, but the sequence of calls to the pods is random:
haproxy.router.openshift.io/balance: roundrobin
haproxy.router.openshift.io/disable_cookies: 'true'
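For context, the annotations sit on the Route object roughly like this (the route and service names are placeholders):

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: my-app                                        # placeholder
  annotations:
    haproxy.router.openshift.io/balance: roundrobin
    haproxy.router.openshift.io/disable_cookies: 'true'
spec:
  to:
    kind: Service
    name: my-app                                      # placeholder backing service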
We have spun up 3 pods. We want requests to follow the sequence
pod1,pod2,pod3,pod1,pod2,pod3,pod1....
But the real behaviour after setting the above annotations is random, like:
pod1,pod1,pod2,pod2,pod3,pod1,pod2,pod2...., which is incorrect.
Do we need any additional OpenShift configuration to make it a strict round-robin?
If you want to access pod1, pod2, pod3 in order, then you should use leastconn on the same pod group.
leastconn: The server with the lowest number of connections receives the connection. Round-robin is performed within groups of servers of the same load to ensure that all servers will be used. Use of this algorithm is recommended where very long sessions are expected, such as LDAP, SQL, TSE, etc., but it is not very well suited for protocols using short sessions such as HTTP. This algorithm is dynamic, which means that server weights may be adjusted on the fly for slow starts for instance.
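In terms of the Route annotation above, that would be roughly:

haproxy.router.openshift.io/balance: leastconn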
HAProxy's roundrobin distributes the requests equally, but it does not guarantee the order in which the servers in the group are accessed.
roundrobin: Each server is used in turns, according to their weights. This is the smoothest and fairest algorithm when the servers' processing time remains equally distributed. This algorithm is dynamic, which means that server weights may be adjusted on the fly for slow starts for instance. It is limited by design to 4095 active servers per backend. Note that in some large farms, when a server comes back up after having been down for a very short time, it may sometimes take a few hundred requests for it to be re-integrated into the farm and start receiving traffic. This is normal, though very rare. It is indicated here in case you would have the chance to observe it, so that you don't worry.
Refer to HAProxy balance (algorithm) for details of the balance algorithm options.
When deploying web applications, a common approach is to implement the actual application logic as a series of services and expose them via HTTP, then put some front ends between the services and the users. Typically those front ends will take care of things like SSL, session data, load balancing, routing requests, possibly caching, and so on.
AFAIK, that usually means that every time a new request comes in, one or more new requests must be made to one or more of the backends: each one with its TCP handshake, HTTP overhead, etc.
Don't those additional connections add measurable latency and/or a performance hit?
If so, what techniques are in common practice to get the best performance from those kinds of deployments?
Latency on a local connection will be minimal - probably single-digit milliseconds at most. There will be some occupancy overhead for the extra HTTP sessions, but then it's spread out among different apps.
The advantage of the approach you describe is that it distributes the load amongst different apps, so you can have lots of front-end bits doing heavy lifting like SSL and fewer backend apps that handle more sessions. And you can pick and mix what apps you need.
A single monolithic app will probably be a bit faster, until it runs out of capacity, at which point you have a problem because it's hard to scale up.
I am looking to implement a service (web/Windows, .NET) that maintains a list of available services and can provide an endpoint based on the nature or type of request. The requester can then pass the actual work request to the provided endpoint. The actual work requests can contain very large chunks of data (from 10MB up to and possibly exceeding a GB).
The WCF routing service sounds like a perfect fit, but turns out not to be, because it requires the actual work request to pass through it, creating a bottleneck at the routing service (the whole point is to get a system that can scale out). If I had smaller messages, WCF routing would be a no-brainer.
Is there anything out there that fits the bill? Preferably .NET/Windows based?
Do you mean because the requests block for work?
You could use a OneWay OperationContract to create async services so as to not block the request pool.
// Requires a reference to System.ServiceModel
using System.ServiceModel;

[ServiceContract]
interface IMyContract
{
    // One-way call: the client does not wait for a reply, so the request pool isn't blocked
    [OperationContract(IsOneWay = true)]
    void DoWork();
}
Update
I think I understand your question better now: you are looking to distribute load to different servers to avoid request bottlenecks due to heavy traffic (preferably distributed based on content).
I'd say that WCF Routing is indeed ideal for this. One of the features you can leverage is the failover functionality: you can define multiple backup endpoints, and when one fails, it will automatically move over to the next. There's a good introduction to how this works here.
There's also a good article here that talks about load balancing with WCF using the same principles. It provides two solutions for a round-robin filter implementation that let you load balance the service requests (even though at the beginning he says his general answer to whether it supports load balancing is no, for implementation reasons).
If you are worried about all requests routing via the one server and it still becoming a bottleneck, then think of web load balancers: it's the same scenario. Sitting in the middle forwarding packets doesn't require much work, and they have no problem handling huge volumes of traffic. I don't think this will be an issue.