Istio shifting service implementation on failure or active/active and active/passive services - istio

I want to know how I can have different implementation of the same service and switch traffic from one to the other when failure starts to occur (active / passive) or have traffic go from a 50%/50% split to a 0%/100% split when service implementation A is not responding. I would expect the 50/50 split to be restored once implementation A starts working again.
For example, I want to have a payment service and I have an implementation with Cybersource and the other with Stripe (or whatever other provider makes sense). My implementation will start returning 504 when they detect that response times on one of the providers is above a certain threshold or good old 500 because a bug occured. At that point, I want the clients to only connect to the fastest (properly working) implementation for a while and gradually retry the failed implementation once the health probe give it a green light.
Similarly for an active/passive scenario perhaps I have a search API and I want all traffic to go to implementation A. However, when that implementation starts returning 5XX, I want traffic to be routed to implementation B which is perhaps offering a degraded experience, but can be used as a backup implementation.
When I read the istio documentation / blogs, etc. I don't see the scenarios above. Perhaps Istio is not the right choice for that ?

Related

Using Istio circuit breakers in production

I'd like to start using the http2MaxRequests circuit breaker in my Kubernetes based Istio service mesh, but the semantics are befuddling to me:
on the destination side, it does what I would expect: each Envoy will accept at most the specified number of concurrent requests, and will then reject new requests. Importantly, the configuration is based on the concurrency supported by each instance of my application server; when I scale the number of instances up or down, I don't need to change this configuration.
on the source side, things are a lot more messy. The http2MaxRequests setting specifies the number of concurrent requests from each source Envoy to the entire set of destination Envoys. I simply don't understand how this can possible work in production!
First off, different source applications have wildly different concurrency models; we have an Nginx proxy that supports a massive amount of concurrent requests per instance, and we have internal services that support a handful. Those source applications will themselves be scaled very differently, with e.g. Nginx having very few instances that make lots of requests. I would need a DestinationRule for each source application pertaining to each destination service.
Even then, the http2MaxRequests setting limits the concurrent requests to the entire destination service, not to each individual instance (endpoint). This fundamentally breaks the notion of autoscaling, the entire point of which is to vary the capacity as needed.
What I would love to see is a way to enforce at the source side a destination-instance concurrency limit; that is, to tie the limit not to the entire Envoy cluster, but to each Envoy endpoint instead. That would make it actually useful for me, as I could then base the configuration on the concurrency supported by each instance of my service, which is a constant factor.

Generating SCOM alerts for web servers

I have 4-7 sharepoint servers. We have a scom alert already implemented to generate alert if the server is down. But we want to implement scom alert if the website is down.
Can we genearate alert using scom by using ping functionality?
My idea is, we ping the server continuosly and when the website is unresponsive for some time we get alert saying that the website is unresposive.
Can this be implemented? And how much effort is needed? And do we need any other services to be implemented?
Any help would be appreciated.
Ravi,
Forgive me for the post being more philosophy and less answer.
For better or worse, Microsoft has resisted implemented a simple ping monitor in SCOM. Their is a solid reason for this. It would be overly leveraged by folks that don't know any better. The result of which would reflect poorly on the quality of SCOM as a monitoring tool. What I mean by that is that a ping monitor is a terrible idea as it doesn't tell the poor soul that was awoken at 2am much of anything beyond the highest level notion that something is wrong.
If you have 5 minutes to sit in front of the SCOM console to create a ping alert then you would serve your support teams much better if you spent those same 5 minutes creating a Web Application Availability monitor. The reason for this is that the Web App Avail monitor will actually look at the response to ensure that it is logical and successful.
Here is the documentation to create a Web Application Availability Monitor. It looks difficult only until your first implementation. It really is a snap. https://technet.microsoft.com/en-us/library/hh881882(v=sc.12).aspx
Consider that if you had a ping monitor and someone accidentally deleted your index.html file, your ping will happily chug along without telling anyone. Same with a bad code update. Heck, you could even stop your web application server and ping is still going to respond.
Conversely, If you had a Web App Avail monitor pointed at each of the nodes in a load balanced web farm and your load balancer failed, all of your Web monitors will continue to post as healthy while your monitor looking at the load balancer will start to fail. A quick glace at the console will tell your support team that indeed the issue is not with the web servers themselves.
It is a good philosophy to implement your monitors in a way that they testing the target as completely as possible and in the most isolated way possible. You would not want to point a Web App Avail monitor at a load balancer as you would not necessarily know which endpoint did not respond to SCOM to trigger the alert. Some folks go to great lengths to work around this by implementing health check pages that respond with there hostnames. This is usually not necessary, simply create a monitor against each individual node. You are going to want to monitor your load balancer directly so that you know it is up as well.
On another note, there already is a SharePoint management pack (actually one for each version of SharePoint) that you can download from MS for free. This management pack will automatically discovery and monitor all of the components of SharePoint in your infrastructure. It works quite well but if you are new to SCOM then the volume of data and alerts that it creates can be a bit overwhelming at first.
SharePoint 2016 (there is one for each version) management pack: https://blogs.technet.microsoft.com/wbaer/2015/09/08/system-center-operations-management-pack-for-sharepoint-server-2016-it-preview/
There is also a third party management pack that allows you to simply create ping monitors. People REALLY want this. I respectfully will tell you that that they are doing more harm then good in the majority of implementations that use this. But at the end of the day sometimes you just want something that works and you understand so here it is:
Ping management pack: https://www.opslogix.com/ping-management-pack/

What is the cause of websocket latency spikes?

I am running a server (that uses tornado python) on a single AWS instance and I am running into spikes in websocket latency.
Profiling the round trip time from when a websocket message is sent to the client, which then immediately sends an ack message back to the server, to when the server receives the ack message yields an average of <.1 second, however I note sometimes it goes up to 3 seconds. Note: there are no spikes when running the server locally.
What could be the cause or fix for this? I looked at the CPU usage and it only goes up to 40% max. The spikes are not correlated with heavy traffic (2 or 3 clients usually) and the client's internet seems fine. I find it hard to believe the instance is going beyond capacity with such low usage.
The fact that the spike is 3 seconds is actually telling you a lot more than you may suspect, about the nature of the problem.
It's packet loss.
TCP, as you likely know, is said to provide "reliable" transport, guaranteeing that payload sent is received by the far end in the order in which it was sent, because TCP reassembles things in the correct order before delivering the payload. One significant way in which this is accomplished is by the automatic retransmission of packets that are considered to have been lost.
You'll never guess the default initial timer value for retransmissions of lost packets. Or, perhaps, now, you will.
It's 3 seconds in many, if not most, implementations, based on standards established several years ago in a time when the bandwidth and latency of today's transmission links were unheard of, perhaps unimagined.
You won't see evidence of the retransmission at at the websocket server or the client software, because TCP shields the higher layers from knowing that it occurs... but 3 seconds is a dead giveaway that this is exactly the problem.
You'll see the retransmissions of the traffic occurring if you observe the network traffic with a packet sniffer, though that will only serve to confirm that this is the issue.
It could be loss from server to client, or loss from client to server. The latter is generally more likely, since clients often have a lower amount of available upstream bandwidth... but the directionality of the packet loss doesn't clearly indicate the physical location where it is occurring. Unless your client keeps track of local time, so that request and response initiation times can be correlated, you don't know whether the delay is in the message, or in the acknowledgement.
Under relatively light load, it seems unlikely that the problem is on your instance or in the AWS network on your side, and you obviously can't connect a sniffer to arbitrary points on the Internet to pinpoint the problem.
Given a case like this, it may be easier -- and surprisingly feasible -- to prove where the problem isn't, rather than where it is.
One technique for this would be to create a deliberate detour for the traffic through different equipment located elsewhere -- such as a different AWS region or another cloud provider.
First, of course, you'll want to learn to spot these retransmissions using wireshark.
Then, configure a proxy server at a different location, using a simple TCP connection proxy -- such as HAProxy, or even a simple tool like redir or socat.
Such a configuration will listen for connections from clients, and when one is established, will create a new TCP connection to the destination (your websocket server) but -- importantly -- they only tie the two connections together at the payload level -- not the TCP level, and of course nothing lower -- so retransmissions will only be seen on the wire at all between this intermediate server and the end of the connection with the packet loss problem. The other end will show no evidence of the retransmissions -- just data arriving later than expected.
For this test to be meaningful, the proxy needs to be located away from the server and the client, and with no meaningful common infrastructure -- hence the suggestion of placing it in a different AWS region. A different availability zone in the same region may share common Internet infrastructure at some level, so that's not far enough away for this purpose.
If client <--> proxy <--> server shows TCP retransmissions on the path between proxy and server, and not between client and proxy, the problem really is likely to be in your server, its hardware, network, or Internet connection, and you'll have to proceed accordingly.
Conversely (and, I would suggest, more likely) if the path between proxy and server is free of retransmissions but the path between client and proxy is still dirty, you have eliminated the server and its infrastructure as the source of the problem. How to proceed is up to you, but at this point you do know what the problem... isn't.
Two other possibilities:
Both sides remain dirty, which is the least likely scenario. Rule 1 of troubleshooting is to assume initially that you only have one problem, not two.
Or, both sides are suddenly and unexectedly clean when traffic uses this setup, which suggests thay your test setup has routed around a broken piece of the Internet. You've "solved" it but have no idea how. We'll also hope this isn't the outcome, but given the vagaries of the global Internet, it's not unthinkable that your stack may include components like this, with geolocation-DNS-based selection of an intermediate endpoint. This seems like a convolution but does have its place.
Such a tactic is actually part of the logic behind the S3 transfer acceleration feature. The content is not any closer to the end user, but the TCP connection from the browser is being terminated on equipment in the AWS edge network, at a location that is often nearer to the browser, and a second TCP connection back to the bucket is established, with the payload connected together... and, yes, it's faster and more stable, with the significance of the change becoming more notable as distance and connection quality vary.

Why are RESTful Applications easier to scale

I always read that one reason to chose a RESTful architecture is (among others) better scalability for Webapplications with a high load.
Why is that? One reason I can think of is that because of the defined resources which are the same for every client, caching is made easier. After the first request, subsequent requests are served from a memcached instance which also scales well horizontally.
But couldn't you also accomplish this with a traditional approach where actions are encoded in the url, e.g. (booking.php/userid=123&travelid=456&foobar=789).
A part of REST is indeed the URL part (it's the R in REST) but the S is more important for scaling: state.
The server end of REST is stateless, which means that the server doesn't have to store anything across requests. This means that there doesn't have to be (much) communication between servers, making it horizontally scalable.
Of course, there's a small bonus in the R (representational) in that a load balancer can easily route the request to the right server if you have nice URLs, and GET could go to a slave while POSTs go to masters.
I think what Tom said is very accurate, however another problem with scalability is the barrier to change upon scaling. So, one of the biggest tenants of REST as it was intended is HyperMedia. Basically, the server will own the paths and pass them to the client at runtime. This allows you to change your code without breaking existing clients. However, you will find most implementations of REST to simply be RPC hiding behind the guise of REST...which is not scalable.
"Scalable" or "web scale" is one of the most abused terms when it comes to the web, the cloud and REST, and mainly used to convince management to get their support for moving their development team on board the REST train.
It is a buzzword that holds no value. If you search the web for "REST scalability" you'll find a lot of people parroting each other without any concrete evidence.
A REST service is exactly equally scalable as a service exposed over a SOAP interface. Both are just HTTP interfaces to an application service. How well this service actually scales depends entirely on how this service was actually implemented. It's possible to write a service that cannot scale as all in both REST and SOAP.
Yes, you can do things with SOAP that makes it scale worse, like rely on state and sessions. SOAP out of the box does not do this. This requires you to use a smarter load balancer, which you want anyway if you're really concerned with whatever form of scaling.
One thing that REST allows that SOAP doesn't, and that some other answers here address, is caching cacheable responses through an HTTP caching proxy or at the client side. This may make a REST service somewhat more lightly loaded than a SOAP service when a lot of operations' responses are cacheable. All this means is that fewer requests end up in your service.
The main reason behind saying a rest application is scalable is, Its built upon a HTTP protocol. Because HTTP is stateless. Stateless means it wont share anything between other request. So any request can go to any Server in a load balanced cluster. There is nothing forcing this user request go to this server. We can overcome this by using token.
Because of this statelessness,All REST application are very easy to scale. But if you want get high throughput(number of request capable in one second) in each server, then you should optimize blocking things from the application. Follow the following tips
Make each REST resource is a small entity. Don't read data from join of many tables.
Read data from near by databases
Use caches (Redis) instead of databases(You can save DISK I/O)
Always keep data sources as much as near by because these blocks will make server resources (CPU) ideal and it no other request can use that resource while it is ideal.
A reason (perhaps not the reason) is that RESTful services are sessionless. This means you can easily use a load balancer to direct requests to various web servers without having to replicate session state among all of your web servers or making sure all requests from a single session go to the same web server.

Web service provider routing

I am looking to implement a service (web/windows, .net) that maintains a list of available services and can provide an endpoint based on the nature or type of request. The requester can then pass the actual work request to the provided endpoint. The actual work requests can contain very large chunks (from 10MB up to and possibly exceeding a GB) of data.
WCF routing services sounds like a perfect fit, but turns out not to be because the it requires the actual work request to pass through it, creating a bottleneck at the routing service (the whole point is to get a system to be able to scale out). If I had smaller messages, WCF routing would be a no brainer.
Is there anything out there that fits the bill? Preferably .NET/windows based?
Do you mean because the requests block for work?
Do could use OneWay OperationContract to create async services so as to not block the request pool.
[ServiceContract]
interface IMyContract
{
[OperationContract(IsOneWay = true)]
void DoWork()
}
Update
I think understand your question better now, you are looking to distribute load to different servers to avoid request bottle necks due to heavy traffic load (preferably distributed based on content).
I'd say that MVC Routing is indeed ideal for this. One of the features that you can leverage is the fall over functionality. You can actually define multiple backup endpoints, and in the case where one fails, it will automatically move over to the next. There's a good introduction to how this works here.
There's also a good article here that talks about load balancing with WCF using the same principles. It provides 2 solutions for a round robin filter implementation that allows you to load balance the service requests (even though at the begin he says his general answer to whether it supports load balancing is no for implementation reasons).
If you are worried about all requests routing via the one server and still becoming a bottle neck, then think of web load balancers. It's the same scenario. Sitting in the middle forwarding packets doesn't require much work, and they have no problem handling huge volumes of traffic. I don't think this is an issue IMO.