Canary (gradual) rollouts of VirtualService changes - Istio

We have a VirtualService per microservice in our mesh - services A, B, and C each have a VirtualService that describes how to route traffic towards them.
When making changes to those VirtualServices, we'd like them to be applied gradually - apply the new VS definition to e.g. 1% of traffic, then 10%, and so on.
How can we achieve that?
E.g. we'd like to change the timeout or retries settings under https://istio.io/latest/docs/reference/config/networking/virtual-service/#HTTPRoute and roll them out canary-style.
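For concreteness, a minimal sketch of the kind of change we mean (the service name and values are hypothetical) - the timeout and retries fields below are what we'd want applied to 1% of traffic first:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: service-a
spec:
  hosts:
  - service-a
  http:
  - route:
    - destination:
        host: service-a
    timeout: 5s          # changed setting we want to canary-rollout
    retries:
      attempts: 3        # changed setting we want to canary-rollout
      perTryTimeout: 2s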

Related

Is there any way to overwrite the non-default listener rules of an ALB while doing blue-green canary deployment using CodeDeploy with Fargate?

I am using ECS blue-green canary deployment, with the canary configuration "shift 10% of traffic, wait 5 minutes, then shift to 100%". This configuration is working fine, but I have a specific requirement:
Use case: besides the default rule on port 80, I have added two more listener rules on port 80 (say rule-1 and rule-2). rule-1 and rule-2 have multiple domains set up with them, so that traffic from those domains can be routed to different target groups as we want at any particular deployment phase.
So before deployment, the listener state looks like this (assuming tg1 is blue and tg2 is green):
default rule: 100% tg1
rule-1: 100% tg1
rule-2: 100% tg1
After the AfterAllowTestTraffic stage, the listener state looks like this (this is done by CodeDeploy internally):
default rule: 90% tg1, 10% tg2
rule-1: 90% tg1, 10% tg2
rule-2: 90% tg1, 10% tg2
But my requirement is to use the AfterAllowTestTraffic hook (or any other hook) to overwrite the above traffic rules as below, using a Lambda function with the AWS SDK's elbv2.modifyRule():
default rule: 90% tg1, 10% tg2
rule-1: 100% tg1 (blue)
rule-2: 100% tg2 (green) - this will let me route some specific domains to the green instances until the traffic wait time (2 days) elapses.
I tried the above, but CodeDeploy somehow waits for the AfterAllowTestTraffic hook to complete and then overwrites the listeners again with 90%/10%.
I tried this with other hooks like BeforeAllowTraffic, but CodeDeploy still overwrites the listeners with 90%/10% afterwards.
Ideally the listener should be shifted to 90%/10% first, and only then should the Lambda attached to the BeforeAllowTraffic hook run; but somehow the first traffic shift happens after BeforeAllowTraffic (or after whichever hook tries to overwrite the listener). Why does CodeDeploy overwrite the listener rules again if you modify them from a hook during a canary deployment?
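For reference, this is roughly how such a hook is wired up in the ECS AppSpec (a sketch with placeholder ARNs; OverrideListenerRules is a hypothetical Lambda that would call elbv2.modifyRule() as described above):

version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: "arn:aws:ecs:region:account:task-definition/app:1"
        LoadBalancerInfo:
          ContainerName: "app"
          ContainerPort: 80
Hooks:
  # CodeDeploy invokes this Lambda during the AfterAllowTestTraffic stage
  - AfterAllowTestTraffic: "arn:aws:lambda:region:account:function:OverrideListenerRules"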

What is Knative's "mesh" gateway?

I see that for every Knative service, two VirtualService objects are created: ksvc-ingress, which has the knative-serving/knative-ingress-gateway and knative-serving/knative-local-gateway gateways configured, and ksvc-mesh, which has mesh as its gateway.
I can see the knative-serving/* gateways using kubectl, but I am unable to find the mesh gateway object in any namespace. I would like to understand whether mesh denotes some special object, or is an Istio keyword representing something else.
The mesh name is a keyword, as you guessed. That keyword represents the East-West traffic between Pods in the Kubernetes cluster, as managed by the Istio sidecar. You can think of those VirtualServices as being programmed onto each sidecar to do the routing and traffic splitting next to the request sender, rather than needing to route to a central service / gateway.
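For illustration, a minimal VirtualService attached to the mesh gateway might look like this (the names and namespace are hypothetical) - note that mesh is a reserved word, not an object you can kubectl get:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ksvc-mesh-example
spec:
  gateways:
  - mesh                  # reserved word: applies to every sidecar in the mesh
  hosts:
  - myservice.mynamespace.svc.cluster.local
  http:
  - route:
    - destination:
        host: myservice-v1.mynamespace.svc.cluster.local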
As you noticed, Knative uses Istio as a service mesh.
In the Istio context, mesh is not an object (or resource) like, for example, a Service. The Istio About page explains what a service mesh is:
A service mesh is a dedicated infrastructure layer that you can add to your applications. It allows you to transparently add capabilities like observability, traffic management, and security, without adding them to your own code. The term “service mesh” describes both the type of software you use to implement this pattern, and the security or network domain that is created when you use that software.
So mesh is a term that encapsulates all the Istio objects (istio-proxy containers, VirtualServices, Ingress Gateways, etc.) that work together to allow traffic management inside the cluster. As the Istio docs put it:
A Gateway is a load balancer operating at the edge of the mesh receiving incoming or outgoing HTTP/TCP connections.

Application HA in k8s

I'm trying to make my app HA, so I created the following:
3 replicas
a PDB
liveness and readiness probes
pod anti-affinity
Is there anything else that I missed?
This is the anti-affinity config:
...
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app.kubernetes.io/name: ten
      topologyKey: kubernetes.io/hostname
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: tan
        topologyKey: topology.kubernetes.io/zone
Highly available... I think these are the parameters for an application to be HA:
Never launch your app directly from a Pod - it won't survive a node crash. Even for single-pod applications, use a ReplicaSet or Deployment object, as they manage pods across the whole cluster and maintain a specified number of instances (even if it's only one).
Use affinity configuration with custom rules to spread your pods based on your environment's architecture. Running a workload in multiple instances spread across multiple nodes provides a second level of resilience to the app.
Define a livenessProbe for each container. Use a proper method: avoid ExecAction when your container can process HTTP requests. Remember to set a proper initialDelaySeconds parameter to give your app some time to initialize (especially for JVM-based apps like Spring Boot - they are slow to start their HTTP endpoints).
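For example, a probe configuration along those lines might look like this (the paths, port, and delays are hypothetical and should be tuned to your app):

livenessProbe:
  httpGet:                   # prefer HTTP probes over ExecAction where possible
    path: /healthz
    port: 8080
  initialDelaySeconds: 60    # give a slow-starting (e.g. JVM) app time to come up
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 30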
You seem to be following all these points, so you should be good.
However, if feasible, I would recommend trying to deploy the app on multiple clusters, or across multiple data centres, running in active-active mode. It can help add more nines to your availability.
Resource limit
You also need to add resource requests and limits to your workloads. This is necessary; otherwise CronJobs or other non-critical workloads may starve your business-critical workloads of resources.
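A minimal sketch of such limits on a container (the numbers are placeholders to adjust for your app):

resources:
  requests:
    cpu: 250m        # guaranteed share, used for scheduling decisions
    memory: 256Mi
  limits:
    cpu: 500m        # hard ceiling so noisy neighbours can't starve other workloads
    memory: 512Mi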
HPA - POD autoscaling
There is also a chance that all three Pods get killed by readiness & liveness probe failures when the workload is under heavy traffic and the application cannot respond to the probes in time. For that case I would suggest implementing an HPA as well.
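A sketch of such an HPA, assuming the Deployment is named ten as in the labels above (the CPU threshold is a placeholder):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ten
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ten
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # scale out before probes start failing under load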
HA can be achieved by using multiple replicas; Kubernetes provides this feature exactly for HA. Further, the Service object in Kubernetes load-balances traffic to one of the available replicas based on the liveness and readiness probes, which are responsible for identifying the pod as healthy and ready to receive requests, respectively.
Please refer to https://kubernetes.io/docs/concepts/services-networking/service/ and https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/.

How to expose a TCP service in Kubernetes only for certain IP addresses?

Nginx ingress provides a way to expose a TCP or UDP service: all you need is a public NLB.
However, this way the TCP service is exposed publicly: an NLB does not support security groups or ACLs, and nginx-ingress has no way to filter traffic while proxying TCP or UDP.
The only solution that comes to my mind is an internal load balancer plus a separate non-k8s instance with haproxy or iptables, where I'd actually enforce restrictions based on source IP - and then forward/proxy requests to the internal NLB.
Maybe there are other ways to solve this?
Do not use nginx-ingress for this. To get the real client IP inside nginx-ingress you have to set controller.service.externalTrafficPolicy: Local, which in turn changes the way the nginx-ingress Service is exposed - making it local to the nodes. See https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/#preserving-the-client-source-ip. This in turn causes your nginx-ingress LoadBalancer to have unhealthy hosts, which will create noise in your monitoring (as opposed to NodePort, where every node exposes the same port and reports healthy). Unless you run nginx-ingress as a DaemonSet or use other hacks - e.g. limiting which nodes are added as backends (mind scheduling and scaling), or moving nginx-ingress to a separate set of nodes/subnet - IMO each of these is a lot of headache for such a simple problem. More on this problem: https://elsesiy.com/blog/kubernetes-client-source-ip-dilemma
Use a plain Service of type: LoadBalancer (classic ELB), which supports:
Source ranges: https://aws.amazon.com/premiumsupport/knowledge-center/eks-cidr-ip-address-loadbalancer/
The service.beta.kubernetes.io/aws-load-balancer-extra-security-groups annotation, in case you want to manage the source ranges from the outside.
In this case your traffic flows like World -> ELB -> NodePort -> Service -> Pod, without an Ingress.
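A sketch of such a Service (the names, port, CIDR, and security group ID are placeholders):

apiVersion: v1
kind: Service
metadata:
  name: my-tcp-service
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-extra-security-groups: "sg-0123456789abcdef0"
spec:
  type: LoadBalancer
  loadBalancerSourceRanges:    # only these CIDRs may reach the service
  - 203.0.113.0/24
  ports:
  - port: 27017
    targetPort: 27017
    protocol: TCP
  selector:
    app: my-tcp-app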
You can use the whitelist-source-range annotation for that. We've been using it successfully for a few use cases and it does the job well.
EDIT: I spoke too soon. Rereading your question and understanding your exact use case brought me to this issue, which clearly states that these services cannot be whitelisted, and suggests solving this at the firewall level.

VPN between two worker nodes

I have three nodes in my cluster: the master and two workers. I want to know if it's possible with Istio to redirect all the traffic coming from one worker node directly to the other worker node (but not the Kubernetes traffic).
Thanks for the help
Warok
Edit
Apparently, it's possible to route the traffic of one specific user to a specific version: https://istio.io/docs/tasks/traffic-management/request-routing/#route-based-on-user-identity. But the question is still open.
Edit 2
Assume that my nodes are named node1 and node2; is the following YAML file right?
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: node1
  ...
spec:
  hosts:
  - node1
  tcp:
  - match:
    - port: 27017   # for now, I will just specify this port
    route:
    - destination:
        host: node2
I want to know if it's possible with Istio to redirect all the traffic coming from one worker node directly to the other worker node (but not the Kubernetes traffic).
Quick answer, No.
Istio works as a sidecar container that is injected into your pods. You can read more at What is Istio?:
Istio lets you connect, secure, control, and observe services.
...
It is also a platform, including APIs that let it integrate into any logging platform, or telemetry or policy system. Istio’s diverse feature set lets you successfully, and efficiently, run a distributed microservice architecture, and provides a uniform way to secure, connect, and monitor microservices.
...
You add Istio support to services by deploying a special sidecar proxy throughout your environment that intercepts all network communication between microservices
I also recommend reading What is Istio? The Kubernetes service mesh explained.
It's also important to know why you would want to redirect traffic from one node to the other; without knowing that, I cannot advise any solution.