I am using Amazon cloud servers (AWS) to create Mule server nodes. The issue with AWS is that it doesn't support multicast, but MuleSoft requires that all nodes be on the same network with multicast enabled for clustering.
Amazon FAQ:
https://aws.amazon.com/vpc/faqs/
Q. Does Amazon VPC support multicast or broadcast?
A: No.
A Mule cluster doesn't show a proper heartbeat without multicast enabled; the mule_ee.log file should show:
Cluster OK
Members [2] {
Member [<IP-Node1>]:5701 this
Member [<IP-Node2>]:5701
}
but my cluster shows:
Members [1] {
Member [<IP-Node1>]:5701 this
}
which is wrong according to MuleSoft's standards. I created a sample poll scheduler application and deployed it to the Mule cluster; it runs on both nodes because the cluster isn't being coordinated properly.
But my organization needs to continue with AWS for the server configuration.
Question
1) Is there any approach other than a Mule cluster by which I can use both Mule server nodes in an HA (active-active) configuration?
2) Is it possible to keep one server up and running (active) and the other in passive mode, instead of Mule HA (active-active) mode?
3) CloudHub and Anypoint MQ are deployed on AWS; how does MuleSoft handle the multicast limitation there?
According to the MuleSoft support team, they don't advise managing Mule HA in AWS, regardless of whether it is managed with ARM or MMC:
The Mule instances communicate with each other to guarantee HA and to ensure that a single request is not processed more than once, but that does not work on AWS because latency may cause the instances to disconnect from one another. We need to have the servers on-prem to have the HA model.
Multicast and unicast are just used so that the nodes can be discovered automatically, as explained further in the documentation.
Mule cluster config
AWS known limitation: here
Related
I'd like to have my Google Cloud Run services privately communicate with one another over non-HTTP and/or without having to add bearer authentication in my code.
I'm aware of this documentation from Google which describes how you can do authenticated access between services, although it's obviously only for HTTP.
I think I have a general idea of what's necessary:
Create a custom VPC for my project
Enable the Serverless VPC Connector
What I'm not totally clear on is:
Is any of this necessary? Can Cloud Run services within the same project already see each other?
How do services address one another after this?
Do I gain the ability to use simpler by-convention DNS names? For example, could I have each service in Cloud Run manifest on my VPC as a single first level DNS name like apione and apitwo rather than a larger DNS name that I'd then have to hint in through my deployments?
If not, is there any kind of mechanism for services to discover names?
If I put my managed Cloud SQL postgres database on this network, can I control its DNS name?
Finally, are there any other gotchas I might want to be aware of? You can assume my use case is very simple: two or more long-lived services on Cloud Run doing non-HTTP TCP/UDP communication.
I also found a potentially related Google Cloud Run feature request that is worth upvoting if this isn't currently possible.
Cloud Run services are only reachable through HTTP requests; you can't use other network protocols (SSH to log into instances, for example, or raw TCP/UDP communication).
However, Cloud Run can initiate these kinds of connections to external services (for instance, Compute Engine instances deployed in your VPC, thanks to the serverless VPC connector).
The serverless VPC connector lets you bridge the Google-managed environment (where the Cloud Run, Cloud Functions, and App Engine instances live) and your project's VPC, where your own resources live (Compute Engine instances, GKE node pools, ...).
Thus you can have a Cloud Run service reach a Kubernetes pod on GKE through a TCP connection, if that's your requirement.
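As a rough illustration, here is a minimal Java sketch of the Cloud Run side: the service just opens an outbound TCP socket to a private address in your VPC (for example, an internal GKE Service), and the serverless VPC connector routes the traffic. The host, port, and payload below are hypothetical placeholders.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Minimal sketch of a Cloud Run service acting as a TCP *client* through the
// serverless VPC connector. The internal host/port are placeholders for
// something running in your VPC (e.g. a GKE pod behind an internal Service).
public class VpcTcpClient {
    public static void main(String[] args) throws IOException {
        String internalHost = "10.8.0.15"; // hypothetical private IP reachable via the connector
        int internalPort = 9000;           // hypothetical port of the target service

        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(internalHost, internalPort), 5_000);
            OutputStream out = socket.getOutputStream();
            out.write("ping\n".getBytes(StandardCharsets.UTF_8));
            out.flush();
            System.out.println("Connected to " + internalHost + ":" + internalPort);
        }
    }
}
```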
As for service discovery, it doesn't exist yet, but Google is working actively on it, and Ahmet (Google Cloud Developer Advocate for Cloud Run) recently released a tool for it. But nothing is really built in.
The challenge is to create a Kafka producer that connects, from outside the cluster, to a Kafka cluster that lives within Kubernetes. We have several RDBMS databases that sit on premises, and we want to stream data directly to Kafka running in Kubernetes on AWS. We have tried a few things and deployed the Confluent Open Source Platform, but nothing has worked so far. Does anyone have a clear answer to this problem?
You might have a look at deploying Kafka Connect inside of Kubernetes. Since you want to replicate data from various RDBMS databases, you need to set up source connectors:
A source connector ingests entire databases and streams table updates to Kafka topics. It can also collect metrics from all of your application servers into Kafka topics, making the data available for stream processing with low latency.
Depending on your source databases, you'd have to configure the corresponding connectors.
If you are not familiar with Kafka Connect, this article might be quite helpful as it explains the key concepts.
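As a rough sketch of what configuring such a connector can look like, here is a hypothetical registration of a Confluent JDBC source connector via the Kafka Connect REST API. The Connect URL, database connection details, column name, and topic prefix are all placeholders for your own environment.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical example: registering a JDBC source connector with the
// Kafka Connect REST API. Hosts, credentials, and table details are placeholders.
public class RegisterJdbcSourceConnector {
    public static void main(String[] args) throws Exception {
        String connectorJson = """
            {
              "name": "postgres-orders-source",
              "config": {
                "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
                "connection.url": "jdbc:postgresql://onprem-db.example.com:5432/orders",
                "connection.user": "replicator",
                "connection.password": "secret",
                "mode": "incrementing",
                "incrementing.column.name": "id",
                "topic.prefix": "onprem-",
                "tasks.max": "1"
              }
            }
            """;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://kafka-connect.example.com:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(connectorJson))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```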
Kafka clients need to connect to a specific node to produce or consume messages.
The Kafka protocol lets a client connect to any node to get metadata; the client then connects to the specific node that has been elected leader of the partition it wants to produce to or consume from.
Each Kafka pod has to be individually accessible, so you need an L4 load balancer per pod. The advertised listener configuration can be set in the Kafka config to advertise different IPs/hostnames to internal and external clients. Configure the EXTERNAL advertised listener to use the load balancer and the INTERNAL one to use the pod IP. The ports have to be different for the internal and external listeners.
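A minimal sketch of an external Java producer under these assumptions: each broker pod sits behind its own load balancer, and the EXTERNAL advertised listeners return those load-balancer addresses. The host names and topic are placeholders.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

// Sketch of a producer running outside Kubernetes, assuming per-pod
// L4 load balancers and matching EXTERNAL advertised listeners.
public class ExternalProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Any reachable broker works for the initial metadata request; the client
        // is then redirected to whichever advertised address leads the partition.
        props.put("bootstrap.servers",
                "kafka-0-lb.example.com:9094,kafka-1-lb.example.com:9094,kafka-2-lb.example.com:9094");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("sensor-events", "device-42", "{\"temp\": 21.5}"));
        }
    }
}
```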
Check out https://strimzi.io/, https://bitnami.com/stack/kafka, https://github.com/confluentinc/cp-helm-charts
Update:
I was trying out installing Kafka in k8s running on AWS EC2. Between confluent-operator, bitnami-kafka, and strimzi, only strimzi configured EXTERNAL in the Kafka settings to point at the load balancer.
bitnami-kafka used a headless service, which is not useful outside the k8s network.
confluent-operator configures the node's IP, which makes it accessible outside k8s, but only to clients that can reach the EC2 instance via its private IP.
I am working on a project where lots of machines/sensors will be sending messages directly to a Kafka/NiFi cluster. These machines/sensors will be pushing messages from the public internet, not from the corporate network. We are using a Hortonworks distribution on the AWS cloud.
My question is: what is the best architectural practice for setting up a Kafka/NiFi cluster for such use cases? I don't want to put my cluster in a public subnet just to receive messages from the public internet.
Can you please help me with this?
Obviously you shouldn't expose your Kafka to the world, so "sensor data directly to Kafka" is the wrong approach, IMO. At least not without some SSL channel.
You could allow a specific subnet of your external devices to reach the internal subnet, assuming you know that range. However, I think your better option here is to use either MiNiFi or StreamSets SDC, which are event collectors that sit on the sensors and can encrypt traffic to an exposed NiFi or StreamSets cluster, which can then forward events to the internal Kafka cluster. You apparently already have NiFi, and MiNiFi was built for exactly this purpose.
Another option could be the Kafka REST proxy, but you'll still need to set up authentication/security layers around it.
Use AWS IoT to receive the devices' communication; this option gives you a security layer and isolates your HDF sandbox from the internet.
AWS IoT Core provides mutual authentication and encryption at all points of connection, so that data is never exchanged between devices and AWS IoT Core without a proven identity.
Then import the information with a NiFi processor.
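A minimal sketch of the device side, assuming the AWS IoT Device SDK for Java and an X.509 device certificate packaged in a JKS keystore; the endpoint, keystore path, passwords, and topic are placeholders.

```java
import java.io.FileInputStream;
import java.security.KeyStore;
import com.amazonaws.services.iot.client.AWSIotMqttClient;
import com.amazonaws.services.iot.client.AWSIotQos;

// Hypothetical sensor-side publisher: the device authenticates to AWS IoT Core
// with its certificate and publishes over MQTT/TLS; NiFi can then pull the data in.
public class SensorPublisher {
    public static void main(String[] args) throws Exception {
        String endpoint = "a1b2c3d4e5f6-ats.iot.us-east-1.amazonaws.com"; // your IoT Core endpoint
        String clientId = "sensor-001";

        // Device certificate and private key assumed to be packaged in this keystore.
        KeyStore keyStore = KeyStore.getInstance("JKS");
        try (FileInputStream in = new FileInputStream("/etc/sensor/device.jks")) {
            keyStore.load(in, "keystorePassword".toCharArray());
        }

        AWSIotMqttClient client = new AWSIotMqttClient(endpoint, clientId, keyStore, "keyPassword");
        client.connect();
        client.publish("factory/line1/temperature", AWSIotQos.QOS1, "{\"celsius\": 72.3}");
        client.disconnect();
    }
}
```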
I am new to Kafka and want to deploy a production Kafka cluster for IoT. We will be receiving messages from Raspberry Pi devices over the internet to our Kafka cluster, which we will be hosting on AWS.
Now the concern: since we need to open the Kafka port to the public internet, we are exposing the system to threats by opening a port to the outside world.
Please let me know what can be done to prevent malicious access through the Kafka port over the internet.
Pardon me if the question is not clear; do let me know if rephrasing is needed.
Consider using a REST Proxy in front of your Kafka brokers (such as the one from Confluent). Then you can secure your Kafka cluster just as you would secure any REST API exposed to the public internet. This architecture is proven in production for several very large IoT use cases.
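For illustration, a hypothetical device-side publish through the Confluent REST Proxy looks roughly like this; the proxy URL, topic, and payload are placeholders, and in practice the request would go through whatever TLS/authentication layer you put in front of the proxy.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch of a device posting a record via the Kafka REST Proxy instead of
// speaking the Kafka protocol directly over the open internet.
public class RestProxyPublisher {
    public static void main(String[] args) throws Exception {
        String body = "{\"records\":[{\"value\":{\"deviceId\":\"pi-17\",\"temperature\":21.4}}]}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://kafka-rest.example.com/topics/iot-telemetry"))
                .header("Content-Type", "application/vnd.kafka.json.v2+json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```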
There are two approaches that are most effective for Kafka security:
Implement SSL encryption for Kafka.
Implement authentication using SASL.
You can follow this guide: http://kafka.apache.org/documentation.html#security_sasl
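As a sketch, a producer configured this way might look like the following, assuming the brokers expose a SASL_SSL listener with the SCRAM-SHA-512 mechanism; the broker address, credentials, and truststore details are placeholders.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

// Sketch of a producer that encrypts traffic with SSL and authenticates via SASL/SCRAM.
public class SecuredProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka.example.com:9094");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // Encrypt traffic and authenticate the client with SASL/SCRAM credentials.
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "SCRAM-SHA-512");
        props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.scram.ScramLoginModule required "
                        + "username=\"pi-device\" password=\"device-secret\";");
        props.put("ssl.truststore.location", "/etc/kafka/client.truststore.jks");
        props.put("ssl.truststore.password", "truststore-secret");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("iot-telemetry", "pi-17", "{\"temperature\": 21.4}"));
        }
    }
}
```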
I want to make a cluster of Data Services Servers (DSS) and use an Enterprise Service Bus (ESB) as a load balancer. In this deployment, what is the purpose of having a manager DSS in the cluster, and if there is a manager, is it a single point of failure?
These are the references which I used for load balancing and DSS clustering:
Dynamic load balancing between 3 nodes
How to install WSO2 Carbon cluster management feature?
The dynamic load balancing mechanism in WSO2 ESB discovers the DSS members in an application group using a group communication framework and shares the load at runtime.
The load balancer is not bound or coupled to any cluster manager; it simply distributes the load among the nodes in the applicationDomain.
So, at runtime, the cluster manager doesn't create a single point of failure.
If you want, you can set up a DSS cluster even without a cluster manager and distribute the load among the nodes via the ESB.
The cluster manager is just a component installed to manage your cluster.
This is an extension to Prabath's answer.
DSS can be configured to work in a cluster, so that all DSS nodes act as members of a single cluster. This facilitates sharing sessions among the nodes.
Alternatively, you can have all DSS nodes running in isolation (using the same configuration), fronted by a load balancer (LB). Unlike the previous approach, this method does not support sharing sessions between DSS nodes and thus supports only stateless services.
WSO2 ESB can act as an LB, but having a single LB instance will make it a SPoF. The LB can be configured to run in a cluster as well.
I don't know what's behind the decision of using an ESB instead of an ELB for LB, but it's up to you which one to use.
The manager is not a single point of failure; it's just a way to manage the entire cluster from a single management console (with limitations), and it can be configured to be a worker at the same time.
Regarding the LB layer, you can use keepalived to avoid having a SPoF in the ESB acting as an LB, the same way it's done for WSO2 ELBs.
Take a look at Failover for ELB with keepalived.