WSO2 APIM Clustering Configuration

I am using WSO2 APIM 1.10.0 in a single-server deployment and would like to move to a clustered one. Looking at this documentation I found a lot of information, however something is bothering me: do I really have to do all of it?
I mean, I don't want to split all my workers into multiple instances. All I want is to configure two full setups (key manager + publisher + store + gateway), each on its own host, and make sure I can put a load balancer in front of them.
The requirements are simple: I would like to share the load across both of them and guarantee better availability in case one of the hosts goes down. Is it a MUST to break down the whole installation on both nodes, so that I have to start each component independently with port offsets configured?
I could see that in version 2.0.0 a lot has been simplified. Is there any way to achieve the same on 1.10.0?
Regards

Splitting into profiles is not mandatory. It is designed this way so that API Manager can be scaled based on the TPS. If you have a low TPS count and prefer a two-node HA setup, you can do the following:
Cluster the two nodes using wka, aws, etc.
Use dep-sync to share API artifacts between the two nodes.
Use one node as the Publisher, and route all Publisher traffic to that single node. This avoids SVN conflicts.
You can serve API requests from both nodes.
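
To make the routing rule above concrete, here is a minimal sketch in Python of what the load balancer has to decide. It is not WSO2 or load balancer configuration, just the logic: the host names are made up, and the `/publisher` prefix is assumed to be the Publisher context path.

```python
from itertools import count

# Hypothetical host names for the two all-in-one API Manager nodes.
NODES = ["apim-node1.example.com", "apim-node2.example.com"]
PUBLISHER_NODE = NODES[0]  # all Publisher traffic is pinned to this node

_counter = count()


def pick_backend(path, healthy):
    """Return the node that should serve `path`, or None if none is usable."""
    if path.startswith("/publisher"):
        # Publisher requests never go to the second node, to avoid
        # conflicting artifact commits.
        return PUBLISHER_NODE if PUBLISHER_NODE in healthy else None
    candidates = [node for node in NODES if node in healthy]
    if not candidates:
        return None
    # Plain round-robin over whichever nodes are currently healthy.
    return candidates[next(_counter) % len(candidates)]
```

With both nodes healthy, API calls alternate between them; if the Publisher node is down, API calls still go to the surviving node, but publishing is unavailable until it comes back.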

You do not always have to use the deployment pattern described in the documentation that you pointed to. There are various other deployment patterns that you can use, depending on your scalability needs and requirements.
Please refer to the following documentation: [1] for the different deployment patterns you can use for WSO2 API Manager, and [2] for more information on worker/manager separation and load balancing.
[1] https://docs.wso2.com/display/CLUSTER44x/API+Manager+Deployment+Patterns
[2] https://docs.wso2.com/display/CLUSTER44x/Separating+the+Worker+and+Manager+Nodes

Related

Ideal HA setup for WSO2 API Manager

I have been trying to set up an active-active deployment of WSO2 API Manager. This is described here:
https://docs.wso2.com/display/AM210/Configuring+an+Active-Active+Deployment#Linux-Mac
A few observations:
The setup appears to be on two different nodes, with all components deployed on each node.
The setup indicates that, on both nodes, the Publisher should be pointed to one of the two nodes. If that is the case, let's say that the node-1 (publisher) node goes down: how will the second active instance help?
It recommends using NFS for content synchronization. NFS becomes a single point of failure in that case. Why is content synchronization needed, though? Is it only for advanced Siddhi query based throttling policies?
Finally, if I set up two independent, all-components-in-one API Manager instances with a shared database and content synchronization using rsync/unison, but no throttling data publishing, what are the downsides?
Is this kind of setup a fit for active-passive?
Thanks

If you use rsync or any other one-way deployment synchronization mechanism, it becomes a single point of failure. In most use cases, API publishing happens at development time, and this is actually a limitation.
That's why we can use NFS or another file share mechanism. You can point to localhost and write the Synapse file to the file system; it is then shared between the two nodes. When you publish an API, a Synapse artifact is created and deployed on the gateway node, which in your case is one of the nodes. You can find a sample file in the APIM_HOME/repository/deployment/server/synapse-configs/default/api location.
If you disable throttling data publishing, i.e. advanced throttling, your APIs can be accessed without any limitation. There is simply no limit. But burst control and backend throttling will still apply.
Yes, this fits active-passive (A-P). You can control active-active (A-A) or A-P from the load balancer.
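
As a rough illustration of that last point, the difference between the two modes can be reduced to which healthy nodes the load balancer is allowed to send traffic to. This is a sketch in Python, not configuration for any particular load balancer; the function and mode names are made up.

```python
def eligible_backends(mode, nodes, healthy):
    """Return the nodes that may receive traffic, in priority order.

    `nodes` is the configured list (first entry = preferred node) and
    `healthy` is the set of nodes currently passing health checks.
    """
    up = [node for node in nodes if node in healthy]
    if mode == "active-active":
        # Every healthy node takes traffic.
        return up
    if mode == "active-passive":
        # Only the highest-priority healthy node takes traffic;
        # the other one is a standby.
        return up[:1]
    raise ValueError(f"unknown mode: {mode!r}")
```

In active-passive mode the standby only starts receiving requests once the preferred node drops out of the healthy set.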

Minimum clustering of API Manager with Internal Store

I am trying to set up a clustered environment for WSO2 API Manager. In the environment I need, there is no need for an external store. I am looking to start with the fewest nodes and JVMs that can still scale with a growing number of API requests.
I have looked at the WSO2 documentation,
Clustering WSO2 API Manager, and specifically the "Store and Publisher components in a single server node" model.
Some questions on this deployment model:
Where is the Gateway Manager deployed?
I understand the publisher and store are on the same server node. Can they be run in the same JVM? If so, would you use the default profile that also starts up KM and Gateway, or something else?
(Apologies, but I can't post the image due to my low reputation value. I would have thought the image of the model would have helped.)

Yes - API Store and Publisher will be running in the same JVM. As there is no profile for Store & Publisher (see [1] for available profiles), we need to start API Manager in the default profile. And yes it will start KM & Gateway components as well. But you can block (not expose) gateway ports. And regarding gateway manager, I guess one gateway node can act as both manager and worker in this deployment pattern.
[1] https://docs.wso2.com/display/AM180/Product+Profiles

As per the design, the Publisher is a subset of the Store. So if you start with the api-store profile, you will get the Publisher as well. In this case you can start the server with the following option:
-Dprofile=api-store

Microservices service registry registration and discovery

Little domain presentation
I currently have two microservices:
User - managing CRUD on users
Billings - managing CRUD on billings, with a "reference" on a user concerned by the billing
Explanation
When a billing is requested over HTTP, I need to send the full billing object with the user loaded. In this specific case, I really need this.
At first I looked around, and it seemed like a good idea to use message queuing for asynchronicity, so that the billing service can send on a queue:
"who's the user with the id 123456 ? I need to load it"
So my two services could exchange, without really knowing each other, or without knowing the "location" of each other.
Problems
My first question is: what is the aim of using a service registry in that case? Message queuing can give us the information without knowing anything at all about the user service's location, right?
When do we need to use a service registry?
In the case of the Aggregator pattern, with a RESTful API, we can navigate through HATEOAS links. In the case of the Proxy pattern, maybe? When the microservices are fronted by another service?
Suppose now that we use the Proxy pattern, with a "front-end service". In this case, it makes sense to me to use a service registry. But does it mean that the front-end service knows the names of the user service and the billing service in the service registry? Example:
Service User registers as "UserServiceOfHell:http://80.80.80.80/v1/"
on ZooKeeper
Service Billing registers as "BillingService:http://90.90.90.90/v4.3/"
The front-end service needs to send requests to the user and billing services, which implies that it needs to know that the user service is "UserServiceOfHell". Is this defined at the beginning of the project?
Last question: can we use multiple microservice patterns in one microservices architecture, or is this bad practice?
NB : Everything I ask is based on http://blog.arungupta.me/microservice-design-patterns/

A lot of good questions!
First of all, I want to answer your last question: multiple patterns are OK when you know what you're doing. It's fine to mix asynchronous queues, HTTP calls and even binary RPC; it depends on consistency, availability and performance requirements. Sometimes you can see a good fit for simple PubSub and sometimes you need a distributed lock. Microservices are different.
Your example is simple: two microservices need to exchange some information. You chose an asynchronous queue. Fine, in this case they don't really need to know about each other. Queues don't require any discovery between producers and consumers.
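
A minimal sketch of that exchange in Python, using in-process queues as a stand-in for a real message broker. The user data, field names and function names are invented for illustration; the point is only that the billing side addresses a queue, never the user service itself.

```python
import queue
import threading

# Hypothetical user store owned by the user service.
USERS = {123456: {"id": 123456, "name": "Alice"}}

request_queue = queue.Queue()  # stands in for the broker's request queue


def user_service():
    """Consume lookup requests and reply on the queue named in each message."""
    while True:
        msg = request_queue.get()
        if msg is None:  # shutdown signal
            break
        msg["reply_to"].put({
            "correlation_id": msg["correlation_id"],
            "user": USERS.get(msg["user_id"]),
        })


def load_billing(billing, timeout=1.0):
    """Billing side: ask for the user and attach it to the billing object."""
    replies = queue.Queue()  # private reply queue for this request
    request_queue.put({
        "correlation_id": billing["id"],
        "user_id": billing["user_id"],
        "reply_to": replies,
    })
    reply = replies.get(timeout=timeout)
    return {**billing, "user": reply["user"]}


worker = threading.Thread(target=user_service, daemon=True)
worker.start()
result = load_billing({"id": "b-1", "user_id": 123456, "amount": 42})
request_queue.put(None)
worker.join()
```

Note that the caller still blocks while waiting for the reply, so this is request/reply over a queue rather than fully asynchronous processing.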
But we need service discovery in other cases! For example, backing services: databases, caches and actually queues as well. Without service discovery you have probably hardcoded the URL of your queue, and if it goes down you have nothing. You need high availability: a cluster of nodes replicating your queue, for example. When you add a new node or an existing node crashes, you should not have to change anything; the service discovery tool should notice and update the registry.
Consul is a perfect modern service discovery tool. You can just use a custom DNS name for accessing your backing services, and Consul will perform constant health checks and keep your cluster healthy.
The same rule can be applied to microservices: when you have a cluster running service A and you need to access it from service B without any queues (for example, for an HTTP call), you have to use service discovery to be sure that the endpoint you use will bring you to a healthy node. So it's a perfect fit for the Aggregator or Proxy patterns from the article you mentioned.
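
To show the idea without tying it to any real tool, here is a toy in-memory registry in Python. It is not the Consul, etcd or ZooKeeper API; the class and method names are invented, and health is set by hand where a real tool would run periodic checks.

```python
import random


class ServiceRegistry:
    """Toy registry: logical service name -> instance URLs with health flags."""

    def __init__(self):
        self._instances = {}  # name -> {url: is_healthy}

    def register(self, name, url):
        # A newly registered instance is assumed healthy until told otherwise.
        self._instances.setdefault(name, {})[url] = True

    def mark(self, name, url, healthy):
        # A real tool would update this from its own health checks.
        self._instances[name][url] = healthy

    def resolve(self, name):
        """Return the URL of one healthy instance of `name`."""
        healthy = [url for url, ok in self._instances.get(name, {}).items() if ok]
        if not healthy:
            raise LookupError(f"no healthy instance of {name!r}")
        return random.choice(healthy)
```

The caller only needs to agree on the logical name (for example "UserService"); which address it ends up talking to is decided at lookup time.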
Probably most of the confusion is caused by the fact that you see "hardcoded" URLs in ZooKeeper, and you think that you need to manage them manually. Modern tools like Consul or etcd allow you to avoid that headache and just rely on them. It's also achievable with ZooKeeper, but it will take more time and resources to get a similar setup.
PS: please remember about the most important rule in microservices - http://martinfowler.com/bliki/MonolithFirst.html

Building Erlang applications for the cloud

I'm working on a socket server that'll be deployed to AWS. So far we have the basic OTP application set up, following a structure similar to the sample project in Erlang in Practice, but we want to avoid having a global message router because that's not going to scale well.
Having looked through the OTP design guide on Distributed Applications and the corresponding chapters (Distribunomicon and Distributed OTP) in Learn You Some Erlang it seems the built-in distributed application mechanism is geared towards on-premise solutions where you have known hostnames and IPs and the cluster configuration is determined ahead of time, whereas in our intended setup the application will need to scale dynamically up and down and the IP addresses of the nodes will be random.
Sorry, that's a bit of a long-winded build-up. My question is whether there are design guidelines for distributed Erlang applications that are deployed to the cloud and need to deal with all the dynamic scaling.
Thanks,

There are a few possible approaches:
In Erlang and OTP in Action, one method presented is to use one or two central nodes with known domains or IPs, and have all the other nodes connect to these to discover each other.
Applications like https://github.com/heroku/redgrid/tree/logplex instead require having a central redis node where all Erlang nodes register themselves, and do membership management.
Third party services like ZooKeeper and whatnot to do something similar.
Whatever else people may recommend.
Note that you're going to need to protect your communication, either by switching the distribution protocol to use SSL, or by using AWS security groups and whatnot to restrict who can access your network.
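
The first approach (a well-known node that everyone contacts first) boils down to the following membership logic. This is a language-neutral sketch written in Python rather than Erlang, and the class and node names are made up; in a real Erlang cluster the joining node would then connect to each returned peer, for example with net_adm:ping/1.

```python
class SeedNode:
    """Toy model of a well-known node that tracks cluster membership."""

    def __init__(self):
        self.members = set()

    def join(self, node_name):
        """Register `node_name` and return the peers it should connect to."""
        peers = sorted(self.members - {node_name})
        self.members.add(node_name)
        return peers

    def leave(self, node_name):
        """Forget a node that was scaled down or crashed."""
        self.members.discard(node_name)
```

Only the seed's address has to be known ahead of time; every other node can come and go with a random IP.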

I'm just learning Erlang so I can't offer any practical advice of my own, but it sounds like your situation might require a "Resource Discovery" type of approach, as I've read about in Erlang & OTP in Action.
Erlware also have an application to help with this: https://github.com/erlware/resource_discovery

Other stupid answers in addition to Fred's smart answers include:
Using Route53 and targeting a name instead of an IP
Keeping an IP address in AWS KMS or AWS Secrets Manager, and connecting to that (nice thing about this is it's updatable without a rebuild)
Environment variables: scourge or necessary evil?
Stuffing it in a text file in an obscured, password protected s3 bucket
VPNs
Hardcoding and updating the build in CI/CD
I mostly do #2
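
For the name-instead-of-IP and environment-variable options, the lookup side can be as small as this. It is a sketch in Python (not Erlang); the variable name CLUSTER_SEED_HOST and the default host name are invented, and 4369 is simply the default epmd port.

```python
import os
import socket


def seed_name(env=None, default="seed.cluster.example.internal"):
    """Pick the seed host from the environment, falling back to a DNS name."""
    env = os.environ if env is None else env
    return env.get("CLUSTER_SEED_HOST", default)


def resolve(name, port=4369):
    """Resolve `name` to its current IP addresses via the system resolver."""
    infos = socket.getaddrinfo(name, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return sorted({info[4][0] for info in infos})
```

Because the address is resolved at startup, pointing the DNS record or the variable at a new seed does not require a rebuild.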

WSO2 API Manager Clustering configuration

I'm trying to install and configure a high-availability setup for the WSO2 API Manager. I've been reading through this document: http://docs.wso2.org/wiki/display/Cluster/Clustering+API+Manager and it explains how to break up the 4 components of the application into separate folders, and that these 4 components can run on a single server. I'm not sure why this is needed. All I really want to do is take 2 servers, install the full application on both of them (without breaking the application up into 4 different pieces) and cluster them together with an Elastic Load Balancer in front of them.
What is the purpose of splitting up the multiple components on the same server if they all run out of a single installation? I'm looking for the simplest way to provide failover capability for this application if one server goes down. Any insight into their methodology would be greatly appreciated.
Thanks.

The article you've linked describes distributing the different components of API Manager. If you look at the very end of that article, there's a link to the clustering configuration doc. In a production deployment it is usually encouraged to run the 4 components on different nodes, rather than having everything on one node and running multiple such nodes. That's why it goes on to explain breaking it down into separate components. The official AM doc below has a page on different deployment patterns.
You can go through the following articles to get a better understanding of clustering API Manager.
http://docs.wso2.org/wiki/display/AM140/Clustered+Deployment
http://sanjeewamalalgoda.blogspot.com/2012/09/how-do-clustering-and-enable-replicate.html

My 2cts:
The documentation mentioned in the remarks explains how WSO2 sees the world of clustering: spread the different functionality over different JVMs. This sounds logical from an architectural point of view. A disadvantage is that the different applications also need to be administered by operations, which makes the technical architecture rather complex.
In our situation, we defined 2 different servers with extra CPU and memory. On these servers we installed the full WSO2 API Manager and defined the cluster configuration. Everything is provisioned via Puppet.
Just a straightforward install, with all data sources pointing to one schema in an Oracle database.
And... it is working: our developers are happy, operations is happy, the architecture department is happy.
