I'm investigating the new WSO2CEP and WSO2 Stream Processor products and would like some information:
I would like to know whether it can scale in a configuration where multiple instances are installed as a cluster on multiple servers and every instance shares the same information (rules, events, streams, etc.).
Is it possible to aggregate events across the servers? For example, given the rule "select * from my_stream.window(10 minutes) having count = 2", if server 1 receives the first event and server 2 the second, can the condition be validated and the associated action fired only once (not once per server/instance)?
Is it possible to aggregate the events across the servers using a pattern where condition?
I can answer this only for WSO2SP.
I would like to know whether it can scale in a configuration where multiple instances are installed as a cluster on multiple servers and every instance shares the same information (rules, events, streams, etc.).
Indeed, you can have one or more manager nodes, and all worker nodes will synchronize the apps (rules, streams, ...) from the manager node.
As for the other questions: the state (aggregated values) can be persisted across the cluster, so it should work.
Related
I have a fleet of multiple worker hosts polling for the following tasks of my SWF:
Activity 1: Perform some business logic to create a large file.
Activity 2: Wait for some time (a human approval, timer, etc.)
Activity 3: Transmit the file using some protocol (governed by input parameters of the SWF).
Activity 4: Clean-up the local-generated file.
The file generated in Step-1 needs to be used again in Step-3, and then eventually discarded at the end of the workflow.
The system would work fine if there is only 1 host polling for all tasks. However, when I have multiple workers, I cannot seem to ensure that task-1 and task-3 would end up on the same host.
I would like to avoid doing the following:
Uploading the file to a central repository (say S3) on step-1 and download it in step-3; or
Having a single activity for the task-1 and task-3.
I have the following questions:
Is it possible to control that subsequent activities be run on the same host as opposed to going to any random host in my fleet?
What are specific guidelines/best practices on re-using resources generated in different activities in a workflow?
Is it possible to control that subsequent activities be run on the same host as opposed to going to any random host in my fleet?
Yes, absolutely. The basic idea is that SWF task lists (the queues used to deliver activity tasks) are dynamic, so each host can have its own task list, and the workflow can specify a particular task list name when calling an activity. See the fileprocessing sample, which executes the download activity on any host from the pool, then converts the file and uploads the result on the same host that ran the download.
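To make the routing idea concrete, here is a minimal in-memory sketch in Python. It does not use the SWF API at all: `TaskLists`, `Worker`, `run_workflow`, the `"shared"` list and the `host-<name>` naming scheme are all hypothetical names made up for illustration. The first activity is scheduled on a list every host polls and returns the name of the host-specific list; the follow-up activities are then scheduled on that list only.

```python
from collections import defaultdict, deque


class TaskLists:
    """In-memory stand-in for task lists: named queues created on first use."""

    def __init__(self):
        self._queues = defaultdict(deque)

    def schedule(self, task_list, activity, payload):
        self._queues[task_list].append((activity, payload))

    def poll(self, task_list):
        queue = self._queues[task_list]
        return queue.popleft() if queue else None


class Worker:
    """Hypothetical worker host; local_files stands in for its local disk."""

    def __init__(self, host, task_lists):
        self.host = host
        self.task_lists = task_lists
        self.local_files = {}
        self.log = []

    def run_one(self, task_list):
        task = self.task_lists.poll(task_list)
        if task is None:
            return None
        activity, file_name = task
        self.log.append(activity)
        if activity == "create_file":
            self.local_files[file_name] = f"contents of {file_name}"
            # Report this host's own task list so follow-ups can be pinned here.
            return f"host-{self.host}"
        if activity == "transmit_file":
            return self.local_files[file_name]
        if activity == "cleanup_file":
            del self.local_files[file_name]
            return "cleaned"
        raise ValueError(f"unknown activity: {activity}")


def run_workflow(task_lists, workers, first_worker, file_name):
    """Schedule step 1 on the shared list, then pin later steps to one host."""
    by_list = {f"host-{w.host}": w for w in workers}
    task_lists.schedule("shared", "create_file", file_name)
    host_list = first_worker.run_one("shared")  # whichever host polled first
    task_lists.schedule(host_list, "transmit_file", file_name)
    sent = by_list[host_list].run_one(host_list)
    task_lists.schedule(host_list, "cleanup_file", file_name)
    by_list[host_list].run_one(host_list)
    return host_list, sent
```

In a real deployment each worker would poll both the shared list and its own list, and the workflow would pass the returned list name when scheduling the later activities.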
What are specific guidelines/best practices on re-using resources generated in different activities in a workflow?
Caching the result in the worker process memory or on the local disk is considered the best practice. Sometimes using an external data store and fetching the data each time also makes sense.
I have the following use case and I am not sure whether the Akka toolkit provides this out of the box:
I have a number of nodes (instances/machines) that can each run a finite number of long-running tasks in the background and cannot accept more work while at maximum capacity.
Each instance can only process 50 tasks.
All instances are behind a load balancer.
Each task can respond to messages from the client that initiated it. Since the client sends its messages via the load balancer, the receiving instance needs to route them to the instance that actually handles the task.
I initially tried cluster sharding, but there doesn't seem to be a way to cap the maximum number of shard regions/actors per node (= #tasks).
Then I tried a cluster-aware router, which acts as a guard for accepting or rejecting work. This seems to work reasonably well. One problem is that once a node reaches capacity I need to remove it as a routee and add it back once it has capacity again.
Is there something out of the box that supports this use case or should I carry on with the routing option and if so how can I achieve this?
I'll update the description if you have further questions or something is unclear.
Your scenario sounds like a good fit for the work pulling pattern. The gist of this pattern is:
A master actor coordinates units of work among a number of worker actors.
Workers register themselves to the master, meaning that workers can be added or removed dynamically.
When the master receives work to be done, the master notifies the workers that work is available. Workers pull units of work when they're ready, do what needs to be done with their respective units of work, then ask the master for more work when they're finished.
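The steps above can be sketched without any actor library. The following plain-Python sketch is not Akka code: `Master`, `Worker`, `capacity` and the method names are hypothetical, and direct method calls stand in for messages. It only illustrates the control flow, in particular that a worker at capacity simply does not pull, so nothing has to be removed from a router.

```python
from collections import deque


class Master:
    """Holds pending work and hands it out only when a worker asks for it."""

    def __init__(self):
        self.pending = deque()
        self.workers = []

    def register(self, worker):
        # Workers can join at any time; tell a newcomer about queued work.
        self.workers.append(worker)
        worker.master = self
        if self.pending:
            worker.work_available()

    def submit(self, work):
        self.pending.append(work)
        # Notify only; each worker decides for itself whether to pull.
        for worker in list(self.workers):
            worker.work_available()

    def request_work(self, worker):
        if self.pending and worker.has_capacity():
            worker.start(self.pending.popleft())


class Worker:
    """Pulls work while it runs fewer than `capacity` tasks (hypothetical cap)."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.running = []
        self.master = None

    def has_capacity(self):
        return len(self.running) < self.capacity

    def work_available(self):
        if self.has_capacity():
            self.master.request_work(self)

    def start(self, work):
        self.running.append(work)

    def finish(self, work):
        self.running.remove(work)
        # Ask for more as soon as a slot frees up.
        self.master.request_work(self)
```

With a cap of 50 per node, as in the question, a full node just stops pulling and unassigned work waits in the master's queue until some worker calls `finish`.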
To learn more about this pattern, read the following (the first two links are listed in the Akka documentation):
The original post (by Derek Wyatt): http://letitcrash.com/post/29044669086/balancing-workload-across-nodes-with-akka-2
A follow-on post (by Michael Pollmeier): http://www.michaelpollmeier.com/akka-work-pulling-pattern
An application of the pattern in a clustered environment with a cluster-aware router (by Ryan Tanner): https://www.conspire.com/blog/2013/10/akka-at-conspire-part-5-the-importance-of/
In our beta stack, we have a single EC2 instance listening to a task list. Sometimes another developer on the team starts their own instance for testing purposes and forgets to turn it off. This creates problems for the next developer, who tries to start an activity only for it to be picked up by the previous developer's machine. Is there a way to get the hostnames of all activity workers listening to a particular task list?
It is not currently possible to get a list of pollers waiting on a task list through the SWF API. The workaround is to look at the identity field on the ActivityTaskStarted event in the workflow history after the task was picked up by the wrong worker.
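As a rough sketch of that workaround, the helper below scans a list of history events and reports which worker identity picked up each activity. It makes no API calls. The function name `worker_identities` is hypothetical, and the dictionary keys (`eventType`, `eventId`, `activityTaskScheduledEventAttributes`, `activityTaskStartedEventAttributes`, `scheduledEventId`, `identity`) reflect my understanding of the workflow history format, so check them against the history your SDK actually returns.

```python
def worker_identities(events):
    """Map each started activity id to the identity of the worker that took it."""
    # Index scheduled events so a started event can be traced to its activity id.
    scheduled = {
        e["eventId"]: e["activityTaskScheduledEventAttributes"]["activityId"]
        for e in events
        if e["eventType"] == "ActivityTaskScheduled"
    }
    result = {}
    for e in events:
        if e["eventType"] != "ActivityTaskStarted":
            continue
        attrs = e["activityTaskStartedEventAttributes"]
        activity_id = scheduled.get(attrs["scheduledEventId"], "<unknown>")
        # identity is only useful if each worker sets it when polling.
        result[activity_id] = attrs.get("identity", "<not set>")
    return result
```

This only helps if every worker passes a meaningful identity (for example its hostname) when it polls.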
One way to avoid this issue is to always use a task list name that is specific to a machine or developer, so that workers do not collide.
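A small sketch of such a naming scheme, using only the Python standard library. The function name `personal_task_list`, the `base-user-host` layout and the choice to replace unusual characters with underscores are all assumptions made for illustration, not an SWF requirement.

```python
import getpass
import re
import socket


def personal_task_list(base, user=None, host=None):
    """Build a task list name that is unique per developer and machine."""
    user = user or getpass.getuser()
    host = host or socket.gethostname()
    raw = f"{base}-{user}-{host}"
    # Keep the name conservative: letters, digits, dot, dash and underscore only.
    return re.sub(r"[^A-Za-z0-9_.-]", "_", raw)
```

Both the worker and whatever starts the workflow would need to derive the name the same way, for example by calling this helper with the same `base`.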
We have deployed API-M 2.1 in a distributed way (each component, GW, TM and KM, runs in its own Docker image) on top of DC/OS 1.9 (Mesos).
We are having trouble getting the gateway to enforce throttling policies (whether subscription tiers or app-level policies). Here is what we have established so far:
The Traffic Manager itself does its job: it receives the event streams, analyzes them on the fly and pushes an event onto the JMS topic throttledata.
The Gateway reads the message properly.
So basically we have ruled out a communication issue.
However we found two potential issues:
In the event that is pushed to the TM component, the value of appTenant is null (instead of carbon.super). We have a single tenant defined.
When the gateway receives the throttling message, it lets the request through because it thinks "stopOnQuotaReach" is set to false, although it is set to true (we checked the value in the database).
Digging into the source code, we traced these two issues to a single source: both values above are read from the authContext and are apparently set incorrectly. We are stuck and running out of things to try, and would appreciate some pointers on what the source of the problem could be and what to check.
Can somebody help, please?
Thanks, Isabelle.
Are there two TMs with HA enabled in the system?
If the TM is HA enabled, how do the gateways publish data to the TMs? Is it load-balanced publishing or failover publishing?
Did you follow the articles below to configure the environment for your deployment?
http://wso2.com/library/articles/2016/10/article-scalable-traffic-manager-deployment-patterns-for-wso2-api-manager-part-1/
http://wso2.com/library/articles/2016/10/article-scalable-traffic-manager-deployment-patterns-for-wso2-api-manager-part-2/
Is throttling not working at all in your environment?
Have you noticed any JMS connection-related logs on the gateway nodes?
In these tests, we disabled HA to avoid possible complications. Neither subscription nor app throttling policies work, in both cases because parameters that should carry a value do not have the right one (appTenant, stopOnQuotaReach).
Our scenario is far more basic. If we go with one instance of each component, it fails as Isabelle described. And the only thing we know is that both parameters come from the Authentication Context.
Thank you!
I have N nodes (i.e. distinct JREs) in my infrastructure running Akka (not clustered yet).
Nodes have no particular "role"; they are just processors of data. The "processors" of this data will be Actors. All sorts of non-Akka/Actor callers (other Java code) can invoke specific types of processors by sending them messages containing data to work on. Eventually they need the result back.
A "processor" Actor is pretty simple and supports a method like "process(data)". Processors are stateless; they mutate data and send it to an external system. They can vary in execution time, so they are a good fit for wrapping up in an Actor.
There are numerous different types of these "processors", and the configuration for each unique one is stored in a database. Each node in my system, when it starts up, needs to create a router Actor that fronts N instances of each of these unique processor Actor types. I cannot statically define/name/create these Actors hardwired in code or in Akka configuration.
It is important to note that the configuration for any Actor processor can be changed in the database at any time, and periodically the creator of the routers for these Actors needs to terminate and recreate them dynamically based on the new configuration.
A key point is that some of these "processors" can only have a very limited number of Actor instances across all of my nodes. For example, processorType-A can have an unlimited number of instances, while processorType-B can only have 2 instances running across the entire cluster. Hence callers on NODE1 who want to invoke processorType-B would need to have their message routed to NODE2 if that node is the only one running processorType-B actor instances.
With that context in mind here is my question that I'm looking for some design help with:
For points 1, 2, 3, 4 above, I have a good understanding and an implementation.
For points 5 and 6, however, I am not sure how to implement this properly with Akka clustering, given that my "nodes" are not aware of each other AND they each run the same code to dynamically create these router actors based on that database configuration.
Issues that come to mind are:
How do I properly deal with the "names" of these router Actors across the cluster? For example, "processorType-A" can have an unlimited number of Actor instances. Each node would have these instances available locally, yet if they are all terminated on a single node, I would still want messages for their "processor type" to be routed to another node that still has viable instances available.
How do I enforce/coordinate the "processor" instance limit across the cluster (i.e. "processorType-B" can only have 2 instances globally, while processorType-A can have a much higher number)? It's like the nodes need some way to check with each other as to who has created these instances across the cluster. I'm not sure if Akka has a facility to do this on its own.
ClusterRouterPool? w/ ClusterRouterPoolSettings?
Any thoughts and/or design tip/ideas are much appreciated! Thanks