WSO2 Stream Processor confusion

I have a little bit of confusion about the Stream Processor.
I've previously used the CEP and now I'm using the Stream Processor.
If I'm not mistaken, the Data Analytics Server, CEP and Machine Learner were merged into the Stream Processor. Is that true?
I ask because I found some inconsistencies; for example, SP can't publish directly to the dashboard, while CEP could.
So, my question is: are all the features of CEP and ML going to be carried over into SP?

DAS, CEP and ML have not been completely merged into the Stream Processor.
In DAS, real-time analytics were handled by Siddhi and batch analytics were done through Spark. In Stream Processor, however, only Siddhi acts as the core processor; Spark is not used.
Stream Processor processes data in a streaming manner through Siddhi. To fulfill the requirements for batch analytics, you can use incremental aggregation [1], which was introduced in Siddhi 4.0.0.
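To make the idea of incremental aggregation more concrete, here is a minimal Java sketch of the concept, not Siddhi's implementation: each incoming event updates running sums at several time granularities (second, minute, hour), so a long-range "batch" query reads a few pre-aggregated buckets instead of re-scanning raw events. The class and method names are hypothetical, and timestamps are assumed to be non-negative epoch milliseconds.

```java
import java.util.Map;
import java.util.TreeMap;

// Conceptual sketch of incremental (time-bucketed) aggregation.
// Hypothetical names; this is NOT Siddhi's implementation.
// Assumes non-negative epoch-millisecond timestamps.
class IncrementalAggregator {
    static final long SECOND_MS = 1_000L;
    static final long MINUTE_MS = 60_000L;
    static final long HOUR_MS = 3_600_000L;

    // Running sum and count for one time bucket.
    static final class Bucket {
        double sum;
        long count;

        double avg() {
            return count == 0 ? 0.0 : sum / count;
        }
    }

    // One table per granularity, keyed by the bucket's start time.
    private final Map<Long, Bucket> perSecond = new TreeMap<>();
    private final Map<Long, Bucket> perMinute = new TreeMap<>();
    private final Map<Long, Bucket> perHour = new TreeMap<>();

    // Real-time path: every event updates all granularities immediately,
    // so raw events never need to be re-scanned later.
    void onEvent(long timestampMs, double value) {
        add(perSecond, timestampMs - timestampMs % SECOND_MS, value);
        add(perMinute, timestampMs - timestampMs % MINUTE_MS, value);
        add(perHour, timestampMs - timestampMs % HOUR_MS, value);
    }

    private static void add(Map<Long, Bucket> table, long bucketStart, double value) {
        Bucket b = table.computeIfAbsent(bucketStart, k -> new Bucket());
        b.sum += value;
        b.count++;
    }

    // "Batch-style" query answered from the pre-aggregated hourly table.
    double hourlyAverage(long timestampMs) {
        Bucket b = perHour.get(timestampMs - timestampMs % HOUR_MS);
        return b == null ? 0.0 : b.avg();
    }

    int minuteBucketCount() {
        return perMinute.size();
    }

    public static void main(String[] args) {
        IncrementalAggregator agg = new IncrementalAggregator();
        agg.onEvent(0L, 10.0);
        agg.onEvent(30_000L, 20.0);
        agg.onEvent(90_000L, 30.0);
        System.out.println("hourly average = " + agg.hourlyAverage(0L));
        System.out.println("minute buckets = " + agg.minuteBucketCount());
    }
}
```

In Siddhi itself the aggregation is declared inside the Siddhi App (see [1]) rather than hand-coded; the sketch only shows why a query over hours or days stays cheap.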
ML support is provided through ML extensions written for Siddhi 4.0.0.
In DAS/CEP, you have to define several artifacts, such as receivers, execution plans and publishers, to create an analytics workflow.
In Stream Processor, however, the whole flow can be defined in a single Siddhi App.
For further clarification, please refer to the DAS to SP migration guide[2] and WSO2 analytics site[3].
[1] https://wso2.github.io/siddhi/documentation/siddhi-4.0/#incremental-aggregation
[2] https://docs.wso2.com/display/SP4xx/Upgrading+from+a+Previous+Release
[3] https://wso2.com/analytics

WSO2 Stream Processor is the latest WSO2 analytics offering. It has a superset of the functionality that WSO2 CEP had. The following is a comparison of the capabilities of WSO2 CEP and WSO2 SP.
General
The core of SP 4.x is the latest Siddhi 4.x, which is more stable and has improved performance, while CEP is powered by Siddhi 3.x.
SP is based on Carbon 5 (C5) and is leaner and more lightweight than CEP, which was based on Carbon 4 (C4).
SP is designed to be container friendly and cloud native, whereas CEP had some challenges when deployed in containerised environments.
Everything is now contained in a Siddhi App, a single file that can be deployed and executed on its own.
Incremental Analysis
The new Siddhi has the incremental analysis feature, which is designed to cater to batch analytics. With this feature, users can easily do time series aggregations without having to integrate with other platforms such as Spark.
Incremental analysis smoothly federates real-time analytics with batch analytics by allowing both forms of analytics to be done in the same message flow.
Distributed Deployment
SP 4.x has a highly scalable distributed architecture, and its container-friendly nature lets it be scaled massively.
The distributed deployment is fault tolerant and it supports exactly once processing with the aid of Apache Kafka.
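Exactly-once processing with Kafka relies on the log being replayable from a known offset. The following Java sketch (hypothetical names; a conceptual illustration, not SP's actual mechanism) shows the core idea: the processing state and the last applied offset are snapshotted together, and after a restore, replayed records at or below that offset are skipped, so each record affects the state exactly once.

```java
// Conceptual sketch (hypothetical names, not SP's actual mechanism):
// exactly-once *effects* on top of a replayable, offset-ordered log
// such as a Kafka partition.
class OffsetTrackingCounter {
    private long total;            // the processing state
    private long lastOffset = -1;  // last log offset already reflected in 'total'

    // Applies a record unless it was already applied before a failure.
    // Returns true if the state changed, false for a replayed duplicate.
    boolean process(long offset, long value) {
        if (offset <= lastOffset) {
            return false;
        }
        total += value;
        lastOffset = offset;
        return true;
    }

    // State and offset are captured together, so they can never disagree.
    long[] snapshot() {
        return new long[] {total, lastOffset};
    }

    static OffsetTrackingCounter restore(long[] snapshot) {
        OffsetTrackingCounter c = new OffsetTrackingCounter();
        c.total = snapshot[0];
        c.lastOffset = snapshot[1];
        return c;
    }

    long total() {
        return total;
    }

    public static void main(String[] args) {
        long[][] log = {{0, 5}, {1, 7}, {2, 9}};  // {offset, value} records
        OffsetTrackingCounter node = new OffsetTrackingCounter();
        node.process(log[0][0], log[0][1]);
        node.process(log[1][0], log[1][1]);
        long[] snap = node.snapshot();            // checkpoint after offset 1
        node.process(log[2][0], log[2][1]);       // then the node "crashes"

        // Recovery: restore the checkpoint and replay the log.
        OffsetTrackingCounter recovered = OffsetTrackingCounter.restore(snap);
        for (long[] rec : log) {
            recovered.process(rec[0], rec[1]);
        }
        System.out.println("total after recovery = " + recovered.total());
    }
}
```

Replaying from an earlier offset than necessary is harmless here, because duplicates are filtered by the stored offset rather than by the consumer's position.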
CEP's distributed architecture was based on Apache Storm.
SP also has built-in support for multi-data-center deployment, which CEP does not.
Tooling
SP has a rich editor that supports auto-completion, event simulation, debugging of Siddhi queries, etc. CEP only has the query editor UI in the management console.
The Status Dashboard of SP lets users monitor their deployment with a comprehensive set of statistics related to performance, resource consumption, etc., of Siddhi Apps and the JVM. CEP had Carbon metrics support, which shows only JVM stats.
Business Rules
SP has a Business Rules feature with which non-technical users can build processing logic through a graphical, wizard-like UI without having to write queries.
Developers can use this feature to present complex problems in an abstract manner that business users can understand.
CEP did not have any feature focusing on business users.
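The Business Rules idea can be pictured as templating: a developer writes a Siddhi query with placeholders, and the values a business user enters in the wizard are substituted in. The Java sketch below assumes a `${name}` placeholder syntax and hypothetical class names; it illustrates the concept only and is not SP's Business Rules implementation.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of templated business rules (not SP's implementation):
// a developer writes a query template with ${name} placeholders and a
// business user only supplies the values through a form.
class RuleTemplate {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    static String render(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("No value for placeholder: " + m.group(1));
            }
            // quoteReplacement stops '$' or '\' in user input from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template =
                "from TemperatureStream[temperature > ${threshold}]\n"
                + "select roomNo, temperature\n"
                + "insert into ${outputStream};";
        // Values a business user might enter in the wizard.
        Map<String, String> form = Map.of("threshold", "30", "outputStream", "HighTempAlertStream");
        System.out.println(render(template, form));
    }
}
```

The business user never sees the query text; they only choose values such as the threshold, while the developer keeps control of the query structure.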

So, my question is: are all the features of CEP and ML going to be carried over into SP?
I don't believe so. Stream Processor has only a subset of the capabilities of CEP, DAS or ML. IMHO it is currently promoted because it is new, more lightweight and faster.

Related

Is there any Query Builder with Graphical Interface for WSO2 Stream Processor?

I am looking for an open-source Business Intelligence (BI) solution for my organization, so I am trying WSO2 Stream Processor, but I could not find any graphical interface for building RDBMS queries.
I checked the editor, portal and widgets.
The widgets were very nice for visualizing data, but the samples were limited and I could not find what I am looking for.
Specifically, I need an interface that shows my database (or multiple databases), lists its tables when I select one, and lets me select tables and build my query graphically.
As of now, this is not available in the Editor/Tooling interface, although it would be useful to have it in the Editor itself. You can raise a feature request in the Siddhi distribution repo, and the team will consider incorporating it into the roadmap.
Please note that WSO2 SP is not currently under active development; you can try the latest WSO2 Streaming Integrator or the open-source Siddhi Cloud-native Stream Processor instead. Streaming Integrator focuses on streaming data integration, whereas Siddhi Cloud-native Stream Processor bundles the newest version of Siddhi and is a tool for building fully-fledged event-driven applications.

Planning an architecture in GCP

I want to plan an architecture on the GCP cloud platform. Below are the subject areas I have to cover. Can someone please help me find the appropriate services for each of them?
Data ingestion (Batch, Real-time, Scheduler)
Data profiling
AI/ML based data processing
Analytical data processing
Elasticsearch
User interface
Batch and Real-time publish
Security
Logging/Audit
Monitoring
Code repository
If I am missing anything I need to take care of, please add it too.
GCP offers many products whose functionality can partially overlap. Which product to use depends on your specific use case, and you can find an overview here.
That being said, an overall summary of the services you asked about would be:
1. Data ingestion (Batch, Real-time, Scheduler)
That will depend on where your data comes from, but the most common options are Dataflow (both for batch and streaming) and Pub/Sub for streaming messages.
2. Data profiling
Dataprep (which actually runs on top of Dataflow) can be used for data profiling, here is an overview of how you can do it.
3. AI/ML based data processing
For this, you have several options depending on your needs. For developers with limited machine learning expertise, there is AutoML, which allows you to quickly train and deploy models. For more experienced data scientists, there is ML Engine, which allows training and prediction of custom models built with frameworks like TensorFlow or scikit-learn.
Additionally, there are some pre-trained models for things like video analysis, computer vision, speech to text, speech synthesis, natural language processing or translation.
Plus, it’s even possible to perform some ML tasks in SQL, directly in GCP’s data warehouse, BigQuery.
4. Analytical data processing
Depending on your needs, you can use Dataproc, which is a managed Hadoop and Spark service, or Dataflow for stream and batch data processing.
BigQuery is also designed with analytical operations in mind.
5. Elasticsearch
There is no managed Elasticsearch service provided directly by GCP, but you can find several options on the Marketplace, like an API service or a Kubernetes app for Google Kubernetes Engine.
6. User interface
If you are referring to a user interface for your own use, GCP’s console is what you’d be using. If you are referring to a UI for end-users, I’d suggest using App Engine.
If you are referring to a UI for data exploration, there is Datalab, which is essentially a managed notebook service, and Data Studio, where you can build plots of your data in real time.
7. Batch and Real-time publish
The publishing service in GCP, for both synchronous and asynchronous messages, is Pub/Sub.
8. Security
Most security concerns in GCP are addressed here. Security is a wide topic in itself and should probably be asked as a separate question.
9. Logging/Audit
GCP uses Stackdriver for logging of most of its products, and provides many ways to process and analyze those logs.
10. Monitoring
Stackdriver also has monitoring features.
11. Code repository
For this, there is Cloud Source Repositories, which integrates with GCP’s automated build system and can also be easily synced with a GitHub repository.
12. Analytical data warehouse
You did not ask for this one, but I think it's an important part of a data analysis stack.
In the case of GCP, this would be BigQuery.

Can we extend the Siddhi CEP Java library with the Siddhi high-availability feature?

I am using Siddhi CEP as a Java library in my project. Now I need to analyse my data with a highly available system (similar to Esper HA). I have done a little study of Siddhi high availability:
http://wso2.com/library/articles/2014/05/high-availability-deployment-in-wso2-complex-event-processor-0/
I have also gone through the link above.
Is it possible to do the same thing using the Siddhi Java library?
The document above demonstrates how WSO2 CEP achieves high availability by using multiple nodes with Hazelcast clustering. If you are using Siddhi CEP only as a Java library, you need to implement the clustering yourself, using Hazelcast or another clustering framework.
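As a rough illustration of what implementing the clustering yourself involves, here is a hypothetical Java sketch of one common active/passive pattern: both nodes process every event so that the passive node's state stays warm, only the active node publishes output, and the passive node promotes itself when the active node's heartbeats stop. Heartbeats and membership are simulated with plain timestamps; in practice they would come from Hazelcast or a similar framework, and the Siddhi runtime's state would replace the simple counter.

```java
// Hypothetical active/passive failover sketch. Membership and heartbeats
// would normally come from Hazelcast or a similar framework; here they are
// simulated with explicit timestamps, and a counter stands in for engine state.
class HaNode {
    private final String name;
    private boolean active;
    private long processed;            // stand-in for the Siddhi runtime's state
    private long lastPeerHeartbeatMs;  // last time the other node was heard from

    HaNode(String name, boolean active, long nowMs) {
        this.name = name;
        this.active = active;
        this.lastPeerHeartbeatMs = nowMs;
    }

    // Both nodes process every event so the passive node's state stays warm.
    // Returns the output to publish, or null when this node is passive.
    String onEvent(String event) {
        processed++;
        return active ? name + " published " + event : null;
    }

    void onPeerHeartbeat(long nowMs) {
        lastPeerHeartbeatMs = nowMs;
    }

    // Called periodically: a passive node takes over if the peer is silent too long.
    void checkPeer(long nowMs, long timeoutMs) {
        if (!active && nowMs - lastPeerHeartbeatMs > timeoutMs) {
            active = true;
        }
    }

    boolean isActive() {
        return active;
    }

    long processed() {
        return processed;
    }

    public static void main(String[] args) {
        HaNode a = new HaNode("A", true, 0);
        HaNode b = new HaNode("B", false, 0);
        System.out.println(a.onEvent("e1"));   // A publishes
        System.out.println(b.onEvent("e1"));   // B only updates its state
        b.onPeerHeartbeat(1_000);
        b.checkPeer(5_000, 3_000);             // A silent for 4 s > 3 s timeout
        System.out.println(b.onEvent("e2"));   // B has taken over
    }
}
```

A real deployment also has to handle split-brain situations (both nodes believing they are active after a network partition), which is one reason to rely on a clustering framework rather than hand-written heartbeats.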
More about WSO2 CEP Clustering [1]
[1] https://docs.wso2.com/display/CLUSTER420/Clustering+Complex+Event+Processor

Business activity monitoring and business analytics relation

A simple question, yet I couldn't find much information on the subject. How is business activity monitoring related to business analytics? I always thought business analytics was a subsystem of activity monitoring systems, but that's only my limited view, so I was wondering. Along the same lines, how do, for instance, WSO2 BAM and Google Analytics compare?
Initially, WSO2 BAM 2.x.x was just a data analytics framework that could receive data, process big data offline (as batch processes with Apache Hadoop) and visualize the results.
From BAM 2.4.0 onwards, it includes WSO2 Complex Event Processor (CEP) features that can monitor events in real time, process them and visualize them with relatively low latency, according to [1].
In Google Analytics, most analytics and dashboards are available out of the box, but with WSO2 BAM you may need to write some Hive queries and dashboards to come up with a great solution.
WSO2 BAM is open source (Apache License), so you can use it as you wish with great flexibility, although it lacks some out-of-the-box features compared to Google Analytics.
From BAM 2.4.0 onwards, it comes with a built-in Activity Monitoring feature [2] based on the concept of an activity ID. This can be used out of the box when your business process is properly configured for the activity monitoring use case.
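To illustrate the activity ID concept, the following hypothetical Java sketch (not BAM's implementation) groups events from different services by a shared activity ID and orders each group by timestamp, which reconstructs the path a single business activity took across the system. The event fields and service names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of activity-ID correlation (not BAM's implementation).
class ActivityCorrelator {
    static final class Event {
        final String activityId;   // shared by all events of one business activity
        final String service;      // which component reported the event
        final long timestampMs;

        Event(String activityId, String service, long timestampMs) {
            this.activityId = activityId;
            this.service = service;
            this.timestampMs = timestampMs;
        }
    }

    // Groups events by activity ID and orders each group by time, giving the
    // sequence of services each activity passed through.
    static Map<String, List<String>> traces(List<Event> events) {
        Map<String, List<Event>> byActivity = new LinkedHashMap<>();
        for (Event e : events) {
            byActivity.computeIfAbsent(e.activityId, k -> new ArrayList<>()).add(e);
        }
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Event>> entry : byActivity.entrySet()) {
            List<Event> group = entry.getValue();
            group.sort(Comparator.comparingLong((Event e) -> e.timestampMs));
            List<String> path = new ArrayList<>();
            for (Event e : group) {
                path.add(e.service);
            }
            result.put(entry.getKey(), path);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event("order-1", "ESB", 20),
                new Event("order-2", "ESB", 15),
                new Event("order-1", "Frontend", 10),
                new Event("order-1", "BPS", 30));
        System.out.println(traces(events));
    }
}
```

The events may arrive in any order and from any service; only the shared ID is needed to stitch them back into one activity.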
[1] https://docs.wso2.org/display/BAM240/Realtime+Analytics
[2] https://docs.wso2.org/display/BAM240/Activity+Monitoring+Dashboard

WSO2 CEP vs BAM

I am trying to understand the whole WSO2 SOA topology, but I am not able to understand how CEP and BAM fit together.
1. Can CEP provide visual monitoring of processed events, e.g. through integration with WSO2 GS?
2. Although the WSO2 website says CEP is tightly integrated with BAM for post-processing, I couldn't find any scenario explaining this or how it's done. Can CEP feed BAM? How do I configure that?
3. Why would you have CEP and BAM together? Is there a use case?
Answers
1. All WSO2 products are capable of integrating with each other because they are based on the same underlying platform (WSO2 Carbon), and this applies to WSO2 CEP and GS as well. One way is to persist the processed results from CEP in a data store or file and read them from a gadget backend, so that the gadget (the frontend) can visualize them in GS. If you want, you can also install GS features (dashboard, gadget repo, etc.) on top of CEP and use the same server runtime. However, for the latter, both must be based on the same Carbon version.
2. CEP and BAM share the same Thrift and REST APIs for receiving events, so the same data agent can send events to BAM as well as CEP. Similar to point 1, CEP and BAM can run in the same runtime or can be downloaded and used separately. A related article is here.
3. The primary use case is processing the same event both in real time with CEP and in just-in-time (near-real-time) batches with BAM. For example, server uptime analytics can be split across both servers: in CEP, a query can alert you when a server does not respond to 3 requests within 30 seconds; in BAM, you can plot the uptime trend over an hour, day or week.
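The server uptime example above can be sketched in plain Java. This is a hypothetical illustration, not CEP or BAM code: the real-time path raises an alert when 3 failed requests fall within a 30-second sliding window (assuming checks arrive in timestamp order), and the batch path computes the uptime percentage per hour over stored check results, which is the kind of trend BAM would plot.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the CEP/BAM split for server uptime (not CEP or BAM code).
class ServerMonitor {
    static final class Check {
        final long timestampMs;
        final boolean responded;

        Check(long timestampMs, boolean responded) {
            this.timestampMs = timestampMs;
            this.responded = responded;
        }
    }

    // Real-time path (CEP-style): alert when 3 failed requests fall within
    // a 30-second sliding window. Assumes checks arrive in timestamp order.
    static final class FailureAlert {
        private static final long WINDOW_MS = 30_000L;
        private static final int THRESHOLD = 3;
        private final Deque<Long> failures = new ArrayDeque<>();

        boolean onCheck(Check c) {
            if (c.responded) {
                return false;
            }
            failures.addLast(c.timestampMs);
            // Drop failures older than the window; the newest one always stays.
            while (c.timestampMs - failures.peekFirst() > WINDOW_MS) {
                failures.removeFirst();
            }
            return failures.size() >= THRESHOLD;
        }
    }

    // Batch path (BAM-style): uptime percentage per hour over stored checks.
    static Map<Long, Double> hourlyUptimePercent(List<Check> checks) {
        final long hourMs = 3_600_000L;
        Map<Long, long[]> counts = new TreeMap<>();  // hour start -> {up, total}
        for (Check c : checks) {
            long[] n = counts.computeIfAbsent(c.timestampMs - c.timestampMs % hourMs, k -> new long[2]);
            n[1]++;
            if (c.responded) {
                n[0]++;
            }
        }
        Map<Long, Double> result = new TreeMap<>();
        for (Map.Entry<Long, long[]> e : counts.entrySet()) {
            result.put(e.getKey(), 100.0 * e.getValue()[0] / e.getValue()[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Check> checks = List.of(
                new Check(0, false),
                new Check(40_000, false),
                new Check(50_000, false),
                new Check(60_000, false),
                new Check(70_000, true));
        FailureAlert alert = new FailureAlert();
        for (Check c : checks) {
            if (alert.onCheck(c)) {
                System.out.println("ALERT at " + c.timestampMs + " ms");
            }
        }
        System.out.println("uptime per hour: " + hourlyUptimePercent(checks));
    }
}
```

Both paths consume the same events, which is the point of sending one stream to both servers: the alert needs an answer within seconds, while the hourly trend only needs to be correct once the hour is over.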