WSO2 EI - Best Practice for Creating Data Sources

For an ERP system with a huge database, what is the best practice for creating a data source? I want to create quite a few endpoints on the Enterprise Integrator, and I'd like to know which is the better practice:
1. One database connection shared by all the endpoints, OR
2. A separate connection for each endpoint, with access only to the database tables it needs?
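
For reference, option 1 amounts to a single shared pool along these lines (a sketch with HikariCP; the URL, credentials and pool size are placeholders, and in EI the pool would normally be defined as a Carbon datasource rather than in code):

    import com.zaxxer.hikari.HikariConfig;
    import com.zaxxer.hikari.HikariDataSource;

    public class SharedDataSource {
        // One pool shared by every endpoint, sized for total expected concurrency.
        private static final HikariDataSource POOL = createPool();

        private static HikariDataSource createPool() {
            HikariConfig config = new HikariConfig();
            config.setJdbcUrl("jdbc:mysql://erp-db:3306/erp"); // placeholder URL
            config.setUsername("integration_user");            // placeholder credentials
            config.setPassword("secret");
            config.setMaximumPoolSize(20); // tune to the ERP database's connection budget
            return new HikariDataSource(config);
        }

        public static javax.sql.DataSource get() {
            return POOL;
        }
    }

Even with a single shared pool, per-endpoint table restrictions can still be enforced through database grants on the pool's user, or by defining a second, narrower datasource only where it is really needed.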

Related

How to configure WSO2 Identity Server to avoid a single point of failure?

My company wants to set up a WSO2 Identity Server cluster on 3 machines, such that if one machine fails, the cluster still works.
All the WSO2 documentation shows clustering with a shared user store and database, but it does not mention how to avoid a single point of failure.
As I understand it, the only way to do that is to form an external LDAP cluster as the user store plus an external database cluster. But that would be much more complex and harder to manage.
Can we configure WSO2's embedded LDAP to replicate and sync with the other nodes' embedded LDAPs?
Is there any other way to avoid a single point of failure in WSO2?
No, you can't use the embedded LDAP.
You should avoid using the embedded LDAP in production at all costs. It will surely get corrupted under concurrent requests and data growth, and you will not be able to recover it. It is there for testing purposes only.
If you want to avoid a single point of failure in the DB or LDAP, you should cluster the DB and LDAP as instructed by the respective provider, and point the WSO2 server at the common load-balancer URL.
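
On the DB side, the single-URL part can also be handled by the JDBC driver itself; for example, MySQL Connector/J accepts a multi-host failover URL, so the server configuration only ever references one logical endpoint (host names, database and credentials below are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class FailoverConnection {
        public static Connection open() throws Exception {
            // Connector/J tries db1 first and fails over to db2 if it is unreachable.
            String url = "jdbc:mysql://db1.internal:3306,db2.internal:3306/WSO2_CARBON_DB"
                    + "?failOverReadOnly=false";
            return DriverManager.getConnection(url, "wso2user", "secret"); // placeholders
        }
    }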

WSO2 - Best way to implement ETL jobs

What is the best way to implement ETL jobs using WSO2?
We've been trying to leverage data services within WSO2 EI 6.4.
Our objective is to fetch data from web services as well as RDBMS and to store it to an RDBMS.
Any suggestions / ideas will be much appreciated.
In my experience with WSO2 middleware, Data Services may not be the best fit for ETL jobs.
We had a similar case where we wanted to copy data from one database to another application.
For that, we wrote an integration (Java) web service to fetch data from the database and send it to the application via the application's interface (a web service exposed by the application), and configured the integration web service in the EI scheduler to run periodically and sync the data.
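
A stripped-down sketch of that pattern in plain Java, assuming a hypothetical customers table and target endpoint; in EI the scheduling would be handled by a scheduled task rather than the executor used here:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class DbToAppSync {
        public static void main(String[] args) {
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            // Run the sync every 15 minutes, mirroring an EI scheduled task.
            scheduler.scheduleAtFixedRate(DbToAppSync::syncOnce, 0, 15, TimeUnit.MINUTES);
        }

        static void syncOnce() {
            try (Connection db = DriverManager.getConnection(
                    "jdbc:mysql://source-db:3306/erp", "user", "secret"); // placeholder source
                 Statement stmt = db.createStatement();
                 ResultSet rows = stmt.executeQuery(
                    "SELECT id, name FROM customers WHERE updated_at > NOW() - INTERVAL 15 MINUTE")) {
                HttpClient http = HttpClient.newHttpClient();
                while (rows.next()) {
                    // Push each changed row to the target application's service interface.
                    String json = String.format("{\"id\":%d,\"name\":\"%s\"}",
                            rows.getInt("id"), rows.getString("name"));
                    HttpRequest request = HttpRequest.newBuilder()
                            .uri(URI.create("https://target-app/api/customers")) // hypothetical endpoint
                            .header("Content-Type", "application/json")
                            .POST(HttpRequest.BodyPublishers.ofString(json))
                            .build();
                    http.send(request, HttpResponse.BodyHandlers.ofString());
                }
            } catch (Exception e) {
                e.printStackTrace(); // in EI this would be routed to a fault sequence
            }
        }
    }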
Yes, Stream Processor is a better fit, but you really need to check whether it works for your ETL case.

Sharing data between isolated microservices

I'd like to use the microservices architectural pattern for a new system, but I'm having trouble figuring out how to share and merge data between the services when the services are isolated from each other. In particular, I'm thinking of returning consolidated data to populate a web app UI over HTTP.
For context, I'm intending to deploy each service to its own isolated environment (Heroku) where I won't be able to communicate internally between services (e.g. via //localhost:PORT). I plan to use RabbitMQ for inter-service communication, and Postgres for the database.
The decoupling of services makes sense for CREATE operations:
Authenticated user with UserId submits 'Join group' webform on the frontend
A new GroupJoinRequest including the UserId is added to the RabbitMQ queue
The Groups service picks up the event and processes it, referencing the user's UserId
However, READ operations are much harder if I want to merge data across tables/schemas. Let's say I want to get details for all the users in a certain group. In a monolithic design, I'd just do a SQL JOIN across the Users and the Groups tables, but that loses the isolation benefits of microservices.
My options seem to be as follows:
Database per service, public API per service
To view all the Users in a Group, a site visitor gets a list of UserIDs associated with a group from the Groups service, then queries the Users service separately to get their names (sketched in code after this option's cons).
Pros:
very clear separation of concerns
each service is entirely responsible for its own data
Cons:
requires multiple HTTP requests
a lot of postprocessing has to be done client-side
multiple SQL queries can't be optimized
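
A minimal sketch of that client-side merge, assuming hypothetical hosts and endpoints on the two services:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    // Client-side merge across two public service APIs. The hostnames, paths and
    // response handling are hypothetical; a real client would parse JSON properly.
    public class GroupMembersClient {
        private static final HttpClient HTTP = HttpClient.newHttpClient();

        public static void main(String[] args) throws Exception {
            // 1. Ask the Groups service for the member IDs of group 42.
            String userIds = get("https://groups.example.com/groups/42/member-ids");
            // 2. Ask the Users service for the matching user records.
            String users = get("https://users.example.com/users?ids=" + userIds);
            // 3. The merge happens here, on the client.
            System.out.println(users);
        }

        private static String get(String url) throws Exception {
            HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
            return HTTP.send(request, HttpResponse.BodyHandlers.ofString()).body();
        }
    }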
Database-per-service, services share data over HTTP, single public API
A public API server handles request endpoints. Application logic in the API server makes requests to each service over an HTTP channel that is only accessible to other services in the system.
Pros:
good separation of concerns
each service is responsible for an API contract but can do whatever it wants with schema and data store, so long as API responses don't change
Cons:
non-performant
HTTP seems a weird transport mechanism to be using for internal comms
ends up exposing multiple services to the public internet (even if they're notionally locked down), so security threats grow from greater attack surface
Database-per-service, services share data through message broker
Given I've already got RabbitMQ running, I could just use it to queue requests for data and then to send the data itself (sketched in code after this option's cons). So for example:
client requests all Users in a Group
the public API service sends a GetUsersInGroup event with a RequestID
the Groups service picks this up, and adds the UserIDs to the queue
the Users service picks this up, and adds the User data onto the queue
the API service listens for events with the RequestID, waits for the responses, merges the data into the correct format, and sends back to the client
Pros:
Using existing infrastructure
good decoupling
inter-service requests remain internal (no public APIs)
Cons:
Multiple SQL queries
Lots of data processing at the application layer
harder to reason about
Seems strange to pass large quantities of data around via the event system
Latency?
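
A minimal sketch of the API-service side of this request/reply flow, using the RabbitMQ Java client's correlation-ID pattern; the queue name and message format are assumptions:

    import com.rabbitmq.client.AMQP;
    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;
    import java.nio.charset.StandardCharsets;
    import java.util.UUID;
    import java.util.concurrent.CompletableFuture;

    // The API service's side of the GetUsersInGroup flow. Queue name and
    // message format are assumptions; only the RabbitMQ client calls are real.
    public class GetUsersInGroupRequester {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("rabbitmq.internal"); // placeholder host
            try (Connection conn = factory.newConnection();
                 Channel channel = conn.createChannel()) {
                // Auto-named, exclusive queue on which replies to this request arrive.
                String replyQueue = channel.queueDeclare().getQueue();
                String requestId = UUID.randomUUID().toString();

                AMQP.BasicProperties props = new AMQP.BasicProperties.Builder()
                        .correlationId(requestId) // lets us match replies to this request
                        .replyTo(replyQueue)
                        .build();
                channel.basicPublish("", "get-users-in-group", props,
                        "{\"groupId\":42}".getBytes(StandardCharsets.UTF_8));

                CompletableFuture<String> response = new CompletableFuture<>();
                channel.basicConsume(replyQueue, true, (tag, delivery) -> {
                    if (requestId.equals(delivery.getProperties().getCorrelationId())) {
                        response.complete(new String(delivery.getBody(), StandardCharsets.UTF_8));
                    }
                }, tag -> { });
                // Blocks until the Users service publishes the user data.
                System.out.println(response.get());
            }
        }
    }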
Services share a database, separated by schema, other services read from VIEWs
Services are isolated into database schemas. Schemas can only be written to by their respective services. Services expose a SQL VIEW layer on their schemas that can be queried by other services.
The VIEW functions as an API contract; even if the underlying schema or service application logic changes, the VIEW exposes the same data, so consuming services are unaffected (a consumer-side sketch follows this option's cons).
Pros:
Presumably much more performant (single SQL query can get all relevant data)
Foreign key management much easier
Less infrastructure to maintain
Easier to run reports that span multiple services
Cons:
tighter coupling between services
breaks the idea of fundamentally atomic services that don't know about each other
adds a monolithic component (database) that may be hard to scale (in contrast to atomic services which can scale databases independently as required)
Locks all services into using the same system of record (Postgres might not be the best database for all services)
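
For illustration, a consumer in this design would issue a single query across the service-owned VIEWs, e.g. via JDBC (schema, view and column names below are hypothetical):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    // One query spanning two service-owned schemas through their public VIEWs.
    public class GroupMembersReport {
        public static void main(String[] args) throws Exception {
            try (Connection db = DriverManager.getConnection(
                    "jdbc:postgresql://db.internal:5432/app", "reader", "secret");
                 Statement stmt = db.createStatement();
                 // groups.group_members and users.user_directory are the VIEWs each
                 // service exposes as its contract; the base tables stay private.
                 ResultSet rows = stmt.executeQuery(
                     "SELECT u.user_id, u.display_name " +
                     "FROM groups.group_members gm " +
                     "JOIN users.user_directory u ON u.user_id = gm.user_id " +
                     "WHERE gm.group_id = 42")) {
                while (rows.next()) {
                    System.out.println(rows.getInt("user_id") + " " + rows.getString("display_name"));
                }
            }
        }
    }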
I'm leaning towards the last option, but would appreciate any thoughts on other approaches.
To evaluate the pros and cons, I think you should focus on what the microservices architecture is aiming to achieve. In my opinion, microservices is an architectural style for building loosely coupled applications. It is not designed for building high-performance applications, so some sacrifice of performance and some data redundancy are things we agree to accept when we decide to build applications the microservices way.
I don't think your services should share a database. Tighter coupling sacrifices the main objective of the microservices architecture. My suggestion is to create a consolidated data service which picks up the data-change events from all the other services and updates the database behind it. You might want to design the database behind the consolidated data service in a way that is optimized for queries (like a data warehouse), because that's all this service will be used for. You might also want to consider using a NoSQL database to support your consolidated data service.
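
A rough sketch of such a consolidated data service: it consumes change events and projects them into a denormalized table shaped for the read queries. The event format (a simple CSV payload here), queue and table names are all assumptions:

    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.ConnectionFactory;
    import java.nio.charset.StandardCharsets;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    // Consumes change events from the other services and projects them into a
    // table shaped for the UI's read queries.
    public class ConsolidatedDataService {
        public static void main(String[] args) throws Exception {
            java.sql.Connection db = DriverManager.getConnection(
                    "jdbc:postgresql://readmodel-db:5432/reporting", "writer", "secret");
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("rabbitmq.internal"); // placeholder host
            Channel channel = factory.newConnection().createChannel();
            channel.queueDeclare("membership-events", true, false, false, null);

            channel.basicConsume("membership-events", true, (tag, delivery) -> {
                // Assumed payload: "userId,groupId,displayName"
                String[] event = new String(delivery.getBody(), StandardCharsets.UTF_8).split(",");
                try (PreparedStatement upsert = db.prepareStatement(
                        "INSERT INTO group_member_view (user_id, group_id, display_name) " +
                        "VALUES (?, ?, ?) ON CONFLICT (user_id, group_id) " +
                        "DO UPDATE SET display_name = EXCLUDED.display_name")) {
                    upsert.setInt(1, Integer.parseInt(event[0]));
                    upsert.setInt(2, Integer.parseInt(event[1]));
                    upsert.setString(3, event[2]);
                    upsert.executeUpdate();
                } catch (java.sql.SQLException e) {
                    e.printStackTrace(); // a real service would retry or dead-letter
                }
            }, tag -> { });
        }
    }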

Web services over HBase

I am new to the Hadoop environment, sorry if the question is obvious...
I need to develop a web service to record and read large volumes of data. Because of this requirement I thought of using a Hadoop cluster and HBase as my database.
I have designed my hbase schema to satisfy my requirements, so far so good.
The thing is that since it is a service I am developing, I would like the users of the service not to know the internal representation of the data.
I do not want the users to have to invoke a Put to a certain table, for example, to the Clients table, but instead invoke a high-abstraction method, for example, createClient().
How do I add this abstraction layer on top of HBase while maintaining the characteristics of reliable and distributed and the capacity to service lots of users simultaneously offered by HBase itself?
Thanks a lot
Consider HBase Stargate to enable a REST server. If you want to obscure the table name in the URI, you could proxy Stargate behind a web server.
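
If REST via Stargate still exposes too much of the storage model, another option is a thin service facade over the native HBase Java client, scaled by running multiple stateless instances behind a load balancer. A minimal sketch, with a hypothetical Clients table and info column family:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    // A thin facade over the native HBase client, so callers see
    // createClient() instead of Puts against a Clients table.
    public class ClientService {
        private final Connection connection;

        public ClientService() throws Exception {
            Configuration config = HBaseConfiguration.create();
            this.connection = ConnectionFactory.createConnection(config);
        }

        public void createClient(String clientId, String name, String email) throws Exception {
            try (Table table = connection.getTable(TableName.valueOf("Clients"))) {
                Put put = new Put(Bytes.toBytes(clientId));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes(name));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("email"), Bytes.toBytes(email));
                table.put(put);
            }
        }
    }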

WSO2 Stratos - Multi-tenant application development

I am exploring WSO2 Stratos and have watched some of the webinar recordings. I would like to create an application and expose it as SaaS. One of the WebEx recordings covers this in detail, but it does not explain multi-tenancy on the data storage side. Is there any tutorial available for that? I would like to use a shared schema for data storage. What kind of database can I use for this (e.g. MySQL, MongoDB, Cassandra)? Is it possible to use a framework like Athena? I am just trying to do a kind of POC, and then I need to decide whether this platform really fits the application that I am thinking of building.
You can create databases through WSO2 Storage Server in StratosLive, which can be accessed via storage.stratoslive.wso2.com. You need to create a database and attach a user to it. Then you can access that database from your webapp (you will get a JDBC URL) as you do in normal cases. Also, you can create Cassandra keyspaces in the Storage Server, but we don't have MongoDB support at the moment. There is no documentation on this yet.
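
Once the database and user exist, the webapp connects with the JDBC URL it was given, in the usual way (URL and credentials below are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class StratosDbAccess {
        public static Connection open() throws Exception {
            // The JDBC URL is provided by the Storage Server when the database is created.
            String jdbcUrl = "jdbc:mysql://storage.stratoslive.wso2.com:3306/mydb_tenant"; // placeholder
            return DriverManager.getConnection(jdbcUrl, "dbuser", "password"); // placeholders
        }
    }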
Yes, you're right. Multi-tenant data architecture is up to the user to decide. This white paper from Microsoft explains multi-tenant data architecture nicely. The whitepaper, however, is written assuming you're using an RDBMS. I haven't played around with Athena, so it's difficult to say how it'll map to what Stratos provides. The data architecture might be different when you're using a NoSQL DB, and different DBs have different ways of filtering a set of data by a given tenant (or an ID). So, going by the whitepaper, it'll probably map to:
Different DBs -> Different keyspaces
Different tables -> Different column families
Shared schema -> Shared column family
Better to define your application characteristics beforehand and then choose an appropriate DB.
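
For the shared-schema/shared-column-family option in particular, tenant isolation usually comes down to a tenant ID that every query filters on; a minimal relational sketch (table and column names are hypothetical):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    // Shared schema: all tenants live in one table, and every query is
    // scoped by tenant_id.
    public class SharedSchemaDao {
        public static void listOrders(Connection db, int tenantId) throws Exception {
            try (PreparedStatement stmt = db.prepareStatement(
                    "SELECT order_id, total FROM orders WHERE tenant_id = ?")) {
                stmt.setInt(1, tenantId);
                try (ResultSet rows = stmt.executeQuery()) {
                    while (rows.next()) {
                        System.out.println(rows.getInt("order_id") + " " + rows.getDouble("total"));
                    }
                }
            }
        }
    }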