Simple ETL: Smooks or ETL product - wso2

I am fairly new to the subject and doing some research.
I have an ESB (WSO2 ESB) and want to extract master data (Customers, Orders, etc.) from the messages passing through it and store it in a database as reference data. The source data is XML coming from web services.
So I need a component that can maintain the master data: insert new objects, delete old ones, and update changed ones. It would also be nice to have data events so the ESB can route data accordingly. The logic will be basically the same for every entity type, so it might be a good idea to autogenerate it for each new entity type.
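A minimal sketch of the insert/update/delete logic described above, assuming each batch of incoming data is a full snapshot of one entity type keyed by a business identifier. The names `Change` and `reconcile` are hypothetical and belong to no WSO2 or Smooks API; the returned list doubles as the "data events" an ESB could route on.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    kind: str                 # "insert", "update" or "delete"
    key: str                  # business key of the entity
    record: Optional[dict]    # new state, or None for a delete


def reconcile(stored: dict, incoming: dict) -> list:
    """Compare the stored master data with an incoming snapshot.

    Both arguments map a business key to a record (a plain dict).
    Returns the changes needed to bring `stored` in line with `incoming`.
    """
    changes = []
    for key, record in incoming.items():
        if key not in stored:
            changes.append(Change("insert", key, record))
        elif stored[key] != record:
            changes.append(Change("update", key, record))
    # Anything stored but missing from the snapshot is treated as deleted.
    for key in sorted(stored.keys() - incoming.keys()):
        changes.append(Change("delete", key, None))
    return changes
```

Because the function only compares dictionaries, the same code can serve every entity type; only the XML-to-dict mapping differs per type.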
Options as I see them now:
Use Smooks with either SQLExecutor or Hibernate for persistence, with all matching logic written either in the Smooks config or in DAO annotations.
Use an open source ETL tool (Talend, Kettle, Clover, etc.). The data is passed to the ETL tool and all transformation logic is defined there. This could accommodate future scenarios when they appear, but it might also be overkill.
I would appreciate it if you could share your thoughts and point me in the right direction.

You'd be better off leaving the database part to another tool.
If you have a fair amount of database interaction in your message flow, you can expect a serious drop in performance.
However, you do not need an ETL tool for the use case you describe. You can simply use WSO2 DSS and create services that insert or update your data in the database.
We have been using this alongside the ESB for message logging (into a DB) and are happy with it. It is best to call these services from your ESB message flow as non-blocking, fire-and-forget web services. Hope this helps.

Related

Implementing a simple Restful service to store and retrieve data using AWS API Gateway/Lambda

I'm new to AWS, so apologies in advance if this question is missing some important considerations, or has incorrect assumptions.
But basically I want to implement a service on AWS to store and retrieve data from multiple clients, which may be Android apps, Windows applications, websites etc. The way I've considered doing this is via a RESTful service using API Gateway front end, with a Lambda back end and maybe an S3 bucket to hold the data.
The basic requirements are:
(1) Clients can publish data to the server, where it is stored, perhaps with some kind of key/value structure.
(2) Clients can retrieve said data by key.
(3) If possible, clients should be able to subscribe to events from the service, so that they are notified when the value of a piece of data changes. This would avoid the need to poll the service, which would presumably start racking up unnecessary charges if the data doesn't change often.
Any pointers on how to get started with this are welcome!
Creating a RESTful API on top of Lambda and API Gateway is one of the main use cases for this architecture. You can think of Lambda functions as controllers with methods and API Gateway as a router that forwards requests to functions based on the URL pattern. There are many frameworks and approaches that can help out here if you don't want to write from scratch:
Lambdasync
https://medium.com/#fredrikanderzon/create-a-rest-api-on-aws-lambda-using-lambdasync-e46c68f8043f
Serverless
https://serverless.com/framework/docs/providers/aws/events/apigateway/
Swagger
https://cloudonaut.io/create-a-serverless-restful-api-with-api-gateway-swagger-lambda-and-dynamodb/
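To illustrate the "Lambda function as controller" idea above, here is a minimal sketch of a Python handler for requirements (1) and (2). It assumes API Gateway's Lambda proxy integration with a hypothetical `/data/{key}` route, and it uses an in-memory dictionary purely as a stand-in for a real datastore: module-level state is not durable in Lambda, so a real service would need DynamoDB or similar behind it.

```python
import json

# Stand-in for a real datastore. Module state does NOT persist reliably
# between Lambda invocations; this is only here to keep the sketch runnable.
_STORE = {}


def _response(status, payload):
    """Build the response shape expected by the Lambda proxy integration."""
    return {"statusCode": status, "body": json.dumps(payload)}


def handler(event, context):
    """Route PUT and GET requests for /data/{key} (hypothetical route)."""
    key = (event.get("pathParameters") or {}).get("key")
    method = event.get("httpMethod")
    if key is None:
        return _response(400, {"error": "missing key"})
    if method == "PUT":
        _STORE[key] = json.loads(event.get("body") or "null")
        return _response(200, {"key": key, "value": _STORE[key]})
    if method == "GET":
        if key not in _STORE:
            return _response(404, {"error": "not found"})
        return _response(200, {"key": key, "value": _STORE[key]})
    return _response(405, {"error": "method not allowed"})
```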
As far as event subscriptions go (requirement #3) you can model this in many datastores, certainly in a relational/SQL database, with a table like this:
Subscription (key_of_interest, user_id, events_of_interest)
I'm leaving out data types for you to figure out, but hopefully you get the idea. After each data modification on a particular key, check whether that key is of interest in the subscription table, then send a notification to the users who indicated interest. The details of course depend on your particular requirements. A caution though: this approach increases the cost of data modifications because of the additional overhead needed to process subscriptions.
EDIT: One other thing I forgot. S3 is better suited for unstructured data (think 'files'). For relational databases, check out RDS. For a simple NoSQL database you might use DynamoDB, or host your own NoSQL database of choice on an EC2 instance.
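A rough sketch of that subscription lookup, using SQLite only so that it runs anywhere. The `TEXT` column types, the composite primary key, and the function names are assumptions, since the data types were deliberately left open above.

```python
import sqlite3


def create_schema(conn):
    """Create the subscription table (column types are an assumption)."""
    conn.execute(
        "CREATE TABLE subscription ("
        " key_of_interest TEXT NOT NULL,"
        " user_id TEXT NOT NULL,"
        " events_of_interest TEXT NOT NULL,"
        " PRIMARY KEY (key_of_interest, user_id, events_of_interest))"
    )


def subscribers_for(conn, key, event):
    """Return the users to notify after `event` happened to `key`."""
    rows = conn.execute(
        "SELECT user_id FROM subscription"
        " WHERE key_of_interest = ? AND events_of_interest = ?"
        " ORDER BY user_id",
        (key, event),
    )
    return [row[0] for row in rows]
```

The write path would call `subscribers_for` after each successful modification and hand the result to whatever notification channel is in use; that extra query is the overhead mentioned above.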

What is the "proper" way to use DynamoDB for an iOS app?

I've just started messing around with AWS DynamoDB in my iOS app and I have a few questions.
Currently, my app communicates directly with my DynamoDB database. I've been reading around lately, and people say this isn't the proper way to get data from my database.
By this I mean that I just have a function in my code that queries my DynamoDB table and returns the result.
The way I do it works, but is there a better way I should be going about this?
Amazon DynamoDB is itself a highly scalable service. Standing up another server in front of it means you also have to scale that server in line with the RCU/WCU configured for your tables, which you can and should avoid.
If your mobile application doesn't need a backend server and you can perform all the business functions from the mobile device, then you should probably think about the following:
Use the AWS DynamoDB SDK for iOS to write the client application that runs on the mobile device.
Use the AWS Token Vending Machine to authenticate your mobile users and grant them credentials for running operations on DynamoDB tables.
Control access (i.e. which operations are allowed on which tables) using IAM policies.
HTH.
From what you say, I guess you are looking for a way to distribute data to many clients (iOS apps).
There are a few integration patterns for this (a very good book on the topic is Enterprise Integration Patterns), one of which is called shared database. It is essentially about using a common database for multiple clients to share the data. The main drawback of that pattern (in your case) is that every client makes assumptions about what the database schema looks like. That can cause headaches when you have to maintain the schema in the future, if your business logic changes.
A more advanced approach is to send an event on every change in your data instead of writing changes directly to the database from the client apps. This way you can add processing to the events before the data they carry is written to the database. For example, you may want to change the event format in a new version of your app but still support legacy users, so you add a translation step that transforms both types of events into the format that fits the database schema. It's basically a question of whether to work with diffs or snapshots.
Be aware of the added complexity of working with events; it can be overkill if your app is simple and schema changes are unlikely.
Also consider that you can do data preprocessing using DynamoDB Streams, which gives you some of the advantages of events while remaining simple to implement.
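The translation step could look something like this sketch. Both event formats and all field names here are made up for illustration: version 1 is assumed to carry a single `name` field, version 2 separate first and last names, and the database row is assumed to match version 2.

```python
def translate_event(event: dict) -> dict:
    """Normalise old and new client events to one database row shape.

    The event layouts are hypothetical: v1 has "id" and "name",
    v2 has "user_id", "first_name" and "last_name".
    """
    version = event.get("version", 1)  # legacy clients send no version
    if version == 1:
        first, _, last = event["name"].partition(" ")
        return {"user_id": event["id"], "first_name": first, "last_name": last}
    if version == 2:
        return {
            "user_id": event["user_id"],
            "first_name": event["first_name"],
            "last_name": event["last_name"],
        }
    raise ValueError(f"unsupported event version: {version}")
```

Only this function needs to know about old formats; the code that writes to the database sees a single shape.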

Creating API for application interaction

We have an internal application. As time went on and new applications were requested that exchange data with each other, the interaction became bound to the database schema, meaning that changes in the database require changes everywhere else. As we plan to build even more applications that will depend on the same data, this will quickly become an unmanageable mess.
Now I'm looking to abstract that interaction behind an API. Currently I have trouble choosing the right tool.
The interaction can at times be complex: data is posted to one service, and once the action has been completed the service should notify the sender.
Another example is that some data has no context without the data from other services. Let's say there is one service for [Schools] and one for [Students]. If a [School] gets deleted or changed, the [Student] needs to be informed about it immediately, not when he next comes to [School].
Advice? Suggestions? SOAP/REST/?
I don't think you need an API. In my opinion you need an architecture which decouples your database from the domain logic and the other parts of the application. Examples are clean architecture, onion architecture, and hexagonal architecture (also known as ports and adapters). They share the same concepts: you have domain logic which does not depend on any framework, external library, delivery method, data storage solution, etc. This domain logic communicates with the outside world through adapters that have well-defined interfaces. If you first design the inside of your domain logic and the interfaces of the adapters, and only afterwards the outside components, then it is called domain driven design (DDD).
So, for example, if you want to move from MySQL to MongoDB, you already have a DataStorageInterface, and the only thing you need to do is write a MongoDBAdapter which implements this interface, and of course migrate the data.
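As a sketch of that idea: the domain code below depends only on `DataStorageInterface`. The methods on the interface, the `UserService` class, and the `InMemoryAdapter` (standing in for a MySQL or MongoDB adapter so the example runs without a database) are all illustrative assumptions.

```python
from abc import ABC, abstractmethod
from typing import Optional


class DataStorageInterface(ABC):
    """The port: the only storage contract the domain logic knows about."""

    @abstractmethod
    def save_user(self, user_id: str, profile: dict) -> None: ...

    @abstractmethod
    def get_user(self, user_id: str) -> Optional[dict]: ...


class InMemoryAdapter(DataStorageInterface):
    """Stand-in adapter; a MongoDBAdapter would implement the same methods."""

    def __init__(self) -> None:
        self._users = {}

    def save_user(self, user_id: str, profile: dict) -> None:
        self._users[user_id] = dict(profile)

    def get_user(self, user_id: str) -> Optional[dict]:
        return self._users.get(user_id)


class UserService:
    """Domain logic: it never imports a database driver."""

    def __init__(self, storage: DataStorageInterface) -> None:
        self._storage = storage

    def rename(self, user_id: str, new_name: str) -> dict:
        profile = self._storage.get_user(user_id) or {}
        profile["name"] = new_name
        self._storage.save_user(user_id, profile)
        return profile
```

Swapping databases then means passing a different adapter to `UserService`; the service itself stays untouched.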
To design the adapters you can use two additional concepts: command query responsibility segregation (CQRS) and event sourcing (ES).
CQRS is for connecting delivery methods like REST, SOAP, web applications, etc. to the domain logic. For example, you can raise a CreateUserCommand from your REST API. The proper listener in the domain logic then processes that command, and on success it raises a domain event such as UserCreatedEvent. Your REST API can listen for that event and respond with a success message to the REST client. One or more storage adapters can listen for the UserCreatedEvent too, so they can process the event and persist the new user.
You don't necessarily have to use a single database. If a relational database is faster for a specific type of query, you can use that, and if a NoSQL database suits the job better, you can use that too. You can use as many databases as you want for your queries; the only thing you need to do is write a storage adapter for each. For example, if your REST client wants to retrieve the profile of a specific user, it can raise a GetUserProfileByIdQuery, and the domain logic can ask the adapter of a database which can serve the query. The adapter can then send, for example, an SQL query to a MySQL database and return the response.
With ES you add an EventStorage to your system, which stores the raised domain events. It can be very useful if you want to migrate your data from one query database to another. In that case you create a new storage adapter for the new database and replay all of the domain events from the EventStorage to that adapter in historical order, so it can fill the new database with the relevant data. That's all; you don't have to write complicated migration scripts.
In your case I think you should at least create domain events and use event sourcing. That will totally decouple your database from the other parts of your application. Adding a REST or SOAP API can have a similar effect, but building HTTP connections to access your database can slow down your application.
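A small sketch of the command, event, and replay flow just described. `UserCreatedEvent` and `EventStorage` follow the names used in the answer; the function `handle_create_user_command`, the `DictStorageAdapter` class, and the `handle` method are assumptions made for illustration, with a dictionary standing in for a real query database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserCreatedEvent:
    user_id: str
    name: str


class EventStorage:
    """Append-only log of domain events, kept in historical order."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def replay(self, adapter):
        """Feed every stored event to an adapter, oldest first."""
        for event in self._events:
            adapter.handle(event)


class DictStorageAdapter:
    """Stand-in for a storage adapter in front of a query database."""

    def __init__(self):
        self.users = {}

    def handle(self, event):
        if isinstance(event, UserCreatedEvent):
            self.users[event.user_id] = {"name": event.name}


def handle_create_user_command(user_id, name, event_storage, adapters):
    """Process a CreateUserCommand: record the event, then notify adapters."""
    event = UserCreatedEvent(user_id, name)
    event_storage.append(event)
    for adapter in adapters:
        adapter.handle(event)
    return event
```

Migrating to a new query database then amounts to creating the new adapter and calling `event_storage.replay(new_adapter)`.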

how to persist runtime parameter of a service call then use as parameter for the next service call WSO2 ESB

I am seeking advice on the most appropriate method for the following use case.
I have created a number of services using the WSO2 Data Services Server which I want to run periodically, passing in the date of the last run as a parameter; i.e. each data service has two parameters, a start and an end date, to run the SQL against.
I plan to create a service within WSO2 ESB to mediate the execution of these services and combine the results to pass on to another web service. I think I can manage this ;-) I will use a scheduled task to start this at a predefined interval.
Where I am seeking advice is on how to keep track of the last successful run time, as I need to use it as a parameter for the data services.
My options as I see them:
(1) create a config table in my database and create another data services web service to retrieve and persist these values
(2) use vfs transport and somehow persist these values to a text file as xml, csv or json
(3) use some other way like property values in the esb sequence and somehow persist these
(4) any other??
With my current knowledge, 1 seems easiest, but it doesn't feel right because I would need write access to the database, something I might not normally have when architecting a solution like this in the future. 2 looks like it could work with my limited knowledge of WSO2 ESB to date. But is 3 the best option? As you can see from the detail above, this is where I start to flounder.
Any suggestions would be most welcome.
I do not have much experience with ESB. However I also feel that your first option would be easier to implement.
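For what it's worth, the storage side of the first option is small. Here is a sketch using SQLite so that it runs standalone; the table name `run_config`, its columns, and the ISO-8601 string timestamps are assumptions, and in the real setup the two queries would be exposed as data service operations rather than called from Python.

```python
import sqlite3


def init(conn):
    """Create the config table holding one row per scheduled job."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS run_config ("
        " job_name TEXT PRIMARY KEY,"
        " last_success TEXT NOT NULL)"
    )


def get_last_success(conn, job_name, default):
    """Return the last successful run time, or `default` on the first run."""
    row = conn.execute(
        "SELECT last_success FROM run_config WHERE job_name = ?",
        (job_name,),
    ).fetchone()
    return row[0] if row else default


def record_success(conn, job_name, finished_at):
    """Store the run time; call this only after the whole run succeeded."""
    conn.execute(
        "INSERT OR REPLACE INTO run_config (job_name, last_success)"
        " VALUES (?, ?)",
        (job_name, finished_at),
    )
    conn.commit()
```

Each scheduled run would read the stored value as its start date, use the current time as its end date, and write the end date back only if every downstream call succeeded.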
A related topic was also discussed recently on the WSO2 architecture mailing list under the subject "[Architecture] Allow ESB to put and update registry properties".
The discussion proposed introducing a registry mediator, but I'm not sure it will be implemented soon.
I hope this helps.
As of now there is no direct way to save content to the registry from within the ESB. But you can always write a custom mediator to do that, or use the script mediator.
The following is a code snippet for the script mediator:
<script language="js"><![CDATA[
importPackage(Packages.org.apache.synapse.config);
/* creates a new resource */
mc.getConfiguration().getRegistry().newResource("conf:/store/myStore",false);
/* update the resource */
mc.getConfiguration().getRegistry().updateResource(
"conf:/store/myStore", mc.getProperty("myProperty").toString());
]]></script>
I've written a blog post on how to do this in ESB 4.8.1. You can find it here

Does Biztalk Server support data exchange without use of web services

As I have very little knowledge of how ESBs work in tandem with databases, I'm asking how communication can take place between the two, hoping I'll at least be pointed in the right direction to search!
SITUATION: We have two systems (one of them is the client's) on different networks, each with its own database. We are required to do a regular real-time exchange of all data points present in our database with the other one. We are also required to have a provision for importing data into our system. This exchange has to follow SOA principles over the customer-provided BizTalk ESB. We are supposed to provide the exchange using ODBC.
QUESTION: Is it possible to integrate the databases with the ESB as endpoints without using any web services or extra interfaces, and send the data over the ESB with a pull-push transfer mechanism?
I have tried searching the net for this situation but have not come up with many straightforward answers. Could someone please point me in the right direction?
The ESB Toolkit in BizTalk is not an ESB! It is just a small additional tool for some special cases.
Let's stop talking about the ESB; we need to solve the technical problem, right?
As I understand it, you have two SQL databases and want to integrate them.
The easiest way to do so with BizTalk is to use the WCF-SQL ports/adapters.
You start the wizard for this adapter and choose the tables/stored procedures which should provide or consume data; the wizard will generate all the XML schemas you need.
Then you use the BizTalk Mapper to create the XSLT maps, which transform one SQL data format into the other.
Then you create a pair of ports. One will consume data from one SQL database; the second will insert data into the other SQL database. One of these ports will use the XSLT map mentioned above.
If you need more processing, you can create an orchestration to manage additional processing, sophisticated error handling, etc.
I would recommend using MSMQ. There's a fairly detailed description of it here