Informatica real time event processing - informatica

I am new to Informatica. I am looking for a technology that will let me process large amounts of data (GB/s) in real time and do:
real time event filtering and classification
real time integration with different applications (sending alerts, SMS, emails)
stateless and stateful event analysis
data science stuff in real time (machine learning etc.)
I wonder if Informatica's technology stack is a good choice. I found:
Informatica PowerCenter Real Time Edition (but to me this looks like a more advanced form of CDC)
Informatica RulePoint (but to me this looks like a GUI for starting/stopping predefined rules or creating your own rules for alerting purposes)
The question is whether the above technologies, or any others from Informatica's technology stack, can do real-time event processing and analysis (including stateful analysis and data science work such as machine learning), and whether they do it efficiently.
Regards!

Related

Can akka handle heavy algorithms to transform data from db?

I am new to Akka and am trying to see if it solves the problems I am facing. I have data to extract from databases, transform with algorithms, and send to and from actors. This involves a lot of computing.
Can Akka handle all this (communication and computing)? Or do I have to call upon another tool to manage the computation part?
Thank you all.
wip
Well, all I can offer here is my experience. As a matter of fact I am currently working on something similar (i.e. an ETL with text files). We're essentially taking a lot of text files and loading their lines into a PostgreSQL database. This is our setup:
Intel Xeon 8 cores + SSD
Files and app on the same machine
Remote database
We're able to fetch, parse and load 26 million file lines and create specific database indices in about 12 minutes, which is about 1.3GB worth of files and 3GB in the database. On a much crappier single-core and HDD setup we can do it in about 40 minutes.
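To put those figures in perspective, here is a quick back-of-the-envelope calculation of the implied throughput. It assumes 1.3GB means 1300 MB of input files and that the 40-minute run processed the same data; the function name is only illustrative.

```python
def throughput(lines, megabytes, minutes):
    """Return (lines per second, MB per second) for a run of the given length."""
    seconds = minutes * 60
    return lines / seconds, megabytes / seconds

# Figures quoted above; 1.3GB is taken as 1300 MB (an assumption).
fast = throughput(26_000_000, 1300, 12)   # 8-core Xeon + SSD
slow = throughput(26_000_000, 1300, 40)   # single-core + HDD

print(f"fast: {fast[0]:.0f} lines/s, {fast[1]:.2f} MB/s")
print(f"slow: {slow[0]:.0f} lines/s, {slow[1]:.2f} MB/s")
```

So the faster setup works out to roughly 36,000 lines per second, or a bit under 2 MB/s of input, including index creation and the round trips to the remote database.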
The good thing about Akka is that it will allow you to save up resources and scale more since several actors can share one thread.
Akka can easily handle many millions of message sends per second; there is an oldie-but-goodie post on this topic on letitcrash.com. As long as you factor out blocking operations into separate dispatchers (thread pools), the actor model eases parallel computations a lot, which of course gives you nice wall-clock time in such data-crunching apps.
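The idea of keeping blocking operations on their own thread pool can be sketched outside Akka as well. The following is only a Python analogy, not Akka code: the asyncio event loop plays the role of the default dispatcher, and a dedicated `ThreadPoolExecutor` plays the role of a separate dispatcher for blocking work. The pool size and the `load_line_blocking` function are hypothetical stand-ins.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# Dedicated pool for blocking calls, kept apart from the event loop
# (loosely analogous to a separate dispatcher; the size is arbitrary).
BLOCKING_POOL = ThreadPoolExecutor(max_workers=4, thread_name_prefix="blocking-io")

def load_line_blocking(line: str) -> int:
    """Hypothetical stand-in for a blocking database insert."""
    time.sleep(0.01)
    return len(line)

async def handle_line(line: str) -> int:
    parsed = line.strip().upper()  # cheap, non-blocking work stays on the loop
    loop = asyncio.get_running_loop()
    # The blocking part is handed to the dedicated pool so the loop stays responsive.
    return await loop.run_in_executor(BLOCKING_POOL, load_line_blocking, parsed)

async def process(lines):
    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(handle_line(line) for line in lines))

def run(lines):
    return asyncio.run(process(lines))
```

Calling `run(["ab\n", "cde\n"])` parses both lines on the event loop and performs the simulated inserts on the `blocking-io` threads.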

WSO2 for real time semantic applications

I would like to know if it is possible to build an application using WSO2. The problem is that I need to receive sensor data from different devices, process that data in real time, and store meaningful information semantically using OWL. I am a little confused about which WSO2 products I need to consider for developing this kind of application. In the future I will need to scale up this system as the number of devices increases.
You may take a look at WSO2 Complex Event Processor (CEP) for real-time event processing requirements. Please go through the documentation to get an idea.
http://docs.wso2.org/wiki/display/CEP210
Thanks,
Dileepa

Best way of storing/caching real-time data , with disk persistency

I am on Windows, using Visual Studio 2010 C++.
My Application processes a lot of data every 16ms (near-realtime). This data is basically a binary buffer of electric signals.
I need to store this data somehow that would allow for fast access.
My preference is to store some of this data in memory as it comes in in real time, and then persist it to disk in some fashion.
My app could at any point require data from any part in the session (beginning to current), and so access needs to be fast, and it would be nice if the queries could be cached for a certain amount of time as well.
So basically if anyone has experience with storing/caching and retrieving real-time data, it would be very helpful.
Any Ideas?
Roey
You should learn about ORM tools -- object-relational mapping. In short -- tools to keep your objects in a SQL DB. Another way is to use object stores. Google for them too.
There are a number of tools in both categories. You should choose the one that fits you best by its price, performance and ease of use.

How do you model a business workflow in ColdFusion?

Since there's no complete BPM framework/solution in ColdFusion as of yet, how would you model a workflow into a ColdFusion app that can be easily extensible and maintainable?
A business workflow is more than a flowchart that maps nicely into a programming language. For example:
How do you model a task X that is followed by multiple tasks Y0, Y1, Y2 that happen in parallel, where Y0 is a human process (need to wait for inputs), Y1 is a web service that might go wrong and might need auto retry, and Y2 is an automated process; followed by a task Z that should only be carried out when all Y's are completed?
My thoughts...
Seems like I need to do a whole lot of storing / managing / keeping track of states, and frequent checking with cfschedule.
cfthread ain't going to help much since some tasks can take days (e.g. wait for user's confirmation).
I can already imagine the flow is going to be spread around in multiple UDFs, DB, and CFCs.
Is there any open-source workflow engine in another language that maybe we can port over to CF?
Thank you for your brain power. :)
Study the Java Process Definition Language specification; JBoss has an execution engine for it. Using this Java-based engine may be your easiest solution, and it solves many of the problems you've outlined.
If you intend to write your own, you will probably end up modelling states and transitions, vertices and edges in a directed graph. These, as Ciaran Archer wrote, are the components of a state machine. The best persistence approach IMO is capturing versions of whatever data is being sent through the workflow via serialization, along with the current state and a history of transitions between states and changes to that data. The mechanism probably also needs a way to keep track of who or what has responsibility for taking the next action against that workflow.
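As a rough illustration of that approach, here is a minimal Python sketch (not ColdFusion, and not any particular engine's API). The state names, event names and the `Workflow` class are all hypothetical. JSON is used for serialization only because it ships with the standard library; any serialization format would fit the same shape.

```python
import json
from dataclasses import dataclass, field

# Edges of the directed graph: (current state, event) -> next state.
# All state and event names here are hypothetical.
TRANSITIONS = {
    ("draft", "submit"): "awaiting_approval",
    ("awaiting_approval", "approve"): "approved",
    ("awaiting_approval", "reject"): "draft",
}

@dataclass
class Workflow:
    data: dict
    state: str = "draft"
    owner: str = "author"  # who or what must take the next action
    history: list = field(default_factory=list)

    def fire(self, event: str, actor: str, next_owner: str) -> None:
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"{event!r} is not allowed in state {self.state!r}")
        new_state = TRANSITIONS[key]
        # Record the transition together with a snapshot of the data at that point.
        self.history.append({
            "from": self.state,
            "to": new_state,
            "event": event,
            "actor": actor,
            "data": json.dumps(self.data, sort_keys=True),
        })
        self.state = new_state
        self.owner = next_owner

    def dumps(self) -> str:
        """Serialize everything needed to resume the workflow later."""
        return json.dumps({
            "data": self.data,
            "state": self.state,
            "owner": self.owner,
            "history": self.history,
        })

    @classmethod
    def loads(cls, text: str) -> "Workflow":
        return cls(**json.loads(text))
```

The string returned by `dumps()` is what would be written to a database row, so a workflow can sit there for days and be restored with `loads()` when the next actor responds.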
Based on your question, one thing to consider is whether or not you really need to represent parallel tasks in your solution. Instead, it might be possible to enqueue a set of messages and then specify a wait state that lasts until all of them complete. Representing actual parallelism implies you are moving data simultaneously through several different processes, in which case, when they join again, you need an algorithm to resolve deltas, which is very much a non-trivial task.
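The enqueue-and-wait idea can be sketched roughly as follows, again in Python rather than ColdFusion. The `WaitForAll` class, the `poll` function and the task ids Y0, Y1, Y2 (borrowed from the question) are illustrative assumptions. The inbox stands in for whatever table or queue the completion messages land in, and `poll` for one pass of a scheduled task.

```python
from collections import deque

class WaitForAll:
    """Wait state that is satisfied once every expected task has reported in."""

    def __init__(self, task_ids):
        self.pending = set(task_ids)
        self.done = {}

    def on_message(self, task_id, payload) -> bool:
        # Duplicate or unknown completions are ignored.
        if task_id in self.pending:
            self.pending.remove(task_id)
            self.done[task_id] = payload
        return not self.pending  # True once nothing is left to wait for

def poll(inbox: deque, wait: WaitForAll) -> bool:
    """One polling pass: drain the inbox and report whether Z may start."""
    ready = not wait.pending
    while inbox:
        task_id, payload = inbox.popleft()
        ready = wait.on_message(task_id, payload)
    return ready
```

With Y1 and Y2 already in the inbox, a first `poll` returns `False`; once the human step Y0 is confirmed and enqueued, the next `poll` returns `True` and Z can be started. No thread has to stay alive while waiting.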
In the context of ColdFusion and what you're trying to accomplish, a scheduled task may be necessary if the system you're writing needs to poll other systems. Consider WDDX as a serialization format. JSON, while seductively simple, has, as I recall, some edge cases around numbers and dates that can cause you grief.
Finally see my answer to this question for some additional thoughts.
Off the top of my head I'm thinking about the State design pattern with state persisted to a database. Check out the Gumball Machine example in Head First Design Patterns.
Generally this will work if you have something (like a client / order / etc.) going through a number of changes of state.
Different things will happen to your object depending on what state you are in, and that might mean sitting in a database table waiting for a flag to be updated by a user manually.
In terms of other languages I know Grails has a workflow module available. I don't know if you would be better off porting to CF or jumping ship to Grails (right tool for the job and all that).
It's just a thought, hope it helps.

Has anyone used dataflow programming in a real project with a mainstream language?

I am looking at using some dataflow programming techniques in a Clojure program, but I am having difficulty finding much information from projects using Java, C#, or other mainstream languages that have used such techniques in the real world. I would be grateful to hear if anyone has any experiences they could share regarding this.
Here we are! We've made one (quoting from one of my older posts):
We've designed and implemented a DF server for our automation project (dispatcher, component interface, a bunch of components, DF language, DF compiler, UI). It is written in bare C++, and runs on several Unix-like systems (Linux x86, MIPS, avr32 etc., Mac OSX). It lacks several features, e.g. sophisticated flow control, complex thread control (there is only a not too advanced component for it), so it is just a prototype, even though it works. We're now working on a full-featured server. We've learnt a lot while implementing and using the prototype.
Also, we'll make a visual editor some day.
There are dataflow systems which don't even mention the dataflow approach:
SynthEdit: http://www.synthedit.com/ - It's an audio related framework and component set for creating VST plugins
TinyOS: http://www.tinyos.net/ - It's an embedded operating system/framework
Digital synthesizers/samplers are dataflow systems, programmed (supposedly) in C, with some parts in Assembly; check my answer to another post for some examples.
Quartz Composer, a graphic magic tool for Mac.
Blender has dataflow subsystem for image composing.
Writing a dataflow system is not rocket science. Here's my older post about the basics of a dataflow framework.
The term dataflow is wide. There are realtime synchronous dataflow systems, like synthesizers and samplers; there are asynchronous ones, like our home automation system (the system is idle unless the user presses a button or a timer runs out); and there are even different architectures, like spreadsheets or make.
Want to read more about dataflow programming? Read J. Paul Morrison's site and book.
Pervasive DataRush is a framework for parallel dataflow programming for any JVM language, including Clojure.
Pervasive DataRush uses a dataflow architecture. The architecture implements a program that executes as a graph of computation nodes interconnected by dataflow queues. The nodes use the queues to share data. As the data is streaming, only data required by any active operation needs to be in memory at any given time, allowing very large data sets to be analyzed. Besides offering the potential for scaling to problems larger than available memory, dataflow graphs exploit multiple forms of parallelism.
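To make the "nodes connected by queues" description concrete, here is a small generic sketch in Python. It is not the DataRush API; the node functions, the `run_graph` helper and the queue capacity are all hypothetical. Each node runs in its own thread, and the bounded queues mean only a few items are in flight at any time.

```python
import queue
import threading

_END = object()  # sentinel marking the end of the stream

def source(out_q, items):
    """Node that feeds items into the graph one at a time."""
    for item in items:
        out_q.put(item)  # blocks while the downstream queue is full
    out_q.put(_END)

def transform(in_q, out_q, fn):
    """Node that applies fn to each item and passes the result on."""
    while True:
        item = in_q.get()
        if item is _END:
            out_q.put(_END)
            return
        out_q.put(fn(item))

def sink(in_q, results):
    """Node that collects the final results."""
    while True:
        item = in_q.get()
        if item is _END:
            return
        results.append(item)

def run_graph(items, fn, capacity=2):
    # Two bounded queues wire up the graph: source -> transform -> sink.
    q1 = queue.Queue(maxsize=capacity)
    q2 = queue.Queue(maxsize=capacity)
    results = []
    nodes = [
        threading.Thread(target=source, args=(q1, items)),
        threading.Thread(target=transform, args=(q1, q2, fn)),
        threading.Thread(target=sink, args=(q2, results)),
    ]
    for node in nodes:
        node.start()
    for node in nodes:
        node.join()
    return results
```

Because `items` can be any iterable, including a generator reading from disk, the input never has to be fully loaded into memory; the three nodes also run concurrently, which is the pipeline form of parallelism.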
Customers are using DataRush for big data analytics and data preparation (ETL).
We've made another one: a collaborative spreadsheet with a MySQL/PHP backend and an AJAX frontend. The software is in beta, and the documentation is under construction.