Do reactive programming tools like Project Reactor, Vert.x, and RxJava substitute for the Akka actor model? [closed] - akka

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 1 year ago.
For the past couple of years I have been watching the changes in the "Reactive Programming/System" world. All of these are strong candidates for implementing non-blocking, asynchronous systems with back pressure. But I often try to work out which of Akka, Project Reactor, RxJava, and Vert.x is best for building reactive systems, and which can also implement the "Reactive Manifesto".
Personally I have used Akka at a good level and have a fair idea of Project Reactor/Spring WebFlux. From my analysis, Akka satisfies all the "Reactive Manifesto" properties; in fact, it is built on top of these principles. It provides several features such as parallelism, inherent concurrency (actors), streams, APIs, clustering, monitoring, resiliency, etc.
On the other side, frameworks like Project Reactor are well integrated and adopted by the Spring community, followed by reactive Kafka, R2DBC drivers, RSocket, etc.
I think neither of these alone satisfies all the requirements of a reactive system. It seems we need a combination, like Akka and Reactor, or Akka and RxJava. Please share your thoughts.

I think this is a bit of an "opinion" question, but I'll try.
Akka is also a reactive framework. In fact, Akka and Vert.x are quite close in their concepts. They all "implement" the Reactive Manifesto.
Let's see this on a Vert.x example:
Responsive: The system responds in a timely manner if at all possible.
This basically says that slow requests shouldn't block faster requests.
Vert.x utilises the MultiReactor design pattern to provide responsiveness. This pattern is based on multiple event-loop threads that aim to execute the functions in their queues as quickly as possible.
Resilient: The system stays responsive in the face of failure.
This basically says that the system shouldn't crash altogether if a single request fails.
Vert.x uses the concept of Handlers to process new events. In case of an error, it will be handled by an ErrorHandler.
Elastic: The system stays responsive under varying workload.
The unit of elasticity in Vert.x is a Verticle. We can add more verticles at runtime to process more requests, and we also can undeploy verticles.
Message Driven: Reactive Systems rely on asynchronous message-passing
to establish a boundary between components that ensures loose
coupling.
Vert.x uses EventBus to pass messages between Verticles.
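The event-loop idea behind the MultiReactor pattern can be sketched in a few lines. This is a language-neutral analogue written in C++, not Vert.x code; `EventLoop`, `post`, and `runUntilStopped` are hypothetical names. One such loop drains a queue of handlers; Vert.x runs several of them in parallel, and responsiveness depends on each handler being short.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>

// Minimal sketch of a single event loop: handlers are posted to a queue
// and executed, one at a time, by the thread running the loop.
class EventLoop {
public:
    void post(std::function<void()> handler) {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            queue_.push(std::move(handler));
        }
        cv_.notify_one();
    }
    void runUntilStopped() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mtx_);
                cv_.wait(lock, [this] { return !queue_.empty(); });
                task = std::move(queue_.front());
                queue_.pop();
            }
            if (!task) break;  // an empty handler acts as the stop signal
            task();
        }
    }
    void stop() { post(nullptr); }
private:
    std::mutex mtx_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> queue_;
};
```

A slow handler blocks everything behind it in the same queue, which is exactly why these frameworks insist that handlers never block.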

Related

When to use multithreading in C++? [closed]

Closed 8 years ago.
I am an intermediate C++ programmer and am learning multithreading now. I find it quite confusing: when should I use multithreading in C++, and how do I know which parts of my program need it?
When to use multithreading in C++?
When you have a resource-intensive task like a huge mathematical calculation, or an I/O-intensive task like reading from or writing to a file, you should use multithreading.
The purpose is to be able to run multiple tasks together so that they increase the performance and responsiveness of your application. Also, learn about synchronization before implementing multithreading in your application.
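A minimal sketch of the CPU-intensive case (`parallelSum` is a hypothetical name, not from the answer): each thread works on its own half of the data, so no lock is needed during the computation, and the only synchronization is the `join()` before the results are combined.

```cpp
#include <cstdint>
#include <numeric>
#include <thread>
#include <vector>

// Sum a vector on two threads: the spawned worker takes the first half,
// the calling thread takes the second, and join() orders the final read.
std::int64_t parallelSum(const std::vector<int>& data) {
    const std::size_t mid = data.size() / 2;
    std::int64_t left = 0;
    std::thread worker([&] {
        left = std::accumulate(data.begin(), data.begin() + mid,
                               std::int64_t{0});
    });
    std::int64_t right = std::accumulate(data.begin() + mid, data.end(),
                                         std::int64_t{0});
    worker.join();  // wait for the worker before reading `left`
    return left + right;
}
```

Because the two halves are disjoint, there is no shared mutable state and therefore nothing to lock, which is the easiest kind of multithreading to get right.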
When to use multithreading in C++?
Well - the general rule of thumb is: use it when it can speed up your application. The answer isn't really language-dependent.
If you want to get an in-depth answer, then you have to consider a few things:
Is multithreading possible to implement inside your code? Do you have fragments which can be calculated at the same time and are independent of other calculations?
Is multithreading worth implementing? Does your program run slow even when you did all you could to make it as fast as possible?
Will your code be run on machines that support multithreading (so have multiple processing units)? If you're designing code for some kind of machine with only one core, using multithreading is a waste of time.
Is there a different option? A better algorithm, cleaning the code, etc? If so - maybe it's better to use that instead of multithreading?
Do you have to handle things that are hard to predict in time, while the whole application has to constantly run? For example - receiving some information from a server in a game?
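One of the checks in the list above - whether the target machine actually has multiple processing units - can be made at runtime; `workerCount` is a hypothetical helper name:

```cpp
#include <thread>

// Ask the standard library how many hardware threads are available and
// fall back to a single worker when the answer is unknown (0) or 1.
unsigned workerCount() {
    unsigned n = std::thread::hardware_concurrency();  // may report 0
    return n > 1 ? n : 1;
}
```

Sizing a thread pool from this value avoids hard-coding an assumption about the machine the code will eventually run on.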
This is a slightly subjective subject... But I tend to use multi-threading in one of two situations.
1 - In a performance critical situation where the utmost power is needed (and the algorithm of course supports parallelism), for me, matrix multiplications.
2 - Rarely, where it may be easier to have a thread managing something fairly independent. The classic is networking: perhaps have a thread blocking while waiting for connections, spawning a thread to manage each connection as it comes in. This is useful because the threads can block and still respond in a timely manner. Say you have a server: one request might need disk access, which is slow; another thread can jump in and field a different request while the first is waiting for its data.
As has been said by others, only when you need to should you think about doing it, it gets complicated fast and can be difficult to debug.
Multithreading is a specialized form of multitasking, and multitasking is the feature that allows your computer to run two or more programs concurrently.
I think this link can help you.
http://www.tutorialspoint.com/cplusplus/cpp_multithreading.htm
Mostly when you want things to be done at the same time. For instance, you may want a window to still respond to user input when a level is loading in a game or when you're downloading multiple files at once, etc. It's for things that really can't wait until other processing is done. Of course, both probably go slower as a result, but it really gives the illusion of multiple things happening at once.
Use multithreading when you can speed up your algorithms by doing things in parallel. Use it in opposition to multiprocessing when the threads need access to the parent process's resources.
My two cents.
Use cases:
Integrate your application in a lib/app that already runs a loop. You would need a thread of your own to run your code concurrently if you cannot integrate into the other app.
Task splitting. It makes sense to organize disjoint tasks in threads sometimes, such as in separating sound from image processing, for example.
Performance. When you want to improve the throughput of some task.
Recommendations:
In the general case, don't do multithreading if a single threaded solution will suffice. It adds complexity.
When needed, start with higher-level primitives, such as std::future and std::async.
When possible, avoid data sharing, which is the source of contention.
When going to lower-level abstractions, such as mutexes and so on, encapsulate them in some pattern. You can take a look at these slides.
Decouple your functions from threading and compose the threading into the functions at a later point. Namely, don't embed thread creation in the logic of your code.
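A small sketch of those recommendations (`squareSum`/`squareSumAsync` are hypothetical names): the worker function is plain and knows nothing about threads, `std::async` composes the threading in afterwards, and the futures carry the results back without any shared mutable state.

```cpp
#include <future>
#include <vector>

// Plain, thread-unaware function: easy to test and reuse on its own.
int squareSum(const std::vector<int>& v) {
    int s = 0;
    for (int x : v) s += x * x;
    return s;
}

// Threading is composed in from the outside via std::async; the futures
// hand the results back, so no locks or shared variables are needed.
int squareSumAsync(const std::vector<int>& a, const std::vector<int>& b) {
    auto fa = std::async(std::launch::async, squareSum, std::cref(a));
    auto fb = std::async(std::launch::async, squareSum, std::cref(b));
    return fa.get() + fb.get();
}
```

If `squareSumAsync` ever becomes a bottleneck, the single-threaded `squareSum` is still there unchanged, which is the point of keeping the logic decoupled from the threading.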

Document locking in multithreading environment [closed]

Closed 4 years ago.
We have an application that supports binary plugins (dynamically loaded libraries) as well as a number of plugins for this application. The application is itself multithreaded and the plugins may also start threads. There's a lot of locking going on to keep data structures consistent.
One major problem is that sometimes locks are held across calls from the application into a plugin. This is problematic because the plugin code might want to call back into the application, producing a deadlock. This problem is aggravated by the fact that different teams work on the base application and the plugins.
The question is: Is there a "standard" or at least widely used way of documenting locking schemes apart from writing tons of plain text?
This is a theoretical approach; I hope it helps you a little.
To me, you can avoid this situation by redesigning the way the plugins and your application communicate (if possible).
A plugin's code is not secure. To ensure the application's flexibility and stability, you must build a standard way to exchange information and perform critical actions with plugins.
The easiest way is to avoid managing each specific plugin behavior by defining a lock-free API.
To do that, you can make the critical parts of your plugins asynchronous by using a ring buffer/disruptor or just an action buffer.
EDIT
Sorry if I argue again in the same way, but this seems to me to be an "IO"-like problem.
You have concurrent access to some resources (memory/disk/network... I don't know which ones) and the need to expose them with high availability. And finally, these resources cannot be accessed randomly without locking up your application.
With a manager dedicated to the critical parts, the wait can be short enough to be imperceptible.
However, this is not easily applicable to an already existing application, especially a large one.
If you don't already know this kind of stuff, I encourage you to look at the "disruptor". To me it is one of the modern basics to consider every time I work with threads.
I suggest using Petri nets, which are simple to learn and can describe very well the cooperation among the different parts of your software. Several models and tools useful for documenting concurrency are described in this question: https://stackoverflow.com/questions/164187/what-tools-diagrams-do-you-use-for-modelling-multithreaded-systems. You can choose the right model according to your needs.
If your locking scheme is simple enough that you can describe it in documentation, then by all means do so. However, if deadlocks are occurring in practice, the problem may not be lack of documentation, but that the API is not serving the needs of your plugin authors. Documenting the limitations is a good first step, but removing the limitations is better.
Consider the possibilities for a deadlock on a single lock held by your code and requested by the plugin:
Your code is not in the middle of reading or writing, but is still holding the lock just because that's how the code was written. In that case, your code should release the lock before calling into the plugin.
Your code and the plugin are both reading data, and using the lock to prevent concurrent writers. In that case, use a readers-writers lock.
Your code is in the middle of changing data, and the plugin wants to read it. This is not generally safe; there's a reason you're using a lock to protect the entire modification, after all. Most attempts to make this safe fail in practice (it is as hard as writing lock-free code). In this case, the best thing to do is change your design so your code finishes changes before calling the plugin, or starts changes after calling the plugin.
Your code is in the middle of reading data, and the plugin wants to change it. Like the previous case, this is also not safe. Your code should release the lock before calling the plugin and acquire it again afterward, and assume the data have changed, re-reading anything you need to continue.
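The first case above - release the lock before calling out - can be sketched as follows; `Host`, `setData`, and `notifyPlugin` are hypothetical names, not your API. The host copies what it needs under the lock, releases it, and only then calls into the plugin, so a plugin that calls back into the host cannot deadlock on the same (non-recursive) mutex.

```cpp
#include <functional>
#include <mutex>

class Host {
public:
    void setData(int v) {
        std::lock_guard<std::mutex> lock(mtx_);
        data_ = v;
    }
    int data() {
        std::lock_guard<std::mutex> lock(mtx_);
        return data_;
    }
    // Snapshot the state under the lock, release it, THEN call the plugin.
    // The plugin may freely call data()/setData() without deadlocking.
    void notifyPlugin(const std::function<void(Host&, int)>& plugin) {
        int snapshot;
        {
            std::lock_guard<std::mutex> lock(mtx_);
            snapshot = data_;   // read under the lock...
        }                       // ...lock released here
        plugin(*this, snapshot);
    }
private:
    std::mutex mtx_;
    int data_ = 0;
};
```

Had `notifyPlugin` held the mutex across the plugin call, the callback's `data()` call would self-deadlock on `std::mutex`, which is exactly the failure mode described in the question.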
This is the best advice I can give without knowing anything more about your application and its specific needs.
For most applications, software companies shy away from 3rd party binary plugins in the same process because when something goes wrong, it is very difficult to figure out why. Users usually blame the application, not the plugin, and the perception of the quality of your application is poor. It can be made to work by keeping very close relationships with your plugin authors, usually including exchanging all source code (optionally under restrictive licenses or NDAs).
Yes, there is a standard way of documenting locking schemes, the kind taught in university courses.
1/ Use a diagram.
Draw a diagram in which each point is a lock, linked to the other thread, e.g.:
        T1          T2
    1   -R-> A
    2               <-W- B
2/ Use a table.
Write down each step of each thread in its own row, e.g.:
        T1          T2
    lockX(A)    lockS(B)
    read(A)     read(B)
    A<-A50      unlock(B)
Conclusion: this is a very complex task and takes a lot of time to trace.

Is there a distributed data processing pipeline framework, or a good way to organize one?

I am designing an application that requires of a distributed set of processing workers that need to asynchronously consume and produce data in a specific flow. For example:
Component A fetches pages.
Component B analyzes pages from A.
Component C stores analyzed bits and pieces from B.
There are obviously more than just three components involved.
Further requirements:
Each component needs to be a separate process (or set of processes).
Producers don't know anything about their consumers. In other words, component A just produces data, not knowing which components consume that data.
This is a kind of data flow solved by topology-oriented systems like Storm. While Storm looks good, I'm skeptical; it's a Java system and it's based on Thrift, neither of which I am a fan of.
I am currently leaning towards a pub/sub-style approach which uses AMQP as the data transport, with HTTP as the protocol for data sharing/storage. This means the AMQP queue model becomes a public API — in other words, a consumer needs to know which AMQP host and queue that the producer uses — which I'm not particularly happy about, but it might be worth the compromise.
Another issue with the AMQP approach is that each component will have to have very similar logic for:
Connecting to the queue
Handling connection errors
Serializing/deserializing data into a common format
Running the actual workers (goroutines or forking subprocesses)
Dynamic scaling of workers
Fault tolerance
Node registration
Processing metrics
Queue throttling
Queue prioritization (some workers are less important than others)
…and many other little details that each component will need.
Even if a consumer is logically very simple (think MapReduce jobs, something like splitting text into tokens), there is a lot of boilerplate. Certainly I can do all this myself — I am very familiar with AMQP and queues and everything else — and wrap all this up in a common package shared by all the components, but then I am already on my way to inventing a framework.
Does a good framework exist for this kind of stuff?
Note that I am asking specifically about Go. I want to avoid Hadoop and the whole Java stack.
Edit: Added some points for clarity.
Because Go has CSP channels, I suggest that Go provides a special opportunity to implement a framework for parallelism that is simple, concise, and yet completely general. It should be possible to do rather better than most existing frameworks with rather less code. Java and the JVM can have nothing like this.
It requires just the implementation of channels using configurable TCP transports. This would consist of
a writing channel-end API, including some general specification of the intended server for the reading end
a reading channel-end API, including listening port configuration and support for select
marshalling/unmarshalling glue to transfer data - probably encoding/gob
A success acceptance test of such a framework should be that a program using channels should be divisible across multiple processors and yet retain the same functional behaviour (even if the performance is different).
There are quite a few existing transport-layer networking projects in Go. Notable is ZeroMQ (0MQ) (gozmq, zmq2, zmq3).
I guess you are looking for a message queue, like beanstalkd, RabbitMQ, or ØMQ (pronounced zero-MQ). The essence of all of these tools is that they provide push/receive methods for FIFO (or non-FIFO) queues and some even have pub/sub.
So, one component puts data in a queue and another one reads. This approach is very flexible in adding or removing components and in scaling each of them up or down.
Most of these tools already have libraries for Go (ØMQ is very popular among Gophers) and other languages, so your overhead code is very little. Just import a library and start receiving and pushing messages.
And to decrease this overhead and avoid depending on a particular API, you can write your own thin package that uses one of these message-queue systems to provide very simple push/receive calls, and use that package in all of your tools.
I understand that you want to avoid Hadoop+Java, but instead of spending time developing your own framework, you may want to have a look at Cascading. It provides a layer of abstraction over underlying MapReduce jobs.
Best summarized on Wikipedia, It [Cascading] follows a ‘source-pipe-sink’ paradigm, where data is captured from sources, follows reusable ‘pipes’ that perform data analysis processes, where the results are stored in output files or ‘sinks’. Pipes are created independent from the data they will process. Once tied to data sources and sinks, it is called a ‘flow’. These flows can be grouped into a ‘cascade’, and the process scheduler will ensure a given flow does not execute until all its dependencies are satisfied. Pipes and flows can be reused and reordered to support different business needs.
You may also want to have a look at some of their examples, Log Parser, Log Analysis, TF-IDF (especially this flow diagram).

Looking for a C or C++ library providing a functionality similar to Google Go's channels [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 8 years ago.
...for use in a multithreaded network server.
I want to pass data around between multiple threads. Currently I'm using sockets, with the master thread blocking on select() and workers blocking on recv(), though I feel there probably are more advanced or prepackaged ways of handling this task in C++.
I would have worker threads waiting in a thread pool.
Then the master waiting on select (for both reads and writes).
As data comes in, the master adds jobs to the thread pool. As each job is added, a thread wakes up, executes the job, and returns to the pool. This way you are not blocking threads waiting on specific ports with recv(), and a fixed set of child threads can handle all incoming traffic.
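A rough sketch of such a pool (`ThreadPool` is a hypothetical name; the select() master and socket handling are left out, jobs are plain callables): workers sleep on a condition variable until the master adds a job, execute it, and go back to waiting.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class ThreadPool {
public:
    explicit ThreadPool(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { workerLoop(); });
    }
    ~ThreadPool() {                       // drain remaining jobs, then join
        {
            std::lock_guard<std::mutex> lock(mtx_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }
    void addJob(std::function<void()> job) {   // called by the master thread
        {
            std::lock_guard<std::mutex> lock(mtx_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();                      // wake one sleeping worker
    }
private:
    void workerLoop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mtx_);
                cv_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (stopping_ && jobs_.empty()) return;
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();                             // run outside the lock
        }
    }
    std::mutex mtx_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

In the server described above, each job would be "read from this ready socket and process the request", so the master never blocks on recv() itself.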
Current libraries that support this functionality with ready-made objects:
ACE: http://www.cs.wustl.edu/~schmidt/ACE.html
Poco: http://pocoproject.org/
libthread from plan9port includes a Channel struct that will be very similar; take note of Russ Cox's contribution to both plan9port and go-lang, and the libthread history:
Moving in a different direction, Luca Cardelli and Rob Pike developed
the ideas in CSP into the Squeak mini-language [4] for generating user
interface code. (This Squeak is distinct from the Squeak Smalltalk
implementation.) Pike later expanded Squeak into the fully-fledged
programming language Newsqueak [5][6] which begat Plan 9's Alef [7]
[8], Inferno's Limbo [9], and Google's Go [13].
At a later point in Plan 9's history, it became too much effort to maintain infrastructure for two languages, so Alef was discontinued and the CSP constructs ported to C in the form of libthread.
So, since go channels are essentially a direct descendent from libthread, I don't think you'll find anything more similar :)
You can try the ACE library which ships with pipes and message queues which are specially suited for inter-thread communication.
ACE stands for Adaptive Communication Environment.
Maybe ZeroMQ might be worth checking out. It has an 'inproc' channel which allows you to communicate between threads. Of course, you can only send strings between threads, not objects, but on the other hand it supports other transports like TCP/IP (so you can easily communicate between processes on a network), is cross platform and has language bindings for most current languages.
"A Channel is a buffered or unbuffered queue for fixed–size messages" (plan9 thread).
There is a buffered queue in the TBB: concurrent_bounded_queue.
And I've just implemented a kind of unbuffered Channel in C++11: https://gist.github.com/ArtemGr/7293793. Although a more generic implementation would be to create a pair of references (like in the Felix mk_ioschannel_pair), one for each endpoint of the channel, in order to interrupt any waiting in case the other end of the channel no longer exists.
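For comparison, here is what a minimal *buffered* channel looks like in C++11 (a sketch, not the unbuffered rendezvous from the gist above): `send()` blocks while the buffer is full and `recv()` blocks while it is empty, built from one mutex and two condition variables.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>

template <typename T>
class Channel {
public:
    explicit Channel(std::size_t capacity) : cap_(capacity) {}
    void send(T value) {                       // blocks while buffer is full
        std::unique_lock<std::mutex> lock(mtx_);
        notFull_.wait(lock, [this] { return buf_.size() < cap_; });
        buf_.push(std::move(value));
        notEmpty_.notify_one();
    }
    T recv() {                                 // blocks while buffer is empty
        std::unique_lock<std::mutex> lock(mtx_);
        notEmpty_.wait(lock, [this] { return !buf_.empty(); });
        T value = std::move(buf_.front());
        buf_.pop();
        notFull_.notify_one();
        return value;
    }
private:
    std::size_t cap_;
    std::queue<T> buf_;
    std::mutex mtx_;
    std::condition_variable notFull_, notEmpty_;
};
```

A production version would also need a close() operation (as mentioned above, to wake waiters when the other endpoint goes away) and a select-like wait over several channels, which is the hard part Go provides natively.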

best way to write a linux daemon [closed]

Closed. As it currently stands, this question is not a good fit for the Q&A format. It is not currently accepting answers.
Closed 10 years ago.
For work I need to write a TCP daemon to respond to our client software, and I was wondering if anyone had any tips on the best way to go about this.
Should I fork for every new connection, as normally I would use threads?
It depends on your application. Threads and forking can both be perfectly valid approaches, as well as the third option of a single-threaded event-driven model. If you can explain a bit more about exactly what you're writing, it would help when giving advice.
For what it's worth, here are a few general guidelines:
If you have no shared state, use forking.
If you have shared state, use threads or an event-driven system.
If you need high performance under very large numbers of connections, avoid forking as it has higher overhead (particularly memory use). Instead, use threads, an event loop, or several event loop threads (typically one per CPU).
Generally forking will be the easiest to implement, as you can essentially ignore all other connections once you fork; threads the next hardest due to the additional synchronization requirements; the event loop more difficult due to the need to turn your processing into a state machine; and multiple threads running event loops the most difficult of them all (due to combining other factors).
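The fork-per-connection shape described above looks like this with the socket calls elided; `handleClient` and `serveOneForked` are hypothetical names, and the toy handler just returns an exit status so the control flow is visible.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Stand-in for the real per-connection work (parsing a request,
// talking to the socket, etc.).
int handleClient(int request) {
    return request * 2;
}

// In a real server this sits inside the accept() loop: fork a child per
// connection, let it handle one client and exit, and reap it in the
// parent with waitpid() so no zombie processes accumulate.
int serveOneForked(int request) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;                              // fork failed
    if (pid == 0)
        _exit(handleClient(request) & 0xff);    // child: handle and exit
    int status = 0;
    waitpid(pid, &status, 0);                   // parent: reap the child
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A real daemon would reap asynchronously (e.g. a SIGCHLD handler or `waitpid(-1, ..., WNOHANG)` in the loop) so the parent can go straight back to accept(); the simple synchronous wait here just keeps the sketch testable.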
I'd suggest forking for connections over threads any day. The problem with threads is the shared memory space and how easy it is to manipulate the memory of another thread. With forked processes, any communication between the processes has to be done intentionally by you.
Just searched and found this SO answer: What is the purpose of fork?. You obviously know the answer to that, but the #1 answer in that thread has good points on the advantages of fork().
Apart from @hobodave's good answer, another benefit of "forking per connection" is that you could implement your server very simply, by using inetd or tcpserver or the like: you can then use standard input and standard output for communicating with the socket, and don't have to do any listening-socket management (listening for connections, etc.).
Another option, of course, is pre-forking several copies of the daemon and having each one staying alive and continuing to answer requests. It all depends on your application, expected load and performance requirements, among other things.
The easiest and simplest way is to write an inetd-based daemon; your software can ignore the fact that it is running over a TCP connection and simply handle input/output via stdin/stdout. That works well in the vast majority of cases.
If you're not planning to be hammered with many new connections per second, consider running from inetd. Otherwise...
Download the OpenSSH source. They've put a lot of work into getting the privilege separation just right, it's portable, and it's been scrutinized for security more than just about anything else.
Adapt it to your needs; you can probably throw out most of it. Comply with the license agreement, of course, and follow future patches with a good source-control setup.
Don't worry about the performance of forking processes vs threads until you have good evidence it's a real issue. Apache went for years and years running the busiest sites with just the simple process-per-client model.
If you're really ambitious, you could use some kind of a non-blocking asynchronous IO model. I like Boost.Asio, but I'm heavy into C++.
Make sure your code handles signals correctly. HUP to reload configuration. TERM to shutdown gracefully.
Don't try to write your own log file. Use syslog only, or just write to stderr that can be redirected to syslog. It's a real pain trying to set up logrotate on home-rolled servers that all log slightly differently.
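The HUP/TERM handling mentioned above usually reduces to handlers that only set flags (`reloadRequested`/`shutdownRequested` are hypothetical names); the daemon's main loop checks the flags and does the actual reload or graceful shutdown outside of signal context, since almost nothing is safe to call inside a handler.

```cpp
#include <csignal>

// Flags of type sig_atomic_t are the only things a portable signal
// handler should write to.
volatile std::sig_atomic_t reloadRequested = 0;
volatile std::sig_atomic_t shutdownRequested = 0;

// Install the two conventional daemon signals: HUP to reload the
// configuration, TERM to shut down gracefully.
void installDaemonSignals() {
    std::signal(SIGHUP,  [](int) { reloadRequested = 1; });
    std::signal(SIGTERM, [](int) { shutdownRequested = 1; });
}
```

The main loop then polls the flags between units of work, e.g. `if (shutdownRequested) break;` and `if (reloadRequested) { reloadConfig(); reloadRequested = 0; }` (where `reloadConfig` is whatever your daemon's reload routine is).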
If you want to avoid threading / forking all together, I would recommend using all non-blocking I/O along with libevent.
Libevent is fairly well known as a high performance solution for event driven programming.
Look into ACE (C++/Java). It has a number of threaded, event-driven, and forking TCP reactors that address your communications requirements. You can also look into Boost.Asio, which does something similar.