Can Flink's state be shared across several streams? - state

I have a question about Flink's state. I want to know whether state can be shared by several streams. Thank you.

A given piece of Flink state is held in a single operator. But that operator can be something like a RichCoFlatMapFunction or a CoProcessFunction, which receives and processes inputs from two streams (and you can cascade these if more than two streams are involved).
Your question is very broad, so it's not clear if this approach matches the use case you have in mind.

Related

asio, shared data, Active Object vs mutexes

I want to understand what the true Asio way to use shared data is.
Reading the Asio and Beast examples, the only example that uses shared data is http_crawl.cpp (perhaps I missed something).
In that example the shared object is only used to collect statistics for sessions; that is, the sessions do not read that object's data.
As a result I have three questions:
Is it implied that interaction with shared data in asio-style is an Active Object? i.e. should mutexes be avoided?
whether the statement will be correct that for reading the shared data it is also necessary to use "requests" to Active Object, and also no mutexes?
has anyone tried to evaluate the overhead of "requests" to Active Object, compared to using mutexes?
Is it implied that interaction with shared data in asio-style is an Active Object? i.e. should mutexes be avoided?
Starting at the end, yes mutexes should be avoided. This is because all service handlers (initiations and completions) will be executed on the service thread(s) which means that blocking in a handler will block all other handlers.
Whether that leads to Active Object seems to be a choice to me. Yes, a typical approach would be like Active Object (see e.g. boost::asio and Active Object), where operations queue for the data.
However, other approaches are viable and frequently seen, e.g. the data moving along with its task(s) through a task flow, so that nothing is shared in the first place.
whether the statement will be correct that for reading the shared data it is also necessary to use "requests" to Active Object, and also no mutexes?
Yes, synchronization needs to happen for shared state, regardless of the design pattern chosen (although some design patterns reduce sharing altogether).
The Asio approach is using strands, which abstract away the scheduling from the control flow. This gives the service the option to optimize for various cases (e.g. continuation on the same strand, the case where there's only one service thread anyway etc.).
has anyone tried to evaluate the overhead of "requests" to Active Object, compared to using mutexes?
Lots of people, lots of times. People are often wary of trying Asio because "it uses locking internally". If you know what you're doing, throughput can be excellent, which goes for most patterns and industrial-strength frameworks.
Specific benchmarks depend heavily on specific implementation choices. I'm pretty sure you can find examples on github, blogs and perhaps even on this site.
(perhaps I missed something)
You're missing the fact that IO objects are not thread-safe, which means that they themselves are shared data for any composed asynchronous operation (chain).
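To make the "operations queue for the data" idea concrete, here is a minimal Active Object sketch in standard C++. It is not Asio's strand and does not use any Asio API; `ActiveCounter` and `demo_active_counter` are hypothetical names invented for illustration. Note that reads go through the queue as well, and that the queue itself is protected by a mutex that is only held for the push or pop, never while a request runs.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical Active Object: owns a counter and one worker thread.
// Callers never touch value_ directly; they post requests to a queue.
class ActiveCounter {
public:
    ActiveCounter() : worker_([this] { run(); }) {}

    ~ActiveCounter() {
        post([this] { done_ = true; });  // the stop request is queued like any other
        worker_.join();
    }

    // Write request: fire and forget.
    void add(int n) {
        post([this, n] { value_ += n; });
    }

    // Read request: also queued; the result comes back through a future.
    std::future<int> get() {
        auto result = std::make_shared<std::promise<int>>();
        std::future<int> f = result->get_future();
        post([this, result] { result->set_value(value_); });
        return f;
    }

private:
    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

    void run() {
        while (!done_) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return !tasks_.empty(); });
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // executed without the queue lock held
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    int value_ = 0;       // only touched on the worker thread
    bool done_ = false;   // only touched on the worker thread
    std::thread worker_;  // declared last so the other members exist before it starts
};

// Four threads post 1000 increments each, then one read request is posted.
int demo_active_counter() {
    ActiveCounter counter;
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t) {
        threads.emplace_back([&counter] {
            for (int i = 0; i < 1000; ++i) counter.add(1);
        });
    }
    for (auto& th : threads) th.join();
    int total = counter.get().get();  // queued behind all increments
    assert(total == 4000);
    return total;
}
```

In Asio the worker thread and queue would be replaced by a strand on the service's own thread(s), which is what lets the library skip the queueing in cases such as a single service thread.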

What is the essential distinction b/w producer/consumer & write/read multithread modeling?

In interviews about multithreaded modeling, two models are frequently asked about:
producer/consumer model
writer/reader model
My question is that I can't grasp the essential distinction between these two models.
My understanding of the two models is as follows:
For the producer/consumer model, producers produce until some halting criterion is met, at which point a producer signals a consumer and waits on another condition variable, while consumers wait until an item has been produced and then proceed to "consume it," notifying the producers that another slot is ready for production.
For the writer/reader model, there are three key parameters: it uses one mutex, two condition variables and three integers.
readers - readers in the cv readerQ plus the reading reader
writers - writers in cv writerQ plus the writing writer
active_writers - the writer currently writing. can only be 1 or 0.
To me, both of them use "a mutex and condition variables"; the only difference is that producer/consumer waits and notifies on condition variables, while read/write uses condition variables and integers together to check whether the lock/unlock conditions are satisfied.
I know one distinction is that in the producer/consumer model, both producer and consumer change the shared data, but they are disconnected from each other. They just communicate through the shared data (usually a queue). There is no need for producers/consumers to know whether there is an available consumer/producer, i.e. the status of the other party is not important. In the write/read model, however, both parties need to track the other party's status (i.e. the number available). BUT, I believe this is not the essential distinction.
Beyond the naive understanding above, could anyone tell me what the essential distinctions between these two models are? Thank you very much!
Well, they are actually quite unrelated.
At a very high level:
Producer/Consumer aims at having someone (Producers) produce data for processing while someone else (Consumers) waits for data. Once the data to be processed arrives, it is consumed by one (and only one) Consumer. That Consumer then owns the data and performs its work.
Reader/Writer is a way to lock a shared resource / piece of data. Everyone is working against the same piece of data. However, we know that sometimes the data needs to be modified, so we want to work as Writers (and get a Writer lock). Sometimes the data simply needs to be read, so we want to work as Readers. The whole purpose of a Reader-Writer lock is to avoid unnecessary contention, as Readers only perform read-only operations on the resource.
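The contrast can be sketched in standard C++ (C++17 for `std::shared_mutex`). This is only an illustration under assumptions: `WorkQueue`, `SharedConfig` and `demo_producer_consumer` are hypothetical names, and the reader/writer part uses the library's shared mutex rather than the hand-rolled mutex, condition variable and counter scheme described in the question. In the first class each item is handed to exactly one consumer; in the second, every thread works against the same value.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <shared_mutex>
#include <thread>
#include <vector>

// Producer/consumer: data is handed over; each item goes to exactly one consumer.
class WorkQueue {
public:
    void push(int item) {
        {
            std::lock_guard<std::mutex> lock(m_);
            items_.push(item);
        }
        cv_.notify_one();
    }

    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until an item is available; returns false once closed and drained.
    bool pop(int& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return false;
        out = items_.front();
        items_.pop();
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<int> items_;
    bool closed_ = false;
};

// Reader/writer: one shared datum; readers overlap, a writer is exclusive.
class SharedConfig {
public:
    int read() const {
        std::shared_lock<std::shared_mutex> lock(m_);  // many readers at once
        return value_;
    }

    void write(int v) {
        std::unique_lock<std::shared_mutex> lock(m_);  // excludes readers and writers
        value_ = v;
    }

private:
    mutable std::shared_mutex m_;
    int value_ = 0;
};

// One producer pushes 1..100; three consumers split the items between them.
long demo_producer_consumer() {
    WorkQueue q;
    std::atomic<long> sum{0};
    std::vector<std::thread> consumers;
    for (int c = 0; c < 3; ++c) {
        consumers.emplace_back([&q, &sum] {
            int x = 0;
            while (q.pop(x)) sum += x;  // this consumer now owns x
        });
    }
    for (int i = 1; i <= 100; ++i) q.push(i);
    q.close();
    for (auto& t : consumers) t.join();
    long total = sum.load();
    assert(total == 5050);  // every item was consumed exactly once
    return total;
}
```

The queue exists to move ownership of data between threads, while the shared mutex exists to guard data that never moves, which is why the two models answer different questions even though both are built from mutexes.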

OpenCV -- safe to enqueue the same function with different data on same Stream?

I'm trying to optimise my OpenCV code to run on the GPU. The problem is that there seem to be conflicting opinions on what is and isn't safe to run on the GPU.
In the thread "how to use gpu::Stream in OpenCV?", the answer states:
Currently, you may face problems if same operation is enqueued twice with different data to different streams.
I would be happy to solve this by enqueuing these operations onto the same stream. However, in the document here http://on-demand.gputechconf.com/gtc/2013/webinar/gtc-express-itseez-opencv-webinar.pdf the author writes (slide 28):
Current limitation:
– Unsafe to enqueue the same GPU operation multiple times
And he shows an example in which he states that it's unsafe to enqueue the same operation on the same stream, too.
I am confused -- would it be safe for me to enqueue the same operation on the same Stream, or not? Does anyone know?
Intuitively, I would have thought it to be OK, since the same stream would, I imagine, run in serial, so the two functions would never try to concurrently access the same data. But I'd really like confirmation before I implement something.
Thank you for your help!

Application of Shared Read Locks

What is the need for a shared read lock?
I can understand that write locks have to be exclusive. But why would many clients need to access the document simultaneously while still holding only a shared read privilege? Practical applications of shared read locks would be of great help too.
Please move the question to any other forum you'd find it appropriate to be in.
Though this is a question purely related to ABAP programming and theory I'm doing, I'm guessing the applications are generic to all languages.
Thanks!
If you do complex and time-consuming calculations based on multiple datasets (e.g. postings), you have to ensure that none of these datasets is changed while you're working, otherwise the calculations might be wrong. Most of the time, the ACID principles will ensure this, but sometimes that's not enough, for example if the data source is so large that you have to break it up into parallel subtasks, or if you have to call some function that performs a database commit or rollback internally. In this case, the transaction isolation is no longer enough, and you need to lock the entity on a logical level. A shared read lock does that without making the readers block each other: any number of readers can hold it at once, while a request for an exclusive write lock is refused until they have all released it.

How to free a thread blocked on an istream

I have created two classes: one for reading input (through an istream object) and parsing it, and the other for processing the output of the parser.
There is one instance of each of those.
I have the parser running in a loop calling istream::get() and then creating commands for the second object based upon the input. These commands are then put on a queue which the second object processes in a separate thread.
Now it is quite obvious that I eventually need to be able to send a "Quit" command. Here the problem arises though: The "Quit" command needs to end the parsing loop as well but I can't find a way to signal the parser that it should quit because it is caught within istream::get().
I would need a way to wake it from that method, but I cannot find any...
I have thought of writing some sort of "termination sequence" to the istream object (which in this case is cin) by creating an ostream object from istream::rdbuf(). But that doesn't work - The badbit is set after the attempt to write to the buffer.
In another question at StackOverflow I saw the asio class of the Boost library mentioned, but I'd rather not depend on third party libraries.
Is there a way to wake the thread from istream::get() - i.e. is there a way to write to the istream buffer (maybe assuming it actually is cin) from within the program?
Another approach would be to kill the thread which I could find acceptable as well since there is no cleanup needed in that specific place. But how can this be done? (I'm relying on a POSIX thread implementation)
You will have to depend on something other than the standard iostream classes, because they don't provide select()-style behaviour.
Also, killing the thread is impossible with POSIX (and utterly broken in Windows). You can issue a cancellation request via pthread_cancel(), but in your case, it may be stuck in an un-cancellable system call. Of particular interest to you, read() may or may not be cancellable, depending on the environment. At least one environment says that a cancellation point may occur in read(), though admittedly it is a Windows POSIX layer. Also, Mac OS X, as recently as Leopard 10.5.1, had a broken read() implementation with respect to cancellability.
Once past this hurdle, you also have to consider the uneasy relationship between C++ destructors and pthread_cancel. Not all environments guarantee that destructors will be called, so you have to be extremely cautious when using pthread_cancel in C++ code.
In short, for interruptible I/O, use low-level I/O and select(): one fd for I/O, a second fd (created by pipe()) for signalling. Or, if you're brave, use AIO, but you're probably better off using a high level interface such as Boost.Asio.
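A minimal sketch of the select() plus pipe() approach on POSIX follows. The function names `read_until_woken` and `demo_interruptible_read` and their parameters are hypothetical; in the parser from the question, `input_fd` would be `STDIN_FILENO` and the processing thread would write one byte to the write end of the signalling pipe when it handles "Quit". This sketch reads any input that is already pending before it honours the wake-up, which is a design choice rather than a requirement.

```cpp
#include <sys/select.h>
#include <unistd.h>

#include <algorithm>
#include <cassert>
#include <cerrno>
#include <string>

// Reads from input_fd until EOF, an error, or a byte arriving on wake_fd.
// Returns everything read so far.
std::string read_until_woken(int input_fd, int wake_fd) {
    std::string data;
    char buf[256];
    for (;;) {
        fd_set readable;
        FD_ZERO(&readable);
        FD_SET(input_fd, &readable);
        FD_SET(wake_fd, &readable);
        int nfds = std::max(input_fd, wake_fd) + 1;

        // Blocks until at least one of the two descriptors is readable.
        if (select(nfds, &readable, nullptr, nullptr, nullptr) < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal, retry
            break;
        }
        if (FD_ISSET(input_fd, &readable)) {
            ssize_t n = read(input_fd, buf, sizeof buf);
            if (n <= 0) break;  // EOF or read error
            data.append(buf, static_cast<std::size_t>(n));
        }
        if (FD_ISSET(wake_fd, &readable)) break;  // quit was requested
    }
    return data;
}

// Single-threaded demonstration: input and the quit signal are both pending
// before the call, so the function returns without blocking forever.
std::string demo_interruptible_read() {
    int in[2];
    int wake[2];
    if (pipe(in) != 0 || pipe(wake) != 0) return "pipe failed";
    if (write(in[1], "abc", 3) != 3) return "write failed";
    if (write(wake[1], "x", 1) != 1) return "write failed";

    std::string got = read_until_woken(in[0], wake[0]);
    assert(got == "abc");

    close(in[0]);
    close(in[1]);
    close(wake[0]);
    close(wake[1]);
    return got;
}
```

Because this reads the descriptor directly, it should not be mixed with reads through std::cin on the same input, since the iostream layer keeps its own buffer.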
Any chance this is implemented in .NET? - if so take a look at the Reactive Framework.
It provides a very elegant way of handling streams and especially cancelling them on the fly.
On top of this, you get a very extensible library of LINQ extensions for all sorts of things, like buffering, memoization, Zip, etc.
We use it a lot for transforming (and parsing), modelling of streamed data.
Jeff from the Reactive team has a couple of nice blogs about streaming and Reactive here: