How to free a through istream blocked thread

How to free a through istream blocked thread - c++

i have created two classes. One for input reading (through an istream object) and parsing and the other one for processing the output of the parser.
There is one instance of each of those.
I have the parser running in a loop calling istream::get() and then creating commands for the second object based upon the input. These commands are then put on a queue which the second object processes in a separate thread.
Now it is quite obvious that I eventually need to be able to send a "Quit" command. Here the problem arises though: The "Quit" command needs to end the parsing loop as well but I can't find a way to signal the parser that it should quit because it is caught within istream::get().
I would need a way to wake it from that method, but I cannot find any...
I have thought of writing some sort of "termination sequence" to the istream object (which in this case is cin) by creating an ostream object from istream::rdbuf(). But that doesn't work - The badbit is set after the attempt to write to the buffer.
In another question at StackOverflow I saw the asio class of the Boost library mentioned, but I'd rather not depend on third party libraries.
Is there a way to wake the thread from istream::get() - i.e. is there a way to write to the istream buffer (maybe assuming it actually is cin) from within the program?
Another approach would be to kill the thread which I could find acceptable as well since there is no cleanup needed in that specific place. But how can this be done? (I'm relying on a POSIX thread implementation)

You will have to depend on something other than the standard iostream classes, because they don't provide select()-style behaviour.
Also, killing the thread is impossible with POSIX (and utterly broken in Windows). You can issue a cancellation request via pthread_cancel(), but in your case, it may be stuck in an un-cancellable system call. Of particular interest to you, read() may or may not be cancellable, depending on the environment. At least one environment says that a cancellation point may occur in read(), though admittedly it is a Windows POSIX layer. Also, Mac OS X, as recently as Leopard 10.5.1, had a broken read() implementation with respect to cancellability.
Once past this hurdle, you also have to consider the uneasy relationship between C++ destructors and pthread_cancel. Not all environments guarantee that destructors will be called, so you have to be extremely cautions when using pthread_cancel in C++ code.
In short, for interruptible I/O, use low-level I/O and select(): one fd for I/O, a second fd (created by pipe()) for signalling. Or, if you're brave, use AIO, but you're probably better off using a high level interface such as Boost.Asio.

Any chance this is implemented in .NET? - if so take a look at the Reactive Framework.
It provides a very elegant way of handling streams and especially cancelling them on the fly.
On top of this, you get a very extensible library of Linq extension for all sorts of stuff, like Buffering, Memoization, Zip ect..
We use it a lot for transforming (and parsing), modelling of streamed data.
Jeff from the Reative team has a couble of nice blogs about Streaming and Reative here:

Related

BOOST Asio async_write_some vs asio::async_write , force a single write operation

Some special files have semantics associated with individual writes. (For example, FunctionFS (USB gadgets in userspace) associates a single write to a sequence of USB packets within a single USB transfer. Two writes will never be merged into a single USB packet. Hence the first write may end with a short packet.)
I therefore don't want to use asio::async_write, because it allows zero or more calls to the underlying async_write_some of the passed stream. However if I use stream.async_write_some directly, it allows that not all the data will have been written.
What I need is the guarantee that the streams's async_write_some (of posix::basic_stream_descriptor) passes all buffers in the sequence to the vectored write system call, and write less data only when the system call decides to do so.
How can I achieve that? Am I on the completely wrong way?

What would you use in terms of POSIX API? Likely it ends up the exact same underlying API that ASIO hooks into. So, if that API behaves in the way you describe you can expect ASIO to behave in the same way (not resulting in partial completions).
The only place where I readily know ASIO might have opinions on buffer division/operation is the other way around where SSL buffer-sequences may be combined prior to writes for performance reasons.
It would help if we knew both interfaces you're trying to contrast/impact analyze. E.g. there will be a difference between using asio::posix::stream_descriptor or asio::serial_port for obvious reasons. There might be platform differences at play, but I assume the platform is Linux because of FunctionFS.

How are golang select statements implemented?

In particular, I have some blocking queues in C++, and I want to wait until any one of them has some item I can pop.
The only mechanism I can think of is to spawn a separate thread for each queue that pops from its input queue and feeds into a master queue that the original thread can wait on.
It seems kind of resource heavy to spawn N new threads and then kill them all every time I want to pop from a group of queues.
Does Golang implement some more elegant mechanism that I might be able to implement in my own C++ code?

I wouldn't necessarily say that Go's select implementation is elegant, but I think it's beautiful in its own way and it's fairly optimized.
it special-handles selects with a single non-default case
it permutes the order in which cases are evaluated in order to avoid deterministic starvation
it does an optimistic first pass over the cases looking for one that's already satisfied
it enqueues on the internal sender/receiver queues of each channel using many internal, known only to the runtime mechanisms
it uses sudogs which are like lightweight goroutine references (there can be many sudogs for the same goroutine) that allow quick jumping into the goroutine stack
it uses the scheduler's gopark mechanism to block itself which allows efficient unparking on signal
when signalled and unparked, it immediately goes into the triggered case handler function by manipulating the select goroutine's program counter
There's no single overarching groundbreaking idea in the implementation, but you would really appreciate how each step was carefully tinkered with so that it's fast, efficient and well integrated with concept of channels. Because of that, it's not very easy to reimplement Go's select statement in another language, unless you at least have the chan construct first.
You can take a look at the reimplementations available in other languages, where the idea was redone with various degrees of similarity and effectiveness. If I had to reimplement select from scratch in another language, I would probably first try a single shared semaphore and, in case that didn't work, switch to a cruder, sleep-a-little-then-check-in-random-order strategy.

Golang's select statement is inspired from the C select function (see the GNU libc documentation), that is used for waiting I/O on a set of file descriptors. If your queues communicate using a socket or a pipe, you may use it.

Periodically call a C function without manually creating a thread

I have implemented a WebSocket handler in C++ and I need to send ping messages once in a while. However, I don't want to start one thread per socket/one global poll thread which only calls the ping function but instead use some OS functionality to call my timer function. On Windows, there is SetTimer but that requires a working message loop (which I don't have.) On Linux there is timer_create, which looks better.
Is there some portable, low-overhead method to get a function called periodically, ideally with some custom context? I.e. something like settimer (const int millisecond, const void* context, void (*callback)(const void*))?
[Edit] Just to make this a bit clearer: I don't want to have to manage additional threads. On Windows, I guess using CreateThreadpoolTimer on the system thread pool will do the trick, but I'm curious to hear if there is a simpler solution and how to port this over to Linux.

If you are intending to go cross-platform, I would suggest you use a cross platform event library like libevent.
libev is newer, however currently has weak Win32 support.

If you use sockets, you can use select, to wait sockets events with timeout,
and in this loop calc time and call callback in suitable time.

If you are looking for a timer that will not require an additional thread, let you do your work transparently and then call the timer function at the appropriate time in the same thread by pre-emptively interrupting your application, then there is no such portable thing.
The first reason is that it's downright dangerous. That's like writing a multi-threaded application with absolutely no synchronization. The second reason is that it is extremely difficult to have good semantics in multi-threaded applications. Which thread should execute the timer callback?
If you're writing a web-socket handler, you are probably already writing a select()-based loop. If so, then you can just use select() with a short timeout and check the different connections for which you need to ping each peer.

Whenever you have asynchronous events, you should have an event loop. This doesn't need to be some system default one, like Windows' message loop. You can create your own. But you should be using it.
The whole point about event-based programming is that you are decoupling your code handling to deal with well-defined functional fragments based on these asynchronous events. Without an event loop, you are condemning yourself to interleaving code that get's input and produces output based on poorly defined "states" that are just fragments of procedural code.
Without a well-defined separation of states using an event-based design, code quickly becomes unmanageable. Because code pauses inside procedures to do input tasks, you have lifetimes of objects that will not span entire procedure scopes, and you will begin to write if (nullptr == xx) in various places that access objects created or destroyed based on events. Dispatch becomes comnbinatorially complex because you have different events expected at each input point and no abstraction.
However, simply using an event loop and dispatch to state machines, you've decreased handling complexity to basic management of handlers (O(n) handlers versus O(mn) branch statements with n types of events and m states). You decouple handling but still allow for functionality to change depending on state. But now these states are well-defined using state classes. And new states can be added if the requirements of the product change.
I'm just saying, stop trying to avoid an event loop. It's a software pattern for very important reasons, all of which have to do with producing professional, reusable, scalable code. Use Boost.ASIO or some other framework for cross platform capabilities. Don't get in the habit of doing it wrong just because you think it will be less of an effort. In the end, even if it's not a professional project that needs maintenance long term, you want to practice making your code professional so you can do something with your skills down the line.

Asynchronous thread-safe logging in C++

I'm looking for a way to do asynchronous and thread-safe logging in my C++ project, if possible to one file. I'm currently using cerr and clog for the task, but since they are synchronous, execution shortly pauses every time something is logged. It's a relatively graphics-heavy app, so this kind of thing is quite annoying.
The new logger should use asynchronous I/O to get rid of these pauses. Thread-safety would also be desirable as I intend to add some basic multithreading soon.
I considered a one-file-per-thread approach, but that seemed like it would make managing the logs a nightmare. Any suggestions?

I noticed this 1 year+ old thread. Maybe the asynchronous logger I wrote could be of interest.
http://www.codeproject.com/KB/library/g2log.aspx
G2log uses a protected message queue to forward log entries to a background worker that the slow disk accesses.
I have tried it with a lock-free queue which increased the average time for a LOG call but decreased the worst case time, however I am using the protected queue now as it is cross-platform. It's tested on Windows/Visual Studio 2010 and Ubuntu 11.10/gcc4.6.
It's released as public domain so you can do with it what you want with no strings attached.

This is VERY possible and practical. How do I know? I wrote exactly that at my last job. Unfortunately (for us), they now own the code. :-) Sadly, they don't even use it.
I intend on writing an open source version in the near future. Meanwhile, I can give you some hints.
I/O manipulators are really just function names. You can implement them for your own logging class so that your logger is cout/cin compatible.
Your manipulator functions can tokenize the operations and store them into a queue.
A thread can be blocked on that queue waiting for chunks of log to come flying through. It then processes the string operations and generates the actual log.
This is intrinsically thread compatible since you are using a queue. However, you still would want to put some mutex-like protection around writing to the queue so that a given log << "stuff" << "more stuff"; type operation remains line-atomic.
Have fun!

I think the proper approach is not one-file-per-thread, but one-thread-per-file. If any one file (or resource in general) in your system is only ever accessed by one thread, thread-safe programming becomes so much easier.
So why not make Logger a dedicated thread (or several threads, one per file, if you're logging different things in different files), and in all other threads, writing to log would place the message on the input queue in the appropriate Logger thread, which would get to it after it's done writing the previous message. All it takes is a mutex to protect the queue from adding an event while Logger is reading an event, and a condvar for Logger to wait on when its queue is empty.

Have you considered using a log library.
There are several available, I discovered Pantheios recently and it really seems to be quite incredible.
It's more a front-end logger, you can customize which system is used. It can interact with ACE or log4cxx for example and it seems really easy to use and configure. The main advantage is that it use typesafe operators, which is always great.
If you just want a barebone logging library:
ACE
log4c*
Boost.Log
Pick any :)
I should note that it's possible to implement lock-free queues in C++ and that they are great for logging.

I had the same issue and I believe I have found the perfect solution. I present to you, a single-header library called loguru: https://github.com/emilk/loguru
It's simple to use, portable, configurable, macro-based and by default doesn't #include anything (for that sweet, sweet compilation times).

Cross-platform (linux/Win32) nonblocking C++ IO on stdin/stdout/stderr

I'm trying to find the best solution for nonblocking IO via stdin/stdout with the following characteristics:
As long as there is enough data, read in n-sized chunks.
If there's not enough data, read in a partial chunk.
If there is no data available, block until there is some (even though it may be smaller than n).
The goal is to allow efficient transfer for large datasets while processing 'control' codes immediately (instead of having them linger in some partially-filled buffer somewhere).
I know I can achieve this by using threads and a istream::get() loop, or by writing a bunch of platform-specific code (since you can't select() on file handles in windows)... ((There is also istream::readsome() which seems promising, but the only results I can find on google were of people saying it doesn't actually work well.))
Since I haven't done much coding w/ these APIs, perhaps there is a better way.

Maybe boost::asio can be of use for you?

I used the threads and platform specific code. See my answer to another question. I was able to put the OS-specific stuff in inputAvailable() (Linux uses select, Windows just returns true). I could then use WaitForSingleObject() with a timeout on Windows to try to let the thread complete, then TerminateThread() to kill it. Very ugly, but the team didn't want to use this bit of boost.

I did something similar to jwhitlock ... I ended up with a StdinDataIO class that wraps around the appropriate OS-specific implementation(*) so that the rest of my program can select() on the file descriptor StdinDataIO provides, remaining blissfully ignorant of Windows' limitations regarding stdin. Have a look here and here if you like, the code is all open-source/BSD-licensed.
(*) the implementation is a simple pass-through for Linux/MacOSX, and in Windows it's a rather complex process of setting up a child thread to read from stdin and send the data it receives over a socket back to the main thread... not very elegant, but it works.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js