Use seekoff to detect a fire hose istream? - c++

When processing input, it can be useful to know if you are being sent data in a firehose-like manner; that is, you get to see the data once and it's forever lost once it goes through the stream.
There are lots of examples of firehoses, such as console input, signal capture devices, data streams from network sockets, or named pipes.
C++ streams and streambufs are supposed to encapsulate input and output behavior, but I'm not sure there's a standard non-destructive way to detect whether the associated sequence lurking behind a streambuf you've been handed is seekable or not.
What if you called streambuf::pubseekoff(0, std::ios_base::cur, std::ios_base::in)? Can you rely on that?
Does the standard (or at least design sense) require such a call to seekoff to return streampos(-1) if the associated sequence is not physically seekable? Would it set failbit? Is it undefined behavior?
These streams don't have a viable concept of "absolute position", so seekoff should probably return streampos(-1) even if the seek itself does nothing.
But if that behavior is not required, some library designer might have decided to return something else from seekoff in such a case. Perhaps the number of characters fed through the firehose so far. Or gptr()-eback() (useless). Or a random number.
The streambuf type I have the most uncertainty about is basic_filebuf. You should certainly be able to seek within a disk file, but the OS can also present firehose-type streams through the 'file' concept.
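For illustration, here is a minimal sketch of the probe being asked about, assuming only the public streambuf interface (the helper name appears_seekable is made up here); whether a non-seekable source really reports streampos(-1) is exactly the open question:
```cpp
#include <iostream>
#include <sstream>

// Probe whether the streambuf attached to `in` reports a meaningful position:
// a zero-offset seek from the current position should not move anything, and
// non-seekable sources typically return streampos(-1).
// Note: this relies on typical behaviour, not on a guarantee from the standard.
bool appears_seekable(std::istream& in)
{
    std::streambuf* buf = in.rdbuf();
    if (!buf)
        return false;
    const std::streampos pos =
        buf->pubseekoff(0, std::ios_base::cur, std::ios_base::in);
    return pos != std::streampos(-1);
}

int main()
{
    std::istringstream ss("buffered in memory");
    std::cout << "stringstream: " << appears_seekable(ss) << '\n';      // usually 1
    std::cout << "std::cin:     " << appears_seekable(std::cin) << '\n'; // often 0 when stdin is a terminal or pipe
}
```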

Related

Can "eof" be set in ofstream?

I could not find much information on this at all. Can eofbit be set (meaning ofstream_instance.eof() is true), and if so, under what circumstances?
I am more interested in an independent ofstream, one that is not associated with an ifstream within some fstream, so that the "shared" eofbit can't be set by the ifstream (if something like that is possible).
If I simply write to a file and there is no space left on disk, or the operating system does not provide any more space for the writing, then I'd expect only failbit or badbit to be set; reaching end of file while writing does not make sense to me. However, I was not able to find any discussion of this.
No. eof() returns the eofbit, which has no real meaning for an output stream with no associated input stream.
eofbit indicates that an input operation reached the end of an input sequence
[ios.types] / 3.1, Table 107
The actions that set eofbit are enumerated here, and they all act on input streams only.
We could imagine some weird implementation-specific scenario in which EOF (as opposed to some other error condition) would be hit while writing to a file - maybe there is a fixed-size file buffer we are writing to through some OS functions - but as far as I know the standard library abstractions do not deal with that case, and I have never seen or heard of such an API in the first place.
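As a small illustration (the directory in the path below is assumed not to exist, so the open fails), error conditions on an ofstream show up as failbit/badbit while eof() stays false:
```cpp
#include <fstream>
#include <iostream>

int main()
{
    // Try to create a file in a directory that (presumably) does not exist,
    // so the open fails and the stream enters an error state.
    std::ofstream out("/nonexistent_dir/out.txt");
    out << "hello";   // no-op once the stream is in a failed state

    std::cout << std::boolalpha
              << "fail(): " << out.fail() << '\n'   // true: the open failed
              << "bad():  " << out.bad()  << '\n'
              << "eof():  " << out.eof()  << '\n';  // false: output never sets eofbit
}
```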

Clarify the difference between input/output stream and input/output buffer

Input stream and Input buffer
From what I understand, when a key is pressed on the keyboard, the character goes into the input stream (stdin) and gets stored in the buffer. Then scanf (in C) or cin (in C++) extracts the character from the buffer and places it in main memory.
Output stream and Output buffer
Similarly, before characters are displayed on the screen, they are first stored in a buffer. Then printf (in C) or cout (in C++) extracts the characters from the buffer (when it is full) and sends them to the output stream (stdout).
Am I right? I've been stuck on this for quite a while now and my logic may be flawed.
Side note: scanf() is not the right function for reading input.
Now for your question: when asking about C (and C++), i.e. the language, you should stay within the abstract concepts the language provides. So don't start at the keyboard; that's far outside your program.
Start here: The operating system wants to deliver some input to you. Now, your C runtime provides a stream of input to your code. The stream is an abstract concept, it just means something you can continuously read from. This stream can be buffered or unbuffered, and if it's buffered, there are different modes (fully buffered or line buffered) available. You can configure all of that.
If your stream is unbuffered, this means the operating system has to wait until your code wants to read from the input stream. By default, your standard input stream is line buffered, which means your C runtime accepts the input immediately and puts it into a buffer until there is a newline -- your code calling input functions will get a result from that buffer.
Conceptually the same happens with output, just the other way around. If your output stream is for example line buffered, your C runtime will fill a buffer until there is a newline and deliver that whole line to the operating system for output. If the output is unbuffered, every single character is immediately passed to the operating system.
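As a rough sketch of what "configure all of that" can look like in practice, std::setvbuf from <cstdio> selects the buffering mode of a standard stream (the buffer size and the choice of stdout here are arbitrary):
```cpp
#include <cstdio>

int main()
{
    static char buf[BUFSIZ];

    // Fully buffered: output is handed to the OS when the buffer fills (or on flush).
    std::setvbuf(stdout, buf, _IOFBF, sizeof buf);

    // Line buffered: output is handed to the OS at each newline.
    // std::setvbuf(stdout, buf, _IOLBF, sizeof buf);

    // Unbuffered: every character goes straight to the OS.
    // std::setvbuf(stdout, nullptr, _IONBF, 0);

    std::puts("hello");   // with _IOFBF this may sit in the buffer for a while
    std::fflush(stdout);  // force the buffered data out now
}
```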
Disclaimer: this is still a lot simplified, but should be enough to start with.
As you ask about the term "buffer overflow" in the comments, mentioning gets() -- this is about a buffer inside your own code. With any input function that reads more than a single value/char, you have to provide your own buffer for it to store the result in. With gets(), there's no way to tell the function how large this buffer is, so it will simply overflow it whenever the available input is too large. This is why gets() was ill-defined and has since been removed from the C language. It has nothing to do with the buffers your C runtime may use internally to implement the I/O streams.
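A minimal sketch of the difference (the 64-byte buffer size is arbitrary): fgets() is told how large your buffer is and stops in time, while the removed gets() had no way to know:
```cpp
#include <cstdio>

int main()
{
    char line[64];                         // the buffer *you* provide

    // gets(line);                         // removed from the language: no way to pass the
                                           // size, so overlong input would overflow `line`

    if (std::fgets(line, sizeof line, stdin)) {   // reads at most sizeof line - 1 chars
        std::printf("read: %s", line);
    }
}
```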

Handling unget and putback with file streams

I have implemented a std::basic_streambuf-derived wrapper around std::basic_filebuf which converts between encodings. Within this wrapper I use a single buffer for both input and output. The buffering technique comes from this article.
Now, the problem I can't figure out is this. My internal buffer is filled on calls to underflow. According to the article, when switching from input to output, the buffer should be put in a state of limbo. To do this, I need to unget the unread data in the buffer. Reading the docs and the source code, unget and putback are not guaranteed to succeed. This would leave me with an invalid tellg pointer on the next input operation.
I'm not asking for somebody to write this for me, but I am asking advice as to how to manage ungetting data from std::basic_filebuf in a way that will not fail.
I think the only sure way is to calculate the bytes that would be written to file and adjust the offset accordingly. It's not as simple as it sounds, though. The filebuf may have an associated locale, unknown at compile time. I tried getting the facet and passing the data through its out member, but it doesn't work. The previously read data may have a non-default mbstate_t value, and some codecvt objects also write a BOM.
Basically, it's almost impossible to calculate the 'on file' length of a section of file data after it has passed through a codecvt.
I have tagged this question with 'c' since C file streams also work with buffers and also use get and put pointers. std::basic_filebuf is essentially a wrapper around a C file stream. Answers in C are also applicable to this problem.
Does anybody have any suggestions as to how to implement unlimited unget on file streams?
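Not a definitive answer, but one way to sidestep the length calculation entirely is to remember the filebuf's external position before each refill and seek back to it when data has to be given back, letting the codecvt simply redo the conversion; a pos_type obtained from basic_filebuf carries the mbstate_t, so the conversion state is restored too. The class below (chunked_reader is a made-up name) is only a sketch of that idea against a plain std::filebuf, not a drop-in replacement for the wrapper described above:
```cpp
#include <fstream>
#include <string>

class chunked_reader
{
    std::filebuf&          fb_;
    std::filebuf::pos_type chunk_start_;   // external position of the current chunk
    std::string            chunk_;         // characters read from that chunk
    std::size_t            consumed_ = 0;  // how many of them the caller has taken

public:
    explicit chunked_reader(std::filebuf& fb)
        : fb_(fb),
          chunk_start_(fb.pubseekoff(0, std::ios_base::cur, std::ios_base::in)) {}

    // Refill the chunk, remembering where it started in the external file.
    bool refill(std::size_t n)
    {
        chunk_start_ = fb_.pubseekoff(0, std::ios_base::cur, std::ios_base::in);
        chunk_.resize(n);
        const std::streamsize got =
            fb_.sgetn(&chunk_[0], static_cast<std::streamsize>(n));
        chunk_.resize(static_cast<std::size_t>(got));
        consumed_ = 0;
        return got > 0;
    }

    char get() { return chunk_[consumed_++]; }  // caller checks bounds

    // Leave the filebuf positioned right after the consumed characters,
    // effectively ungetting the rest of the chunk without measuring it.
    void resync()
    {
        fb_.pubseekpos(chunk_start_, std::ios_base::in);
        for (std::size_t i = 0; i < consumed_; ++i)
            fb_.sbumpc();                        // skip what was already delivered
        chunk_.clear();
        consumed_ = 0;
    }
};
```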

What is a stream in C++?

I have been hearing about streams, more specifically file streams.
So what are they?
Is it something that has a location in the memory?
Is it something that contains data?
Is it just a connection between a file and an object?
The term stream is an abstraction of a construct that allows you to send or receive an unknown number of bytes. The metaphor is a stream of water. You take the data as it comes, or send it as needed. Contrast this to an array, for example, which has a fixed, known length.
Examples where streams are used include reading and writing to files, receiving or sending data across an external connection. However the term stream is generic and says nothing about the specific implementation.
IOStreams are a front-end interface (std::istream, std::ostream) used to define input and output functions. The streams also store formatting options, e.g., the base to use for integer output, and hold a std::locale object for all kinds of customization. Their most important component is a pointer to a std::streambuf, which defines how to access a sequence of characters, e.g., a file, a string, an area on the screen, etc. For files and strings, special stream buffers are provided, and classes derived from the stream base classes are provided for easier creation. Describing the entire facilities of the IOStreams library could pretty much fill a book: in C++03, about half of the library section was devoted to stream-related functionality.
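A small sketch of that separation: the same function can write to the console, a disk file, or an in-memory string, because it only talks to the std::ostream front end (the file name report.txt is just an example):
```cpp
#include <fstream>
#include <iostream>
#include <sstream>

// The formatting code never cares where the characters end up;
// that is the business of whichever streambuf the stream points to.
void report(std::ostream& out, int value)
{
    out << "value = " << value << '\n';
}

int main()
{
    report(std::cout, 42);             // console (or wherever stdout goes)

    std::ofstream file("report.txt");  // disk file
    report(file, 42);

    std::ostringstream text;           // in-memory string
    report(text, 42);
    std::cout << text.str();
}
```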
A stream is a linear queue that connects a file to the program and maintains the flow of data in both directions. The source can be any file or I/O device: hard disk, CD/DVD, etc.
Basically, a stream is of two types: 1. Text stream 2. Binary stream
Text stream: a sequence of characters arranged in lines, each line terminated by a newline (on Unix).
Binary stream: data as it is encoded internally in the computer's main memory, without any modification.
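In C++ terms, that choice is made when the stream is opened (the file names below are placeholders):
```cpp
#include <fstream>

int main()
{
    // Text mode: the runtime may translate line endings (e.g. "\r\n" <-> "\n" on Windows).
    std::ifstream text("data.txt");

    // Binary mode: bytes are delivered exactly as stored, with no translation.
    std::ifstream raw("data.bin", std::ios::binary);
}
```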
The file system is designed to work with a wide variety of devices, including terminals, disk drives, tape drives, etc. Even though each device is different, the file system transforms each into a logical device called a stream. Streams are device independent, so the same function can be used to write to a disk file and to a tape file. In more technical terms, a stream provides an abstraction between the programmer and the actual device being used.

See if part of data is lazy in clojure

Is there a function in clojure that checks whether data contains some lazy part?
Background:
I'm building a small server in clojure. Each connection has a state, an input-stream, and an output-stream.
The server reads a byte from an input-stream, and based on the value calls one of several functions (with the state and the input and output stream as parameters). The functions can decide to read more from the input-stream, write a reply to the output stream, and return a state. This part loops.
This will all work fine, as long as the state doesn't contain any lazy parts. If there is some lazy part in the state, that may, when it gets evaluated (later, during another function), start reading from the input-stream and writing to the output-stream.
So basically I want to add a post-condition to all of these functions, stating that no part of the returned state may be lazy. Is there any function that checks for lazy sequences. I think it would be easy to check whether the state itself is a lazy sequence, but I want to check for instance whether the state has a vector that contains a hash-map, one of whose values is lazy.
It's easier to ensure that it is not lazy by forcing evaluation with doall.
I had this problem in a stream-processing crypto app a couple of years back and tried several approaches until I finally accepted my lazy side and wrapped the input streams in a lazy sequence that closed the input streams when no more data was available, effectively separating the concern of closing the streams from the concern of what the streams contained. The state you are tracking sounds a little more sophisticated than open vs. closed, though you may be able to separate it in a similar manner.
You could certainly force evaluation with doall as Arther wisely suggests.
However I would recommend instead refactoring to solve the real problem, which is that your handler function has side effects (reading from input, writing to output).
You could instead turn this into a pure function if you did the following:
Wrap the input stream as a lazy sequence
use [input-sequence state] as input to your handler function
use [list-of-writes new-state rest-of-input-sequence] as output, where list of writes is whatever needs to be subsequently written to the output stream
If you do this, your handler function is pure, you just need to run it in a simple loop (sending the list-of-writes to the output stream on each iteration) until all input is consumed and/or some other termination condition is reached.