Skip lines in std::istream - c++

I'm using std::getline() to read lines from an std::istream-derived class, how can I move forward a few lines?
Do I have to just read and discard them?

No, you don't have to read them with getline.
The more efficient way is to skip them with std::istream::ignore:
for (int currLineNumber = 0; currLineNumber < startLineNumber; ++currLineNumber) {
    if (addressesFile.ignore(numeric_limits<streamsize>::max(), addressesFile.widen('\n'))) {
        // just skipping the line
    } else {
        return HandleReadingLineError(addressesFile, currLineNumber);
    }
}
HandleReadingLineError is not standard but hand-made, of course.
The first parameter is the maximum number of characters to extract. If this is exactly std::numeric_limits<std::streamsize>::max(), there is no limit:
Link at cplusplus.com: std::istream::ignore
If you are going to skip a lot of lines you should definitely use it instead of getline: when I needed to skip 100,000 lines in my file it took about a second, as opposed to 22 seconds with getline.

Edit: You can also use std::istream::ignore, see https://stackoverflow.com/a/25012566/492336
Do I have to use getline the number of lines I want to skip?
No, but it's probably going to be the clearest solution to those reading your code. If the number of lines you're skipping is large, you can improve performance by reading large blocks and counting newlines in each block, stopping and repositioning the file to the last newline's location. But unless you are having performance problems, I'd just put getline in a loop for the number of lines you want to skip.
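That loop can be sketched as follows (skip_lines_getline is an illustrative name of my own, not a standard function):

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Skip n lines by reading and discarding them with std::getline.
// Returns false if the stream ran out of lines first.
bool skip_lines_getline(std::istream& is, int n) {
    std::string discard;
    for (int i = 0; i < n; ++i) {
        if (!std::getline(is, discard)) {
            return false;
        }
    }
    return true;
}
```

After a successful call, the next std::getline picks up at the first unskipped line.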

Yes, use std::getline unless you know the locations of the newlines.
If for some strange reason you happen to know where the newlines appear, then you can use ifstream::seekg first.
You can read in other ways, such as ifstream::read, but std::getline is probably the easiest and clearest solution.
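If you do happen to know the byte offset at which a line starts (the offset below is invented purely for illustration), the jump is a single seek:

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Jump directly to a known byte offset, then read the line starting there.
std::string line_at_offset(std::istream& is, std::streamoff offset) {
    is.seekg(offset, std::ios_base::beg);
    std::string line;
    std::getline(is, line);
    return line;
}
```

For example, in the content "aa\nbb\ncc\n" the second line starts at offset 3.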

Related

Is seekg or ignore more efficient?

I'm using C++ to write a very time-sensitive application, so efficiency is of the utmost importance.
I have a std::ifstream, and I want to jump a specific amount of characters (a.k.a. byte offset, I'm not using wchar_t) to get to a specific line instead of using std::getline to read every single line because it is too inefficient for me.
Is it better to use seekg or ignore to skip a specified number of characters and start reading from there?
size_t n = 100;
std::ifstream f("test");
f.seekg(n, std::ios_base::beg);
// vs.
f.ignore(n);
Looking at cplusplus.com for both functions, it seems like ignore will use sbumpc or sgetc to skip the requested amount of characters. This means that it works even on streams that do not natively support skipping (which ifstream does), but it also processes every single byte.
seekg on the other hand uses pubseekpos or pubseekoff, which is implementation defined. For files, this should directly skip to the desired position without processing the bytes up to it.
I would expect seekg to be much more efficient, but as others have said: doing your own tests with a big file would be the best way to go.
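Both calls should leave the stream at the same position; here is a small sketch that checks this on a string stream (the helper names are my own, and real timing should of course be done on your actual file):

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Advance n bytes by seeking, then report the resulting position.
std::streampos pos_after_seekg(std::istream& is, std::streamoff n) {
    is.seekg(n, std::ios_base::beg);
    return is.tellg();
}

// Advance n bytes by consuming them, then report the resulting position.
std::streampos pos_after_ignore(std::istream& is, std::streamsize n) {
    is.ignore(n);
    return is.tellg();
}
```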

Fast way to get two first and last characters of a string from the input

I need to read a string from the input.
The string's length ranges from 2 up to 1000 letters.
I only need the first 2 letters, the last 2 letters, and the size of the entire string.
Here is my way of doing it; HOWEVER, I do believe there is a smarter way, which is why I am asking this question. Could you please tell me, an inexperienced and new C++ programmer, what possible ways there are of doing this task better?
Thank you.
string word;
getline(cin, word);
// results - I need only those 5 numbers:
int l = word.length();
int c1 = word[0];
int c2 = word[1];
int c3 = word[l-2];
int c4 = word[l-1];
Why do I need this? I want to encode a huge number of really long strings, but I figured out I really need only those 5 values I mentioned, the rest is redundant. How many words will be loaded? Enough to make this part of code worth working on :)
I will take you at your word that this is something that is worth optimizing to an extreme. The method you've shown in the question is already the most straight-forward way to do it.
I'd start by using memory mapping to map chunks of the file into memory at a time. Then, loop through the buffer looking for newline characters. Take the first two characters after the previous newline and the last two characters before the one you just found. Subtract the address of the second newline from the first to get the length of the line. Rinse, lather, and repeat.
Obviously some care will need to be taken around boundaries, where one newline is in the previous mapped buffer and one is in the next.
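The scanning step described above, minus the memory mapping itself and the boundary handling, might look like this (the LineSummary type and summarize function are illustrative names of my own):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// First two characters, last two characters, and length of one line.
struct LineSummary {
    char c1, c2, c3, c4;
    std::size_t length;
};

// Scan one in-memory buffer and summarize each '\n'-terminated line
// of at least two characters.
std::vector<LineSummary> summarize(const std::string& buf) {
    std::vector<LineSummary> out;
    std::size_t start = 0;  // index just past the previous newline
    for (std::size_t i = 0; i < buf.size(); ++i) {
        if (buf[i] == '\n') {
            std::size_t len = i - start;
            if (len >= 2) {
                out.push_back({buf[start], buf[start + 1],
                               buf[i - 2], buf[i - 1], len});
            }
            start = i + 1;
        }
    }
    return out;
}
```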
The first two letters are easy to obtain and fast.
The issue is with the last two letters.
In order to read a text line, the input must be scanned until it finds an end-of-line character (usually a newline). Since your text lines are of variable length, there is no fast solution here.
You can mitigate the issue by reading in blocks of data from the file into memory and searching memory for the line endings. This avoids a call to getline, and it avoids a double search for the end of line (once by getline and the other by your program).
If you change the input to be fixed width, this issue can be sped up.
If you want to optimize this (although I can't imagine why you would want to do that, but surely you have your reasons), the first thing to do is to get rid of std::string and read the input directly. That will spare you one copy of the whole string.
If your input is stdin, you will be slowed down by the buffering too. As has already been said, the best speed would be achieved by reading big chunks from a file in binary mode and doing the end-of-line detection yourself.
At any rate, you will be limited by the I/O bandwidth (disk access speed) in the end.

How to remove the first N lines of a std::istream (e.g. std::stringstream)? [duplicate]


Is there any way to read characters that satisfy certain conditions only from stdin in C++?

I am trying to read characters that satisfy a certain condition from stdin with the iostream library, while leaving those that don't satisfy the condition in stdin so that the skipped characters can be read later. Is it possible?
For example, I want only the characters in a-c, and the input stream is abdddcxa.
First read in all the characters in a-c, i.e. abca; after this input is finished, start reading the remaining characters, dddx. (These two reads can't happen simultaneously. They might be in two different functions.)
Wouldn't it be simpler to read everything, then split the input into the two parts you need and finally send each part to the function that needs to process it?
Keeping the data in the stdin buffer is akin to using globals, it makes your program harder to understand and leaves the risk of other code (or the user) changing what is in the buffer while you process it.
On the other hand, dividing your program into "the part that reads the data", "the part that parses the data and divides the workload" and the "part that does the work" makes for a better structured program which is easy to understand and test.
You can probably use regex to do the actual split.
What you're asking for is the putback method (for more details see: http://www.cplusplus.com/reference/istream/istream/putback/). You would have to read everything, filter the part that you don't want to keep out, and put it back into the stream. So for instance:
cin >> myString;
// Fill putbackBuf[] with the characters to be put back, in reverse order,
// with a terminating null character.
pPutbackBuf = &putbackBuf[0];
do {
    cin.putback(*(pPutbackBuf++));
} while (*pPutbackBuf);
Another solution (which is not exactly what you're asking for) would be to split the input into two strings and then feed the "non-inputted" string into a stringstream and pass that to whatever function needs to do something with the rest of the characters.
What you want to do is not possible in general; ungetc and putback exist, but they're not guaranteed to work for more than one character. They don't actually change stdin; they just push back on an input buffer.
What you could do instead is to explicitly keep a buffer of your own, by reading the input into a string and processing that string. Streams don't let you safely rewind in many cases, though.
No, random access is not possible for streams (except for fstream and stringstream). You will have to read in the whole line/input and process the resulting string (which you could, however, do using iostreams/std::stringstream if you think it is the best tool for that; I don't think it is, but iostreams gurus may differ).
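The split-into-two-strings approach suggested above can be sketched like this (split_abc is an illustrative name, and it hard-codes the a-c condition from the example):

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <utility>

// Consume the whole stream, separating characters in 'a'..'c'
// from everything else, preserving order within each group.
std::pair<std::string, std::string> split_abc(std::istream& in) {
    std::string matched, rest;
    char ch;
    while (in.get(ch)) {
        if (ch >= 'a' && ch <= 'c') {
            matched += ch;
        } else {
            rest += ch;
        }
    }
    return {matched, rest};
}
```

For the input abdddcxa this yields abca and dddx, matching the example in the question; the second string can then be fed to a std::stringstream for later processing.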

Reading line X until line Y from file in C++

I have a relatively simple question. Say I have a file but I only want to access line X of the file until line Y, whats the easiest way of doing that?
I know I can read in the lines one by one keeping count, until I reach the lines that I actually need, but is there a better more elegant solution?
Thanks.
In C++, no, not really (well, not in any language I'm familiar with, really).
You have to start at the start of the file so you can figure where line X starts (unless it's a fixed-record-length file but that's unlikely for text).
Similarly, you have to do that until you find the last line you're interested in.
You can read characters instead of lines if you're scared of buffer overflow exploits, or you can read in fixed-size blocks and count the newlines for speed, but it all boils down to reading and checking every character (by your code explicitly, or by the language libraries implicitly) to count the newlines.
You can use istream::ignore() to avoid buffering the unneeded input.
bool skip_lines(std::istream &is, std::streamsize n)
{
    while (is.good() && n--) {
        is.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    }
    return is.good();
}
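Combined with a read loop, skipping to line X and then collecting lines up to Y might look like this (read_lines and its half-open, zero-based [X, Y) convention are my own choices):

```cpp
#include <cassert>
#include <istream>
#include <limits>
#include <sstream>
#include <string>
#include <vector>

// Skip n lines without buffering them.
bool skip_lines(std::istream& is, std::streamsize n) {
    while (is.good() && n--) {
        is.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    }
    return is.good();
}

// Return lines x..y-1 (zero-based) of the stream.
std::vector<std::string> read_lines(std::istream& is,
                                    std::streamsize x, std::streamsize y) {
    std::vector<std::string> out;
    std::string line;
    if (!skip_lines(is, x)) {
        return out;  // fewer than x lines in the stream
    }
    for (std::streamsize i = x; i < y && std::getline(is, line); ++i) {
        out.push_back(line);
    }
    return out;
}
```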
Search for \n X times, then start 'reading' (whatever processing that entails) until you reach the (Y-X)th \n or EOF. This assumes Unix-style newlines.
Since you have to find the end-of-line character of each line in order to count lines, you still need to iterate over your file. The only optimization I can think of is not reading the file line by line, but buffering it and then iterating over the buffer, counting the lines.
Using C or C++, there exist functions that you can use to skip a specified number of bytes within a file (fseek in stdio, and seekg/seekp with istreams and ostreams). However, you cannot skip a given number of lines, since each line might have a variable number of characters and it is therefore impossible to calculate the correct offset. A file is not some kind of array where each line occupies a "row": you rather have to see it as a continuous sequence of bytes (not talking about hardware here, though...).