Fastest way to parse through a file in C++

I have a file which I have opened using:
ifstream ifile(FilePath);
The file contains, say, 10 lines of data, and each line contains an evenly-incrementing number of comma-separated values (like a pyramid): the first line has 1 value, the second line has 2 values, and so on.
I wanted to do the following, all within one function (traversing the file's characters just once):
-Every time I encounter a newline character, I increment a counter passed in by reference (so that, when the function exits, the caller has the number of lines in the file).
-I also wanted to store each line of the file in an array. What would be the best way to "glue" together all the characters between newline characters?
I'd prefer to use statically-allocated arrays, but the problem is I only know the required size once I have performed step 1 (counting the newline characters). Would it be quicker to parse the file twice (one pass to count the lines, then use that count to size a fixed array), or to parse it once but insert into a dynamically-sized array?
The emphasis here is on writing fast code (so not being OO-friendly is not a concern).
Apologies if I am asking a lot, hopefully you can see I have given this some thought.
EDIT, example file:
a
b,c
d,e,f,g,h
j,k,l,m,n,o,p
From this file I would want to achieve:
Knowledge that there are 4 lines in the file
A non-dynamic array containing each line
The number of elements in the second line

There are plenty of examples in existing posts on how to do this.
If you want to use ifstream to read the file in once, you can do something like this:
std::ifstream in("myfile");
std::stringstream buffer;
buffer << in.rdbuf();
std::string contents(buffer.str());

Related

Read multiple strings input using fgets()

I need to:
1. Take a command-line argument giving the number of strings (say N).
2. Call a function to take N input lines from the user (using fgets).
3. Store them in an array of pointers, i.e. char *input_lines[MAX_LINES].
All newlines should be removed from the lines.
How can I achieve this?

How to read number of characters stored in input stream buffer

I have a quick question - how can I write something in the console window to std::cin without assigning it to a string or char[]? And then how can I read the number of characters stored in the buffer?
Let's say that I want to create an array of char, but it shall have the size of the input length. I could create a big buffer or variable to store the input, read its length, allocate memory for my char array, and copy the input into it. But let's also say that I am a purist and I don't want any additional memory used (other than the stream buffer). Is it possible to access the std::cin buffer, read the number of characters stored, and copy them to my array? I spent several hours reading the cppreference documentation but couldn't find a solution. I couldn't even find whether it is possible to write something to the std::cin buffer without assigning it to a variable, i.e. without executing cin >> variable. I would appreciate any help, including alternative solutions to this problem.
Also, does somebody know where I can find information about how buffers work (i.e. where the computer stores keyboard input, how it is processed, and how iostream extracts data from it)?
Many thanks!
First of all, in order for the input buffer to be filled, you need to do some sort of read operation. The read operation need not put what is read into a variable. For example, cin.peek() may block until the user enters some value, and it returns the next character that will be read from the buffer without extracting it; you could also use cin.get() along with cin.putback().
You can then use the streambuf::in_avail function to determine how many characters are in the input buffer including a new line character.
With that in mind you could do something like this:
char ch;
cin.get(ch);                 // blocks until some data is entered
cin.putback(ch);             // put back the character read by the previous operation
// Number of characters available in the buffer (including spaces and the newline).
streamsize size = cin.rdbuf()->in_avail();
if (size > 0)
{
    // Allocate the array (you might want one extra slot for a null terminator).
    char* arr = new char[size];
    for (streamsize i = 0; i < size; i++)
        cin.get(arr[i]);     // copy each character, including spaces and the newline
    for (streamsize i = 0; i < size; i++)
        cout << arr[i];      // display the result
    delete[] arr;            // free the allocation
}
That being said, I am sure you have a specific reason for doing this, but I don't think it is a good idea to do I/O like this. If you don't want to estimate the size of the character array you need for input, you can always use a std::string and read the input instead.

How to go to the m-th line and n-th character of a file?

If I want to insert or copy something at the m-th line and n-th character in a file, what should I do? Is there a way better than using getline m times and seekp? Thanks.
Is there a way better than using getline for m times and seekp?
Not really! Lines aren't "special" at the operating system level; they're just parts of a text file separated by the newline character. The only way to get to line m of a text file is to read through all of the file until you've seen m - 1 newlines. Your C++ library's getline() function is likely to have a pretty efficient implementation of that operation already, so you're probably best off just using that.
If your application needs to seek to specific lines of a large file many times during a single run, it may make sense to read in the whole file into a data structure at startup (e.g, an array of structures, each one representing a single line of text); once you've done this, seeking to a specific line is as easy as an array lookup. But if you only need to seek to a specific line once, that's not necessary.
A more memory-efficient approach for repeated seeks in larger files may be to record the file offset for each line number as you encounter it, so that you can easily return to a given line without starting over from the beginning. Again, though, this is only necessary if seeks will be repeated many times.

Fast way to get two first and last characters of a string from the input

I need to read a string from the input
a string has its length from 2 letters up to 1000 letters
I only need 2 first letters, 2 last letters, and the size of the entire string
Here is my way of doing it, HOWEVER, I do believe there is a smarter way, which is why I am asking this question. Could you please tell me, unexperienced and new C++ programmer, what are possible ways of doing this task better?
Thank you.
string word;
getline(cin, word);
// results - I need only those 5 numbers:
int l = word.length();
int c1 = word[0];
int c2 = word[1];
int c3 = word[l-2];
int c4 = word[l-1];
Why do I need this? I want to encode a huge number of really long strings, but I figured out I really need only those 5 values I mentioned, the rest is redundant. How many words will be loaded? Enough to make this part of code worth working on :)
I will take you at your word that this is something that is worth optimizing to an extreme. The method you've shown in the question is already the most straight-forward way to do it.
I'd start by using memory mapping to map chunks of the file into memory at a time. Then, loop through the buffer looking for newline characters. Take the first two characters after the previous newline and the last two characters before the one you just found. Subtract the address of the second newline from the first to get the length of the line. Rinse, lather, and repeat.
Obviously some care will need to be taken around boundaries, where one newline is in the previous mapped buffer and one is in the next.
The first two letters are easy to obtain and fast.
The issue is with the last two letters.
In order to read a text line, the input must be scanned until an end-of-line character (usually a newline) is found. Since your text lines are variable-length, there is no shortcut here.
You can mitigate the issue by reading in blocks of data from the file into memory and searching memory for the line endings. This avoids a call to getline, and it avoids a double search for the end of line (once by getline and the other by your program).
If you change the input to be fixed-width, this can be sped up.
If you want to optimize this (although I can't imagine why you would want to do that, but surely you have your reasons), the first thing to do is to get rid of std::string and read the input directly. That will spare you one copy of the whole string.
If your input is stdin, you will be slowed down by the buffering too. As has already been said, the best speed would be achieved by reading big chunks from a file in binary mode and doing the end-of-line detection yourself.
At any rate, you will be limited by the I/O bandwidth (disk access speed) in the end.

Reading line X until line Y from file in C++

I have a relatively simple question. Say I have a file but I only want to access line X of the file until line Y. What's the easiest way of doing that?
I know I can read in the lines one by one keeping count, until I reach the lines that I actually need, but is there a better more elegant solution?
Thanks.
In C++, no, not really (well, not in any language I'm familiar with, really).
You have to start at the beginning of the file so you can figure out where line X starts (unless it's a fixed-record-length file, but that's unlikely for text).
Similarly, you have to do that until you find the last line you're interested in.
You can read characters instead of lines if you're scared of buffer overflow exploits, or you can read in fixed-size block and count the newlines for speed but it all boils down to reading and checking every character (by your code explicitly or the language libraries implicitly) to count the newlines.
You can use istream::ignore() to avoid buffering the unneeded input.
bool skip_lines(std::istream &is, std::streamsize n)
{
    while (is.good() && n--) {
        is.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    }
    return is.good();
}
Search for '\n' X-1 times to reach the start of line X, then start 'reading' (whatever processing that entails) until you have seen Y-X+1 more '\n' characters or hit EOF. This assumes Unix-style newlines.
Since lines can only be counted by finding the end-of-line characters, you still need to iterate over the file. The only optimization I can think of is not to read the file line by line, but to buffer it and then iterate over the buffer counting the lines.
In C and C++, there are functions you can use to skip a specified number of bytes within a file (fseek in stdio, and seekg/seekp with istreams and ostreams). However, you cannot skip a given number of lines, since each line might have a variable number of characters and it is therefore impossible to calculate the correct offset. A file is not some kind of array where each line occupies a "row": you have to see it as a continuous sequence of bytes (not talking hardware here, though...).