parse an unknown size string - c++

I am trying to read a string of unknown size from a text file, and I used this code:
ifstream inp_file;
char line[1000] ;
inp_file.getline(line, 1000);
However, I don't like it because it has a fixed limit (even though I know it's very hard to exceed this limit). I want to write better code that reallocates according to the size of the incoming string.

The following are some of the available options:
istream& getline ( istream& is, string& str, char delim );
istream& getline ( istream& is, string& str );

One of the usual idioms for reading unknown-size inputs is to read a chunk of known size inside a loop, check for the presence of more input (i.e. verify that you are not at the end of the line/file/region of interest), and extend the size of your buffer. While the getline primitives may be appropriate for you, this is a very general pattern for many tasks in languages where allocation of storage is left up to the programmer.

Maybe you could look at using re2c which is a flexible scanner for parsing the input stream? In that way you can pull in any sized input line without having to know in advance... for example using a regex notation
^.+$
once captured by re2c you can then determine how much memory to allocate...

Have a look at memory-mapped files in boost::iostreams.

Maybe it's too late to answer now, but just for documentation purposes: another way to read a line of unknown size is to use a wrapper function that calls fgets() with a local buffer.
1. Set the last data character of the buffer (the one just before the slot reserved for the terminator) to '\0'.
2. Call fgets().
3. Check that character to see whether it is still '\0'.
4. If it is neither '\0' nor '\n', the buffer filled up before the end of the line was reached. If there is no allocated buffer yet, allocate one and copy the data into it; if there already is one, call realloc() to make it bigger and append the data. Then go back to step 1.
5. Otherwise, you are done. Return the data in an allocated buffer.
This was a tip given in my algorithms lecture.
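The steps above can be sketched roughly as follows. This version assumes the marker goes in the last slot that fgets() can fill with data (index size - 2), and it uses a std::string in place of the malloc()/realloc() bookkeeping; the name read_line_fgets and the buffer size of 64 are made up for the example.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical wrapper: reads one line of any length from `fp` into `out`,
// without the trailing newline. Returns false if nothing could be read.
// Assumes the input contains no embedded '\0' characters.
bool read_line_fgets(std::FILE* fp, std::string& out) {
    char buf[64];                                // small fixed-size local buffer
    const std::size_t probe = sizeof buf - 2;    // last slot fgets() can fill with data
    bool got_data = false;
    out.clear();
    for (;;) {
        buf[probe] = '\0';                       // plant the marker
        if (!std::fgets(buf, static_cast<int>(sizeof buf), fp)) break;  // EOF or error
        got_data = true;
        out += buf;                              // std::string stands in for realloc() + copy
        if (buf[probe] == '\0' || buf[probe] == '\n') break;  // line is complete
    }
    if (!out.empty() && out.back() == '\n') out.pop_back();   // drop the newline
    return got_data;
}
```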

Related

How to read number of characters stored in input stream buffer

I have a quick question - how can I possibly write something in console window to std::cin without assigning it to a string or char[]? And then how to read the number of characters that are stored in buffer?
Let's say that I want to create an array of char, but it should have the size of the input length. I might create a buffer or a variable of big size to store the input and then read its length, allocate memory for my char array and copy it. But let's also say that I am a purist and I don't want any additional memory used (other than the stream buffer). Is there a way to access the std::cin buffer, read the number of characters stored and copy them to my array? I was trying to find the answer for several hours, reading the C++ reference, but I really couldn't find a solution. I couldn't even find out whether it is possible to write something to the std::cin buffer without assigning it to a variable, that is, without executing cin >> variable. I would appreciate any help, also if you have alternative solutions for this problem.
Also, does somebody know where I can find information about how buffers work (that is, where the computer stores input from the keyboard, how it is processed, and how iostream works with the computer to extract data from it)?
Many thanks!
First of all, in order for the input buffer to be filled you need to do some sort of read operation. The read operation does not necessarily have to put what is read into a variable. For example, cin.peek() may block until the user enters some value and return the next character that will be read from the buffer without extracting it, or you could use cin.get along with cin.putback.
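You can then use the streambuf::in_avail function to determine how many characters are in the input buffer, including a newline character. Note that in_avail only reports what the stream buffer currently holds, so the value it returns for std::cin depends on the implementation.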
With that in mind you could do something like this:
char ch;
cin.get(ch); // this will block until some data is entered
cin.putback(ch); // put back the character read in the previous operation
streamsize size = cin.rdbuf()->in_avail(); // number of characters available for reading in the buffer (including spaces and the newline)
if (size > 0)
{
    char* arr = new char[size]; // allocate the array (you might want to add one more element for a null terminator)
    for (streamsize i = 0; i < size; i++)
        cin.get(arr[i]); // copy each character, including spaces and the newline, from the input buffer to the array
    for (streamsize i = 0; i < size; i++)
        cout << arr[i]; // display the result
    delete[] arr; // release the array once it is no longer needed
}
That being said, I am sure you have a specific reason for doing this, but I don't think it is a good idea to do I/O like this. If you don't want to estimate the size of the character array you need for the input, you can always read the input into a std::string instead.

Using ifstream's getline in C++

Hello World,
I am fairly new to C++ and I am trying to read a text file Line by Line. I did some research online and stumbled across ifstream.
What is troubling me is the getline method.
Its signature is istream& getline (char* s, streamsize n );
I understand that the variable s is where the line being read is saved. (Correct me if I am wrong)
What I do not understand is what the streamsize n is used for.
The documentation states that:
Maximum number of characters to write to s (including the terminating null character).
However if I do not know how long a given line is what do I set the streamsize n to be ?
Also,
What is the difference between ifstream and istream ?
Would istream be more suitable to read lines ? Is there a difference in performance ?
Thanks for your time
You almost never want to use this getline function. It's a leftover from back before std::string had been defined. It's for reading into a fixed-size buffer, so you'd do something like this:
static const int N = 1024;
char mybuffer[N];
myfile.getline(mybuffer, N);
...and the N was there to prevent getline from writing into memory past the end of the space you'd allocated.
For new code you usually want to use an std::string, and let it expand to accommodate the data being read into it:
std::string input;
std::getline(myfile, input);
In this case, you don't need to specify the maximum size, because the string can/will expand as needed for the size of the line in the input. Warning: in a few cases, this can be a problem--if (for example) you're reading data being fed into a web site, it could be a way for an attacker to stage a DoS attack by feeding an immense string, and bringing your system to its knees trying to allocate excessive memory.
Between istream and ifstream: an istream is mostly a base class that defines an interface that can be used to work with various derived classes (including ifstream objects). When/if you want to open a file from disk (or something similar) you want to use an ifstream object.

Any way to get rid of the null character at the end of an istream get?

I'm currently trying to write a bit of code to read a file and extract bits of it and save them as variables.
Here's the relevant code:
char address[10];
ifstream tracefile;
tracefile.open ("trace.txt");
tracefile.seekg(2, ios::beg);
tracefile.get(address, 10, ' ');
cout << address;
The contents of the file: (just the first line)
R 0x00000000
The issue I'm having is that address misses the final '0' because a '\0' character is put there, and I'm not sure how to get around that. So it outputs:
0x0000000
I'm also having issues with
tracefile.seekg(2, ios::cur);
It doesn't seem to work, which is why I've changed it to ios::beg just to try and get something working, although obviously that won't be usable once I try to read multiple lines one after another.
Any help would be appreciated.
ifstream::get() will attempt to produce a null-terminated C string, which you haven't provided enough space for.
You can either:
Allocate char address[11]; (or bigger) to hold a null-terminated string longer than 9 characters.
Use ifstream::read() instead to read the 10 bytes without a null-terminator.
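Both options can be tried out on an in-memory stream. In this sketch an std::istringstream stands in for the file and ignore(2) stands in for the seek past "R "; the function names are made up for the example.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Option 1: get() with room for 10 characters plus the terminating '\0'.
std::string read_address_get(const std::string& text) {
    std::istringstream in(text);
    in.ignore(2);                              // skip the leading "R "
    char address[11];                          // 10 characters + '\0'
    in.get(address, sizeof address, ' ');      // stores at most 10 characters
    return address;
}

// Option 2: read() copies raw bytes and adds no terminator.
std::string read_address_read(const std::string& text) {
    std::istringstream in(text);
    in.ignore(2);
    char raw[10];
    in.read(raw, sizeof raw);
    return std::string(raw, static_cast<std::size_t>(in.gcount()));
}
```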
Edit:
If you want a buffer that can dynamically account for the length of the line, use std::getline with a std::string.
std::string buffer;
tracefile.seekg(2, ios::beg);
std::getline( tracefile, buffer );
Edit 2
If you only want to read to the next whitespace, use:
std::string buffer;
tracefile.seekg(2, ios::beg);
tracefile >> buffer;
Make the buffer bigger, so that you can read the entire input text into it, including the terminating '\0'. Or use std::string, which doesn't have a pre-determined size.
There are several issues with your code. The first is that seekg( 2, ios::beg ) is undefined behavior unless the stream is opened in binary mode (which yours isn't). It will work under Unix, and depending on the contents of the file, it might work under Windows (but it could also send you to the wrong place). On some other systems, it might systematically fail, or do just about anything else. You cannot reliably seek to arbitrary positions in a text stream.
The second is that if you want to read exactly 10 characters, the function you need is istream::read, and not istream::get. On the other hand, if you want to read up to the next white space, using >> into a string will work best. If you want to limit the number of characters extracted to a maximum, set the width before calling >>:
std::string address;
// ...
tracefile >> std::setw( 10 ) >> address;
This avoids all issues of '\0', etc.
Finally, of course, you need error checking. You should probably check whether the open succeeded before doing anything else, and you should definitely check whether the read succeeded before using the results. (As you've written the code, if the open fails for any reason, you have undefined behavior.)
If you're reading multiple lines, of course, the best solution is usually to use std::getline to read each line into a string, and then parse that string (possibly using std::istringstream). This prevents the main stream from entering error state if there is a format error in the line, and it provides automatic resynchronization in such cases.
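A minimal sketch of that line-by-line approach. It assumes each line looks like the one sample line in the question (a one-letter tag followed by an address); the TraceEntry struct and the parse_trace name are made up for the example.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one line such as "R 0x00000000".
struct TraceEntry {
    char op;
    std::string address;
};

std::vector<TraceEntry> parse_trace(std::istream& in) {
    std::vector<TraceEntry> entries;
    std::string line;
    while (std::getline(in, line)) {           // one whole line at a time
        std::istringstream fields(line);       // parse the line on its own stream
        TraceEntry e{};
        if (fields >> e.op >> e.address) {
            entries.push_back(e);
        }
        // A malformed line only fails `fields`; `in` stays usable,
        // so the next iteration continues with the next line.
    }
    return entries;
}
```

With a file, the caller would check that the std::ifstream opened successfully before passing it in.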

How to use fgets if you don't know the number of characters to be read?

I need to read a file and send the text from it to a string so I can parse it. However, the program won't know exactly how long the file is, so what would I do if I wanted to use fgets(), or is there a better alternative?
Note:
char *fgets(char *str, int num, FILE *stream);
Don't forget that fgets() reads a line at a time, subject to having enough space.
Humans seldom write lines longer than ... 80, 256, pick a number ... characters. POSIX suggests a line length of 4096. So, I usually use:
char buffer[4096];
while (fgets(buffer, sizeof(buffer), fp))
{
...process line...
}
If you are worried that someone might provide more than 4K of data in a single line (and a machine generated file, such as HTML or JSON, might contain that), then you have to decide what to do next. You can do any of the following (and there are likely some other options I've not mentioned):
Process the over-long lines in bits without assuming that there was a newline in between.
Allocate memory for a longer line (say 8K to start with), copy the initial 4K into the allocated buffer, and read more data into the second half of the buffer, iterating until you find the end of line.
Use the POSIX 2008 function getline() which is available on Linux. It does memory allocation for you.
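For the last option, a rough sketch of POSIX getline() called from C++. It assumes a POSIX system where <stdio.h> declares getline(); the wrapper name read_lines_posix is made up for the example.

```cpp
#include <cstdlib>
#include <string>
#include <vector>
#include <stdio.h>      // POSIX getline()
#include <sys/types.h>  // ssize_t

// Hypothetical helper: reads every line of `fp`, whatever its length.
std::vector<std::string> read_lines_posix(FILE* fp) {
    std::vector<std::string> lines;
    char* buf = nullptr;   // getline() allocates and grows this buffer
    size_t cap = 0;        // current capacity, maintained by getline()
    ssize_t len;
    while ((len = getline(&buf, &cap, fp)) != -1) {
        if (len > 0 && buf[len - 1] == '\n') --len;   // drop the newline
        lines.emplace_back(buf, static_cast<size_t>(len));
    }
    std::free(buf);        // the caller owns the buffer getline() allocated
    return lines;
}
```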
You can use fgets iteratively, but a simpler alternative is (stdio.h's) getline. It's in POSIX, but it's not standard C.
Since you're using C++ though, can you use std::string functions like iostream's getline?
If you're not on a POSIX system and don't have getline available, take a look at Chuck Falconer's public domain ggets/fggets functions which dynamically grow a buffer to consume an entire line. (That link seems to be down right now, but archive.org has a copy.)
Allocate a buffer (the one that str points to), and pass the size of the buffer for num. The actual space taken up will only be the length of the text read by fgets.
Something like:
char str[1000];
fgets(str, sizeof str, file); /* file is an open FILE* */
If the next line only has 10 characters before the newline, then str will hold those 10 characters, the newline, and the null terminator.
Edit: just in case there is any confusion, I didn't intend the above to sound as if the extra space in the buffer isn't in use. I only meant to illustrate that you don't need to know ahead of time how long your string is going to be, as long as you can put a maximum length on it.

Overloading operator>> to a char buffer in C++ - can I tell the stream length?

I'm on a custom C++ crash course. I've known the basics for many years, but I'm currently trying to refresh my memory and learn more. To that end, as my second task (after writing a stack class based on linked lists), I'm writing my own string class.
It's gone pretty smoothly until now; I want to overload operator>> so that I can do stuff like cin >> my_string;.
The problem is that I don't know how to read the istream properly (or perhaps the problem is that I don't know streams...). I tried a while (!stream.eof()) loop that .read()s 128 bytes at a time, but as one might expect, it stops only on EOF. I want it to read to a newline, like you get with cin >> to a std::string.
My string class has an alloc(size_t new_size) function that (re)allocates memory, and an append(const char *) function that does that part, but I obviously need to know the amount of memory to allocate before I can write to the buffer.
Any advice on how to implement this? I tried getting the istream length with seekg() and tellg(), to no avail (it returns -1), and as I said looping until EOF (doesn't stop reading at a newline) reading one chunk at a time.
To read characters from the stream until the end of line use a loop.
char c;
while(istr.get(c) && c != '\n')
{
// Append 'c' to the end of your string.
}
// If you want to put the '\n' back onto the stream,
// use istr.putback(c) (or istr.unget()) here.
// But I think it's safe to say that dropping the '\n' is fine.
If you run out of room reallocate your buffer with a bigger size.
Copy the data across and continue. No need to be fancy for a learning project.
You can use std::cin.getline(buffer, buffer_size);
then you will need to check the bad, eof and fail flags:
std::cin.bad(), std::cin.eof(), std::cin.fail()
Unless bad or eof is set, the fail flag being set usually means that the buffer filled up before the end of the line was reached, so you should reallocate your buffer and continue reading into the new buffer after calling std::cin.clear().
A side note: in the standard library, the operator>> overloads for an istream are either members of the stream or (as for char*) free functions. It may be wiser to provide a free-function overload for your type rather than overloading the operator as a member of your class.
Check Jerry Coffin's answer to this question.
The first method he used is very simple (just a helper class) and allows you to store your input in a std::vector<std::string>, where each element of the vector represents a line of the original input.
That really makes things easy when it comes to processing afterwards!