C++ fstream: how to know size of string when reading? - c++

...as someone may remember, I'm still stuck on C++ strings. Ok, I can write a string to a file using a fstream as follows
outStream.write((char *) s.c_str(), s.size());
When I want to read that string, I can do
inStream.read((char *) s.c_str(), s.size());
Everything works as expected. The problem is: if I change the length of my string after writing it to a file and before reading it again, printing that string won't bring me back my original string but a shorter/longer one. So: if I have to store many strings on a file, how can I know their size when reading it back?
Thanks a lot!

You shouldn’t be using the unformatted I/O functions (read() and write()) if you just want to write ordinary human-readable string data. Generally you only use those functions when you need to read and write compact binary data, which for a beginner is probably unnecessary. You can write ordinary lines of text instead:
std::string text = "This is some test data.";
{
std::ofstream file("data.txt");
file << text << '\n';
}
Then read them back with getline():
{
std::ifstream file("data.txt");
std::string line;
std::getline(file, line);
// line == text
}
You can also use the regular formatting operator >> to read, but when applied to string, it reads tokens (nonwhitespace characters separated by whitespace), not whole lines:
{
std::ifstream file("data.txt");
std::vector<std::string> words;
std::string word;
while (file >> word) {
words.push_back(word);
}
// words == {"This", "is", "some", "test", "data."}
}
All of the formatted I/O functions automatically handle memory management for you, so there is no need to worry about the length of your strings.

Although your writing solution is more or less acceptable, your reading solution is fundamentally flawed: it uses the internal storage of your old string as a character buffer for your new string, which is very, very bad (to put it mildly).
You should switch to a formatted way of reading and writing the streams, like this:
Writing:
outStream << s;
Reading:
inStream >> s;
This way you would not need to bother determining the lengths of your strings at all.
This code is different in that it stops at whitespace characters; you can use getline if you want to stop only at \n characters.

You can write the strings and write an additional 0 (null terminator) to the file. Then it will be easy to separate strings later. Also, you might want to read and write lines
outfile << string1 << endl;
getline(infile, string2, '\n');

If you want to use unformatted I/O your only real options are to either use a fixed size or to prepend the size somehow so you know how many characters to read. Otherwise, when using formatted I/O it somewhat depends on what your strings contain: if they can contain all viable characters, you would need to implement some sort of quoting mechanism. In simple cases, where strings consist e.g. of space-free sequence, you can just use formatted I/O and be sure to write a space after each string. If your strings don't contain some character useful as a quote, it is relatively easy to process quotes:
std::istream& quote(std::istream& out) {
char c;
if (in >> c && c != '"') {
in.setstate(std::ios_base::failbit;
}
}
out << '"' << string << "'";
std::getline(in >> std::ws >> quote, string, '"');
Obviously, you might want to bundle this functionality a class.

Related

Reading a text file in c++

string numbers;
string fileName = "text.txt";
ifstream inputFile;
inputFile.open(fileName.c_str(),ios_base::in);
inputFile >> numbers;
inputFile.close();
cout << numbers;
And my text.txt file is:
1 2 3 4 5
basically a set of integers separated by tabs.
The problem is the program only reads the first integer in the text.txt file and ignores the rest for some reason. If I remove the tabs between the integers it works fine, but with tabs between them, it won't work. What causes this? As far as I know it should ignore any white space characters or am I mistaken? If so is there a better way to get each of these numbers from the text file?
When reading formatted strings the input operator starts with ignoring leading whitespace. Then it reads non-whitespace characters up to the first space and stops. The non-whitespace characters get stored in the std::string. If there are only whitespace characters before the stream reaches end of file (or some error for that matter), reading fails. Thus, your program reads one "word" (in this case a number) and stops reading.
Unfortunately, you only said what you are doing and what the problems are with your approach (where you problem description failed to cover the case where reading the input fails in the first place). Here are a few things you might want to try:
If you want to read multiple words, you can do so, e.g., by reading all words:
std::vector<std::string> words;
std::copy(std::istream_iterator<std::string>(inputFile),
std::istream_iterator<std::string>(),
std::back_inserter(words));
This will read all words from inputFile and store them as a sequence of std::strings in the vector words. Since you file contains numbers you might want to replace std::string by int to read numbers in a readily accessible form.
If you want to read a line rather than a word you can use std::getline() instead:
if (std::getline(inputFile, line)) { ... }
If you want to read multiple lines, you'd put this operation into a loop: There is, unfortunately, no read-made approach to read a sequence of lines as there is for words.
If you want to read the entire file, not just the first line, into a file, you can also use std::getline() but you'd need to know about one character value which doesn't occur in your file, e.g., the null value:
if (std::getline(inputFile, text, char()) { ... }
This approach considers a "line" a sequence of characters up to a null character. You can use any other character value as well. If you can't be sure about the character values, you can read an entire file using std::string's constructor taking iterators:
std::string text((std::istreambuf_iterator<char>(inputFile)),
std::istreambuf_iterator<char>());
Note, that the extra pair of parenthesis around the first parameter is, unfortunately, necessary (if you are using C++ 2011 you can avoid them by using braces, instead of parenthesis).
Use getline to do the reading.
string numbers;
if (inputFile.is_open())//checking if open
{
getline (inputFile,numbers); //fetches entire line into string numbers
inputFile.close();
}
Your program does behave exactly as in your description : inputFile >> numbers; just extract the first integer in the input file, so if you suppress the tab, inputFile>> will extract the number 12345, not 5 five numbers [1,2,3,4,5].
a better method :
vector< int > numbers;
string fileName = "text.txt";
ifstream inputFile;
inputFile.open(fileName.c_str(),ios_base::in);
char c;
while (inputFile.good()) // loop while extraction from file is possible
{
c = inputFile.get(); // get character from file
if ( inputFile.good() and c!= '\t' and c!=' ' ) // not sure of tab and space encoding in C++
{
numbers.push_back( (int) c);
}
}
inputFile.close();

Reading from ifstream won't read whitespace

I'm implementing a custom lexer in C++ and when attempting to read in whitespace, the ifstream won't read it out. I'm reading character by character using >>, and all the whitespace is gone. Is there any way to make the ifstream keep all the whitespace and read it out to me? I know that when reading whole strings, the read will stop at whitespace, but I was hoping that by reading character by character, I would avoid this behaviour.
Attempted: .get(), recommended by many answers, but it has the same effect as std::noskipws, that is, I get all the spaces now, but not the new-line character that I need to lex some constructs.
Here's the offending code (extended comments truncated)
while(input >> current) {
always_next_struct val = always_next_struct(next);
if (current == L' ' || current == L'\n' || current == L'\t' || current == L'\r') {
continue;
}
if (current == L'/') {
input >> current;
if (current == L'/') {
// explicitly empty while loop
while(input.get(current) && current != L'\n');
continue;
}
I'm breaking on the while line and looking at every value of current as it comes in, and \r or \n are definitely not among them- the input just skips to the next line in the input file.
There is a manipulator to disable the whitespace skipping behavior:
stream >> std::noskipws;
The operator>> eats whitespace (space, tab, newline). Use yourstream.get() to read each character.
Edit:
Beware: Platforms (Windows, Un*x, Mac) differ in coding of newline. It can be '\n', '\r' or both. It also depends on how you open the file stream (text or binary).
Edit (analyzing code):
After
while(input.get(current) && current != L'\n');
continue;
there will be an \n in current, if not end of file is reached. After that you continue with the outmost while loop. There the first character on the next line is read into current. Is that not what you wanted?
I tried to reproduce your problem (using char and cin instead of wchar_t and wifstream):
//: get.cpp : compile, then run: get < get.cpp
#include <iostream>
int main()
{
char c;
while (std::cin.get(c))
{
if (c == '/')
{
char last = c;
if (std::cin.get(c) && c == '/')
{
// std::cout << "Read to EOL\n";
while(std::cin.get(c) && c != '\n'); // this comment will be skipped
// std::cout << "go to next line\n";
std::cin.putback(c);
continue;
}
else { std::cin.putback(c); c = last; }
}
std::cout << c;
}
return 0;
}
This program, applied to itself, eliminates all C++ line comments in its output. The inner while loop doesn't eat up all text to the end of file. Please note the putback(c) statement. Without that the newline would not appear.
If it doesn't work the same for wifstream, it would be very strange except for one reason: when the opened text file is not saved as 16bit char and the \n char ends up in the wrong byte...
You could open the stream in binary mode:
std::wifstream stream(filename, std::ios::binary);
You'll lose any formatting operations provided my the stream if you do this.
The other option is to read the entire stream into a string and then process the string:
std::wostringstream ss;
ss << filestream.rdbuf();
OF course, getting the string from the ostringstream rquires an additional copy of the string, so you could consider changing this at some point to use a custom stream if you feel adventurous.
EDIT: someone else mention istreambuf_iterator, which is probably a better way of doing it than reading the whole stream into a string.
Wrap the stream (or its buffer, specifically) in a std::streambuf_iterator? That should ignore all formatting, and also give you a nice iterator interface.
Alternatively, a much more efficient, and fool-proof, approach might to just use the Win32 API (or Boost) to memory-map the file. Then you can traverse it using plain pointers, and you're guaranteed that nothing will be skipped or converted by the runtime.
You could just Wrap the stream in a std::streambuf_iterator to get data with all whitespaces and newlines like this .
/*Open the stream in default mode.*/
std::ifstream myfile("myfile.txt");
if(myfile.good()) {
/*Read data using streambuffer iterators.*/
vector<char> buf((std::istreambuf_iterator<char>(myfile)), (std::istreambuf_iterator<char>()));
/*str_buf holds all the data including whitespaces and newline .*/
string str_buf(buf.begin(),buf.end());
myfile.close();
}
By default, this skipws flag is already set on the ifstream object, so we must disable it. The ifstream object has these default flags because of std::basic_ios::init, called on every new ios_base object (more details). Any of the following would work:
in_stream.unsetf(std::ios_base::skipws);
in_stream >> std::noskipws; // Using the extraction operator, same as below
std::noskipws(in_stream); // Explicitly calling noskipws instead of using operator>>
Other flags are listed on cpp reference.
The stream extractors behave the same and skip whitespace.
If you want to read every byte, you can use the unformatted input functions, like stream.get(c).
Why not simply use getline ?
You will get all the whitespaces, and while you won't get the end of lines characters, you will still know where they lie :)
Just Use getline.
while (getline(input,current))
{
cout<<current<<"\n";
}
I ended up just cracking open the Windows API and using it to read the whole file into a buffer first, and then reading that buffer character by character. Thanks guys.

How to read a file and get words in C++

I am curious as to how I would go about reading the input from a text file with no set structure (Such as notes or a small report) word by word.
The text for example might be structured like this:
"06/05/1992
Today is a good day;
The worm has turned and the battle was won."
I was thinking maybe getting the line using getline, and then seeing if I can split it into words via whitespace from there. Then I thought using strtok might work! However I don't think that will work with the punctuation.
Another method I was thinking of was getting everything char by char and omitting the characters that were undesired. Yet that one seems unlikely.
So to sort the thing short:
Is there an easy way to read an input from a file and split it into words?
Since it's easier to write than to find the duplicate question,
#include <iterator>
std::istream_iterator<std::string> word_iter( my_file_stream ), word_iter_end;
size_t wordcnt;
for ( ; word_iter != word_iter_end; ++ word_iter ) {
std::cout << "word " << wordcnt << ": " << * word_iter << '\n';
}
The std::string argument to istream_iterator tells it to return a string when you do *word_iter. Every time the iterator is incremented, it grabs another word from its stream.
If you have multiple iterators on the same stream at the same time, you can choose between data types to extract. However, in that case it may be easier just to use >> directly. The advantage of an iterator is that it can plug into the generic functions in <algorithm>.
Yes. You're looking for std::istream::operator>> :) Note that it will remove consecutive whitespace but I doubt that's a problem here.
i.e.
std::ifstream file("filename");
std::vector<std::string> words;
std::string currentWord;
while(file >> currentWord)
words.push_back(currentWord);
You can use getline with a space character, getline(buffer,1000,' ');
Or perhaps you can use this function to split a string into several parts, with a certain delimiter:
string StrPart(string s, char sep, int i) {
string out="";
int n=0, c=0;
for (c=0;c<(int)s.length();c++) {
if (s[c]==sep) {
n+=1;
} else {
if (n==i) out+=s[c];
}
}
return out;
}
Notes: This function assumes that it you have declared using namespace std;.
s is the string to be split.
sep is the delimiter
i is the part to get (0 based).
You can use the scanner technique to grabb words, numbers dates etc... very simple and flexible. The scanner normally returns token (word, number, real, keywords etc..) to a Parser.
If you later intend to interpret the words, I would recommend this approach.
I can warmly recommend the book "Writing Compilers and Interpreters" by Ronald Mak (Wiley Computer Publishing)

C++: Why does space always terminate a string when read?

Using type std::string to accept a sentence, for practice (I haven't worked with strings in C++ much) I'm checking if a character is a vowel or not. I got this:
for(i = 0; i <= analyse.length(); i++) {
if(analyse[i] == 'a' || analyse[i] == 'e' [..etc..]) {
...vowels++;
} else { ...
...consonants++;
}
This works fine if the string is all one word, but the second I add a space (IE: aeio aatest) it will only count the first block and count the space as a consonant, and quit reading the sentence (exiting the for loop or something).
Does a space count as no character == null? Or some oddity with std::string?, It would be helpful to know why that is happening!
EDIT:
I'm simply accepting the string through std::cin, such as:
std::string analyse = "";
std::cin >> analyse;
I'd guess you're reading your string with something like your_stream >> your_string;. Operator >> for strings is defined to work (about) the same as scanf's %s conversion, which reads up until it encounters whitespace -- therefore, operator>> does the same.
You can read an entire line of input instead with std::getline. You might also want to look at an answer I posted to a previous question (provides some alternatives to std::getline).
I can't tell from the code that you have pasted, but I'm going to go out on a limb and guess that you're reading into the string using the stream extraction operator (stream >> string).
The stream extraction operator stops when it encounters whitespace.
If this isn't what's going on, can you show us how you're populating your string, and what its contents are?
If I'm right, then you're going to want a different method of reading content into the string. std::getline() is probably the easiest method of reading from a file. It stops at newlines instead of at whitespace.
Edit based on edited question:
use this (doublecheck the syntax. I'm not in front of my compiler.):
std::getline(std::cin, analyze);
This ought to stop reading when you press "enter".
If you want to read in an entire line (including the blanks) then you should read using getline. Schematically it looks like this:
#include <string>
istream& std::getline( istream& is, string& s );
To read the whole line you do something like this:
string s;
getline( cin, s );
cout << "You entered " << s << endl;
PS: the word is "consonant", not "consenent".
The >> operator on an istream separates strings on whitespace. If you want to get a whole line, you can use readline(cin,destination_string).

How to read a word into a string ignoring a certain character

I am reading a text file which contains a word with a punctuation mark on it and I would like to read this word into a string without the punctuation marks.
For example, a word may be " Hello, "
I would like the string to get " Hello " (without the comma). How can I do that in C++ using ifstream libraries only.
Can I use the ignore function to ignore the last character?
Thank you in advance.
Try ifstream::get(Ch* p, streamsize n, Ch term).
An example:
char buffer[64];
std::cin.get(buffer, 64, ',');
// will read up to 64 characters until a ',' is found
// For the string "Hello," it would stream in "Hello"
If you need to be more robust than simply a comma, you'll need to post-process the string. The steps might be:
Read the stream into a string
Use string::find_first_of() to help "chunk" the words
Return the word as appropriate.
If I've misunderstood your question, please feel free to elaborate!
If you only want to ignore , then you can use getline.
const int MAX_LEN = 128;
ifstream file("data.txt");
char buffer[MAX_LEN];
while(file.getline(buffer,MAX_LEN,','))
{
cout<<buffer;
}
EDIT: This uses std::string and does away with MAX_LEN
ifstream file("data.txt");
string string_buffer;
while(getline(file,string_buffer,','))
{
cout<<string_buffer;
}
One way would be to use the Boost String Algorithms library. There are several "replace" functions that can be used to replace (or remove) specific characters or strings in strings.
You can also use the Boost Tokenizer library for splitting the string into words after you have removed the punctuation marks.