Easiest/clearest way to read formatted data in C++ - c++

I'm reading in a file of space/newline delimited numbers. After trying stringstreams and ifstreams, it appears C++ hasn't improved much on fopen and fscanf for this simple task in terms of simplicity, readability, or efficiency.
What about robustness? Since I check that fscanf returned the number of items I expect, this doesn't seem like an issue. The only benefit I can think of is stringstream's giving you more options to handle a failure.
Here is a quick example using fscanf:
FILE * pFile;
pFile = fopen ("my_file.txt","r");
if( pFile == NULL ) return -1;
double x,y,z;
int items_read;
while( true )
{
items_read = fscanf( pFile, "%lf %lf %lf", &x, &y, &z ); // fscanf needs the addresses of the variables
if( items_read < 3 ) break; // Checks for EOF (which is -1) or reading 1-2 numbers
std::cout << x << " " << y << " " << z << "\n";
}
NOTE: for extra security, could replace fopen/fscanf with fopen_s/fscanf_s in Visual Studio.

In my experience neither C nor C++ offers you "robust input that tolerates idiot users".
It is adequate for "well formed input where it's OK to say 'something wrong in your input, please fix it'...", but not for robust situations where you need to check everything carefully (e.g. someone putting two instead of three numbers on a line, so the whole rest of the data is happily accepted, but now all your z values are actually x values, and everything else is "shifted one").
In that case, you do need to write some functions that do the appropriate checking by reading a line, checking that it can fetch three numbers out of that line - or something like that. You may well find that using stringstream or something similar is adequate for checking that there are three valid numbers on the line, but just using f >> x >> y >> z; will obviously lead to the next line being used to satisfy whatever is missing on this line.
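The read-a-line-then-check approach described above could look roughly like this. It is only a sketch: Point3, parse_point and read_points are made-up names, and treating blank lines as errors is an assumption about the file format.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Point3 { double x, y, z; };  // hypothetical record type

// Parses one line that must hold exactly three numbers and nothing else.
// Returns false for short lines, non-numeric tokens, or trailing junk.
bool parse_point(const std::string& line, Point3& out)
{
    std::istringstream iss(line);
    Point3 p;
    if (!(iss >> p.x >> p.y >> p.z)) return false;  // fewer than three numbers
    std::string extra;
    if (iss >> extra) return false;                 // something left over
    out = p;
    return true;
}

// Reads every line of `in`. Stops at the first malformed line and reports its
// 1-based number through `bad_line` (0 means the whole input was valid).
std::vector<Point3> read_points(std::istream& in, std::size_t& bad_line)
{
    std::vector<Point3> points;
    std::string line;
    std::size_t line_no = 0;
    bad_line = 0;
    while (std::getline(in, line)) {
        ++line_no;
        Point3 p;
        if (!parse_point(line, p)) { bad_line = line_no; break; }
        points.push_back(p);
    }
    return points;
}
```

Because each line is parsed on its own, a short line can no longer borrow numbers from the line after it, and the caller can tell the user which line to fix.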

Related

Reading different types of variables from a file in C++

I have a problem with my code in C++ and need some help. There are some related questions but I couldn't really understand the answers.
I have a text file ('parameters.dat' in the example below) that I want to read in with my code written in C++. The file includes different types of variables: Boolean, doubles and integers as well as some comments which I want to skip when reading.
My file looks something like that:
150 // this is an integer
4e-1 // this is a double
1.05 // another double
0 // this is a logical (Boolean) variable: 0 is false and 1 is true
A simple version of the code that I use is
int N;
double var_1, var_2;
bool initial;
ifstream read_parameters;
read_parameters.open("parameters.dat");
read_parameters >> N >> var_1 >> var_2 >> initial;
read_parameters.close();
The comments seem to ruin everything and even without them there seem to be some problems with reading the logical variables correctly. The file that I try to read is made by me, so I can substitute the '//' above with something else if necessary. Does anyone have any advice?
Thanks in advance!
Simple, cheesy way:
Read a token then read_parameters.ignore(numeric_limits<streamsize>::max(), '\n') to discard the rest of the line. eg:
read_parameters >> N;
read_parameters.ignore(numeric_limits<streamsize>::max(), '\n');
read_parameters >> var_1;
read_parameters.ignore(numeric_limits<streamsize>::max(), '\n');
...
This doesn't care if a comment exists or not, but requires modification if you have two or more tokens on a line.
Oh, and remember to test the state of the stream after reading. Plugging in "fubar" for one of the doubles will currently ruin things. read_parameters will be in an error state that needs to be cleared before you can read it again.
if (!(read_parameters >> N)) // parentheses matter: ! binds tighter than >>
{
std::cerr << "bad input for parameter N" << std::endl;
read_parameters.clear();
}
read_parameters.ignore(numeric_limits<streamsize>::max(), '\n');
Is better, but you probably want to handle an error with something better than a printline.
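To avoid repeating the read/check/ignore sequence for every parameter, it could be wrapped in a small helper. This is a sketch only: read_param is a made-up name, and it assumes one value per line with an optional trailing comment.

```cpp
#include <cassert>
#include <istream>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: extracts one value, then discards the rest of the line
// (for example a trailing "// comment"). Returns false if the extraction failed.
template <typename T>
bool read_param(std::istream& in, T& value)
{
    const bool ok = static_cast<bool>(in >> value);
    if (!ok) in.clear();  // leave the stream usable for the next parameter
    in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    return ok;
}
```

The caller decides what a failure means, e.g. if (!read_param(read_parameters, N)) followed by an error message and an early return.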
Read the input into a dynamic character array, then loop over it. If a character's ASCII code falls in the alphabetic ranges (65 onward for uppercase, 97 onward for lowercase), treat it as a letter. If it is a digit such as 1, 2, or 3, copy it into a separate array and keep a running total of the digits with count++.

Storing data from a text file in an array of structures C++

I am attempting to read data from a text file into an array of structures. The first iteration of the for-loop reads and displays everything correctly until it reaches the Boolean value, and every iteration after that does not display as expected. Is the bool value at the end causing the entire rest of the file to be read incorrectly? Or perhaps an issue stemming from getline?
int main()
{
groceryProduct inventoryDatabase[25];
ifstream fin("inventory.txt");
if (!fin)
{
cout << "File could not be located.";
}
string itemName;
for (int index = 0; index < 25; index++)
{
getline(fin, inventoryDatabase[index].itemName, '\n');
fin >> inventoryDatabase[index].itemNumber;
fin >> inventoryDatabase[index].itemPrice;
fin >> inventoryDatabase[index].membershipPrice;
fin >> inventoryDatabase[index].payByWeight;
cout << inventoryDatabase[index].itemName << endl;
cout << inventoryDatabase[index].itemNumber << endl;
cout << inventoryDatabase[index].itemPrice << endl;
cout << inventoryDatabase[index].membershipPrice << endl;
cout << inventoryDatabase[index].payByWeight << endl;
}
return 0;
};
The structure:
struct groceryProduct
{
double itemPrice;
double membershipPrice;
double itemWeight;
int itemQuantity;
string itemNumber;
string itemName;
bool payByWeight;
};
The output:
Apple
P0000
0.85
0.8
204 (expected output of 'false' instead of 204)
Output for every iteration of loop after first iteration:
-9.25596e+61
-9.25596e+61
204
Thank you, and please let me know if you require any more information.
File:
Apple
P0000
.85
.80
false
Orange
P0001
.95
.85
false
Lemon
P0002
.65
.60
false
You need to tell your stream that the bool values are text with
fin >> boolalpha >> inventoryDatabase[index].payByWeight
You're seeing garbage data after the first bool input because failbit gets set in the stream and no further inputs will work until it is reset. This results in your array's data staying uninitialized.
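A sketch of reading one record with boolalpha is below. The Item struct and read_item are stand-ins for the question's types, not the original code. It also uses std::ws before getline, on the assumption that the newline left after the previous record's bool would otherwise make getline return an empty name from the second record onward.

```cpp
#include <cassert>
#include <ios>
#include <istream>
#include <sstream>
#include <string>

struct Item {  // hypothetical, trimmed-down version of groceryProduct
    std::string name;
    std::string number;
    double price;
    double member_price;
    bool by_weight;
};

// Reads one five-line record. Returns false at end of input or on bad data.
bool read_item(std::istream& in, Item& item)
{
    // std::ws skips the newline left behind by the previous record's last >>.
    if (!std::getline(in >> std::ws, item.name)) return false;
    return static_cast<bool>(in >> item.number >> item.price >> item.member_price
                                >> std::boolalpha >> item.by_weight);
}
```

Looping with while (read_item(fin, item)) instead of a fixed 25 iterations also stops cleanly when the file holds fewer records than the array.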
Here are a couple of things I see that may be causing your problem.
1) An array is not "magically filled" with data. You have an uninitialized array, meaning that the data inside of it does not yet exist. At all.
What you have to do to remedy this is to add a new instance of the struct to the array at the start of each loop iteration.
How'd I spot that? Good rule of thumb: if it's weird, it's memory-related. Make sure you've initialized everything.
2) I've seen weird things happen when you use getline and >> next to each other. Is there a particular reason you are using getline over >>?
(I would need to re-research how to work around that weirdness. I used to hit it a lot in my C++ class way back when.)
3) What 1201ProgramAlarm said is absolutely correct.
Side note: Do NOT get into the habit of throwing double around because "I want to be able to arbitrarily throw a large value in there." It's a bad habit that wastes space, as double is twice as large as float.
Learn the difference between float and double - you will almost never need double outside of scientific situations, because it is for numbers with a LOT of decimal places. (That's oversimplifying it.) If you're using double over float all the time, you're using twice the memory you need - 32 bits per variable extra, in fact. It adds up. (And people wonder why modern programs need 8GB of RAM to do the same thing as their 100MB-RAM-using predecessors...)
Prices always have two (rarely three) decimal places, so float should fit that perfectly in all cases. Same with weights.

Simple C++ not reading EOF

I'm having a hard time understanding why while (cin.get(Ch)) doesn't see the EOF. I read in a text file with 3 words, and when I debug my WordCount is at 3 (just what I hoped for). Then it goes back to the while loop and gets stuck. Ch then has no value. I thought that after the newline it would read the EOF and break out. I am not allowed to use <fstream>, I have to use redirection in DOS. Thank you so much.
#include <iostream>
using namespace std;
int main()
{
char Ch = ' ';
int WordCount = 0;
int LetterCount = 0;
cout << "(Reading file...)" << endl;
while (cin.get(Ch))
{
if ((Ch == '\n') || (Ch == ' '))
{
++WordCount;
LetterCount = 0;
}
else
++LetterCount;
}
cout << "Number of words => " << WordCount << endl;
return 0;
}
while (cin >> noskipws >> Ch) // noskipws: without it, >> skips the spaces and newlines being counted
{ // we get in here if, and only if, the >> was successful
if ((Ch == '\n') || (Ch == ' '))
{
++WordCount;
LetterCount = 0;
}
else
++LetterCount;
}
That's the safe and common way to rewrite your code with minimal changes.
(Your code is unusual, trying to scan all characters and count whitespace and newlines. I'll give a more general answer to a slightly different question - how to read in all the words.)
The safest way to check if a stream is finished is if(stream). Beware of if(stream.good()) - it doesn't always work as expected and will sometimes quit too early. The last >> into a char will not take us to EOF, but the last >> into an int or string will take us to EOF. This inconsistency can be confusing. Therefore, it is not correct to use good(), or any other test that tests EOF.
string word;
while(cin >> word) {
++word_count;
}
There is an important difference between if(cin) and if(cin.good()). The former is the operator bool conversion. Usually, in this context, you want to test:
"did the last extraction operation succeed or fail?"
This is not the same as:
"are we now at EOF?"
After the last word has been read by cin >> word, the stream is at EOF. But the extraction succeeded, and word contains the last word.
TLDR: The eof bit is not what matters. The fail bit (or bad bit) is: it tells us that the last extraction was a failure.
The Counting
The program counts newline and space characters as words. In your file contents "this is fun!" I see two spaces and no newline. This is consistent with the observed output indicating two words.
Have you tried looking at your file with a hex editor or something similar to be sure of the exact contents?
You could also change your program to count one more word if the last character read in the loop was a letter. This way you don't have to have newline terminated input files.
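One way to make the count independent of a trailing newline is to count the places where a word starts rather than the separators. A minimal sketch (count_words is a made-up name, and "word" here means any run of non-whitespace characters):

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>

// Counts runs of non-whitespace characters, so the result is the same
// whether or not the input ends with a newline.
int count_words(std::istream& in)
{
    int words = 0;
    bool in_word = false;
    char ch;
    while (in.get(ch)) {
        const bool space = std::isspace(static_cast<unsigned char>(ch)) != 0;
        if (!space && !in_word) ++words;  // a new word starts here
        in_word = !space;
    }
    return words;
}
```

With redirection, count_words(std::cin) reads the redirected file. Consecutive spaces are not counted as extra words either.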
Loop Termination
I have no explanation for your loop termination issues. The while-condition looks fine to me. istream::get(char&) returns a stream reference. In a while-condition, depending on the C++ level your compiler implements, operator bool or operator void* will be applied to the reference to indicate if further reading is possible.
Idiom
The standard idiom for reading from a stream is
char c = 0;
while( cin >> c )
process(c);
I do not deviate from it without serious reason.
Your input file is
this is fun!{EOF}
The two spaces make WordCount increase to 2, and then EOF ends the loop. If you add a newline, your input file is
this is fun!\n{EOF}
I took your program, loaded it into Visual Studio 2013, and changed cin to an fstream object that opened a file called stuff.txt containing the exact characters "This is fun!\n\r", and the program worked. As previous answers have indicated, be careful: if there is no \n at the end of the text, the program will miss the last word. However, I wasn't able to replicate the application hanging in an infinite loop. The code as written looks correct to me.
cin.get(char) returns a reference to an istream object, which then has its operator bool() called; this returns false when any of the error bits are set. There are some better ways to write this code to deal with other error conditions... but this code works for me.
In your case, the correct way to bail out of the loop is:
while (cin.good()) {
char Ch = cin.get();
if (cin.good()) {
// do something with Ch
}
}
That said, there are probably better ways to do what you're trying to do.

Need to write specific lines of a text into a new text

I have text files of numerical data ranging from 1 MB to 150 MB in size. I need to write out only the lines related to heights. For example, with heights=4 the new text file must contain lines 1, 5, 9, 13, 17, 21, and so on in sequence.
I have been trying to find a way to do this for a while now. I tried using a list instead of a vector, which ended up with compilation errors.
I have cleaned up the code as advised. It now writes all the lines to the sample2 text file. Thank you all.
I am open to changing the method as long as it delivers what I need. Thank you for your time and help.
following is what i have so far:
#include <iostream>
#include <fstream>
#include <string>
#include <list>
#include <vector>
using namespace std;
int h,n,m;
int c=1;
int main () {
cout<< "Enter Number Of Heights: ";
cin>>h;
ifstream myfile_in ("C:\\sample.txt");
ofstream myfile_out ("C:\\sample2.txt");
string line;
std::string str;
vector <string> v;
if (myfile_in.is_open()) {
myfile_in >> noskipws;
int i=0;
int j=0;
while (std::getline(myfile_in, line)) {
v.push_back( line );
++n;
if (n-1==i) {
myfile_out<<v[i]<<endl;
i=i+h;
++j;
}
}
cout<<"Number of lines in text file: "<<n<<endl;
}
else cout << "Unable to open file(s) ";
cout<< "Reaching here, Writing one line"<<endl;
system("PAUSE");
return 0;
}
You need to use seekg to set the position back to the beginning of the file once you have read it. You read it once to count the lines, which I don't think you actually need, as this count is never used, at least in this piece of code.
And what is the point of the inner while? On each loop, you have
int i=1;
myfile_out<<v[i]; //Not writing to text
i=i+h;
So on each loop, i gets 1, so you output the element with index 1 all the time. Which is not the first element, as indices start from 0. So, once you put seekg or remove the first while, your program will start to crash.
So, make i start from 0. And get it out of the two while loops, right at the beginning of the if-statement.
Ah, the second while is also unnecessary. Leave just the first one.
EDIT:
Add
myfile_in.clear();
before seekg to clear the flags.
Also, your algorithm is wrong. You'll get seg fault, if h > 1, because you'll get out of range (of the vector). I'd advise to do it like this: read the file in the while, that counts the lines. And store each line in the vector. This way you'll be able to remove the second reading, seekg, clear, etc. Also, as you already store the content of the file into a vector, you'll NOT lose anything. Then just use for loop with step h.
Again edit, regarding your edit: no, it has nothing to do with any flags. The if, where you compare i==j is outside the while. Add it inside. Also, increment j outside the if. Or just remove j and use n-1 instead. Like
if ( n-1 == i )
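If the lines do not have to be kept in memory, the same selection can be done with a single counter. This is a sketch rather than a drop-in fix: copy_every_hth_line is a made-up name, and it uses a modulo test in place of the i/j bookkeeping discussed above.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Copies lines 1, 1+h, 1+2h, ... from `in` to `out`.
// Returns the number of lines written.
std::size_t copy_every_hth_line(std::istream& in, std::ostream& out, std::size_t h)
{
    assert(h > 0);  // h == 0 would divide by zero below
    std::string line;
    std::size_t index = 0;  // 0-based index of the current line
    std::size_t written = 0;
    while (std::getline(in, line)) {
        if (index % h == 0) {
            out << line << '\n';
            ++written;
        }
        ++index;
    }
    return written;
}
```

Called as copy_every_hth_line(myfile_in, myfile_out, h), this needs no vector, no seekg and no clear, and memory use stays constant even for a 150 MB input.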
Several things.
First you read the file completely, just to count the number of lines,
then you read it a second time to process it, building up an in memory
image in v. Why not just read it in the first time, and do everything
else on the in memory image? (v.size() will then give you the number
of lines, so you don't have to count them.)
And you never actually use the count anyway.
Second, once you've reached the end of file the first time, the
failbit is set; all further operations are no-ops, until it is reset.
If you have to read the file twice (say because you do away with v
completely), then you have to do myfile_in.clear() after the first
loop, but before seeking to the beginning.
You only test for is_open after having read the file once. This test
should be immediately after the open.
You also set noskipws, although you don't do any formatted input
which would be affected by it.
The final while is highly suspect. Because you haven't done the
clear, you probably never enter the loop, but if you did, you'd very
quickly start accessing out of bounds: after reading n lines, the size
of v will be n, but you read it with index i, which will be n * h.
Finally, you should explicitly close the output file and check for
errors after the close, just in case.
It's not clear to me what you're trying to do. If all you want to do is
insert h empty lines between each existing line, something like:
std::string separ( h + 1, '\n' );
std::string line;
while ( std::getline( myfile_in, line ) ) {
myfile_out << line << separ;
}
should do the trick. No need to store the complete input in memory.
(For that matter, you don't even have to write a program for this.
Something as simple as sed 's:$:\n\n\n\n:' < infile > outfile would do
the trick.)
EDIT:
Reading other responses, I gather that I may have misunderstood the
problem, and that he only wants to output every h-th line. If this is
the case:
std::string line;
while ( std::getline( myfile_in, line ) ) {
myfile_out << line << '\n';
for ( int count = h - 1; count > 0; -- count ) { // skip the next h - 1 lines
std::getline( myfile_in, line );
// or myfile_in.ignore( INT_MAX, '\n' );
}
}
But again, other tools seem more appropriate. (I'd follow thiton's
suggestion and use AWK.) Why write a program in a language you don't
know well when tools are already available to do the job.
If there is no absolutely compelling reason to do this in C++, you are using the wrong programming language for this. In awk, your whole program is:
{ if ( FNR % 4 == 1 ) print; }
Or, giving the whole command line e.g. in sh to filter lines 1,5,9,13,...:
awk '{ if ( FNR % 4 == 1 ) print; }' a.txt > b.txt

How to getline() from specific line in a file? C++

I've looked around a bit and have found no definitive answer on how to read a specific line of text from a file in C++. I have a text file with over 100,000 English words, each on its own line. I can't use arrays because they obviously won't hold that much data, and vectors take too long to store every word. How can I achieve this?
P.S. I found no duplicates of this question regarding C++
while (getline(words_file, word))
{
my_vect.push_back(word);
}
EDIT:
A commenter below has helped me to realize that the only reason loading a file to a vector was taking so long was because I was debugging. Plainly running the .exe loads the file nearly instantaneously. Thanks for everyone's help.
If your words have no white-space (I assume they don't), you can use a more tricky non-getline solution using a deque!
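#include <algorithm>
#include <deque>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
using namespace std;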
int main() {
deque<string> dictionary;
cout << "Loading file..." << endl;
ifstream myfile ("dict.txt");
if ( myfile.is_open() ) {
copy(istream_iterator<string>(myfile),
istream_iterator<string>(),
back_inserter<deque<string>>(dictionary));
myfile.close();
} else {
cout << "Unable to open file." << endl;
}
return 0;
}
The above reads whitespace-delimited tokens directly from the stream, using the std::istream default of splitting on any whitespace (this is a big assumption on my part), which makes it slightly faster. This gets done in about 2-3 seconds with 100,000 words. I'm also using a deque, which is the best data structure (imo) for this particular scenario. When I use vectors, it takes around 20 seconds (not even close to your minute mark -- you must be doing something else that increases complexity).
To access the word at line 1:
cout << dictionary[0] << endl;
Hope this has been useful.
You have a few options, but none will automatically let you go to a specific line. File systems don't track line numbers within files.
One way is to have fixed-width lines in the file. Then read the appropriate amount of data based upon the line number you want and the number of bytes per line.
Another way is to loop, reading lines one at a time until you get to the line that you want.
A third way would be to have a sort of index that you create at the beginning of the file to reference the location of each line. This, of course, would require that you control the file format.
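A variant of the index idea that does not require changing the file format is to build the index in memory with one pass over the file. The sketch below uses made-up names (build_line_index, read_line_at). It assumes the stream is seekable, and for real files it is probably safest to open them in binary mode so that the recorded offsets are reliable.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Scans the stream once and records the offset at which each line starts.
std::vector<std::streampos> build_line_index(std::istream& in)
{
    std::vector<std::streampos> offsets;
    std::string line;
    for (;;) {
        const std::streampos pos = in.tellg();
        if (!std::getline(in, line)) break;
        offsets.push_back(pos);
    }
    in.clear();  // the scan ended in a failed read; reset so seekg works again
    return offsets;
}

// Fetches line n (0-based) by seeking straight to its recorded offset.
bool read_line_at(std::istream& in, const std::vector<std::streampos>& index,
                  std::size_t n, std::string& out)
{
    if (n >= index.size()) return false;
    in.clear();
    in.seekg(index[n]);
    return static_cast<bool>(std::getline(in, out));
}
```

Building the index still costs one full read, so this only pays off when many lines are looked up from the same file.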
I already mentioned this in a comment, but I wanted to give it a bit more visibility for anyone else who runs into this issue...
I think that the following code will take a long time to read from the file because std::vector probably has to re-allocate its internal memory several times to account for all of these elements that you are adding. This is an implementation detail, but if I understand correctly std::vector usually starts out small and increases its memory as necessary to accommodate new elements. This works fine when you're adding a handful of elements at a time, but is really inefficient when you're adding a thousand elements at once.
while (getline(words_file, word)) {
my_vect.push_back(word); }
So, before running the loop above, try calling my_vect.reserve(100000). This makes std::vector allocate enough memory in advance so that it doesn't need to shuffle things around later. (Constructing the vector as my_vect(100000) would instead create 100,000 empty strings, and push_back would then append after them.)
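A minimal sketch of pre-allocating with reserve, which sets aside capacity without creating any elements. load_words and the expected parameter are made-up names, and the expected count is only a hint, so a wrong guess costs some memory or a few reallocations but not correctness.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Loads one word per line, reserving room for `expected` entries up front.
std::vector<std::string> load_words(std::istream& in, std::size_t expected)
{
    std::vector<std::string> words;
    words.reserve(expected);  // capacity only; size() is still 0
    std::string word;
    while (std::getline(in, word)) {
        words.push_back(word);
    }
    return words;
}
```

Whether this makes a measurable difference depends on the build. As the question's edit notes, running outside the debugger mattered far more in that case.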
The question is exceedingly unclear. How do you determine the specific
line? If it is the nth line, simplest solution is just to call
getline n times, throwing out all but the last results; calling
ignore n-1 times might be slightly faster, but I suspect that if
you're always reading into the same string (rather than constructing a
new one each time), the difference in time won't be enormous. If you
have some other criteria, and the file is really big (which from your
description it isn't) and sorted, you might try using a binary search,
seeking to the middle of the file, reading enough ahead to find the
start of the next line, then deciding the next step according to its
value. (I've used this to find relevant entries in log files. But
we're talking about files which are several Gigabytes in size.)
If you're willing to use system dependent code, it might be advantageous
to memory map the file, then search for the nth occurrence of a '\n'
(std::find n times).
ADDED: Just some quick benchmarks. On my Linux box, getting the
100000th word from /usr/share/dict/words (479623 words, one per line,
on my machine), takes about
272 milliseconds, reading all words into an std::vector, then indexing,
256 milliseconds doing the same, but with std::deque,
30 milliseconds using getline, but just ignoring the results until the one I'm interested in,
20 milliseconds using istream::ignore, and
6 milliseconds using mmap and looping on std::find.
FWIW, the code in each case is:
For the std:: containers:
template<typename Container>
void Using<Container>::operator()()
{
std::ifstream input( m_filename.c_str() );
if ( !input )
Gabi::ProgramManagement::fatal() << "Could not open " << m_filename;
Container().swap( m_words );
std::copy( std::istream_iterator<Line>( input ),
std::istream_iterator<Line>(),
std::back_inserter( m_words ) );
if ( static_cast<int>( m_words.size() ) < m_target )
Gabi::ProgramManagement::fatal()
<< "Not enough words, had " << m_words.size()
<< ", wanted at least " << m_target;
m_result = m_words[ m_target ];
}
For getline without saving:
void UsingReadAndIgnore::operator()()
{
std::ifstream input( m_filename.c_str() );
if ( !input )
Gabi::ProgramManagement::fatal() << "Could not open " << m_filename;
std::string dummy;
for ( int count = m_target; count > 0; -- count )
std::getline( input, dummy );
std::getline( input, m_result );
}
For ignore:
void UsingIgnore::operator()()
{
std::ifstream input( m_filename.c_str() );
if ( !input )
Gabi::ProgramManagement::fatal() << "Could not open " << m_filename;
for ( int count = m_target; count > 0; -- count )
input.ignore( INT_MAX, '\n' );
std::getline( input, m_result );
}
And for mmap:
void UsingMMap::operator()()
{
int input = ::open( m_filename.c_str(), O_RDONLY );
if ( input < 0 )
Gabi::ProgramManagement::fatal() << "Could not open " << m_filename;
struct ::stat infos;
if ( ::fstat( input, &infos ) != 0 )
Gabi::ProgramManagement::fatal() << "Could not stat " << m_filename;
char* base = (char*)::mmap( NULL, infos.st_size, PROT_READ, MAP_PRIVATE, input, 0 );
if ( base == MAP_FAILED )
Gabi::ProgramManagement::fatal() << "Could not mmap " << m_filename;
char const* end = base + infos.st_size;
char const* curr = base;
char const* next = std::find( curr, end, '\n' );
for ( int count = m_target; count > 0 && curr != end; -- count ) {
curr = next + 1;
next = std::find( curr, end, '\n' );
}
m_result = std::string( curr, next );
::munmap( base, infos.st_size );
}
In each case, the code is run
You could seek to a specific position, but that requires that you know where the line starts. "A little less than a minute" for 100,000 words does sound slow to me.
Read some data, count the newlines, throw away that data and read some more, count the newlines again... and repeat until you've read enough newlines to hit your target.
Also, as others have suggested, this is not a particularly efficient way of accessing data. You'd be well-served by making an index.