I have this code....
file_parser::file_parser(string file){
txtfile.open(file.c_str(), ios::in);
if (!txtfile.good()){
string error=err(0," "+file+" not found. Exit code ",1);
throw file_parse_exception(error);
}
while (!txtfile.eof()){
char str[200];
txtfile.getline(str, 200);
string str2=str;
vfile.push_back(str2);
}
txtfile.close();
}
and the problem is that if the input file contains a line longer than 200 characters, the program hangs and then crashes. I checked the value of str at the crash: it starts with a null character, so the code tries to push an empty (null) string onto the vector, which causes the hang/crash. Does anyone know a way around this? I thought getline would truncate the char array at 199 characters (plus the null), but apparently that isn't happening. I'm stumped. I want each push_back to hold at most 200 characters. I really don't want the WHOLE line, which is what reading into a 'string str' would give me. If a line is over 200 characters, it should read the first 200 and then move on to the next line.
Replace your input loop with this:
std::string str;
while (std::getline(txtfile, str)){
vfile.push_back(str);
}
Using ios::eof() as a loop condition almost always creates a buggy program, as it did here. In this case, using eof() has two problems. First, eofbit is only set after a read hits the end of the file, not before, but you are checking it before the read. Second, eof() does not report any other kind of error. When an input line has more than 199 characters, istream::getline(str, 200) sets failbit, but not eofbit. Every later read then fails immediately without consuming any input, so eof() never becomes true and the loop never ends.
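To see the second problem in isolation, here is a minimal sketch that feeds istream::getline a line longer than its buffer. The helper name read_into_small_buffer, the ReadState struct, and the 8-byte buffer are made up for illustration; they are not part of the original code.

```cpp
#include <sstream>
#include <string>

// Hypothetical record of the stream state after one read.
struct ReadState {
    bool fail;        // was failbit (or badbit) set?
    bool eof;         // was eofbit set?
    std::string got;  // what ended up in the buffer
};

// Hypothetical helper: reads the first line of `text` into a buffer
// that may be too small for it and reports what happened.
ReadState read_into_small_buffer(const std::string& text) {
    std::istringstream in(text);
    char buf[8];                  // room for 7 characters plus '\0'
    in.getline(buf, sizeof buf);  // stops after 7 characters if the line is longer
    ReadState state;
    state.fail = in.fail();
    state.eof = in.eof();
    state.got = buf;
    return state;
}
```

For the input "0123456789\nnext\n" the buffer holds "0123456", fail is true and eof is false, which is exactly the state in which a while (!txtfile.eof()) loop can never terminate.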
EDIT: With the added requirement of limiting the input lines to 200 characters, this should work:
// untested
std::string str;
while(std::getline(txtfile, str)) {
if(str.size() > 200)
str.erase(200, std::string::npos);
vfile.push_back(str);
}
The first line of my file must be exactly a two digit number which I read into firstline[2]. I use sscanf to read the data from that buffer and store it into an int to represent the number of lines (not including the first one) in the file. If there is a third character I must exit with an error code.
I've tried introducing a new char buffer thirdchar[1] and comparing it to a newline (10 or '\n'). If thirdchar does not equal newline then it should exit with an error code. Later in my program I use sscanf to read firstline and store that number into an int called numberoflines. When I introduce thirdchar, two extra zeros get appended to numberoflines after what was in firstline.
//If the first line was "20"
int numberoflines;
char firstline[2];
file.get(firstline[0]);//should be '2'
file.get(firstline[1]);//should be '0'
char thirdchar[1];
file.get(thirdchar[0]);//should be '\n'
if (thirdchar !=10){exit();}//10 is the value gdb spits out to represent '\n'
sscanf(firstline, "%d", &numberoflines);//numberoflines should be 20
I debugged this, and firstline and thirdchar have the expected values, but numberoflines becomes 2000! I've removed the code related to thirdchar and it works fine, but then it doesn't meet the requirement of being a two-digit number. Am I misunderstanding what sscanf does? Is there a better way to implement this? Thanks.
---------------UPDATE------------------
So I've updated my code to use std::string and std::getline:
std::string firstline;
std::getline(file, firstline);
And I get the following error when trying to print the value of firstline
$1 = Python Exception <class 'gdb.error'> There is no member named _M_dataplus.:
sscanf requires the input string to be null-terminated. You are not passing it a null-terminated string so it's not behaving as expected.
As suggested, you would be better placed reading in the string using std::getline and converting the std::string into an integer.
Further reading here if using C++11 onwards, or here otherwise.
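A minimal sketch of that suggestion, assuming the requirement is "the first line is exactly two decimal digits". The function name read_two_digit_count is hypothetical, and the digits are converted by hand here; std::stoi would do the same job from C++11 onwards.

```cpp
#include <cctype>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying this without a file
#include <string>

// Hypothetical helper: returns true and stores the count only when the
// first line of `in` consists of exactly two decimal digits.
bool read_two_digit_count(std::istream& in, int& count) {
    std::string first;
    if (!std::getline(in, first))
        return false;                 // empty input or read error
    if (first.size() != 2)
        return false;                 // a third character (or a missing one) is an error
    if (!std::isdigit(static_cast<unsigned char>(first[0])) ||
        !std::isdigit(static_cast<unsigned char>(first[1])))
        return false;                 // not a number
    count = (first[0] - '0') * 10 + (first[1] - '0');
    return true;
}
```

Because std::getline removes the newline and std::string tracks its own length, no null terminator or separate third-character buffer is needed. Note that a file with Windows line endings would leave a '\r' as a third character, which this sketch treats as an error.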
I'm attempting to write a lexer and parser but I'm having trouble getting the final variable in a text file due to in_file.tellg() equaling -1. My program only works if I add a space character after the variable, otherwise I get a compiler error. I want to mention that I'm able to get every other variable in the text file but the last one. I believe the cause of the problem is in_file.peek()!=EOF setting in_file.tellg() to -1.
My program is something like this:
ifstream in(file_name);
char c;
in >> noskipws;
while(in >> c ){
if(is_letter_part_of_variable(c)) {
int start_pos = in.tellg(),
end_pos,
length;
while(is_letter_part_of_variable(c) && in.peek()!=EOF ) {
in>>c;
}
end_pos = in.tellg(); // This becomes -1 for some reason
length = end_pos - start_pos; // Should be 7
// Reset file pointer to original position to chomp word.
in.clear();
in.seekg(start_pos-1, in.beg);
// The word 'message' should go in here.
char *identifier = new char[length];
in.read(identifier, length);
identifier[length] = '\0';
}
}
example.text
message = "Hello, World"
print message
I tried removing peek()!= EOF which gives me an eternal loop. I tried !in_file.eof() and that also makes tellg() equal to -1. What can I do to fix/enhance this code?
I believe the cause of the problem is in_file.peek()!=EOF setting in_file.tellg() to -1.
Close. peek attempts to read a character and returns EOF if it hits the end of the stream. Hitting the end of the stream sets the stream's eofbit. With the stream no longer in a good state, tellg fails and returns -1.
Simple Solution
Clear the stream's error state with in.clear() before calling tellg.
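The simple solution can be sketched like this, using a std::istringstream in place of the file so it runs on its own. The function name position_after_peek_at_end is hypothetical.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: consumes every character of `text`, peeks past
// the end, then recovers the read position by clearing the state first.
long position_after_peek_at_end(const std::string& text) {
    std::istringstream in(text);
    char c;
    while (in.peek() != std::char_traits<char>::eof())
        in.get(c);
    // The final peek() hit the end of the stream, so the stream is no
    // longer good() and tellg() may report -1 at this point.
    in.clear();                             // reset the error flags
    return static_cast<long>(in.tellg());   // now reports the real position
}
```

In the question's code, the existing in.clear() call would simply move up so that it runs before end_pos = in.tellg().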
Better solution
Use std::string.
std::string identifier;
while(in>>c && is_letter_part_of_variable(c)) {
identifier += c;
}
All of the messing around with peek, seekg, tellg and the dreaded new vanish.
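For reference, a self-contained sketch of the std::string approach that collects every identifier in a stream. The definition of is_letter_part_of_variable was not shown in the question, so the one below (letters and underscore) is a stand-in, and collect_identifiers is a hypothetical name.

```cpp
#include <cctype>
#include <istream>
#include <sstream>  // std::istringstream, for trying this without a file
#include <string>
#include <vector>

// Stand-in for the asker's predicate; the real definition is unknown.
bool is_letter_part_of_variable(char c) {
    return std::isalpha(static_cast<unsigned char>(c)) || c == '_';
}

// Hypothetical helper: returns every run of identifier characters.
std::vector<std::string> collect_identifiers(std::istream& in) {
    std::vector<std::string> found;
    std::string identifier;
    char c;
    while (in.get(c)) {                    // get() does not skip whitespace
        if (is_letter_part_of_variable(c)) {
            identifier += c;
        } else if (!identifier.empty()) {
            found.push_back(identifier);   // a delimiter ended the word
            identifier.clear();
        }
    }
    if (!identifier.empty())
        found.push_back(identifier);       // word that ends exactly at end of file
    return found;
}
```

The check after the loop is what handles a final variable with no trailing space. This sketch makes no attempt to treat quoted strings specially, so the words inside "Hello, World" are reported as identifiers too; a real lexer would handle the quotes first.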
I was looking at this post and a few others. What happens if ignore() is called when the input buffer is already empty? In the code below I observed that if ignore() is called when the buffer is already empty, it does not return; it waits for some character to be entered first.
int main(void)
{
char myStr[50];
cin.ignore (std::numeric_limits<std::streamsize>::max(),'\n');
cout<<"Enter the String\n";
cin>>myStr;
// After reading remove unwanted characters from the buffer
// so that next read is not affected
cin.ignore (std::numeric_limits<std::streamsize>::max(),'\n');
}
Calling cin.clear() after ignore() seems to create further problems if the buffer is already empty. I guess clearing the buffer after cin >> is safe. But what if I do not know the state of the input buffer and I clear it even when it is already empty? Do I first have to check whether the input buffer is empty, using cin.fail() or something similar?
Secondly, cin >> itself may not be safe, since it stops at spaces. So getline() is suggested by some SO posts, as given here. But does getline() also require clearing the input buffer, or is it always safe? Does the code below work without any trouble? (It works now, but I'm not sure whether it is safe code.)
void getString(string& str)
{
do
{
cout<<"Enter the String: ";
getline(std::cin,str);
} while (str.empty());
}
Breaking down main:
int main(void)
{
char myStr[50];
cin.ignore (std::numeric_limits<std::streamsize>::max(),'\n');
A bad idea, but you noticed that already. There must be a newline in the stream or you sit and wait for one. If the user's not expecting this behaviour you can expect to wait a long time and have a frustrated user. That's a bad scene.
cout<<"Enter the String\n";
cin>>myStr;
Also a bad idea, but for a different reason. >> doesn't know it should stop at 49 characters to prevent overflowing myStr. Bad things happen at that 50th character.
// After reading remove unwanted characters from the buffer
// so that next read is not affected
cin.ignore (std::numeric_limits<std::streamsize>::max(),'\n');
This one is safe. >> won't consume the newline, or any other whitespace and in order for the stream to hand over the data from the console someone must have hit enter and provided a newline.
}
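The overflow risk noted for cin>>myStr in the breakdown above can be removed while keeping the char array by giving >> a field width. This is a minimal sketch; read_bounded_word is a hypothetical name, and reading into a std::string instead would avoid the limit altogether.

```cpp
#include <iomanip>
#include <istream>
#include <sstream>  // std::istringstream, for trying this without a console
#include <string>

// Hypothetical helper: reads one whitespace-delimited word into a
// 50-byte array without overflowing it, and returns what was stored.
std::string read_bounded_word(std::istream& in) {
    char myStr[50];
    // With a width of 50, >> stores at most 49 characters plus '\0'.
    if (!(in >> std::setw(static_cast<int>(sizeof myStr)) >> myStr))
        return std::string();   // nothing could be read
    return std::string(myStr);
}
```

Any characters beyond the 49th stay in the stream, so a following ignore up to '\n' is still what discards the rest of the line.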
A general rule of thumb is to not ignore unless you have reason to ignore, and if you have reason, ignore right away. Do not wait until just before the next stream operation to ignore, because what if that operation is the first? Or what if the previous operation did not leave anything to ignore? Call ignore after the operation that left what you want ignored in the stream. So
std::string getfirstword()
{
std::string firstword;
if (std::cin >> firstword)
{
std::cin.ignore (std::numeric_limits<std::streamsize>::max(),'\n');
return firstword;
}
return ""; // or perhaps
// throw std::runtime_error("There is no first word.");
// is more appropriate. Your call.
}
is good, but
std::string getfirstword()
{
std::cin.ignore (std::numeric_limits<std::streamsize>::max(),'\n');
std::string firstword;
if (std::cin >> firstword)
{
return firstword;
}
return "";
}
is an offence in the eyes of all that is holy. Don't do it.
As for getline, it gets a line. All of it up to the end of the file or the end of the line, whichever comes first. It also eats the end of the line for you so you don't have to worry about a stray newline harshing your mellow later.
If you only want part of the line, you will have to break it down. Typical usage for this is something along the lines of
std::string line;
if (std::getline(std::cin,line))
{
std::istringstream istr(line);
std::string firstword;
if (istr >> firstword)
{
// do something with firstword
}
else
{
// there is no firstword. Do something else.
}
}
getline reads everything up to and including the newline. It's no longer in the stream, so I'd consider this safe. You don't have to worry about garbage hanging around on the end of the line. You may have to worry about the next line, though.
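One caveat for the getString loop from the question: if the input ends (for example, the user sends end-of-file), getline fails, str stays empty, and the do/while loop spins forever. A minimal sketch of a variant that checks the result of getline follows; the name get_string and the stream parameters are assumptions added so the function can be tried without a console.

```cpp
#include <istream>
#include <ostream>
#include <sstream>  // string streams, for trying this without a console
#include <string>

// Hypothetical variant of getString: returns false when the input ends
// before a non-empty line has been read.
bool get_string(std::istream& in, std::ostream& out, std::string& str) {
    do {
        out << "Enter the String: ";
        if (!std::getline(in, str))
            return false;      // end of input or error: stop prompting
    } while (str.empty());     // blank line: ask again
    return true;
}
```

Called as get_string(std::cin, std::cout, str), it behaves like the original for normal input and lets the caller decide what to do when no more input is available.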
This is how I get the name of the file from the command line, open the file, and save its content line by line to a string. Everything works fine except for three empty spaces at the beginning of the file. Can anyone say why these empty spaces occur and how I can ignore them?
string filename = "input.txt";
char *a=new char[filename.size()+1];
a[filename.size()]=0;
memcpy(a,filename.c_str(),filename.size());
ifstream fin(a);
if(!fin.good()){
cout<<" = File does not exist ->> No File for reading\n";
exit(1);
}
string s;
while(!fin.eof()){
string tmp;
getline(fin,tmp);
s.append(tmp);
if(s[s.size()-1] == '.')
{
//Do nothing
}
else
{
s.append(" ");
}
cout<<s<<endl;
The most probable cause is that your file is encoded in something other than ASCII. It contains a few unprintable bytes, and the string you see on the screen is the result of your terminal interpreting those bytes. To confirm this, print the size of s after the reading is done. It should be larger than the number of characters you see on the screen.
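One common source of exactly three stray bytes at the start of a file is a UTF-8 byte order mark (the bytes EF BB BF). Whether that is what the asker's file contains is an assumption; if it is, a small helper like the hypothetical strip_utf8_bom below can drop it from the first line read.

```cpp
#include <string>

// Hypothetical helper: removes a leading UTF-8 byte order mark
// (EF BB BF) from `line` if one is present; otherwise does nothing.
void strip_utf8_bom(std::string& line) {
    if (line.size() >= 3 &&
        static_cast<unsigned char>(line[0]) == 0xEF &&
        static_cast<unsigned char>(line[1]) == 0xBB &&
        static_cast<unsigned char>(line[2]) == 0xBF) {
        line.erase(0, 3);
    }
}
```

It only needs to be applied to the first line, right after the first successful getline.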
Other issues:
string filename = "input.txt";
char *a=new char[filename.size()+1];
a[filename.size()]=0;
memcpy(a,filename.c_str(),filename.size());
ifstream fin(a);
is quite an overzealous way to go about it. Just write ifstream fin(filename.c_str());, or simply ifstream fin(filename); in C++11.
Next,
while(!fin.eof()){
is almost surely a bug. eof() does not tell you whether the next read will succeed, only whether the last one reached the end of the file. Using it this way typically results in the last line seemingly being read twice.
Always, always, check for success of a read operation before you use the result. That's idiomatically done by putting getline in the loop condition: while (getline(fin, tmp))
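Put together, the question's loop might look like the sketch below. The function name join_lines is hypothetical, and the extra s.empty() check is an addition so that a blank first line does not index an empty string.

```cpp
#include <istream>
#include <sstream>  // std::istringstream, for trying this without a file
#include <string>

// Hypothetical helper: joins all lines of `in`, adding a space after
// each line unless the text so far already ends with '.'.
std::string join_lines(std::istream& in) {
    std::string s;
    std::string tmp;
    while (std::getline(in, tmp)) {   // the body runs only after a successful read
        s.append(tmp);
        if (s.empty() || s[s.size() - 1] != '.')
            s.append(" ");
    }
    return s;
}
```

With the read in the loop condition there is no extra iteration after the last line, so nothing is appended twice.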
Once again I ask for help. I haven't coded anything for some time!
Now I have a text file filled with random gibberish. I already have a basic idea on how I will count the number of occurrences per word.
What really stumps me is how I will determine what line the word is in. Gut instinct tells me to look for the newline character at the end of each line. However, I have to do this while going through the text file the first time, right? If I do it afterwards it will do no good.
I already am getting the words via the following code:
vector<string> words;
string currentWord;
while(!inputFile.eof())
{
inputFile >> currentWord;
words.push_back(currentWord);
}
This is for a text file with no set structure. Using the above code gives me a nice little(big) vector of words, but it doesn't give me the line they occur in.
Would I have to get the entire line, then process it into words to make this possible?
Use a std::map<std::string, int> to count the word occurrences -- the int is the number of times it exists.
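A minimal sketch of that counting idea, assuming words are simply whitespace-separated tokens. The function name count_words is hypothetical.

```cpp
#include <istream>
#include <map>
#include <sstream>  // std::istringstream, for trying this without a file
#include <string>

// Hypothetical helper: counts how many times each whitespace-separated
// word occurs in `in`.
std::map<std::string, int> count_words(std::istream& in) {
    std::map<std::string, int> counts;
    std::string word;
    while (in >> word)      // stops cleanly at end of input
        ++counts[word];     // operator[] starts a new word at 0
    return counts;
}
```

This covers the occurrence count only; it discards line boundaries, so tracking line numbers still needs a line-by-line read.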
If you need line by line input, use std::getline(std::istream&, std::string&), like this:
std::vector<std::string> lines;
std::ifstream file(...); // Fill in accordingly.
std::string currentLine;
while(std::getline(file, currentLine))
lines.push_back(currentLine);
You can split a line apart by putting it into a std::istringstream first and then using operator>>. (Alternatively, you could cobble up some sort of splitter using std::find and other algorithmic primitives.)
EDIT: This is the same thing as in @dash-tom-bang's answer, but modified to be correct with respect to error handling:
vector<pair<string, int> > words; // each word with the line it was found on
int currentLine = 1; // or 0, however you wish to count...
string line;
while (getline(inputFile, line))
{
    istringstream inputString(line);
    string word;
    while (inputString >> word)
        words.push_back(make_pair(word, currentLine));
    ++currentLine; // advance once per line read
}
Short and sweet.
vector< map< string, size_t > > line_word_counts;
string line, word;
while ( getline( cin, line ) ) {
    line_word_counts.push_back( map< string, size_t >() ); // one map per line
    map< string, size_t > &word_counts = line_word_counts.back();
    istringstream line_is( line );
    while ( line_is >> word ) ++ word_counts[ word ];
}
cout << "'Hello' appears on line 5 " << line_word_counts[5-1]["Hello"]
<< " times\n";
You're going to have to abandon reading into strings, because operator >>(istream&, string&) discards white space and the contents of the white space (== '\n' or != '\n', that is the question...) is what will give you line numbers.
This is where OOP can save the day. You need to write a class to act as a "front end" for reading from the file. Its job will be to buffer data from the file, and return words one at a time to the caller.
Internally, the class needs to read data from the file a block (say, 4096 bytes) at a time. Then a string GetWord() (yes, returning by value here is good) method will:
First, read any white space characters, taking care to increment the object's lineNumber member every time it hits a \n.
Then read non-whitespace characters, putting them into the string object you'll be returning.
If it runs out of stuff to read, read the next block and continue.
If you hit the end of file, the string you have is the whole word (which may be empty) and should be returned.
If the function returns an empty string, that tells the caller that the end of file has been reached. (Files usually end with whitespace characters, so reading whitespace characters cannot imply that there will be a word later on.)
Then you can call this method at the same place in your code as your inputFile >> currentWord, and the rest of the code doesn't need to know the details of your block buffering.
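The steps above can be sketched as a small class. This version is a simplification: instead of managing its own 4096-byte blocks it reads one character at a time and relies on the stream's internal buffering. The names WordReader and lineNumber() are hypothetical; GetWord follows the description.

```cpp
#include <cctype>
#include <istream>
#include <sstream>  // std::istringstream, for trying this without a file
#include <string>

// Hypothetical front end: hands out one word at a time and tracks the
// line number of the most recently returned word.
class WordReader {
public:
    explicit WordReader(std::istream& in) : in_(in), lineNumber_(1) {}

    // Returns the next word, or an empty string at end of input.
    std::string GetWord() {
        std::string word;
        char c;
        // Skip whitespace, counting each newline that is passed.
        while (in_.get(c)) {
            if (c == '\n') {
                ++lineNumber_;
            } else if (!std::isspace(static_cast<unsigned char>(c))) {
                word += c;   // first character of the word
                break;
            }
        }
        // Collect the rest of the word, leaving the delimiter unread so
        // a trailing newline is counted on the next call.
        while (in_.peek() != std::char_traits<char>::eof() &&
               !std::isspace(in_.peek())) {
            in_.get(c);
            word += c;
        }
        return word;
    }

    int lineNumber() const { return lineNumber_; }

private:
    std::istream& in_;
    int lineNumber_;
};
```

A caller would loop with something like for (std::string w = reader.GetWord(); !w.empty(); w = reader.GetWord()) and read reader.lineNumber() after each word.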
An alternative approach is to read things a line at a time. The stream member functions that read into a char array (such as istream::getline) require you to create a fixed-size buffer beforehand, and if the line is longer than that buffer, you have to deal with it somehow; that could get more complicated than the class I described. std::getline reading into a std::string has no such limit.