Find a string in a file in C++

I am trying to parse a file in C++. My file contents are as follows:
//Comments should be ignored
FileVersion,1;
Count,5;
C:\Test\Files\Test_1.txt 0,16777216,16777552,0,0,1,0,1,1,1;
FileVersion is the first line I need to read information from. All the preceding lines are just comments, which begin with '//'. How do I set my cursor to the line containing FileVersion? I ask because I am using fscanf to read the information from the file.
if ( 1 == fscanf( f, "FileVersion,%d;\n", &lFileVersion ))
{
//Successfully read the file version.
}

I like to write parsers (assuming "line-based") by reading a line at a time, and then using sscanf, strncmp and strcmp (or C++'s std::stringstream and std::string::substr) to check for various content.
In your example, something like:
enum States
{
Version = 1,
Count = 2,
...
} state = Version;
char buffer[MAXLEN];
while(fgets(buffer, MAXLEN, f) != NULL)
{
if (0 == strncmp("//", buffer, 2))
{
// Comment. Skip this line.
continue;
}
switch (state)
{
case Version:
if (0 == strncmp("FileVersion,", buffer, 12))
{
if (1 == sscanf(buffer, "FileVersion,%d;", &version))
{
state = Count;
break;
}
Error("Expected file version number...");
}
break;
...
}
}
There are of course oodles of other ways to do this.

Since this is tagged C++, I will give you a C++ solution.
You can use a single call to f.ignore() to discard the first line of the stream:
f.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
Technically this skips everything up to and including the newline at the end of the first line, so the stream position will be at the start of the second line. Formatted input discards leading whitespace, so any blank space before the next field is not an issue.
The above requires C++ file streams (std::ifstream) rather than FILE*, with the formatted operators operator>>() and operator<<() used for input and output.

Not a particular C++ solution, but:
read a line with fgets (oh okay, if you want, you can substitute a C++ function for that);
if it starts with your 'comment' designator, skip to end of loop
if the line is empty (i.e., it contains only a hard return; or, possibly, check for zero or more whitespace characters and then an end-of-line), skip to end of loop
at end of loop: if you got something else, use sscanf on that string.

Related

c++ parsing comments using string buffer

I am trying to write code that will read in C++ files, recognize comments, and store each word of a comment in a vector. My problem is that I cannot find a way to read in a single-line comment.
My logic is this: if the first character in a string buffer is '/', check the second character to determine whether it is a single-line or a multi-line comment. If the comment is single-line, read in every word delimited by whitespace until I hit the newline character '\n'. If the comment is multi-line, read in every word until I hit the closing */. The code snippet for this is:
while(!input.eof())
{
string buffer;
input >> buffer;
//check if the line is comment
if(buffer[0] == '/')
{
//single line comment
if(buffer[1] == '/')
{
//read in until I hit newlineChar, and store all words into vector
while(buffer[0] != '\n')
{
input >> buffer;
vector.add(buffer);
}
}
//multiline comment
else if(buffer[1] == '*')
{
//read until I hit */ and store all words into vector
while(buffer[buffer.size()-1] != '*' && buffer[buffer.size()] != '/')
{
input >> buffer;
vector.add(buffer);
}
}
}
}
The problem is with my understanding of the newline character. I don't quite understand how reading into a string handles it. I'm assuming the newline is treated as another delimiter, just like other whitespace. But even in that case, there has to be a way to recognize the end of a line using string. What could be a solution to this? Any input is appreciated.
EDIT: Taking the advice of user4581301, I added the while loop that reads until the end of the file. There is also the problem of lines where a stream operator is followed by //, like
std::cout<<"//this is not a comment.";
One way I can think of to avoid this is to read in the entire line using getline and a char buffer:
char buffer[200];
input.getline(buffer,200);
string tempStr = buffer;
vector.add(tempStr);
In this case, how can I break each string stored in the vector into words?
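One possible way to split a stored line into words is to wrap it in a std::istringstream and extract with >>, which stops at any whitespace. The sketch below is only an illustration: split_words and line_comment_words are hypothetical names, and the comment detection is deliberately naive (it does not recognise a // that sits inside a string literal, which is exactly the std::cout case mentioned above).

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits `line` into whitespace-delimited words.
std::vector<std::string> split_words(const std::string& line) {
    std::istringstream words(line);
    std::vector<std::string> result;
    std::string word;
    while (words >> word) {
        result.push_back(word);
    }
    return result;
}

// Returns the words that follow the first "//" on `line` (empty if none).
// Simplification: a "//" inside a string literal is treated as a comment too.
std::vector<std::string> line_comment_words(const std::string& line) {
    std::string::size_type pos = line.find("//");
    if (pos == std::string::npos) {
        return std::vector<std::string>();
    }
    return split_words(line.substr(pos + 2));
}
```

Reading the file with std::getline(input, line) into a std::string also avoids the fixed 200-character buffer.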

Reading from a file without skipping whitespaces

I'm trying to write a program that replaces one given word in a file with another. It copies the file word by word: an ordinary word is written to the output file unchanged, and the word I need to change is written as its replacement. However, I've encountered a problem: the program does not put whitespace where it is in the input file. I don't know the solution to this problem, and I have no idea whether I can use noskipws, since I wouldn't know where the file ends.
Please keep in mind I'm a complete newbie and I have no idea how things work. I don't know if the tags are visible enough, so I will mention again that I use C++.
Since each word you read ends at either whitespace or the end of the file, you can simply check whether what stopped the read was the end of the file or whitespace:
if ( reached the end of file ) {
// What I have encountered is end of file
// My job is done
} else {
// What I have encountered is a whitespace
// I need to output a whitespace and back to work
}
The remaining problem is how to check for EOF (end of file).
Since you are using ifstream, this is quite simple.
When an ifstream has reached the end of the file (a read has tried to go past the last character), ifstream::eof() returns true.
Let's assume the ifstream instance that you have is called input.
if ( input.eof() == true ) {
// What I have encountered is end of file
// My job is done
} else {
// What I have encountered is a whitespace
// I need to output a whitespace and back to work
}
PS : ifstream::good() will return false when it reaches the eof or an error occurs. Checking whether input.good() == false instead can be a better choice here.
First, I would advise you not to read and write the same file (at least not while reading), because it makes your program much harder to write and to read.
Second, if you want to keep all the whitespace, the easiest way is to read whole lines with getline().
A program that copies one file to another while replacing a word could look something like the following:
void read_file()
{
ifstream file_read;
ofstream file_write;
// File from which you read some text.
file_read.open ("read.txt");
// File in which you will save modified text.
file_write.open ("write.txt");
string line;
// Word that you look for to modify.
string word_to_modify = "something";
string word_new = "something_new";
// You need to look in every line from the input file.
// getline() goes from the beginning of the file to the end.
while ( getline (file_read,line) ) {
// find() returns string::npos when there is no (further) occurrence.
string::size_type index = line.find(word_to_modify);
while (index != string::npos) {
line.replace(index, word_to_modify.length(), word_new);
index = line.find(word_to_modify, index + word_new.length());
}
cout << line << '\n';
file_write << line + '\n';
}
file_read.close();
file_write.close();
}

Using getline() with multiple types of end-of-line characters

I am using std::getline() in the following manner:
std::fstream verify;
verify.open(myURI.c_str());
std::string countingLine;
if(verify.is_open()){
std::getline(verify, countingLine);
std::istringstream iss(countingLine);
size_t pos;
// Check for the conventional myFile header.
pos = iss.str().find("Time,Group,Percent,Sign,Focus");
if(pos == std::string::npos){//its not there
headerChk = false;
this->setStatusMessage("Invalid header for myFile file");
return 0;
}
// loop that does more validation
iss.clear();
}
The problem is that I'm coding on a Mac (and some files get modified with both Windows tools and Apple tools). Some end-of-line characters are \r instead of \n, so my file string is never broken into lines. I believe there is also a third one I should be checking for. I'm having trouble finding an example of setting up the delim parameter for multiple end-of-line characters.
If someone could help with that example or a different approach that would be great.
Thanks
std::getline() only supports one end-of-line character. When a file is opened in text mode, the system's end-of-line sequences are converted into a single end-of-line character (\n). However, this doesn't deal with end-of-line sequences from other systems. In practice, all that really needs to be done is to remove the \r characters that remain in the input. The best way to remove characters is probably to create a filtering stream buffer. Here is a trivial, untested, and probably slow one (it isn't buffering, which means there is a virtual function call for each individual character; this is horrific; creating a buffered version isn't much harder, though):
#include <streambuf>
#include <string>   // std::char_traits

class normalizebuf
    : public std::streambuf {
    std::streambuf* sbuf_;
    char buffer_[1];
public:
    normalizebuf(std::streambuf* sbuf): sbuf_(sbuf) {}
    // Refill the one-character get area, dropping any '\r' on the way.
    int underflow() {
        int c = this->sbuf_->sbumpc();
        while (c == std::char_traits<char>::to_int_type('\r')) {
            c = this->sbuf_->sbumpc();
        }
        if (c != std::char_traits<char>::eof()) {
            this->buffer_[0] = std::char_traits<char>::to_char_type(c);
            this->setg(this->buffer_, this->buffer_, this->buffer_ + 1);
        }
        return c;
    }
};
You'd use this filter with an existing stream buffer, something like this:
std::ifstream fin("foo");
normalizebuf sbuf(fin.rdbuf());
std::istream in(&sbuf);
... and then you'd use in to read the file with all \r characters removed.

Reading from ifstream won't read whitespace

I'm implementing a custom lexer in C++ and when attempting to read in whitespace, the ifstream won't read it out. I'm reading character by character using >>, and all the whitespace is gone. Is there any way to make the ifstream keep all the whitespace and read it out to me? I know that when reading whole strings, the read will stop at whitespace, but I was hoping that by reading character by character, I would avoid this behaviour.
Attempted: .get(), recommended by many answers, but it has the same effect as std::noskipws, that is, I get all the spaces now, but not the new-line character that I need to lex some constructs.
Here's the offending code (extended comments truncated)
while(input >> current) {
always_next_struct val = always_next_struct(next);
if (current == L' ' || current == L'\n' || current == L'\t' || current == L'\r') {
continue;
}
if (current == L'/') {
input >> current;
if (current == L'/') {
// explicitly empty while loop
while(input.get(current) && current != L'\n');
continue;
}
I'm breaking on the while line and looking at every value of current as it comes in, and \r or \n are definitely not among them- the input just skips to the next line in the input file.
There is a manipulator to disable the whitespace skipping behavior:
stream >> std::noskipws;
The operator>> eats whitespace (space, tab, newline). Use yourstream.get() to read each character.
Edit:
Beware: Platforms (Windows, Un*x, Mac) differ in coding of newline. It can be '\n', '\r' or both. It also depends on how you open the file stream (text or binary).
Edit (analyzing code):
After
while(input.get(current) && current != L'\n');
continue;
there will be an \n in current, unless end of file has been reached. After that you continue with the outermost while loop, where the first character of the next line is read into current. Is that not what you wanted?
I tried to reproduce your problem (using char and cin instead of wchar_t and wifstream):
//: get.cpp : compile, then run: get < get.cpp
#include <iostream>
int main()
{
char c;
while (std::cin.get(c))
{
if (c == '/')
{
char last = c;
if (std::cin.get(c) && c == '/')
{
// std::cout << "Read to EOL\n";
while(std::cin.get(c) && c != '\n'); // this comment will be skipped
// std::cout << "go to next line\n";
std::cin.putback(c);
continue;
}
else { std::cin.putback(c); c = last; }
}
std::cout << c;
}
return 0;
}
This program, applied to itself, eliminates all C++ line comments in its output. The inner while loop doesn't eat up all text to the end of file. Please note the putback(c) statement. Without that the newline would not appear.
If it doesn't work the same for wifstream, it would be very strange except for one reason: when the opened text file is not saved as 16bit char and the \n char ends up in the wrong byte...
You could open the stream in binary mode:
std::wifstream stream(filename, std::ios::binary);
You'll lose any formatting operations provided by the stream if you do this.
The other option is to read the entire stream into a string and then process the string:
std::wostringstream ss;
ss << filestream.rdbuf();
Of course, getting the string from the ostringstream requires an additional copy of the string, so you could consider changing this at some point to use a custom stream if you feel adventurous.
EDIT: someone else mentioned istreambuf_iterator, which is probably a better way of doing it than reading the whole stream into a string.
Wrap the stream (or its buffer, specifically) in a std::istreambuf_iterator? That should ignore all formatting, and also give you a nice iterator interface.
Alternatively, a much more efficient, and fool-proof, approach might be to just use the Win32 API (or Boost) to memory-map the file. Then you can traverse it using plain pointers, and you're guaranteed that nothing will be skipped or converted by the runtime.
You could just wrap the stream in a std::istreambuf_iterator to get the data with all whitespace and newlines, like this:
/*Open the stream in default mode.*/
std::ifstream myfile("myfile.txt");
if(myfile.good()) {
/*Read data using streambuffer iterators.*/
std::vector<char> buf((std::istreambuf_iterator<char>(myfile)), (std::istreambuf_iterator<char>()));
/*str_buf holds all the data including whitespaces and newline .*/
std::string str_buf(buf.begin(),buf.end());
myfile.close();
}
By default, the skipws flag is already set on the ifstream object, so we must disable it. The ifstream object has these default flags because std::basic_ios::init, which is called for every new stream object, sets them. Any of the following would work:
in_stream.unsetf(std::ios_base::skipws);
in_stream >> std::noskipws; // Using the extraction operator, same as below
std::noskipws(in_stream); // Explicitly calling noskipws instead of using operator>>
Other flags are listed on cpp reference.
The stream extractors behave the same and skip whitespace.
If you want to read every byte, you can use the unformatted input functions, like stream.get(c).
Why not simply use getline?
You will get all the whitespace, and while you won't get the end-of-line characters, you will still know where they lie :)
Just use getline.
while (getline(input,current))
{
cout<<current<<"\n";
}
I ended up just cracking open the Windows API and using it to read the whole file into a buffer first, and then reading that buffer character by character. Thanks guys.

Reading a text document character by character

I am reading a text file character by character using ifstream infile.get().
The call sits inside an infinite while loop, which should be exited once the end-of-file condition (EOF) is reached. The while loop itself sits within a function of type void.
Here is the pseudo-code:
void function (...) {
while(true) {
...
if ( (ch = infile.get()) == EOF) {return;}
...
}
}
When I "cout" the characters to the screen, it goes through all the characters and then keeps running, outputting what appears to be blank space, i.e. it never breaks out of the loop. I have no idea why. Any ideas?
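In C++ you can also test the stream state instead of comparing the return value with EOF, for example with good(). The state has to be checked after the read as well, because get() only reports end of file once a read has failed. Something like this:
while (infile.good()) {
ch = infile.get();
if (!infile) break; // this read hit end of file (or failed); ch is not a character
// ...
}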
One idiom that makes it relatively easy to read from a file and detect the end of the file correctly is to combine the reading and the testing into a single, atomic, event, such as:
while (infile >> ch)
or:
while (std::getline(infile, instring))
Of course, you should also consider using a standard algorithm, such as copy:
std::copy(std::istream_iterator<char>(infile),
std::istream_iterator<char>(),
std::ostream_iterator<char>(std::cout, "\n"));
One minor note: by default, reading with >> will skip white space. When you're doing character-by-character input/processing, you usually don't want that. Fortunately, disabling that is pretty easy:
infile.unsetf(std::ios_base::skipws);
Try converting the function to one that returns int, and return 1 when reaching EOF.
The reason it is not working is that get() returns an int but you are storing the result in a char.
Assigning the result of get() to a char is fine as long as an actual character was read. But if get() returned the special value EOF, that value gets truncated when assigned to a char, and the subsequent comparison with EOF may then fail (it always fails when char is unsigned).
This should work:
void function (...)
{
while(true)
{
...
int value;
if ( (value = infile.get()) == EOF) {return;}
char ch = value;
...
}
}
That said, it is a lot easier to use the more standard pattern where the read is done as part of the loop condition. The int-returning get() does not give you that directly, although the get(char&) overload, which returns the stream, does. Another option is to switch to a method that uses iterators.
Note that the standard istream_iterator will not work as you expect (it skips whitespace by default). But you can use istreambuf_iterator (notice the buf after istream), which does not skip whitespace.
void function (...)
{
for(std::istreambuf_iterator<char> loop(infile);
loop != std::istreambuf_iterator<char>();
++loop)
{
char ch = *loop;
...
}
}