C++: parsing comments using a string buffer

I am trying to write code that reads C++ files, recognizes comments, and stores each word of a comment in a vector. My problem is that I cannot find a way to read a single-line comment.
My logic is this: if the first character in a string buffer is '/', check the second character to determine whether it is a single-line or a multi-line comment. If the comment is single-line, read every whitespace-delimited word until I hit the newline character '\n'. If the comment is multi-line, read every word until I hit the closing */. The code snippet for this is:
while(!input.eof())
{
string buffer;
input >> buffer;
//check if the line is comment
if(buffer[0] == '/')
{
//single line comment
if(buffer[1] == '/')
{
//read in until I hit newlineChar, and store all words into vector
while(buffer[0] != '\n')
{
input >> buffer;
vector.add(buffer);
}
}
//multiline comment
else if(buffer[1] == '*')
{
//read until I hit */ and store all words into vector
while(buffer[buffer.size()-1] != '*' && buffer[buffer.size()] != '/')
{
input >> buffer;
vector.add(buffer);
}
}
}
}
The problem is with my understanding of the newline character. I don't quite understand how string extraction handles it. I'm assuming the newline is treated as just another delimiter, like any other whitespace. But even so, there has to be a way to recognize the end of a line. What could be a solution to this? Any input is appreciated.
EDIT: Taking the advice of user4581301, I added the while loop that reads until the end of the file. There is also the problem of lines where // appears inside a string literal after the insertion operator, like
std::cout<<"//this is not a comment.";
One way I can think of to avoid this is to read the entire line using getline and a char buffer.
char buffer[200];
input.getline(buffer,200);
string tempStr = buffer;
vector.add(tempStr);
In this case, how can I break individual string stored in vector into words?
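One possible way to break a stored line into words is to wrap it in a std::istringstream and extract with operator>>, which stops at whitespace. This is a minimal sketch, not a full comment parser; the helper name splitWords is hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split one line into whitespace-delimited words.
std::vector<std::string> splitWords(const std::string& line)
{
    std::vector<std::string> words;
    std::istringstream iss(line);
    std::string word;
    while (iss >> word)          // operator>> skips spaces and tabs between words
        words.push_back(word);
    return words;
}
```

Reading the line with std::getline(input, someString) instead of a fixed char buffer[200] would also avoid truncating lines longer than the buffer.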

Related

C++ read different kinds of data from a file until there's a string beginning with a number

In C++, I'd like to read from an input file that contains different kinds of data: first the name of a contestant (two or more strings separated by whitespace), then an ID (a string without whitespace, always beginning with a digit), then further strings without whitespace, each followed by a number (the sports and the places achieved).
For example:
Josh Michael Allen 1063Szinyei running 3 swimming 1 jumping 1
Here is the code I started to write before I got stuck:
void ContestEnor::next()
{
string line;
getline(_f , line);
if( !(_end = _f.fail()) ){
istringstream is(line);
is >> _cur.contestant >> _cur.id; // here I don't know how to go on
_cur.counter = 0;
//...
}
}
Thank you for your help in advance.
You should look into using std::getline with a delimiter. This way, you can delimit on a space character and read until you find a string whose first character is a digit. Here is a short code example (this seems rather homework-like, so I don't want to write too much of it for you ;):
std::string temp, id;
while (std::getline(_f, temp, ' ')) {
if (!temp.empty() && temp[0] >= '0' && temp[0] <= '9') {
id = temp;
}
// you would need to add more code for the rest of the data on that line
}
/* close the file, etc. */
This code should be pretty self-explanatory. The most important thing to know is that you can use std::getline to get data up until a delimiter. The delimiter is consumed, just like the default behavior of delimiting on a newline character. Thus, the name getline isn't entirely accurate - you can still get only part of a line if you need to.
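As a rough sketch of how the delimiter approach could be applied to one input line: tokens are read with std::getline(is, token, ' ') and appended to the name until a token starting with a digit appears, which is taken as the ID. The Contestant struct and the function name readNameAndId are hypothetical and only stand in for the asker's _cur fields.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

struct Contestant {          // hypothetical record, for illustration only
    std::string name;
    std::string id;
};

// Reads space-delimited tokens until one starts with a digit.
// Returns false if no ID was found before the input ran out.
bool readNameAndId(std::istream& is, Contestant& out)
{
    std::string token;
    while (std::getline(is, token, ' ')) {
        if (token.empty())
            continue;        // consecutive spaces yield empty tokens
        if (std::isdigit(static_cast<unsigned char>(token[0]))) {
            out.id = token;
            return true;
        }
        if (!out.name.empty())
            out.name += ' ';
        out.name += token;
    }
    return false;
}
```

After a successful call the stream is positioned just past the ID, so the remaining sport and place pairs can be read with something like is >> sport >> place in a loop.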

I filled my vector from a text file and it won't cout as one line. How can I do this?

Long story short, I need my vector to cout as a single line, without creating its own new lines, for my program to work correctly. The text file I read into the vector was
laptop#a small computer that fits on your lap#
helmet#protective gear for your head#
couch#what I am sitting on#
cigarette#smoke these for nicotine#
binary#ones and zeros#
motorcycle#two wheeled motorized bike#
oj#orange juice#
test#this is a test#
filled the vector using the loop:
if(myFile.is_open())
{
while(getline(myFile, line, '#'))
{
wordVec.push_back(line);
}
cout << "words added.\n";
}
and printed it using this:
for(int i = 0; i < wordVec.size(); i++)
{
cout << wordVec[i];
}
and it outputs as such:
laptopa small computer that fits on your lap
helmetprotective gear for your head
couchwhat I am sitting on
cigarettesmoke these for nicotine
binaryones and zeros
motorcycletwo wheeled motorized bike
ojorange juice
testthis is a test
My program works if I manually input the words and add them to my data structure, but if they are added from the vector filled via the text file, half of the program doesn't work. Before anyone asks for a better description of the problem, all I need to know is how to fill the vector so that it will output as a single line.
Your code getline(myFile, line, '#') reads everything up to end-of-file or the next '#' into line, and that includes any newlines. So, as you read text file content...
laptop#a small computer that fits on your lap#
helmet#protective gear for your head#
...which you could also think of as...
"laptop#a small computer that fits on your lap#\nhelmet#protective gear for your head#"
...line takes on successive values...
"laptop"
"a small computer that fits on your lap"
"\nhelmet"
...etc....
Note the newline in "\nhelmet".
There are many ways to avoid or correct this, such as...
while ((myFile >> std::ws) and getline(myFile, line, '#'))
...
...or...
if (not line.empty() and line[0] == '\n')
line.erase(0, 1);
...or (as Barry suggests in comments)...
while (getline(myFile, line))
{
std::istringstream iss(line);
std::string field;
while (getline(iss, field, '#'))
...
}
while(getline(myFile, line, '#'))
Here, you told std::getline to use the '#' character instead of a newline, '\n', as a delimiter.
So, this simply means that std::getline will no longer think there's anything special about '\n'. It's just another character that std::getline() will keep reading, looking for the next #.
So, you end up reading newline characters into your individual strings, and then outputting them to std::cout as part of the strings you've printed.
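A small sketch of one way to keep the '#' delimiter while dropping the stray newlines: the std::ws manipulator discards leading whitespace (including the '\n' left behind after each trailing '#') before every field is read. The function name readFields is hypothetical, and the sketch assumes fields never need to start with whitespace.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads '#'-delimited fields, skipping whitespace (such as '\n') before each one.
std::vector<std::string> readFields(std::istream& in)
{
    std::vector<std::string> fields;
    std::string field;
    while (in >> std::ws && std::getline(in, field, '#'))
        fields.push_back(field);
    return fields;
}

// Possible usage with an in-memory stream instead of a file:
//   std::istringstream in("laptop#a small computer#\nhelmet#protective gear#\n");
//   std::vector<std::string> words = readFields(in);
```

With this, the field that previously came out as "\nhelmet" is stored as "helmet", so printing the vector elements back to back produces a single line.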

Deleting from a certain point in a file to the end of the line?

I'm having some trouble with detecting two '//' as a char and then deleting from the first '/' to the end of the line (I'm guessing \n comes into use here).
{
ifstream infile;
char comment = '//';
infile.open("test3.cpp");
if (!infile)
{
cout << "Can't open input file\n";
exit(1);
}
char line;
while (!infile.eof())
{
infile.get(line);
if (line == comment)
{
cout << "found it" << endl;
}
}
return 0;
}
In the test3.cpp file there are three comments, so 3 lots of '//'. But I can't detect the double slash and can only detect a single / which will affect other parts of the c++ file as I only want to delete from the beginning of a comment to the end of the line?
I'm having some trouble with detecting two '//' as a char
That's because // is not a character. It is a sequence of two characters. A sequence of characters is known as a string. You can make string literals with double quotation marks: "//".
A simple solution is to compare the current input character from the stream to the first character of the string "//" which is '/'. If it matches, then compare the next character from the stream with the second character in the string that is searched for. If you find two '/' in a row, you have your match. Or you could be smart and read the entire line into a std::string and use the member functions to find it.
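The "read the entire line into a std::string" suggestion might look like the sketch below, which uses std::string::find to locate "//" and keeps only the text before it. The helper name stripLineComment is hypothetical, and this naive version would also cut a "//" that appears inside a string literal.

```cpp
#include <string>

// Hypothetical helper: drop everything from the first "//" to the end of the line.
// Caveat: does not recognize string literals, so "http://..." in code is also cut.
std::string stripLineComment(const std::string& line)
{
    const std::string::size_type pos = line.find("//");
    if (pos == std::string::npos)
        return line;                 // no comment on this line
    return line.substr(0, pos);      // keep only the text before the comment
}
```

Each line obtained with std::getline(infile, line) could be passed through this helper and then written to the output.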
Also:
while (!infile.eof())
{
infile.get(line);
// using line without testing eof- and badbit
This piece of code is wrong. You test for eofbit before reading from the stream, and then process the input without checking whether the read actually succeeded.
Also, your choice of name for the line variable is a bit confusing, since it doesn't contain an entire line but just one character.
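A sketch of the character-by-character approach with the read itself used as the loop condition, so the body only runs when a character was actually extracted. The function name countDoubleSlashes is hypothetical; it simply counts "//" pairs rather than deleting anything.

```cpp
#include <istream>
#include <sstream>

// Counts occurrences of "//" by reading one character at a time.
int countDoubleSlashes(std::istream& in)
{
    int count = 0;
    char prev = '\0';
    char ch;
    while (in.get(ch)) {             // false as soon as a read fails (e.g. at EOF)
        if (ch == '/' && prev == '/') {
            ++count;
            prev = '\0';             // reset so "///" is not counted twice
        } else {
            prev = ch;
        }
    }
    return count;
}
```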

How to NOT use \n as delimiter in getline()

I'm trying to read in lines from a plain text file, but there are line breaks in the middle of sentences, so getline() reads until a line break as well as until a period. The text file looks like:
then he come tiptoeing down and stood right between us. we could
a touched him nearly. well likely it was minutes and minutes that
there warnt a sound and we all there so close together. there was a
place on my ankle that got to itching but i dasnt scratch it.
My read-in code:
// read in sentences
while (file)
{
string s, record;
if (!getline( file, s )) break;
istringstream ss(s);
while (ss)
{
string s;
if (!getline(ss, s, '.')) break;
record = s;
if(record[0] == ' ')
record.erase(record.begin());
sentences.push_back(record);
}
}
// output sentences
for (vector<string>::size_type i = 0; i < sentences.size(); i++)
cout << sentences[i] << "[][][][]" << endl;
The purpose of the [ ][ ][ ][ ] was to check if linebreaks were used as delimiters and were not just being read into the string. The output would look like:
then he come tiptoeing down and stood right between us.[][][][]
we could[][][][]
a touched him nearly.[][][][]
well likely it was minutes and minutes that[][][][]
there warnt a sound and we all there so close together.[][][][]
there was a[][][][]
place on my ankle that got to itching but i dasnt scratch it.[][][][]
What exactly is your question?
You're using getline() to read from the file stream with a newline delimiter, then parsing that line with a second getline() on the istringstream, using '.' as the delimiter. So of course you're getting your strings broken at both the newline and the '.'.
The POSIX C function getdelim() works like the C getline(), except that a line delimiter other than newline can be specified as the delimiter argument. As with getline(), a delimiter character is not added if one was not present in the input before end of file was reached.
ssize_t getdelim(char **restrict lineptr, size_t *restrict n, int delimiter, FILE *restrict stream);
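Staying within C++ streams, one possible sketch is to call std::getline on the file directly with '.' as the only delimiter and then replace the embedded newlines with spaces. The function name readSentences is hypothetical; the sketch assumes '\n' line endings and stores each sentence without its trailing period, since std::getline discards the delimiter.

```cpp
#include <algorithm>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads '.'-terminated sentences; line breaks inside a sentence become spaces.
std::vector<std::string> readSentences(std::istream& in)
{
    std::vector<std::string> sentences;
    std::string s;
    // std::ws drops the space or newline that follows each period.
    while (in >> std::ws && std::getline(in, s, '.')) {
        std::replace(s.begin(), s.end(), '\n', ' ');
        sentences.push_back(s);
    }
    return sentences;
}
```

Any text after the last period is still returned as a final element, so callers that care about complete sentences may want to check for that case.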

Reading from ifstream won't read whitespace

I'm implementing a custom lexer in C++ and when attempting to read in whitespace, the ifstream won't read it out. I'm reading character by character using >>, and all the whitespace is gone. Is there any way to make the ifstream keep all the whitespace and read it out to me? I know that when reading whole strings, the read will stop at whitespace, but I was hoping that by reading character by character, I would avoid this behaviour.
Attempted: .get(), recommended by many answers, but it has the same effect as std::noskipws, that is, I get all the spaces now, but not the new-line character that I need to lex some constructs.
Here's the offending code (extended comments truncated)
while(input >> current) {
always_next_struct val = always_next_struct(next);
if (current == L' ' || current == L'\n' || current == L'\t' || current == L'\r') {
continue;
}
if (current == L'/') {
input >> current;
if (current == L'/') {
// explicitly empty while loop
while(input.get(current) && current != L'\n');
continue;
}
I'm breaking on the while line and looking at every value of current as it comes in, and \r or \n are definitely not among them- the input just skips to the next line in the input file.
There is a manipulator to disable the whitespace skipping behavior:
stream >> std::noskipws;
The operator>> eats whitespace (space, tab, newline). Use yourstream.get() to read each character.
Edit:
Beware: Platforms (Windows, Un*x, Mac) differ in coding of newline. It can be '\n', '\r' or both. It also depends on how you open the file stream (text or binary).
Edit (analyzing code):
After
while(input.get(current) && current != L'\n');
continue;
there will be a \n in current, unless end of file was reached. After that you continue with the outermost while loop. There the first character on the next line is read into current. Is that not what you wanted?
I tried to reproduce your problem (using char and cin instead of wchar_t and wifstream):
//: get.cpp : compile, then run: get < get.cpp
#include <iostream>
int main()
{
char c;
while (std::cin.get(c))
{
if (c == '/')
{
char last = c;
if (std::cin.get(c) && c == '/')
{
// std::cout << "Read to EOL\n";
while(std::cin.get(c) && c != '\n'); // this comment will be skipped
// std::cout << "go to next line\n";
std::cin.putback(c);
continue;
}
else { std::cin.putback(c); c = last; }
}
std::cout << c;
}
return 0;
}
This program, applied to itself, eliminates all C++ line comments in its output. The inner while loop doesn't eat up all text to the end of file. Please note the putback(c) statement. Without that the newline would not appear.
If it doesn't work the same for wifstream, it would be very strange except for one reason: when the opened text file is not saved as 16bit char and the \n char ends up in the wrong byte...
You could open the stream in binary mode:
std::wifstream stream(filename, std::ios::binary);
You'll lose any formatting operations provided by the stream if you do this.
The other option is to read the entire stream into a string and then process the string:
std::wostringstream ss;
ss << filestream.rdbuf();
Of course, getting the string from the ostringstream requires an additional copy of the string, so you could consider changing this at some point to use a custom stream if you feel adventurous.
EDIT: someone else mentioned istreambuf_iterator, which is probably a better way of doing it than reading the whole stream into a string.
Wrap the stream (or its buffer, specifically) in a std::istreambuf_iterator? That should ignore all formatting, and also give you a nice iterator interface.
Alternatively, a much more efficient, and fool-proof, approach might be to just use the Win32 API (or Boost) to memory-map the file. Then you can traverse it using plain pointers, and you're guaranteed that nothing will be skipped or converted by the runtime.
You could just wrap the stream in a std::istreambuf_iterator to get the data with all whitespace and newlines, like this:
/*Open the stream in default mode.*/
std::ifstream myfile("myfile.txt");
if(myfile.good()) {
/*Read data using streambuffer iterators.*/
vector<char> buf((std::istreambuf_iterator<char>(myfile)), (std::istreambuf_iterator<char>()));
/*str_buf holds all the data including whitespaces and newline .*/
string str_buf(buf.begin(),buf.end());
myfile.close();
}
By default, the skipws flag is already set on the ifstream object, so we must disable it. The ifstream object has these default flags because std::basic_ios::init is called on every new stream object. Any of the following would work:
in_stream.unsetf(std::ios_base::skipws);
in_stream >> std::noskipws; // Using the extraction operator, same as below
std::noskipws(in_stream); // Explicitly calling noskipws instead of using operator>>
Other flags are listed on cpp reference.
The stream extractors behave the same and skip whitespace.
If you want to read every byte, you can use the unformatted input functions, like stream.get(c).
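To make the difference concrete, here is a small sketch comparing unformatted get() with formatted operator>> on a char, with and without std::noskipws. The function names are hypothetical, and plain char streams are assumed rather than the wide streams from the question.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Collects every character, whitespace and newlines included, via unformatted get().
std::string readAllWithGet(std::istream& in)
{
    std::string out;
    char c;
    while (in.get(c))
        out += c;
    return out;
}

// Same loop with formatted extraction; whitespace is dropped unless noskipws is set.
std::string readAllWithExtract(std::istream& in, bool keepWhitespace)
{
    if (keepWhitespace)
        in >> std::noskipws;
    std::string out;
    char c;
    while (in >> c)
        out += c;
    return out;
}
```

For the input "a b\nc", the get() version and the noskipws version both return the text unchanged, while plain operator>> returns "abc".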
Why not simply use getline?
You will get all the whitespace, and while you won't get the end-of-line characters, you will still know where they lie :)
Just use getline.
while (getline(input,current))
{
cout<<current<<"\n";
}
I ended up just cracking open the Windows API and using it to read the whole file into a buffer first, and then reading that buffer character by character. Thanks guys.