Unknown '\\' character in C++

I am working on a legacy codebase and came across a '\\' character:
do
{
    string tmpLine;
    getline( *testcaseFilePtr, tmpLine );
    testcaseFileLineNumber++;
    if( tmpLine.size() > 0 && tmpLine[tmpLine.size() - 1] == '\\' )
    {
        readAnotherLine = true;
        tmpLine[tmpLine.size() - 1] = ' ';
    }
    else
    {
        readAnotherLine = false;
    }
    line.append( tmpLine );
} while( readAnotherLine );
As I have seen in the gdb debugger, 'readAnotherLine' always turns out to be false, and the do-while loop always exits after a single iteration.
Suppose my input file looks like:
DEFINE xyz;
DEFINE_MODULE
{
cout << "Example snippet" << endl;
do_this; USER_MACRO ( do_that );
}
The debugger shows that the line string contains a single line at a time, which is then processed in further steps. The lines of the input file are not all concatenated into line.
Please suggest whether this is a typo or whether it has some functionality.
Thanks in advance for your suggestion!

The code is looking for lines in the input file like:
abc\
def\
ghi
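and will read them into a single line: abc def ghi. In other words, a trailing \ is treated as a line continuation character. In C++ source, '\\' is the escape sequence for a single backslash. None of the lines in your sample input end with a backslash, so readAnotherLine stays false and each line is processed on its own.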

Related

seekg() seemingly skipping characters past intended position C++

I've been having an issue with parsing through a file and the use of seekg(). Whenever a certain character is reached in a file, I want to loop until a condition is met. The loop works fine for the first iteration, but when it loops back, the file seemingly skips a character and causes the loop to not behave as expected.
Specifically, the loop works fine if it is all contained in one line in the file, but fails when there is at least one newline within the loop in the file.
I should mention I am working on this on Windows, and I feel like the issue arises from how Windows ends lines with \r\n.
Using seekg(-2, std::ios::cur) after looping back fixes the issue when the beginning loop condition is immediately followed by a newline, but does not work for a loop contained in the same line.
The code is structured by having an Interpreter class hold the file pointer and relevant variables, such as the current line and column. This class also has a functional map defined like so:
// Define function type for command map
typedef void (Interpreter::*function)(void);
// Map for all the commands
std::map<char, function> command_map = {
{'+', increment_cell},
{'-', decrement_cell},
{'>', increment_ptr},
{'<', decrement_ptr},
{'.', output},
{',', input},
{'[', begin_loop},
{']', end_loop},
{' ', next_col},
{'\n', next_line}
};
It iterates through each character, deciding if it has functionality or not in the following function:
// Iterating through the file
void Interpreter::run() {
char current_char;
if(!this->file.eof() && this->file.good()) {
while(this->file.get(current_char)) {
// Make sure character is functional command (ie not a comment)
if(this->command_map.find(current_char) != this->command_map.end()) {
// Print the current command if in debug mode
if(this->debug_mode && current_char != ' ' && current_char != '\n') {
std::cout << this->filename << ":" << this->line << ":"
<< this->column << ": " << current_char << std::endl;
}
// Execute the command
(this->*(command_map[current_char]))();
}
// If it is not a functional command, it is a comment. The rest of the line is ignored
else{
std::string temp_line = "";
std::getline(file, temp_line);
this->line++;
this->column = 0;
}
this->temp_pos = file.tellg();
this->column++;
}
}
else {
std::cout << "Unable to find file " << this->filename << "." << std::endl;
exit(1);
}
file.close();
}
The beginning of the loop (signaled by a '[' char) sets the beginning loop position to this->temp_pos:
void Interpreter::begin_loop() {
this->loop_begin_pointer = this->temp_pos;
this->loop_begin_line = this->line;
this->loop_begin_col = this->column;
this->run();
}
When the end of the loop (signaled by a ']' char) is reached, if the condition for ending the loop is not met, the file cursor position is set back to the beginning of the loop:
void Interpreter::end_loop() {
// If the cell's value is 0, we can end the loop
if(this->char_array[this->char_ptr] == 0) {
this->loop_begin_pointer = -1;
}
// Otherwise, go back to the beginning of the loop
if(this->loop_begin_pointer > -1){
this->file.seekg(this->loop_begin_pointer, std::ios::beg);
this->line = this->loop_begin_line;
this->column = this->loop_begin_col;
}
}
I was able to put in debugging information and can show stack traces for further clarity on the issue.
Stack trace with one line loop ( ++[->+<] ):
+ + [ - > + < ] [ - > + < ] done.
This works as intended.
Loop with multiple lines:
++[
-
>
+<]
Stack trace:
+ + [ - > + < ] > + < ] <- when it looped back, it "skipped" '[' and '-' characters.
This loops forever since the end condition is never met (ie the value of the first cell is never 0 since it never gets decremented).
Oddly enough, the following works:
++[
-
>+<]
It follows the same stack trace as the first example. This working and the last example not working is what has made this problem hard for me to solve.
Please let me know if more information is needed about how the program is supposed to work or its outputs. Sorry for the lengthy post, I just want to be as clear as possible.
Edit 1:
The class has the file object as std::ifstream file;.
In the constructor, it is opened with this->file.open(filename), where filename is passed in as an argument.
For a file stream, seekg is ultimately defined in terms of fseek from the C standard library. The C standard has this to say:
7.21.9.2/4 For a text stream, either offset shall be zero, or offset shall be a value returned by an earlier successful call to the ftell function on a stream associated with the same file and whence shall be SEEK_SET.
So for a file opened in text mode, you can't do any arithmetic on offsets. You can rewind to the beginning, position at the end, or return to the position you were at previously and captured with tellg (which ultimately calls ftell). Anything else would exhibit undefined behavior.

Issues with reading from a .txt file c++

I was seeking some help on an issue. I have to read certain "passwords" from a .txt file, like "abE13#", and do some simple error checking to make sure they fit certain requirements. At the moment, it prints out the passwords (which is meant to be done), but it ignores the checking and gets stuck in a loop where new lines are being printed out. I'm sure it has something to do with while(ch!='\n'), but I'm not sure what is needed there in place of that check.
ch = inFile.get();
while(!inFile.eof())
{
    while(ch != '\n')
    {
        cout << ch;
        if(isalpha(ch))
        {
            charReq++;
            if(isupper(ch))
                uppercaseReq++;
            else
                lowercaseReq++;
        }
        else if(isdigit(ch))
        {
            charReq++;
            digitReq++;
        }
        else if(isSpecial(ch))
        {
            charReq++;
            specialCharReq++;
        }
        if(uppercaseReq < 1)
            cout << "\n missing uppercase" << endl;
        ch = inFile.get();
    }
}
It's supposed to kind of follow this format,
Read a character from the password.txt file
while( there are characters in the file )
{
while( the character from the file is not a newline character )
{
Display the character from the file
Code a cascading decision statement to test for the various required characters
Increment a count of the number of characters in the password
Read another character from the password.txt file
}
Determine if the password was valid or not. If the password was invalid,
display the things that were wrong. If the password was valid, display that it
was valid.
Read another character from the file (this will get the character after the
newline character -- ie. the start of a new password or the end of file)
}
Display the total number of passwords, the number of valid passwords, and the
number of invalid passwords
It keeps printing because of while(inFile), which is always true. Change it to an if statement that just checks whether the file is open:
if ( inFile )
EDIT: It will stop at the first password because of while(ch != '\n'). When it reaches the end of the first password, ch will be '\n', so the inner while fails and reading stops. Change it to:
while( !inFile.eof() )
while( the character from the file is not a newline character )
You have converted this line of pseudocode into this line of c++ code:
while (ch != '\t')
'\t' is the tab character, not the newline character. That could well be why your loop never ends and just prints new lines (really EOF, but you don't see that).
'\n' is the newline character.
Give that a try.
EDIT:
Also, you're only checking for the entire ifstream to be false. I don't quite know when that would happen, but I would recommend checking the EOF flag. Your code should turn into something along these lines:
while( !inFile.eof() )
{
    while(ch != '\n' && !inFile.eof() )
    {
        // ...
    }
}
If you don't check inFile twice, you may end up in an infinite loop.
while(inFile.good())
{
    while (inFile.good() && ch != '\n')
    {
        ...
    }
    if (ch == '\n')
    {...}
    else
    {...}
}

How to remove newlines inside csv cells using regex/terminal tools?

I have a csv file where some of the cells have newline character inside. For example:
id,name
01,"this is
with newline"
02,no newline
I want to remove all the newline characters inside cells.
How to do it with regex or with other terminal tools generically without knowing number of columns in advance?
This is actually a harder problem than it looks, and in my opinion, means that regex isn't the right solution. Because you're dealing with quoting/escaped strings, spanning multiple 'lines' you end up with a complicated and difficult to read regex. (It's not impossible, it's just messy).
I would suggest instead - use a parser. Perl has one in Text::CSV and it goes a bit like this:
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new( { binary => 1, eol => "\n" } );
while ( my $row = $csv->getline( \*ARGV ) ) {
    s/\n/ /g for @$row;
    $csv->print( \*STDOUT, $row );
}
This will take files as piped in/specified on command line - that's what \*ARGV does - it's a special file handle that lets you do ... basically what sed does:
somecommand.sh | myscript.pl
myscript.pl filename_to_process
The ARGV filehandle does either automagically. (You could explicitly open a file or use \*STDIN if you prefer.)
I suspect that instead of removing the newline you actually want to replace it with a space. If your input file is as simple as it looks this should do it for you:
$ awk '{ORS=( (c+=gsub(/"/,"&"))%2 ? FS : RS )} 1' file
id,name
01,"this is with newline"
02,no newline
If you are using this xlsx2csv tool, it has this option:
-e, --escape Escape \r\n\t characters
Use it, and then replace \n as needed, like (if \n should be replaced by the empty string):
sed 's/\\n//g' filein.csv > fileout.csv
In one pass:
PATH/TO/xlsx2csv.py -e filein.xlsx | sed 's/\\n//g' > fileout.csv
How to do it with regex or with other terminal tools generically without knowing number of columns in advance?
I don't think a regex is the most appropriate approach and might end up being quite complicated. Instead, I think a separate program to process the files might be easier to maintain in the long-term.
Since you're OK with any terminal tools, I've chosen python, and the code's below:
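#!/usr/bin/python3 -B
import csv
import sys
with open(sys.argv[1]) as csvfile:
    reader = csv.reader(csvfile)
    # csv.writer re-quotes cells that contain commas or quotes
    writer = csv.writer(sys.stdout, lineterminator='\n')
    for row in reader:
        stripped = [col.replace('\n', ' ') for col in row]
        writer.writerow(stripped)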
I think the code above is very straightforward and easy to understand, without a need for complicated regular expressions.
The input file here has the following contents:
id,name
01,"this is
with newline"
02,no newline
To prove it works, its output is reproduced below:
➜ ~ ./test.py input.csv
id,name
01,this is with newline
02,no newline
You could call the python script from some other program and feed filenames to it. You just need to add a minor update for the python program to write out files, if that's what you really need.
I've replaced the newlines with spaces to avoid a potentially unwanted concatenation (e.g. this iswith newline), but you can replace the newline with whatever you want, including the empty string ''.
I have written a method to remove the embedded new line inside the cell. The method below returns a java.util.List object that contains all rows in the CSV file
List<String> getAllRowsInCSVFileAsList(File selectedCSVFile){
FileReader fileReader = null;
BufferedReader reader = null;
List<String> values = new ArrayList<String>();
try{
fileReader = new FileReader(selectedCSVFile);
reader = new BufferedReader(fileReader);
String line = reader.readLine();
String previousLine = "";
//
boolean intendLineInCell = false;
while(line != null){
if(intendLineInCell){
if(line.indexOf("\"") != -1 && line.indexOf("\"") == line.lastIndexOf("\"")){
previousLine += line;
values.add(previousLine);
previousLine = "";
intendLineInCell = false;
} else if(line.indexOf("\"") != -1 && line.indexOf("\"") != line.lastIndexOf("\"")){
if(getTotalNumberOfCharacterSequenceOccurrenceInString("\"", line) % 2 == 0){
previousLine += line;
}else{
previousLine += line;
values.add(previousLine);
previousLine = "";
intendLineInCell = false;
}
} else{
previousLine += line;
}
}else{
if(line.indexOf("\"") == -1){
values.add(line);
}else if ((line.indexOf("\"") == line.lastIndexOf("\"")) && line.indexOf("\"") != -1){
intendLineInCell = true;
previousLine = line;
}else if(line.indexOf("\"") != line.lastIndexOf("\"") && line.indexOf("\"") != -1){
values.add(line);
}
}
line = reader.readLine();
}
}catch(IOException ie){
ie.printStackTrace();
}finally{
if(fileReader != null){
try {
fileReader.close();
} catch (IOException e) {
e.printStackTrace();
}
}
if(reader != null){
try {
reader.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
return values;
}
int getTotalNumberOfCharacterSequenceOccurrenceInString(String characterSequence, String text){
int count = 0;
while(text.indexOf(characterSequence) != -1){
text = text.replaceFirst(characterSequence, "");
count++;
}
return count;
}
Imagine you are creating a CSV file with one row and five columns, and the fourth cell contains an embedded newline (an Enter typed inside the cell).
Your data will look like the sample below (there is actually only one row in the CSV, but opened in Notepad it looks like two rows).
dinesh,kumar,24,"23
tambaram india",green
A cell with an embedded newline looks like this:
"23
tambaram india"
That cell starts with a double quote (") and ends with a double quote (").
So while reading a line, an unmatched double quote (") tells us that the cell contains an embedded newline.
The code concatenates the next line to that line and checks whether it contains the closing double quote ("). If it does, the code adds a new row to the java.util.List object; otherwise it concatenates the following line, checks again for the closing double quote ("), and so on. I have explained this for one cell, but the method also works if the row has many cells with embedded newlines.
Open the CSV file with Notepad++ and press Ctrl+H. On the Replace tab, enter the newline in the search box, then enter the text you want to replace it with, or leave it empty if you want.

Unix c++: getline and line.empty not working

Happy New Year, everyone!
I have a text file that looks like this:
A|AAAAA|1|2
R|RAAAA
R|RAAAA

A|BBBBB|1|2
R|RBBBB
R|RBBBB

A|CCCCC|1|2
R|RCCCC
The following code searches for the relevant text in the file based on the key and returns all the lines that belong to the key:
while( std::getline( ifs, line ) && line.find(search_string) != 0 );
if( line.find(search_string) != 0 )
{
    navData = "N/A" ;
}
else{
    navData = line + '\n' ; // result initially contains the first line
    // now keep reading line by line till we get an empty line or eof
    while( std::getline( ifs, line ) && !line.empty() )
    {
        navData += line + '\n';
    }
}
ifs.close();
return navData;
In Windows I get what I need:
A|BBBBB|1|2
R|RBBBB
R|RBBBB
On Mac, however, the "&& !line.empty()" condition seems to get ignored, since I get the following:
A|BBBBB|1|2
R|RBBBB
R|RBBBB
A|CCCCC|1|2
R|RCCCC
Does anyone know why?
Cheers, everyone!
Windows and Mac have different opinions about what an empty line looks like. On Windows, lines are terminated by "\r\n". On Mac, lines are terminated by "\n", so when a file with Windows line endings is read there, the "\r" preceding each "\n" stays in the string and the line is not empty.

std::getline removes whitespaces?

So I am creating a command line application and I am trying to allow commands with parameters, or if the parameter is enclosed with quotations, it will be treated as 1 parameter.
Example: test "1 2"
"test" will be the command, "1 2" will be a single parameter passed.
Using the following code snippet:
while(getline(t, param, ' ')) {
    if (param.find("\"") != string::npos) {
        ss += param;
        if (glue) {
            glue = false;
            params.push_back(ss);
            ss = "";
        }
        else {
            glue = true;
        }
    }
    else {
        params.push_back(param);
    }
}
However std::getline seems to auto remove whitespace which is causing my parameters to change from "1 2" to "12"
I've looked around but results are flooded with "How to remove whitespace" answers rather than "How to not remove whitespace"
Anybody have any suggestions?
However std::getline seems to auto remove whitespace
That's exactly what you are telling getline to do:
getline(t, param, ' ');
The third argument in getline is the delimiter. If you want to parse the input line, you should read it until '\n' is found and then process it:
while(getline(t, param)) {
/* .. */
}
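Once the whole line has been read, the quoting can be handled in a single pass over the characters. The following is a sketch under assumptions rather than the asker's code: splitCommand is a hypothetical helper, it splits only on spaces, and it drops the quote characters from the resulting parameters.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits a command line into tokens on spaces, except
// inside double quotes. The quotes themselves are dropped.
std::vector<std::string> splitCommand(const std::string& input) {
    std::vector<std::string> tokens;
    std::string current;
    bool inQuotes = false;
    bool hasToken = false;  // distinguishes "" (empty parameter) from no token
    for (char c : input) {
        if (c == '"') {
            inQuotes = !inQuotes;  // toggle quoted state, do not store the quote
            hasToken = true;
        } else if (c == ' ' && !inQuotes) {
            if (hasToken) tokens.push_back(current);
            current.clear();
            hasToken = false;
        } else {
            current += c;          // spaces inside quotes are kept
            hasToken = true;
        }
    }
    if (hasToken) tokens.push_back(current);
    return tokens;
}
```

With the example input test "1 2", this yields two tokens: test and 1 2, with the space inside the quotes preserved.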
Umm, you are telling it to use ' ' as a delimiter in std::getline. Of course it's going to strip the whitespace.
http://www.cplusplus.com/reference/string/getline/