Reading a text file has extra spaces between characters - C++

We are using C++ on Windows, Linux, and UNIX and are reading a text file. On Windows, with some files, we are getting extra spaces between characters in a line.
I have seen an article that seems to explain this but we are getting errors.
C++ getline adding spaces
When we read some files created by applications on Windows, there is a space between characters. When we create a text file ourselves on Windows, it does not have extra spaces.
fstream file;
string fileline;
file.open(configuration_file, ios::in | ios::out);
// This line was added from the post and we get errors
file.imbue(std::locale(file.getloc(),
    new std::codecvt_utf16<char, 0x10FFFF, std::consume_header>));
if (!file) {
    print_progress("Configuration File does not exist\n");
}
else {
    while (getline(file, fileline)) {
        std::cout << fileline << "\n";
    }
}
file.close();
How do we resolve this in C++? Is there a library that manages this?
Many thanks

Related

Problem in using seekg() to read file in C++

I am learning to read and write files in C++ and have found a problem.
My test.txt file contains 3 strings on 3 lines:
abc
def
mnp
My problem is: I don't understand why I need to use f.seekg(2, ios::cur);
instead of f.seekg(1, ios::cur);
I know how to use seekg() in C++, and I thought I just needed to skip 1 byte
for the getline() function to pick up the next line.
This is my code:
ifstream f;
f.open("D:\\test.txt", ios::in);
string str1, str2, str3;
f >> str1;
f.seekg(2, ios::cur);
getline(f, str2);
getline(f, str3);
cout << str1 << " " << str2 << " " << str3 << endl;
The reason for your trouble is explained, for example, here:
Why does std::getline() skip input after a formatted extraction?
However, about your actual question about seekg: you open the file in text mode. This means that when you read the file, line feeds are given to your C++ code as single characters, '\n'. But on disk they may be something else, and it seems you are running your code on Windows. There, a newline in a text file is typically two bytes: CR (ASCII code 13) followed by LF (ASCII code 10). Reading or writing in text mode performs this conversion between a single character in your C++ string and two bytes in the file for you.
seekg works on offsets and does not care about this; offsets are the same whether you open the file in text or binary mode. If you use seekg to skip the newline, your code becomes platform-dependent: on Windows you need to skip 2 bytes as explained above, while on other platforms such as Unix you need to skip just a single byte.
So, do not use seekg for this purpose, see the linked question for better solutions.

How can I convert a Linux text file to a Windows text file using Qt?

When I copy text files to a USB flash drive with Qt on a Raspberry Pi 3, and then open these text files on Windows, the text file's '\n' characters do not seem to work on Windows.
I searched this topic and saw that text file formats are different on Linux and Windows. So I have to copy Linux-based text files to the flash drive with Qt and open these files on Windows.
There are a few characters which can indicate a new line. The usual ones are these two:
'\n' or '0x0A' (10 in decimal) -> This character is called "Line Feed" (LF).
'\r' or '0x0D' (13 in decimal) -> This one is called "Carriage return" (CR).
Different Operating Systems handle newlines in a different way. Here is a short list of the most common ones:
DOS and Windows :
They expect a newline to be the combination of two characters, namely '\r\n' (or 13 followed by 10).
Unix (and hence Linux as well) :
Unix uses a single '\n' to indicate a new line.
Mac :
Macs use a single '\r'.
EDIT: As MSalters mentioned, Mac OS X is Unix and uses '\n'. The single '\r' is ancient Mac OS 9.
I guess you are just transporting the file, not doing anything with it, but I can't think of another option than opening it and rewriting the line endings.
If you open the .txt file on Windows, read from it (with C++ or C++/Qt), and then write the lines as you get them to a new file, the line endings should then fit the Windows specifics.
You can read the file like this:
std::ifstream file;
file.open(filePath);
std::ofstream file2;
file2.open(filePath2);
std::string line;
while (std::getline(file, line))
{
    // getline strips the '\n'; writing one back in text mode lets the
    // library emit the platform's line ending (CR LF on Windows)
    file2 << line << '\n';
}
The documentation for std::getline states that it searches for '\n', so it should work on Windows and Unix. If it doesn't, you can still set the delimiter explicitly to '\n'.
If you want to write the file 'Windowslike' on your raspberry, you can try to replace the '\n' characters with '\r\n'
It should look somehow like this:
std::string myFileAsString;
std::string toReplace = "\n";
std::string replaceWith = "\r\n";
// find/replace in a loop: a single replace() call would only fix
// the first newline in the string
std::size_t pos = 0;
while ((pos = myFileAsString.find(toReplace, pos)) != std::string::npos)
{
    myFileAsString.replace(pos, toReplace.length(), replaceWith);
    pos += replaceWith.length();  // skip past the inserted "\r\n"
}
where find locates each '\n' and replace substitutes '\r\n' for it.

C++ std::stringstream/ostringstream and UTF characters

I'm writing a program which processes some data, outputs it to a .csv file, then writes a GNUplot script, and calls GNUplot to execute the script and create an image file all with the same name (only different extensions). The filenames contain UTF characters (UTF-8 I believe?) such as °, φ and θ. All of this works perfectly fine when I compile and execute it in Linux with g++ 4.4.7. I then altered my code to compile in Microsoft Visual Studio 2008, and the problems start when I run the program.
I use the following two bits of code to:
1. Make a standard filename string (to which I just add extensions for the various files)
2. Open a stream to write to a file (the only difference between the GNUplot script and the .csv files is the extension)
// Generate a file name string
stringstream ss;
ss << type << " Graph #" << gID << " - " << title;
string fileName = ss.str();
// Open a stream for the output file
ostringstream outfile;
outfile << fileName << ".gplt" << ends;
ofstream ofs( outfile.str().c_str() );
The contents of the files that ofs writes contain the UTF characters properly; however, the stringstream-created string fileName and the ostringstream-created filename (even when not created from fileName, I tested it) show the characters incorrectly.
Example:
What it should be - CDFvsRd Graph #32 - MWIR # 300m, no-sun, 30kts, θ=all°.csv
What it ends up as - CDFvsRd Graph #32 - MWIR # 300m, no-sun, 30kts, Ï=allË.csv
What can I do to remedy this, with as much standard C++ as possible? Would converting my fileName string to wstring help?
The solution was to change the Windows portion of the code to build the filenames without the graph titles, omitting the UTF-8 characters from the filename. This wasn't a true solution, only a workaround.

How do I remove the character "" from the beginning of a text file in C++?

I'm trying to read a text file, and for each word, I will put them into a node of a binary search tree. However, the first word is always read as "" + first word". For example, if my first word is "This", then the first word that is inserted into my node is "This". I've been searching the forum for a solution to fix it; there was one post asking the same problem in Java, but no one has addressed it in C++. Would anyone help me to fix it? Thank you.
I came to a simple solution. I opened the file in Notepad and saved it as ANSI. After that, the file is read and parsed correctly into the binary search tree.
That's UTF-8's BOM.
You need to read the file as UTF-8. If you don't need Unicode and just use the first 127 ASCII code points, then save the file as ASCII or as UTF-8 without a BOM.
This is the byte order mark (BOM): "" is the representation of the UTF-8 BOM in ISO-8859-1. You have to tell your editor not to use BOMs, or use a different editor to strip them out.
In C++, you can use the following function to convert a UTF-8 BOM file to ANSI. (change_encoding_from_UTF8_to_ANSI is assumed to be defined elsewhere; it is platform-specific.)
void change_encoding_from_UTF8BOM_to_ANSI(const char* filename)
{
    ifstream infile;
    string strLine;
    string strResult;
    infile.open(filename);
    if (infile)
    {
        // the first 3 bytes (EF BB BF) are the UTF-8 BOM;
        // all the others are single-byte ASCII codes,
        // so delete those 3 bytes from the first line
        if (getline(infile, strLine))
            strResult += strLine.substr(3) + "\n";
        while (getline(infile, strLine))
        {
            strResult += strLine + "\n";
        }
    }
    infile.close();
    // +1 for the terminating NUL that strcpy writes
    char* changeTemp = new char[strResult.length() + 1];
    strcpy(changeTemp, strResult.c_str());
    char* changeResult = change_encoding_from_UTF8_to_ANSI(changeTemp);
    strResult = changeResult;
    delete[] changeTemp;
    ofstream outfile;
    outfile.open(filename);
    outfile.write(strResult.c_str(), strResult.length());
    outfile.flush();
    outfile.close();
}
In debug mode, find out the symbol for the special character and then replace it (this snippet is Java; U+FEFF is the BOM code point):
content.replaceAll("\uFEFF", "");

How to change EOL coding in txt files?

I am writing in Visual Studio 2008 in C++ and I have problems with other libraries - they do not accept the line endings (EOL) I generate with my txt files.
How can I change that while writing a file with
std::ofstream myFile;
myFile.open("traindata.txt");
myFile << "stuff" << endl;
// or
//myFile << "stuff" << '\n';
myFile.close();
EDIT 2:
OK, I made a mistake in my code: I was appending "0 " on every iteration, so I had whitespace before the EOL.
My bad. You guys were right. Thanks for the help.
Is it possible that you just don't want the '\n'-to-end-of-line-sequence conversion to happen? Open your file using std::ios_base::binary: this turns off precisely that conversion. ... and don't use std::endl unless you really want to flush the stream:
std::ofstream myFile("traindata.txt", std::ios_base::binary);
myFile << "stuff\n";
The close() is typically also unnecessary unless you want to check that it was successful.
You should open the file stream in binary mode to keep the C++ library from converting line endings automatically:
myFile.open("traindata.txt", std::ios_base::out | std::ios_base::binary);
That will keep the C++ library from converting '\n' to the OS-specific EOL sequence (CR LF on Windows).
Download notepad++ - that should be able to fix the problem.
Or dos2unix, unix2dos on cygwin
Use myfile << "\r\n" for the Windows-style ending, or myfile << "\n" for the UNIX-style.