MSVC++ 2008 text file parsing difference between win7 <-> xp - c++

Work environment
Windows 7 Ultimate X64 SP1
Microsoft Visual Studio 2008 SP1 (C++)
Text file
My program parses a plain text file and runs it. The following is that text file.
/*main.txt*/
_function main()
{
a = 3; //a is 3
b = "three";
c = 4.3; //automatic typecast
_outd(b+"3\n");
_outd(b+a+c);
d = a/c;
_outd("\n"+d+"\n"); ///test commantation
/*//
comment
*/
}
It is a simple C-style script that can be edited in a text editor such as Notepad. (The contents of this text are not important.)
Code
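The code that opens the text file:
std::wstring wsFileName; //from
char* sz_FileName = new char[wsFileName.size()+1];
ZeroMemory(sz_FileName,sizeof(char)*(wsFileName.size()+1));
//////////////////////////////////////////////////////////////////////////
//for using 'fopen', change wsFileName to a narrow string
//(the size passed here must include room for the terminating null)
WideCharToMultiByte(CP_UTF8,0,wsFileName.c_str(),-1,sz_FileName,(int)wsFileName.size()+1,NULL,NULL);
//open ascii text file
FILE* fp=fopen(sz_FileName,"rt");
std::string ch;
int c; //fgetc returns int, so EOF can be told apart from a real character
while(fp && (c=fgetc(fp))!=EOF) //testing feof() first would also push EOF into ch
{
ch.push_back((char)c); //get textfile in ascii
}
if(fp) fclose(fp);
delete[] sz_FileName;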
Problem
Watching std::string ch in the debugger on Windows 7:
That is exactly the result I expected on Windows 7, and the release build works fine too. But on Windows XP SP3 it triggers a runtime error.
I dug around for a whole night and finally found it. I installed Visual Studio 2008 SP1 on Windows XP SP3 and debugged it there.
Watching std::string ch on Windows XP:
I guessed that the newline character is why the program crashes on Windows XP, and that the problem comes from a text encoding difference between Windows 7 and Windows XP. So I wrote a new script file with Windows XP's Notepad and tried to parse it, but the problem was the same. So the problem occurs not only with text files that originated on Windows 7 but also with text newly created on Windows XP. I don't know why it happens.
Question
Given a plain text file created by Notepad, EditPlus, UltraEdit, or any other editor on any Windows OS (XP, Vista, 7), how can my program read it with the same result?
I want each user to be able to edit the script to control and customize my program on their own OS, using their own simple text editor.

Why not test each character before adding it to the string, to see whether it is an unwanted character? E.g. something like:
int temp; // fgetc returns int, not char
while(...)
{
temp = fgetc(fp);
if (temp != '\r') // compare against the character '\r', not the string "\r"
ch.push_back((char)temp);
}
I haven't tested the code, but it should work if the "wrong" char is indeed '\r'.
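The filtering idea above can be sketched as a small stream-based helper. This is a minimal sketch, not the asker's code: the name read_without_cr is hypothetical, and it assumes the only unwanted character is '\r', as the answer does.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads every byte from `in` and drops each '\r',
// so files with "\r\n" and files with "\n" produce the same string.
std::string read_without_cr(std::istream& in)
{
    std::string out;
    char c;
    while (in.get(c)) {        // the loop ends cleanly at end of input
        if (c != '\r') {
            out.push_back(c);
        }
    }
    return out;
}
```

To use it on a file, open a std::ifstream with std::ios::binary and pass it in; binary mode keeps the runtime from translating line endings, so the same bytes are seen on every Windows version.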

Related

Working with UTF-8 std::string objects in C++

I'm using Visual Studio and C++ on Windows to work with small caps text like ʜᴇʟʟᴏ ꜱᴛᴀᴄᴋᴏᴠᴇʀꜰʟᴏᴡ using e.g. this website. Whenever I read this text from a file or put this text directly into my source code using std::string, the text visualizer in Visual Studio shows it in the wrong encoding, presumably the visualizer uses Windows (ANSI). How can I force Visual Studio to let me work with UTF-8 strings properly?
std::string message_or_file_path = "...";
auto message = message_or_file_path;
// If the file path is valid, read from that file
if (GetFileAttributes(message_or_file_path.c_str()) != INVALID_FILE_ATTRIBUTES
&& GetLastError() != ERROR_FILE_NOT_FOUND)
{
std::ifstream file_stream(message_or_file_path);
std::string text_file_contents((std::istreambuf_iterator<char>(file_stream)),
std::istreambuf_iterator<char>());
message = text_file_contents; // Displayed in wrong encoding
message = "ʜᴇʟʟᴏ ꜱᴛᴀᴄᴋᴏᴠᴇʀꜰʟᴏᴡ"; // Displayed in wrong encoding
std::wstring wide_message = L"ʜᴇʟʟᴏ ꜱᴛᴀᴄᴋᴏᴠᴇʀꜰʟᴏᴡ"; // Displayed in correct encoding
}
I tried the additional command line option /utf-8 for compiling and setting the locale:
std::locale::global(std::locale(""));
std::cout.imbue(std::locale());
Neither of those fixed the encoding issue.
From What’s Wrong with My UTF-8 Strings in Visual Studio?, there are a couple of ways to see the contents of a std::string with UTF-8 encoding.
Let's say you have a variable with the following initialization:
std::string s2 = "\x7a\xc3\x9f\xe6\xb0\xb4\xf0\x9f\x8d\x8c";
Use a Watch window.
Add the variable to Watch.
In the Watch window, add ,s8 to the variable name to display its contents as UTF-8.
Here's what I see in Visual Studio 2015.
Use the Command Window.
In the Command Window, use ? &s2[0],s8 to display the text as UTF-8.
Here's what I see in Visual Studio 2015.
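The s2 example above also shows why a byte-oriented view of a std::string looks wrong: the string stores UTF-8 bytes, not characters. A minimal sketch that counts code points by skipping continuation bytes follows; the helper name utf8_code_points is hypothetical, and it assumes the input is valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts the code points in a UTF-8 encoded string.
// Continuation bytes have the bit pattern 10xxxxxx and are not counted.
// Assumes `s` holds valid UTF-8; no validation is done.
std::size_t utf8_code_points(const std::string& s)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        unsigned char byte = static_cast<unsigned char>(s[i]);
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For s2, size() reports 10 bytes while the helper reports 4 code points, which is why a debugger view that treats each byte as one ANSI character shows ten unrelated glyphs.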
A working solution was simply rewriting all std::strings as std::wstrings and adjusting the code logic properly to work with std::wstrings, as indicated in the question as well. Now everything works as expected.

C++ MFC VC 6.0 to VS2013 lStreamReturn = GetRichEditCtrl().StreamIn(SF_RTF, es);

I found an issue when converting the tool from VC++ 6.0 to VS2013. It is not an actual error in the code, as the code compiles with no "errors" and runs. The program was adjusted minimally, with almost no real change to the code, so that it would run and function correctly in VS2013, or so I thought. When we tested the code that reads from an external memory device, it displayed the RichText tree in the left pane of the application, seemingly with all the data present, but the Rich Text we are so used to seeing was not present in the right pane of the main application. What piqued my interest was that in the original program you couldn't edit the text, but in our newest compiled program you could see that the area had not changed from its original state. It was almost as if the data was getting to the application but, for some odd reason, was being dismissed or deleted right before being displayed in the pane.
So here's the problem, when the WCARichEdit.cpp does this
"
EDITSTREAM es;
es.dwError=0;
es.dwCookie = (DWORD_PTR) &Report;
es.pfnCallback = CBStreamIn;
lStreamReturn = GetRichEditCtrl().StreamIn(SF_RTF, es);
GetRichEditCtrl().SetReadOnly(TRUE);
"
It breaks or throws the error 0 unless SF_RTF is changed to SF_TEXT. The code then generates all the data, but the formatting is read into the stream of text. One giant stream that is. We are under some assumption that the formatting in this code is the culprit as to why the text is not showing up when we compile our code. So when the SplitterFrame.CPP does this
"
void CSplitterFrame::DisplayReport(CString Report)
{
CWcaRichEdit*RichEditView = (CWcaRichEdit*) m_wndSplitter.GetPane(0,1);
CH1_MainteanceToolDoc*pDoc = (CH1_MainteanceToolDoc*)
((CMainFrame *)AfxGetMainWnd())->GetActiveDocument();
RichEditView->DisplayReport(pDoc, Report);
}
"
The RichEditView->DisplayReport(pDoc, Report) doesn't seem to be getting any code as it just gets zeroed out. This is confirmed by the dwError=0 displaying no change when SF_RTF is left unchanged.
Any Thoughts as to how to get this Rich Text to display?
During troubleshooting, the code below was written to push the string to a text file.
#if 1 // enable only while troubleshooting
DWORD dwError = 0;
CFile testfile;
if (0 == testfile.Open ("C:\\...rtftestfile.txt", CFile::modeCreate | CFile::modeWrite | CFile::shareDenyNone))
{
dwError = GetLastError(); // the file could not be opened
}
else
{
testfile.Write((LPCTSTR) Report, Report.GetLength());
testfile.Close();
}
#endif
The file was created successfully, and on a whim we opened the .txt file in WordPad and saved it again with an .rtf extension. Oddly enough, WordPad did not show all of our formatting but instead added some code of its own, as the WordPad file and the text file differed in size. We then dragged and dropped each file into Notepad for further review. Strangely enough, a "\rtf1" had been added to the beginning of our gigantic string. Odd, why would WordPad add that... wait. The realization came, and we went back and changed our code from
const char RTF_Header[] = "{\\ansi\\ansicpg1252\\deff0\\deflang1033{\\fonttbl{\\f0\\fnil\\fcharset0 Courier New;}}\\viewkind4\\uc1\\pard\\fs17 ";
to
const char RTF_Header[] = "{\\rtf1\\ansi\\ansicpg1252\\deff0\\deflang1033{\\fonttbl{\\f0\\fnil\\fcharset0 Courier New;}}\\viewkind4\\uc1\\pard\\fs17 ";
The learning point is this: if you know your formatting is breaking your code, print that giant string to a file to see what it's doing, and push it into something that will place the RTF formatting where it is missing.
The other option is to have someone on hand that loves to utilize the awesome power of Rich Text and can remember all the ways to format it.
Also here is the Microsoft Forum discussion if you want to brave it:
Microsoft Forum GetRichEditCtrl().StreamIn breaks on formating
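A cheap guard against this class of problem is to check for the "{\rtf1" prefix before streaming the string in with SF_RTF. The following is a minimal sketch in plain C++; the function name has_rtf_header is hypothetical and is not part of MFC or the original tool.

```cpp
#include <string>

// Hypothetical helper: returns true when `text` starts with the RTF
// signature "{\rtf1". It only inspects the first six characters and does
// not validate the rest of the document.
bool has_rtf_header(const std::string& text)
{
    static const std::string prefix = "{\\rtf1";
    return text.compare(0, prefix.size(), prefix) == 0;
}
```

With the two headers shown above, the check fails for the original string that begins with "{\ansi" and passes for the corrected one that begins with "{\rtf1\ansi".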

Reading of text file in Ubuntu has extra //r

I am porting a C++ program from MS Visual Studio to Ubuntu. The program works fine except when it reads from a text file.
My text file consists of lines of fields separated by the delimiter ':', for example:
General Manager:G001:def
Customer:C001:def:Lim:Tom:Mr:99999999:zor#hotmail.com:Blk 145 B North #03-03 Singapore 111111
Read method
while (getline(afile,line,'\n')) //read line and store string in variable line
{
stringstream ss(line);
string s;
while (getline(ss,s,':'))
{
word.push_back(s);
}
word.clear();
}
On Windows, the last field is stored correctly as def.
On Ubuntu, however, it is stored as def\r.
It works fine for the Customer record but gives a problem for General Manager.
I know it has something to do with the carriage return, but I am not sure how to resolve it.
If the text file was created on Windows, you can use the dos2unix command to remove the extra \r's from the file. The command is simply dos2unix filenamegoeshere
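If converting the file is not an option, the trailing '\r' can also be dropped in code before the line is split. This is a minimal sketch under that assumption; the helper name split_record is hypothetical, and it mirrors the ':' delimiter from the question.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: removes one trailing '\r' (left over from a Windows
// "\r\n" line ending) and then splits the line on ':'.
std::vector<std::string> split_record(std::string line)
{
    if (!line.empty() && line[line.size() - 1] == '\r') {
        line.erase(line.size() - 1);
    }
    std::vector<std::string> fields;
    std::istringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ':')) {
        fields.push_back(field);
    }
    return fields;
}
```

Calling split_record(line) inside the outer getline loop gives the same fields for files saved on Windows and on Linux.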

FStream reading a binary file written with Delphi's binary writer

I'm creating a DLL in MS Visual Studio 2010 Express which loads a binary data file (*.mgr extension, used exclusively in my company's applications) using the fstream library in C++. The file is created with an app developed by someone else in my company who is using Delphi. He says the first 15 bytes should be some characters which indicate the date the file was created and some other information, such as the version of the app:
"XXXX 2012".
The result after loading with fstream (in binary mode) and writing another file with fstream (string mode) is as follows:
"[] X X X X 2 0 1 2"
The first char is an unknown char (rectangle) then there are spaces between each char. Finally it is 31 bytes wide. 15 for actual chars + 15 for white spaces + 1 for the rect char = 31.
Some other information:
I'm using C++; the app developer is using Delphi.
I'm using fstream; he is using the BW.Write() function. (BW == Binary Writer?)
He uses Windows 7 whilst I use Windows XP Professional.
Can you make a diagnosis of the problem?
Thanks in advance
First Edit: I'm adding the C++ code that loads those first bytes.
Firstly, he is using Delphi XE2 from Embarcadero RAD Studio XE2.
From what I know, PChar is a null-terminated string of wide chars (since Delphi 2009), which are 2 bytes wide as opposed to normal chars (one byte). So basically he's saving words instead of bytes.
Here is the code that loads the .mgr file:
wchar_t header[15];
DXFLIBRARY_API void loadMGR(const char* szFileName, const char* szOutput)
{
fstream file;
file.open( szFileName, ios::binary | ios::in );
if(file.is_open())
{
file.read(reinterpret_cast<char*>(header),sizeof(header));
}
file.close();
//zapis
fstream saveFile;
saveFile.open( szOutput, ios::out );
if(saveFile.is_open())
{
saveFile.write(reinterpret_cast<const char*>(header),sizeof(header));
}
saveFile.close();
}
Header contains 15 wchar_t's, so we get 30 bytes. Still, after investigating, I have no idea how to convert it.
It seems pretty clear that somewhere along the way the data is being mangled between an 8 bit text encoding and a 16 bit encoding. The spurious first character is almost certainly the UTF-16 BOM.
One possible explanation is that the Delphi developer is writing UTF-16 encoding text to the file. And presumably you are expecting an 8 bit encoding.
Another explanation is that the Delphi code is correctly writing out 8 bit text, but that your code is mangling it. Perhaps your read/write code is doing that.
Use a hex editor on the file output from the Delphi program to narrow down exactly where the mangling occurs.
In the absence of any code in the question, it's hard to be more specific than this.
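If the hex dump does show two bytes per character, the header can be narrowed to an 8-bit string once it has been read. The following is a minimal sketch under assumptions that are not confirmed by the question: the text is UTF-16 little-endian, it contains only ASCII characters, and a byte order mark may or may not be present. The helper name utf16le_to_ascii is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: narrows UTF-16LE bytes to an 8-bit string.
// Assumes little-endian code units and ASCII-only text; any code unit
// outside the ASCII range is replaced with '?'.
std::string utf16le_to_ascii(const std::string& bytes)
{
    std::string out;
    std::size_t i = 0;
    // Skip a UTF-16LE byte order mark (FF FE) if one is present.
    if (bytes.size() >= 2 &&
        static_cast<unsigned char>(bytes[0]) == 0xFF &&
        static_cast<unsigned char>(bytes[1]) == 0xFE) {
        i = 2;
    }
    for (; i + 1 < bytes.size(); i += 2) {
        unsigned lo = static_cast<unsigned char>(bytes[i]);
        unsigned hi = static_cast<unsigned char>(bytes[i + 1]);
        unsigned unit = lo | (hi << 8);
        if (unit == 0) {
            break;  // stop at a terminating null, if any
        }
        out.push_back(unit < 0x80 ? static_cast<char>(unit) : '?');
    }
    return out;
}
```

The raw bytes from file.read can be wrapped as std::string(buffer, byteCount) before the call. If the hex editor shows some other leading byte (for example a length prefix rather than a byte order mark), the skipping logic would need to change accordingly.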

CEdit::GetLine() windows 7

I have the following segment of code where m_edit is a CEdit control:
TCHAR lpsz[MAX_PATH+1];
// get the edit box text
m_edit.GetLine(0,lpsz, MAX_PATH);
This works perfectly on computers running Windows XP and earlier. I have not tested this in Vista, but on Windows 7, lpsz gets junk unicode characters inserted into it (as well as the actual text sometimes). Any idea as to what is going on here?
Since you're using MFC, why aren't you taking advantage of its CString class? That's one of the reasons many programmers were drawn to MFC, because it makes working with strings so much easier.
For example, you could simply write:
int len = m_edit.LineLength(m_edit.LineIndex(0));
CString path;
LPTSTR p = path.GetBuffer(len);
m_edit.GetLine(0, p, len);
path.ReleaseBuffer();
(The above code is tested to work fine on Windows 7.)
Note that the copied line does not contain a null-termination character (see the "Remarks" section in the documentation). That could explain the nonsense characters you're seeing in later versions of Windows.
It's not null terminated. You need to do this:
int count = m_edit.GetLine(0, lpsz, MAX_PATH);
lpsz[count] = 0;