Non-ASCII file paths on Windows - C++

I work on Windows and have file paths with non-ASCII symbols. For non-ASCII symbols Windows uses std::wstring (UTF-16). I convert the path to UTF-8 and pass it to luaL_dofile, but it fails with "cannot find file".
Here is my example of code:
#include <codecvt>
#include <locale>
#include <string>

std::wstring wstr_path = L"non-ASCII path";  // wide literal needs the L prefix
using convert_type = std::codecvt_utf8_utf16<wchar_t>;
std::wstring_convert<convert_type, wchar_t> converter;
std::string str_path = converter.to_bytes(wstr_path);  // UTF-16 -> UTF-8
luaL_dofile(mRoot, str_path.c_str());

I know nothing about luaL_dofile, but it's rather unlikely that it uses UTF-8. The Windows file API for Unicode-unaware programs uses the ANSI codepage (which corresponds to the system default locale). The ANSI codepage on English/US systems is 1252, but other system default locales have different codepages: Central European is 1250, Cyrillic is 1251, etc.
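For illustration, a minimal sketch of that conversion (the helper name to_ansi is mine, not from the question):

#include <windows.h>
#include <string>

// Convert a wide (UTF-16) path to the system ANSI codepage. Characters that
// don't exist in that codepage cannot be represented this way.
std::string to_ansi(const std::wstring& w) {
    int len = WideCharToMultiByte(CP_ACP, 0, w.c_str(), -1,
                                  nullptr, 0, nullptr, nullptr);
    if (len == 0) return {};                 // conversion failed
    std::string out(len, '\0');
    WideCharToMultiByte(CP_ACP, 0, w.c_str(), -1,
                        &out[0], len, nullptr, nullptr);
    out.resize(len - 1);                     // drop the trailing NUL
    return out;
}

With that, luaL_dofile(mRoot, to_ansi(wstr_path).c_str()) should find the file as long as every character of the path exists in the system ANSI codepage.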
Also, you could try generating the short (8.3) name for the file (see the GetShortPathName API) and feeding that in instead.
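A rough sketch of that idea (the helper name short_name is mine; note that 8.3 name generation can be disabled per volume, so check the result):

#include <windows.h>
#include <string>

// Fetch the 8.3 short name, which is typically plain ASCII and therefore
// survives narrow-string APIs; convert it to bytes as shown above.
std::wstring short_name(const std::wstring& longPath) {
    DWORD len = GetShortPathNameW(longPath.c_str(), nullptr, 0);
    if (len == 0) return {};                 // file not found or API failed
    std::wstring buf(len, L'\0');
    GetShortPathNameW(longPath.c_str(), &buf[0], len);
    buf.resize(len - 1);                     // drop the trailing NUL
    return buf;
}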

Related

Changing Resource encoding from Windows 1252 to UNICODE UTF-8 with signatures

I am working on an older project (compiled with UNICODE defined) and came across a problem within the .rc. For example, a static text element which includes “©” defined in a DIALOGEX resource by
LTEXT "Copyright ©”,IDC_COPYRIGHT_STATIC,7,154,110,8
The resource file, probably created by MSVC application wizard many years ago and migrated forward with each release, now looks like this:
#if !defined(AFX_RESOURCE_DLL) || defined(AFX_TARG_ENU)
LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US
#pragma code_page(1252) //present for over 10 years
#endif
For many years the © displayed correctly, but recently it appeared as "Å©" or even "½¿". Obviously an encoding issue, but I needed to understand how and why before making changes. After researching, these three properties of the .rc play a part in the bug and the encoding:
The presence or absence of “#pragma code_page(…)” in the .rc
The encoding used to Save with Encoding… the .rc file
Save with Encoding… .rc “with signature” or “without signature” (meaning BOM?)
As an empirical test, I changed these settings in the .rc and checked the resulting text in the dialog:

#pragma code_page(…)  | Save with Encoding | Signature (BOM) | Text in Dlg
----------------------|--------------------|-----------------|------------
code_page(1252)       | Original file      | n/a             | Å©
code_page(1252)       | Windows 1252       | n/a             | ©
code_page(1252)       | UTF-8 65001        | BOM             | Å©
code_page(1252)       | UTF-8 65001        | No BOM          | Å©
code_page(65001)      | Windows 1252       | n/a             | ©
code_page(65001)      | UTF-8 65001        | BOM             | ©
code_page(65001)      | UTF-8 65001        | No BOM          | ©
No code_page in .rc   | UTF-8 65001        | BOM             | ©
No code_page in .rc   | UTF-8 65001        | No BOM          | Å©
I can explicitly Save with Encoding… all .rc files either as Windows 1252 or as Unicode UTF-8 with signature (and delete the #pragma code_page lines). The specific bug will go away, but is this the best solution?
It seems switching from Windows 1252 to Unicode UTF-8 is a step forward and the right way to go long term. Is there any problem with this? Or are there better solutions?

Read and write binary files with UTF-8 file names on Windows (with Qt) [duplicate]

This question already has answers here:
How to open an std::fstream (ofstream or ifstream) with a unicode filename?
I am trying to read and write files with UTF-8 file names, such as files with Arabic or Persian names ("سلام.jpg"), in C++. I am using the following code and it works fine on *nix systems (I tested it with the Android NDK), but it fails on Windows. The snippet just tries to read the length of the file via tellg, but on Windows is_open() reports that the file isn't open and tellg returns -1. I am using Qt with MinGW, so the Visual Studio wide open() doesn't apply for me.
void readFile(QString inFileAdd)  // hypothetical name; the signature was truncated in the original
{
    this->inFileAdd = inFileAdd;
    ifstream infile;
    infile.open(this->inFileAdd.toUtf8().constData(),
                ios::binary | ios::in | ios::ate);
    infile.seekg(0, infile.end);
    cout << "is open: ";
    if (infile.is_open()) {
        cout << "true" << endl;
    } else {
        cout << "false" << endl;
    }
    this->size = infile.tellg();
    cout << this->inFileAdd.toUtf8().constData() << endl
         << "file Len:" << this->size << endl;
    this->fileContent = nullptr;
    infile.close();
}
ifstream (and ofstream) simply does not support UTF-8 file paths on Windows [1].
However, at least in Visual Studio, they have non-standard constructors (and open() overloads) that accept wchar_t* file paths, so you can convert your UTF-8 to UTF-16 (or, in your case, just use this->inFileAdd.toStdWString().c_str() instead) to open files that use Unicode paths/names.
[1]: Windows 10 has experimental support for UTF-8 as a user locale. Programs can manually opt in to enable UTF-8 in ANSI-based APIs, like CreateFileA(), which ifstream/ofstream are likely to use internally.
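For the Qt code in the question, a minimal sketch of that approach (the helper name open_unicode is mine; it assumes a C++17 library where the std::filesystem::path constructor of ifstream exists, or MSVC, which also accepts std::wstring directly):

#include <filesystem>
#include <fstream>
#include <QString>

// Open a file with a Unicode name via a UTF-16 path, bypassing the ANSI codepage.
std::ifstream open_unicode(const QString& name) {
    return std::ifstream(std::filesystem::path(name.toStdWString()),
                         std::ios::binary | std::ios::ate);
}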

Wide Characters not printing while using Ncurses (C++)

The below code fails to print the wide character:
#include <ncurses.h>

using namespace std;

int main(void) {
    initscr();
    printw("█");
    getch();
    endwin();
}
This code seems to work on some computers and not others, although all the libraries are installed correctly.
(The terminal is capable of displaying extended characters!)
I compiled this using:
g++ -std=c++14 widechartest.cpp -o widechar -lncursesw
Could somebody let me know what the problem is?
Thanks 8)
You didn't initialize the locale. The manual page points this out:
The library uses the locale which the calling program has initialized.
That is normally done with setlocale:
setlocale(LC_ALL, "");
If the locale is not initialized, the library assumes that characters
are printable as in ISO-8859-1, to work with certain legacy programs.
You should initialize the locale and not rely on specific details of
the library when the locale has not been setup.
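Applied to the question's program, the minimal fix (assuming a UTF-8 terminal and linking with -lncursesw, as in the original compile command):

#include <clocale>
#include <ncurses.h>

int main(void) {
    setlocale(LC_ALL, "");   // initialize the locale from the environment
    initscr();
    printw("█");             // the UTF-8 bytes are now interpreted correctly
    getch();
    endwin();
    return 0;
}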

Making code (more) cross platform

I am trying to make my (currently working on Windows) code a bit more platform independent. This small snippet is for opening a text file (for reading/writing).
As is common on modern PCs, many files are stored as Unicode. In this question I am especially wondering what the right way to open such a file is (where the filename may be Unicode).
On Windows, using Microsoft Visual Studio, I used

const char* filename = "c:\\\xD0\x92.txt";  // UTF-8 bytes for Cyrillic "В"

#ifdef _WIN32
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
#else
// non-Windows: wstring is UTF-32, file names are UTF-8 byte strings
std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
#endif
std::wstring wfilename(converter.from_bytes(filename));
std::fstream stream(wfilename);
However, I just now realized this isn't standard C++11 but a Visual Studio (only?) extension. And when, on Windows, I first convert the wide string to a UTF-8 std::string and open the file that way, fstream doesn't recognize the filename.
So what is the cross-platform solution? Should I just expect std::fstream to always accept wide-string input? Or should I expect this only on Windows?
And if POSIX systems don't use wide strings for their paths, what do they use?
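For what it's worth, a sketch of the portable C++17 route (not from the original question): POSIX systems use plain byte strings for paths, UTF-8 by convention, and std::filesystem::path carries the native encoding on each platform, so fstream can take it directly:

#include <filesystem>
#include <fstream>

int main() {
    // u8path interprets its argument as UTF-8 on every platform
    // (deprecated in C++20 in favor of constructing from std::u8string)
    std::filesystem::path p = std::filesystem::u8path("c:\\\xD0\x92.txt");
    std::fstream stream(p, std::ios::in | std::ios::out | std::ios::binary);
    return stream.is_open() ? 0 : 1;
}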

cout<< "привет"; or wcout<< L"привет";

Why does
cout << "привет";
work well while
wcout << L"привет";
does not? (in Qt Creator on Linux)
GCC and Clang default to treating the source file as UTF-8. Your Linux terminal is most probably configured for UTF-8 as well. So with cout << "привет" there is a UTF-8 string which is printed to a UTF-8 terminal; all is well.
wcout << L"привет" depends on a proper locale configuration to convert the wide characters into the terminal's character encoding. The locale needs to be initialized for the conversion to work (the default "classic" aka "C" locale doesn't know how to convert the wide characters). Use std::locale::global(std::locale("")) for the locale to match the environment configuration, or std::locale::global(std::locale("en_US.UTF-8")) to use a specific locale (similar to this C example).
Here's the full source of the working program:
#include <iostream>
#include <locale>
using namespace std;
int main() {
    std::locale::global(std::locale("en_US.UTF-8"));
    wcout << L"привет\n";
}
With g++ test.cc && ./a.out this prints "привет" (on Debian Jessie).
See also this answer about dangers of using wide characters with standard output.