SDL2 loading files with special characters - c++

I have a problem in a Windows application that uses SDL2 and SDL2_image: it opens image files, modifies the image data, and later saves them.
When it opens an image whose name has no special characters (for example "buenos aires.jpg"), it works as intended. But if the name contains a special character such as áéíóúñ (for example "córdoba.jpg"), SDL_image fails with the error "Couldn't open file". However, if I use std::ifstream with the exact file name that I got from the CSV file (such as "córdoba.jpg" or "misiónes.jpg"), the ifstream opens it fine. Is this an error caused by the special characters? Do Unicode or the UTF encodings have something to do with it?
Some information about the environment: Windows 10 (Spanish, Latin American), SDL2 and SDL2_image (up-to-date versions), GCC compiler from MinGW-w64 7.1.0.
About the software I'm trying to make: it reads a CSV file with the names of various states of Argentina (I have already tried changing the encoding of the .CSV file). It loads images based on the names found in the CSV, modifies them, and saves them.
I know I may be missing something basic, but I have already exhausted my resources.

IMG_Load() forwards its file argument directly to SDL_RWFromFile():
// http://hg.libsdl.org/SDL_image/file/8fee51506499/IMG.c#l125
SDL_Surface *IMG_Load(const char *file)
{
    SDL_RWops *src = SDL_RWFromFile(file, "rb");
    const char *ext = SDL_strrchr(file, '.');
    if(ext) {
        ext++;
    }
    if(!src) {
        /* The error message has been set in SDL_RWFromFile */
        return NULL;
    }
    return IMG_LoadTyped_RW(src, 1, ext);
}
And SDL_RWFromFile()'s file argument should be a UTF-8 string:
SDL_RWops* SDL_RWFromFile(const char* file,
                          const char* mode)
Function Parameters:
file: a UTF-8 string representing the filename to open
mode: an ASCII string representing the mode to be used for opening the file; see Remarks for details
So pass UTF-8 paths into IMG_Load().
C++11 has UTF-8 string literal support built in via the u8 prefix (since C++20 such literals have type const char8_t[], so they need a cast to const char* there):
IMG_Load( u8"córdoba.jpg" );
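Since the names in the question come from a CSV file rather than from literals, they would have to be converted at run time. Below is a minimal sketch, assuming the CSV bytes are ISO-8859-1 (which matches Windows-1252 for áéíóúñ, but not for bytes 0x80-0x9F). The helper name latin1_to_utf8 is made up for illustration; on Windows the general case is usually handled with MultiByteToWideChar followed by WideCharToMultiByte with CP_UTF8.

```cpp
#include <string>

// Hypothetical helper: converts ISO-8859-1 bytes to UTF-8.
// Assumption: every input byte is a Latin-1 code point (U+0000..U+00FF).
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);
    for (unsigned char c : in) {
        if (c < 0x80) {
            // ASCII is identical in both encodings.
            out.push_back(static_cast<char>(c));
        } else {
            // U+0080..U+00FF becomes a two-byte UTF-8 sequence.
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

The result could then be passed on as IMG_Load(latin1_to_utf8(name).c_str()).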

Related

Extracting file from zip using wide string file path in C++

How can you read a file from a zip by opening the zip with a wide string file path? I only saw libraries and code examples with std::string or const char * file paths but I suppose they may fail on Windows with non-ASCII characters. I found this but I'm not using gzip.
Attempts
minizip:
const auto zip_file = unzOpen(jar_file_path.string().c_str()); // No wide string support
if (zip_file == nullptr)
{
    throw std::runtime_error("unzOpen() failed");
}
libzippp:
libzippp::ZipArchive zip_archive(jar_file_path.string()); // No wide string support
const auto file_opened_successfully = zip_archive.open(libzippp::ZipArchive::ReadOnly);
if (!file_opened_successfully)
{
    throw std::runtime_error("Failed to open the archive file");
}
Zipper does not seem to support wide strings either. Is there any way it can currently be done?
You might be in luck with minizip. I haven't tested this, but I found the following code in mz_strm_os_win32.c:
int32_t mz_stream_os_open(void *stream, const char *path, int32_t mode) {
    ...
    path_wide = mz_os_unicode_string_create(path, MZ_ENCODING_UTF8);
    if (path_wide == NULL)
        return MZ_PARAM_ERROR;
#ifdef MZ_WINRT_API
    win32->handle = CreateFile2(path_wide, desired_access, share_mode,
        creation_disposition, NULL);
#else
    win32->handle = CreateFileW(path_wide, desired_access, share_mode, NULL,
        creation_disposition, flags_attribs, NULL);
#endif
    mz_os_unicode_string_delete(&path_wide);
    ...
So it looks very much as if the author catered explicitly for Windows' lack of built-in UTF-8 support for the 'narrow string' file IO functions. It's worth a try at least, let's just hope that that function actually gets called when you try to open a zip file.
Regarding the Minizip library: the API function unzOpen() works well with UTF-8 only on Unix systems; on Windows, the path is interpreted only in the current code page. To get full Unicode support, use the newer API functions unzOpen2_64() and zipOpen2_64(), which let you pass a structure with a set of functions for working with the file system. Please see my answer with details in the similar question.

VS2019 compiler misinterprets UTF8 without BOM file as ANSI

I used to compile my C++ wxWidgets-3.1.1 application (Win10x64) with VS2015 Express. I wanted to upgrade my IDE to VS2019 community, which seemed to work quite well.
My project files are partly from older projects, so their encoding differs (Windows-1252, UTF-8 without BOM, ANSI).
With VS2015 I was able to compile and give out messages (hardcoded in my .cpp files), which displayed unicode characters correctly.
The same app compiled with VS2019 Community shows, for example, the German word "übergabe" as "Ã¼bergabe", which is uninterpreted UTF-8.
Saving the .cpp file that contains the Unicode text explicitly as UTF-8 WITH BOM solves this issue. But I don't want to go through all files in all projects. Can I change the expected encoding of a "without BOM" file to UTF-8 to get the same behaviour that VS2015 had?
[EDIT]
It seems there is no such option. As I said before, converting all .cpp/.h files to UTF-8-BOM is a solution.
Thus, so far the only suitable way is to loop through the directory, rewrite the files in UTF-8 while prepending the BOM.
Using C++ wxWidgets, this is (part of) my attempt to automate the process:
//Read in the file, convert its content to UTF8 if necessary
wxFileInputStream fis(fileFullPath);
wxFile file(fileFullPath);
size_t dataSize = file.Length();
void* data = malloc(dataSize);
if (!fis.ReadAll(data, dataSize))
{
    wxString sErr;
    sErr << "Couldn't read file: " << fileFullPath;
    wxLogError(sErr);
}
else
{
    wxString sData((char*)data, dataSize);
    wxString sUTF8Data;
    if (wxEmptyString == wxString::FromUTF8(sData))
    {
        sUTF8Data = sData.ToUTF8();
    }
    else
    {
        sUTF8Data = sData;
    }
    wxFFileOutputStream out(fileFullPath);
    wxBOM bomType = wxConvAuto::DetectBOM(sUTF8Data, sUTF8Data.size());
    if (wxBOM_UTF8 != bomType)
    {
        if (wxBOM_None == bomType)
        {
            unsigned char utf8bom[] = { 0xEF,0xBB,0xBF };
            out.Write((char*)utf8bom, sizeof(utf8bom));
        }
        else
        {
            wxLogError("File already contains a different BOM: " + fileFullPath);
        }
    }
}
Note that this cannot convert all encodings; afaik it can only convert ANSI files or add the BOM to UTF-8 files without one. For all other encodings, I open the project in VS2019, select the file and go (freely translated into English, names might differ):
-> File -> XXX.cpp save as... -> Use the little arrow in the "Save" button -> Save with encoding... -> Replace? Yes! -> "Unicode (UTF-8 with signature) - Codepage 65001"
(Don't take "UTF-8 without signature" which is also Codepage 65001, though!)
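For the narrower case of files that are already UTF-8 and only lack the signature, the BOM step can be sketched with the standard library alone. This is a rough illustration rather than the code from the question: the function name with_utf8_bom is made up, and it assumes the whole file content has already been read into a std::string and really is UTF-8.

```cpp
#include <string>

// Hypothetical helper: returns `bytes` unchanged if it already starts with
// the UTF-8 BOM (EF BB BF), otherwise a copy with the BOM prepended.
// Assumption: `bytes` holds UTF-8 text; no re-encoding is attempted.
std::string with_utf8_bom(const std::string& bytes) {
    static const std::string bom = "\xEF\xBB\xBF";
    if (bytes.compare(0, bom.size(), bom) == 0) {
        return bytes;  // signature already present
    }
    return bom + bytes;
}
```

The returned string would then be written back to the file in binary mode, so that no newline translation changes the content.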
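The /utf-8 compiler option specifies both the source character set and the execution character set as UTF-8, so files without a BOM are read as UTF-8. In Visual Studio it can be added under Project Properties -> C/C++ -> Command Line -> Additional Options.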
Check the Microsoft docs
The C++ team blog that explains the charset problem
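One way to notice a wrong setting at compile time, instead of in the displayed messages, is a small probe in a source file that is saved as UTF-8. This is only a sketch; the names probe and source_decoded_as_utf8 are made up. If the compiler decodes the file as Windows-1252 instead, the wide literal ends up with two characters (Ã and ¼) and the assertion fails.

```cpp
// Probe: the single character U+00FC, typed directly into a UTF-8 source file.
constexpr wchar_t probe[] = L"ü";

// True when the literal holds exactly one character (plus the terminator)
// and that character is U+00FC.
constexpr bool source_decoded_as_utf8() {
    return sizeof(probe) / sizeof(probe[0]) == 2 && probe[0] == 0x00FC;
}

static_assert(source_decoded_as_utf8(),
              "source file was not decoded as UTF-8 (missing /utf-8 or BOM?)");
```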

CStdioFile problems with encoding on read file

I can't read a file correctly using CStdioFile.
I open notepad.exe, type àèìòùáéíóú and save the file twice: once with the encoding set to ANSI (which is really CP-1252) and once as UTF-8.
Then I try to read it from MFC with the following block of code
BOOL ReadAllFileContent(const CString &FilePath, CString *fileContent)
{
    CString sLine;
    BOOL isSuccess = false;
    CStdioFile input;
    isSuccess = input.Open(FilePath, CFile::modeRead);
    if (isSuccess) {
        while (input.ReadString(sLine)) {
            fileContent->Append(sLine);
        }
        input.Close();
    }
    return isSuccess;
}
When I call it with the ANSI file, I get the expected result àèìòùáéíóú
but when I try to read the UTF-8 encoded file I get Ã Ã¨Ã¬Ã²Ã¹Ã¡Ã©Ã­Ã³Ãº
I would like my function to work with all files regardless of the encoding.
What do I need to implement?
.EDIT.
Unfortunately, in the real app the files come from an external app, so changing the file encoding isn't an option. I must be able to read both UTF-8 and CP-1252 files.
Any file is valid ANSI; what Notepad calls ANSI is really the Windows-1252 encoding.
I've figured out a way to read UTF-8 and CP-1252 correctly based on the example provided here. Although it works, I need to pass the file encoding, which I don't know in advance.
Thanks!
I personally use the class as advertised here:
https://www.codeproject.com/Articles/7958/CTextFileDocument
It has excellent support for reading and writing text files of various encodings including unicode in its various flavours.
I have not had a problem with it.
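If the only two candidates are UTF-8 and CP-1252, a common heuristic for not knowing the encoding in advance is to check whether the raw bytes form valid UTF-8 and to fall back to CP-1252 otherwise. Below is a minimal sketch of such a check; the function name is_valid_utf8 is made up, and it assumes the file has been read as raw bytes (for example with CFile rather than CStdioFile). It is a heuristic, not proof: pure ASCII passes, and a short CP-1252 text could in principle look like UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `bytes` is well-formed UTF-8.
bool is_valid_utf8(const std::string& bytes) {
    // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes.
    static const unsigned long min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        std::size_t len;
        unsigned long cp;
        if (b0 < 0x80)                { len = 1; cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
        else return false;              // stray continuation or invalid lead byte
        if (i + len > n) return false;  // sequence cut off at end of input
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(bytes[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min_cp[len]) return false;                // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;    // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                   // beyond Unicode
        i += len;
    }
    return true;
}
```

With the Notepad files from the question, the UTF-8 one should pass (a leading BOM is itself valid UTF-8), while the CP-1252 one fails at the first accented byte.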

Trying to test unicode filename retrieval/conversion to/from UTF8-UTF16

I've looked at a lot of examples for WideCharToMultiByte, etc. This question is more about testing.
I downloaded another language set, Chinese, Simplified China to my machine. Then using the virtual keyboard I created a directory on C:\ with some Chinese characters in the path, and placed a file inside the directory.
I'm trying to see that I get the correct filename by testing _wfopen with my path. I also have the same file in another location for testing:
//setlocale(LC_ALL, "zh-CN");
//setlocale(LC_ALL, "Chinese_China.936");
setlocale(LC_ALL, "");
wchar_t* outfilename = L"C:\\特殊他\\和阿涛和润\\bracket3holes.sat";
//wchar_t* outfilename = L"C:\\heather\\bracket3holes.sat";
wchar_t w[] = L"r";
FILE* foo = _wfopen(outfilename, w);
First I tried without setting locale, then I tried various combinations of setting locale to the language I downloaded (therefore the language of the path).
_wfopen works fine with the C:\heather path, but always returns a NULL pointer with the unicode path.
What am I missing? Any insight would be greatly appreciated. Note my code must be compilable back to vc9.
--- Based on the feedback, I saved the file as UTF-8 with BOM, added const before the wchar_t declarations, and now in the debugger I do see the right string and the file pointer is no longer null.
Thank you for your help. I'm still trying to wrap my head around this all, we're trying to transition from const char* to unicode-friendly.
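A way to make such a test independent of how the source file is saved is to spell the non-ASCII characters with universal character names. A small sketch follows; the variable name dir_name is made up, and only the first directory name from the path in the question is shown.

```cpp
#include <string>

// The first directory name from the question's path, written with universal
// character names so the bytes of the source file no longer matter:
// U+7279, U+6B8A, U+4ED6.
const wchar_t* const dir_name = L"\u7279\u6B8A\u4ED6";

// Hypothetical helper: builds a path below C:\ from the name above.
std::wstring sample_path() {
    return std::wstring(L"C:\\") + dir_name + L"\\bracket3holes.sat";
}
```

The string returned by sample_path() could then be handed to _wfopen via c_str().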

c++ fstreams open file with utf-16 name

At first I built my project on Linux and it was built around streams.
When I started moving to Windows I ran into some problems.
I have a name of the file that I want to open in UTF-16 encoding.
I try to do it using fstream:
QString source; // content of source is shown on image
char *op= (char *) source.data();
fstream stream(op, std::ios::in | std::ios::binary);
But file cannot be opened.
When I check it,
if(!stream.is_open())
{} // I always get that it's not opened. But file indeed exists.
I tried to do it with wfstream, but the result is the same, because wfstream accepts only char * too. As I understand it, this happens because the string, passed as char *, is cut off at the first zero byte, so only one character of the file name is passed and the file is never found. I know that wfstream in Visual Studio can accept a wchar_t * string as the name, but my compiler of choice is MinGW, which doesn't have such a constructor signature for wfstream.
Is there any way to do it with STL streams?
ADDITION
That string can contain not only ASCII symbols; it can contain Russian, German and Chinese symbols simultaneously. I don't want to limit myself to ASCII or a local encoding.
NEXT ADDITION
Also data can be different, not only ASCII, otherwise I wouldn't bother myself with Unicode at all.
Thanks in advance!
Boost.Filesystem, especially its fstream.hpp header, may help.
If you are using MSVC and its implementation of the C++ standard library, something like this should work:
QString source; // content of source is shown on image
std::wstring op = source.toStdWString(); // QString::data() returns QChar*, not wchar_t*
fstream stream(op.c_str(), std::ios::in | std::ios::binary);
This works because the Microsoft c++ implementation has an extension to allow fstream to be opened with a wide character string.
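Alternatively, convert the UTF-16 string using WideCharToMultiByte with CP_ACP before passing the filename to fstream. Note that this only works for characters that exist in the system's ANSI code page; anything else is lost in the conversion.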
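Another option that stays within the standard library is to go through std::filesystem::path, which can be constructed from a UTF-16 string and which the C++17 stream constructors accept directly. This is a sketch under the assumption of a C++17 toolchain whose standard library implements std::filesystem on Windows (older MinGW releases may not). The function name open_for_read is made up; with Qt, the std::u16string could come from QString::toStdU16String().

```cpp
#include <filesystem>
#include <fstream>
#include <string>

// Hypothetical helper: opens `utf16_path` for binary reading.
// std::filesystem::path converts the UTF-16 code units to the platform's
// native path representation before the stream opens the file.
std::ifstream open_for_read(const std::u16string& utf16_path) {
    return std::ifstream(std::filesystem::path(utf16_path),
                         std::ios::in | std::ios::binary);
}
```

The caller checks is_open() on the returned stream exactly as in the question.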