fopen file name with UTF8 string in windows - c++

When I use OpenCV's API cvLoadImage(const char *filename, int iscolor), it accepts a const char * as the file name. When the file name contains non-ASCII characters, I tried converting it to a UTF-8 string, but that fails because the fopen() called inside cvLoadImage() interprets the bytes of the name in the system's ANSI code page, not as UTF-8. I could use _wfopen() if I were opening the files myself, but when fopen() is called inside a third-party library, is there any way to handle this problem?
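Use GetShortPathNameW. It returns the old-style (8.3) name of the file, which you should be able to convert to char*, because it should not contain any non-ASCII characters. Note that this only works for files that already exist, and only if 8.3 name generation has not been disabled on the volume.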
I've just tested it with some language specific characters and it worked as I described. I've successfully opened a file from C:\łęłęł\ąóąóą.tsttgbb using fopen.
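A minimal sketch of this approach, assuming the file already exists and that the library's fopen() uses the default ANSI code page. The helper name fopen_compatible_path is hypothetical. On non-Windows builds it simply returns the input, since POSIX fopen() takes the UTF-8 bytes as they are.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: returns a narrow path that a plain fopen() call can open,
// or "" on failure. The file must already exist (8.3 aliases exist only on disk).
std::string fopen_compatible_path(const std::string& utf8_path) {
#ifdef _WIN32
    // 1. UTF-8 -> UTF-16, so no character is lost.
    int wlen = MultiByteToWideChar(CP_UTF8, 0, utf8_path.c_str(), -1, nullptr, 0);
    if (wlen <= 0) return "";
    std::wstring wide(static_cast<size_t>(wlen), L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8_path.c_str(), -1, &wide[0], wlen);

    // 2. Ask for the short (8.3) alias. The first call returns the buffer size
    //    including the terminator; the second returns the length without it.
    DWORD slen = GetShortPathNameW(wide.c_str(), nullptr, 0);
    if (slen == 0) return "";
    std::wstring short_path(slen, L'\0');
    slen = GetShortPathNameW(wide.c_str(), &short_path[0], slen);
    if (slen == 0) return "";
    short_path.resize(slen);

    // 3. Convert to the ANSI code page, which fopen() uses for char* paths by default.
    int alen = WideCharToMultiByte(CP_ACP, 0, short_path.c_str(), -1,
                                   nullptr, 0, nullptr, nullptr);
    if (alen <= 0) return "";
    std::string ansi(static_cast<size_t>(alen), '\0');
    BOOL lossy = FALSE;
    WideCharToMultiByte(CP_ACP, 0, short_path.c_str(), -1, &ansi[0], alen, nullptr, &lossy);
    // If 8.3 names are disabled, the "short" name is the long one and may contain
    // characters the code page cannot represent: report failure instead.
    if (lossy) return "";
    ansi.resize(static_cast<size_t>(alen - 1));  // drop the terminator written by the API
    return ansi;
#else
    // POSIX fopen() passes the bytes through unchanged, so UTF-8 already works.
    return utf8_path;
#endif
}
```

Usage would then look like `cvLoadImage(fopen_compatible_path(utf8_name).c_str(), 1)`, checking for an empty result first.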

setlocale(LC_ALL, ".65001");  // switch the CRT locale to the UTF-8 code page (65001)
fopen(u8"中文路径.txt", "rb");  // tested: Windows 7 (Chinese), VS2017, works
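A slightly more defensive sketch of the same idea, relying on the behavior this answer reports (that a UTF-8 CRT locale makes fopen() decode char* paths as UTF-8). It checks the return value of setlocale, which is NULL when the CRT in use does not accept the locale name. The helper name is hypothetical. It also avoids the u8"" prefix, because from C++20 on a u8 literal has type const char8_t[] and no longer converts to const char*.

```cpp
#include <cassert>
#include <clocale>
#include <cstdio>

// Hypothetical helper: returns true if char* paths passed to fopen() should now
// be treated as UTF-8.
bool enable_utf8_narrow_paths() {
#ifdef _WIN32
    // ".UTF8" selects the UTF-8 code page (65001, the same one as ".65001").
    // setlocale returns NULL if the CRT in use does not support it.
    return std::setlocale(LC_ALL, ".UTF8") != nullptr;
#else
    // On POSIX systems fopen() passes the bytes through, so UTF-8 already works.
    return true;
#endif
}
```

After a successful call, a name such as "\xE4\xB8\xAD\xE6\x96\x87\xE8\xB7\xAF\xE5\xBE\x84.txt" (the UTF-8 bytes of 中文路径.txt) can be passed straight to fopen().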

A quick search came up with nothing but people saying it can't be done. If you can't change cvLoadImage (which is reasonable, you don't want to mess with that), you can try to trick it.
1. You can create a link to the file using the CreateSymbolicLink function. I'm not sure it'll work, though, because the MKLINK command-line utility requires administrative privileges.
2. If you can't create a symbolic link, you can always copy the file to a different location with an ASCII-only name.
3. If you really don't want to copy the file and symlinks don't work, you can create a file proxy: create a named pipe with an ASCII-only name, and translate each read from the pipe into a read from the file.
I would go with options 1 or 2, though; they're a lot simpler.
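Option 2 can be sketched with C++17 std::filesystem, which accepts Unicode paths on Windows. The helpers copy_to_ascii_temp and is_ascii_path are hypothetical names, and copying into the temp directory is an assumption. The temp directory itself can sit under a non-ASCII user name, so the sketch checks the final path rather than only the file name.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// True if every character of the path's native form is 7-bit ASCII.
bool is_ascii_path(const fs::path& p) {
    for (auto c : p.native())  // std::string on POSIX, std::wstring on Windows
        if (static_cast<unsigned long>(c) > 0x7F) return false;
    return true;
}

// Copies `source` (any Unicode name) to `ascii_name` inside the temp directory and
// returns the copy's path, or an empty path if the result would not be ASCII-only.
// Throws fs::filesystem_error if the copy itself fails.
fs::path copy_to_ascii_temp(const fs::path& source, const std::string& ascii_name) {
    const fs::path target = fs::temp_directory_path() / ascii_name;
    if (!is_ascii_path(target)) return {};
    fs::copy_file(source, target, fs::copy_options::overwrite_existing);
    return target;
}
```

The returned path's string() can then be handed to cvLoadImage(), and the copy removed with fs::remove() afterwards.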

Here's a late contribution to this problem. I looked through the source of the runtime library (which Microsoft kindly supplies) and found that I could replace the routine fopen uses to convert an ANSI path to a wide string with the following code. Just link it into your exe and it will replace the routine in the runtime library.
The version listed works for Visual Studio 2017 using the v141_xp toolset. I haven't tested it with other versions, but I imagine some minor changes (such as the name of the routine itself) might be needed. Of course, it won't work if the offending library is a DLL. Make of it what you will.
#include <windows.h>
#include <assert.h>
#include <malloc.h>
#include <crtdbg.h>   // _malloc_dbg and _CRT_BLOCK for debug builds
#ifdef _DEBUG
#define _malloc_crt(s) (_malloc_dbg (s, _CRT_BLOCK, __FILE__, __LINE__))
#else
#define _malloc_crt _malloc_base
#endif
#if _MSC_VER != 1910
#define STRINGIZE_HELPER(x) #x
#define STRINGIZE(x) STRINGIZE_HELPER (x)
__pragma (message (__FILE__ "(" STRINGIZE (__LINE__) ") : Error: Code not tested for this version of Visual Studio"))
#endif
// A hack to make fopen et al accept UTF8 strings (as at Visual Studio 2017), see:
// D:\Program Files (x86)\Windows Kits\10\Source\10.0.10240.0\ucrt\internal\string_utilities.cpp
// D:\Program Files (x86)\Windows Kits\10\Source\10.0.10240.0\ucrt\inc\corecrt_internal_traits.h
extern "C" BOOL __cdecl __acrt_copy_path_to_wide_string (char const* const path, wchar_t** const result)
{
    assert (path);
    assert (result);
    // Compute the required size of the wide character buffer (in characters, including the terminator):
    int length = MultiByteToWideChar (CP_UTF8, 0, path, -1, nullptr, 0);
    assert (length > 0);
    // The allocation size is in bytes, not characters:
    *result = (wchar_t *) _malloc_crt (length * sizeof (wchar_t));
    if (*result == nullptr)
        return FALSE;
    // Do the conversion:
    length = MultiByteToWideChar (CP_UTF8, 0, path, -1, *result, length);
    assert (length);
    return TRUE;
}


Extracting file from zip using wide string file path in C++

How can I read a file from a zip archive when the zip's file path is a wide string? I've only seen libraries and code examples that take std::string or const char * file paths, and I suspect they fail on Windows with non-ASCII characters. I found this, but I'm not using gzip.
const auto zip_file = unzOpen(jar_file_path.string().c_str()); // No wide string support
if (zip_file == nullptr)
    throw std::runtime_error("unzOpen() failed");
libzippp::ZipArchive zip_archive(jar_file_path.string()); // No wide string support
const auto file_opened_successfully =;
if (!file_opened_successfully)
    throw std::runtime_error("Failed to open the archive file");
Zipper does not seem to support wide strings either. Is there any way it can currently be done?
You might be in luck with minizip. I haven't tested this, but I found the following code in mz_strm_os_win32.c:
int32_t mz_stream_os_open(void *stream, const char *path, int32_t mode) {
    ...
    path_wide = mz_os_unicode_string_create(path, MZ_ENCODING_UTF8);
    if (path_wide == NULL)
        ...
    // One of the following two calls is compiled in, depending on the target:
    win32->handle = CreateFile2(path_wide, desired_access, share_mode,
        creation_disposition, NULL);
    win32->handle = CreateFileW(path_wide, desired_access, share_mode, NULL,
        creation_disposition, flags_attribs, NULL);
    ...
}
So it looks very much as if the author catered explicitly for Windows' lack of built-in UTF-8 support for the 'narrow string' file IO functions. It's worth a try at least, let's just hope that that function actually gets called when you try to open a zip file.
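If your build's narrow-path entry points do end up in mz_stream_os_open, the path has to arrive as UTF-8. On Windows, path::string() converts to the ANSI code page and would lose characters, so a small helper (path_to_utf8, a hypothetical name) is needed. Whether unzOpen in your particular minizip build routes through that function is an assumption to verify.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Returns the path as UTF-8 bytes in a std::string, in both C++17 (where
// u8string() returns std::string) and C++20 (where it returns std::u8string).
std::string path_to_utf8(const std::filesystem::path& p) {
    const auto u8 = p.u8string();
    return std::string(u8.begin(), u8.end());
}
```

The call would then become `unzOpen(path_to_utf8(jar_file_path).c_str())` instead of `unzOpen(jar_file_path.string().c_str())`.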
Regarding the Minizip library: the API function unzOpen() handles UTF-8 paths correctly only on Unix systems; on Windows, the path is interpreted only in the current code page. For full Unicode support, you need to use the newer API functions unzOpen2_64() and zipOpen2_64(), which let you pass a structure containing a set of functions for file-system access. Please see my answer in a similar question for details.

Understanding Multibyte/Unicode

I'm just getting back into programming C++, MFC, and Unicode. A lot has changed over the past 20 years.
Code from another project compiled just fine, but produced errors when I pasted it into my code. It took me a day and a half to fix the function call below:
CString CFileOperation::ChangeFileName(CString sFileName)
{
    char drive[MAX_PATH], dir[MAX_PATH], name[MAX_PATH], ext[MAX_PATH];
    _splitpath_s(sFileName, drive, dir, name, ext); //error
    // ... other code
}
After reading help, I changed the CString sFileName to use a cast:
_splitpath_s((LPCTSTR)sFileName, drive, dir, name, ext); //error
This created an error too. So then I used GetBuffer() which is really the same as above.
char* s = sFileName.GetBuffer(300);
_splitpath_s(s, drive, dir, name, ext); //same error for the 3rd time
At this point I was pretty upset, but I finally realized that I needed to convert the CString to ASCII (I think because my project is set up as Unicode).
CT2A strAscii(sFileName); //convert CString to ASCII, for _splitpath_s()
then use strAscii.m_psz in the function _splitpath_s()
This finally worked. So, to make a long story short, I need help focusing on:
1. Unicode vs. Multi-Byte (library calls)
2. Which variable types to use
I'm willing to purchase another book; please recommend one.
Also, is there a way to filter the help in VS2015 so that when I'm on a variable and press F1, it only shows help for Unicode, and for ways to convert old code to Unicode or from Multi-Byte to Unicode?
Hope this is not too confusing, but I have some catching up to do. Please be patient if my wording is not perfect.
Thanks in advance.
The documentation of _splitpath lists a Unicode (wchar_t-based) version, _wsplitpath (and _wsplitpath_s for _splitpath_s). That's the one you should be using. Don't convert to ASCII or Windows ANSI: that will in general lose information and not produce a valid path when you recombine the pieces.
Modern Windows programming is Unicode based.
A Visual Studio C++ project is Unicode-based by default; in particular, it defines the macro symbols UNICODE and _UNICODE, which affect the declarations from <windows.h> and <tchar.h> respectively.
All supported versions of Windows use Unicode internally throughout, and your application should, too. Windows uses UTF-16 encoding.
To make your application Unicode-enabled, perform the following steps:
1. Set your project's Character Set to "Use Unicode Character Set" (if it's currently set to "Use Multi-Byte Character Set"). This is not strictly required, but it covers the cases where you don't call the Unicode version explicitly.
2. Use wchar_t (in place of char or TCHAR) for your strings.
3. Use wide character string literals (L"..." in place of "...").
4. Use CStringW (in place of CStringA or CString) in an MFC project.
5. Explicitly call the Unicode version of the CRT function (e.g. wcslen in place of strlen or _tcslen).
6. Explicitly call the Unicode version of any Windows API call where it exists (e.g. CreateWindowExW in place of CreateWindowExA or CreateWindowEx).
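If you'd rather not use the CRT split functions at all, C++17 std::filesystem offers a wide-character alternative that also never narrows the path. This is a swapped-in technique, not _wsplitpath_s itself, and PathParts and split_path are hypothetical names. Unlike _wsplitpath_s, the directory part has no trailing separator and, on Windows, includes the drive.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical result type: roughly the pieces _wsplitpath_s would produce.
struct PathParts {
    std::wstring dir;   // parent directory (drive included on Windows)
    std::wstring name;  // file name without extension
    std::wstring ext;   // extension including the leading dot
};

PathParts split_path(const std::wstring& full_path) {
    const std::filesystem::path p(full_path);
    return { p.parent_path().wstring(), p.stem().wstring(), p.extension().wstring() };
}
```

In an MFC project this could be called as `split_path(sFileName.GetString())` with a CStringW.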
Try using _tsplitpath_s and TCHAR.
So the final code looks something like:
CString CFileOperation::ChangeFileName(CString sFileName)
{
    TCHAR drive[MAX_PATH], dir[MAX_PATH], name[MAX_PATH], ext[MAX_PATH];
    _tsplitpath_s(sFileName, drive, dir, name, ext);
    // ... other code
}
This lets the C++ compiler pick the correct character width at build time, depending on the project's character-set setting.

'CreateDirectoryW' : cannot convert parameter 1 from 'const char *' to 'LPCWSTR' in OpenCV 2.4.5 and VS 2010

I was trying to build the sample code bagofwords_classification.cpp from OpenCV 2.4.5 in Visual Studio 2010 (VC++), but I get this error:
error C2664: 'CreateDirectoryW' : cannot convert parameter 1 from 'const char *' to 'LPCWSTR'
Can you help me find a solution to this problem? Thanks. :)
Update v1:
static void makeDir( const string& dir )
{
#if defined WIN32 || defined _WIN32
    CreateDirectory( dir.c_str(), 0 );
#else
    mkdir( dir.c_str(), S_IRWXU | S_IRWXG | S_IROTH | S_IXOTH );
#endif
}
static void makeUsedDirs( const string& rootPath )
{
    makeDir(rootPath + bowImageDescriptorsDir);
    makeDir(rootPath + svmsDir);
    makeDir(rootPath + plotsDir);
}
You have code that calls CreateDirectory. When UNICODE is defined, that symbol is actually a macro for CreateDirectoryW; the intention is for you to use "ambiguous" function names when you're also using TCHAR instead of char or wchar_t, so you can switch between compiling for Unicode or Ansi programs.
However, std::string doesn't change according to UNICODE; it's always Ansi, so its c_str method always returns a char*, never wchar_t*. When you have a parameter that's always Ansi, you should explicitly call functions that are always Ansi, too. In this case, call CreateDirectoryA.
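A minimal sketch of the sample's makeDir with that fix applied, assuming you keep std::string paths. Returning a bool and treating "already exists" as success are my additions, not the sample's behavior; note that this also reports success if a plain file with that name exists.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#if defined WIN32 || defined _WIN32
#include <windows.h>
#else
#include <sys/stat.h>
#include <sys/types.h>
#endif

// Returns true if `dir` was created or already existed.
static bool makeDir(const std::string& dir) {
#if defined WIN32 || defined _WIN32
    // std::string is always narrow, so call the ANSI entry point explicitly;
    // this compiles the same whether or not UNICODE is defined.
    return CreateDirectoryA(dir.c_str(), nullptr) != 0
        || GetLastError() == ERROR_ALREADY_EXISTS;
#else
    return mkdir(dir.c_str(), S_IRWXU | S_IRWXG | S_IROTH | S_IXOTH) == 0
        || errno == EEXIST;
#endif
}
```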
You could also consider using std::basic_string<TCHAR>, but that's probably heading in a direction you don't wish to go.
A quick fix would be to adjust your project settings so that UNICODE is no longer defined. Consult the documentation for your tool set to find out how to do that, or explore your IDE's project options.
CreateDirectory will be defined as CreateDirectoryW which expects its parameters to be "wide" strings (UTF-16 encoded WCHAR*).
To create a wide string you can prepend L to a regular string.
CreateDirectory(L"mydir", NULL);
Alternatively, you can switch your project to multibyte encoding in the properties. This will mean that calling CreateDirectory will automatically use the CreateDirectoryA version of the function which accepts char* strings. These are expected to be in the multibyte encoding of the active codepage.
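CreateDirectoryW accepts wide-character strings, so it is fine if your strings are wide (Unicode mode). With a narrow std::string, you should call CreateDirectoryA explicitly (plain CreateDirectory only works for that if UNICODE is not defined).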

