Array stores name retrieved from GetVolumeInformation weirdly in Visual C++? - c++

I would like to use the GetVolumeInformation call to retrieve the name of a removable device. I can retrieve the name just fine and store it in a TCHAR array variable, szVolNameBuff. Here is my code for that:
// Get Volume Information to check for NTFS or FAT
TCHAR szFileSys[256];
TCHAR szVolNameBuff[256];
DWORD dwSerial = 0;
DWORD dwMFL = 0;
DWORD dwSysFlags = 0;
bool bSuccess;
char fileType[255];
int bSuccessdebug = 0;
//LPCTSTR temp = _T("E:\\"); For debugging only
bSuccess = GetVolumeInformation(drivePath,
                                szVolNameBuff,
                                sizeof(szVolNameBuff),
                                &dwSerial,
                                &dwMFL,
                                &dwSysFlags,
                                szFileSys,
                                sizeof(szFileSys));
When I try to print the contents of the variable with the line:
printf("szVolNameBuff holds: %s \n", &szVolNameBuff);
I get "T" as output instead of "Transcend", the name of the drive. I debugged it with Visual Studio 2008 and found that the TCHAR array stores the name as:
[0] 'T'
[1] 0
[2] 'R'
[3] 0
[4] 'A'
[5] 0
[6] 'N'
[7] 0
and so on and so forth. Why is that? I want the array to store the word as just:
[0] 'T'
[1] 'R'
[2] 'A'
[3] 'N'
[4] 'S'
to later use it for string concatenation. Is there a way to fix this?

It looks like you are using the Unicode Win32 APIs. You should use _tprintf so that the appropriate function (printf or wprintf) is used according to the character type.
If you don't know Unicode, here's a quick overview. The reason this is happening is that Windows uses UTF-16, in which each plain ASCII character is encoded as its ASCII byte followed by a null byte. That's why you are seeing the string interleaved with nulls.
Note that when using TCHAR, you should also wrap all string literals in the _T() macro, so that they are declared with the correct character type. If you follow this consistently, converting between Unicode and ANSI builds is just a matter of changing a preprocessor directive.
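For example, the print statement from the question could be rewritten like this (a minimal sketch, assuming the szVolNameBuff buffer filled in above):
#include <tchar.h>
#include <stdio.h>
// _tprintf expands to wprintf when _UNICODE is defined and to printf
// otherwise, so %s always matches the width of the TCHAR string.
_tprintf(_T("szVolNameBuff holds: %s \n"), szVolNameBuff);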

The "why" is that you're using the Unicode versions of the Win32 APIs.
And there are two fixes. The first is to use the ANSI versions of the API, which, if you are using Visual Studio, can be done by changing your project's Character Set to 'Not Set' in Project -> Properties -> Configuration Properties -> General -> Character Set, or by making sure UNICODE is not #defined before including windows.h if you aren't using VS.
The second fix, as mdma said, is to use the Unicode text function wprintf, or the %S format specifier in the standard library. This is the preferred fix, as your program then becomes internationalisation-friendly and works whatever character set the file names use. However, it means that all downstream functions need to handle Unicode too, which might be a lot of work, depending on the size of the project.
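Both variants are shown below (a sketch, assuming the Unicode build from the question, where szVolNameBuff is effectively a wchar_t array):
wprintf(L"szVolNameBuff holds: %s \n", szVolNameBuff); // wide printf: %s takes a wide string
printf("szVolNameBuff holds: %S \n", szVolNameBuff);   // narrow printf: %S takes a wide string (MSVC)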

Related

Printing em-dash to console window using printf? [duplicate]

This question already has answers here:
Is it possible to cout an EM DASH on Linux and Windows? [duplicate]
(2 answers)
A simple problem: I'm writing a chatroom program in C++ (but it's primarily C-style) for a class, and I'm trying to print, “#help — display a list of commands...” to the output window. While I could use two hyphens (--) to achieve roughly the same effect, I'd rather use an em-dash (—). printf(), however, doesn't seem to support printing em-dashes. Instead, the console just prints out the character, ù, in its place, despite the fact that entering em-dashes directly into the prompt works fine.
How do I get this simple Unicode character to show up?
Looking at Windows alt key codes, I find it interesting how alt+0151 is "—" and alt+151 is "ù". Is this related to my problem, or a simple coincidence?
Windows is a Unicode (UTF-16) system, and the console is Unicode as well. If you want to print Unicode text, the most effective way is to use WriteConsoleW:
BOOL PrintString(PCWSTR psz)
{
    DWORD n;
    return WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), psz, (ULONG)wcslen(psz), &n, 0);
}
PrintString(L"—");
In this case the wide character — (2 bytes, 0x2014) is stored in your binary file as-is, and the console prints it as-is.
If an ANSI (multi-byte) function such as WriteConsoleA or WriteFile is used for console output instead, the console first translates the multi-byte string to Unicode via MultiByteToWideChar, using the code page returned by GetConsoleOutputCP. That translation is where things can go wrong if you use characters above 0x80.
First of all, the compiler can give you a warning: "The file contains a character that cannot be represented in the current code page (number). Save the file in Unicode format to prevent data loss." (C4819). But even after you save the source file in Unicode format, the following can happen:
wprintf(L"ù"); // no warning
printf("ù"); //warning C4566
because L"ù" saved as wide char string (as is) in binary file - here all ok and no any problems and warning. but "ù" is saved as char string (single byte string). compiler need convert wide string "ù" from source file to multi-byte string in binary (.obj file, from which linker create pe than). and compiler use for this WideCharToMultiByte with CP_ACP (The current system default Windows ANSI code page.)
So what happens if you call printf("ù")?
1. The Unicode string "ù" is converted to multi-byte via WideCharToMultiByte(CP_ACP, ...). This happens at compile time, and the resulting multi-byte string is saved in the binary file.
2. At run time, the console converts your multi-byte string back to wide char via MultiByteToWideChar(GetConsoleOutputCP(), ...) and prints that string.
So you get two conversions: Unicode -> CP_ACP -> multi-byte -> GetConsoleOutputCP() -> Unicode.
By default GetConsoleOutputCP() == CP_OEMCP != CP_ACP, even if you run the program on the computer where you compiled it (and on another computer, with another CP_OEMCP, it is worse still).
The problem is these incompatible conversions: different code pages are used. And even if you change the console code page to your CP_ACP, the conversion can still mistranslate some characters.
With the CRT function wprintf the situation is this:
wprintf first converts the given string from Unicode to multi-byte using its internal current locale (note that the CRT locale is independent of, and different from, the console locale), and then calls WriteFile with the multi-byte string. The console converts this multi-byte string back to Unicode:
Unicode -> current CRT locale -> multi-byte -> GetConsoleOutputCP() -> Unicode
So to use wprintf we first need to set the current CRT locale to GetConsoleOutputCP():
char sz[16];
sprintf(sz, ".%u", GetConsoleOutputCP());
setlocale(LC_ALL, sz);
wprintf(L"—");
But even then, on my machine, a plain hyphen shows up on screen instead of —; if PrintString(L"—") (which uses WriteConsoleW) is called just after it, the output is -—.
So the only reliable way to print any Unicode characters (supported by Windows) is to use the WriteConsoleW API.
After going through the comments, I've found eryksun's solution to be the simplest (...and the most comprehensible):
#include <stdio.h>
#include <io.h>
#include <fcntl.h>

int main()
{
    // other stuff

    // Switch stdout to UTF-16 mode so wide output reaches the console intact.
    _setmode(_fileno(stdout), _O_U16TEXT);
    wprintf(L"#help — display a list of commands...");
    return 0;
}
Portability isn't a concern of mine, and this solves my initial problem—no more ù—my beloved em-dash is on display.
I acknowledge this question is essentially a duplicate of the one linked by sata300.de. Albeit, with printf in the place of cout, and unnecessary ramblings in the place of relevant information.

Understanding Multibyte/Unicode

I'm just getting back into Programming C++, MFC, Unicode. Lots have changed over the past 20 years.
Code from another project compiled just fine, but produced errors when I pasted it into my code. It took me a day and a half of wasted time to solve the function call below:
CString CFileOperation::ChangeFileName(CString sFileName)
{
    char drive[MAX_PATH], dir[MAX_PATH], name[MAX_PATH], ext[MAX_PATH];
    _splitpath_s(sFileName, drive, dir, name, ext); //error
    // ... other code
}
After reading the help, I changed the CString sFileName to use a cast:
_splitpath_s((LPCTSTR)sFileName, drive, dir, name, ext); //error
This created an error too. So then I used GetBuffer() which is really the same as above.
char* s = sFileName.GetBuffer(300);
_splitpath_s(s, drive, dir, name, ext); //same error for the 3rd time
sFileName.ReleaseBuffer();
At this point I was pretty upset, but finally realized that I needed to convert the CString to ASCII (I think because my project is set up as Unicode). Hence:
CT2A strAscii(sFileName); // convert CString to ASCII, for _splitpath_s()
and then use strAscii.m_psz in the call to _splitpath_s().
This finally worked. So after all this, to make a long story short, I need help focusing on:
1. Unicode vs Multi-Byte (library calls)
2. Variables to use
I'm willing to purchase another book, please recommend one.
Also, is there a way to filter the help in VS2015 so that when I'm on a variable and press F1, it only gives me help for Unicode and for ways to convert old code, or Multi-Byte code, to Unicode?
Hope this is not too confusing, but I have some catching up to do. Be patient if my verbiage is not perfect.
Thanks in advance.
The documentation of _splitpath lists a Unicode (wchar_t based) version _wsplitpath. That's the one you should be using. Don't convert to ASCII or Windows ANSI; that will in general lose information and not produce a valid path when you recombine the pieces.
Modern Windows programming is Unicode based.
A Visual Studio C++ project is Unicode-based by default; in particular, it defines the macro symbol UNICODE, which affects the declarations from <windows.h>.
All supported versions of Windows use Unicode internally throughout, and your application should, too. Windows uses UTF-16 encoding.
To make your application Unicode-enabled you need to perform the following steps (a combined sketch follows the list):
Set up your project's Character Set to "Use Unicode Character Set" (if it's currently set to "Use Multi-Byte Character Set"). This is not strictly required, but it deals with those cases where you aren't using the Unicode version explicitly.
Use wchar_t (in place of char or TCHAR) for your strings.
Use wide character string literals (L"..." in place of "...").
Use CStringW (in place of CStringA or CString) in an MFC project.
Explicitly call the Unicode version of the CRT (e.g. wcslen in place of strlen or _tcslen).
Explicitly call the Unicode version of any Windows API call where it exists (e.g. CreateWindowExW in place of CreateWindowExA or CreateWindowEx).
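Putting that advice together, the function from the question might look like this (a sketch, assuming a Unicode build where CString is wchar_t based; _wsplitpath_s is declared in <stdlib.h>):
CString CFileOperation::ChangeFileName(CString sFileName)
{
    wchar_t drive[_MAX_DRIVE], dir[_MAX_DIR], name[_MAX_FNAME], ext[_MAX_EXT];
    // The secure template overloads deduce the array sizes automatically.
    _wsplitpath_s(sFileName.GetString(), drive, dir, name, ext);
    // ... other code
}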
Try using _tsplitpath_s and TCHAR.
So the final code looks something like:
CString CFileOperation::ChangeFileName(CString sFileName)
{
    TCHAR drive[MAX_PATH], dir[MAX_PATH], name[MAX_PATH], ext[MAX_PATH];
    _tsplitpath_s(sFileName, drive, dir, name, ext);
    // ... other code
}
This enables the C++ compiler to use the correct character width at build time, depending on the project settings.

Why does my colon character disappear when I go from char[] to string?

In an old Windows application I'm working on I need to get a path from an environment variable and then append onto it to build a path to a file. So the code looks something like this:
static std::string PathRoot; // Private variable stored in class' header file
char EnvVarValue[1024];
if (! GetEnvironmentVariable(L"ENV_ROOT", (LPWSTR) EnvVarValue, 1024))
{
    cout << "Could not retrieve the ROOT env variable" << endl;
    return;
}
else
{
    PathRoot = EnvVarValue;
}
// Added just for testing purposes - Returning -1
int foundAt = PathRoot.find_first_of(':');
std::string FullFilePath = PathRoot;
FullFilePath.append("\\data\\Config.xml");
The environment value for ENV_ROOT is set to "c:\RootDir" in the Windows System Control Panel. But when I run the program I keep ending up with a string in FullFilePath that is missing the colon char and anything that followed in the root folder. It looks like this: "c\data\Config.xml".
Using the Visual Studio debugger I looked at EnvVarValue after passing the GetEnvironmentVariable line and it shows me an array that seems to have all the characters I'd expect, including the colon. But after it gets assigned to PathRoot, mousing over PathRoot only shows the C and drilling down it says something about a bad ptr. As I noted the find_first_of() call doesn't find the colon char. And when the append is done it only keeps the initial C and drops the rest of the RootDir value.
So there seems to be something about the colon character that is confusing the string constructor. Yes, there are a number of ways I could work around this by leaving the colon out of the env variable and adding it later in the code. But I'd prefer to find a way to have it read and used properly from the environment variable as it is.
You cannot simply cast a char* to a wchar_t* (by casting to LPWSTR) and expect things to work. The two are fundamentally distinct types, and in Windows, they signify different encoding.
You obviously have WinAPI defines set such that GetEnvironmentVariable resolves to GetEnvironmentVariableW, which uses UTF-16 to encode the string. In practice, this means a 0 byte follows every ASCII character in memory.
You then construct a std::string out of this, so it takes the first 0 byte (at char index 1) as the string terminator, so you get just "c".
You have several options:
Use std::wstring and wchar_t EnvVarValue[1024]; (see the sketch after this list)
Call GetEnvironmentVariableA() (which uses char and the ANSI code page)
Use wchar_t EnvVarValue[1024]; and convert the returned value to a std::string using something like wcstombs.
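A sketch of the first option, keeping the question's variable names (LoadRoot is just a hypothetical wrapper):
#include <windows.h>
#include <iostream>
#include <string>

static std::wstring PathRoot; // wide string to match the wide API

void LoadRoot()
{
    wchar_t EnvVarValue[1024];
    if (!GetEnvironmentVariableW(L"ENV_ROOT", EnvVarValue, 1024))
    {
        std::wcout << L"Could not retrieve the ROOT env variable" << std::endl;
        return;
    }
    PathRoot = EnvVarValue; // no interleaved 0 bytes, so nothing is truncated
    std::wstring FullFilePath = PathRoot + L"\\data\\Config.xml";
    // FullFilePath is now c:\RootDir\data\Config.xml, colon included
}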
It seems you are building with wide-character functions (as indicated by your cast to LPWSTR). This means that the string in EnvVarValue is a wide-character string, and you should be using wchar_t and std::wstring instead.
I would guess that the contents of the array after the GetEnvironmentVariable call are actually the byte values 0x43 0x00 0x3a 0x00 0x5c 0x00 etc. (that is the wide-char representation of "C:\"). The first zero acts as the string terminator for a narrow-character string, which is why the narrow-character string PathRoot only contains the 'C'.
The problem might be that EnvVarValue is not a wchar_t array. Try using wchar_t and std::wstring.

How to view the value of a unicode CString in VC6?

I'm using Visual Studio 6 to debug a C++ application. The application is compiled for unicode string support. The CString type is used for manipulating strings. When I am using the debugger, the watch window will display the first character of the string, but will not display the full string. I tried using XDebug, but this tool does not handle unicode strings properly. As a work around, I can create a custom watch for each character of the string by indexing into the private array the CString maintains, but this is very tedious.
How can I view the full, unicode value of a CString in the VC6 debugger?
Go to Tools -> Options -> Debug, and check the "Display unicode string" check-box. That will probably fix the problem. Two other options:
In the watch window, if you have a Unicode string variable named szText, add it to the watch as szText,su. This will tell VS to interpret it as a Unicode string (See Symbols for Watch Variables for more of this sort).
Worst comes to worst, you can have a global ANSI string buffer, and a global function that takes a Unicode CString and stores its content as ANSI in that global variable. Then, when needed, call that function with the string whose content you'd like to see, and watch the ANSI buffer in the watch window.
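That third option might look something like this (a sketch; g_watchBuf and WatchString are hypothetical names):
// Global ANSI buffer that the VC6 watch window can always display.
char g_watchBuf[512];

void WatchString(const CString& s) // call on the string you want to inspect
{
    WideCharToMultiByte(CP_ACP, 0, s, -1, g_watchBuf, sizeof(g_watchBuf), NULL, NULL);
}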
But the "Display unicode string" thing is probably the problem...

How do I store value to string with RegOpenKeyEx?

I need to grab the path from the registry. The following code works except for the last part, where I'm storing the path to the string. Running the debugger in Visual Studio 2008, the char array has the path, but every other character is a zero. This results in the string only being assigned the first letter. I've tried changing char res[1024] to char *res = new char[1024], and this just makes it store the first letter in the char array instead of the string. The rest of the program needs the path as a string datatype, so it can't stay as a char array. What am I missing here?
unsigned long type=REG_SZ, size=1024;
string path;
char res[1024];
HKEY key;
if (RegOpenKeyEx(HKEY_LOCAL_MACHINE, _T("SOFTWARE\\Classes\\dsn\\shell\\open\\command"), NULL, KEY_READ, &key) == ERROR_SUCCESS){
    RegQueryValueEx(key,
                    NULL, // YOUR value
                    NULL,
                    &type,
                    (LPBYTE)res,
                    &size);
    RegCloseKey(key);
    path = string(res);
}
You're getting back a Unicode string, but assigning it to a char-based string.
You could switch path's class to being a 'tstring' or wstring, or use RegQueryValueExA (A for ANSI).
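For example, the wstring route might look like this (a sketch based on the question's code):
#include <windows.h>
#include <string>

wchar_t res[512];
DWORD type = REG_SZ, size = sizeof(res); // RegQueryValueEx takes a byte count
std::wstring path;
HKEY key;
if (RegOpenKeyExW(HKEY_LOCAL_MACHINE, L"SOFTWARE\\Classes\\dsn\\shell\\open\\command", 0, KEY_READ, &key) == ERROR_SUCCESS)
{
    if (RegQueryValueExW(key, NULL, NULL, &type, (LPBYTE)res, &size) == ERROR_SUCCESS)
        path = res; // one character per wstring element, no interleaved zeros
    RegCloseKey(key);
}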
You are compiling in Unicode. Go to Project Settings>Configuration Properties>General and change "Character Set" to "Not Set", and rebuild your project.
RegOpenKey is actually a macro defined in the WINAPI headers. If Unicode is enabled, it resolves to RegOpenKeyW; if not, it resolves to RegOpenKeyA. If you want to continue to compile under Unicode, you can just call RegOpenKeyA directly instead of using the macro.
Otherwise, you'll need to deal with Unicode strings which, if needed, we can help you with also.
For C++, you may prefer to access the Registry using the ATL helper class CRegKey. The method for storing string values is QueryStringValue. There are other (somewhat) typesafe methods for retrieving and setting different registry value types.
It's not the best C++ interface (e.g. no std::string support), but a little smoother than native Win32.
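A sketch of the CRegKey route, reusing the registry path from the question:
#include <atlbase.h>

CRegKey key;
if (key.Open(HKEY_LOCAL_MACHINE, _T("SOFTWARE\\Classes\\dsn\\shell\\open\\command"), KEY_READ) == ERROR_SUCCESS)
{
    TCHAR value[1024];
    ULONG chars = 1024; // in: buffer size in characters; out: characters copied
    if (key.QueryStringValue(NULL, value, &chars) == ERROR_SUCCESS)
    {
        // value now holds the command string, ready for concatenation
    }
}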