I am trying to learn a little c++ and I have a silly question. Consider this code:
TCHAR tempPath[255];
GetTempPath(255, tempPath);
Why does windows need the size of the var tempPath? I see that the GetTempPath is declared something like:
GetTempPath(dword size, buf LPTSTR);
How can windows change the buf value without the & operator? Should not the function be like that?
GetTempPath(buf &LPTSTR);
Can somebody provide a simple GetTempPath implementation sample so I can see how size is used?
EDIT:
Thanks for all your answers; they are all correct and I gave you all +1. But what I meant by "Can somebody provide a simple GetTempPath implementation" is that I have tried to code a function similar to the one Windows uses, as follows:
void MyGetTempPath(int size, char* buf)
{
buf = "C:\\test\\";
}
int main(int argc, char *argv[])
{
char* tempPath = new TCHAR[255];
GetTempPathA(255, tempPath);
MessageBoxA(0, tempPath, "test", MB_OK);
return EXIT_SUCCESS;
}
But it does not work. MessageBox displays a garbage string such as "##$". How should MyGetTempPath be coded to work properly?
Windows needs the size as a safety precaution. It could crash the application if it copies characters past the end of the buffer. When you supply the length, it can prevent that.
An array variable behaves like a pointer when it is passed to a function: it decays to a pointer to its first element. The function can write through that pointer, so there is no need for the & operator.
Not sure what kind of example you are looking for. Like I said, it just needs to verify it doesn't write more characters than there's room for.
An array cannot be passed into functions by-value. Instead, it's converted to a pointer to the first element, and that's passed to the function. Having a (non-const) pointer to data allows modification:
void foo(int* i)
{
if (i) // don't dereference null
*i = 5; // dereference pointer, modify int
}
Likewise, the function now has a pointer to a TCHAR it can write to. It takes the size, then, so it knows exactly how many TCHAR's exist after that initial one. Otherwise it wouldn't know how large the array is.
GetTempPath() outputs into your "tempPath" character array. If you don't tell it how much space there is allocated in the array (255), it has no way of knowing whether or not it will have enough room to write the path string into tempPath.
Character arrays in C/C++ are pretty much just pointers to locations in memory. They don't contain other information about themselves, like instances of C++ or Java classes might. The meat and potatoes of the Windows API was designed before C++ really had much inertia, I think, so you'll often have to use older C style techniques and built-in data types to work with it.
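To make that concrete, here is a minimal sketch using plain char instead of TCHAR so that it builds without the Windows headers. The function names are made up for illustration. Once the array has been passed as a pointer, sizeof only reports the pointer's size; the length survives only if the array is taken by reference or passed separately.

```cpp
#include <cstddef>

// The parameter is a plain pointer, so sizeof reports the size of a pointer,
// not the size of the caller's array.
std::size_t size_seen_by_callee(char* buf) {
    return sizeof(buf);
}

// Taking the array by reference keeps its length as part of the type,
// so the compiler can deduce N.
template <std::size_t N>
std::size_t length_by_reference(char (&)[N]) {
    return N;
}
```

With char tempPath[255], sizeof(tempPath) in the caller is 255, while size_seen_by_callee(tempPath) is just sizeof(char*). That is why an API taking a bare pointer needs the size as a separate argument.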
If you want to avoid passing the size, you can try the following wrapper:
template<typename CHAR_TYPE, unsigned int SIZE>
void MyGetTempPath (CHAR_TYPE (&array)[SIZE]) // 'return' value can be your choice
{
GetTempPath(SIZE, array);
}
Now you can use it like this:
TCHAR tempPath[255];
MyGetTempPath(tempPath); // No need to pass size, it will count automatically
As for your other question, the reason we do not use the following:
GetTempPath(buf &LPTSTR);
is that & in a parameter declaration passes the argument by reference (it is not the address-of operator there). LPTSTR is already a pointer type (a typedef for TCHAR*), so the function can write into the buffer through it.
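This is also why the MyGetTempPath attempt in the question's edit shows garbage. A rough sketch of the difference, with hypothetical function names and plain char so it compiles anywhere: assigning to the pointer parameter only changes the function's local copy, while copying through the pointer changes the caller's buffer.

```cpp
#include <cstddef>
#include <cstring>

// Reassigns only the local copy of the pointer. The caller's buffer is
// never touched, which is what happens in the question's MyGetTempPath.
void assign_local(char* buf) {
    buf = const_cast<char*>("C:\\test\\");
    (void)buf; // silence "set but unused" warnings
}

// Copies the characters through the pointer, so the caller sees them.
// Returns false if the buffer is missing or too small.
bool copy_into(char* buf, std::size_t size) {
    const char* path = "C:\\test\\";
    const std::size_t len = std::strlen(path);
    if (buf == nullptr || size < len + 1) {
        return false;
    }
    std::memcpy(buf, path, len + 1); // len + 1 includes the terminating '\0'
    return true;
}
```

So no reference is needed: the pointer itself is passed by value, but the memory it points to is shared with the caller.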
Can somebody provide a simple GetTempPath implementation sample so I can see how size is used?
First way (based on MAX_PATH constant):
TCHAR szPath[MAX_PATH];
GetTempPath(MAX_PATH, szPath);
Second way (based on GetTempPath description):
DWORD size;
LPTSTR lpszPath;
size = GetTempPath(0, NULL);
lpszPath = new TCHAR[size];
GetTempPath(size, lpszPath);
/* some code here */
delete[] lpszPath;
How can windows change the buf value without the & operator?
The & operator is not needed because the array name converts to a pointer to the first array element (and the address of the whole array has the same value). Try the following code to demonstrate this:
TCHAR sz[1];
if ((void*)sz == (void*)&sz) _tprintf(TEXT("sz equals to &sz \n"));
if ((void*)sz == (void*)&(sz[0])) _tprintf(TEXT("sz equals to &(sz[0]) \n"));
As requested, a very simple implementation.
bool MyGetTempPath(size_t size, char* buf)
{
const char* path = "C:\\test\\";
size_t len = strlen(path);
if(buf == NULL)
return false;
if(size < len + 1)
return false;
strncpy(buf, path, size);
return true;
}
An example call to the new function:
char buffer[256];
bool success = MyGetTempPath(256, buffer);
from http://msdn.microsoft.com/en-us/library/aa364992(v=vs.85).aspx
DWORD WINAPI GetTempPath(
__in DWORD nBufferLength,
__out LPTSTR lpBuffer
);
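Here __out is only an annotation documenting that the function writes to the buffer; it does not turn the parameter into a reference. GetTempPath is effectively declared like
GetTempPath(DWORD nBufferLength, TCHAR* lpBuffer);
which means that lpBuffer is a pointer passed by value, and the function writes the path through that pointer into the caller's buffer.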
I've tried implementing a function like this, but unfortunately it doesn't work:
const wchar_t *GetWC(const char *c)
{
const size_t cSize = strlen(c)+1;
wchar_t wc[cSize];
mbstowcs (wc, c, cSize);
return wc;
}
My main goal here is to be able to integrate normal char strings in a Unicode application. Any advice you guys can offer is greatly appreciated.
In your example, wc is a local variable which will be deallocated when the function call ends. This puts you into undefined behavior territory.
The simple fix is this:
const wchar_t *GetWC(const char *c)
{
const size_t cSize = strlen(c)+1;
wchar_t* wc = new wchar_t[cSize];
mbstowcs (wc, c, cSize);
return wc;
}
Note that the calling code will then have to deallocate this memory, otherwise you will have a memory leak.
Use a std::wstring instead of a C99 variable length array. The current standard guarantees a contiguous buffer for std::basic_string. E.g.,
std::wstring wc( cSize, L'#' );
mbstowcs( &wc[0], c, cSize );
C++ does not support C99 variable length arrays, and so if you compiled your code as pure C++, it would not even compile.
With that change your function return type should also be std::wstring.
Remember to set relevant locale in main.
E.g., setlocale( LC_ALL, "" ).
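Putting those pieces together, a minimal sketch of the whole function might look like the following. The name to_wide is made up here, and the conversion depends on the current C locale, so text beyond plain ASCII needs the setlocale call mentioned above.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// Converts a multibyte string to std::wstring using the current C locale.
// Returns an empty string if the input contains an invalid sequence.
std::wstring to_wide(const char* c) {
    // The wide result never has more characters than the input has bytes,
    // so strlen(c) + 1 elements are always enough.
    const std::size_t cSize = std::strlen(c) + 1;
    std::wstring wc(cSize, L'\0');
    const std::size_t written = std::mbstowcs(&wc[0], c, cSize);
    if (written == static_cast<std::size_t>(-1)) {
        return std::wstring();
    }
    wc.resize(written); // drop the terminator and any unused tail
    return wc;
}
```

Because the result is returned by value, there is nothing for the caller to delete.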
const char* text_char = "example of mbstowcs";
size_t length = strlen(text_char );
Example of usage "mbstowcs"
std::wstring text_wchar(length, L'#');
//#pragma warning (disable : 4996)
// Or add to the preprocessor: _CRT_SECURE_NO_WARNINGS
mbstowcs(&text_wchar[0], text_char , length);
Example of usage "mbstowcs_s"
Microsoft suggests using "mbstowcs_s" instead of "mbstowcs".
Links:
Mbstowcs example
mbstowcs_s, _mbstowcs_s_l
wchar_t text_wchar[30];
mbstowcs_s(&length, text_wchar, text_char, length);
You're returning the address of a local variable allocated on the stack. When your function returns, the storage for all local variables (such as wc) is deallocated and is subject to being immediately overwritten by something else.
To fix this, you can pass the size of the buffer to GetWC, but then you've got pretty much the same interface as mbstowcs itself. Or, you could allocate a new buffer inside GetWC and return a pointer to that, leaving it up to the caller to deallocate the buffer.
Andrew Shepherd's answer works well for me; I added a couple of fixes:
1. Remove the trailing L'\0', because it sometimes causes trouble.
2. Use mbstowcs_s.
std::wstring wtos(std::string& value){
const size_t cSize = value.size() + 1;
std::wstring wc;
wc.resize(cSize);
size_t cSize1;
mbstowcs_s(&cSize1, (wchar_t*)&wc[0], cSize, value.c_str(), cSize);
wc.pop_back();
return wc;
}
The question has several problems, but so do some of the answers. The idea of returning a pointer to allocated memory "and leaving it up to the caller to de-allocate" is asking for trouble. As a rule the best pattern is always to allocate and de-allocate within the same function. For example, something like:
wchar_t* buffer = new wchar_t[get_wcb_size(str) + 1]; // + 1 for the terminator
mbstowcs(buffer, str, get_wcb_size(str) + 1);
...
delete[] buffer;
In general, this requires two functions, one the caller calls to find out how much memory to allocate and a second to initialize or fill the allocated memory.
Unfortunately, the basic idea of using a function to return a "new" object is problematic -- not inherently, but because of the C++ inheritance of C memory handling. Using C++ and STL's strings/wstrings/strstreams is a better solution, but I felt the memory allocation thing needed to be better addressed.
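For what it's worth, here is one way the "ask for the size, then fill" pattern could look without any manual delete. This is only a sketch: wide_length and widen are made-up names, and std::mbsrtowcs with a null destination stands in for the get_wcb_size helper above, since the standard defines that call as counting only.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>
#include <vector>

// First step: how many wide characters are needed, not counting the
// terminator. Returns (size_t)-1 if the input is not valid in this locale.
std::size_t wide_length(const char* str) {
    std::mbstate_t state = std::mbstate_t();
    const char* src = str;
    return std::mbsrtowcs(nullptr, &src, 0, &state);
}

// Second step: allocate, convert, and release inside the same function.
// The vector frees its memory automatically when it goes out of scope.
std::wstring widen(const char* str) {
    const std::size_t len = wide_length(str);
    if (len == static_cast<std::size_t>(-1)) {
        return std::wstring();
    }
    std::vector<wchar_t> buffer(len + 1); // + 1 for the terminator
    std::mbstate_t state = std::mbstate_t();
    const char* src = str;
    std::mbsrtowcs(buffer.data(), &src, buffer.size(), &state);
    return std::wstring(buffer.data(), len);
}
```

The caller only ever sees a std::wstring, so ownership of the temporary buffer never leaves widen.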
Your problem has nothing to do with encodings, it's a simple matter of understanding basic C++. You are returning a pointer to a local variable from your function, which will have gone out of scope by the time anyone can use it, thus creating undefined behaviour (i.e. a programming error).
Follow this Golden Rule: "If you are using naked char pointers, you're Doing It Wrong. (Except for when you aren't.)"
I've previously posted some code to do the conversion and communicating the input and output in C++ std::string and std::wstring objects.
I did something like this. The first two zeros are the code page (0 is CP_ACP, the system default ANSI code page) and the conversion flags. The general idea was to create a temporary char array and pass in the wide char array, and it works. The +1 ensures that the null terminator is converted as well.
char tempFilePath[MAX_PATH] = "I want to convert this to wide chars";
int len = strlen(tempFilePath);
// Converts the path to wide characters
int needed = MultiByteToWideChar(0, 0, tempFilePath, len + 1, strDestPath, len + 1);
auto Ascii_To_Wstring = [](int code)->std::wstring
{
if (code>255 || code<0 )
{
throw std::runtime_error("Incorrect ASCII code");
}
std::string s{ char(code) };
std::wstring w{ s.begin(),s.end() };
return w;
};
I have a Function which returns a LPSTR/const char * and I need to convert it to a std::string. This is how I am doing it.
std::string szStr(foo(1));
It works just fine in all the cases just when foo returns a 32 characters long string it fails. With this approach I get "". So I thought it had to do something with the length. So I changed it a bit.
std::string szStr(foo(1) , 32);
This gives me "0"
Then I tried another tedious method
const char * cstr_a = foo(1);
const char * cstr_b = foo(2);
size_t ln_a = strlen(cstr_a);
size_t ln_b = strlen(cstr_b);
std::string szStr_a( cstr_a , ln_a );
std::string szStr_b( cstr_b , ln_b );
But strangely enough in this method both the pointers are getting the same value, viz foo(1) should return abc and foo(2) should return xyz. But here cstr_a is first getting abc but the moment cstr_b gets xyz, the value of both cstr_a and cstr_b becomes xyz. I am dazed and confused with this.
And yes, I cannot use std::wstring.
What is foo?
foo basically reads a value from the registry and returns it as an LPSTR. One of the values in the registry that I need to read is an MD5-hashed string (32 characters). That's where it fails.
The Actual Foo function:
LPCSTR CRegistryOperation::GetRegValue(HKEY hHeadKey, LPCSTR szPath, LPCSTR szValue)
{
HKEY hKey;
CHAR szBuff[255] = ("");
DWORD dwBufSize = 255;
::RegOpenKeyEx(hHeadKey, (LPCSTR)szPath, 0, KEY_READ, &hKey);
::RegQueryValueEx(hKey, (LPCSTR)szValue, NULL, 0, (LPBYTE)szBuff, &dwBufSize);
::RegCloseKey(hKey);
LPCSTR cstr(szBuff);
return cstr;
}
The Original cast code:
StrResultMap RegValues;
std::string lid(CRegistryOperation::GetRegValue(HKEY_CURRENT_USER, REG_KEY_HKCU_PATH, "LicenseID"));
std::string mid(CRegistryOperation::GetRegValue(HKEY_CURRENT_USER, REG_KEY_HKCU_PATH, "MachineID"), 32);
std::string vtill(CRegistryOperation::GetRegValue(HKEY_CURRENT_USER, REG_KEY_HKCU_PATH, "ValidTill"));
std::string adate(CRegistryOperation::GetRegValue(HKEY_CURRENT_USER, REG_KEY_HKCU_PATH, "ActivateDT"));
std::string lupdate(CRegistryOperation::GetRegValue(HKEY_CURRENT_USER, REG_KEY_HKCU_PATH, "LastUpdate"));
RegValues["license_id"] = lid;
RegValues["machine_id"] = mid;
RegValues["valid_till"] = vtill;
RegValues["activation_date"] = adate;
RegValues["last_updated"] = lupdate;
Kindly help me get over it.
Thanks.
As a complement to Nordic Mainframe's answer, there are 3 common ways to return a buffer from a C or C++ function:
use a static buffer - simple and nice until you have re-entrancy problems (multiple threads or recursion)
pass the buffer as an input parameter, and simply return the number of characters written to it - OK if the size of the buffer is really a constant
malloc the buffer in the function (it is on the heap and not on the stack) and document in flashing red that it must be freed by the caller
But as you tagged your question as C++, you could create the std::string in the function and return it. C++ functions are allowed to return std::string because its special member functions (copy constructor, assignment operator, ...) automatically take care of the allocation problem.
You can avoid returning a pointer to a buffer which has gone out of scope by returning a std::string directly.
std::string CRegistryOperation::GetRegValue(HKEY hHeadKey, LPCSTR szPath, LPCSTR szValue)
{
    HKEY hKey = 0;
    CHAR szBuff[255] = { 0 };
    // Leave room for a terminator: registry strings are not guaranteed to be null-terminated.
    DWORD dwBufSize = sizeof(szBuff) - 1;
    if (::RegOpenKeyEx(hHeadKey, (LPCSTR)szPath, 0, KEY_READ, &hKey) == ERROR_SUCCESS) {
        ::RegQueryValueEx(hKey, (LPCSTR)szValue, NULL, 0, (LPBYTE)szBuff, &dwBufSize);
        ::RegCloseKey(hKey);
    }
    // The characters are copied into the std::string before the local buffer goes away.
    return std::string(szBuff);
}
The GetRegValue function returns a pointer to a buffer in GetRegValue's stack frame. This does not work: after GetRegValue terminates the pointer cstr points to undefined values somewhere in the Stack. Try to make szBuff static and see if that helps.
LPCSTR CRegistryOperation::GetRegValue(HKEY hHeadKey, LPCSTR szPath, LPCSTR szValue)
{
HKEY hKey;
//Here:
static CHAR szBuff[255] = ("");
szBuff[0]=0;
DWORD dwBufSize = 255;
::RegOpenKeyEx(hHeadKey, (LPCSTR)szPath, 0, KEY_READ, &hKey);
::RegQueryValueEx(hKey, (LPCSTR)szValue, NULL, 0, (LPBYTE)szBuff, &dwBufSize);
::RegCloseKey(hKey);
LPCSTR cstr(szBuff);
return cstr;
}
UPDATE: I did not mandate to return std::string, or pass a buffer in, because that would change the interface. Returning a pointer to a static buffer is a common idiom and mostly unproblematic if the lifetime of the returned pointer value is limited to a few scopes (Like for building a std::string from the buffer value).
Multithreading isn't really an issue anymore, because almost every compiler now has some form of thread-local storage support right in the language. __declspec(thread) static CHAR szBuff[255] = (""); for example, should work for Microsoft compilers, __thread for gcc. C++11 even has a new storage class specifier for this (thread_local). You shouldn't call GetRegValue from a signal handler, but that's OK - you can't do much there anyway (for example, you can't allocate memory from the heap!).
UPDATE: Commenters argue, that I should not suggest this, because the pointer to the static buffer will point to invalid data when GetRegValue is called again. While this is obviously true, I think it is wrong to make an argument from that. Why? Look at these examples:
A pointer returned from strdup() is valid until free()
A pointer to something created with new is valid until deleted.
A const char * returned from string::c_str() is valid as long as the string is not modified.
A std::vector iterator is invalid if an element at or before its position is erased from the std::vector.
A std::list iterator is still valid if an element is erased from the std::list, unless it points to the erased element.
A pointer returned from GetRegValue is valid until GetRegValue is called again.
A std::ifstream is valid when... you know, good(), fail() and so on.
There is no point in saying, "look, the thing gets invalid when you are not careful enough", because programming is not about being careless. We are handling objects which have conditions under which they are valid or not, and if an object has well-defined conditions under which it is valid, then we can write programs with well-defined behaviour. Returning a pointer to a static buffer (that is thread-local) has a well-defined meaning, and a developer can use this to write a well-defined program. Unless said developer is negligent or too lazy to read the documentation of the routine, of course.
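A small sketch of that validity rule, with a hypothetical lookup function standing in for GetRegValue: copying the result into a std::string right away is safe, while holding on to the raw pointer across a second call shows the overwrite.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for GetRegValue: formats into a static buffer and
// returns a pointer to it. The pointer is valid until the next call.
const char* lookup(int id) {
    static char buffer[32];
    std::snprintf(buffer, sizeof buffer, "value-%d", id);
    return buffer;
}
```

std::string a(lookup(1)); std::string b(lookup(2)); gives two independent copies, "value-1" and "value-2". Keeping const char* p = lookup(1); and then calling lookup(2) leaves p pointing at "value-2", because both calls return the same buffer.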
I am doing some static analysis work on some old C++ code and my C++ is not the strongest. I have this piece of code:
void NIDP_clDPLogger::log(TCHAR *logString)
{
TCHAR temp_logString[1024] = {0};
_tcsncpy(temp_logString,logString,1024);
temp_logString[1023] = NULL;
...
The static analysis tool is complaining here of indexing logString (the parameter passed in to function) at 1024 when it may be shorter (the size varies, 1024 is the max. size I guess). So I guess my fix is to check the size of logString and use that, like this:
void NIDP_clDPLogger::log(TCHAR *logString)
{
size_t tempSize = sizeof(logString);
TCHAR temp_logString[tempSize] = {0};
_tcsncpy(temp_logString,logString,tempSize);
temp_logString[tempSize-1] = NULL;
I am just wondering, will this work OK? Can anybody see any flaws/problems? Building and testing this project is slightly difficult so I am basically just looking for a sanity check before I go through all that. Or is there a better way for me to do it? Can I pass a size_t value in to _tcsncpy, because a hardcoded int was there before?
Thanks for all help.
sizeof(logString) will return the size of a TCHAR*, not the size of the array passed as arrays decay to pointers when passed as argument.
If it is guaranteed that logString is null terminated you could obtain its length using _tcslen(). Otherwise, the only way to know the size of logString is to pass it into the function as another argument.
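As a rough sketch of the first option, assuming the input really is null-terminated: measure the source, clamp the length to what the fixed-size destination can hold, and terminate explicitly. The helper name is made up, and it uses char and strlen so it builds anywhere; in the TCHAR code above the equivalents would be TCHAR and _tcslen.

```cpp
#include <cstddef>
#include <cstring>

// Copies at most N - 1 characters from src into dst and always terminates
// dst. Returns the number of characters copied. src must be null-terminated.
template <std::size_t N>
std::size_t copy_truncated(char (&dst)[N], const char* src) {
    static_assert(N > 0, "destination must hold at least the terminator");
    std::size_t len = std::strlen(src);
    if (len > N - 1) {
        len = N - 1; // truncate to the space available
    }
    std::memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}
```

The destination keeps its fixed size (1024 in the original code), so no variable-length array is needed.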
In the CString header file (be it Microsoft's or Open Foundation Classes - http://www.koders.com/cpp/fid035C2F57DD64DBF54840B7C00EA7105DFDAA0EBD.aspx#L77 ), there is the following code snippet
struct CStringData
{
long nRefs;
int nDataLength;
int nAllocLength;
TCHAR* data() { return (TCHAR*)(&this[1]); };
...
};
What does the (TCHAR*)(&this[1]) indicate?
The CStringData struct is used in the CString class (http://www.koders.com/cpp/fid100CC41B9D5E1056ED98FA36228968320362C4C1.aspx).
Any help is appreciated.
CString has lots of internal tricks that make it look like a normal string when passed to, e.g., printf functions, despite actually being a class; it does not have to be cast to LPCTSTR in the argument list, even in the case of varargs (...). Thus trying to understand a single individual trick or function in the CString implementation in isolation is bad news. (The data function is an internal function which gets the 'real' buffer associated with the string.)
There's a book, MFC Internals that goes into it, and IIRC the Blaszczak book might touch it.
EDIT: As for what the expression actually translates to in terms of raw C++:-
TCHAR* data() { return (TCHAR*)(&this[1]); };
This says "pretend you're actually the first entry in an array of items allocated together. Now, the second item isn't actually a CStringData, it's a normal NUL-terminated buffer of either Unicode or normal characters - i.e., an LPTSTR".
Another way of expressing the same thing is:
TCHAR* data() { return (TCHAR*)(this + 1); };
When you add 1 to a pointer to T, you actually add 1 * sizeof(T) in terms of a raw memory address. So if one has a CStringData located at 0x00000010 with, for example, sizeof(CStringData) = 12, data will return a pointer to a NUL-terminated character buffer starting at 0x0000001C.
But just understanding this one thing out of context isn't necessarily a good idea.
Why do you need to know?
It returns the memory area that is immediately after the CStringData structure as an array of TCHAR characters.
You can understand why they are doing this if you look at the CString.cpp file:
static const struct {
CStringData data;
TCHAR ch;
} str_empty = {{-1, 0, 0}, 0};
CStringData* pData = (CStringData*)mem_alloc(sizeof(CStringData) + size*sizeof(TCHAR));
They use this trick so that a CString looks like a normal data buffer: when you ask for the data, it skips the CStringData structure and points directly to the real character buffer, like a char*.
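A stripped-down sketch of the same layout trick, not the actual CString code: Header and make_string are hypothetical names, char replaces TCHAR, and std::malloc stands in for mem_alloc. One allocation holds the header followed directly by the characters, and data() simply steps over the header.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

struct Header {
    long nRefs;
    int nDataLength;
    int nAllocLength;
    // Address just past this header, where the characters are stored.
    char* data() { return reinterpret_cast<char*>(this + 1); }
};

// Allocates the header and the character storage as one block, then copies
// the text (including its terminator) into the space after the header.
// The caller releases the block with std::free.
Header* make_string(const char* text) {
    const std::size_t len = std::strlen(text);
    void* mem = std::malloc(sizeof(Header) + len + 1);
    if (mem == nullptr) {
        return nullptr;
    }
    Header* h = static_cast<Header*>(mem);
    h->nRefs = 1;
    h->nDataLength = static_cast<int>(len);
    h->nAllocLength = static_cast<int>(len);
    std::memcpy(h->data(), text, len + 1);
    return h;
}
```

Here h->data() is the same address as reinterpret_cast<char*>(h) + sizeof(Header), which is exactly what (TCHAR*)(&this[1]) computes in the original.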