Init std::string with single copy - c++

I have the following code in C++ on Win32. It's simply a C++ wrapper around a Win32 API that returns its result in a WCHAR* buffer:
wstring expandEnvironmentVariables(const wstring & str)
{
DWORD neededSize = ExpandEnvironmentStrings(str.c_str(), nullptr, 0);
vector<WCHAR> expandedStr(neededSize);
if (0 == ExpandEnvironmentStrings(str.c_str(), expandedStr.data(), static_cast<DWORD>(expandedStr.size()))) {
return wstring(str);
}
return wstring(expandedStr.data());
}
What bothers me about this code is the double copy of the result:
1. by the API into a vector of WCHARs;
2. from the vector into std::wstring.
Is there a way to implement this code with just a single copy, and without a major change to the signature of the function?
This is a specific example, but I'm more interested in the general solution and the right way to work with std::wstring/std::string, because this pattern shows itself in many places in the code.

Regarding the C++ side, you can use a wstring directly as the result variable.
To get a pointer to the buffer of a wstring of non-zero size, use &s[0] (since C++17, s.data() also returns a non-const pointer).
Just like std::vector, std::basic_string has a guaranteed contiguous buffer since C++11.
For the return it will probably get Return Value Optimization (RVO), and if not, the string will be moved.
Disclaimer: I haven't checked the documentation of the API functions. I do not know whether this code is correct or even meaningful. I'm just assuming that.

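wstring expandEnvironmentVariables(const wstring & str)
{
    wstring expandedStr;
    // First call: required size in characters, including the terminating null
    DWORD neededSize = ExpandEnvironmentStrings(str.c_str(),
                                                nullptr, 0);
    if (neededSize)
    {
        expandedStr.resize(neededSize);
        DWORD written = ExpandEnvironmentStrings(str.c_str(),
                                                 &expandedStr[0],
                                                 neededSize);
        if (0 == written || written > neededSize)
        {
            // pathological case requires a copy
            expandedStr = str;
        }
        else
        {
            // drop the terminating null that the API counts in its return value
            expandedStr.resize(written - 1);
        }
    }
    // RVO here
    return expandedStr;
}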
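The same size-then-fill pattern works with any C-style API that writes into a caller-supplied buffer. Here is a minimal portable sketch that uses std::snprintf as a stand-in for the Win32 call; formatPair is a hypothetical name, not part of the question's code, and the sketch assumes C++11 or later, where the string buffer is guaranteed contiguous.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: formats "name=value" straight into a std::string,
// using std::snprintf as a portable stand-in for a size-then-fill C API.
std::string formatPair(const std::string& name, int value)
{
    // First call: ask for the length, excluding the terminating null.
    const int needed = std::snprintf(nullptr, 0, "%s=%d", name.c_str(), value);
    if (needed < 0)
        return std::string();

    // Make room for the text plus the null that snprintf always writes.
    std::string result(static_cast<std::size_t>(needed) + 1, '\0');

    // Second call: the API writes into the string's own contiguous buffer.
    const int written = std::snprintf(&result[0], result.size(), "%s=%d",
                                      name.c_str(), value);
    assert(written == needed);

    // Drop the terminating null so size() matches the visible text.
    result.resize(static_cast<std::size_t>(written));
    return result;
}
```

Calling formatPair("x", 42) yields the four-character string x=42, with no trailing null stored inside the string.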
EDIT:
On reflection, since we're using C++, let's go the whole hog and put in proper error detection and report errors with an informative nested exception chain:
DWORD check_not_zero(DWORD retval, const char* context)
{
    if(!retval)
        throw std::system_error(static_cast<int>(GetLastError()),
                                std::system_category(),
                                context);
    return retval;
}
std::wstring expandEnvironmentVariables(const std::wstring & str)
try
{
    // size in characters, including the terminating null
    DWORD neededSize = check_not_zero(ExpandEnvironmentStrings(str.c_str(),
                                                               nullptr,
                                                               0),
                                      "ExpandEnvironmentStrings1");
    std::wstring expandedStr(neededSize, 0);
    DWORD written = check_not_zero(ExpandEnvironmentStrings(str.c_str(),
                                                            &expandedStr[0],
                                                            neededSize),
                                   "ExpandEnvironmentStrings2");
    // drop the terminating null that the API counts in its return value
    expandedStr.resize(written - 1);
    // RVO here
    return expandedStr;
}
catch(...)
{
    std::throw_with_nested(std::runtime_error("expandEnvironmentVariables() failed"));
}
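To read such a chain at the call site, the nested exceptions have to be unwound one level at a time with std::rethrow_if_nested. Here is a minimal portable sketch; describeChain, failingWrapper, runAndDescribe, and selfCheck are hypothetical names, and failingWrapper only simulates a wrapper whose low-level call fails.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical helper: flattens a nested exception chain into one line,
// outermost message first.
std::string describeChain(const std::exception& e)
{
    std::string text = e.what();
    try {
        std::rethrow_if_nested(e);   // does nothing when e carries no nested exception
    } catch (const std::exception& inner) {
        text += ": " + describeChain(inner);
    } catch (...) {
        text += ": unknown error";
    }
    return text;
}

// Hypothetical stand-in for a wrapper whose low-level call fails.
void failingWrapper()
try {
    throw std::runtime_error("low-level call failed");
} catch (...) {
    std::throw_with_nested(std::runtime_error("failingWrapper() failed"));
}

// Runs the wrapper and returns the flattened report.
std::string runAndDescribe()
{
    try {
        failingWrapper();
    } catch (const std::exception& e) {
        return describeChain(e);
    }
    return "no error";
}

// Usage check: the outer context comes first, then the underlying cause.
void selfCheck()
{
    assert(runAndDescribe() == "failingWrapper() failed: low-level call failed");
}
```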

Related

Convert CComPtr<IShellItem2> to LPWSTR*?

I'm using a variable of type CComPtr and I need to modify a LPWSTR* variable. The function I use extracts metadata about file description for executable files. I am not sure about how I should allocate memory for the LPWSTR* and how to change its value to the one of the CComPtr. lpszFileDesc must get the value of description.
BOOL ExeDescription(LPWSTR* lpszFileDesc, LPCWSTR filePath)
{
CComPtr<IShellItem2> item;
HRESULT hr = CoInitialize(nullptr);
*lpszFileDesc = NULL;
BOOL fResult = TRUE;
hr = SHCreateItemFromParsingName(filePath, nullptr, IID_PPV_ARGS(&item));
if (FAILED(hr))
{
fResult = FALSE;
}
else
{
CComPtr<WCHAR> description;
hr = item->GetString(PKEY_FileDescription, &description);
if (FAILED(hr))
{
fResult = FALSE;
}
else
{
if (!description)
{
*lpszFileDesc = PathFindFileNameW(filePath);
}
else
{
// here I want to copy the contents of description
// into lpszFileDesc but I don't know how
}
if (!*lpszFileDesc)
{
fResult = FALSE;
}
}
}
CoUninitialize();
return fResult;
}
Also, when I call this function how do I deallocate the memory for lpszFileDesc after calling the function?
For example if in wmain() I have:
LPWSTR* lpszFileDesc;
ExeDescription(LPWSTR* lpszFileDesc, LPCWSTR filePath);
How do I deallocate the memory if I don't need the file description after that?
Basic Errors
HRESULT hr = CoInitialize(nullptr);
...
CoUninitialize();
COM should be initialized only once at startup of the thread, because it defines the concurrency model of the thread (amongst other things). It's not up to your function to decide how COM will be initialized for the thread. Once COM is initialized for a thread, a later call to CoInitialize[Ex] within that thread either returns S_FALSE (same concurrency model, and it must still be balanced by CoUninitialize) or fails with RPC_E_CHANGED_MODE (different concurrency model), so it gains nothing here. So remove this code and put it into WinMain or the main function of the thread where you are using COM.
CComPtr<WCHAR> description;
Using CComPtr is wrong here, because IShellItem2::GetString() does not return an interface, but a simple C string. Such "raw" memory allocated by COM API must be freed using CoTaskMemFree(), which can be automated by using CComHeapPtr.
Preferred solution - change the interface
how do I deallocate the memory for lpszFileDesc
Do yourself a favor and use std::wstring instead of raw C string pointer to return a string from your function. The std::wstring destructor takes care of deallocation automatically. Manually managing the memory of C strings is too cumbersome and error-prone. When someone else reads your code and sees std::wstring, there will be no question about how the memory is managed.
I suggest changing your interface like this:
BOOL ExeDescription(std::wstring& fileDesc, LPCWSTR filePath);
... and the assignment within the function body becomes:
if (!description)
{
fileDesc = PathFindFileNameW(filePath);
}
else
{
fileDesc = description;
}
CComHeapPtr<WCHAR> has a conversion operator to WCHAR*, that's why the assignment to std::wstring simply works.
Call the function like this:
std::wstring fileDesc;
ExeDescription(fileDesc, filePath);
// No worries about deallocation of fileDesc!
Solution using original interface
That being said, here is a solution using your original interface. You can either use the COM allocator, as IShellItem2::GetString() already uses it (and there will be no copying in the common case), or use a different allocator (then you always have to copy). In both cases, the caller is responsible for calling the matching deallocation function, which you have to document (another reason why I would prefer the std::wstring solution).
Example of using the COM allocator:
BOOL ExeDescription(LPWSTR* lpszFileDesc, LPCWSTR filePath)
{
    // ... other code ...
    // GetString() uses CoTaskMemAlloc() internally
    hr = item->GetString(PKEY_FileDescription, lpszFileDesc);
    // ... other code ...
    if (! *lpszFileDesc )
    {
        LPCWSTR fileName = PathFindFileNameW(filePath);
        // Allocate buffer using the COM allocator and copy fileName to it.
        // The element count includes room for the terminating null.
        std::size_t const count = wcslen(fileName) + 1;
        *lpszFileDesc = static_cast<LPWSTR>(CoTaskMemAlloc(count * sizeof(WCHAR)));
        if(*lpszFileDesc)
            wcscpy_s(*lpszFileDesc, count, fileName);
    }
    // ... more code ...
}
Usage at the caller site:
LPWSTR fileDesc = nullptr;
ExeDescription(&fileDesc, filePath);
// ... use fileDesc ...
CoTaskMemFree(fileDesc);
Simplified usage with CComHeapPtr:
CComHeapPtr<WCHAR> fileDesc;
ExeDescription(&fileDesc, filePath);
// ... use fileDesc ...
// Deallocation happens automatically through CComHeapPtr's destructor
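If ATL is not available, the same ownership rule can be encoded in a type with std::unique_ptr and a custom deleter. The sketch below is portable and only illustrates the idea: malloc/free stand in for CoTaskMemAlloc/CoTaskMemFree, and apiDuplicate, apiFree, ApiFreeDeleter, ApiString, and duplicate are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <memory>

// Stand-in for CoTaskMemFree (assumption: real code would call that instead).
// It counts calls so the behavior can be observed.
static int g_freeCalls = 0;
void apiFree(void* p)
{
    if (p) ++g_freeCalls;
    std::free(p);
}

// Stand-in for an API that returns memory the caller must release with apiFree.
char* apiDuplicate(const char* text)
{
    assert(text != nullptr);
    const std::size_t size = std::strlen(text) + 1;   // include the terminating null
    char* copy = static_cast<char*>(std::malloc(size));
    if (copy) std::memcpy(copy, text, size);
    return copy;
}

// The deleter type records which deallocation function matches the allocator.
struct ApiFreeDeleter
{
    void operator()(char* p) const { apiFree(p); }
};
using ApiString = std::unique_ptr<char, ApiFreeDeleter>;

// Wraps the raw result immediately, so no code path can forget to release it.
ApiString duplicate(const char* text)
{
    return ApiString(apiDuplicate(text));
}
```

With this alias, the matching deallocation function travels with the pointer's type, so the caller does not have to look it up in the documentation.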

Converting a char* returned by C API to C++ string

I found this code in a C++ header-only wrapper around a C API I'm working with:
static string GetString(const char* chString)
{
string strValue;
if (NULL != chString)
{
strValue.swap(string (chString));
releaseMemory((void*&)chString);
chString = NULL;
}
return strValue;
}
I suppose the author is trying to give the string strValue ownership of chString and then free the empty buffer. I suspect this is very wrong (including it being const char*), but it actually seems to work with MSVC 12. At least I haven't seen it crash spectacularly yet.
Assuming that the C API and the C++ library are using the same heap (so that the string can reallocate the buffer if necessary and eventually release it), is there a way to properly achieve this? How about this?
template <typename T> struct Deleter { void operator()(T o) { releaseMemory((void*&)o); } };
static std::string GetString(char* chString)
{
if (NULL == chString)
return std::string();
return std::string(std::unique_ptr<char[], Deleter<char[]>>(chString).get());
}
Again, assuming the C API is using the same heap as std::string.
If that's also very wrong, then is there an immutable, owning C-style string wrapper? Something like string_view but immutable (so const char* input would be ok) and owning (so it deletes the C string, possibly with a custom deleter, in its dtor)?
I suppose the author is trying to give the string strValue ownership of chString and then free the empty buffer.
No. It makes an (inefficient and error-prone) copy of the character data pointed to by chString, then releases the memory pointed to by chString (which will be skipped if the copy throws an exception), and then returns the copy.
Assuming that the C API and the C++ library are using the same heap
That is not a correct assumption, or even a necessary one. The copy can use whatever heap it wants.
is there a way to properly achieve this? How about this?
You are on the right track to use a std::unique_ptr with a custom deleter, but there is no reason to use the T[] array specialization of std::unique_ptr.
The code can be simplified to something more like this:
void Deleter(char* o) { releaseMemory((void*&)o); }
static std::string GetString(char* chString)
{
    std::string strValue;
    if (chString) {
        // The guard must be a named variable: an unnamed temporary would be
        // destroyed, and chString released, before the copy below is made.
        std::unique_ptr<char, decltype(&Deleter)> guard(chString, &Deleter);
        strValue = chString;
    }
    return strValue;
}
The check for chString being null can be made more compact, but it cannot be dropped: std::unique_ptr will not call its deleter with a null pointer, but constructing a std::string from a null char* is undefined behavior:
void Deleter(char* o) { releaseMemory((void*&)o); }
static std::string GetString(char* chString)
{
    std::unique_ptr<char, decltype(&Deleter)> guard(chString, &Deleter);
    return chString ? std::string(chString) : std::string();
}
Does this seem like a good solution for my last question (and ultimate goal of being able to use a char* like a string without copying it)?
template <typename DeleterT = std::default_delete<const char[]>>
class c_str_view
{
public:
    // owns a character array, not a pointer to a pointer
    unique_ptr<const char[], DeleterT> strPtr_;
    size_t len_;
    c_str_view() : len_(0) {}
    c_str_view(const char* charPtr) : strPtr_(charPtr), len_(strlen(charPtr)) {}
    c_str_view(const char* charPtr, size_t len) : strPtr_(charPtr), len_(len) {}
    operator std::string_view () const
    {
        return string_view(strPtr_.get(), len_);
    }
};
If so, is there a good reason this isn't in the upcoming standard since string_view is coming? It only makes sense with string_view of course, since any conversion to std::string would cause a copy and make the whole exercise pointless.
Here's a test:
http://coliru.stacked-crooked.com/a/9046eb22b10a1d87

Will CString's ReleaseBuffer release Shell Allocated CoTaskMemAlloc String?

I believe, from viewing this article, I can safely use CStrings to store the returned string results of certain Windows API functions.
For example, I can do the following (not my code, from the article I linked above):
//GetCurrentDirectory gets LPTSTR
CString strCurDir;
::GetCurrentDirectory(MAX_PATH, strCurDir.GetBuffer(MAX_PATH));
strCurDir.ReleaseBuffer();
GetCurrentDirectory allocates the data in the "regular" way. I know I could also use an STL wstring to do this as well.
Now my question is, can I safely do this?
int main()
{
CString profileRootPath;
HRESULT result = SHGetKnownFolderPath(FOLDERID_Profile, 0, nullptr, (PWSTR*)&profileRootPath);
wcout << profileRootPath.GetString();
profileRootPath.ReleaseBuffer();
Sleep(10000);
return 0;
}
According to SHGetKnownFolderPath's MSDN page, the data output by SHGetKnownFolderPath needs to be de-allocated with a call to CoTaskMemFree. Is the call to ReleaseBuffer invalid because of this? Or will that work properly? Is it not a good idea to use any string class in this case and just use a plain C style array to hold the data, and then use CoTaskMemFree on the array? If the code is invalid, what is the most correct way to do this?
With ATL the code snippet might be as simple as:
CComHeapPtr<WCHAR> pszPath;
HRESULT result = SHGetKnownFolderPath(FOLDERID_Profile, 0, nullptr, (PWSTR*) &pszPath);
CString sPath(pszPath);
wcout << sPath.GetString();
When pszPath goes out of scope, ~CComHeapPtr calls CoTaskMemFree, and the CString constructor takes the value as const WCHAR*.
Without CComHeapPtr you can do it like this:
WCHAR* pszPath = nullptr;
HRESULT result = SHGetKnownFolderPath(FOLDERID_Profile, 0, nullptr, (PWSTR*) &pszPath);
CString sPath(pszPath);
CoTaskMemFree(pszPath);
wcout << sPath.GetString();
GetCurrentDirectory simply writes the string into memory that you provide, so it makes sense to use a stack array, which has no allocation or cleanup cost. If you need a string, you can build it from the stack character array; this eliminates the need for the ReleaseBuffer call:
TCHAR pszPath[MAX_PATH];
GetCurrentDirectory(_countof(pszPath), pszPath);
CString sPath(pszPath);
The answer to my question is simply no, which I figured it would be, since CoTaskMemAlloc is a special way to allocate memory. I'll just stick with the regular way to do things.
int main()
{
    WCHAR* profileRootPath = nullptr;
    HRESULT result = SHGetKnownFolderPath(FOLDERID_Profile, 0, nullptr, &profileRootPath);
    if (SUCCEEDED(result))
        wcout << profileRootPath;
    // safe on failure too: CoTaskMemFree accepts a null pointer
    CoTaskMemFree(profileRootPath);
    Sleep(10000);
    return 0;
}

How to make this code less memory leak prone?

As an introduction, note that I am a Java programmer still getting used to the memory management issues in C++.
We have a base class which is used to encode objects to a string of ASCII characters. Essentially, the class is using a stringstream class member to convert different datatypes to one long string, and then returns a char* to the caller which contains the encoded object data.
In testing for memory leaks, I am seeing that the implementation we are using seems prone to create memory leaks, because the user has to always remember to delete the return value of the method. Below is an excerpt of the relevant parts of the code:
char* Msg::encode()
{
// clear any data from the stringstream
clear();
if (!onEncode()) {
return 0;
}
// need to convert stringstream to char*
string encoded = data.str();
// need to copy the stringstream to a new char* because
// stringstream.str() goes out of scope when method ends
char* encoded_copy = copy(encoded);
return encoded_copy;
}
bool Msg::onEncode(void)
{
encodeNameValue(TAG(MsgTags::TAG_USERID), companyName);
encodeNameValue(TAG(MsgTags::TAG_DATE), date);
return true;
}
bool EZXMsg::encodeNameValue(string& name, int value)
{
if(empty(value))
{
return true;
}
// data is stringstream object
data << name << TAG_VALUE_SEPARATOR << value << TAG_VALUE_PAIRS_DELIMITER;
return true;
}
char* copy(string& source) {
char *a=new char[source.length() +1];
a[source.length()]=0;
memcpy(a,source.c_str(),source.length());
return a;
}
UPDATE
Well - I should have been more accurate about how the result of encode() is consumed. It is passed to boost::asio::async_write, and the program is crashing because, I believe, the string goes out of scope before async_write completes. It seems like I need to copy the returned string to a class member which stays alive for the lifetime of the class which sends the message (?).
This is the way the encode() method is actually used (after I changed its return value to string):
void iserver_client::send(ezx::iserver::EZXMsg& msg) {
string encoded = msg.encode();
size_t bytes = encoded.length();
boost::asio::async_write(socket_, boost::asio::buffer(encoded, bytes), boost::bind(&iserver_client::handle_write, this, boost::asio::placeholders::error, boost::asio::placeholders::bytes_transferred));
}
It looks like the proper way to do this is to maintain a queue/list/vector of the strings to async write. As noted here (and also in the boost chat_client sample). (But that is a separate issue.)
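One way to keep each buffer alive until its write completes is a per-connection queue whose front element is only removed in the completion handler. The sketch below is a Boost-free simulation of that idea: FakeAsyncWriter is a hypothetical stand-in for the socket plus async_write, and Sender plays the role of iserver_client. It assumes a single thread drives both send() and the completion handlers.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for an asynchronous transport: it stores a pointer to
// the caller's bytes and only reads them later, when run() is called.
class FakeAsyncWriter
{
public:
    void asyncWrite(const char* data, std::size_t size, std::function<void()> done)
    {
        pending_.push_back(Op{data, size, std::move(done)});
    }
    // Performs the deferred writes; handlers may queue further writes.
    void run()
    {
        while (!pending_.empty()) {
            Op op = std::move(pending_.front());
            pending_.pop_front();
            sent_.append(op.data, op.size);
            op.done();
        }
    }
    const std::string& sent() const { return sent_; }
private:
    struct Op { const char* data; std::size_t size; std::function<void()> done; };
    std::deque<Op> pending_;
    std::string sent_;
};

// Owns every outgoing message until its completion handler has run.
class Sender
{
public:
    explicit Sender(FakeAsyncWriter& writer) : writer_(writer) {}
    void send(std::string message)
    {
        const bool idle = queue_.empty();
        queue_.push_back(std::move(message));   // deque keeps existing elements in place
        if (idle)
            writeFront();
    }
    std::size_t queued() const { return queue_.size(); }
private:
    void writeFront()
    {
        assert(!queue_.empty());
        const std::string& front = queue_.front();
        writer_.asyncWrite(front.data(), front.size(), [this] {
            queue_.pop_front();                 // the buffer is released only now
            if (!queue_.empty())
                writeFront();
        });
    }
    FakeAsyncWriter& writer_;
    std::deque<std::string> queue_;
};
```

A std::deque is used because push_back does not move the elements already stored, so the pointer handed to the writer stays valid while more messages are queued.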
For this question:
Your copy function returns a pointer to heap memory, so any caller that forgets to delete it creates a memory leak. I think you can drop the copy function and, with the return type of encode changed to string, simply do this:
return data.str();
If you need a char*, use the string's member function c_str(), like this:
string ss("hello world");
const char *p = ss.c_str();
If you use a string object on the stack, you will not create a memory leak.
You could just return a std::string. You have one there anyway:
string Msg::encode()
{
// clear any data from the stringstream
clear();
if (!onEncode()) {
return string{};
}
return data.str();
}
Then the caller would look like:
Msg msg;
msg.userID = 1234;
send(msg.encode().c_str());
The only way of achieving "automatic" deletion is with a stack variable (at some level) going out of scope. In fact, this is in general the only way of guaranteeing deletion even in case of an exception, for example.
As others mentioned std::string works just fine, since the char * is owned by the stack-allocated string, which will delete the char *.
This will not work in general, for example with non char * types.
RAII (Resource Acquisition is Initialization) is a useful idiom for dealing with such issues as memory management, lock acquisition/release, etc.
A good solution would be to use Boost's scoped_array as follows:
{
Msg msg;
msg.userID = 1234;
scoped_array<char> encoded(msg.encode());
send(encoded.get());
// delete[] automatically called on char *
}
scoped_ptr works similarly for non-array types.
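The same idiom covers resources other than memory. As a minimal sketch of the lock case mentioned above, std::lock_guard releases a mutex on every exit path, including an exception; g_mutex, g_counter, incrementOrThrow, mutexIsFree, threw, and demo are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

std::mutex g_mutex;   // hypothetical shared state for the example
int g_counter = 0;

// The lock_guard releases g_mutex on every exit path, including the throw.
void incrementOrThrow(bool fail)
{
    std::lock_guard<std::mutex> lock(g_mutex);
    if (fail)
        throw std::runtime_error("simulated failure");
    ++g_counter;
}

// Returns true if the mutex can be taken, i.e. nobody left it locked.
bool mutexIsFree()
{
    if (!g_mutex.try_lock())
        return false;
    g_mutex.unlock();
    return true;
}

// Calls incrementOrThrow and reports whether it threw.
bool threw(bool fail)
{
    try {
        incrementOrThrow(fail);
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}

// Usage check: the failing call throws, yet the mutex is free afterwards.
void demo()
{
    const int before = g_counter;
    assert(threw(true));
    assert(mutexIsFree());
    assert(!threw(false));
    assert(g_counter == before + 1);
}
```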
FYI: You should have used delete[] encoded to match new char[source.length() +1]
While using a std::string works adequately for your specific problem, the general solution is to return a std::unique_ptr instead of a raw pointer.
std::unique_ptr<char[]> Msg::encode() {
:
return std::unique_ptr<char[]>(encoded_copy);
}
The user will then get a new unique_ptr when they call it:
auto encoded = msg.encode();
send(encoded.get());
and the memory will be freed automatically when encoded goes out of scope and is destroyed.
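For completeness, here is a minimal sketch of how the question's copy() helper could hand out ownership in the same way. copyToOwnedBuffer is a hypothetical name, and the sketch assumes the buffer was meant to hold a null-terminated copy of the string.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical replacement for the question's copy(): same contents, but the
// returned unique_ptr owns the array and calls delete[] when it is destroyed.
std::unique_ptr<char[]> copyToOwnedBuffer(const std::string& source)
{
    std::unique_ptr<char[]> buffer(new char[source.size() + 1]);
    // c_str() is null-terminated, so size() + 1 bytes include the terminator.
    std::memcpy(buffer.get(), source.c_str(), source.size() + 1);
    assert(buffer[source.size()] == '\0');
    return buffer;
}
```

Because the array form unique_ptr<char[]> is used, the matching delete[] is chosen automatically.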

Safely reading string from Lua stack

How is it possible to safely read a string value from the Lua stack? The functions lua_tostring and lua_tolstring can both raise a Lua error (longjmp / exception of a strange type). Therefore the functions should probably be called in protected mode using lua_pcall. But I am not able to find a nice solution for how to do that and get the string value from the Lua stack into C++. Is it really necessary to call lua_tolstring in protected mode using lua_pcall?
Actually using lua_pcall seems bad, because the string I want to read from Lua stack is an error message stored by lua_pcall.
Use lua_type before lua_tostring: If lua_type returns LUA_TSTRING, then you can safely call lua_tostring to get the string and no memory will be allocated.
lua_tostring only allocates memory when it needs to convert a number to a string.
When lua_pcall fails, it returns a nonzero error code and leaves the error message on top of the stack; when it succeeds, it returns zero. So first check the value returned by lua_pcall, then use lua_type to get the type of the value on the stack, and finally use the matching lua_to* function to read the value.
int iRet = lua_pcall(L, 0, 0, 0);
if (iRet)
{
    const char *pErrorMsg = lua_tostring(L, -1); // error message
    // lua_tostring returns NULL if the error object is not a string or number
    cout<<(pErrorMsg ? pErrorMsg : "(non-string error object)")<<endl;
    lua_close(L);
    return 0;
}
int iType = lua_type(L, -1);
switch (iType)
{
//...
case LUA_TSTRING:
{
const char *pValue = lua_tostring(L, -1);
// ...
}
}
That's all.
Good luck.
You can use the lua_isstring function to check if the value can be converted to a string without an error.
Here's how it's done in OpenTibia servers:
std::string LuaState::popString()
{
size_t len;
const char* cstr = lua_tolstring(state, -1, &len);
std::string str(cstr, len);
pop();
return str;
}
Source: https://github.com/opentibia/server/blob/master/src/lua_manager.cpp