Safe string copying over DLL boundaries - c++

I am trying to make a DLL that exposes a certain API, and I wanted to implement a safe way to copy strings across the DLL boundary. The DLL implementation is very straightforward: every function that returns a string value takes two arguments, a char* and a size_t&. If the size is big enough, I memcpy the contents of the string from within the DLL to the given pointer, set the size to the actual length and return a success code. If it is not, I set the size to what it should be and return an error code.
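The DLL side of that protocol might look like this minimal sketch. The class `CDevice` and its `GetName` method are hypothetical, and the definition of `CErrorCode` is an assumption that only reuses the names from the template below. The sketch also assumes the reported size excludes the terminating null and that no null is written. Real exports would add `__declspec(dllexport)` or a `.def` file, which are omitted here so the sketch builds anywhere.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Assumed error codes; the names match those used by GetStringValue.
enum class CErrorCode { NoError, BufferTooSmall };

// Hypothetical class living inside the DLL.
class CDevice
{
public:
    explicit CDevice(std::string sName) : m_sName(std::move(sName)) {}

    // pBuffer: caller-owned buffer.
    // nSize:   in = buffer size in chars, out = actual string length.
    // No terminating '\0' is written; the caller tracks the length itself.
    CErrorCode GetName(char* pBuffer, std::size_t& nSize) const
    {
        const std::size_t nRequired = m_sName.size();
        if (nSize < nRequired)
        {
            nSize = nRequired;                 // tell the caller how much is needed
            return CErrorCode::BufferTooSmall;
        }
        if (nRequired != 0)                    // avoid memcpy with a possibly null pointer
            std::memcpy(pBuffer, m_sName.data(), nRequired);
        nSize = nRequired;                     // report the actual length
        return CErrorCode::NoError;
    }

private:
    std::string m_sName;
};
```

A caller that passes a too-small buffer gets `BufferTooSmall` plus the required length, retries with a larger buffer, and then gets `NoError`. This is exactly the loop the template below automates.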
Now for the more complicated part: how to write a nice template function that, given a pointer to one of the DLL's member functions, does all the necessary manipulations to fill in a std::string. This is what I came up with:
template<typename I>
CErrorCode GetStringValue(const I& Instance, CErrorCode(I::*pMethod)(char*, size_t&) const, std::string& sValue)
{
    std::string sTemporaryValue;
    size_t nValueLength = sTemporaryValue.capacity();
    sTemporaryValue.resize(nValueLength);
    do
    {
        // &s[0] is a writable buffer; writing through const_cast<char*>(c_str()) is undefined behaviour
        auto eErrorCode = (Instance.*pMethod)(&sTemporaryValue[0], nValueLength);
        if (eErrorCode == CErrorCode::BufferTooSmall)
        {
            sTemporaryValue.resize(nValueLength);
        }
        else
        {
            if (eErrorCode == CErrorCode::NoError)
            {
                sTemporaryValue.resize(nValueLength);
                sValue = std::move(sTemporaryValue);
            }
            return eErrorCode;
        }
    }
    while (true);
}
I did the initial resize because I can't skip it and resize only after the first call: the string is initially empty, so the first call would have no buffer to write into. But resize fills the string with zero characters, which bothers me a bit, since I am going to overwrite them myself anyway. And I still need a resize after a successful call, because if the actual length is smaller I have to shrink the string. Any suggestions on how this can be done in a nicer way?

Since you are using DLLs, my understanding is that you are on Windows, so I'm going to suggest a Windows-specific solution.
Well, the problem of safely passing strings across module boundaries has already been solved on Windows by the COM memory allocator and BSTR.
So, instead of passing a string pointer and a size pointer, checking whether the caller's buffer is large enough and, if not, returning the required size, it seems much simpler to me to just return BSTRs from the DLL.
BSTRs are allocated using a COM allocator, and so you can allocate them and free them across different module boundaries.
There are also nice C++ wrapper classes around a BSTR, e.g. _bstr_t or ATL's CComBSTR. You can use those on the caller's side.
Once you have a BSTR instance, possibly wrapped in a convenient RAII wrapper on the caller's side, you can easily convert it to a std::wstring.
Or, if you want to use the Unicode UTF-8 encoding and std::string, you can convert from the BSTR's native UTF-16 encoding to UTF-8, using WideCharToMultiByte() (this MSDN Magazine article can come in handy).
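On Windows the conversion itself is a call to `WideCharToMultiByte(CP_UTF8, ...)`. To show what that call does, here is a portable, hand-written stand-in. It is an illustration, not a replacement for the API. `std::u16string` stands in for the BSTR contents. The function name `Utf16ToUtf8` is hypothetical, and the choice to map unpaired surrogates to U+FFFD is this sketch's own policy.

```cpp
#include <cstddef>
#include <string>

// Illustrative stand-in for WideCharToMultiByte(CP_UTF8, ...):
// converts UTF-16 code units (as stored in a BSTR) to UTF-8.
std::string Utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        char32_t cp = in[i];
        // A high surrogate followed by a low surrogate encodes one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF)
        {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        else if (cp >= 0xD800 && cp <= 0xDFFF)
        {
            cp = 0xFFFD; // unpaired surrogate -> replacement character
        }

        if (cp < 0x80)          // 1 byte
        {
            out += static_cast<char>(cp);
        }
        else if (cp < 0x800)    // 2 bytes
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)  // 3 bytes
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else                    // 4 bytes
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

In real Windows code you would still call WideCharToMultiByte twice: once to get the required size, then again to convert into a buffer of that size.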
Bonus reading: Eric’s Complete Guide To BSTR Semantics
P.S.
If you don't want to use the BSTR type, you can still use a common memory allocator provided by COM: e.g. you can allocate your (UTF-8) string memory using CoTaskMemAlloc() in the DLL, and free it using CoTaskMemFree() on the caller's side.

Related

How to safely return objects from DLL calls

I am fairly new to C++ and working with DLLs. I have a main application that aggregates results from different measurements. As the measurements are different from case to case I decided to put them into external DLLs so they can be loaded at runtime (they simply all export the same function). The idea is to just load them like this so the aggregator can be extended depending on the runtime needs:
typedef int (*measure)(measurement &dataHolder);
int callM() {
[...]
measurement dataHolder;
lib = LoadLibraryA("measureDeviceTypeA.dll");
measure measureFunc = (measure)GetProcAddress(lib, "measureFunc");
measureFunc(dataHolder);
[...] // close the lib and load the next one depending on found Devices
}
This works pretty well for simple datatypes (depending on the actual definition of the struct "measurement") such as this:
typedef struct measurement {
DWORD realPBS;
DWORD imaginaryPBS;
int a;
} measurement;
Now there also may be a string of arbitrary length (char representations of results). I would like to put them into the measurement struct as well and fill them inside the actual worker function inside the DLL. My first assumption was that it would be easy to just use std::string, which works sometimes and sometimes not (as it will reallocate memory on std::string().append() and this might break (access violation) depending on the actual runtime environment of the program and the dll). I read here and here that returning a string from a function is a bad idea.
So what would be the "proper" C++ way of returning arbitrary length strings from such a call? Is it helpful at all to pass a struct to the DLL or should I split it into separate calls? I don't want to have pointers dangling around or unfreed memory when I close the DLL again.
This won't work with std::string, as noted by Dani in the comments. The problem is that std::string is a type that belongs to your implementation, and different C++ implementations have different std::strings.
For DLLs specifically (Microsoft), you do have another alternative. COM is an ancient technology, but it still works today and is unlikely to ever go away. It has its own string type, BSTR. Visual Studio provides a helper C++ class, _bstr_t, for your own code, but on the interface you'd use the plain BSTR obtained from _bstr_t::GetBSTR.
BSTR relies on the Windows allocator SysAllocString from OleAut32.dll.
The problem is that the string data is often allocated on the heap, so it has to be freed or managed somehow.
You might think: std::string is returned by value, so why do I need to care about memory management? The problem is that usually only very small strings are stored "inside" the object; for larger strings the string class holds a pointer to some heap storage.
DLLs can be used from and with different programming languages, which is the reason that a DLL and its caller do not share a "memory manager": memory allocated in one module and freed in the other can fail to be released correctly.
To solve this you need two function calls: one that returns a pointer or handle to the data, and one that frees it. Alternatively, the caller can give the callee a pointer to where it wants the data stored; for that you also need a maximum byte count.
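The first approach, one call that hands out the data and a second that frees it, might look like this minimal sketch. The names `MeasureGetResultText` and `MeasureFreeString` and the placeholder text are hypothetical. In a real plugin both functions would be exported from the DLL, for example with `__declspec(dllexport)`, which is omitted here. The point is that allocation and release both happen in code from the same module, so the memory always goes back to the heap it came from.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>

extern "C" {

// Hypothetical export: returns a heap-allocated, null-terminated copy of the
// result text, or nullptr on allocation failure. The caller must release it
// with MeasureFreeString, never with its own free()/delete.
char* MeasureGetResultText()
{
    const std::string sResult = "realPBS=12 imaginaryPBS=-3"; // placeholder data
    char* p = static_cast<char*>(std::malloc(sResult.size() + 1));
    if (p != nullptr)
        std::memcpy(p, sResult.c_str(), sResult.size() + 1); // copies the '\0' too
    return p;
}

// Hypothetical export: releases memory from MeasureGetResultText. Because it
// lives in the same DLL, this free() matches the malloc() above.
void MeasureFreeString(char* p)
{
    std::free(p);
}

} // extern "C"
```

On the application side, the caller copies the text into its own std::string and hands the pointer straight back to MeasureFreeString. After that, nothing is left dangling when the DLL is unloaded.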
As you can see, there are some reasons why you should avoid these APIs - but it is not always possible. See for example the Windows API (there you can find both approaches).
Another approach would be to ensure a shared memory manager, but this is tricky, because it must be set up really early!

C++: GetPrivateProfileStringA - random values?

Let me start off by saying it's been several years (over 8) since I have had to mess with C++. I am very rusty again and it looks like I'll have to start doing some side projects to get the hang of it.
I know this is really simple and I'm sure it's something very small that I am doing incorrectly... I am receiving garbage characters from my returned value and I can't figure out why. Maybe it has to do with my encoding? Not sure.
A simple function to retrieve the value from an INI file in Windows:
LPCTSTR getConfigValue(LPCTSTR key) {
char retval[256];
DWORD dw;
if ((dw = GetPrivateProfileStringA("RLWELF", "DestinationIP", NULL, retval, 256, ".\\rlwelf.ini")) > 0) {
return NULL;
}
OutputDebugStringA(retval);
return (LPCTSTR)retval;
}
Thank you in advance!
return (LPCTSTR)retval;
You are returning the address of a local variable. That local variable is destroyed as soon as the function returns. It is therefore undefined behaviour for the caller to de-reference the returned pointer. You would need to:
Have the caller provide a buffer into which the function can write.
Have the function allocate memory on the heap. The caller would need to deallocate it.
Start using the C++ standard library and return a std::string.
Option 3 is head and shoulders above the others.
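The three options might look like this in a minimal, portable sketch. `LegacyGetValue` is a hypothetical stand-in for a C-style API such as GetPrivateProfileStringA. It copies at most `size - 1` characters, always null-terminates, and returns the number of characters copied, which roughly mirrors how that API reports its result. The `GetValue*` names and the placeholder value are also hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical C-style API: copies the value into buf (truncating if needed),
// null-terminates when size > 0, and returns the number of characters copied.
unsigned LegacyGetValue(char* buf, unsigned size)
{
    const char* value = "192.168.0.10"; // placeholder for the INI value
    if (size == 0)
        return 0;
    std::size_t n = std::strlen(value);
    if (n > size - 1)
        n = size - 1;
    std::memcpy(buf, value, n);
    buf[n] = '\0';
    return static_cast<unsigned>(n);
}

// Option 1: the caller provides the buffer and its size.
unsigned GetValueInto(char* buf, unsigned size)
{
    return LegacyGetValue(buf, size);
}

// Option 2: the function allocates on the heap; the caller must delete[] it.
char* GetValueAllocated()
{
    char* buf = new char[256];
    LegacyGetValue(buf, 256);
    return buf;
}

// Option 3: return std::string. The local buffer still dies at the end of
// the function, but the string owns its own copy of the characters.
std::string GetValue()
{
    char buf[256];
    const unsigned n = LegacyGetValue(buf, sizeof buf);
    return std::string(buf, n);
}
```

Only option 3 leaves the caller with nothing to size or free, which is why it wins.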
Beyond that, the cast is dubious. I suspect that you are compiling with UNICODE defined, which means that LPCTSTR expands to const wchar_t*. That in turn means that your cast would be erroneous: you would be lying to the compiler, and the compiler will always exact its revenge when you lie to it.
You should get out of the habit of using TCHAR. Compile with UNICODE defined and use the platform native UTF-16 character encoding for text.
Finally, GetPrivateProfileString is a really ancient API that is not encouraged to be used nowadays. The documentation says:
This function is provided only for compatibility with 16-bit Windows-based applications.
If you must use INI files, then use a good C++ library. This will result in a code writing experience that is many orders of magnitude cleaner, safer and quicker.
retval is a local variable, it is destroyed at the end of the function, so when you try to return it, the caller receives a pointer to garbage.
Instead, you could malloc a buffer and return it, take an output pointer in argument, or return a unique_ptr to a buffer.
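The `unique_ptr` variant might look like this sketch. `GetConfigValue` and the placeholder value are hypothetical. `std::unique_ptr<char[]>` calls `delete[]` automatically when it goes out of scope, so the caller cannot forget to free the buffer.

```cpp
#include <cstring>
#include <memory>
#include <string>

// Hypothetical replacement for getConfigValue: returns an owning pointer to a
// heap buffer instead of the address of a local array.
std::unique_ptr<char[]> GetConfigValue()
{
    const char* value = "10.0.0.1"; // placeholder for the value read from the INI file
    const std::size_t n = std::strlen(value);
    std::unique_ptr<char[]> buf(new char[n + 1]);
    std::memcpy(buf.get(), value, n + 1); // includes the terminating '\0'
    return buf;                           // ownership moves to the caller
}
```

The caller reads the text through `.get()`, and the buffer is released when the `unique_ptr` is destroyed.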

Deallocating a char* returned from a function

When I create a function that returns a char* or const char*, is it assumed that the calling function must deallocate the returned value when it is finished with it? Otherwise, how would it get deallocated? I'm looking some other code which calls a function that returns a char* and there is not a delete statement in the calling function. Example:
char* foo();
void bar()
{
char* result = foo();
//I should have "delete result" here right?
}
EDIT:
So in my application, here is foo:
LPTSTR GetTempPath(LPCTSTR fileName)
{
LPTSTR tempPath = new TCHAR[500];
GetTempPath(500,tempPath);
printf("Temp Path %ls\n",tempPath);
PathAppend(tempPath, fileName);
_tprintf(_T("New temp path: %s\n"), tempPath);
return tempPath;
}
I wasn't sure how to write this without the new TCHAR.
I'm assuming this has to be deleted? Is there a better way to write it?
If you're using C++ and not C, you should use std::string to get rid of the problem.
There are (at least) two common conventions when a char* is returned by a function. You cannot tell which is in force without reading the documentation of the function.
The function returns a pointer to statically allocated memory. In which case the caller does not need to deallocate it.
The function returns a pointer to heap allocated memory. In which case the caller does need to deallocate it. The documentation for the function must specify how the caller must deallocate the memory (free, delete, etc.).
Now, since you are in charge of writing your own functions, you can choose whatever protocol you like. And in your case you should not return a char* from your functions. Choose a third way. Return a std::string and let the standard library take care of allocation and deallocation. Do it this way to make life easier for the consumer of the library.
In fact, since you are writing C++, you should be shunning char*. Sure you have to use C string when interacting with the Windows API. But leave it at that. Don't pass the pain on to the consumer of your library. Hide that complexity away.
In your situation I would make sure that you have a function that can combine two std::string instances. This could perhaps be implemented using PathAppend, but it's easy enough to roll your own. Then the only interaction you need with the Windows API is a function that returns the temporary directory in a string. That looks like this:
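std::string GetTempDir()
{
    char buff[MAX_PATH+1];
    DWORD count = GetTempPathA(MAX_PATH+1, buff); // ANSI version, to match the char buffer
    if (count == 0 || count > MAX_PATH+1)
        return std::string();                    // failure or buffer too small; error reporting omitted
    return std::string(buff, count);
}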
The code in this function is the only code that needs to deal with C strings. You can now forget all about them in the rest of your code which can treat this as a black box. Don't let the implementation details of the lowest common denominator C interface of Win32 leak into your nicely factored C++ code.
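The "roll your own" combine function mentioned above could be as small as this sketch. `JoinPath` is a hypothetical name. It handles only the separator (exactly one backslash between the parts), and unlike PathAppend it performs no validation or canonicalization.

```cpp
#include <string>

// Hypothetical helper: joins a directory and a file name with exactly one
// separator between them. No validation or canonicalization is done.
std::string JoinPath(const std::string& dir, const std::string& name)
{
    if (dir.empty())
        return name;
    if (name.empty())
        return dir;
    const bool dirHasSep  = dir.back() == '\\' || dir.back() == '/';
    const bool nameHasSep = name.front() == '\\' || name.front() == '/';
    if (dirHasSep && nameHasSep)
        return dir + name.substr(1); // avoid a doubled separator
    if (dirHasSep || nameHasSep)
        return dir + name;
    return dir + '\\' + name;
}
```

GetTempPath documents that its result ends with a backslash. So `JoinPath(GetTempDir(), fileName)` covers the original GetTempPath/PathAppend combination without any C strings leaking out.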
If foo() is in an external dynamic library, the library should provide some explicit way to delete the result, if deletion is required, or some other way to close the working session and so on.
To answer the question you posed ("is it assumed that the calling function must deallocate the returned value when it is finished with it? Otherwise, how would it get deallocated?"):
No, the caller does not necessarily have to deallocate the string. Look at the ctime man page:
char *ctime(const time_t *timep);
The return value points to a statically allocated string which might be overwritten by subsequent calls to any of the date and time functions.
This means you neither delete nor free the string it returns.
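Because `ctime` returns a pointer into a static buffer, the safe pattern is to copy the text before the next date/time call. This sketch shows that pattern. `FormatTime` is a hypothetical name, and the date it produces depends on the local time zone.

```cpp
#include <ctime>
#include <string>

// Copies ctime's statically allocated result into an owned std::string.
// Nothing is freed here: the buffer belongs to the C library.
std::string FormatTime(std::time_t t)
{
    const char* p = std::ctime(&t); // may be nullptr if the time cannot be converted
    return p ? std::string(p) : std::string();
}
```

Each call's result survives later ctime calls, because the std::string owns its own copy.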

CString to char*

We are using the CString class throughout most of our code. However, sometimes we need to convert to a char*. At the moment we have been doing this using variable.GetBuffer(0), and this seems to work (this mainly happens when passing the CString into a function that requires a char*). The function accepts this and we keep going.
However, we have lately become worried about how this works, and whether there is a better way to do it.
The way I understand it, it passes a char pointer into the function that points at the first character in the CString, and all works well.
I guess we are just worried about memory leaks or any unforeseen circumstances where this might not be a good idea.
If your functions only require reading the string and not modifying it, change them to accept const char * instead of char *. The CString will automatically convert for you; this is how most of the MFC functions work, and it's really handy. (Actually MFC uses LPCTSTR, which is a synonym for const TCHAR *; it works for both MBCS and Unicode builds.)
If you need to modify the string, GetBuffer(0) is very dangerous - it won't necessarily allocate enough memory for the resulting string, and you could get some buffer overrun errors.
As has been mentioned by others, you need to use ReleaseBuffer after GetBuffer. You don't need to do that for the conversion to const char *.
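For comparison, here is the same "get a writable buffer, let a C function fill it, then fix up the length" sequence written with `std::string` in portable C++. In this sketch, `resize(64)` plays the role of `GetBuffer(64)`, and the final `resize(std::strlen(...))` plays the role of `ReleaseBuffer()` with its default argument. The functions `FillName` and `GetName` are hypothetical.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical C-style function that writes a null-terminated string into buf.
void FillName(char* buf, std::size_t size)
{
    std::snprintf(buf, size, "sensor-%d", 7);
}

std::string GetName()
{
    std::string s;
    s.resize(64);                     // like GetBuffer(64): room for 63 chars plus '\0'
    FillName(&s[0], s.size());        // the C function writes into the string's own storage
    s.resize(std::strlen(s.c_str())); // like ReleaseBuffer(): trim to the real length
    return s;
}
```

As with CString, forgetting the final length fix-up leaves the object reporting the wrong length.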
To the OP:
>>> I guess we are just worried about memory leaks or any ...
Hi, calling the GetBuffer method won't lead to any memory leaks, because the destructor is going to deallocate the buffer anyway. However, others have already warned you about the potential issues with calling this method.
To Can:
>>> when you call the GetBuffer function it allocates memory for you.
This statement is not completely true. GetBuffer(0) does NOT allocate any memory. It merely returns a pointer to the internal string buffer that can be used to manipulate the string directly from "outside" the CString class.
However, if you pass a number, say N to it like GetBuffer(N), and if N is larger than the current length of the buffer, then the function ensures that the returned buffer is at least as large as N by allocating more memory.
Cheers,
Rajesh.
MVP, Visual C++.
When you call the GetBuffer function, it allocates memory for you.
When you are done with it, you need to call ReleaseBuffer to deallocate it.
Try the documentation at http://msdn.microsoft.com/en-us/library/awkwbzyc.aspx for help on that.

How do you efficiently copy BSTR to wchar_t[]?

I have a BSTR object that I would like to copy to a wchar_t array. The tricky thing is that the length of the BSTR could be anywhere from a few kilobytes to a few hundred kilobytes. Is there an efficient way of copying the data across? I know I could just declare a wchar_t array and always allocate the maximum it would ever need to hold. However, this would mean allocating hundreds of kilobytes for something that might only require a few kilobytes. Any suggestions?
First, you might not actually have to do anything at all, if all you need to do is read the contents. A BSTR type is a pointer to a null-terminated wchar_t array already. In fact, if you check the headers, you will find that BSTR is essentially defined as:
typedef wchar_t* BSTR;
So, the compiler can't distinguish between them, even though they have different semantics.
There are two important caveats.
BSTRs are supposed to be immutable. You should never change the contents of a BSTR after it has been initialized. If you "change it", you have to create a new one, assign the new pointer and release the old one (if you own it).
[UPDATE: this is not true; sorry! You can modify BSTRs in place; I very rarely have had the need.]
BSTRs are allowed to contain embedded null characters, whereas traditional C/C++ strings are not.
If you have a fair amount of control over the source of the BSTR, and can guarantee that the BSTR does not have embedded NULLs, you can read from the BSTR as if it were a wchar_t* and use conventional string functions (wcscpy, etc.) to access it. If not, your life gets harder. You will always have to manipulate your data either as more BSTRs or as a dynamically allocated array of wchar_t. Most string-related functions will not work correctly.
Let's assume you control your data, or don't worry about NULLs. Let's assume also that you really need to make a copy and can't just read the existing BSTR directly. In that case, you can do something like this:
UINT length = SysStringLen(myBstr);         // Ask COM for the size of the BSTR
wchar_t *myString = new wchar_t[length+1];  // Note: SysStringLen doesn't
                                            // include the space needed for the NULL
wcscpy(myString, myBstr);                   // Or your favorite safer string function
// ...
delete[] myString;                          // Done (array form, to match new[])
If you are using class wrappers for your BSTR, the wrapper should have a way to call SysStringLen() for you. For example:
CComBSTR: use .Length();
_bstr_t: use .length();
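When the BSTR may contain embedded nulls, `wcscpy` stops at the first one. Copying by length avoids that, because `std::wstring` has a (pointer, count) constructor that copies exactly `count` characters. In this portable sketch, a plain array and a hard-coded length stand in for a real BSTR and the value `SysStringLen(bstr)` would return. `CopyCounted` is a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// Copies exactly `length` wide characters, embedded nulls included.
// With a real BSTR you would call CopyCounted(bstr, SysStringLen(bstr)).
// A NULL BSTR is treated as an empty string, following the COM convention.
std::wstring CopyCounted(const wchar_t* p, std::size_t length)
{
    return p ? std::wstring(p, length) : std::wstring();
}
```

`w.c_str()` then gives a null-terminated `wchar_t*` whose storage is freed automatically. This replaces the manual `new[]`/`delete[]` pair, and the string is sized exactly to the data rather than to a worst-case maximum.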
UPDATE: This is a good article on the subject by someone far more knowledgeable than me:
"Eric [Lippert]'s Complete Guide To BSTR Semantics"
UPDATE: Replaced strcpy() with wcscpy() in the example.
BSTR objects contain a length prefix, so finding out the length is cheap. Find out the length, allocate a new array big enough to hold the result, process into that, and remember to free it when you're done.
There is never any need for conversion. A BSTR pointer points to the first character of the string and it is null-terminated. The length is stored before the first character in memory. BSTRs are always Unicode (UTF-16/UCS-2). There was at one stage something called an 'ANSI BSTR' - there are some references in legacy APIs - but you can ignore these in current development.
This means you can pass a BSTR safely to any function expecting a wchar_t*.
In Visual Studio 2008 you may get a compiler error, because BSTR is defined as a pointer to unsigned short, while wchar_t is a native type. You can either cast or turn off native wchar_t with /Zc:wchar_t-.
One thing to keep in mind is that BSTR strings can, and often do, contain embedded nulls. A null does not mean the end of the string.
Use ATL and CStringT; then you can just use the assignment operator. Or you can use the USES_CONVERSION macros. These allocate their buffers on the stack (with _alloca), so the memory is released automatically when the function returns and you can be sure that you won't leak it.