CString to char* - c++

We are using the CString class throughout most of our code. However sometimes we need to convert to a char *. at the moment we have been doing this using variable.GetBuffer(0) and this seems to work ( this mainly happens when passing the Csting into a function where the function requires a char *). The function accepts this and we keep going.
However we have lately become worried about how this works, and whether there is a better way to do it.
The way i understand it to work is it passes a char pointer into the function that points at the first character in the CString and all works well.
I Guess we are just worried about memory leaks or any unforseen circumstances where this might not be a good idea.

If your functions only require reading the string and not modifying it, change them to accept const char * instead of char *. The CString will automatically convert for you, this is how most of the MFC functions work and it's really handy. (Actually MFC uses LPCTSTR, which is a synonym for const TCHAR * - works for both MBC and Unicode builds).
If you need to modify the string, GetBuffer(0) is very dangerous - it won't necessarily allocate enough memory for the resulting string, and you could get some buffer overrun errors.
As has been mentioned by others, you need to use ReleaseBuffer after GetBuffer. You don't need to do that for the conversion to const char *.

# the OP:
>>> I Guess we are just worried about memory leaks or any ...
Hi, calling the GetBuffer method won't lead to any memory leaks. Because the destructor is going to deallocate the buffer anyway. However, others have already warned you about the potential issues with calling this method.
#Can >>> when you call the getbuffer function it allocates memory for you.
This statement is not completely true. GetBuffer(0) does NOT allocate any memory. It merely returns a pointer to the internal string buffer that can be used to manipulate the string directly from "outside" the CString class.
However, if you pass a number, say N to it like GetBuffer(N), and if N is larger than the current length of the buffer, then the function ensures that the returned buffer is at least as large as N by allocating more memory.
Cheers,
Rajesh.
MVP, Visual ++.

when you call the getbuffer function it allocates memory for you.
when you have done with it, you need to call releasebuffer to deallocate it

try the documentation at http://msdn.microsoft.com/en-us/library/awkwbzyc.aspx for help on that.

Related

C++. Do type casted arrays need to be memory managed?

The code looks like this:
std::string str = (char *) xmlTextReaderReadString(*xml);
Memory management for str works fine. str makes a deep copy of the character array.
What is being made when typecasting is what I am worried about. xmlTextReaderReadString is returning a typedefed unsigned char * or how they defined it an (xmlChar*).
I couldn't find a viable question on stack overflow, but if you did, just link it and I'll delete this post.
So forth with an example like this:
xmlTextReaderGetAttribute (*xml, (xmlChar*)"type"))
Do I need to memory manage the typecasted "type"
Yes, you need to call xmlFree() on the returned value from xmlTextReaderReadString, after building your std::string.
I never used that library, but a google search turned up the answer in seconds.
It also has nothing to do with the cast. You got a value, you can use it, but you need to free it afterward. If you use the value, in the meantime to build a std::string, the std::string will manage its own copy, you still need to manage yours.
The second API you use is also labelled
The string must be deallocated by the caller
but that applies to the returned value, not to "type". This one is a literal that should not be freed.
http://www.mit.edu/afs.new/sipb/user/yoz/libxml2-2.7.3/doc/html/libxml-xmlreader.html

C++ strings - How to avoid obtaining invalid pointer?

In our C++ code, we have our own string class (for legacy reasons). It supports a method c_str() much like std::string. What I noticed is that many developers are using it incorrectly. I have reduced the problem to the following line:
const char* x = std::string("abc").c_str();
This seemingly innocent code is quite dangerous in the sense that the destructor on std::string gets invoked immediately after the call to c_str(). As a result, you are holding a pointer to a de-allocated memory location.
Here is another example:
std::string x("abc");
const char* y = x.substr(0,1).c_str();
Here too, we are using a pointer to de-allocated location.
These problems are not easy to find during testing as the memory location still contains valid data (although the memory location itself is invalid).
I am wondering if you have any suggestions on how I can modify class/method definition such that developers can never make such a mistake.
The modern part of the code should not deal with raw pointers like that.
Call c_str only when providing an argument to a legacy function that takes const char*. Like:
legacy_print(x.substr(0,1).c_str())
Why would you want to create a local variable of type const char*? Even if you write a copying version c_str_copy() you will just get more headache because now the client code is responsible for deleting the resulting pointer.
And if you need to keep the data around for a longer time (e.g. because you want to pass the data to multiple legacy functions) then just keep the data wrapped in a string instance the whole time.
For the basic case, you can add a ref qualifier on the "this" object, to make sure that .c_str() is never immediately called on a temporary. Of course, this can't stop them from storing in a variable that leaves scope before the pointer does.
const char *c_str() & { return ...; }
But the bigger-picture solution is to replace all functions from taking a "const char *" in your codebase with functions that take one of your string classes (at the very least, you need two: an owning string and a borrowed slice) - and make sure that none of your string class does cannot be implicitly constructed from a "const char *".
The simplest solution would be to change your destructor to write a null at the beginning of the string at destruction time. (Alternatively, fill the entire string with an error message or 0's; you can have a flag to disable this for release code.)
While it doesn't directly prevent programmers from making the mistake of using invalid pointers, it will definitely draw attention to the problem when the code doesn't do what it should do. This should help you flush out the problem in your code.
(As you mentioned, at the moment the errors go unnoticed because for the most part the code will happily run with the invalid memory.)
Consider using Valgrind or Electric Fence to test your code. Either of these tools should trivially and immediately find these errors.
I am not sure that there is much you can do about people using your library incorrectly if you warn them about it. Consider the actual stl string library. If i do this:
const char * lala = std::string("lala").c_str();
std::cout << lala << std::endl;
const char * lala2 = std::string("lalb").c_str();
std::cout << lala << std::endl;
std::cout << lala2 << std::endl;
I am basically creating undefined behavior. In the case where i run it on ideone.com i get the following output:
lala
lalb
lalb
So clearly the memory of the original lala has been overwritten. I would just make it very clear to the user in the documentation that this sort of coding is bad practice.
You could remove the c_str() function and instead provide a function that accepts a reference to an already created empty smart pointer that resets the value of the smart pointer to a new copy of the string. This would force the user to create a non temporary object which they could then use to get the raw c string and it would be destructed and free the memory when exiting the method scope.
This assumes though that your library and its users would be sharing the same heap.
EDIT
Even better, create your own smart pointer class for this purpose whose destructor calls a library function in your library to free the memory so it can be used across DLL boundaries.

Can I do a zero-copy std::string allocation in C++ from a const char * array?

Profiling of my application reveals that it is spending nearly 5% of CPU time in string allocation. In many, many places I am making C++ std::string objects from a 64MB char buffer. The thing is, the buffer never changes during the running of the program. My analysis of std::string(const char *buf,size_t buflen) calls is that that the string is being copied because the buffer might change after the string is made. That isn't the problem here. Is there a way around this problem?
EDIT: I am working with binary data, so I can't just pass around char *s. Besides, then I would have a substantial overhead from always scanning for the NULL, which the std::string avoids.
If the string isn't going to change and if its lifetime is guaranteed to be longer than you are going to use the string, then don't use std::string.
Instead, consider a simple C string wrapper, like the proposed string_ref<T>.
Binary data? Stop using std::string and use std::vector<char>. But that won't fix your issue of it being copied. From your description, if this huge 64MB buffer will never change, you truly shouldn't be using std::string or std::vector<char>, either one isn't a good idea. You really ought to be passing around a const char* pointer (const uint8_t* would be more descriptive of binary data but under the covers it's the same thing, neglecting sign issues). Pass around both the pointer and a size_t length of it, or pass the pointer with another 'end' pointer. If you don't like passing around separate discrete variables (a pointer and the buffer’s length), make a struct to describe the buffer & have everyone use those instead:
struct binbuf_desc {
uint8_t* addr;
size_t len;
binbuf_desc(addr,len) : addr(addr), len(len) {}
}
You can always refer to your 64MB buffer (or any other buffer of any size) by using binbuf_desc objects. Note that binbuf_desc objects don’t own the buffer (or a copy of it), they’re just a descriptor of it, so you can just pass those around everywhere without having to worry about binbuf_desc’s making unnecessary copies of the buffer.
There is no portable solution. If you tell us what toolchain you're using, someone might know a trick specific to your library implementation. But for the most part, the std::string destructor (and assignment operator) is going to free the string content, and you can't free a string literal. (It's not impossible to have exceptions to this, and in fact the small string optimization is a common case that skips deallocation, but these are implementation details.)
A better approach is to not use std::string when you don't need/want dynamic allocation. const char* still works just fine in modern C++.
Since C++17, std::string_view may be your way. It can be initialized both from a bare C string (with or without a length), or a std::string
There is no constraint that the data() method returns a zero-terminated string though.
If you need this "zero-terminated on request" behaviour, there are alternatives such as str_view from Adam Sawicki that looks satisfying (https://github.com/sawickiap/str_view)
Seems that using const char * instead of std::string is the best way to go for you. But you should also consider how you are using strings. It may be possible that there could be going on implicit conversion from char pointers to std::string objects. This could happen during function calls, for example.

How to detect if memory is dynamic or static, from a callee perspective?

Note: when I say "static string" here I mean memory that can not be handled by realloc.
Hi, I have written a procedure that takes a char * argument and I would like to create a duplicate IF the memory is not relocatable/resizable via realloc. As is, the procedure is a 'heavy' string processor, so being ignorant and duplicating the string whether or not it is static will surely cause some memory overhead/processing issues in the future.
I have tried to use exception handlers to modify a static string, the application just exits without any notice. I step back, look at C and say: "I'm not impressed." That would be an exception if I have ever heard of one.
I tried to use exception handlers to call realloc on a static variable... Glib reports that it can't find some private information to a structure (I'm sure) I don't know about and obviously calls abort on the program which means its not an exception that can be caught with longjmp/setjmp OR C++ try, catch finally.
I'm pretty sure there must be a way to do this reasonably. For instance dynamic memory most likely is not located anywhere near the static memory so if there is a way to divulge this information from the address... we might just have a bingo..
I'm not sure if there are any macros in the C/C++ Preprocessors that can identify the source and type of a macro argument, but it would be pretty dumb if it didn't. Macro Assemblers are pretty smart about things like that. Judging by the lack of robust error handling, I would not be a bit surprised if it did not.
C does not provide a portable way to tell statically allocated memory blocks from dynamically allocated ones. You can build your own struct with a string pointer and a flag indicating the type of memory occupied by the object. In C++ you can make it a class with two different constructors, one per memory type, to make your life easier.
As far as aborting your program goes, trying to free or re-allocate memory that has not been allocated dynamically is undefined behavior, so aborting is a fair game.
You may be able to detect ranges of memory and do some pointer comparisons. I've done this in some garbage collection code, where I need to know whether a pointer is in the stack, heap, or elsewhere.
If you control all allocation, you can simply keep min and max bounds based on every dynamic pointer that ever came out of malloc, calloc or realloc. A pointer lower than min or greater than max is probably not in the heap, and this min and max delimited region is unlikely to intersect with any static area, ever. If you know that a pointer is either static or it came from malloc, and that pointer is outside of the "bounding box" of malloced storage, then it must be static.
There are some "museum" machines where that sort of stuff doesn't work and the C standard doesn't give a meaning to comparisons of pointers to different objects using the relational operators, other than exact equality or inequality.
Any solution you would get would be platform specific, so you might want to specify the platform you are running on.
As for why a library should call abort when you pass it unexpected parameters, that tends to be safer than continuing execution. It's more annoying, certainly, but at that point the library knows that the code calling into it is in an state that cannot be recovered from.
I have written a procedure that takes a char * argument and I would like to create a duplicate IF the memory is not relocatable/resizable via realloc.
Fundamentally, the problem is that you want to do memory management based on information that isn't available in the scope you're operating in. Obviously you know if the string is on the stack or heap when you create it, but that information is lost by the time you're inside your function. Trying to fix that is going to be nearly impossible and definitely outside of the Standard.
I have tried to use exception handlers to modify a static string, the application just exits without any notice. I step back, look at C and say: "I'm not impressed." That would be an exception if I have ever heard of one.
As already mentioned, C doesn't have exceptions. C++ could do this, but the C++ Standards Committee believes that having C functions behave differently in C++ would be a nightmare.
I'm pretty sure there must be a way to do this reasonably.
You could have your application replace the default stack with one you created (and, as such, know the range of addresses in) using ucontext.h or Windows Fibers, and check if the address is inside the that range. However, (1) this puts a huge burden on any application using your library (of course, if you wrote the only application using your library, then you may be willing to accept that burden); and (2) doesn't detect memory that can't be realloced for other reasons (allocated using static, allocated using a custom allocator, allocated using SysAlloc or HeapAlloc on Windows, allocated using new in C++, etc.).
Instead, I would recommend having your function take a function pointer that would point at a function used to reallocate the memory. If the function pointer is NULL, then you duplicate the memory. Otherwise, you call the function.
original poster here. I neglected to mention that I have a working solution to the problem, it is not as robust as I would have hoped for. Please do not be upset, I appreciate everyone participating in this Request For Comments and Answers. The 'procedure' in question is variadic in nature and expects no more than 63 anonymous char * arguments.
What it is: a multiple string concatenator. It can handle many arguments but I advise the developer against passing more than 20 or so. The developer never calls the procedure directly. Instead a macro known as 'the procedure name' passes the arguments along with a trailing null pointer, so I know when I have met the end of statistics gathering.
If the function recieves only two arguments, I create a copy of the first argument and return that pointer. This is the string literal case. But really all it is doing is masking strdup
Failing the single valid argument test, we proceed to realloc and memcpy, using record info from a static database of 64 records containing each pointer and its strlen, each time adding the size of the memcopy to a secondary pointer (memcpy destination) that began as a copy of the return value from realloc.
I've written a second macro with an appendage of 'd' to indicate that the first argument is not dynamic, therefore a dynamic argument is required, and that macro uses the following code to inject a dynamic argument into the actual procedure call as the first argument:
strdup("")
It is a valid memory block that can be reallocated. Its strlen returns 0 so when the loop adds the size of it to the records, it affects nothing. The null terminator will be overwritten by memcpy. It works pretty damned well I should say. However being new to C in only the past few weeks, I didn't understand that you can't 'fool proof' this stuff. People follow directions or wind up in DLL hell I suppose.
The code works great without all of these extra shenanigans do-hickies and whistles, but without a way to reciprocate a single block of memory, the procedure is lost on loop processing, because of all the dynamic pointer mgmt. involved. Therefore the first argument must always be dynamic. I read somehwere someone had suggested using a c-static variable holding the pointer in the function, but then you can't use the procedure to do other things in other functions, such as would be needed in a recursive descent parser that decided to compile strings as it went along.
If you would like to see the code just ask!
Happy Coding!
mkstr.cpp
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>
struct mkstr_record {
size_t size;
void *location;
};
// use the mkstr macro (in mkstr.h) to call this procedure.
// The first argument to mkstr MUST BE dynamically allocated. i.e.: by malloc(),
// or strdup(), unless that argument is the sole argument to mkstr. Calling mkstr()
// with a single argument is functionally equivalent to calling strdup() on the same
// address.
char *mkstr_(char *source, ...) {
va_list args;
size_t length = 0, item = 0;
mkstr_record list[64]; /*
maximum of 64 input vectors. this goes beyond reason!
the result of this procedure is a string that CAN be
concatenated by THIS procedure, or further more reallocated!
We could probably count the arguments and initialize properly,
but this function shouldn't be used to concatenate more than 20
vectors per call. Unless you are just "asking for it".
In any case, develop a workaround. Thank yourself later.
*/// Argument Range Will Not Be Validated. Caller Beware!!!
va_start(args, source);
char *thisArg = source;
while (thisArg) {
// don't validate list bounds here.
// an if statement here is too costly for
// for the meager benefit it can provide.
length += list[item].size = strlen(thisArg);
list[item].location = thisArg;
thisArg = va_arg(args, char *);
item++;
}
va_end(args);
if (item == 1) return strdup(source); // single argument: fail-safe
length++; // final zero terminator index.
char *str = (char *) realloc(source, length);
if (!str) return str; // don't care. memory error. check your work.
thisArg = (str + list[0].size);
size_t count = item;
for (item = 1; item < count; item++) {
memcpy(thisArg, list[item].location, list[item].size);
thisArg += list[item].size;
}
*(thisArg) = '\0'; // terminate the string.
return str;
}
mkstr.h
#ifndef MKSTR_H_
#define MKSTR_H_
extern char *mkstr_(char *string, ...);
// This macro ensures that the final argument to "mkstr" is null.
// arguments: const char *, ...
// limitation: 63 variable arguments max.
// stipulation: caller must free returned pointer.
#define mkstr(str, args...) mkstr_(str, ##args, NULL)
#define mkstrd(str, args...) mkstr_(strdup(str), ##args, NULL)
/* calling mkstr with more than 64 arguments should produce a segmentation fault
* this is not a bug. it is intentional operation. The price of saving an in loop
* error check comes at the cost of writing code that looks good and works great.
*
* If you need a babysitter, find a new function [period]
*/
#endif /* MKSTR_H_ */
Don't for get to mention me in the credits. She's fine and dandy.

How do you efficiently copy BSTR to wchar_t[]?

I have a BSTR object that I would like to convert to copy to a wchar__t object. The tricky thing is the length of the BSTR object could be anywhere from a few kilobytes to a few hundred kilobytes. Is there an efficient way of copying the data across? I know I could just declare a wchar_t array and alway allocate the maximum possible data it would ever need to hold. However, this would mean allocating hundreds of kilobytes of data for something that potentially might only require a few kilobytes. Any suggestions?
First, you might not actually have to do anything at all, if all you need to do is read the contents. A BSTR type is a pointer to a null-terminated wchar_t array already. In fact, if you check the headers, you will find that BSTR is essentially defined as:
typedef BSTR wchar_t*;
So, the compiler can't distinguish between them, even though they have different semantics.
There is are two important caveat.
BSTRs are supposed to be immutable. You should never change the contents of a BSTR after it has been initialized. If you "change it", you have to create a new one assign the new pointer and release the old one (if you own it).
[UPDATE: this is not true; sorry! You can modify BSTRs in place; I very rarely have had the need.]
BSTRs are allowed to contain embedded null characters, whereas traditional C/C++ strings are not.
If you have a fair amount of control of the source of the BSTR, and can guarantee that the BSTR does not have embedded NULLs, you can read from the BSTR as if it was a wchar_t and use conventional string methods (wcscpy, etc) to access it. If not, your life gets harder. You will have to always manipulate your data as either more BSTRs, or as a dynamically-allocated array of wchar_t. Most string-related functions will not work correctly.
Let's assume you control your data, or don't worry about NULLs. Let's assume also that you really need to make a copy and can't just read the existing BSTR directly. In that case, you can do something like this:
UINT length = SysStringLen(myBstr); // Ask COM for the size of the BSTR
wchar_t *myString = new wchar_t[length+1]; // Note: SysStringLen doesn't
// include the space needed for the NULL
wcscpy(myString, myBstr); // Or your favorite safer string function
// ...
delete myString; // Done
If you are using class wrappers for your BSTR, the wrapper should have a way to call SysStringLen() for you. For example:
CComBString use .Length();
_bstr_t use .length();
UPDATE: This is a good article on the subject by someone far more knowledgeable than me:
"Eric [Lippert]'s Complete Guide To BSTR Semantics"
UPDATE: Replaced strcpy() with wcscpy() in the example.
BSTR objects contain a length prefix, so finding out the length is cheap. Find out the length, allocate a new array big enough to hold the result, process into that, and remember to free it when you're done.
There is never any need for conversion. A BSTR pointer points to the first character of the string and it is null-terminated. The length is stored before the first character in memory. BSTRs are always Unicode (UTF-16/UCS-2). There was at one stage something called an 'ANSI BSTR' - there are some references in legacy APIs - but you can ignore these in current development.
This means you can pass a BSTR safely to any function expecting a wchar_t.
In Visual Studio 2008 you may get a compiler error, because BSTR is defined as a pointer to unsigned short, while wchar_t is a native type. You can either cast or turn off wchar_t compliance with /Zc:wchar_t.
One thing to keep in mind is that BSTR strings can, and often do, contain embedded nulls. A null does not mean the end of the string.
Use ATL, and CStringT then you can just use the assignment operator. Or you can use the USES_CONVERSION macros, these use heap alloc, so you will be sure that you won't leak memory.