Runtime dependency for std::string concatenation - c++

std::string sAttr("");
sAttr = sAttr+VAL_TAG_OPEN+sVal->c_str()+VAL_TAG_CLOSE;
Elsewhere in the code I have defined
const char VAL_TAG_OPEN[] = "<value>";
sVal is a variable retrieved from an array of string pointers. This works fine on most systems, both Windows and Linux. However, at a customer site, which I believe runs a version of Linux on which we had done extensive testing, the result looks as if I had never used VAL_TAG_OPEN and VAL_TAG_CLOSE. The result I receive is what I would expect from
sAttr = sAttr+sVal->c_str();
What's going on? Does std::string concatenation vary across runtimes?

Why the ->c_str()? If sVal is a std::string, try removing this call. Remember that the order of evaluation is undefined, so you may end up adding pointers instead of concatenating strings, because VAL_TAG_OPEN, sVal->c_str() and VAL_TAG_CLOSE are all plain C strings. I suggest you use the addition assignment operator +=, e.g. :
sAttr += VAL_TAG_OPEN;
sAttr += *sVal; /* sVal->c_str() ? */
sAttr += VAL_TAG_CLOSE;
(which should be faster anyway).

No, std::string concatenation should definitely not depend upon runtime, but somehow VAL_TAG_OPEN and VAL_TAG_CLOSE seem to be empty strings.
I'd guess you've some kind of buffer overflow or invalid pointer arithmetic somewhere, so that your program overwrites the memory containing those "constant" values. Wherever your memory ends up being is indeed runtime (and therefore OS version) specific. I've been trapped by similar things in the past by switching compilers or optimizer options.
As you mention keeping raw pointers to std::string instances in raw arrays, such errors are indeed not all too improbable, but may be difficult to detect, as using a DEBUG build won't give you any iterator checks with all these all too raw things... Good luck.

I don't think it's the order of evaluation that is causing the issue. It's because of the constant char arrays at the beginning and end:
const char VAL_TAG_OPEN[] = "<value>";
const char VAL_TAG_CLOSE[] = "</value>";
It seems the concatenation operator did not treat VAL_TAG_OPEN and VAL_TAG_CLOSE as null-terminated strings, so the optimizer just ignored them, treating them as garbage.
sAttr += std::string(VAL_TAG_OPEN);
sAttr += *sVal;
sAttr += std::string(VAL_TAG_CLOSE);
This does solve it.

sAttr = sAttr+VAL_TAG_OPEN+sVal->c_str()+VAL_TAG_CLOSE;
Like fbonnet said, it's an order of evaluation issue.
If that line gets evaluated strictly left to right, then the result of each addition is a std::string object, which has an operator overload for addition and things work as you expect.
If it doesn't get evaluated left to right, then you wind up adding pointers together and who knows what that will get you.
Avoid this construct and just use the += operator on std::string.
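A minimal sketch of the += form suggested above, assuming sVal points at a valid std::string; the helper name wrap_value is made up for illustration. As far as the language rules go, operator+ groups left to right, so a conforming compiler should produce the same result for the one-line expression whenever its leftmost operand is a std::string. This sketch therefore shows the recommended style rather than an explanation of what happened at the customer site.

```cpp
#include <cassert>
#include <string>

const char VAL_TAG_OPEN[] = "<value>";
const char VAL_TAG_CLOSE[] = "</value>";

// Hypothetical helper: wraps *sVal in the two tags, one append at a time.
std::string wrap_value(const std::string* sVal)
{
    assert(sVal != nullptr);  // a null pointer here would point to a bug upstream
    std::string sAttr;
    sAttr += VAL_TAG_OPEN;    // const char[] converts to a C string for operator+=
    sAttr += *sVal;           // append the std::string itself, no c_str() needed
    sAttr += VAL_TAG_CLOSE;
    return sAttr;
}
```

For example, with std::string v = "42", wrap_value(&v) yields "<value>42</value>".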

Related

How do I find the memory address of a string?

I am having a mental block and I know I should know this but I need a little help.
If I declare a string variable like this:
string word = "Hello";
How do I find the memory address of "Hello"?
Edit: This is what I am trying to do...
Write a function that takes one argument, the address of a string, and prints that string once. (Note: you will need to use a pointer to complete this part.)
However, if a second argument, type int, is provided and is nonzero, the function should print the string a number of times equal to the number of times that function has been called at that point. (Note that the number of times the string is printed is not equal to the value of the second argument; it is equal to the number of times the function has been called so far.)
Use either:
std::string::data() if your data isn't null-terminated c-string like.
or
std::string::c_str() if you want the data and be guaranteed to get the null-termination.
Note that the pointer returned by either of these calls doesn't have to be the underlying data the std::string object is manipulating.
Taking the address of the first character is the usual way to do it: &word[0]. However, needing to do this if you're not operating with legacy code is usually a sign that you're doing something wrong.
I guess you want a pointer to a plain old C-string? Then use word.c_str(). Note that this is not guaranteed to point to the internal storage of the string, it's just a (constant) C-string version you can work with.
You can use the c_str() function to get a pointer to the C string (const char *); however note that the pointer is invalidated whenever you modify the string; you have to invoke c_str() again as the old string may have been deallocated.
OK, so, I know this question is old but I feel like the obvious answer here is actually:
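std::string your_string{"Hello"};
// Take the address of the beginning
auto start_address = &(*your_string.begin());
// Take the address one past the last character
// (dereferencing end() itself is formally undefined, so offset from the start instead)
auto end_address = start_address + your_string.size();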
In essence this will accomplish the same thing as using:
auto start_address = your_string.c_str();
auto end_address = your_string.c_str() + strlen(your_string.c_str());
However I would prefer the first approach (taking the address of the dereferenced iterator) because:
a) Guaranteed to work with begin/end compatible containers which might not have the c_str method. So for example, if you decided you wanted a QString (the string Qt uses) or AWS::String or std::vector to hold your characters, the c_str() approach wouldn't work but the one above would.
b) Possibly not as costly as c_str()... which generally speaking should be implemented similarly to the call I made in the second line of code to get the address but isn't guaranteed to be implemented that way (E.g. if the string you are using is not null terminated it might require the reallocation and mutation of your whole string in order to add the null terminator... which will suddenly make it thread unsafe and very costly, this is not the case for std::string but might be for other types of strings)
c) It communicates intent better, in the end what you want is an address and the first bunch of code expresses exactly that.
So I'd say this approach is better for:
Clarity, compatibility and efficiency.
Edit:
Note that with either approach, if you change your initial string after taking the address, the address will be invalidated (since the string might be reallocated). The compiler will not warn you about this and it could cause some very nasty bugs :/
Declare a pointer to the variable and then view it how you would.
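The exercise quoted in the question can be sketched as follows, assuming "the address of a string" means a pointer to a std::string object. The function name print_string and its return value are made up for illustration; the exercise itself does not specify them.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Hypothetical solution sketch: prints *s once, or, when `flag` is nonzero,
// as many times as the function has been called so far (this call included).
// Returns the number of times the string was printed.
int print_string(const std::string* s, int flag = 0)
{
    static int calls = 0;  // persists across calls
    assert(s != nullptr);
    ++calls;
    const int times = (flag != 0) ? calls : 1;
    for (int i = 0; i < times; ++i)
        std::cout << *s << '\n';
    return times;
}
```

With std::string word = "Hello", the call print_string(&word) passes the address of the std::string object. That is a different thing from word.c_str() or &word[0], which give the address of the characters it holds.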

Does std::string use string interning?

I'm especially interested in Windows with MinGW.
Thanks.
Update:
First, I thought everyone is familiar with string interning.
http://en.wikipedia.org/wiki/String_interning
Second, my problem is in detail:
I knocked up a string class for practice. Nothing fancy, you know: I just store the size and a char * in a class.
I use memcpy for the assignment.
When I do this to measure the assignment speed of std::string and my string class:
string test1 = " 65 kb text ", test2;
for (int i = 0; i < 1000000; i++)
{
    test2 = test1;
}
mystring test3 = "65 kb text", test4;
for (int i = 0; i < 1000000; i++)
{
    test4 = test3;
}
std::string is the winner by a large margin. I do not do anything in the assignment operator (in my class) except copy with memcpy. I do not even create a new array with the "new" operator, because I check for size equality and only request new memory if needed. How come?
For small strings, there is no problem. I can't see how std::string can assign values faster than memcpy; I bet it uses memcpy too in the background, or something similar, so that's why I asked about interning.
Update2:
By modifying the loops with a single character assignment like this: test2[15] = 78, I avoided the effect of copy-on-write in std::string. Now both versions take exactly the same time (okay, there is a 1-2% difference, but that is negligible). So if I am not entirely mistaken, the MinGW std::string must use COW.
Thank you all for your help.
Simply put, no. String interning is not feasible with mutable strings, such as all std::string-objects.
String interning may be done by the compiler only for string literals appearing in the code. If you initialise std::string objects with string literals, and some of the literals occur multiple times, the compiler may store only one copy of this string in your binary. There is no string interning at run time. MinGW supports compile-time string interning as explained before.
Not so much, since std::string is modifiable.
Implementations have been known to attempt the use of copy-on-write, but that causes such problems in multi-threaded code that I think it's out of fashion. It's also very hard to implement correctly - perhaps impossible? If someone takes a pointer to a character in the string, and then modifies another character, I'm not sure that this is permitted to invalidate the first pointer. If it's not allowed, then COW is out of the question too, I think, but I can't remember how it works out.
No, there is no string interning in the STL. It doesn't fit the C++ design philosophy to have such a feature.
Two ideas:
Is mystring a template class? The std::string class is a typedef of the template class basic_string. This means that the complete source of basic_string instead of just the header is accessible to the compiler when your test function is compiled. This additional information enables more optimisations in exchange for higher compilation time.
Most c++ standard library implementations are highly optimised (and sadly almost unreadable).
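One way to probe the copy-on-write theory from Update2 without timing anything is to check whether a copy shares its character buffer with the original. This is only a sketch: the helper name copy_shares_buffer is made up, and the result depends on the standard library in use. As far as I know, C++11 and later effectively rule out copy-on-write for std::string, while older libstdc++ versions (which MinGW builds of that era used) did share buffers.

```cpp
#include <string>

// Returns true if a fresh, unmodified copy of `original` points at the same
// character buffer as `original`, which is what a copy-on-write
// implementation would do. A library that copies eagerly returns false.
bool copy_shares_buffer(const std::string& original)
{
    const std::string copy = original;      // const, so nothing can trigger an unshare
    return copy.data() == original.data();  // same address means shared storage
}
```

Calling it with a large string, for instance std::string(65 * 1024, 'x'), gives a quick yes/no answer for the library at hand. Either way, writing to one of the copies (as test2[15] = 78 does) must leave the other unchanged.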

Why isn't ("Maya" == "Maya") true in C++?

Any idea why I get "Maya is not Maya" as a result of this code?
if ("Maya" == "Maya")
printf("Maya is Maya \n");
else
printf("Maya is not Maya \n");
Because you are actually comparing two pointers - use e.g. one of the following instead:
if (std::string("Maya") == "Maya") { /* ... */ }
if (std::strcmp("Maya", "Maya") == 0) { /* ... */ }
This is because C++03, §2.13.4 says:
An ordinary string literal has type “array of n const char”
... and in your case a conversion to pointer applies.
See also this question on why you can't provide an overload for == for this case.
You are not comparing strings, you are comparing pointer address equality.
To be more explicit -
"foo baz bar" implicitly defines an anonymous const char[m]. It is implementation-defined as to whether identical anonymous const char[m] will point to the same location in memory(a concept referred to as interning).
The function you want - in C - is strcmp(const char*, const char*), which returns 0 on equality.
Or, in C++, what you might do is
#include <string>
std::string s1 = "foo";
std::string s2 = "bar";
and then compare s1 vs. s2 with the == operator, which is defined in an intuitive fashion for strings.
The output of your program is implementation-defined.
A string literal has the type const char[N] (that is, it's an array). Whether or not each string literal in your program is represented by a unique array is implementation-defined. (§2.13.4/2)
When you do the comparison, the arrays decay into pointers (to the first element), and you do a pointer comparison. If the compiler decides to store both string literals as the same array, the pointers compare true; if they each have their own storage, they compare false.
To compare strings, use std::strcmp(), like this:
if (std::strcmp("Maya", "Maya") == 0) // same
Typically you'd use the standard string class, std::string. It defines operator==. You'd need to make one of your literals a std::string to use that operator:
if (std::string("Maya") == "Maya") // same
What you are doing is comparing the address of one string with the address of another. Depending on the compiler and its settings, sometimes the identical literal strings will have the same address, and sometimes they won't (as apparently you found).
Any idea why i get "Maya is not Maya" as a result
Because in C, and thus in C++, string literals are of type const char[], which is implicitly converted to const char*, a pointer to the first character, when you try to compare them. And pointer comparison is address comparison.
Whether the two string literals compare equal or not depends on whether your compiler (using your current settings) pools string literals. It is allowed to do that, but it doesn't need to.
To compare the strings in C, use strcmp() from the <string.h> header. (It's std::strcmp() from <cstring> in C++.)
To do so in C++, the easiest is to turn one of them into a std::string (from the <string> header), which comes with all comparison operators, including ==:
#include <iostream>
#include <string>
// ...
if (std::string("Maya") == "Maya")
    std::cout << "Maya is Maya\n";
else
    std::cout << "Maya is not Maya\n";
C and C++ do this comparison via pointer comparison; looks like your compiler is creating separate resource instances for the strings "Maya" and "Maya" (probably due to having an optimization turned off).
My compiler says they are the same ;-)
even worse, my compiler is certainly broken. This very basic equation:
printf("23 - 523 = %d\n","23"-"523");
produces:
23 - 523 = 1
Indeed, "because your compiler, in this instance, isn't using string pooling," is the technically correct, yet not particularly helpful answer :)
This is one of the many reasons the std::string class in the Standard Template Library now exists to replace this earlier kind of string when you want to do anything useful with strings in C++, and is a problem pretty much everyone who's ever learned C or C++ stumbles over fairly early on in their studies.
Let me explain.
Basically, back in the days of C, all strings worked like this. A string is just a bunch of characters in memory. A string you embed in your C source code gets translated into a bunch of bytes representing that string in the running machine code when your program executes.
The crucial part here is that a good old-fashioned C-style "string" is an array of characters in memory. That block of memory is often referred to by means of a pointer -- the address of the start of the block of memory. Generally, when you're referring to a "string" in C, you're referring to that block of memory, or a pointer to it. C doesn't have a string type per se; strings are just a bunch of chars in a row.
When you write this in your code:
"wibble"
Then the compiler provides a block of memory that contains the bytes representing the characters 'w', 'i', 'b', 'b', 'l', 'e', and '\0' in that order (the compiler adds a zero byte at the end, a "null terminator". In C a standard string is a null-terminated string: a block of characters starting at a given memory address and continuing until the next zero byte.)
And when you start comparing expressions like that, what happens is this:
if ("Maya" == "Maya")
At the point of this comparison, the compiler -- in your case, specifically; see my explanation of string pooling at the end -- has created two separate blocks of memory, to hold two different sets of characters that are both set to 'M', 'a', 'y', 'a', '\0'.
When the compiler sees a string in quotes like this, "under the hood" it builds an array of characters, and the string itself, "Maya", acts as the name of the array of characters. Because the names of arrays are effectively pointers, pointing at the first character of the array, the type of the expression "Maya" is pointer to char.
When you compare these two expressions using "==", what you're actually comparing is the pointers, the memory addresses of the beginning of these two different blocks of memory. Which is why the comparison is false, in your particular case, with your particular compiler.
If you want to compare two good old-fashioned C strings, you should use the strcmp() function. This will examine the contents of the memory pointed to by both "strings" (which, as I've explained, are just pointers to a block of memory) and go through the bytes, comparing them one-by-one, and tell you whether they're really the same.
Now, as I've said, this is the kind of slightly surprising result that's been biting C beginners on the arse since the days of yore. And that's one of the reasons the language evolved over time. Now, in C++, there is a std::string class, that will hold strings, and will work as you expect. The "==" operator for std::string will actually compare the contents of two std::strings.
By default, though, C++ is designed to be backwards-compatible with C, i.e. a C program will generally compile and work under a C++ compiler the same way it does in a C compiler, and that means that old-fashioned strings, "things like this in your code", will still end up as pointers to bits of memory that will give non-obvious results to the beginner when you start comparing them.
Oh, and that "string pooling" I mentioned at the beginning? That's where some more complexity might creep in. A smart compiler, to be efficient with its memory, may well spot that in your case, the strings are the same and can't be changed, and therefore only allocate one block of memory, with both of your names, "Maya", pointing at it. At which point, comparing the "strings" -- the pointers -- will tell you that they are, in fact, equal. But more by luck than design!
This "string pooling" behaviour will change from compiler to compiler, and often will differ between debug and release modes of the same compiler, as the release mode often includes optimisations like this, which will make the output code more compact (it only has to have one block of memory with "Maya" in, not two, so it's saved five -- remember that null terminator! -- bytes in the object code.) And that's the kind of behaviour that can drive a person insane if they don't know what's going on :)
If nothing else, this answer might give you a lot of search terms for the thousands of articles that are out there on the web already, trying to explain this. It's a bit painful, and everyone goes through it. If you can get your head around pointers, you'll be a much better C or C++ programmer in the long run, whether you choose to use std::string instead or not!
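The points made in this thread about literal layout and pointer comparison can be condensed into a short sketch. The helper names same_text and same_address are made up for illustration.

```cpp
#include <cstring>
#include <string>

// A string literal is an array that includes the terminating '\0'.
static_assert(sizeof("Maya") == 5, "four characters plus the null terminator");

// Compares the characters, not the addresses.
bool same_text(const char* a, const char* b)
{
    return std::strcmp(a, b) == 0;
}

// Compares the addresses only. For two separate but identical literals the
// result depends on whether the compiler pools them.
bool same_address(const char* a, const char* b)
{
    return a == b;
}
```

same_text("Maya", "Maya") is true on every compiler, whereas same_address("Maya", "Maya") may come out either way, which is exactly the behaviour the question ran into.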

Why does MSVC++ consider "std::strcat" to be "unsafe"? (C++)

When I try to do things like this:
char* prefix = "Sector_Data\\sector";
char* s_num = "0";
std::strcat(prefix, s_num);
std::strcat(prefix, "\\");
and so on and so forth, I get a warning
warning C4996: 'strcat': This function or variable may be unsafe. Consider using strcat_s instead.
Why is strcat considered unsafe, and is there a way to get rid of this warning without using strcat_s?
Also, if the only way to get rid of the warning is to use strcat_s, how does it work (syntax-wise: apparently it does not take two arguments).
If you are using C++, why not avoid the whole mess and use std::string? The same example without any errors would look like this:
std::string prefix = "Sector_Data\\sector";
prefix += "0";
prefix += "\\";
There is no need to worry about buffer sizes and all that stuff. And if you have an API that takes a const char *, you can just use the .c_str() member:
some_c_api(prefix.c_str());
Because the buffer, prefix, could have less space than you are copying into it, causing a buffer overrun.
Therefore, a hacker could pass in a specially crafted string which overwrites the return address or other critical memory and start executing code in the context of your program.
strcat_s solves this by forcing you to pass in the size of the buffer into which you are copying the string; if the result would not fit, it reports an error instead of overrunning the buffer (it does not silently truncate).
google strcat_s to see precisely how to use it.
You can get rid of these warning by adding:
_CRT_SECURE_NO_WARNINGS
and
_SCL_SECURE_NO_WARNINGS
to your project's preprocessor definitions.
That's one of the string-manipulation functions in C/C++ that can lead to buffer overrun errors.
The problem is that the function doesn't know what the size of the buffers are. From the MSDN documentation:
The first argument, strDestination, must be large enough to hold the current strDestination and strSource combined and a closing '\0'; otherwise, a buffer overrun can occur.
strcat_s takes an extra argument telling it the size of the buffer. This allows it to validate the sizes before doing the concat, and will prevent overruns. See http://msdn.microsoft.com/en-us/library/d45bbxx4.aspx
Because it has no means of checking whether the destination string (prefix in your case) will be written past its bounds. strcat essentially works by looping, copying the source string byte by byte into the destination. It stops when it sees a byte with the value 0 (written '\0'), called a null terminator. Since C has no built-in bounds checking, and the destination is just a place in memory, strcat will keep going ad infinitum, even past the end of either buffer, if the source or the destination lacks a null terminator.
The solutions above are specific to your Windows environment. If you want something platform independent, you have to wrangle with strncat:
strncat(char* dest, const char* src, size_t count)
This is another option when used intelligently. You can use count to specify the max number of characters to copy. To do this, you have to figure out how much space is available in dest (how much you allocated - strlen(dest)) and pass that as count.
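The count calculation described above can be wrapped in a small helper. This is a sketch, not a drop-in replacement: append_truncating is a made-up name, and it assumes dest already holds a null-terminated string and that dest_size is the true size of the buffer.

```cpp
#include <cstddef>
#include <cstring>

// Appends as much of `src` as fits into `dest`, whose total capacity is
// `dest_size` bytes. Returns true if all of `src` fit, false if it was
// truncated (or if `dest` had no room at all).
bool append_truncating(char* dest, std::size_t dest_size, const char* src)
{
    const std::size_t used = std::strlen(dest);
    if (used + 1 > dest_size)
        return false;  // no room even for the terminator; leave dest alone
    const std::size_t room = dest_size - used - 1;  // keep one byte for '\0'
    std::strncat(dest, src, room);  // strncat always writes the terminator
    return std::strlen(src) <= room;
}
```

Note the "- 1": strncat's count is the maximum number of characters to copy from src, not the size of the destination, and the terminator it writes comes on top of that count.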
To turn the warning off, you can do this.
#pragma warning(disable:4996)
btw, I strongly recommend that you use strcat_s().
There are two problems with strcat. First, you have to do all your validation outside the function, doing work that is almost the same as the function:
if (strlen(pDest) + strlen(pSrc) < destSize)
You have to walk down the entire length of both strings just to make sure it will fit, before walking down their entire length AGAIN to do the copy. Because of this, many programmers will simply assume that it will fit and skip the test. Even worse, it may be that when the code is first written it is GUARANTEED to fit, but when someone adds another strcat, or changes a buffer size or constant somewhere else in the program, you now have issues.
The other problem arises if pSrc and pDest overlap. Depending on your compiler, strcat may very well be a simple loop that checks one character at a time for a 0 in pSrc. If the write to pDest overwrites that 0, then you will get into a loop that runs until your program crashes.
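The up-front check described in this answer can be folded into a helper that refuses to append rather than truncating. A sketch under stated assumptions: append_if_fits is a made-up name, and the two buffers must not overlap. It still measures both strings once, but the lengths are then reused for the copy instead of being recomputed.

```cpp
#include <cstddef>
#include <cstring>

// Appends `src` to `dest` only if the whole result, terminator included,
// fits in `dest_size` bytes; otherwise leaves `dest` untouched.
bool append_if_fits(char* dest, std::size_t dest_size, const char* src)
{
    const std::size_t dest_len = std::strlen(dest);
    const std::size_t src_len = std::strlen(src);
    if (dest_len + src_len + 1 > dest_size)
        return false;
    std::memcpy(dest + dest_len, src, src_len + 1);  // copies the '\0' too
    return true;
}
```

Returning a bool makes the failure visible at the call site, so a later change to a buffer size or constant elsewhere in the program shows up as a rejected append instead of an overrun.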

Getting a char* from a _variant_t in optimal time

Here's the code I want to speed up. It's getting a value from an ADO recordset and converting it to a char*. But this is slow. Can I skip the creation of the _bstr_t?
_variant_t var = pRs->Fields->GetItem(i)->GetValue();
if (V_VT(&var) == VT_BSTR)
{
const char* p = (const char*) (_bstr_t) var;
The four bytes immediately preceding the address a BSTR points to contain its length in bytes. You can loop through and get every other character if unicode or every character if multibyte. Some sort of memcpy or other method would work too. IIRC, this can be faster than W2A or casting (LPCSTR)(_bstr_t)
Your problem (other than the possibility of a memory copy inside _bstr_t) is that you're converting the UNICODE BSTR into an ANSI char*.
You can use the USES_CONVERSION macros which perform the conversion on the stack, so they might be faster. Alternatively, keep the BSTR value as unicode if possible.
to convert:
USES_CONVERSION;
char* p = strdup(OLE2A(var.bstrVal));
// ...
free(p);
Remember: the string returned by OLE2A (and its sister macros) is allocated on the stack. Once you return from the enclosing scope you have a garbage string unless you copy it (and eventually free the copy, obviously).
This creates a temporary on the stack:
USES_CONVERSION;
char *p=W2A(var.bstrVal);
This uses a slightly newer syntax and is probably more robust. It has a configurable size, beyond which it will use the heap so it avoids putting massive strings onto the stack:
char *p=CW2AEX<>(var.bstrVal);
_variant_t var = pRs->Fields->GetItem(i)->GetValue();
You can also make this assignment quicker by avoiding the Fields collection altogether. You should only use the Fields collection when you need to retrieve the item by name. If you know the fields by index you can instead use this.
_variant_t vara = pRs->Collect[i]->Value;
Note i cannot be an integer as ADO does not support VT_INTEGER, so you might as well use a long variable.
Ok, my C++ is getting a little rusty... but I don't think the conversion is your problem. That conversion doesn't really do anything except tell the compiler to consider _bstr_t a char*. Then you're just assigning the address of that pointer to p. Nothing's actually being "done."
Are you sure it's not just slow getting stuff from GetValue?
Or is my C++ rustier than I think...