How do I find the memory address of a string? - c++

I am having a mental block and I know I should know this but I need a little help.
If I declare a string variable like this:
string word = "Hello";
How do I find the memory address of "Hello"?
Edit: This is what I am trying to do...
Write a function that takes one argument, the address of a string, and prints that string once. (Note: you will need to use a pointer to complete this part.)
However, if a second argument, type int, is provided and is nonzero, the function should print the string a number of times equal to the number of times that function has been called at that point. (Note that the number of times the string is printed is not equal to the value of the second argument; it is equal to the number of times the function has been called so far.)

Use either:
std::string::data() if your data isn't null-terminated c-string like.
or
std::string::c_str() if you want the data and be guaranteed to get the null-termination.
Note that the pointer returned by either of these calls doesn't have to be the underlying data the std::string object is manipulating.

Take the address of the first character is the usual way to do it. &word[0]. However, needing to do this if you're not operating with legacy code is usually a sign that you're doing something wrong.

I guess you want a pointer to a plain old C-string? Then use word.c_str(). Note that this is not guaranteed to point to the internal storage of the string, it's just a (constant) C-string version you can work with.

You can use the c_str() function to get a pointer to the C string (const char *); however note that the pointer is invalidated whenever you modify the string; you have to invoke c_str() again as the old string may have been deallocated.

OK, so, I know this question is old but I feel like the obvious answer here is actually:
std::string your_string{"Hello"};
//Take the address of the beginning
auto start_address = &(*your_string.begin())
//Take the address at the end
auto end_address = &(*your_string.end())
In essence this will accomplish the same thing as using:
auto start_address = your_string.c_str();
auto end_address = your_string.c_str() + strlen(your_string.c_str());
However I would prefer the first approach (taking the address of the dereferenced iterator) because:
a) Guaranteed to work with begin/end compatible containers which might not have the c_str method. So for example, if you decided you wanted a QString (the string QT uses) or AWS::String or std::vector to hold your characters, the c_str() approach wouldn't work but the one above would.
b) Possibly not as costly as c_str()... which generally speaking should be implemented similarly to the call I made in the second line of code to get the address but isn't guaranteed to be implemented that way (E.g. if the string you are using is not null terminated it might require the reallocation and mutation of your whole string in order to add the null terminator... which will suddenly make it thread unsafe and very costly, this is not the case for std::string but might be for other types of strings)
c) It communicates intent better, in the end what you want is an address and the first bunch of code expresses exactly that.
So I'd say this approach is better for:
Clarity, compatibility and efficiency.
Edit:
Note that with either approach, if you change your initial string after taking the address the address will be invalidated (since the string might be rellocated). The compiler will not warn you against this and it could cause some very nasty bugs :/

Declare a pointer to the variable and then view it how you would.

Related

Difference between str.clear() and str = ""

I have a C++ std::string str that I've set to some string, and now want to reset so it can be used again. Is there a difference between calling str.clear() vs str = ""?
EDIT. To clarify: I'm reusing str by appending a char array buffer to it: str.append(buf)
There is no effective difference. Depending on the implementation, using clear() may be faster than assigning to a char pointer to zero. Even if this were not the case, though, prefer the method that more clearly expresses your intent. If you want to clear the string, use clear(). If you want to assign an empty string, use = "".
Though I will note, you said, "so I can use it again." Use it again for what? If you are just assigning it to something else, there's no need to clear it beforehand.

Why is wrong to modify the contents of a pointer to a string litteral?

If I write:
char *aPtr = "blue"; //would be better const char *aPtr = "blue"
aPtr[0]='A';
I have a warning. The code above can work but isn't standard, it has a undefined behavior because it's read-only memory with a pointer at string litteral. The question is:
Why is it like this?
with this code rather:
char a[]="blue";
char *aPtr=a;
aPtr[0]='A';
is ok. I want to understand under the hood what happens
The first is a pointer to a read-only value created by the compiler and placed in a read-only section of the program. You cannot modify the characters at that address because they are read-only.
The second creates an array and copies each element from the initializer (see this answer for more details on that). You can modify the contents of the array, because it's a simple variable.
The first one works the way it does because doing anything else would require dynamically-allocating a new variable, and would require garbage collection to free it. That is not how C and C++ work.
The primary reason that string literals can't be modified (without undefined behavior) is to support string literal merging.
Long ago, when memory was much tighter than today, compiler authors noticed that many programs had the same string literals repeated many times--especially things like mode strings being passed to fopen (e.g., f = fopen("filename", "r");) and simple format strings being passed to printf (e.g., printf("%d\n", a);).
To save memory, they'd avoid allocating separate memory for each instance of these strings. Instead, they'd allocate one piece of memory, and point all the pointers at it.
In a few cases, they got even trickier than that, to merge literals that were't even entirely identical. For example consider code like this:
printf("%s\t%d\n", a);
/* ... */
printf("%d\n", b);
In this case, the string literals aren't entirely identical, but the second one is identical part of the end of the first. In this case, they'd still allocate one piece of memory. One pointer would point to the beginning of the memory, and the other to the position of the %d in that same block of memory.
With a possibility (but no requirement for) string literal merging, it's essentially impossible to say what behavior you'll get when you modify a string literal. If string literals are merged, modifying one string literal might modify others that are identical, or end identically. If string literals are not merged, modifying one will have no effect on any other.
MMUs added another dimension: they allowed memory to be marked as read-only, so attempting to modify a string literal would result in a signal of some sort--but only if the system had an MMU (which was often optional at one time) and also depending on whether the compiler/linker decided to put the string literals in memory they'd marked constant or not.
Since they couldn't define what the behavior would be when you modified a string literal, they decided that modifying a string literal would produce undefined behavior.
The second case is entirely different. Here you've defined an array of char. It's clear that if you define two separate arrays, they're still separate, regardless of content, so modifying one can't possibly affect the other. The behavior is clear and always has been, so doing so gives defined behavior. The fact that the array in question might be initialized from a string literal doesn't change that.

c++ best way to call function with const char* parameter type

what is the best way to call a function with the following declaration
string Extract(const char* pattern,const char* input);
i use
string str=Extract("something","input text");
is there a problem with this usage
should i use the following
char pattern[]="something";
char input[]="input";
//or use pointers with new operator and copy then free?
the both works but i like the first one but i want to know the best practice.
A literal string (e.g. "something") works just fine as a const char* argument to a function call.
The first method, i.e. passing them literally in, is usually preferable.
There are occasions though where you don't want your strings hard-coded into the text. In some ways you can say that, a bit like magic numbers, they are magic words / phrases. So you prefer to use constant identifier to store the values and pass those in instead.
This would happen often when:
1. a word has a special meaning, and is passed in many times in the code to have that meaning.
or
2. the word may be cryptic in some way and a constant identifier may be more descriptive
Unless you plain to have duplicates of the same strings, or alter those strings, I'm a fan of the first way (passing the literals directly), it means less dotting about code to find what the parameters actually are, it also means less work in passing parameters.
Seeing as this is tagged for C++, passing the literals directly allows you to easily switch the function parameters to std::string with little effort.

What is the internal structure of an object of the (EDIT: MFC) CString class?

I need to strncpy() (effectively) from a (Edit: MFC) CString object to a C string variable. It's well known that strncpy() sometimes fails (depending on the source length **EDIT and the length specified in the call) to terminate the dest C string correctly. To avoid that evil, I'm thinking to store a NUL char inside the CString source object and then to strcpy() or memmove() that guy.
Is this a reasonable way to go about it? If so, what must I manipulate inside the CString object? If not, then what's an alternative that will guarantee a properly-terminated destination C string?
strncpy() only "fails" to null-terminate the destination string when the source string is longer than the length limit you specify. You can ensure that the destination is null-terminated by setting its last character to null yourself. For example:
#define DEST_STR_LEN 10
char dest_str[DEST_STR_LEN + 1]; // +1 for the null
strncpy(dest_str, src_str, DEST_STR_LEN);
dest_str[DEST_STR_LEN] = '\0';
If src_str is more than DEST_STR_LEN characters long, dest_str will be a properly-terminated string of DEST_STR_LEN characters. If src_str is shorter than that, strncpy() will put a null terminator somewhere within dest_str, so the null at the very end is irrelevant and harmless.
CSimpleStringT::GetString gives a pointer to a null-terminated string. Use this as the soure for strncpy. As this is C++, you should only use C-style strings when interfacing with legacy APIs. Use std::string instead.
One of the alternative ways would be to zero string first and then cast or memcpy from CString.
I hope they don't changed from when I used them: that was many years ago :)
They used an interesting 'trick' to handle the refcount and the very fast and efficient automatic conversion to char*: i.e the pointer is to LPCSTR, but some back byte is reserved to keep the implementation state.
So the struct can be used with the older windows API (LPCSTR without overhead). I found at the time the idea interesting!
Of course the key ìs the availability of allocators: they simply offsets the pointer when mallocing/freeing.
I remember there was a buffer request to (for instance) modify the data available: GetBuffer(0), followed by ReleaseBuffer().
HTH
If you are not compiling with _UNICODE enabled, then you can get a const char * from a CString very easily. Just cast it to an LPCTSTR:
CString myString("stuff");
const char *byteString = (LPCTSTR)myString;
This is guaranteed to be NULL-terminated.
If you have built with _UNICODE, then CString is a UTF-16 encoded string. You can't really do anything directly with that.
If you do need to copy the data from the CString, this very easy, even using C-style code. Just make sure that you allocate sufficient memory and are copying the right length:
CString myString("stuff");
char *outString = (char*)malloc(myString.Length() + 1);
strncpy(outString, (LPCTSTR)myString, myString.Length());
CString ends with NULL so as long as your text is correct (no NULL characters inside) then copying should be safe. You can write:
char szStr[256];
strncpy(szStr, (LPCSTR) String, 3);
szStr[3]='\0'; /// b-cos no null-character is implicitly appended to the end of destination
if you store null somehere inside CString object you will probably cause yourself more problems, CString stores its lenght internally.
Another alternative solution would rather involve support from CPU or compiler, as it's much better approach - simply make sure that when copying memory in "safe" mode, at any time after every atomic operation there is zero added on the end, so when whole loop fails, the destination string will still be terminated, without need to zero it fully before making copy.
There could be also support for fast zero - just mark start and stop of zeroed region and it's instantly cleared in RAM, this would make things a lot easier.

Why isn't ("Maya" == "Maya") true in C++?

Any idea why I get "Maya is not Maya" as a result of this code?
if ("Maya" == "Maya")
printf("Maya is Maya \n");
else
printf("Maya is not Maya \n");
Because you are actually comparing two pointers - use e.g. one of the following instead:
if (std::string("Maya") == "Maya") { /* ... */ }
if (std::strcmp("Maya", "Maya") == 0) { /* ... */ }
This is because C++03, §2.13.4 says:
An ordinary string literal has type “array of n const char”
... and in your case a conversion to pointer applies.
See also this question on why you can't provide an overload for == for this case.
You are not comparing strings, you are comparing pointer address equality.
To be more explicit -
"foo baz bar" implicitly defines an anonymous const char[m]. It is implementation-defined as to whether identical anonymous const char[m] will point to the same location in memory(a concept referred to as interning).
The function you want - in C - is strmp(char*, char*), which returns 0 on equality.
Or, in C++, what you might do is
#include <string>
std::string s1 = "foo"
std::string s2 = "bar"
and then compare s1 vs. s2 with the == operator, which is defined in an intuitive fashion for strings.
The output of your program is implementation-defined.
A string literal has the type const char[N] (that is, it's an array). Whether or not each string literal in your program is represented by a unique array is implementation-defined. (§2.13.4/2)
When you do the comparison, the arrays decay into pointers (to the first element), and you do a pointer comparison. If the compiler decides to store both string literals as the same array, the pointers compare true; if they each have their own storage, they compare false.
To compare string's, use std::strcmp(), like this:
if (std::strcmp("Maya", "Maya") == 0) // same
Typically you'd use the standard string class, std::string. It defines operator==. You'd need to make one of your literals a std::string to use that operator:
if (std::string("Maya") == "Maya") // same
What you are doing is comparing the address of one string with the address of another. Depending on the compiler and its settings, sometimes the identical literal strings will have the same address, and sometimes they won't (as apparently you found).
Any idea why i get "Maya is not Maya" as a result
Because in C, and thus in C++, string literals are of type const char[], which is implicitly converted to const char*, a pointer to the first character, when you try to compare them. And pointer comparison is address comparison.
Whether the two string literals compare equal or not depends whether your compiler (using your current settings) pools string literals. It is allowed to do that, but it doesn't need to. .
To compare the strings in C, use strcmp() from the <string.h> header. (It's std::strcmp() from <cstring>in C++.)
To do so in C++, the easiest is to turn one of them into a std::string (from the <string> header), which comes with all comparison operators, including ==:
#include <string>
// ...
if (std::string("Maya") == "Maya")
std::cout << "Maya is Maya\n";
else
std::cout << "Maya is not Maya\n";
C and C++ do this comparison via pointer comparison; looks like your compiler is creating separate resource instances for the strings "Maya" and "Maya" (probably due to having an optimization turned off).
My compiler says they are the same ;-)
even worse, my compiler is certainly broken. This very basic equation:
printf("23 - 523 = %d\n","23"-"523");
produces:
23 - 523 = 1
Indeed, "because your compiler, in this instance, isn't using string pooling," is the technically correct, yet not particularly helpful answer :)
This is one of the many reasons the std::string class in the Standard Template Library now exists to replace this earlier kind of string when you want to do anything useful with strings in C++, and is a problem pretty much everyone who's ever learned C or C++ stumbles over fairly early on in their studies.
Let me explain.
Basically, back in the days of C, all strings worked like this. A string is just a bunch of characters in memory. A string you embed in your C source code gets translated into a bunch of bytes representing that string in the running machine code when your program executes.
The crucial part here is that a good old-fashioned C-style "string" is an array of characters in memory. That block of memory is often referred to by means of a pointer -- the address of the start of the block of memory. Generally, when you're referring to a "string" in C, you're referring to that block of memory, or a pointer to it. C doesn't have a string type per se; strings are just a bunch of chars in a row.
When you write this in your code:
"wibble"
Then the compiler provides a block of memory that contains the bytes representing the characters 'w', 'i', 'b', 'b', 'l', 'e', and '\0' in that order (the compiler adds a zero byte at the end, a "null terminator". In C a standard string is a null-terminated string: a block of characters starting at a given memory address and continuing until the next zero byte.)
And when you start comparing expressions like that, what happens is this:
if ("Maya" == "Maya")
At the point of this comparison, the compiler -- in your case, specifically; see my explanation of string pooling at the end -- has created two separate blocks of memory, to hold two different sets of characters that are both set to 'M', 'a', 'y', 'a', '\0'.
When the compiler sees a string in quotes like this, "under the hood" it builds an array of characters, and the string itself, "Maya", acts as the name of the array of characters. Because the names of arrays are effectively pointers, pointing at the first character of the array, the type of the expression "Maya" is pointer to char.
When you compare these two expressions using "==", what you're actually comparing is the pointers, the memory addresses of the beginning of these two different blocks of memory. Which is why the comparison is false, in your particular case, with your particular compiler.
If you want to compare two good old-fashioned C strings, you should use the strcmp() function. This will examine the contents of the memory pointed two by both "strings" (which, as I've explained, are just pointers to a block of memory) and go through the bytes, comparing them one-by-one, and tell you whether they're really the same.
Now, as I've said, this is the kind of slightly surprising result that's been biting C beginners on the arse since the days of yore. And that's one of the reasons the language evolved over time. Now, in C++, there is a std::string class, that will hold strings, and will work as you expect. The "==" operator for std::string will actually compare the contents of two std::strings.
By default, though, C++ is designed to be backwards-compatible with C, i.e. a C program will generally compile and work under a C++ compiler the same way it does in a C compiler, and that means that old-fashioned strings, "things like this in your code", will still end up as pointers to bits of memory that will give non-obvious results to the beginner when you start comparing them.
Oh, and that "string pooling" I mentioned at the beginning? That's where some more complexity might creep in. A smart compiler, to be efficient with its memory, may well spot that in your case, the strings are the same and can't be changed, and therefore only allocate one block of memory, with both of your names, "Maya", pointing at it. At which point, comparing the "strings" -- the pointers -- will tell you that they are, in fact, equal. But more by luck than design!
This "string pooling" behaviour will change from compiler to compiler, and often will differ between debug and release modes of the same compiler, as the release mode often includes optimisations like this, which will make the output code more compact (it only has to have one block of memory with "Maya" in, not two, so it's saved five -- remember that null terminator! -- bytes in the object code.) And that's the kind of behaviour that can drive a person insane if they don't know what's going on :)
If nothing else, this answer might give you a lot of search terms for the thousands of articles that are out there on the web already, trying to explain this. It's a bit painful, and everyone goes through it. If you can get your head around pointers, you'll be a much better C or C++ programmer in the long run, whether you choose to use std::string instead or not!