I wanted to check whether the address of a string changes when the string is resized. So I wrote the program below, in which the capacity was initially 1 and then changed to 30. I assumed the string would move to a new address when its capacity changed, but that didn't happen.
Can someone explain why that is?
string s = "1";
string& s1 = s;
cout << &s << " capacity is " << s.capacity() << endl;
cout << &s1 << endl;
s = "sdhflshdgfljasdjflkasdfhalsjdf";
cout << &s << " capacity is " << s.capacity() << endl;
cout << &s1 << endl;
Output is
0x7ffc11fc08d0 capacity is 1
0x7ffc11fc08d0
0x7ffc11fc08d0 capacity is 30
0x7ffc11fc08d0
The string variable will not move, but the buffer it holds a pointer to internally may move to a new address as it allocates more memory. This is not observable by taking the address of the variable though. If you print the pointer returned by the .data() member (by casting it to a void pointer) you may see a change (assuming the new size is enough to trigger reallocation - many strings use a small string optimization with a pre-allocated buffer, so you need to grow beyond that).
Related
int numRows = 5;
string s ="hellohi";
vector<string> rows(min(numRows, int(s.size())));
I think it is using the fill constructor. https://www.cplusplus.com/reference/vector/vector/vector/
But I don't know whether it creates a vector of NULL strings or a vector of empty strings.
And what is the size of NULL?
And what is the size of an empty string? 1 byte (the '\0' char)?
The constructor you're using will create empty strings. For example you can check with:
// check the number of entries in rows, should be 5
std::cout << rows.size() << std::endl;
// check the number of characters in first string, should be 0
std::cout << rows[0].size() << std::endl;
// now the size should be 11, since the string has 11 characters
rows[0] = "hello world";
std::cout << rows[0].size() << std::endl;
A std::string is never NULL; only pointers can be null. I believe the size of a null pointer is implementation defined, and you could find it with:
std::cout << sizeof(nullptr) << std::endl;
I get 8 as the size (which is 64 bits)
Similar to the nullptr, the size of a std::string object is also implementation defined, and it is the same whether or not the string is empty. You can find it like this:
std::string test_string;
std::cout << sizeof(test_string) << "\n";
std::cout << test_string.size() << "\n"; // should be 0 since the string is empty
test_string = "hello world"; // it doesn't matter how long the string is, it's the same size
std::cout << sizeof(test_string) << "\n";
std::cout << test_string.size() << "\n"; // should be 11 since the string has data now
I get 32 bytes for the size. The value reported by sizeof doesn't change because of how the string works behind the scenes: instead of storing the characters in the object itself (most of the time), it only stores a pointer to them, and a pointer always has a fixed size.
My code:
#include <iostream>
using namespace std;
int main() {
char *test = (char*)malloc(sizeof(char)*9);
test = "testArry";
cout << &test << " | " << test << endl;
test++;
cout << &test << " | " << test << endl;
return 1;
}
Result:
004FF804 | testArry
004FF804 | estArry
I don't understand how it's possible that I moved my array pointer and the address didn't change.
The pointer did change. You're just not printing it. To print the pointer test:
cout << (void*) test << endl;
&test is the memory location where test is stored.
test is the value that you incremented with test++ (i.e., you didn't increment &test).
When you do cout << test the overload of operator<< that gets picked is one that takes a const char* and treats it as a C-style string, printing the characters it points to. The cast to void* avoids this behavior in order to print the actual value of test, instead of the value it points to.
In this statement
cout << &test << " | " << test << endl;
the expression &test yields the address of the variable test itself, which evidently does not change. It is the value stored in the variable that changed.
If you want to output the value of the variable test, that is, the address it points to inside the string literal, you should write
cout << ( void * )test << " | " << test << endl;
Take into account that there is a memory leak in the program, because the pointer is reassigned after the memory has been allocated. Also, string literals have types of constant character arrays, so the pointer test should be declared like
const char *test;
int main(int argc, char const *argv[])
{
const char *s1 = "hello";
string s2;
s2 = s1;
s2.reserve(10);
s2[5] = '.';
s2[6] = 'o';
s2[7] = '\0';
cout << "[" << s1 << "] [" << s2 << "]" << endl;
return 0;
}
The above code does not print s2 correctly. Instead of hello.o it prints hello always. It seems like the size of s2 remains at 5 always after the first assignment. Why is this so?
operator[] does not resize the string. And your calls to it with indices 5, 6 and 7 are out of range and undefined behavior. Use resize to set the string to a specific size, or push_back or operator+= to append characters.
Also note that you do not need to zero terminate std::string manually. The class will handle that by itself. Although you are allowed to have embedded zeros in there if you really want them, and they will be considered as part of the length of the string.
s2.reserve(10); doesn't grow the string at all, it just tells the container to reserve enough memory for at least 10 characters. It doesn't fill the reserved space with anything.
Hence, when you write to s2[5] you are indexing outside the bounds of the "used" string (i.e. its size), which is undefined behaviour.
To resize, you can use s2.resize(10);. This will allocate and fill the string appropriately and it will have a size of 10. To allocate and insert a character at the same time, you could also use push_back() or operator+=.
On a side note: s2[7] = '\0'; is not needed. The string class manages internally any NUL terminations that are needed for methods such as c_str() etc. You don't need to add the NUL yourself.
You should use s2.resize() instead of s2.reserve().
std::string::reserve only allocates memory; it does not resize the string. In your example:
s2 = s1; // The string now holds 5 characters ("hello")
s2.reserve(10); // Make room for at least 10 characters; the size is still 5
s2[5] = '.'; // Write '.' past the end of the string; the size is still 5
Easy fix is to use std::string::resize instead of reserve.
Short answer: use resize(10) instead of reserve(10)
Long answer:
A std::string keeps track of two quantities, size and capacity.
Capacity is how much memory you have allocated for the string.
Size is how many valid elements (in your case, char) the string currently holds.
Note that size will always be smaller than or equal to capacity.
When you call reserve(), you're changing capacity.
When you call resize(), you are changing size, and you will also change capacity if the new size > capacity. In that case an implementation typically applies something like this (the exact growth factor is implementation-specific):
if (size > capacity){
capacity = max(size, capacity*2); //Why multiply capacity by 2 here? Growing geometrically gives amortized O(1) appends
}
Here's a code example of what OP wants and some more code for a better explanation of size and capacity
#include <iostream>
#include <string>
using namespace std;
int main(int argc, char const *argv[])
{ const char *s1 = "hello";
string s2;
s2 = s1;
cout << "length of s2 before reserve: " << s2.length() << endl;
cout << "capacity of s2 before reserve: " << s2.capacity() << endl;
s2.reserve(10);
cout << "length of s2 after reserve: " << s2.length() << endl; //see how length of s2 didn't change?
cout << "capacity of s2 after reserve: " << s2.capacity() << endl;
s2.resize(8); //resize(10) works too, but it seems like OP you only need enough size for 8 elements
cout << "length of s2 after resize: " << s2.length() << endl; //size changed
cout << "capacity of s2 after resize: " << s2.capacity() << endl; //capacity didn't change because size <= capacity
s2[5] = '.';
s2[6] = 'o';
s2[7] = '\0';
cout << "[" << s1 << "] [" << s2 << "]" << endl;
// You're done
// The code below is for showing you how size and capacity works.
s2.append("hii"); // grows s2 to size 8 + 3 = 11: s2[8] = 'h', s2[9] = 'i', s2[10] = 'i'
cout << "length of s2 after appending: " << s2.length() << endl; // size = 11
cout << "capacity of s2 after appending: " << s2.capacity() << endl; //since size > capacity, but <= 2*capacity, capacity = 2*capacity
cout << "After appending: [" << s1 << "] [" << s2 << "]" << endl;
return 0;
}
Result:
length of s2 before reserve: 5
capacity of s2 before reserve: 5
length of s2 after reserve: 5
capacity of s2 after reserve: 10
length of s2 after resize: 8
capacity of s2 after resize: 10
[hello] [hello.o]
length of s2 after appending: 11
capacity of s2 after appending: 20
After appending: [hello] [hello.ohii]
The cppreference page on std::basic_string::swap says that it has constant complexity. I assume that means the contents cannot be copied, and that only pointers (or similar) are swapped. I wrote some test code and found that it does move the contents under VS2010. Test code:
std::string s1("almafa");
std::string s2("kortefa");
std::cout << "s1.c_str(): "<< (void*)s1.c_str() << std::endl;
std::cout << "s2.c_str(): "<< (void*)s2.c_str() << std::endl;
std::cout << "SWAP!" << std::endl;
s1.swap(s2);
std::cout << "s1.c_str(): "<< (void*)s1.c_str() << std::endl;
std::cout << "s2.c_str(): "<< (void*)s2.c_str() << std::endl;
Output on g++ 4.6.3
s1.c_str(): 0x22fe028
s2.c_str(): 0x22fe058
SWAP!
s1.c_str(): 0x22fe058
s2.c_str(): 0x22fe028
Output on VS2010
s1.c_str(): 000000000022E2D0
s2.c_str(): 000000000022E320
SWAP!
s1.c_str(): 000000000022E2D0
s2.c_str(): 000000000022E320
Is this a deviation from the standard, or is something happening that I don't know about?
Some implementations of std::string use the short string optimization:
From How is std::string implemented?:
a "short string optimization" (SSO) implementation. In this variant, the object contains the usual pointer to data, length, size of the dynamically allocated buffer, etc. But if the string is short enough, it will use that area to hold the string instead of dynamically allocating a buffer.
So the swap in your case does copy characters, but the copy has a fixed maximum size (the small internal buffer), so it is still O(1).
I started noticing that sometimes when deallocating memory in some of my programs, they would inexplicably crash. I began narrowing down the culprit and have come up with an example that illustrates a case that I am having difficulty understanding:
#include <iostream>
#include <stdlib.h>
using namespace std;
int main() {
char *tmp = (char*)malloc(16);
char *tmp2 = (char*)malloc(16);
long address = reinterpret_cast<long>(tmp);
long address2 = reinterpret_cast<long>(tmp2);
cout << "tmp = " << address << "\n";
cout << "tmp2 = " << address2 << "\n";
memset(tmp, 1, 16);
memset(tmp2, 1, 16);
char startBytes[4] = {0};
char endBytes[4] = {0};
memcpy(startBytes, tmp - 4, 4);
memcpy(endBytes, tmp + 16, 4);
cout << "Start: " << static_cast<int>(startBytes[0]) << " " << static_cast<int>(startBytes[1]) << " " << static_cast<int>(startBytes[2]) << " " << static_cast<int>(startBytes[3]) << "\n";
cout << "End: " << static_cast<int>(endBytes[0]) << " " << static_cast<int>(endBytes[1]) << " " << static_cast<int>(endBytes[2]) << " " << static_cast<int>(endBytes[3]) << "\n";
cout << "---------------\n";
free(tmp);
memcpy(startBytes, tmp - 4, 4);
memcpy(endBytes, tmp + 16, 4);
cout << "Start: " << static_cast<int>(startBytes[0]) << " " << static_cast<int>(startBytes[1]) << " " << static_cast<int>(startBytes[2]) << " " << static_cast<int>(startBytes[3]) << "\n";
cout << "End: " << static_cast<int>(endBytes[0]) << " " << static_cast<int>(endBytes[1]) << " " << static_cast<int>(endBytes[2]) << " " << static_cast<int>(endBytes[3]) << "\n";
free(tmp2);
return 0;
}
Here is the output that I am seeing:
tmp = 8795380
tmp2 = 8795400
Start: 16 0 0 0
End: 16 0 0 0
---------------
Start: 17 0 0 0
End: 18 0 0 0
I am using Borland's free compiler. I am aware that the header bytes that I am looking at are implementation specific, and that things like "reinterpret_cast" are bad practice. The question I am merely looking to find an answer to is: why does the first byte of "End" change from 16 to 18?
The 4 bytes that are considered "end" are 16 bytes after tmp, which are 4 bytes before tmp2. They are tmp2's header - why does a call to free() on tmp affect this place in memory?
I have tried the same example using new [] and delete [] to create/delete tmp and tmp2 and the same results occur.
Any information or help in understanding why this particular place in memory is being affected would be much appreciated.
You will have to ask your libc implementation why it changes. In any case, why does it matter? This is a memory area that libc has not allocated to you, and may be using to maintain its own data structures or consistency checks, or may not be using at all.
Basically you are looking at memory you didn't allocate. You can't make any assumptions about what happens to the memory outside what you requested (i.e. the 16 bytes you allocated). There is nothing abnormal going on.
The runtime and compiler are free to do whatever they want with those bytes, so you should not use them in your programs. The runtime probably changes the values of those bytes to keep track of its internal state.
Deallocating memory is very unlikely to crash a program. On the other hand, accessing memory you have deallocated, as in your sample, is a big programming mistake that is likely to do so.
A good way to avoid this is to set any pointers you free to NULL. That way an accidental access through a freed pointer is much more likely to crash immediately instead of silently reading stale memory.
It's possible that the act of removing an allocated element from the heap modifies other heap nodes, or that the implementation reserves one or more bytes of headers for use as guard bytes from previous allocations.
The memory manager must remember for example what is the size of the memory block that has been allocated with malloc. There are different ways, but probably the simplest one is to just allocate 4 bytes more than the size requested in the call and store the size value just before the pointer returned to the caller.
The implementation of free can then subtract 4 bytes from the passed pointer to get a pointer to where the size has been stored, and can then link the block (for example) into a list of free reusable blocks of that size (perhaps reusing those same 4 bytes to store the link to the next block).
You are not supposed to change or even look at bytes before/after the area you have allocated. The result of accessing, even just for reading, memory that you didn't allocate is Undefined Behavior (and yes, you really can get a program to really crash or behave crazily just because of reading memory that wasn't allocated).