std::strings's capacity(), reserve() & resize() functions - c++

I wan to use std::string simply to create a dynamic buffer and than iterate through it using an index. Is resize() the only function to actually allocate the buffer?
I tried reserve() but when I try to access the string via index it asserts. Also when the string's default capacity seems to be 15 bytes (in my case) but if I still can't access it as my_string[1].
So the capacity of the string is not the actual buffer? Also reserve() also does't allocate the actual buffer?
string my_string;
// I want my string to have 20 bytes long buffer
my_string.reserve( 20 );
int i = 0;
for ( parsing_something_else_loop )
{
char ch = <business_logic>;
// store the character in
my_string[i++] = ch; // this crashes
}
If I do resize() instead of reserve() than it works fine. How is it that the string has the capacity but can't really access it with []? Isn't that the point to reserve() size so you can access it?
Add-on
In response to the answers, I would like to ask stl folks, Why would anybody use reserve() when resize() does exactly the same and it also initialize the string? I have to say I don't appreciate the performance argument in this case that much. All that resize() does additional to what reserve() does is that it merely initialize the buffer which we know is always nice to do anyways. Can we vote reserve() off the island?

Isn't that the point to reserve() size so you can access it?
No, that's the point of resize().
reserve() only gives to enough room so that future call that leads to increase of the size (e.g. calling push_back()) will be more efficient.
From your use case it looks like you should use .push_back() instead.
my_string.reserve( 20 );
for ( parsing_something_else_loop )
{
char ch = <business_logic>;
my_string.push_back(ch);
}
How is it that the string has the capacity but can't really access it with []?
Calling .reserve() is like blowing up mountains to give you some free land. The amount of free land is the .capacity(). The land is there but that doesn't mean you can live there. You have to build houses in order to move in. The number of houses is the .size() (= .length()).
Suppose you are building a city, but after building the 50th you found that there is not enough land, so you need to found another place large enough to fit the 51st house, and then migrate the whole population there. This is extremely inefficient. If you knew you need to build 1000 houses up-front, then you can call
my_string.reserve(1000);
to get enough land to build 1000 houses, and then you call
my_string.push_back(ch);
to construct the house with the assignment of ch to this location. The capacity is 1000, but the size is still 1. You may not say
my_string[16] = 'c';
because the house #16 does not exist yet. You may call
my_string.resize(20);
to get houses #0 ~ #19 built in one go, which is why
my_string[i++] = ch;
works fine (as long as 0 ≤ i ≤ 19).
See also http://en.wikipedia.org/wiki/Dynamic_array.
For your add-on question,
.resize() cannot completely replace .reserve(), because (1) you don't always need to use up all allocated spaces, and (2) default construction + copy assignment is a two-step process, which could take more time than constructing directly (esp. for large objects), i.e.
#include <vector>
#include <unistd.h>
struct SlowObject
{
SlowObject() { sleep(1); }
SlowObject(const SlowObject& other) { sleep(1); }
SlowObject& operator=(const SlowObject& other) { sleep(1); return *this; }
};
int main()
{
std::vector<SlowObject> my_vector;
my_vector.resize(3);
for (int i = 0; i < 3; ++ i)
my_vector[i] = SlowObject();
return 0;
}
Will waste you at least 9 seconds to run, while
int main()
{
std::vector<SlowObject> my_vector;
my_vector.reserve(3);
for (int i = 0; i < 3; ++ i)
my_vector.push_back(SlowObject());
return 0;
}
wastes only 6 seconds.
std::string only copies std::vector's interface here.

No -- the point of reserve is to prevent re-allocation. resize sets the usable size, reserve does not -- it just sets an amount of space that's reserved, but not yet directly usable.
Here's one example -- we're going to create a 1000-character random string:
static const int size = 1000;
std::string x;
x.reserve(size);
for (int i=0; i<size; i++)
x.push_back((char)rand());
reserve is primarily an optimization tool though -- most code that works with reserve should also work (just, possibly, a little more slowly) without calling reserve. The one exception to that is that reserve can ensure that iterators remain valid, when they wouldn't without the call to reserve.

The capacity is the length of the actual buffer, but that buffer is private to the string; in other words, it is not yours to access. The std::string of the standard library may allocate more memory than is required to storing the actual characters of the string. The capacity is the total allocated length. However, accessing characters outside s.begin() and s.end() is still illegal.
You call reserve in cases when you anticipate resizing of the string to avoid unnecessary re-allocations. For example, if you are planning to concatenate ten 20-character strings in a loop, it may make sense to reserve 201 characters (an extra one is for the zero terminator) for your string, rather than expanding it several times from its default size.

reserve(n) indeed allocates enough storage to hold at least n elements, but it doesn't actually fill the container with any elements. The string is still empty (has size 0), but you are guaranteed, that you can add (e.g. through push_back or insert) at least n elements before the string's internal buffer needs to be reallocated, whereas resize(n) really resizes the string to contain n elements (and deletes or adds new elements if neccessary).
So reserve is actually a mere optimization facility, when you know you are adding a bunch of elements to the container (e.g. in a push_back loop) and don't want it to reallocate the storage too often, which incurs memory allocation and copying costs. But it doesn't change the outside/client view of the string. It still stays empty (or keeps its current element count).
Likewise capacity returns the number of elements the string can hold until it needs to reallocate its internal storage, whereas size (and for string also length) returns the actual number of elements in the string.

Just because reserve allocates additional space does not mean it is legitimate for you to access it.
In your example, either use resize, or rewrite it to something like this:
string my_string;
// I want my string to have 20 bytes long buffer
my_string.reserve( 20 );
int i = 0;
for ( parsing_something_else_loop )
{
char ch = <business_logic>;
// store the character in
my_string += ch;
}

std::vector instead of std::string might also be a solution - if there are no requirements against it.
vector<char> v; // empty vector
vector<char> v(10); // vector with space for 10 elements, here char's
Your example:
vector<char> my_string(20);
int i=0;
for ( parsing_something_else_loop )
{
char ch = <business_logic>;
my_string[i++] = ch;
}

Related

set the size of a string object [duplicate]

I wan to use std::string simply to create a dynamic buffer and than iterate through it using an index. Is resize() the only function to actually allocate the buffer?
I tried reserve() but when I try to access the string via index it asserts. Also when the string's default capacity seems to be 15 bytes (in my case) but if I still can't access it as my_string[1].
So the capacity of the string is not the actual buffer? Also reserve() also does't allocate the actual buffer?
string my_string;
// I want my string to have 20 bytes long buffer
my_string.reserve( 20 );
int i = 0;
for ( parsing_something_else_loop )
{
char ch = <business_logic>;
// store the character in
my_string[i++] = ch; // this crashes
}
If I do resize() instead of reserve() than it works fine. How is it that the string has the capacity but can't really access it with []? Isn't that the point to reserve() size so you can access it?
Add-on
In response to the answers, I would like to ask stl folks, Why would anybody use reserve() when resize() does exactly the same and it also initialize the string? I have to say I don't appreciate the performance argument in this case that much. All that resize() does additional to what reserve() does is that it merely initialize the buffer which we know is always nice to do anyways. Can we vote reserve() off the island?
Isn't that the point to reserve() size so you can access it?
No, that's the point of resize().
reserve() only gives to enough room so that future call that leads to increase of the size (e.g. calling push_back()) will be more efficient.
From your use case it looks like you should use .push_back() instead.
my_string.reserve( 20 );
for ( parsing_something_else_loop )
{
char ch = <business_logic>;
my_string.push_back(ch);
}
How is it that the string has the capacity but can't really access it with []?
Calling .reserve() is like blowing up mountains to give you some free land. The amount of free land is the .capacity(). The land is there but that doesn't mean you can live there. You have to build houses in order to move in. The number of houses is the .size() (= .length()).
Suppose you are building a city, but after building the 50th you found that there is not enough land, so you need to found another place large enough to fit the 51st house, and then migrate the whole population there. This is extremely inefficient. If you knew you need to build 1000 houses up-front, then you can call
my_string.reserve(1000);
to get enough land to build 1000 houses, and then you call
my_string.push_back(ch);
to construct the house with the assignment of ch to this location. The capacity is 1000, but the size is still 1. You may not say
my_string[16] = 'c';
because the house #16 does not exist yet. You may call
my_string.resize(20);
to get houses #0 ~ #19 built in one go, which is why
my_string[i++] = ch;
works fine (as long as 0 ≤ i ≤ 19).
See also http://en.wikipedia.org/wiki/Dynamic_array.
For your add-on question,
.resize() cannot completely replace .reserve(), because (1) you don't always need to use up all allocated spaces, and (2) default construction + copy assignment is a two-step process, which could take more time than constructing directly (esp. for large objects), i.e.
#include <vector>
#include <unistd.h>
struct SlowObject
{
SlowObject() { sleep(1); }
SlowObject(const SlowObject& other) { sleep(1); }
SlowObject& operator=(const SlowObject& other) { sleep(1); return *this; }
};
int main()
{
std::vector<SlowObject> my_vector;
my_vector.resize(3);
for (int i = 0; i < 3; ++ i)
my_vector[i] = SlowObject();
return 0;
}
Will waste you at least 9 seconds to run, while
int main()
{
std::vector<SlowObject> my_vector;
my_vector.reserve(3);
for (int i = 0; i < 3; ++ i)
my_vector.push_back(SlowObject());
return 0;
}
wastes only 6 seconds.
std::string only copies std::vector's interface here.
No -- the point of reserve is to prevent re-allocation. resize sets the usable size, reserve does not -- it just sets an amount of space that's reserved, but not yet directly usable.
Here's one example -- we're going to create a 1000-character random string:
static const int size = 1000;
std::string x;
x.reserve(size);
for (int i=0; i<size; i++)
x.push_back((char)rand());
reserve is primarily an optimization tool though -- most code that works with reserve should also work (just, possibly, a little more slowly) without calling reserve. The one exception to that is that reserve can ensure that iterators remain valid, when they wouldn't without the call to reserve.
The capacity is the length of the actual buffer, but that buffer is private to the string; in other words, it is not yours to access. The std::string of the standard library may allocate more memory than is required to storing the actual characters of the string. The capacity is the total allocated length. However, accessing characters outside s.begin() and s.end() is still illegal.
You call reserve in cases when you anticipate resizing of the string to avoid unnecessary re-allocations. For example, if you are planning to concatenate ten 20-character strings in a loop, it may make sense to reserve 201 characters (an extra one is for the zero terminator) for your string, rather than expanding it several times from its default size.
reserve(n) indeed allocates enough storage to hold at least n elements, but it doesn't actually fill the container with any elements. The string is still empty (has size 0), but you are guaranteed, that you can add (e.g. through push_back or insert) at least n elements before the string's internal buffer needs to be reallocated, whereas resize(n) really resizes the string to contain n elements (and deletes or adds new elements if neccessary).
So reserve is actually a mere optimization facility, when you know you are adding a bunch of elements to the container (e.g. in a push_back loop) and don't want it to reallocate the storage too often, which incurs memory allocation and copying costs. But it doesn't change the outside/client view of the string. It still stays empty (or keeps its current element count).
Likewise capacity returns the number of elements the string can hold until it needs to reallocate its internal storage, whereas size (and for string also length) returns the actual number of elements in the string.
Just because reserve allocates additional space does not mean it is legitimate for you to access it.
In your example, either use resize, or rewrite it to something like this:
string my_string;
// I want my string to have 20 bytes long buffer
my_string.reserve( 20 );
int i = 0;
for ( parsing_something_else_loop )
{
char ch = <business_logic>;
// store the character in
my_string += ch;
}
std::vector instead of std::string might also be a solution - if there are no requirements against it.
vector<char> v; // empty vector
vector<char> v(10); // vector with space for 10 elements, here char's
Your example:
vector<char> my_string(20);
int i=0;
for ( parsing_something_else_loop )
{
char ch = <business_logic>;
my_string[i++] = ch;
}

hashtable needs lots of memory

I've declared and defined the following HashTable class. Note that I needed a hashtable of hashtables so my HashEntry struct contains a HashTable pointer. The public part is not a big deal it has the traditional hash table functions so I removed them for simplicity.
enum Status{ACTIVE, DELETED, EMPTY};
enum Type{DNS_ENTRY, URL_ENTRY};
class HashTable{
private:
struct HashEntry{
std::string key;
Status current_status;
std::string ip;
int access_count;
Type entry_type;
HashTable *table;
HashEntry(
const std::string &k = std::string(),
Status s = EMPTY,
const std::string &u = std::string(),
const int &a = int(),
Type e = DNS_ENTRY,
HashTable *t = NULL
): key(k), current_status(s), ip(u), access_count(a), entry_type(e), table(t){}
};
std::vector<HashEntry> array;
int currentSize;
public:
HashTable(int size = 1181, int csz = 0): array(size), currentSize(csz){}
};
I am using quadratic probing and I double the size of the vector in my rehash function when I hit array.size()/2. The following list is used when a larger table size is needed.
int a[16] = {49663, 99907, 181031, 360461,...}
My problem is that this class consumes so much memory. I've just profiled it with massif and found out that it needs 33MB(33 million bytes!) of memory for 125000 insertion. To be clear, actually
1 insertion -> 47352 Bytes
8 insertion -> 48376 Bytes
512 insertion -> 76.27KB
1000 insertion 2MB (array size increased to 49663 here)
27000 insertion-> 8MB (array size increased to 99907 here)
64000 insertion -> 16MB (array size increased to 181031 here)
125000 insertion-> 33MB (array size increased to 360461 here)
These may be unnecessary but I just wanted to show you how memory usage changes with the input. As you can see, when rehashing is done, memory usage doubles. For example, our initial array size was 1181. And we have just seen that 125000 elements -> 33MB.
To debug the problem, I changed the initial size to 360461. Now 127000 insertion does not need rehashing. And I see that 20MB of memory is used with this initial value. That is still huge, but I think it suggests there is a problem with rehashing. The following is my rehash function.
void HashTable::rehash(){
std::vector<HashEntry> oldArray = array;
array.resize(nextprime(array.size()));
for(int j = 0; j < array.size(); j++){
array[j].current_status = EMPTY;
}
for(int i = 0; i < oldArray.size(); i++){
if(oldArray[i].current_status == ACTIVE){
insert(oldArray[i].key);
int pos = findPos(oldArray[i].key);
array[pos] = oldArray[i];
}
}
}
int nextprime(int arraysize){
int a[16] = {49663, 99907, 181031, 360461, 720703, 1400863, 2800519, 5600533, 11200031, 22000787, 44000027};
int i = 0;
while(arraysize >= a[i]){i++;}
return a[i];
}
This is the insert function used in rehashing and everywhere else.
bool HashTable::insert(const std::string &k){
int currentPos = findPos(k);
if(isActive(currentPos)){
return false;
}
array[currentPos] = HashEntry(k, ACTIVE);
if(++currentSize > array.size() / 2){
rehash();
}
return true;
}
What am I doing wrong here? Even if it's caused by rehashing, when no rehashing is done it is still 20MB and I believe 20MB is way too much for 100k items. This hashtable is supposed to contain like 8 million elements.
The fact that 360,461 HashEntry's take 20 MB is hardly surprising. Did you try looking at sizeof(HashEntry)?
Each HashEntry includes two std::strings, a pointer, and three int's. As the old joke has it, it's not easy to answer the question "How long is a string?", in this case because there are a large variety of string implementations and optimizations, so you might find that sizeof(std::string) is anywhere between 4 and 32 bytes. (It would only be 4 bytes on a 32-bit architecture.) In practice, a string requires three pointers and the string itself unless it happens to be empty. If sizeof(std::string) is the same as sizeof(void*), then you've probably got a not-too-recent GNU standard library, in which the std::string is an opaque pointer to a block containing two pointers, a reference count, and the string itself. If sizeof(std::string) is 32 bytes, then you might have a recent GNU standard library implementation in which there is a bit of extra space in the string structure for the short-string optimization. See the answer to Why does libc++'s implementation of std::string take up 3x memory as libstdc++? for some measurements. Let's just say 32 bytes per string, and ignore the details; it won't be off by much.
So two strings (32 bytes each) plus a pointer (8 bytes) plus three ints (another 12 bytes) and four bytes of padding because one of the ints is between two 8-byte aligned objects, and that's a total of 88 bytes per HashEntry. And if you have 360,461 hash entries, that would be 31,720,568 bytes, about 30 MB. The fact that you're "only" using 20MB is probably because you're using the old GNU standard library, which optimizes empty strings to a single pointer, and the majority of your strings are empty strings because half the slots have never been used.
Now, let's take a look at rehash. Reduced to its essentials:
void rehash() {
std::vector<HashEntry> tmp = array; /* Copy the entire array */
array.resize(new_size()); /* Internally does another copy */
for (auto const& entry : tmp)
if (entry.used()) array.insert(entry); /* Yet another copy */
}
At peak, we had two copies of the smaller array as well as the new big array. Even if the new array is only 20 MB, it's not surprising that peak memory usage was almost twice that. (Indeed, this is again surprisingly small, not surprisingly big. Possibly it was not actually necessary to change the address of the new vector because it was at the end of the current allocated memory space, which could just be extended.)
Note that we did two copies of all that data, and array.resize() potentially did another one. Let's see if we can do better:
void rehash() {
std::vector<HashEntry> tmp(new_size()); /* Make an array of default objects */
for (auto const& entry: array)
if (entry.used()) tmp.insert(entry); /* Copy into the new array */
std::swap(tmp, array); /* Not a copy, just swap three pointers */
}
This way, we only do one copy. Instead of a (possible) internal copy by resize, we do a bulk construction of the new elements, which should be similar. (It's just zeroing out the memory.)
Also, in the new version we only copy the actual strings once each, instead of twice each, which is the fiddliest part of the copy and thus probably quite a large saving.
Proper string management could reduce that overhead further. rehash doesn't actually need to copy the strings, since they are not changed. So we could keep the strings elsewhere, say in a vector of strings, and just use the index into the vector in the HashEntry. Since you are not expecting to hold billions of strings, only millions, the index could a four-byte int. By also shuffling the HashEntry fields around and reducing the enums to a byte instead of four bytes (in C++11, you can specify the underlying integer type of an enum), the HashEntry could be reduced to 24 bytes, and there wouldn't be a need to leave space for as many string descriptors.
Since you are using open addressing, half your hash slots have to be empty. Since HashEntry is quite large, storing a full HashEntry in each empty slot is terribly wasteful.
You should store your HashEntry structs somewhere else and put HashEntry* in your hash table, or switch to chaining with a much denser load factor. Either one will reduce this waste.
Also, if you're going to move HashEntry objects around, swap instead of copying, or use move semantics so you don't have to copy so many strings. Be sure to clear out the strings in any entries you're no longer using.
Also, even though you say you need HashTables of HashTables, you don't really explain why. It's usually more efficient to use one hash table with efficiently represented compound keys if small hash tables are not memory-efficient.
I have changed my structure a little bit just as you all suggested, but there is this one thing that nobody has noticed.
When rehashing/resizing is being done, my rehashfunction calls insert. In this insert function I am incrementing the currentSize, which holds how many elements a hashtable has. So each time a resizing is needed, currentSize doubles itself while it should have stayed the same. I removed that line and wrote the proper code for rehashing and now I think I'm okay.
I am using two different structs now, and the program consumes 1.6GB memory for 8 million elements, which is what expected due to multibyte strings, and integers. That number was like 7-8GB before.

Why std::bad_alloc is thrown?

I'm implementing a map/reduce parallel project. However, using an input file of (more or less) 1GB, for a word count toy example, with only one mapper (which maps the whole file) i receive a std::bad_alloc exception. Unfortunately this happens only on the remote Xeon Phi (with smaller RAM), so no deep debugging.
However, the memory is occupied in 2 places: when the mapper read (store) the whole file in a char *:
void getNextKeyValue() {
key = pos;//int
value = new char[file_size];//file_size only with 1 mapper
ssize_t result = pread(fd, value, file_size, pos);
assert(result == ( file_size ) );
morePairs = false;
}
And the other one when the map function is called and a series of pair<char*,int> is stored inside a vector as the map's result:
The map function:
std::function<void(int key, char *value,MapResult<int,char*,char*,int> *result)> map_func = [](int key,char *value,MapResult<int,char*,char*,int> *result) {
const char delimit[]=" \t\r\n\v\f";
char *token , *save;
token = strtok_r(value, delimit, &save);
while (token != NULL){
result->emit(token,1);
token = strtok_r (NULL,delimit, &save);
}
};
emit implementation (and maps' result generation):
void emit(char* key, int value) {
res.push_back(pair<char*,int>(key,value));
}
...
private:
vector<pair<char*,int>> res;
note: normally key and value in emit are template-based, but I omit them for claricity in this example.
In the first place I thought that std::bad_alloc was thrown because of char *value (which takes 1GB), but the exception was thrown after a testing cout message placed after the value allocation (so that'not the problem).
From what I've read about the strtok implementation, the original char* is modified (adding \0 at the end of each token) so no addition memory is allocated.
The only remaining possibility is the vector<pair<char*,int>> occupied space, but I'm not able to figure its space (please help me about it). Supposing an average word length of 5 chars, we should have ~ 2*10^8 words.
UPDATE AFTER 1201ProgramAlarm's answer :
Unfortunately, precompute the number of words and then call resize() in order to eliminate unused vector's memory isn't feasible for two reasons:
It would greatly reduce the performance. Without calling emit and only counting the words of a file of 280MB, it take 1242ms over 1329ms total time execution (~5000s when file is read for the first time).
Using this solution, the final user should deeply consider memory usage when he writes the map function, which usually doesn't happen in classic map/reduce frameworks like Hadoop.
The problem isn't the space used by the vector, it's all the space previously used by the vector when its capacity was smaller. Unless you call reserve on the vector, it starts empty and allocates a small amount of space (typically large enough for one element) when you push the first element. During later pushes, if it doesn't have enough remaining space allocated, it will allocate more (1.5x or 2x the current size). This means you need enough free memory for both the smaller size and the larger one. Because the freed memory chunks, when combined, still won't be enough for the next larger requested amount, there can be a lot of free but unused memory.
You should call res.reserve(/*appropriate large size*/), or switch containers to a deque that, while it will need more space in the end, won't need to do the reallocations as it grows. To get the size to reserve, you can walk your file once to see how many words are really in it, reserve space for them, then walk it again and save the words.

Assigning vector size vs reserving vector size

bigvalue_t result;
result.assign(left.size() + right.size(), 0);
int carry = 0;
for(size_t i = 0; i < left.size(); i++) {
carry = 0;
for(size_t j = 0; j < right.size(); j++) {
int sum = result[i+j] + (left[i]*right[j]) + carry;
result[i+j] = sum%10;
carry = sum/10;
}
result[i+right.size()] = carry;
}
return result;
Here I used assign to allocate size of result, and result passed back normally.
When I use result.reserve(left.size()+right.size());, the function runs normally inside the both for loops. Somehow when I use print out the result.size(), it is always 0. Does reserve not allocate any space?
It is specified as
void reserve(size_type n);
Effects: A directive that informs a
vector of a planned change in size, so that it can manage the storage
allocation accordingly. After reserve(), capacity() is greater or
equal to the argument of reserve if reallocation happens; and equal to
the previous value of capacity() otherwise. Reallocation happens at
this point if and only if the current capacity is less than the
argument of reserve(). If an exception
is thrown other than by the move constructor of a non-CopyInsertable type, there are no effects.
Complexity: It does not change the size of the sequence and takes at
most linear time in the size of the sequence.
So, yes, it allocates memory, but it doesn't create any objects within the container. To actually create as much elements in the vector as you want to have later, and being able to access them via op[] you need to call resize().
reserve() is for when you want to prevent things like the vector reallocation every now and then when doing lots of push_back()s.
reserve allocates space, but doesn't really create anything. It is used in order to avoid reallocations.
For, example, if you intend to store 10000 elements, by push_back into a vector, you probably will make the vector to use re-allocations. If you use reserve before actually storing your elements, then the vector is prepared to accept about 10000 elements, thus he is prepared and the fill of the vector shall happen faster, than if you didn't use reserve.
resize, actually creates space. Note also, that resize will initialize your elements to their default values (so for an int, it will set every element to 0).
PS - In fact, when you say reserve(1000), then the vector will actually -may- allocate space for more than 1000 elements. If this happens and you store exactly 1000 elements, then the unused space remains unused (it is not de-allocated).
It is the difference between semantically increasing the size of the vector (resize/assign/push_back/etc), and physically creating more underlying memory for it to expand into (reserve).
That you see your code appear to work even with reserve is just because you're not triggering any OS memory errors (because the memory belongs to your vector), but just because you don't see any error messages or crashes doesn't mean your code is safe or correct: as far as the vector is concerned, you are writing into memory that belongs to it and not you.
If you'd used .at() instead of [] you'd have got an exception; as it is, you are simply invoking undefined behaviour.

Different addresses while filling a std::vector

Wouldn't you expect the addresses printed by the two loops to be the same? I was, and I cannot understand why (sometimes) they are different.
#include <iostream>
#include <vector>
using namespace std;
struct S {
void print_address() {
cout << this << endl;
}
};
int main(int argc,char *argv[]) {
vector<S> v;
for (size_t i = 0; i < 10; i++) {
v.push_back( S() );
v.back().print_address();
}
cout << endl;
for (size_t i = 0; i < v.size(); i++) {
v[i].print_address();
}
return 0;
}
I tested this code with many local and on-line compilers and the output I get looks like this (the last three figures are always the same):
0xaec010
0xaec031
0xaec012
0xaec013
0xaec034
0xaec035
0xaec036
0xaec037
0xaec018
0xaec019
0xaec010
0xaec011
0xaec012
0xaec013
0xaec014
0xaec015
0xaec016
0xaec017
0xaec018
0xaec019
I spotted this because making some initialization in the first loop I obtained uninitialized object in the subsequent part of the program. Am I missing something?
Because when vector capicity changes, it reallocates elements. If you std::vector::reserve enough capacity, no reallcation is needed, it will print same address.
vector<S> v;
v.reserve(10);
Note: properly use std::vector::reserve will increase application performance, because no unnecessary reallocation and objects copy.
The vector is performing re-allocations in order to grow as needed. Each time it does this, it allocates a larger buffer for the data and copies the elements across. You can see this clearly in the first loop, where each address jump is followed by a larger sequence of consecutive addresses. In the second loop, you just look at the addresses after the final reallocation.
0xaec010
0xaec031 <--
0xaec012 <--
0xaec013
0xaec034 <--
0xaec035
0xaec036
0xaec037
0xaec018 <--
0xaec019
The simplest way to instantiate a vector with 10 S objects would be
std::vector<S> v(10);
This would involve no re-allocations. See also std::vector::reserve.
Vector elements are stored contiguously; that is, they're all in a row in memory. Your vector object has to allocate space for this contiguous block of elements.
Your vector can't just keep having things added to it indefinitely. It has to grow the space it has allocated. The memory model typically doesn't allow us to expand a memory block — we have to create a new one instead. When the vector does this, it has to move all its elements to the new space. This is occurring several times within your first loop.
If you'd done:
vector<S> v;
v.reserve(10);
(which you can, since you know you'll end up with 10 elements), then no re-allocation would have been necessary, and the addresses would not have changed.
I'm not really surprised that they can change. As the vector initially has no size, it's likely to reallocate the vector once or twice during the initial loop. That'll change the base address of the vector. It's not impossible that after a resize, you'll end up using an address you used before (though I find that somewhat surprising. Are you sure about the first part of the addresses?)
If you want to ensure they don't change, you need to add a v.reserve() before you start pushing stuff on it.