I am learning how vectors work in C++, and wrote a sample program to see how memory is handled for vectors.
#include <iostream>
#include <vector>
int main()
{
//Test 1:
double n = 3.5;
std::vector<double> test;
std::cout << sizeof(test) << std::endl;
test.push_back(n);
std::cout << sizeof(test) << std::endl;
std::cout << std::endl;
std::cout << std::endl;
std::cout << std::endl;
//Test 2
std::vector<int> test2;
std::cout << sizeof(test2) << std::endl;
for (int i = 0; i < 1000; i++) {
test2.push_back(i);
}
std::cout << sizeof(test2) << std::endl;
}
Interestingly, the program prints 24 as the number of bytes each time, despite my adding new elements to the vector. How can the amount of memory that the vector occupies when it is initially declared be the same as after I have added elements to it?
Internally, the vector object has a pointer to dynamically allocated memory that contains the elements. When you use sizeof(test), you're just getting the size of the structure that contains the pointer; the size of the memory that it points to is not included.
This memory has to be dynamically allocated so that the vector can grow and shrink as needed. It's not possible for a class object to change its size, because sizeof is fixed at compile time.
To get the amount of memory being used by the data storage, use sizeof(double) * test.capacity().
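A minimal sketch of that calculation, for illustration only. The name heap_bytes is a hypothetical helper, not part of the standard library, and the figure it returns ignores any bookkeeping overhead the allocator adds:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: bytes reserved on the heap for the vector's elements.
// sizeof(v) never changes; this value grows as the vector reallocates.
template <typename T>
std::size_t heap_bytes(const std::vector<T>& v) {
    return v.capacity() * sizeof(T);
}
```

Printing both sizeof(test2) and heap_bytes(test2) after the loop in the question should show the first staying fixed while the second is at least 1000 * sizeof(int).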
Why do I get bad_alloc error?
It appears to be limiting me to 32 bits, but this is a 64-bit machine and compiler.
I am using this to store large sets of data, which are held in the int array inside each vector element.
I tried setting the heap and stack sizes at compile time, but this did not seem to affect the bad_alloc.
#include<iostream>
#include<vector>
//set vector with large array of integers
struct Fld
{
int array[256];
};
std::vector <Fld> fld;
int main()
{
std::cout << fld.max_size() << "\n";
int length = 100000000;
try
{
//show maximum vector array size
std::cout << fld.max_size() << "\n";
std::cout << "resize [" << length << "]\n";
//resize to size larger than 32 bit
fld.resize(length);
std::cout << "good\n";
}
catch(std::bad_alloc& ba)
{
std::cout << "bad_alloc caught: " << ba.what() << "\n";
}
}
You are limited by the amount of storage available. You attempted to allocate ~102 gigabytes of storage (each Fld is 1 KB, and you asked for 100,000,000 of them). Most systems won't let you do that.
max_size is a theoretical maximum imposed by the size of the data structure in question. It's not a promise that your computer has that much storage.
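The arithmetic behind that figure can be checked with a small sketch. The helper requested_bytes is a hypothetical name introduced here, and the 1 KB element size assumes a 4-byte int:

```cpp
#include <cstddef>

struct Fld {
    int array[256];  // 256 * sizeof(int) bytes, i.e. 1024 with a 4-byte int
};

// Hypothetical helper: bytes needed to hold `count` elements of type T.
// Uses unsigned long long so the product does not overflow a 32-bit type.
template <typename T>
constexpr unsigned long long requested_bytes(unsigned long long count) {
    return count * sizeof(T);
}
```

With a 4-byte int, requested_bytes<Fld>(100000000) comes to 102,400,000,000 bytes, which is what resize() would have to obtain in one contiguous block.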
In C++, when all the memory allocated to a container (say, a vector) is used up and we try to add one more element, the memory is reallocated. However, I was wondering how a class in C++ manages the memory for containers that are its data members.
For example, I run the following code:
#include <iostream>
#include <vector>
class Test{
public:
int i = 0;
std::vector<int> v;
};
int main(){
Test t;
std::cout << "Address of t: " << &t << ", capacity of vector: " << t.v.capacity() << ", size of vector: " << t.v.size() << ", address of vector: " << &(t.v) << std::endl;
t.v.push_back(1);
std::cout << "Address of t: " << &t << ", capacity of vector: " << t.v.capacity() << ", size of vector: " << t.v.size() << ", address of vector: " << &(t.v) << std::endl;
t.v.push_back(2);
std::cout << "Address of t: " << &t << ", capacity of vector: " << t.v.capacity() << ", size of vector: " << t.v.size() << ", address of vector: " << &(t.v) << std::endl;
return 0;
}
And the output is:
Address of t: 0x61fee8, capacity of vector: 0, size of vector: 0, address of vector: 0x61feec
Address of t: 0x61fee8, capacity of vector: 1, size of vector: 1, address of vector: 0x61feec
Address of t: 0x61fee8, capacity of vector: 2, size of vector: 2, address of vector: 0x61feec
The address of the vector is not changed. Does it mean the c++ uses a pointer to represent each data member (so address 0x61feec actually points to the address of the vector)?
The address of the vector is not changed.
Correct, because the vector itself is not moving around in memory.
Does it mean the c++ uses a pointer to represent each data member (so address 0x61feec actually points to the address of the vector)?
Everything in memory has an address.
The std::vector class internally contains a data member that is a pointer to an array of elements. The vector::size() method reports the number of valid elements in the array, while the vector::capacity() method reports the maximum number of elements the currently allocated array can hold. The vector (re-)allocates that array dynamically as needed, i.e. whenever size() is equal to capacity() when adding a new element.
Nothing in your example code is printing the address of that array itself. The vector::data() method returns a pointer to that array. Add that pointer to your logging, and you will see it change value as the capacity() changes over time, eg:
void log(const Test &t)
{
std::cout << "Address of t: " << &t
<< ", capacity of vector: " << t.v.capacity()
<< ", size of vector: " << t.v.size()
<< ", address of vector: " << &(t.v)
<< ", address of vector data: " << t.v.data()
<< std::endl;
}
int main(){
Test t;
log(t);
for(int i = 1; i <= 50; i++)
{
t.v.push_back(i);
log(t);
}
return 0;
}
The vector object itself is embedded in the containing class; only its element storage lives elsewhere. Let's look at an easy example:
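#include <cstddef>  // size_t
struct myvector {
size_t size;   // number of elements in use
void* data;    // points to separately allocated element storage
};
class Test {
myvector v;    // stored inline as part of every Test object
};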
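In this case, each object of class Test occupies sizeof(size_t)+sizeof(void*) bytes (plus any padding the compiler adds).
Now when it comes to resizing the vector, conceptually all that happens is that the memory pointed to by data is reallocated, much like a realloc(); the object that holds the pointer keeps its size and its address.
Of course, the std::vector implementation is far more complicated than that (it allocates a new block and copies or moves the elements rather than calling realloc()), but I think you get the idea.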
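To make the idea concrete, here is a toy sketch of a growable int buffer. ToyVector, toy_push_back and toy_free are hypothetical names, the doubling growth policy is an assumption, and realloc() is only safe here because int is trivially copyable; this is not how std::vector is actually implemented:

```cpp
#include <cstddef>
#include <cstdlib>

// Toy growable buffer: the struct itself never changes size,
// only the heap block that `data` points to does.
struct ToyVector {
    std::size_t size = 0;
    std::size_t capacity = 0;
    int* data = nullptr;
};

// Appends a value, doubling the heap block when it is full.
// Returns false if the allocation fails (the buffer is left unchanged).
bool toy_push_back(ToyVector& v, int value) {
    if (v.size == v.capacity) {
        std::size_t new_cap = (v.capacity == 0) ? 1 : v.capacity * 2;
        void* p = std::realloc(v.data, new_cap * sizeof(int));
        if (p == nullptr) return false;
        v.data = static_cast<int*>(p);  // the block may have moved
        v.capacity = new_cap;
    }
    v.data[v.size++] = value;
    return true;
}

// Releases the heap block and resets the bookkeeping.
void toy_free(ToyVector& v) {
    std::free(v.data);
    v = ToyVector{};
}
```

The address of a ToyVector object and sizeof(ToyVector) stay constant across any number of toy_push_back calls, while data may change whenever the block is reallocated, which mirrors what the question observed with &(t.v).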
I am creating a std::list of struct elements. Based on a certain criterion, I want to store the addresses of a few elements from the list (because those addresses don't change(?)) in a std::vector for quick access elsewhere. An example is given below:
#include <iostream>
#include <vector>
#include <list>
struct Astruct{
double x[2];
int rank;
};
int main(int argc, char *argv[]) {
std::list<Astruct> ants;
std::vector< Astruct* > ptr;
for (auto i = 0; i != 20; ++i) {
Astruct local;
local.x[0] = 1.1;
local.x[1] = 1.2;
local.rank = i;
// put in list
ants.push_back(local);
// store addresses of elements with even rank
// rather than a temporary address, the permanent address from the list is needed
if(local.rank %2 == 0) ptr.push_back(&local);
}
// print the selected elements using addresses from the list
for(int num = 0; num != ptr.size(); num++){
Astruct *local;
local = ptr.at(num);
std::cout << " rank " << local->rank << "\n";
}
/*
// quick way to check whether certain address (eg 3rd element) exists in the std::vector
std::list<Astruct>::iterator it = ants.begin();
std::advance(it , 2);
for(int num = 0; num != ptr.size(); num++){
if(it == ptr.at(num)) std::cout << " exists in vector \n " ;
}
*/
// print memory in bytes for all variables
std::cout << " sizeof Astruct " << sizeof(Astruct) << "\n";
std::cout << " sizeof ants " << sizeof(ants) << "\n";
std::cout << " sizeof ptr " << sizeof(ptr) << "\n";
}
What's the way to access an address of a particular element from the list?
Is it efficient method to add elements to list? (in first for loop)
What is the quickest way to check whether certain address exists in the vector? (shown in comment block)
How to determine the memory size in bytes for different variables here? (end of the code)
Thanks.
What's the way to access an address of a particular element from the list?
address=&(*iterator);
Is it efficient method to add elements to list? (in first for loop)
The first loop appends to the list with push_back, which is fine.
However, all the addresses stored in the vector refer to the local variable local, which disappears after each iteration; dereferencing them afterwards is undefined behaviour (very probably all these addresses are the same, but nothing is certain).
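One way to store addresses that stay valid is to take them from the element inside the list after it has been inserted. The sketch below assumes the same Astruct as in the question; fill_and_select is a hypothetical helper name:

```cpp
#include <list>
#include <vector>

struct Astruct {
    double x[2];
    int rank;
};

// Hypothetical helper: appends `count` elements to the list and returns
// the addresses of the even-ranked ones as stored inside the list.
std::vector<Astruct*> fill_and_select(std::list<Astruct>& ants, int count) {
    std::vector<Astruct*> ptr;
    for (int i = 0; i != count; ++i) {
        ants.push_back(Astruct{{1.1, 1.2}, i});
        Astruct& stored = ants.back();  // the copy that lives in the list node
        if (stored.rank % 2 == 0) ptr.push_back(&stored);
    }
    return ptr;
}
```

Addresses of std::list elements remain valid until that particular element is erased or the list is destroyed, so the pointers can be used after the loop.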
What is the quickest way to check whether certain address exists in the vector? (shown in comment block)
Usually std::find() from <algorithm> is suitable.
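A minimal sketch of such a lookup, using a hypothetical helper named contains_address. Note that a list iterator cannot be compared with a pointer directly, so it has to be converted with &(*it) first:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: true if `target` is one of the stored addresses.
// This is a linear search, so it costs O(n) per lookup.
template <typename T>
bool contains_address(const std::vector<T*>& ptrs, const T* target) {
    return std::find(ptrs.begin(), ptrs.end(), target) != ptrs.end();
}
```

For the commented-out block in the question, the check would become contains_address(ptr, &(*it)) after advancing it to the element of interest.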
How to determine the memory size in bytes for different variables here? (end of the code)
std::cout << " sizeof Astruct " << sizeof(Astruct) << "\n"; is OK
std::cout << " sizeof ants " << size(ants)*sizeof(Astruct) << "\n"; is an approximation since we don't know the overhead of the list and its nodes
std::cout << " sizeof ptr " << size(ptr)*sizeof(Astruct *) << "\n"; is an approximation since we don't know the overhead of the vector
I want to read double values from a file into an array. I have 128^3 values. My program worked just fine as long as I stayed at 128^2 values, but now I get a "segmentation fault" error, even though 128^3 ≈ 2,100,000 is far below the maximum value of an int. So how many values can you actually put into an array of doubles?
#include <iostream>
#include <fstream>
int LENGTH = 128;
int main(int argc, const char * argv[]) {
// insert code here...
const int arrLength = LENGTH*LENGTH*LENGTH;
std::string filename = "density.dat";
std::cout << "opening file" << std::endl;
std::ifstream infile(filename.c_str());
std::cout << "creating array with length " << arrLength << std::endl;
double* densdata[arrLength];
std::cout << "Array created"<< std::endl;
for(int i=0; i < arrLength; ++i){
double a;
infile >> a;
densdata[i] = &a;
std::cout << "read value: " << a << " at line " << (i+1) << std::endl;
}
return 0;
}
You are allocating the array on the stack, and stack size is limited (by default, the stack limit tends to be in single-digit megabytes). With 8-byte pointers, 128^3 elements need about 16 MB.
You have several options:
increase the size of the stack (ulimit -s on Unix);
allocate the array on the heap using new;
move to using std::vector.
I have a program that currently generates large arrays and matrices that can be upwards of 10GB in size. The program uses MPI to parallelize workloads, but is limited by the fact that each process needs its own copy of the array or matrix in order to perform its portion of the computation. The memory requirements make this problem unfeasible with a large number of MPI processes and so I have been looking into Boost::Interprocess as a means of sharing data between MPI processes.
So far, I have come up with the following which creates a large vector and parallelizes the summation of its elements:
#include <cstdlib>
#include <ctime>
#include <functional>
#include <iostream>
#include <string>
#include <utility>
#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/containers/vector.hpp>
#include <boost/interprocess/allocators/allocator.hpp>
#include <boost/tuple/tuple_comparison.hpp>
#include <mpi.h>
typedef boost::interprocess::allocator<double, boost::interprocess::managed_shared_memory::segment_manager> ShmemAllocator;
typedef boost::interprocess::vector<double, ShmemAllocator> MyVector;
const std::size_t vector_size = 1000000000;
const std::string shared_memory_name = "vector_shared_test.cpp";
int main(int argc, char **argv) {
int numprocs, rank;
MPI::Init();
numprocs = MPI::COMM_WORLD.Get_size();
rank = MPI::COMM_WORLD.Get_rank();
if(numprocs >= 2) {
if(rank == 0) {
std::cout << "On process rank " << rank << "." << std::endl;
std::time_t creation_start = std::time(NULL);
boost::interprocess::shared_memory_object::remove(shared_memory_name.c_str());
boost::interprocess::managed_shared_memory segment(boost::interprocess::create_only, shared_memory_name.c_str(), size_t(12000000000));
std::cout << "Size of double: " << sizeof(double) << std::endl;
std::cout << "Allocated shared memory: " << segment.get_size() << std::endl;
const ShmemAllocator alloc_inst(segment.get_segment_manager());
MyVector *myvector = segment.construct<MyVector>("MyVector")(alloc_inst);
std::cout << "myvector max size: " << myvector->max_size() << std::endl;
for(int i = 0; i < vector_size; i++) {
myvector->push_back(double(i));
}
std::cout << "Vector capacity: " << myvector->capacity() << " | Memory Free: " << segment.get_free_memory() << std::endl;
std::cout << "Vector creation successful and took " << std::difftime(std::time(NULL), creation_start) << " seconds." << std::endl;
}
std::flush(std::cout);
MPI::COMM_WORLD.Barrier();
std::time_t summing_start = std::time(NULL);
std::cout << "On process rank " << rank << "." << std::endl;
boost::interprocess::managed_shared_memory segment(boost::interprocess::open_only, shared_memory_name.c_str());
MyVector *myvector = segment.find<MyVector>("MyVector").first;
double result = 0;
for(int i = rank; i < myvector->size(); i = i + numprocs) {
result = result + (*myvector)[i];
}
double total = 0;
MPI::COMM_WORLD.Reduce(&result, &total, 1, MPI::DOUBLE, MPI::SUM, 0);
std::flush(std::cout);
MPI::COMM_WORLD.Barrier();
if(rank == 0) {
std::cout << "On process rank " << rank << "." << std::endl;
std::cout << "Vector summing successful and took " << std::difftime(std::time(NULL), summing_start) << " seconds." << std::endl;
std::cout << "The arithmetic sum of the elements in the vector is " << total << std::endl;
segment.destroy<MyVector>("MyVector");
}
std::flush(std::cout);
MPI::COMM_WORLD.Barrier();
boost::interprocess::shared_memory_object::remove(shared_memory_name.c_str());
}
sleep(300);
MPI::Finalize();
return 0;
}
I noticed that this causes the entire shared object to be mapped into each process's virtual memory space, which is an issue with our computing cluster as it limits virtual memory to be the same as physical memory. Is there a way to share this data structure without having to map the entire shared memory space, perhaps by sharing a pointer of some kind? Would trying to access unmapped shared memory even be defined behavior? Unfortunately, the operations we are performing on the array mean that each process eventually needs to access every element in it (although not concurrently; I suppose it's possible to break up the shared array into pieces and trade portions of the array for those you need, but this is not ideal).
Since the data you want to share is so large, it may be more practical to treat the data as a true file, and use file operations to read the data that you want. Then, you do not need to use shared memory to share the file, just let each process read directly from the file system.
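std::ifstream file("data.dat", std::ios::in | std::ios::binary);  // requires <fstream>
file.seekg(someOffset, std::ios::beg);  // someOffset: byte offset of the chunk this process needs
file.read(reinterpret_cast<char*>(array), sizeof(array));  // array must be a true array (not a pointer) for sizeof to give its size in bytes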
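A slightly fuller sketch of the same idea, assuming the file holds raw doubles written in the machine's native binary format. read_chunk is a hypothetical helper name, and how each MPI rank picks its first and count values is left open:

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads `count` doubles starting at element index
// `first` from a binary file of raw doubles. Returns fewer elements if
// the file ends early, and an empty vector if it cannot be opened.
std::vector<double> read_chunk(const std::string& path,
                               std::size_t first, std::size_t count) {
    std::vector<double> chunk(count);
    std::ifstream file(path.c_str(), std::ios::in | std::ios::binary);
    file.seekg(static_cast<std::streamoff>(first * sizeof(double)), std::ios::beg);
    file.read(reinterpret_cast<char*>(chunk.data()),
              static_cast<std::streamsize>(count * sizeof(double)));
    // gcount() is the number of bytes actually read by the last read().
    chunk.resize(static_cast<std::size_t>(file.gcount()) / sizeof(double));
    return chunk;
}
```

Each process then holds only the chunk it is currently working on, and can call read_chunk again with a different offset when it needs another part of the data.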