Which data structure is better for an array of std string - c++

I need a structure as follows:
The structure must hold fixed-size std::strings, and the number of its elements is finite (100 - 10000000).
I would like to be able to access each element randomly as follows:
std::string Temp = MyStrcuture[i];
or
MyStrcuture[i] = Temp;
I need the fastest structure, with no memory leaks if possible.
Which one is better for me?
std::string* MyStrcuture = new std::string[Nu_of_Elements];
std::queue< std::string> MyStrcuture(Nu_of_Elements);
std::vector< std::string> MyStrcuture(Nu_of_Elements);
boost::circular_buffer< std::string> MyStrcuture(Nu_of_Elements);
Your suggestion?

std::vector< std::string> MyStrcuture(Nu_of_Elements);
Vector is the best fit for your requirements. It supports index-based element access because the elements are stored at contiguous memory addresses, and it is flexible in size.
std::string* MyStrcuture = new std::string[Nu_of_Elements]; No
C++ STL vector vs array in the real world
std::queue< std::string> MyStrcuture(Nu_of_Elements); No
How do I get the nth item in a queue in java?
Index-based element access is not supported.
std::vector< std::string> MyStrcuture(Nu_of_Elements); Yes
Clean-up : The vector's destructor automatically invokes the destructor of each element in the vector.
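boost::circular_buffer< std::string> MyStrcuture(Nu_of_Elements); No
It does support index-based access, but it is meant for a fixed-capacity buffer that overwrites its oldest elements, which adds nothing here.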
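A minimal sketch of the vector approach, assuming the element count is known up front. The helper names make_table and swap_in are hypothetical and only illustrate the two access patterns from the question.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: builds a table of `count` empty strings.
// The vector owns the storage, so no delete[] is ever needed.
std::vector<std::string> make_table(std::size_t count) {
    return std::vector<std::string>(count);
}

// Hypothetical helper: reads the old value at index i, then overwrites it.
// Both operations use plain indexing, as in the question.
std::string swap_in(std::vector<std::string>& table, std::size_t i,
                    const std::string& value) {
    std::string previous = table[i];  // std::string Temp = MyStrcuture[i];
    table[i] = value;                 // MyStrcuture[i] = Temp;
    return previous;
}
```

operator[] does no bounds checking; table.at(i) throws std::out_of_range for a bad index, if that safety is worth the small cost.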

Well, since your strings have a fixed size, if you have no special requirements for string processing and enough free memory for a contiguous allocation, you can use std::array< char, 400 > or std::unique_ptr< char[] > instead of std::string.
With new[] you have to manage the memory the C way; consider a smart pointer.
std::queue doesn't have random access; see: Access c++ queue elements like an array
std::vector is suitable if the number of strings will change. However, clear() only calls the destructors of the elements; it does not free the memory the vector has allocated (you can check capacity() after clear()).
According to the Boost documentation, the random-access circular buffer is suitable if the number of strings has an upper limit (10 million, as you said). But it wastes memory if you actually have only a few strings, so I suggest using it with a smart pointer.
If the number of strings is fixed and unchanged from the beginning, you can have a look at the C++11 array container.
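The point about clear() can be checked with a short sketch. The function names are hypothetical. The swap with an empty temporary is one common way to actually hand the memory back; shrink_to_fit() is the C++11 alternative, but it is only a non-binding request.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns true if the vector is empty but still holds its allocation
// after clear().
bool keeps_capacity_after_clear(std::size_t count) {
    std::vector<std::string> v(count);
    v.clear();  // destroys the strings; size() becomes 0
    return v.size() == 0 && v.capacity() >= count;
}

// Swapping with an empty temporary moves the old buffer into the
// temporary, which frees it at the end of the statement.
void release_storage(std::vector<std::string>& v) {
    std::vector<std::string>().swap(v);
}
```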

If the number of elements and their length are fixed and memory is critical, you may consider a plain char array, which has minimal memory overhead and fast access. Your code will look like this:
char* MyStructure = new char[n * 401];
memset(MyStructure, 0, n * 401);
std::string Temp = &MyStructure[i * 401]; // Get value (reads up to the trailing zero)
strcpy(&MyStructure[i * 401], Temp.c_str()); // Put value (Temp must not exceed 400 bytes)
401 here is for 400 bytes of your string and 1 trailing zero.
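The same layout can be sketched without new[] by letting a std::vector<char> own the flat buffer. FixedStrings is a hypothetical wrapper, not something from the answer above. It assumes the strings contain no embedded zero bytes and silently truncates anything longer than the slot width.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical wrapper around one flat buffer of fixed-width slots.
// Each slot is `width` bytes plus one trailing zero.
class FixedStrings {
public:
    FixedStrings(std::size_t count, std::size_t width)
        : width_(width), buf_(count * (width + 1), '\0') {}

    // Copies at most `width` bytes into slot i and zero-fills the rest.
    void set(std::size_t i, const std::string& s) {
        char* slot = &buf_[i * (width_ + 1)];
        std::size_t n = s.size() < width_ ? s.size() : width_;
        std::memcpy(slot, s.data(), n);
        std::memset(slot + n, 0, width_ + 1 - n);
    }

    // Builds a std::string from the zero-terminated slot i.
    std::string get(std::size_t i) const {
        return std::string(&buf_[i * (width_ + 1)]);
    }

private:
    std::size_t width_;
    std::vector<char> buf_;
};
```

The buffer is released automatically when the object goes out of scope, so there is nothing to delete[].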

Related

How to reserve a multi-dimensional Vector without increasing the vector size?

I have N by 4 data, which I push back as follows.
vector<vector<int>> a;
for(some loop){
...
a.push_back(vector<int>{val1,val2,val3,val4});
}
N would be less than 13000. In order to prevent unnecessary reallocation, I would like to reserve 13000 by 4 spaces in advance.
After reading multiple related posts on this topic (eg How to reserve a multi-dimensional Vector?), I know the following will do the work. But I would like to do it with reserve() or any similar function if there are any, to be able to use push_back().
vector<vector<int>> a(13000,vector<int>(4));
or
vector<vector<int>> a;
a.resize(13000,vector<int>(4));
How can I just reserve memory without increasing the vector size?
If your data is guaranteed to be N x 4, you do not want to use a std::vector<std::vector<int>>, but rather something like std::vector<std::array<int, 4>>.
Why?
It's the more semantically-accurate type - std::array is designed for fixed-width contiguous sequences of data. (It also opens up the potential for more performance optimizations by the compiler, although that depends on exactly what it is that you're writing.)
Your data will be laid out contiguously in memory, rather than every one of the different vectors allocating potentially disparate heap locations.
Having said that, @pasbi's answer is correct: You can use std::vector::reserve() to allocate space for your outer vector before inserting any actual elements (both for vectors-of-vectors and for vectors-of-arrays). Also, later on, you can use the std::vector::shrink_to_fit() method if you ended up inserting far fewer elements than you had planned.
Finally, one other option is to use a gsl::multispan and pre-allocate memory for it (GSL is the C++ Core Guidelines Support Library).
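A minimal sketch of the vector-of-arrays variant combined with reserve(). The function build_rows and the values it stores are hypothetical placeholders for the real loop.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Row = std::array<int, 4>;

// Hypothetical filler: reserves room for `expected` rows, then appends
// `actual` rows with push_back.
std::vector<Row> build_rows(std::size_t expected, int actual) {
    std::vector<Row> rows;
    rows.reserve(expected);  // capacity grows, size() stays 0
    for (int i = 0; i < actual; ++i) {
        rows.push_back(Row{{i, i + 1, i + 2, i + 3}});
    }
    return rows;
}
```

As long as no more than `expected` rows are pushed, no reallocation happens inside the loop.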
You've already answered your own question.
There is a function vector::reserve which does exactly what you want.
vector<vector<int>> a;
a.reserve(N);
for(some loop){
...
a.push_back(vector<int>{val1,val2,val3,val4});
}
This will reserve memory to fit N times vector<int>. Note that the actual size of the inner vector<int> is irrelevant at this point since the data of a vector is allocated somewhere else, only a pointer and some bookkeeping is stored in the actual std::vector-class.
Note: this answer is only here for completeness in case you ever come to have a similar problem with an unknown size; keeping a std::vector<std::array<int, 4>> in your case will do perfectly fine.
To pick up on einpoklum's answer, and in case you didn't find this earlier, it is almost always a bad idea to have nested std::vectors, because of the memory layout he spoke of. Each inner vector will allocate its own chunk of data, which won't (necessarily) be contiguous with the others, which will produce cache misses.
Preferably, either:
As already said, use a std::array if you have a fixed and known number of elements per vector;
Or flatten your data structure by having a single std::vector<T> of size N x M.
// Assuming N = 13000, M = 4
std::vector<int> vec;
vec.reserve(13000 * 4);
Then you can access it like so:
// Before:
int& element = vec[nIndex][mIndex];
// After:
int& element = vec[mIndex * 13000 + nIndex]; // Still assuming N = 13000
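The flattened layout can be wrapped in a small class so the index arithmetic lives in one place. Grid is a hypothetical name. This sketch assumes row-major order (row * cols + col), which keeps the values of one row adjacent; any consistent mapping works. The elements have to exist before they are indexed, so the vector is constructed with its full size here instead of only being reserved.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical N x M grid stored row by row in a single allocation.
class Grid {
public:
    Grid(std::size_t rows, std::size_t cols)
        : cols_(cols), cells_(rows * cols) {}  // all cells start at 0

    // Maps (row, col) to the flat index row * cols + col.
    int& at(std::size_t row, std::size_t col) {
        return cells_[row * cols_ + col];
    }

    std::size_t size() const { return cells_.size(); }

private:
    std::size_t cols_;
    std::vector<int> cells_;
};
```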

What's the difference between 2d vector and map of vector?

Let's say I've declared
map< int , vector<int> > g1;
vector< vector<int> > g2;
What are the similarities and dissimilarities between these two ?
The similarity is the way you access data, it can be the same syntax:
std::cout << g1[3][2] << std::endl;
std::cout << g2[3][2] << std::endl;
The main difference is that the map of vectors doesn't have to contain all the indices. For example, you can have only 3 vectors in your map, accessed with the keys 17, 1234 and 13579:
g1[17].resize(10);
g1[1234].resize(5);
g1[13579].resize(100);
If you want the same syntax with a vector of vectors, you need at least 13580 vectors (including 13577 empty vectors) in your main vector. That wastes a lot of memory.
Moreover, in your map you can also access your vectors with negative keys (which is not possible in the vector of vectors):
g1[-10].resize(10);
Beyond this obvious difference, the storage of the data is different. The vector allocates contiguous memory, while the map is stored as a tree. Access is O(1) in the vector and O(log(n)) in the map. I invite you to read a tutorial about C++ containers to understand all the differences and how they are usually used.
They are fundamentally different. While you may be able to do both g2[0] and g1[0], the behavior is vastly different. Assume there is nothing at index 0: std::map will then default-construct a new mapped_type, in this case a vector, and return a reference to it, whereas std::vector has undefined behavior and typically either segfaults or returns garbage.
They are also completely different in terms of memory layout. While std::map is backed by a red-black tree, std::vector is contiguous in memory. So inserting into the map will always result in dynamic allocation somewhere in memory, whereas the vector would be resized in case its current capacity is exceeded. Note however, that a vector of vectors is not contiguous in memory. The first vector, which itself is contiguous in memory is made up of vectors which look roughly like this in terms of data:
struct vector
{
T* data;
size_t capacity;
size_t size;
};
Where each of the vectors owns its dynamic memory allocation at data.
The advantage of the map is that it does not have to be densely populated, i.e. you can have something at index 0 and 12902 without all the stuff in between, plus it is sorted. If you don't need the sorted property and can use C++11, consider std::unordered_map. The vector is always densely populated, i.e. at size 10000, elements 0-9999 exist.
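The difference in lookup behaviour can be sketched as follows. Both function names are hypothetical. The vector side uses at(), which throws instead of invoking undefined behaviour the way operator[] would for a missing index.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <vector>

// operator[] on a map inserts an empty vector for a missing key,
// so the map grows just by being looked at.
std::size_t map_size_after_lookup(std::map<int, std::vector<int>>& g, int key) {
    g[key];  // creates the entry if it was absent
    return g.size();
}

// at() on a vector checks the index and throws std::out_of_range
// when the element does not exist.
bool vector_has_index(const std::vector<std::vector<int>>& g, std::size_t i) {
    try {
        (void)g.at(i);
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

To query a map without inserting, use find() or count() instead of operator[].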
An example makes the difference clearer. Let's say the vector<int> stores the unique ids of people, and the map uses the respective pincode as its key.
map< int , vector<int> > listOfPeopleAtRespectivePinCode;
vector< vector<int> > bunchOfGroupsOfPeople;
Evidently, a map can associate a key with a value (here a list of values), while a vector can efficiently store a bunch of data.

Implementation defined to use a reserved vector without resizing it?

Is it implementation defined to use a reserved vector without resizing it?
By that I mean:
#include <iostream>
#include <vector>
using namespace std;
int main()
{
std::vector<unsigned int> foo;
foo.reserve(1024);
foo[0] = 10;
std::cout<<foo[0];
return 0;
}
In the above, I reserve a good amount of space and I assigned a value to one of the indices in that space. However, I did not call push_back which "resizes" the vector and gives it a default value for each element (which I'm trying to avoid). So in this case foo.size() is 0 while foo.capacity() is 1024.
So is this valid code or is it implementation defined? Seeing as I'm assigning to a vector with "0" size. It works but I'm not sure if it's a good idea..
The reason I'm trying to avoid the default value is because for large allocations, I don't need it "zero-ing" out each index as I will decide when I want to write to it or not. I'd use a raw pointer but the lodepng API accepts only a vector for decoding from file.
std::vector::reserve just reserves memory, so the next push_back does not have to allocate memory. It does not change the size of the vector.
If you want a vector with an initial size of 1024 elements, you can use the constructor to do that:
std::vector<unsigned int> foo(1024);
Note that if you create a vector with an initial size of e.g. 1024 elements, if you then do push_back you add an element, so the size of the vector increases to 1025 elements.
It is illegal, regardless of the type of item in the container or what seems to happen on a particular compiler. From 23.1.1/12 (Table 68) we learn that operator[] behaves like *(a.begin() + n). Since you haven't added any items to the container this is the same as accessing an iterator past end() which is undefined.
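The difference between the two calls can be observed directly. This is a small sketch with hypothetical function names; after_resize assumes n is greater than 0.

```cpp
#include <cstddef>
#include <vector>

struct Sizes {
    std::size_t size;
    std::size_t capacity;
};

// reserve() only allocates: no elements exist, so foo[0] would be undefined.
Sizes after_reserve(std::size_t n) {
    std::vector<unsigned int> foo;
    foo.reserve(n);
    return {foo.size(), foo.capacity()};
}

// resize() creates n zero-initialized elements, so indexing below n is valid.
// Assumes n > 0.
Sizes after_resize(std::size_t n) {
    std::vector<unsigned int> foo;
    foo.resize(n);
    foo[0] = 10;  // valid: index 0 is below size()
    return {foo.size(), foo.capacity()};
}
```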

Copying an array into a std::vector

I was searching about this topic and I found many ways to convert an array[] to an std::vector, like using:
assign(a, a + n)
or, direct in the constructor:
std::vector<unsigned char> v ( a, a + n );
Those solve my problem, but I am wondering if it is possible (and correct) to do:
myvet.resize( 10 );
memcpy( &myvet[0], buffer, 10 );
I am wondering this because I have the following code:
IDiskAccess::ERetRead nsDisks::DiskAccess::Read( std::vector< uint8_t >& bufferRead, int32_t totalToRead )
{
uint8_t* data = new uint8_t[totalToRead];
DWORD totalRead;
ReadFile( mhFile, data, totalToRead, &totalRead, NULL );
bufferRead.resize( totalRead );
bufferRead.assign( data, data + totalRead );
delete[] data;
return IDiskAccess::READ_OK;
}
And I would like to do:
IDiskAccess::ERetRead nsDisks::DiskAccess::Read( std::vector< uint8_t >& bufferRead, int32_t totalToRead )
{
bufferRead.resize( totalToRead );
DWORD totalRead;
ReadFile( mhFile, &bufferRead[0], totalToRead, &totalRead, NULL );
bufferRead.resize( totalRead );
return IDiskAccess::READ_OK;
}
(I have removed the error treatment of the ReadFile function to simplify the post).
It is working, but I am afraid that it is not safe. I believe it is ok, as the memory used by the vector is contiguous, but I've never seen anyone use vectors this way.
Is it correct to use vectors like this? Is there any other better option?
Yes, it is safe with std::vector. The C++ standard guarantees that the elements will be stored at contiguous memory locations.
C++11 Standard:
23.3.6.1 Class template vector overview [vector.overview]
A vector is a sequence container that supports random access iterators. In addition, it supports (amortized) constant time insert and erase operations at the end; insert and erase in the middle take linear time. Storage management is handled automatically, though hints can be given to improve efficiency. The elements of a vector are stored contiguously, meaning that if v is a vector where T is some type other than bool, then it obeys the identity &v[n] == &v[0] + n for all 0 <= n < v.size().
Yes, it is fine to do that. You might want to do myvet.data() instead of &myvet[0] if it looks better to you, but they both have the same effect. Also, if circumstances permit, you can use std::copy instead and have more type-safety and all those other C++ standard library goodies.
The storage that a vector uses is guaranteed to be contiguous, which makes it suitable for use as a buffer or with other functions.
Make sure that you don't modify the vector (such as calling push_back on it, etc) while you are using the pointer you get from data or &v[0] because the vector could resize its buffer on one of those operations and invalidate the pointer.
That approach is correct; it only depends on the vector having contiguous memory, which is required by the standard. I believe that in C++11 there is a new data() member function in vectors that returns a pointer to the buffer. Also note that in the case of memcpy you need to pass the size in bytes, not the number of elements in the array.
The memory in vector is guaranteed to be allocated contiguously, and unsigned char is POD, therefore it is totally safe to memcpy into it (assuming you don't copy more than you have allocated, of course).
Do your resize first, and it should work fine.
vector<int> v;
v.resize(100);
memcpy(&v[0], someArrayOfSize100, 100 * sizeof(int));
Yes, the solution using memcpy is correct; the buffer held by a vector is contiguous. But it's not quite type-safe, so prefer assign or std::copy.

std::list<char> list_type to (char * data, int length)

I have some
std::list<char> list_type
Now I have to supply the contents of the list as (char *data, int length). Is there a convenient way to present the list contents as a pointer and a length? Does <vector> have such an interface?
Thank you in advance.
You can do it with a vector, because its data is stored contiguously:
std::vector<char> vec;
char* data = &vec[0];
int length = static_cast<int>(vec.size());
For list, you have to copy the data to an array. Luckily, that too is fairly easy:
std::list<char> list;
int length = static_cast<int>(list.size());
char* data = new char[length]; // create the output array
std::copy(list.begin(), list.end(), data); // copy the contents of the list to the output array
Of course, you're then left with a dynamically allocated array you have to free again.
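One way to avoid the manual delete[] is to copy the list into a temporary std::vector and pass its buffer on. In this sketch count_spaces is a hypothetical stand-in for whatever function takes (char* data, int length); data() requires C++11.

```cpp
#include <list>
#include <vector>

// Hypothetical stand-in for a C-style API taking (char* data, int length).
int count_spaces(char* data, int length) {
    int n = 0;
    for (int i = 0; i < length; ++i) {
        if (data[i] == ' ') {
            ++n;
        }
    }
    return n;
}

// Copies the list into contiguous storage and hands that to the API.
// The vector frees its memory when the function returns.
int count_spaces_in_list(const std::list<char>& chars) {
    std::vector<char> copy(chars.begin(), chars.end());
    return count_spaces(copy.data(), static_cast<int>(copy.size()));
}
```

For an empty list the length is 0, so the callee never dereferences the pointer.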
You can do this with a vector, not with a list. A vector is guaranteed to be a contiguous chunk of memory, so you can say:
char *data = &list_type[0];
std::vector<char>::size_type length = list_type.size();
I don't know about std::list, but std::vector does:
std::vector<char> list_type;
...
foo(&list_type[0], list_type.size())
std::string can do the job too, but you probably already know it.
You cannot do this with a list, as a list saves its data in list nodes. However, you can do this with a vector, which is guaranteed to store its data in a contiguous piece of memory. You can use either &v[0] or &*v.begin() to get a pointer to its first element:
void f(std::list<char>& list)
{
std::vector<char> vec(list.begin(),list.end());
assert(!vec.empty());
c_api_function(&vec[0],vec.size());
// assuming you need the result of the call to replace the list's content
list.assign(vec.begin(),vec.end());
}
Note that the vector will automatically free its memory when the function returns.
There are (at least) two more noteworthy things:
The vector must not be empty. You are not allowed to access v[0] of an empty vector. (Neither are you allowed to dereference v.begin().)
Since dynamic allocation is involved, converting back and forth between std::list and std::vector can be a real performance killer. Consider switching to std::vector altogether.
list is a linked list data structure. There's no way you could do that (theoretically) without conversion.
You'll be able to access (C++0x Draft 23.2.6.3) the backing store of a vector with .data() in C++0x. Currently, your best bet is to treat it as an array by taking the address of the initial element.