vector and dumping - c++

From what I know, a vector is guaranteed to be contiguous, so I can write a chunk of memory to it and call send or fwrite with it. All I need to do is call .resize() to force it to the minimum length I need, and then I can use it as a normal char array? Would this code be correct?
v.resize(numOfElements);
v.clear(); //so i wont get numOfElements + len when i push back
vector<char>v2;
v2.resize(numOfElements*SizeOfType);
while(...)
{
...
v.push_back(x);
}
compress(&v2[0], len, &v[0], len);
fwrite(&v2[0], ....)
Note that I never push back or pop on v2; I only resize it once and use it as a char array. Would this be safe? And if I also dumped v, would that be safe too? (I do push back and clear on v, and I may dump it for testing.)

v.resize(numOfElements);
v.clear(); //so i wont get numOfElements + len when i push back
Well, that above code snippet is in effect allocating and creating elements, just to destroy them again. It's in effect the same as:
v.reserve(numOfElements);
Just that this code is way faster. So, v.size() == 0 in both cases and v.capacity() might be the same as numOfElements in both cases too (although this is not guaranteed). In the second case, however, the capacity is at least numOfElements, which means the internal buffer will not be reallocated until you have push_back'ed that many elements to your vector. Note that in both cases it is invalid if you try accessing any elements - because there are zero elements actually contained.
Apart from that, I haven't found a problem in your code. It's safe, and I would encourage you to use it instead of a raw new or malloc because of the added safety it provides. I'm not sure, however, what you mean by "dump v".
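As a rough sketch of the point about reserve (the helper name fill_reserved is made up for illustration and is not from the question): after reserve the vector is still empty, and pushing back up to the reserved count does not move the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: fills v with the values 0..n-1 after reserving storage once.
// Returns true if the buffer was never reallocated while pushing back.
bool fill_reserved(std::vector<int>& v, std::size_t n)
{
    assert(n > 0);
    v.clear();
    v.reserve(n);                  // capacity() >= n, size() is still 0
    v.push_back(0);
    const int* first = &v[0];      // valid: the vector now holds one element
    for (std::size_t i = 1; i < n; ++i)
        v.push_back(static_cast<int>(i));
    return first == &v[0];         // same address means no reallocation happened
}
```

Taking &v[0] only after the first push_back matters: with zero elements there is nothing to point at, which is the "invalid to access" case described above.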

Indeed, std::vector is guaranteed to be contiguous, in order to be layout-compatible with a C array. However, you must be aware that many operations of the vector invalidate all pointers pointing to its elements, so you'd better stick to one type of use: avoid mixing pointer arithmetic and method calls on the vector.
Apart from that, your code is perfectly correct, except for the first line. What you want is
v.reserve(numOfElements);
which will allocate enough space to store numOfElements elements in the vector, whereas
v.resize(numOfElements);
will do the following:
// pseudo-code
if (v.size() < numOfElements)
    insert (numOfElements - size) default-constructed elements at the end of the vector
if (v.size() > numOfElements)
    erase the last elements so that size = numOfElements
To sum up, after a reserve you are sure that the vector's capacity is greater than or equal to numOfElements, and after a resize you are sure that the vector's size is equal to numOfElements.
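A small sketch of that summary, assuming a vector of char as in the question (SizeReport and compare_reserve_and_resize are illustrative names only):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record of the sizes observed after each call.
struct SizeReport {
    std::size_t size_after_reserve;
    std::size_t size_after_resize;
    std::size_t size_after_shrink;
};

SizeReport compare_reserve_and_resize(std::size_t n)
{
    assert(n >= 2);
    SizeReport r;

    std::vector<char> a;
    a.reserve(n);                      // only the capacity changes
    r.size_after_reserve = a.size();   // still 0

    std::vector<char> b;
    b.resize(n);                       // appends n value-initialised chars ('\0')
    r.size_after_resize = b.size();    // n
    b.resize(n / 2);                   // erases the trailing elements
    r.size_after_shrink = b.size();    // n / 2
    return r;
}
```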

For something like this I would personally use a class like STLSoft's auto_buffer<>:
http://www.synesis.com.au/software/stlsoft/doc-1.9/classstlsoft_1_1auto__buffer.html
As a disclaimer: I don't use the actual STLSoft library version. I've adapted my own template that is quite similar, starting from the book "Imperfect C++" by Matthew Wilson (the STLSoft author).
I find it useful when I really just want a plain-old C array, but the size must be dynamic at runtime. auto_buffer<> is safer than a plain old array, but once you've constructed it you have no worries about how many elements are there or not - it's always whatever you constructed it with, just like an array (so it's a bit less complex than vector<> - which is appropriate at times).
The major downside to auto_buffer<> is that it's not standard and it's not in Boost, so you either have to incorporate some of STLSoft into your project or roll your own version.

Yes, you can use a vector of char as a buffer for reading raw input.
// dynamically allocates a buffer of 10,000 char as buffer
std::vector<char> data(10000);
fread(&data[0], sizeof(char),data.size(),fp);
I would not use it for reading any non-POD data type directly into a vector, though.
You could potentially use a vector as a source for a write.
But I would be very careful how you read that back in (it may be easier to serialize it).
fwrite(&data[0], sizeof(char),data.size(),fp);
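Here is a minimal sketch of a write followed by a read, under the assumption that the data is plain char. The helper names write_buffer and read_buffer are invented for this example; only fwrite and fread come from the answer above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Writes the bytes of data to fp; returns true if every byte was written.
bool write_buffer(std::FILE* fp, const std::vector<char>& data)
{
    assert(fp != nullptr);
    if (data.empty()) return true;   // &data[0] would be invalid on an empty vector
    return std::fwrite(&data[0], sizeof(char), data.size(), fp) == data.size();
}

// Reads up to count bytes from fp; the result is sized to what was actually read.
std::vector<char> read_buffer(std::FILE* fp, std::size_t count)
{
    assert(fp != nullptr);
    std::vector<char> data(count);
    std::size_t got = 0;
    if (count != 0)
        got = std::fread(&data[0], sizeof(char), count, fp);
    data.resize(got);                // drop the part of the buffer that was not filled
    return data;
}
```

Resizing to the byte count returned by fread keeps size() honest, so later code does not treat unfilled bytes as data.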

If you're replacing reserve() with resize() anyway, you may as well replace
vector<char> v2
with
vector<Type> v2
This should simplify the code a tiny bit.
To be frank, it's the oddest use of vectors I've ever seen, but it probably will work. Are you sure you don't want to go with new char[size] and some sort of auto pointer from boost?

Related

How to append raw bytes to std::vector?

I want to append raw bytes to a vector like this.
vector.reserve(current_size + append_data_size);
memcpy(vector.data() + current_size, append_data, append_data_size);
vector.resize(current_size + append_data_size); // Expected to only set the size to current_size + append_data_size.
Is the version below slower? I think the vector's new elements are initialised to default values first and then overwritten with the data, which is wasted work.
vector.resize(current_size + append_data_size);
memcpy(vector.data() + current_size, append_data, append_data_size);
Modifying vector storage beyond its size is undefined behavior, and a subsequent resize will initialize the new elements at the end of the storage.
However, you could use insert instead:
vector.insert(vector.end(), bytes, bytes + size);
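A minimal sketch of the insert approach wrapped in a function (append_bytes is a made-up name, and unsigned char is an assumed element type):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: appends size raw bytes starting at bytes to buffer.
void append_bytes(std::vector<unsigned char>& buffer,
                  const unsigned char* bytes, std::size_t size)
{
    assert(bytes != nullptr || size == 0);
    if (size == 0) return;
    // insert grows the vector and copies the range in one step,
    // so no element is initialised first and overwritten afterwards.
    buffer.insert(buffer.end(), bytes, bytes + size);
}
```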
Even if you call reserve, you still must call resize on the vector if you want to access the new elements, otherwise the behaviour of your code is undefined. What reserve can do is make push_back and other such operations more efficient.
Personally I wouldn't concern myself with any such optimisations unless you can prove they have an effect with an appropriate profiling tool. More often than not, fiddling with the capacity of a std::vector is pointless.
Also, using memcpy is hazardous. Copy constructors will not be called, for example, and knowing exactly how memcpy treats padding in structures is a sure way of increasing your reputation on this site! Use insert instead and trust the compiler to optimise as appropriate.
Without an explicit additional parameter, std::vector::resize value-initialises any additional elements. Informally, that means the new elements of a std::vector of T are set to values in the same way as the t in static T t; would be.

Faster alternative to push_back(size is known)

I have a float vector. As I process certain data, I push it back. I always know what the size will be when I declare the vector.
For the largest case, it is 172,490,752 floats. This takes about eleven seconds just to push_back everything.
Is there a faster alternative, like a different data structure or something?
If you know the final size, then reserve() that size after you declare the vector. That way it only has to allocate memory once.
Also, you may experiment with using emplace_back() although I doubt it will make any difference for a vector of float. But try it and benchmark it (with an optimized build of course - you are using an optimized build - right?).
The usual way of speeding up a vector when you know the size beforehand is to call reserve on it before using push_back. This eliminates the overhead of reallocating memory and copying the data every time the previous capacity is filled.
Sometimes for very demanding applications this won't be enough. Even though push_back won't reallocate, it still needs to check the capacity every time. There's no way to know how bad this is without benchmarking, since modern processors are amazingly efficient when a branch is always/never taken.
You could try resize instead of reserve and use array indexing, but the resize forces a default initialization of every element; this is a waste if you know you're going to set a new value into every element anyway.
An alternative would be to use std::unique_ptr<float[]> and allocate the storage yourself.
Consider ::boost::container::stable_vector. Notice that allocating a contiguous block of 172 * 4 MB might easily fail and requires quite a lot of page juggling. A stable vector does not keep its elements in one contiguous block; it allocates them in separate nodes. You may also want to populate it in parallel.
You could use a custom allocator which avoids default initialisation of all elements, as discussed in this answer, in conjunction with ordinary element access:
const size_t N = 172490752;
std::vector<float, uninitialised_allocator<float> > vec(N);
for(size_t i=0; i!=N; ++i)
vec[i] = the_value_for(i);
This avoids (i) default initializing all elements, (ii) checking for capacity at every push, and (iii) reallocation, but at the same time preserves all the convenience of using std::vector (rather than std::unique_ptr<float[]>). However, the allocator template parameter is unusual, so you will need to use generic code rather than std::vector-specific code.
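The linked answer is not reproduced here, so the following is only a sketch of what such an allocator might look like. It assumes the common adaptor technique of overriding construct so that elements are default-initialised rather than value-initialised. The name uninitialised_allocator is taken from the snippet above, while make_ramp is invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Assumed implementation: an adaptor over std::allocator whose no-argument
// construct() default-initialises, which leaves a float uninitialised.
template <typename T, typename A = std::allocator<T>>
class uninitialised_allocator : public A
{
    using traits = std::allocator_traits<A>;
public:
    template <typename U>
    struct rebind {
        using other = uninitialised_allocator<U, typename traits::template rebind_alloc<U>>;
    };

    using A::A;  // inherit the base allocator's constructors

    // Used by vec(N): default-initialisation does nothing for float.
    template <typename U>
    void construct(U* p) noexcept(std::is_nothrow_default_constructible<U>::value)
    {
        ::new (static_cast<void*>(p)) U;
    }

    // Every other construction is forwarded to the base allocator.
    template <typename U, typename... Args>
    void construct(U* p, Args&&... args)
    {
        traits::construct(static_cast<A&>(*this), p, std::forward<Args>(args)...);
    }
};

// Hypothetical usage: every element is written before it is ever read.
std::vector<float, uninitialised_allocator<float>> make_ramp(std::size_t n)
{
    std::vector<float, uninitialised_allocator<float>> vec(n);  // no zero-fill pass
    assert(vec.size() == n);
    for (std::size_t i = 0; i != n; ++i)
        vec[i] = static_cast<float>(i);
    return vec;
}
```

Reading an element before writing it would be undefined behaviour with this allocator, so it only makes sense when every element is assigned right after construction.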
I have two answers for you:
As previous answers have pointed out, using reserve to allocate the storage beforehand can be quite helpful, but:
push_back (or emplace_back) themselves have a performance penalty because during every call, they have to check whether the vector has to be reallocated. If you know the number of elements you will insert already, you can avoid this penalty by directly setting the elements using the access operator []
So the most efficient way I would recommend is:
Initialize the vector with the 'fill'-constructor:
std::vector<float> values(172490752, 0.0f);
Set the entries directly using the access operator:
values[i] = some_float;
++i;
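The two steps above can be put together roughly like this. The function name make_values and the doubling formula are placeholders for whatever really produces each float.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: build a vector of n floats without any push_back.
std::vector<float> make_values(std::size_t n)
{
    std::vector<float> values(n, 0.0f);   // one allocation, every element set to 0
    assert(values.size() == n);
    for (std::size_t i = 0; i != n; ++i)
        values[i] = static_cast<float>(i) * 2.0f;   // stand-in for the real value
    return values;
}
```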
The reason push_back is slow is that it will need to copy all the data several times as the vector grows, and even when it doesn’t need to copy data it needs to check. Vectors grow quickly enough that this doesn’t happen often, but it still does happen. A rough rule of thumb is that every element will need to be copied on average once or twice; the earlier elements will need to be copied a lot more, but almost half the elements won’t need to be copied at all.
You can avoid the copying, but not the checks, by calling reserve on the vector when you create it, ensuring it has enough space. You can avoid both the copying and the checks by creating it with the right size from the beginning, by giving the number of elements to the vector constructor, and then inserting using indexing as Tobias suggested; unfortunately, this also goes through the vector an extra time initializing everything.
If you know the number of floats at compile time and not just runtime, you could use an std::array, which avoids all these problems. If you only know the number at runtime, I would second Mark’s suggestion to go with std::unique_ptr<float[]>. You would create it with
size_t size = /* Number of floats */;
auto floats = std::unique_ptr<float[]>{new float[size]};
You don’t need to do anything special to delete this; when it goes out of scope it will free the memory. In most respects you can use it like a vector, but it won’t automatically resize.
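As a minimal sketch of that suggestion (make_floats and the fill formula are made up for illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical helper: allocates n floats without initialising them, then fills them.
std::unique_ptr<float[]> make_floats(std::size_t n)
{
    std::unique_ptr<float[]> floats(new float[n]);   // elements are uninitialised here
    for (std::size_t i = 0; i != n; ++i)
        floats[i] = 0.5f * static_cast<float>(i);    // stand-in for the real value
    return floats;                                    // ownership moves to the caller
}
```

Unlike a vector, the unique_ptr does not remember how many elements it owns, so the caller has to keep n around separately.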

Dynamic and static array

I am studying C++ by reading Stroustrup's book, which in my opinion is not very clear on this topic (arrays). From what I have understood, C++ has (like Delphi) two kinds of arrays:
Static arrays that are declared like
int test[3] = {10,487,-22};
Dynamic arrays that are called vectors
std::vector<int> a;
a.push_back(10);
a.push_back(487);
a.push_back(-22);
I have already seen answers about this (with tons of lines and concepts in them), but they didn't clarify the concept for me.
From what I have understood, vectors consume more memory, but they can change their size (dynamically, in fact). Arrays instead have a fixed size that is given at compile time.
In the chapter, Stroustrup says that vectors are safe while arrays aren't, without explaining the reason. I trust him, but why? Is the safety related to the location of the memory (heap/stack)?
I would like to know why vectors are considered safe and why I should be using them.
The reason arrays are unsafe is because of memory leaks.
If you declare a dynamic array
int * arr = new int[size];
and you don't do delete [] arr, then the memory remains uncleared and this is known as a memory leak. It should be noted, ANY time you use the word new in C++, there must be a delete somewhere in there to free that memory. If you use malloc(), then free() should be used.
http://ptolemy.eecs.berkeley.edu/ptolemyclassic/almagest/docs/prog/html/ptlang.doc7.html
It is also very easy to go out of bounds in an array, for example by writing to an index larger than its size - 1. With a vector, you can push_back() as many elements as you want and the vector will resize automatically. If you have an array of size 15 and you try to say arr[18] = x, then you may get a segmentation fault. The program will compile, but may crash when it reaches a statement that goes outside the array bounds.
In general when you have large code, arrays are used infrequently. Vectors are objectively superior in almost every way, and so using arrays becomes sort of pointless.
EDIT: As Paul McKenzie pointed out in the comments, going out of array bounds does not guarantee a segmentation fault. It is undefined behavior, and what actually happens depends on the compiler and platform.
Let us take the case of reading numbers from a file.
We don't know how many numbers are in the file.
To declare an array to hold the numbers, we need to know the capacity or quantity, which is unknown. We could pick a number like 64. If the file has more than 64 numbers, we start overwriting the array. If the file has fewer than 64 (like 16), we are wasting memory (by not using 48 slots). What we need is to dynamically adjust the size of the container (array).
To dynamically adjust the capacity of an array, a new larger array must be created, then elements copied and the old array deleted.
The std::vector will adjust its capacity as necessary. It handles the dynamic allocation of memory for you.
Another aspect is passing the container to a function. With an array, you need to pass both the array and its size. With std::vector, you only need to pass the vector. The vector object can be queried for its size.
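To illustrate the difference in function signatures, here is a small sketch (sum_array and sum_vector are hypothetical names):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// With a raw array the element count has to travel as a separate argument.
long sum_array(const int* values, std::size_t count)
{
    assert(values != nullptr || count == 0);
    long total = 0;
    for (std::size_t i = 0; i < count; ++i)
        total += values[i];
    return total;
}

// A vector knows its own size, so one argument is enough.
long sum_vector(const std::vector<int>& values)
{
    long total = 0;
    for (std::size_t i = 0; i < values.size(); ++i)
        total += values[i];
    return total;
}
```

If the caller passes the wrong count to sum_array, nothing catches it; sum_vector has no such second value to get wrong.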
One safety benefit I can see is that you can't access something in a vector which is not there.
What I meant by that is: if you push_back only 4 elements and you try to access index 7, then it will report an error. With an array, that doesn't happen.
In short, it stops you from accessing corrupt data.
Edit:
The programmer has to compare the index with vector.size() to raise an error. It doesn't happen automatically with operator[]; one has to do it oneself.
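For completeness, std::vector also has the member function at(), which does this comparison for you and throws std::out_of_range, while operator[] stays unchecked. A minimal sketch (at_throws is a made-up helper name):

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: returns true if reading v.at(index) throws std::out_of_range.
bool at_throws(const std::vector<int>& v, std::size_t index)
{
    try {
        (void)v.at(index);       // bounds-checked access
        return false;
    } catch (const std::out_of_range&) {
        return true;             // index was >= v.size()
    }
}
```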

Dynamic memory allocation, C++

I need to write a function that can read a file, and add all of the unique words to a dynamically allocated array. I know how to create a dynamically allocated array if, for instance, you are asking for the number of entries in the array:
int value;
cin >> value;
int *number;
number = new int[value];
My problem is that I don't know ahead of time how many unique words are going to be in the file, so I can't initially just read the value or ask for it. Also, I need to make this work with arrays, and not vectors. Is there a way to do something similar to a push_back using a dynamically allocated array?
Right now, the only thing I can come up with is first to create an array that stores ALL of the words in the file (1000), then have it pass through it and find the number of unique words. Then use that value to create a dynamically allocated array which I would then pass through again to store all the unique words. Obviously, that solution sounds pretty overboard for something that should have a more effective solution.
Can someone point me in the right direction, as to whether or not there is a better way? I feel like this would be rather easy to do with vectors, so I think it's kind of silly to require it to be an array (unless there's some important thing that I need to learn about dynamically allocated arrays in this homework assignment).
EDIT: Here's another question. I know there are going to be 1000 words in the file, but I don't know how many unique words there will be. Here's an idea: I could create a 1000-element array and write all of the unique words into that array while keeping track of how many I've stored. Once I've finished, I could dynamically allocate a new array with that count, and then just copy the words from the initial array to the second. Not sure if that's the most efficient, but with us not being able to use vectors, I don't think efficiency is a huge concern in this assignment.
A vector really is a better fit for this than an array. Really.
But if you must use an array, you can at least make it behave like a vector :-).
Here's how: allocate the array with some capacity. Store the allocated capacity in a "capacity" variable. Each time you add to the array, increment a separate "length" variable. When you go to add something to the array and discover it's not big enough (length == capacity), allocate a second, longer array, then copy the original's contents to the new one, then finally deallocate the original.
This gives you the effect of being able to grow the array. If performance becomes a concern, grow it by more than one element at a time.
Congrats, after following these easy steps you have implemented a small subset of std::vector functionality atop an array!
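The steps above could be sketched like this for an array of words. All names (WordArray, push_word, contains, destroy) and the starting capacity of 4 are assumptions made for the example, and the capacity is doubled rather than grown by one element.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Minimal growable array of strings.
struct WordArray {
    std::string* items = nullptr;
    std::size_t length = 0;     // number of words stored
    std::size_t capacity = 0;   // number of slots allocated
};

// Appends word, growing the array first when it is full.
void push_word(WordArray& a, const std::string& word)
{
    if (a.length == a.capacity) {
        std::size_t new_capacity = (a.capacity == 0) ? 4 : a.capacity * 2;
        std::string* bigger = new std::string[new_capacity];
        for (std::size_t i = 0; i < a.length; ++i)
            bigger[i] = a.items[i];          // copy the original's contents
        delete[] a.items;                    // deallocate the original
        a.items = bigger;
        a.capacity = new_capacity;
    }
    assert(a.length < a.capacity);
    a.items[a.length++] = word;
}

// Linear search, used to keep only unique words.
bool contains(const WordArray& a, const std::string& word)
{
    for (std::size_t i = 0; i < a.length; ++i)
        if (a.items[i] == word) return true;
    return false;
}

// Frees the storage and resets the array to empty.
void destroy(WordArray& a)
{
    delete[] a.items;
    a = WordArray();
}
```

For the homework in the question, the reading loop would call contains first and push_word only when the word is new.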
As you have rightly pointed out, this is trivial with a vector.
However, given that you are limited to using an array, you will likely need to do one of the following:
Initialize the array with a suitably large size and live with poor memory utilization
Write your own code to dynamically increase the size of the array at run time (basically the internals of a Vector)
If you were permitted to do so, some sort of hash map or linked list would also be a good solution.
If I had to use an array, I'd just allocate one with some initial size, then keep doubling that size when I fill it to accommodate any new values that won't fit in an array with the previous sizes.
Since this question is about C++, memory allocation would be done with the new keyword. It would be nice if one could use the realloc() function, which resizes the memory and retains the values in the previously allocated memory. That way one wouldn't need to copy the values from the old array to the new array. However, realloc() must only be used on memory obtained from malloc(), calloc(), or realloc(); it cannot be used on memory allocated with new.
You can "resize" array like this (N is size of currentArray, T is type of its elements):
// create new array
T *newArray = new T[N * 2];
// Copy the data
for ( int i = 0; i < N; i++ )
newArray[i] = currentArray[i];
// Change the size to match
N *= 2;
// Destroy the old array
delete [] currentArray;
// set currentArray to newArray
currentArray = newArray;
Using this solution you have to copy the data. There might be a solution that does not require it.
But I think it would be more convenient for you to use std::vectors. You can just push_back into them and they will resize automatically for you.
You can cheat a bit:
use std::set to get all the unique words then copy the set into a dynamically allocated array (or preferably vector).
#include <algorithm>
#include <iostream>
#include <iterator>
#include <set>
#include <string>

int main()
{
    // Copy into a set; this makes sure the words are all unique.
    std::set<std::string> data;
    std::copy(std::istream_iterator<std::string>(std::cin),
              std::istream_iterator<std::string>(),
              std::inserter(data, data.end()));

    // Copy the data into your array (or vector).
    std::string* words = new std::string[data.size()];
    std::copy(data.begin(), data.end(), words);

    delete[] words; // release the array once you are done with it
}
This could be going a bit overboard, but you could implement a linked list in C++... it would actually allow you to use a vector-like implementation without actually using vectors (which are actually the best solution).
The implementation is fairly easy: each node holds pointers to the next and previous nodes, and you store the "head" node somewhere you can easily access. Looping through the list then lets you check which words are already in it and which are not. You could even implement a counter and count the number of times a word is repeated throughout the text.

C++: is it safe to work with std::vectors as if they were arrays?

I need a fixed-size array of elements, and I need to call functions on them that must know how they are laid out in memory, in particular:
functions like glVertexPointer, which need to know where the vertices are, how far apart they are from one another, and so on. In my case the vertices would be members of the elements to store.
to get the index of an element within this array, I'd prefer to avoid having an index field within my elements, and would rather use pointer arithmetic (i.e. the index of Element *x will be x - &array[0]). By the way, this sounds dirty to me: is it good practice or should I do something else?
Is it safe to use std::vector for this?
Something makes me think that an std::array would be more appropriate but:
Constructor and destructor for my structure will be rarely called: I don't mind about such overhead.
I'm going to set the std::vector capacity to the size I need (the size I would use for an std::array), so it won't incur any overhead due to sporadic reallocation.
I don't mind a little space overhead for std::vector's internal structure.
I could use the ability to resize the vector (or better: to have a size chosen during setup), and I think there's no way to do this with std::array, since its size is a template parameter (that's too bad: I could do that even with an old C-like array, just dynamically allocating it on the heap).
If std::vector is fine for my purpose I'd like to know into details if it will have some runtime overhead with respect to std::array (or to a plain C array):
I know that it'll call the default constructor for any element once I increase its size (but I guess this won't cost anything if my data has got an empty default constructor?), same for destructor. Anything else?
Vectors are guaranteed to have all elements in contiguous memory, so it is safe to use one in your scenario. There can be a small performance hit compared to C-style arrays, for instance due to index validations done by some vector implementations. In most cases, the performance is determined by something else though, so I wouldn't worry about that until actual measurements of performance show that this is a real problem.
As pointed out by others, make sure that you don't reallocate the vector while you are making use of the data stored in it if you are using pointers to elements (or iterators) to access it.
It's fine to treat the data in a std::vector as an array; get a pointer to the start of it with &v[0]. Obviously, if you do anything that can reallocate the data, then your pointers will probably be invalidated.
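As a sketch of the pointer-arithmetic index from the question, assuming a simple Element struct (both Element and index_of are made up for illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Element { float x, y, z; };   // hypothetical element type

// Index of *e within v, computed by pointer arithmetic on the contiguous storage.
// e must point at an element that currently lives inside v.
std::size_t index_of(const std::vector<Element>& v, const Element* e)
{
    assert(!v.empty() && e >= &v[0] && e < &v[0] + v.size());
    return static_cast<std::size_t>(e - &v[0]);
}
```

The result is only meaningful while the vector has not reallocated since the pointer was taken, which is the invalidation caveat mentioned above.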
Yep, You can use it as Array in OpenGL :) Example:
glBufferData( GL_ARRAY_BUFFER_ARB, dataVec.size() * sizeof( dataVec[0] ), &dataVec[0], GL_STATIC_DRAW_ARB );
where dataVec is a std::vector.
It is even safer than having an array on the stack: how big really is your stack? how big could your array become (fixed size, but the size could be increased in later versions)?
If you really want a std::array you can use boost::array. It is like a common array, but support iterators and you can easily use it with STL algorithms.
Working in a multithreaded environment with dynamic memory allocation might cause problems, because a vector is usually one contiguous chunk of memory, while an array of pointers might not be!