Avoiding copy when writing array to vector

Avoiding copy when writing array to vector - c++

But if you want to prevent the 5000 copy (or more) of some potentially ugly heavy objects. Just for performance and ignore readable code you can do this
#include <iostream>
#include <vector>
int main(int argc, char* argv[])
{
std::vector<int> v;
int* a = (int*) malloc(10*sizeof(int));
//int a[10];
for( int i = 0; i<10;++i )
{
*(a + i) = i;
}
delete v._Myfirst; // ? not sure
v._Myfirst = a;
for( int i = 0; i<10;++i )
{
std::cout << v[i] << std::endl;
}
system("PAUSE");
return 0;
}
simply replace the _Myfirst underlying "array". but be very careful, this will be deleted by the vector.
Does my delete/free of my first fail in some cases?
Does this depend on the allocator?
Is there a swap for that? There is the brand new swap for vectors, but how about from an array?
I had to interface with a code I cannot modify (political reasons), which provides me with arrays of objects and have to make them vector of objects.
The pointer seems the right solution, but it's a lot of memory jumps and I have to perform the loop anyway. I was just wondering if there was some way to avoid the copy and tell the vector, "now this is your underlying array".
I agree that the above is highly ugly and this is why I actually reserve and push in my real code. I just wanted to know if there was a way to avoid the loop.
Please don't answer with a loop and a push back, as obviously that's what my "real" code is already doing. If there is no function to perform this swap can we implement a safe one?

Question 1: does my delete/free of my first fail in some cases ?
Yes. You do not know how the memory for _Myfirst is allocated (apart from it uses the allocator). The standard does not specify that that the default allocator should use malloc to allocate the memory so you do not know if delete will work.
Also you are mixing allocation schemes.
You are allocating with malloc(). But expecting the std::vector<> to allocate with new (because you are calling delete).
Also even if the allocator did use new it would be uisng the array version of new and thus you would need to use the array version of delete to reclaim the memory.
Question 2: does this depends on allocator ?
Yes.
Question 3: is there a swap for that ? there is the brand new swap for vectors, but from an array ?
No.
To prevent expensive copies. Use emplace_back().
std::vector<int> v;
v.reserve(10); // reserve space for 10 objects (min).
for( int i = 0; i<10;++i )
{
v.emplace_back(i); // construct in place in the array (no copy).
}

std::vector<SomeUglyHeavyObject*> vect(5000, NULL);
for (int i=0; i<5000; i++)
{
vect[i] = &arr[i];
}
There, avoids copying and doesn't muck with std::vector internals.

If you don't like the way the vector behaves, don't use a vector. Your code depends on implementation details that you MUST not rely upon.
Yes it's ugly as you said. Make it pretty by using the correct container.
Which container you use depends on what operations you need on the container. It may be something as simple as a C-style array will suit your needs. It may be a vector of pointers, or shared pointers if you want the vector to manage the lifetime of the HeavyObjects.
In response to the need to increase the size of collection (see comments below):
std::vector guarantees that the underlying storage is contiguous, so if you increase the size, it will allocate a new larger buffer and copy the old one into the new (using the copy constructor for the objects if one exists.) Then swap the new buffer into place -- freeing the old one using its own allocator.
This will trash your memory unless the buffer you brute forced into the vector was allocated in a manner consistent with the vector's allocator.
So if your original array is big enough, all you need is a "maxUsed" size_t to tell you how big the collection is and where to put new entries. If it's not big enough, then your technique will either fail horribly, or incur the copy costs you are trying to avoid.

Related

how to transfer ownership from a c++ std::vector<char> to a char *pointer

I have met a situation that, i got a std::vector<char> data output from a third-party lib, it is a very big length data, and then I need to convert it a pybind11::array object, I don't want to allocate memory and do memcpy ,that's not efficient at all.
Now I know I can got the std::vector<char> buffer address, but I do not know how to release the vector's ownership so the buffer would not release when the vector object is destructed, I wonder if there's a way to achieve this,.
I have wrote a test code below to test, but it failed
#include<vector>
#include<iostream>
int *got_vec(int len){// the len in actually scene is decided by the thirdparty lib
std::vector<int> vec;
for(int i =0;i<len;i++){
vec.push_back(i);
}
int *p_vec = &vec[0];
std::move(vec);
return p_vec;
}
int main(int argc,const char **argv){
int len=atoi(argv[1]);
std::cout<<"will go allocate memory size:"<<len<<", before allocation"<<std::endl;
int *p_vec = got_vec(len);
std::cout<<"after allocation, go to print value"<<std::endl;
for(int i = 0; i < len;i++)
std::cout<<p_vec[i]<<",";
std::cout<<std::endl;
delete p_vec;
std::cout<<"deleted"<<std::endl;
}
The program crashed at std::cout<<p_vec[i]<<",";

A std::vector does not allow detaching the underlying buffer. You can also have a look at taking over memory from std::vector for a similar question.

std::vector doesn't provide way to transfer its buffer.
You have to do some copy (or not use std::vector for original buffer).

You can of course, because this is what the std::vector class does for move operation. But you cannot in a portable way (*). You must find in the internals of the std::vector class of your implementation, how the buffer is stored (what attribute is used). Ideally you should search how what the move constructor does with the source vector and do the same. It is likely to just set the internal pointer to NULL, but you should control that.
Simply as you would be using unspecified internals, it would only be guaranteed to work on that version of that compiler.
On a philosophical point of view, we as the programmers are expected to use the classes of the standard library as they are. There are few extension points, and few classes allow to be derived. It is just the opposite of Java which provides abstract classes to help the programmer to build its own custom classes.

You can't directly 'steal' memory from an std::vector, but maybe you don't need to steal it in the 1st place.
I'm not familiar with pybind11::array, but since you want the data pointer from std::vector, I'm guessing you can construct it from data that was allocated somewhere else.
Maybe all you need is a wrapper class, that would keep your data inside a std::vector, while providing a view into it through pybind11::array.
class Wrapper {
public:
Wrapper(std::vector<char>);
pybind11::array asArray();
private:
std::vector<char> m_data
}
You can then use std::move to efficiently transfer data between std::vectors, without copying it around.

Use of pointer to vector which involved the use of 'new'

I would like to create a vector of pointers to struct
vector<myStruct*> vec
For elements in the vector, not all of them contain data. Some of them may point to NULL.
So, should I create space by new in each of the element first
for(int i = 0; vec.size() ;i++){
if (thisSpaceIsValid(i))
vec.at(i) = new myStruct;
else
vect.at(i) = NULL;
}
The problem comes:
-If I use new for each element, it would be very slow. How can I speed it up a bit? Is there a way the create all the spaces that I need , that automatically access the pointer of such space to the vector(vec here)?
-If later I use delete to free the memory, would the problem of speed still bother me?

If I use "new" for each element, it would be very slow. How can I speed it up a bit? Is there a way the create all the spaces that I need , that automatically access the pointer of such space to the vector("vec" here)?
You can do that.
Let's say the size of your vector is M and you only need N of those elements to have pointers to objects and other elements are null pointers. You can use:
myStruct* objects = new myStruct[N];
and then, use:
for(int i = 0, j = 0; vec.size(); i++)
{
if (thisSpaceIsValid(i))
{
if ( j == N )
{
// Error. Do something.
}
else
{
vec[i] = objects+j;
++j;
}
}
else
{
vect[i] = NULL;
}
}
You have to now make sure that you are able to keep track of the value of objeccts so you can safely deallocate the memory by using
delete [] objects;
PS
There might be a better and more elegant solution to your problem. It will be worth your while to spend a bit more time thinking over that.

EDIT:
After reading the question again, it seems I misunderstood the question. So here is an edited answer.
If you only need to execute the code during some kind of initialization phase, you can create all the instances of myStruct in an array and then just point to those from the vector as already proposed by R Sahu. Note that the solution requires you to create and delete all instances at the same time.
However, if you execute this code several times and/or don't know exactly how many myStruct instances you will need, you could overwrite new and delete for the struct and handle memory allocation yourself.
See Callling object constructor/destructor with a custom allocator for an example of this. See the answer by Jerry Coffin.
BTW - you don't need vec.at(i) as you are iterating from 0 to size. vec[i] is okay and should perform a better.
OLD ANSWER:
You can do
vector<myStruct*> vec(10000, nullptr);
to generate a vector with for instance 10000 elements all initialized to nullptr
After that you can fill the relevant elements with pointer to the struct.
For delete just
for (auto e : vec) delete e;
cause it is safe to do deleteon a nullptr

If you need a vector of pointers, and would like to avoid calling new, then firstly create a container of structs themselves, then assign pointers to the elements into your vec. Be careful with choosing the container of structs. If you use vector of structs, make sure to reserve all elements in advance, otherwise its elements may move to a different memory location when vector grows. Deque on the other hand guarantees its elements don't move.
Multiple small new and delete calls should be avoided if possible in c++ when performance matters a lot.

The more I think about it, the less I like #RSahu's solution. In particular, I feel memory management in this scenario would be a nightmare. Instead I suggest using a vector of unique_ptr's owning memory allocated via custom alloctor. I believe, sequential allocator would do.

Pointer to an array get size C++

int * a;
a = new int[10];
cout << sizeof(a)/sizeof(int);
if i would use a normal array the answer would be 10,
alas, the lucky number printed was 1, because sizeof(int) is 4 and iszeof(*int) is 4 too. How do i owercome this? In my case keeping size in memory is a complicated option. How do i get size using code?
My best guess would be to iterate through an array and search for it's end, and the end is 0, right? Any suggestions?
--edit
well, what i fear about vectors is that it will reallocate while pushing back, well you got the point, i can jus allocate the memory. Hoever i cant change the stucture, the whole code is releevant. Thanks for the answers, i see there's no way around, so ill just look for a way to store the size in memory.
what i asked whas not what kind of structure to use.

Simple.
Use std::vector<int> Or std::array<int, N> (where N is a compile-time constant).
If you know the size of your array at compile time, and it doens't need to grow at runtime, then use std::array. Else use std::vector.
These are called sequence-container classes which define a member function called size() which returns the number of elements in the container. You can use that whenever you need to know the size. :-)
Read the documentation:
std::array with example
std::vector with example
When you use std::vector, you should consider using reserve() if you've some vague idea of the number of elements the container is going to hold. That will give you performance benefit.
If you worry about performance of std::vector vs raw-arrays, then read the accepted answer here:
Is std::vector so much slower than plain arrays?
It explains why the code in the question is slow, which has nothing to do with std::vector itself, rather its incorrect usage.
If you cannot use either of them, and are forced to use int*, then I would suggest these two alternatives. Choose whatever suits your need.
struct array
{
int *elements; //elements
size_t size; //number of elements
};
That is self-explanatory.
The second one is this: allocate memory for one more element and store the size in the first element as:
int N = howManyElements();
int *array = int new[N+1]; //allocate memory for size storage also!
array[0] = N; //store N in the first element!
//your code : iterate i=1 to i<=N
//must delete it once done
delete []array;

sizeof(a) is going to be the size of the pointer, not the size of the allocated array.
There is no way to get the size of the array after you've allocated it. The sizeof operator has to be able to be evaluated at compile time.
How would the compiler know how big the array was in this function?
void foo(int size)
{
int * a;
a = new int[size];
cout << sizeof(a)/sizeof(int);
delete[] a;
}
It couldn't. So it's not possible for the sizeof operator to return the size of an allocated array. And, in fact, there is no reliable way to get the size of an array you've allocated with new. Let me repeat this there is no reliable way to get the size of an array you've allocated with new. You have to store the size someplace.
Luckily, this problem has already been solved for you, and it's guaranteed to be there in any implementation of C++. If you want a nice array that stores the size along with the array, use ::std::vector. Particularly if you're using new to allocate your array.
#include <vector>
void foo(int size)
{
::std::vector<int> a(size);
cout << a.size();
}
There you go. Notice how you no longer have to remember to delete it. As a further note, using ::std::vector in this way has no performance penalty over using new in the way you were using it.

If you are unable to use std::vector and std::array as you have stated, than your only remaning option is to keep track of the size of the array yourself.
I still suspect that your reasons for avoiding std::vector are misguided. Even for performance monitoring software, intelligent uses of vector are reasonable. If you are concerned about resizing you can preallocate the vector to be reasonably large.

C++ Allocate Memory Without Activating Constructors

I'm reading in values from a file which I will store in memory as I read them in. I've read on here that the correct way to handle memory location in C++ is to always use new/delete, but if I do:
DataType* foo = new DataType[sizeof(DataType) * numDataTypes];
Then that's going to call the default constructor for each instance created, and I don't want that. I was going to do this:
DataType* foo;
char* tempBuffer=new char[sizeof(DataType) * numDataTypes];
foo=(DataType*) tempBuffer;
But I figured that would be something poo-poo'd for some kind of type-unsafeness. So what should I do?
And in researching for this question now I've seen that some people are saying arrays are bad and vectors are good. I was trying to use arrays more because I thought I was being a bad boy by filling my programs with (what I thought were) slower vectors. What should I be using???

Use vectors!!! Since you know the number of elements, make sure that you reserve the memory first (by calling myVector.reserve(numObjects) before you then insert the elements.).
By doing this, you will not call the default constructors of your class.
So use
std::vector<DataType> myVector; // does not reserve anything
...
myVector.reserve(numObjects); // tells vector to reserve memory

You can use ::operator new to allocate an arbitrarily sized hunk of memory.
DataType* foo = static_cast<DataType*>(::operator new(sizeof(DataType) * numDataTypes));
The main advantage of using ::operator new over malloc here is that it throws on failure and will integrate with any new_handlers etc. You'll need to clean up the memory with ::operator delete
::operator delete(foo);
Regular new Something will of course invoke the constructor, that's the point of new after all.
It is one thing to avoid extra constructions (e.g. default constructor) or to defer them for performance reasons, it is another to skip any constructor altogether. I get the impression you have code like
DataType dt;
read(fd, &dt, sizeof(dt));
If you're doing that, you're already throwing type safety out the window anyway.
Why are you trying to accomplish by not invoking the constructor?

You can allocate memory with new char[], call the constructor you want for each element in the array, and then everything will be type-safe. Read What are uses of the C++ construct "placement new"?
That's how std::vector works underneath, since it allocates a little extra memory for efficiency, but doesn't construct any objects in the extra memory until they're actually needed.

You should be using a vector. It will allow you to construct its contents one-by-one (via push_back or the like), which sounds like what you're wanting to do.

I think you shouldn't care about efficiency using vector if you will not insert new elements anywhere but at the end of the vector (since elements of vector are stored in a contiguous memory block).

vector<DataType> dataTypeVec(numDataTypes);
And as you've been told, your first line there contains a bug (no need to multiply by sizeof).

Building on what others have said, if you ran this program while piping in a text file of integers that would fill the data field of the below class, like:
./allocate < ints.txt
Then you can do:
#include <vector>
#include <iostream>
using namespace std;
class MyDataType {
public:
int dataField;
};
int main() {
const int TO_RESERVE = 10;
vector<MyDataType> everything;
everything.reserve( TO_RESERVE );
MyDataType temp;
while( cin >> temp.dataField ) {
everything.push_back( temp );
}
for( unsigned i = 0; i < everything.size(); i++ ) {
cout << everything[i].dataField;
if( i < everything.size() - 1 ) {
cout << ", ";
}
}
}
Which, for me with a list of 4 integers, gives:
5, 6, 2, 6

How to demonstrate memory error using arrays in C++

I'm trying to think of a method demonstrating a kind of memory error using Arrays and C++, that is hard to detect. The purpose is to motivate the usage of STL vector<> in combination with iterators.
Edit: The accepted answer is the answer i used to explain the advantages / disadvantages. I also used: this

Improperly pairing new/delete and new[]/delete[].
For example, using:
int *array = new int[5];
delete array;
instead of:
int *array = new int[5];
delete [] array;
And while the c++ standard doesn't allow for it, some compilers support stack allocating an array:
int stack_allocated_buffer[size_at_runtime];
This could be the unintended side effect of scoping rules (e.g constant shadowed by a member variable)... and it works until someone passes 'size_at_runtime' too large and blows out the stack. Then lame errors ensue.

A memory leak? IMO, vector in combination with iterators doesn't particularly protect you from errors, such as going out of bounds or generally using an invalidated iterator (unless you have VC++ with iterator debugging); rather it is convenient because it implements a dynamically resizable array for you and takes care of memory management (NB! helps make your code more exception-safe).
void foo(const char* zzz)
{
int* arr = new int[size];
std::string s = zzz;
//...
delete[] arr;
}
Above can leak if an exception occurs (e.g when creating the string). Not with a vector.
Vector also makes it easier to reason about code because of its value semantics.
int* arr = new int[size];
int* second_ref = arr;
//...
delete [] arr;
arr = 0; //play it safe :)
//...
second_ref[x] = y;
//...
delete [] second_ref;
But perhaps a vector doesn't automatically satisfy 100% of dynamic array use cases. (For example, there's also boost::shared_array and the to-be std::unique_ptr<T[]>)

I think the utility of std::vector really shows when you need dynamic arrays.
Make one example using std::vector. Then one example using an array to realloc. I think it speaks for itself.

One obvious:
for (i = 0; i < NUMBER_OF_ELEMENTS; ++i)
destination_array[i] = whatever(i);
versus
for (i = 0; i < NUMBER_OF_ELEMENTS; ++i)
destination_vector.push_back(whatever(i));
pointing out that you know the second works, but whether the first works depends on how destination_array was defined.

void Fn()
{
int *p = new int[256];
if ( p != NULL )
{
if ( !InitIntArray( p, 256 ) )
{
// Log error
return;
}
delete[] p;
}
}
You wouldn't BELIEVE how often I see that. A classic example of where any form of RAII is useful ...

I would think the basic simplicity of using vectors instead of dynamic arrays is already convincing.
You don't have to remember to delete your memory...which is not so simple since attempts to delete it might be bypassed by exceptions and whatnot.
If you want to do dynamic arrays yourself, the safest way to do it in C++ is to wrap them in a class and use RAII. But vectors do that for you. That's kind of the point, actually.
Resizing is done for you.
If you need to support arbitrary types, you don't have to do any extra work.
A lot of algorithms are provided which are designed to handle containers, both included and by other users.
You can still use functions that need arrays by passing the vector's underlying array if necessary; The memory is guaranteed to be contiguous by the standard, except with vector<bool> (google says as of 2003, see 23.2.4./1 of the spec).
Using an array yourself is probably bad practice in general, since you will be re-inventing the wheel...and your implementation will almost definitely be much worse than the existing one...and harder for other people to use, since they know about vector but not your weird thing.
With a dynamic array, you need to keep track of the size yourself, grow it when you want to insert new elements, delete it when it is no longer needed...this is extra work.
Oh, and a warning: vector<bool> is a dirty rotten hack and a classic example of premature optimization.

Why don't you motivate it based on the algorithms that the STL provides?

In raw array, operator[] (if I may call so) is susceptible to index-out-of-bound problem. With vector it is not (There is at least a run time exception).
Sorry, I did not read the question carefully enough. index-out-of-bound is a problem, but not a memory error.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Avoiding copy when writing array to vector - c++

std::vector<SomeUglyHeavyObject*> vect(5000, NULL); for (int i=0; i<5000; i++) { vect[i] = &arr[i]; } There, avoids copying and doesn't muck with std::vector internals.

Related

how to transfer ownership from a c++ std::vector<char> to a char *pointer

Use of pointer to vector which involved the use of 'new'

Pointer to an array get size C++

C++ Allocate Memory Without Activating Constructors

How to demonstrate memory error using arrays in C++

Categories

Resources