Resizing a C++ std::vector<char> without initializing data [duplicate]

This question already has answers here:
Using vector<char> as a buffer without initializing it on resize()
(6 answers)
Closed 6 years ago.
With vectors, one can assume that elements are stored contiguously in memory, allowing the range [&vec[0], &vec[vec.capacity()]) to be used as a normal array. E.g.,
vector<char> buf;
buf.reserve(N);
int M = read(fd, &buf[0], N);
But now the vector doesn't know that it contains M bytes of data, added externally by read(). I know that vector::resize() sets the size, but it also clears the data, so it can't be used to update the size after the read() call.
Is there a trivial way to read data directly into vectors and update the size after? Yes, I know of the obvious workarounds like using a small array as a temporary read buffer, and using vector::insert() to append that to the end of the vector:
char tmp[N];
int M = read(fd, tmp, N);
buf.insert(buf.end(), tmp, tmp + M);
This works (and it's what I'm doing today), but it just bothers me that there is an extra copy operation there that would not be required if I could put the data directly into the vector.
So, is there a simple way to modify the vector size when data has been added externally?

vector<char> buf;
buf.reserve(N);
int M = read(fd, &buf[0], N);
This code fragment invokes undefined behavior. You can't write beyond size() elements, even if you have reserved the space.
The correct code is like:
vector<char> buf;
buf.resize(N);
int M = read(fd, &buf[0], N);
buf.resize(M);
P.S. Your statement "With vectors, one can assume that elements are stored contiguously in memory, allowing the range [&vec[0], &vec[vec.capacity()]) to be used as a normal array" isn't true. The allowable range is [&vec[0], &vec[vec.size()]).
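A minimal sketch of the resize, read, resize pattern described above. To stay portable it uses std::istream::read in place of POSIX read(); the helper names read_into and demo_read are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical helper: read up to n bytes from `in` into `buf`,
// leaving buf.size() equal to the number of bytes actually read.
std::size_t read_into(std::istream& in, std::vector<char>& buf, std::size_t n)
{
    buf.resize(n);                       // elements now exist (zero-filled), so writing is legal
    if (n == 0) return 0;
    in.read(buf.data(), static_cast<std::streamsize>(n));
    const std::size_t got = static_cast<std::size_t>(in.gcount());
    assert(got <= n);
    buf.resize(got);                     // shrinking keeps the first `got` bytes intact
    return got;
}

// Usage sketch: an in-memory stream stands in for the file descriptor.
std::vector<char> demo_read()
{
    std::istringstream in("hello");
    std::vector<char> buf;
    read_into(in, buf, 8);               // asks for 8 bytes, only 5 are available
    return buf;
}
```

The cost of this approach is the zero-fill performed by the first resize(); the second resize() only shrinks and does not touch the data.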

It looks like you can do what you want in C++11 (though I haven't tried this myself). You'll have to define a custom allocator for the vector, then use emplace_back().
First, define
struct do_not_initialize_tag {};
Then define your allocator with this member function:
class my_allocator {
public:
// selected by emplace_back(do_not_initialize_tag()); leaves *c uninitialized
void construct(char* c, do_not_initialize_tag) const {
// do nothing
}
// details omitted
// ...
};
Now you can add elements to your array without initializing them:
std::vector<char, my_allocator> buf;
buf.reserve(N);
for (int i = 0; i != N; ++i)
buf.emplace_back(do_not_initialize_tag());
int M = read(fd, buf.data(), N);
buf.resize(M);
The efficiency of this depends on the compiler's optimizer. For instance, the loop may increment the size member variable N times.
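A related allocator trick, shown here as a sketch and not as the tag-based allocator from this answer: make the zero-argument construct() default-initialize instead of value-initialize, so that a plain resize() leaves char elements untouched. The name default_init_allocator is illustrative.

```cpp
#include <cassert>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Allocator adaptor: construct(p) with no arguments default-initializes,
// which for char means "leave the byte as it is".
template <typename T, typename A = std::allocator<T>>
class default_init_allocator : public A {
    using traits = std::allocator_traits<A>;
public:
    template <typename U>
    struct rebind {
        using other = default_init_allocator<U, typename traits::template rebind_alloc<U>>;
    };

    using A::A;  // reuse the base allocator's constructors

    template <typename U>
    void construct(U* p) noexcept(std::is_nothrow_default_constructible<U>::value)
    {
        assert(p != nullptr);
        ::new (static_cast<void*>(p)) U;  // default-init, not value-init
    }

    template <typename U, typename... Args>
    void construct(U* p, Args&&... args)
    {
        // Any other construction is forwarded to the wrapped allocator.
        traits::construct(static_cast<A&>(*this), p, std::forward<Args>(args)...);
    }
};

// Convenience alias (hypothetical name).
using raw_char_buffer = std::vector<char, default_init_allocator<char>>;
```

With this adaptor, buf.resize(N) followed by a read into buf.data() and buf.resize(M) avoids both the zero-fill and the per-element emplace_back loop. The contents of the new elements are indeterminate until something writes to them, so they must not be read first.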

Another, newer, question, a duplicate of this one, has an answer, which looks like exactly what is asked here. Here's its copy (of v3) for quick reference:
It is a known issue that initialization can not be turned off even
explicitly for std::vector.
People normally implement their own pod_vector<> that does not do
any initialization of the elements.
Another way is to create a type which is layout-compatible with char,
whose constructor does nothing:
struct NoInitChar
{
char value;
NoInitChar() {
// do nothing
static_assert(sizeof *this == sizeof value, "invalid size");
static_assert(__alignof *this == __alignof value, "invalid alignment");
}
};
int main() {
std::vector<NoInitChar> v;
v.resize(10); // calls NoInitChar() which does not initialize
// Look ma, no reinterpret_cast<>!
char* beg = &v.front().value;
char* end = beg + v.size();
}

Writing to the size()th element or beyond is undefined behavior.
The next example copies a whole file into a vector the C++ way (no need to know the file's size or to preallocate memory in the vector):
#include <algorithm>
#include <fstream>
#include <iterator>
#include <vector>
int main()
{
typedef std::istream_iterator<char> istream_iterator;
std::ifstream file("example.txt");
std::vector<char> input;
file >> std::noskipws;
std::copy( istream_iterator(file),
istream_iterator(),
std::back_inserter(input));
}
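As a variation on the same idea, a sketch using std::istreambuf_iterator, which reads from the stream buffer directly and therefore keeps whitespace without std::noskipws. The function names are made up, and an in-memory stream stands in for the file.

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>
#include <vector>

// Read every remaining byte of `in` into a vector.
std::vector<char> read_all(std::istream& in)
{
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

// Usage sketch with an in-memory stream.
std::vector<char> demo_read_all()
{
    std::istringstream in("a b\nc");
    std::vector<char> bytes = read_all(in);
    assert(!bytes.empty());
    return bytes;   // spaces and newlines are preserved
}
```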

Your program fragment has entered the realm of undefined behavior.
When buf.empty() is true, buf[0] has undefined behavior, and therefore so does &buf[0].
This fragment probably does what you want.
vector<char> buf;
buf.resize(N); // preallocate space
int M = read(fd, &buf[0], N);
buf.resize(M); // disallow access to the remainder
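The same idea extends to appending to a vector that already holds data, which is what the question's insert() workaround does. The following is a sketch only: it uses std::istream instead of read(), and the name append_all and the default chunk size are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Append everything left in `in` to `buf`, reading `chunk` bytes at a time
// directly into the vector's own storage (no temporary array).
void append_all(std::istream& in, std::vector<char>& buf, std::size_t chunk = 4096)
{
    assert(chunk > 0);
    for (;;) {
        const std::size_t old_size = buf.size();
        buf.resize(old_size + chunk);             // make room; new bytes are zero-filled
        in.read(buf.data() + old_size, static_cast<std::streamsize>(chunk));
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        buf.resize(old_size + got);               // keep only what was actually read
        if (got < chunk) break;                   // short read: end of input or error
    }
}

// Usage sketch: append ten bytes, four at a time, to a vector holding "##".
std::vector<char> demo_append()
{
    std::istringstream in("0123456789");
    std::vector<char> buf(2, '#');
    append_all(in, buf, 4);
    return buf;
}
```

Because data() is used with an offset, no expression ever indexes an empty vector, which avoids the buf[0] problem mentioned above.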

Related

high performance 'proper' c++ alternative to variable length array

I am writing a function which requires an array to be created at runtime. The array will be of small size so I am not worried about unsafe code, however, I want to write 'proper' code. As such I am considering three alternatives:
char array[len];
char array = new char(len);
std::vector array(len);
Using Compiler Explorer to compare them with -O3.
The results were as such:
12 instructions, 0 calls to new
21 instructions, 1 call to new
118 instructions, 2+ calls to new
Am I missing an optimisation for std::vector<> or is the 'proper' c++ way slower or have I entirely missed a way of coding this?
edit: I forgot to delete the heap-allocated array
Test code:
code 1:
#include <string.h>
void populate_array(char* arr);
int compute_result(char* arr);
int str_to_arr(const char* str)
{
auto len = strlen(str);
char array[len];
populate_array(array);
return compute_result(array);
}
code 2:
#include <string.h>
void populate_array(char* arr);
int compute_result(char* arr);
int str_to_arr(const char* str)
{
auto len = strlen(str);
char* array = new char[len];
populate_array(array);
auto result = compute_result(array);
delete[] array;
return result;
}
code 3:
#include <string.h>
#include <vector>
void populate_array(std::vector<char> arr);
int compute_result(std::vector<char> arr);
int str_to_arr(const char* str)
{
auto len = strlen(str);
std::vector<char> array(len);
populate_array(array);
return compute_result(array);
}

There are a few issues in the code, that may be leading you astray in the comparison.
new char(len) allocates a single char, initialized with the value len. You'd be after new char[len] to allocate len chars. There should be a matching delete [], too.
The std::vector<char> object is passed to populate_array by value, making a copy (and consequently not actually populating the array you want), and similarly for compute_result. These copies will engender new allocations. Passing by reference would be appropriate here.
Without using a custom allocator, std::vector will value-initialize all its elements. Effectively, it means that every element in this vector is set to zero. This is not performed by new char[len].
VLAs are not part of C++, but may be provided as an extension. While in this instance, for small len, the compiler has the option of allocating the space for the array on the stack, they are probably best avoided because of their non-standard nature; even in C, they are not required to be supported.
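The points above could be sketched as follows. The bodies of populate_array and compute_result are hypothetical stand-ins, since the question only declares them; the second function shows a heap buffer that is not zero-filled and is released automatically.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical stand-ins for the question's external functions,
// taking the vector by reference so that no copy is made.
void populate_array(std::vector<char>& arr)
{
    for (std::size_t i = 0; i < arr.size(); ++i)
        arr[i] = static_cast<char>(i + 1);
}

int compute_result(const std::vector<char>& arr)
{
    int sum = 0;
    for (char c : arr) sum += c;
    return sum;
}

int str_to_arr(const char* str)
{
    assert(str != nullptr);
    std::vector<char> array(std::strlen(str));  // one allocation, zero-filled
    populate_array(array);                      // fills the caller's vector in place
    return compute_result(array);
}

// Variant without zero-filling: new char[len] default-initializes,
// and unique_ptr<char[]> calls delete[] automatically.
int str_to_arr_raw(const char* str)
{
    assert(str != nullptr);
    const std::size_t len = std::strlen(str);
    std::unique_ptr<char[]> array(new char[len]);
    int sum = 0;
    for (std::size_t i = 0; i < len; ++i) {
        array[i] = static_cast<char>(i + 1);
        sum += array[i];
    }
    return sum;
}
```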

C++ behavior that I don't understand

My friends and I were playing with the C++ language. While doing so, we encountered something we couldn't understand.
Here is the code:
#include <vector>
#include <iostream>
void print(std::vector<char> const &input)
{
std::cout << input.size();
for (int i = 0; i < input.size(); i++)
{
std::cout << input.at(i) << " - ";
}
}
int main()
{
char cha = 'A';
char chb = 'B';
char * pcha = &cha;
char * pchb = &chb;
try
{
std::vector<char> a = {pcha, pchb};
//std::vector<char> a = {pchb, pcha};
print(a);
}
catch(std::exception e)
{
std::cout << e.what();
}
}
Output for this code:
A
When I comment out the first line in the try block and uncomment the second line, which comes to this:
try
{
// std::vector<char> a = {pcha, pchb};
std::vector<char> a = {pchb, pcha};
print(a);
}
Output becomes:
std::exception
I thought maybe this occurs because of the different padding and alignment of the declared variables (char, char*), but I still didn't understand it. You can find the code here to play around.
Thanks in advance.

std::vector<char> a = {pcha, pchb};
Here, you use the constructor of vector that accepts two iterators to a range. Unless the end iterator is reachable from the begin one, the behaviour of the program is undefined. Your two pointers are not iterators to the same range (i.e. elements of an array), so one is not reachable from the other. Therefore the behaviour of the program is undefined.
These would be correct:
std::vector<char> a = {cha, chb}; // uses initializer_list constructor
// or
char arr[] {cha, chb};
char * pcha = std::begin(arr);
char * pchb = std::end(arr);
std::vector<char> a = {pcha, pchb}; // uses the iterator constructor
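A small sketch contrasting the two constructors. The function names are made up; the iterator version is valid only because both pointers refer to the same array.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Initializer-list constructor: the vector holds copies of the two chars.
std::vector<char> from_values(char a, char b)
{
    return std::vector<char>{a, b};
}

// Iterator-pair constructor: `last` is reachable from `first`
// because both point into the same array.
std::vector<char> from_same_array(char a, char b)
{
    char arr[] = {a, b};
    const char* first = std::begin(arr);
    const char* last = std::end(arr);
    assert(first <= last);
    return std::vector<char>(first, last);
}
```

Passing the addresses of two unrelated char variables to the second form is what made the original program undefined: the constructor walks from the first pointer until it meets the second, and nothing guarantees that it ever does.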

@eerorika's answer explains your mistake.
However, I would like to dissuade you, and other readers, from using the second part of their corrected code snippet - not because it's incorrect, but because it's problematic coding practice:
I follow Nicolai Josuttis' suggestion of uniformly initializing variables with curly brackets and no equals sign (e.g. mytype myvar {my_initializer};).
Freestanding pointers are dangerous beasts. Try to avoid them altogether, or minimize their existence to where you really need them. After all, you were "tempted" to use those pointers in an inappropriate way... so,
char arr[] {cha, chb};
std::vector<char> a = {std::begin(arr), std::end(arr)};
Don't create a dummy container just to create the one you really want. Just stick with the first line in @eerorika's suggestion (without the equals sign):
std::vector<char> a {cha, chb};
In fact, unless you really need it - you probably don't even want to create a variable-length container. So perhaps just
std::array<char, 2> a {cha, chb};
or with C++17's template argument deduction:
std::array a {cha, chb};

Reallocate array with memcpy and memset

I've taken over some code, and came across a weird reallocation of an array. This is a function from within an Array class (used by the JsonValue)
void reserve( uint32_t newCapacity ) {
if ( newCapacity > length + additionalCapacity ) {
newCapacity = std::min( newCapacity, length + std::numeric_limits<decltype( additionalCapacity )>::max() );
JsonValue *newPtr = new JsonValue[newCapacity];
if ( length > 0 ) {
memcpy( newPtr, values, length * sizeof( JsonValue ) );
memset( values, 0, length * sizeof( JsonValue ) );
}
delete[] values;
values = newPtr;
additionalCapacity = uint16_t( newCapacity - length );
}
}
I get the point of this; it is just allocating a new array, and doing a copy of the memory contents from the old array into the new array, then zero-ing out the old array's contents. I also know this was done in order to prevent calling destructors, and moves.
The JsonValue is a class with functions, and some data which is stored in a union (string, array, number, etc.).
My concern is whether this is actually defined behaviour or not. I know it works, and we have not had a problem since we began using it a few months ago; but if it's undefined, that doesn't mean it is going to keep working.
EDIT:
JsonValue looks something like this:
struct JsonValue {
// …
~JsonValue() {
switch ( details.type ) {
case Type::Array:
case Type::Object:
array.destroy();
break;
case Type::String:
delete[] string.buffer;
break;
default: break;
}
}
private:
struct Details {
Key key = Key::Unknown;
Type type = Type::Null; // (0)
};
union {
Array array;
String string;
EmbedString embedString;
Number number;
Details details;
};
};
Where Array is a wrapper around an array of JsonValues, String is a char*, EmbedString is char[14], Number is a union of int, unsigned int, and double, Details contains the type of value it holds. All values have 16-bits of unused data at the beginning, which is used for Details. Example:
struct EmbedString {
uint16_t : 16;
char buffer[14] = { 0 };
};

Whether this code has well-defined behavior basically depends on two things: 1) is JsonValue trivially-copyable and, 2) if so, are a bunch of all-zero Bytes a valid object representation for a JsonValue.
If JsonValue is trivially-copyable, then the memcpy from one array of JsonValues to another will indeed be equivalent to copying all the elements over [basic.types]/3. If all-zeroes is a valid object representation for a JsonValue, then the memset should be ok (I believe this actually falls into a bit of a grey-area with the current wording of the standard, but I believe at least the intention would be that this is fine).
I'm not sure why you'd need to "prevent calling destructors and moves", but overwriting objects with zeroes does not prevent destructors from running. delete[] values will call the destructors of the array members. And moving the elements of an array of trivially-copyable type should compile down to just copying over the bytes anyway.
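One way to make the trivially-copyable condition explicit in code is to dispatch on std::is_trivially_copyable, as in this sketch. The relocate helpers are hypothetical and are not part of the code in the question; they assume the destination elements already exist, as they do after new T[n].

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>
#include <utility>

namespace detail {
// Trivially copyable: copying the bytes is equivalent to copying the objects.
template <typename T>
void relocate_impl(T* dst, T* src, std::size_t n, std::true_type)
{
    std::memcpy(dst, src, n * sizeof(T));
}

// Anything else: move element by element so that invariants are preserved.
template <typename T>
void relocate_impl(T* dst, T* src, std::size_t n, std::false_type)
{
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = std::move(src[i]);
}
}  // namespace detail

// Transfer n already-constructed elements from src to dst.
template <typename T>
void relocate(T* dst, T* src, std::size_t n)
{
    assert(dst != nullptr && src != nullptr);
    detail::relocate_impl(dst, src, n, std::is_trivially_copyable<T>());
}
```

For a type such as JsonValue with a user-provided destructor, the second overload is the one selected, which is a hint that the raw memcpy in the question is not covered by the trivially-copyable guarantee.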
Furthermore, I would suggest to get rid of these String and EmbedString classes and simply use std::string. At least, it would seem to me that the sole purpose of EmbedString is to manually perform small string optimization. Any std::string implementation worth its salt is already going to do exactly that under the hood. Note that std::string is not guaranteed (and will often not be) trivially-copyable. Thus, you cannot simply replace String and EmbedString with std::string while keeping the rest of this current implementation.
If you can use C++17, I would suggest to simply use std::variant instead of or at least inside this custom JsonValue implementation as that seems to be exactly what it's trying to do. If you need some common information stored in front of whatever the variant value may be, just have a suitable member holding that information in front of the member that holds the variant value rather than relying on every member of the union starting with the same couple of members (which would only be well-defined if all union members are standard-layout types that keep this information in their common initial sequence [class.mem]/23).
The sole purpose of Array would seem to be to serve as a vector that zeroes memory before deallocating it for security reasons. If this is the case, I would suggest to just use an std::vector with an allocator that zeros memory before deallocating instead. For example:
template <typename T>
struct ZeroingAllocator
{
using value_type = T;
T* allocate(std::size_t N)
{
return reinterpret_cast<T*>(new unsigned char[N * sizeof(T)]);
}
void deallocate(T* buffer, std::size_t N) noexcept
{
auto ptr = reinterpret_cast<volatile unsigned char*>(buffer);
std::fill(ptr, ptr + N * sizeof(T), 0); // N counts elements, so scale to bytes
delete[] reinterpret_cast<unsigned char*>(buffer);
}
};
template <typename A, typename B>
bool operator ==(const ZeroingAllocator<A>&, const ZeroingAllocator<B>&) noexcept { return true; }
template <typename A, typename B>
bool operator !=(const ZeroingAllocator<A>&, const ZeroingAllocator<B>&) noexcept { return false; }
and then
using Array = std::vector<JsonValue, ZeroingAllocator<JsonValue>>;
Note: I fill the memory via volatile unsigned char* to prevent the compiler from optimizing away the zeroing. If you need to support overaligned types, you can replace the new[] and delete[] with direct calls to ::operator new and ::operator delete (doing this will prevent the compiler from optimizing away allocations). Pre C++17, you will have to allocate a sufficiently large buffer and then manually align the pointer, e.g., using std::align…

C++: How to read dynamic data elegantly into a struct?

Let's say I store headers in some file, but part of the header has a dynamic length. It would look something like this:
struct HeaderTest {
int someparam;
int more;
int arrsize; // how big array, read arrsize elements into arr:
int arr[arrsize]; // not valid
};
Is there some elegant way for reading dynamic data into a struct?

Instead of having arr and arrsize variables in your struct, you can define your struct like this:
struct HeaderTest
{
int someparam;
int more;
std::vector<int> data;
};
No arr, no arrsize. Just use std::vector, and std::vector::size(). That is elegant!
And if you want to read binary data from a file, then you can write like this:
struct HeaderTest
{
int someparam;
int more;
int size;
char *data;
};
Otherwise, go with the first struct!
A word of advice:
Reading your comments everywhere, I suggest you get a good book and study it first. Here is a list of really good books:
The Definitive C++ Book Guide and List

Well, if you don't want to use a container class (not sure why you wouldn't) you can declare arr as a pointer to int and leave it to the client to initialize the pointer to a valid memory location as well as correctly initialize arrsize.
That said, you should just use a vector. Why make things more difficult than they need to be?

This answer is more C than C++, but you can easily make use of realloc() to resize a buffer to be as large as you need, as demonstrated in this pseudocode:
struct HeaderTest {
int someparam;
int more;
int arrsize;
int arr[];
};
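// read() is a placeholder for whatever reads n bytes into a buffer
HeaderTest* pkt = (HeaderTest*)malloc(sizeof(HeaderTest));
read(pkt, sizeof(HeaderTest)); // fixed-size part; pass pkt itself, not &pkt
pkt = (HeaderTest*)realloc(pkt, sizeof(HeaderTest) + sizeof(pkt->arr[0]) * pkt->arrsize);
read(pkt->arr, sizeof(pkt->arr[0]) * pkt->arrsize); // variable-size part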

I don't think there is a very elegant way. You should probably make that dynamic member a pointer, then read all other members first, allocate memory for the last one, and then read the remainder of the data.
Since you're in C++, you can nicely encapsulate this in a class so that you don't have to worry about this detail in your code anymore. Also, as others have said, a std::vector would be a more C++-like approach than a simple pointer and manually allocated memory. It would also be more resistant to memory leaks.
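The read-the-fixed-part-first approach could be sketched like this with a std::vector member. The on-disk layout (three native-endian ints followed by the array) and the name read_header are assumptions, not something the question specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct HeaderTest {
    int someparam = 0;
    int more = 0;
    std::vector<int> arr;   // arr.size() replaces the separate arrsize field
};

// Assumed layout: someparam, more, arrsize, then arrsize ints.
bool read_header(std::istream& in, HeaderTest& h)
{
    int fixed[3];
    if (!in.read(reinterpret_cast<char*>(fixed), sizeof fixed)) return false;
    if (fixed[2] < 0) return false;                  // reject a corrupt element count
    h.someparam = fixed[0];
    h.more = fixed[1];
    h.arr.resize(static_cast<std::size_t>(fixed[2]));
    assert(h.arr.size() == static_cast<std::size_t>(fixed[2]));
    if (h.arr.empty()) return true;
    const std::size_t bytes = h.arr.size() * sizeof(int);
    return static_cast<bool>(in.read(reinterpret_cast<char*>(h.arr.data()),
                                     static_cast<std::streamsize>(bytes)));
}
```

A real reader would probably also cap the element count before resizing, since a damaged file could otherwise request a very large allocation.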

No one was able to give you the solution you wanted, but I have devised it for you.
This function takes a C-string filename, opens the file and reads the contents for you. It returns an int*, which can be assigned to t.container. Enjoy.
int* read(char* filename)
{
// open file
ifstream f;
f.open(filename, ios::binary);
// get file size
f.seekg (0, ios::end);
int length = f.tellg();
f.seekg (0, ios::beg);
// allocate new int*
length = (length -(sizeof(int)*2)) / sizeof(int);
int* buf = new int[length];
for(int i = 0; i < length; ++i)
{
// byte array to hold the bytes of one int
unsigned char temp[sizeof(int)];
f.read((char*)temp, sizeof(int));
// convert byte array to int (little-endian byte order assumed)
buf[i] = 0; // new int[] leaves the elements uninitialized
for(int j = 0; j < sizeof(int); ++j)
{
buf[i] = buf[i] + (temp[j] << (j*8));
}
}
f.close();
return buf;
}

How to initialize an array that is part of a struct typedef?

If I have a typedef of a struct
typedef struct
{
char SmType;
char SRes;
float SParm;
float EParm;
WORD Count;
char Flags;
char unused;
GPOINT2 Nodes[];
} GPATH2;
and it contains an uninitialized array, how can I create an instance of this type so that it will hold, say, 4 values in Nodes[]?
Edit: This belongs to an API for a program written in Assembler. I guess as long as the underlying data in memory is the same, an answer changing the struct definition would work, but not if the underlying memory is different. The Assembly Language application is not using this definition .... but .... a C program using it can create GPATH2 elements that the Assembly Language application can "read".
Can I ever resize Nodes[] once I have created an instance of GPATH2?
Note: I would have placed this with a straight C tag, but there is only a C++ tag.

You could use a bastard mix of C and C++ if you really want to:
#include <new>
#include <cstdlib>
#include "definition_of_GPATH2.h"
using namespace std;
int main(void)
{
int i;
/* Allocate raw memory buffer */
void * raw_buffer = calloc(1, sizeof(GPATH2) + 4 * sizeof(GPOINT2));
/* Initialize struct with placement-new */
GPATH2 * path = new (raw_buffer) GPATH2;
path->Count = 4;
for ( i = 0 ; i < 4 ; i++ )
{
path->Nodes[i].x = rand();
path->Nodes[i].y = rand();
}
/* Resize raw buffer */
raw_buffer = realloc(raw_buffer, sizeof(GPATH2) + 8 * sizeof(GPOINT2));
/* 'path' still points to the old buffer that might have been free'd
* by realloc, so it has to be re-initialized
* realloc copies old memory contents, so I am not certain this would
* work with a proper object that actually does something in the
* constructor
*/
path = new (raw_buffer) GPATH2;
/* now we can write more elements of array */
path->Count = 5;
path->Nodes[4].x = rand();
path->Nodes[4].y = rand();
/* Because this is allocated with malloc/realloc, free it with free
* rather than delete.
* If 'path' was a proper object rather than a struct, you should
* call the destructor manually first.
*/
free(raw_buffer);
return 0;
}
Granted, it's not idiomatic C++ as others have observed, but if the struct is part of legacy code it might be the most straightforward option.
Correctness of the above sample program has only been checked with valgrind using dummy definitions of the structs, your mileage may vary.

If it is fixed size, write:
typedef struct
{
char SmType;
char SRes;
float SParm;
float EParm;
WORD Count;
char Flags;
char unused;
GPOINT2 Nodes[4];
} GPATH2;
If it is not fixed size, change the declaration to
GPOINT2* Nodes;
and after creation, or in the constructor, do
Nodes = new GPOINT2[size];
If you want to resize it, you should use vector<GPOINT2>, because you can't resize an array, only create a new one. If you do create a new one, don't forget to delete the previous one.
Also, typedef is not needed in C++; you can write
struct GPATH2
{
char SmType;
char SRes;
float SParm;
float EParm;
WORD Count;
char Flags;
char unused;
GPOINT2 Nodes[4];
};

This appears to be a C99 idiom known as the "struct hack". You cannot (in standard C99; some compilers have an extension that allows it) declare a variable with this type, but you can declare pointers to it. You have to allocate objects of this type with malloc, providing extra space for the appropriate number of array elements. If nothing holds a pointer to an array element, you can resize the array with realloc.
Code that needs to be backward compatible with C89 needs to use
GPOINT2 Nodes[1];
as the last member, and take note of this when allocating.
This is very much not idiomatic C++ -- note for instance that you would have to jump through several extra hoops to make new and delete usable -- although I have seen it done. Idiomatic C++ would use vector<GPOINT2> as the last member of the struct.
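For comparison, the idiomatic C++ shape mentioned above might look like this sketch. WORD and GPOINT2 are hypothetical stand-ins for the API's real definitions, and GPath is a made-up name. Its memory layout no longer matches the original GPATH2, so it is only suitable on the C++ side of the API boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; the real definitions live in the API's header.
typedef unsigned short WORD;
struct GPOINT2 { float x; float y; };

struct GPath {
    char SmType = 0;
    char SRes = 0;
    float SParm = 0.0f;
    float EParm = 0.0f;
    char Flags = 0;
    std::vector<GPOINT2> Nodes;   // Nodes.size() plays the role of Count

    WORD count() const { return static_cast<WORD>(Nodes.size()); }

    void resize_nodes(std::size_t n)
    {
        assert(n <= 0xFFFFu);     // Count is assumed to be a 16-bit WORD
        Nodes.resize(n);          // existing nodes are kept, new ones are zeroed
    }
};
```

Resizing then needs no manual realloc bookkeeping, at the cost of one extra indirection to reach the nodes.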

Arrays of unknown size are not valid as C++ data members. They are valid in C99, and your compiler may be mixing C99 support with C++.
What you can do in C++ is 1) give it a size, 2) use a vector or another container, or 3) ditch both automatic (local variable) and normal dynamic storage in order to control allocation explicitly. The third is particularly cumbersome in C++, especially with non-POD, but possible; example:
struct A {
int const size;
char data[1];
~A() {
// if data was of non-POD type, we'd destruct data[1] to data[size-1] here
}
static auto_ptr<A> create(int size) {
// because new is used, auto_ptr's use of delete is fine
// consider another smart pointer type that allows specifying a deleter
void *p = ::operator new(sizeof(A) + (size - 1) * sizeof(char));
A *a;
try { // not necessary in our case, but is if A's ctor can throw
// ::new is needed because the private operator new below hides placement new
a = ::new(p) A(size);
}
catch (...) {
::operator delete(p);
throw;
}
return auto_ptr<A>(a);
}
private:
A(int size) : size (size) {
// if data was of non-POD type, we'd construct here, being very careful
// of exception safety
}
A(A const &other); // be careful if you define these,
A& operator=(A const &other); // but it likely makes sense to forbid them
void* operator new(size_t size); // doesn't prevent all erroneous uses,
void* operator new[](size_t size); // but this is a start
};
Note you cannot trust sizeof(A) anywhere else in the code, and using an array of size 1 guarantees alignment (matters when the type isn't char).

This type of structure is not trivially usable on the stack; you'll have to malloc it. The significant thing to know is that sizeof(GPATH2) doesn't include the trailing array. So to create one, you'd do something like this:
GPATH2 *somePath;
size_t numPoints;
numPoints = 4;
somePath = malloc(sizeof(GPATH2) + numPoints*sizeof(GPOINT2));
I'm guessing GPATH2.Count is the number of elements in the Nodes array, so if it's up to you to initialize that, be sure and set somePath->Count = numPoints; at some point. If I'm mistaken, and the convention used is to null terminate the array, then you would do things just a little different:
somePath = malloc(sizeof(GPATH2) + (numPoints+1)*sizeof(GPOINT2));
somePath->Nodes[numPoints] = Some_Sentinel_Value;
Make darn sure you know which convention the library uses.
As other folks have mentioned, realloc() can be used to resize the struct, but it will invalidate old pointers to the struct, so make sure you aren't keeping extra copies of it (like passing it to the library).
