Vector of double to a vector of bytes - C++

I have a std::vector<double> and need to work with a library that takes a const vector<uint8_t>. I specify what type the data is to the library with an enum.
Is there any way that I can avoid copying the data completely and have the byte vector internally refer to the same data as the double vector? Since the byte vector is const and the double vector won't change during the lifetime of the byte vector, this seems like it would be pretty safe. There is a lot of data, so copying it really isn't an option.

If the library took a pointer to bytes, you would have a chance, because you can legally examine pretty much any object as an array of char or unsigned char (and uint8_t is almost always a typedef for unsigned char). However, doubles are not bytes, and there is no way to make one std::vector share the storage owned by another. So, no, you basically can't do this without horrible hacky magic whose safety will be entirely implementation dependent.
Either way, you can't do it with the vector types: you'd have to cast and pass the result of std::vector::data(), i.e. a pointer.
Sorry but I have to recommend revisiting your design. If the library you're using really takes a vector of integers that is actually supposed to be a vector of doubles, then its developers have put you in an awkward position.

Why aren't built-in arrays safe?

The book C++ Primer, 5th edition by Stanley B. Lippman (ISBN 0-321-71411-3/978-0-321-71411-4) mentions:
An [std::]array is a safer, easier-to-use alternative to built-in arrays.
What's wrong with built-in arrays?
A built-in array is a contiguous block of bytes, usually on the stack. You really have no decent way to keep useful information about the array, its boundaries or its state. std::array keeps this information.
Built-in arrays decay into pointers when passed to or returned from functions. This may cause:
When passing a built-in array, you pass a raw pointer. A pointer doesn't keep any information about the size of the array. You will have to pass along the size of the array and thus uglify the code. std::array can be passed as reference, copy or move.
There is no way of returning a built-in array; you will eventually return a pointer to a local variable if the array was declared in that function scope.
std::array can be returned safely, because it's an object and its lifetime is managed automatically.
You can't really do useful stuff with built-in arrays such as assigning, moving or copying them. You'll end up writing a customized function for each built-in array (possibly using templates). std::array can be assigned.
By accessing an element which is out of the array boundaries, you are triggering undefined behaviour. std::array::at will perform boundary checking and throw a regular C++ exception if the check fails.
Better readability: built-in arrays involve pointer arithmetic. std::array implements useful functions like front, back, begin and end to avoid that.
Let's say I want to sort a built-in array, the code could look like:
int arr[7] = {/*...*/};
std::sort(arr, arr+7);
This is not the most robust code ever. By changing 7 to a different number, the code breaks.
With std::array:
std::array<int,7> arr{/*...*/};
std::sort(arr.begin(), arr.end());
The code is much more robust and flexible.
Just to make things clear: built-in arrays can sometimes be easier. For example, many Windows as well as UNIX API functions/syscalls require some (small) buffer to fill with data. In those cases I wouldn't reach for std::array instead of a simple char[MAX_PATH].
It's hard to gauge what the author meant, but I would guess they are referring to the following facts about native arrays:
they are raw
There is no .at member function you can use for element access with bounds checking, though I'd counter that you usually don't want that anyway. Either you're accessing an element you know exists, or you're iterating (which you can do equally well with std::array and native arrays); if you don't know the element exists, a bounds-checking accessor is already a pretty poor way to ascertain that, as it is using the wrong tool for code flow and it comes with a substantial performance penalty.
they can be confusing
Newbies tend to forget about array name decay, passing arrays into functions "by value" then performing sizeof on the ensuing pointer; this is not generally "unsafe", but it will create bugs.
they can't be assigned
Again, not inherently unsafe, but it leads to silly people writing silly code with multiple levels of pointers and lots of dynamic allocation, then losing track of their memory and committing all sorts of UB crimes.
Assuming the author is recommending std::array, that would be because it "fixes" all of the above things, leading to generally better code by default.
But are native arrays somehow inherently "unsafe" by comparison? No, I wouldn't say so.
How is std::array safer and easier-to-use than a built-in array?
It's easy to mess up with built-in arrays, especially for programmers who aren't C++ experts and programmers who sometimes make mistakes. This causes many bugs and security vulnerabilities.
With a std::array a1, you can access an element with bounds checking a1.at(i) or without bounds checking a1[i]. With a built-in array, it's always your responsibility to diligently avoid out-of-bounds accesses. Otherwise the code can smash some memory that goes unnoticed for a long time and becomes very difficult to debug. Even just reading outside an array's bounds can be exploited for security holes like the Heartbleed bug that divulges private encryption keys.
C++ tutorials may pretend that array-of-T and pointer-to-T are the same thing, then later tell you about various exceptions where they are not the same thing. E.g. an array-of-T in a struct is embedded in the struct, while a pointer-to-T in a struct is a pointer to memory that you'd better allocate. Or consider an array of arrays (such as a raster image): does incrementing a pointer step to the next pixel or the next row? Or consider an array of objects, where an object pointer coerces to its base class pointer. All this is complicated and the compiler doesn't catch mistakes.
With a std::array a1, you can get its size a1.size(), compare its contents to another std::array a1 == a2, and use other standard container methods like a1.swap(a2). With built-in arrays, these operations take more programming work and are easier to mess up. E.g. given int b1[] = {10, 20, 30}; to get its size without hard-coding 3, you must do sizeof(b1) / sizeof(b1[0]). To compare its contents, you must loop over those elements.
You can pass a std::array to a function by reference (e.g. void f(std::array<int, 3>& a)) or by value f(a1) [i.e. by copy]. Passing a built-in array always decays it into a pointer to the first element; that's not the same thing, and the compiler doesn't pass the array size.
You can return a std::array from a function by value, return a1. Returning a built-in array return b1 returns a dangling pointer, which is broken.
You can copy a std::array in the usual way, a1 = a2, even if it contains objects with constructors. If you try that with built-in arrays, b1 = b2, it'll just copy the array pointer (or fail to compile, depending on how b2 is declared). You can get around that using memcpy(b1, b2, sizeof(b1)), but this is broken if the arrays have different sizes or if they contain elements with constructors.
You can easily change code that uses std::array to use another container like std::vector or std::map.
See the C++ FAQ Why should I use container classes rather than simple arrays? to learn more, e.g. the perils of built-in arrays containing C++ objects with destructors (like std::string) or inheritance.
Don't Freak Out About Performance
Bounds-checking access a1.at(i) requires a few more instructions each time you fetch or store an array element. In some inner loop code that jams through a large array (e.g. an image processing routine that you call on every video frame), this cost might add up enough to matter. In that rare case it makes sense to use unchecked access a[i] and carefully ensure that the loop code takes care with bounds.
In most code you're either offloading the image processing code to the GPU, or the bounds-checking cost is a tiny fraction of the overall run time, or the overall run time is not at issue. Meanwhile the risk of array access bugs is high, starting with the hours it takes you to debug it.
The only benefit of a built-in array would be slightly more concise declaration syntax. But the functional benefits of std::array blow that out of the water.
I would also add that it really doesn't matter that much. If you have to support older compilers, then you don't have a choice, of course, since std::array is only available from C++11 on. Otherwise, you can use whichever you like, but unless you make only trivial use of the array, you should prefer std::array just to keep things in line with the other STL containers. For example, if you later decide to make the size dynamic and switch to std::vector, you will be happy that you used std::array, because all you will probably have to change is the declaration itself, especially if you use auto and other type-inference features of C++11.
std::array is a template class that encapsulates a statically-sized array, stored inside the object itself, which means that, if you instantiate the class on the stack, the array itself will be on the stack. Its size has to be known at compile time (it's passed as a template parameter), and it cannot grow or shrink.
Arrays are used to store a sequence of objects
Check the tutorial: http://www.cplusplus.com/doc/tutorial/arrays/
A std::vector does the same, but it's better than a built-in array (e.g., in general, a vector is not much less efficient than a built-in array when accessing elements via operator[]): http://www.cplusplus.com/reference/stl/vector/
The built-in arrays are a major source of errors – especially when they are used to build multidimensional arrays.
For novices, they are also a major source of confusion. Wherever possible, use vector, list, valarray, string, etc.
STL containers don't have the same problems as built in arrays
So, there is no reason in C++ to persist in using built-in arrays. Built-in arrays are in C++ mainly for backwards compatibility with C.
If the OP really wants an array, C++11 provides a wrapper for the built-in array, std::array. Using std::array is very similar to using a built-in array, has no run-time performance penalty, and comes with many more features.
Unlike with the other containers in the Standard Library, swapping two array containers is a linear operation that involves swapping all the elements in the ranges individually, which generally is a considerably less efficient operation. On the other hand, this allows the iterators to elements in both containers to keep their original container association.
Another unique feature of array containers is that they can be treated as tuple objects: the <array> header overloads the std::get function to access the elements of the array as if it were a tuple, and also provides specialized std::tuple_size and std::tuple_element types.
Anyway, built-in arrays are always, in effect, passed by reference. The reason for this is that when you pass an array to a function as an argument, a pointer to its first element is what actually gets passed.
When you write void f(T array[]), the compiler turns it into void f(T* array).
When it comes to strings: C-style strings (i.e. null-terminated character sequences) are always passed by reference as well, since they are char arrays too.
STL strings are not passed by reference by default. They act like normal variables.
There are no special rules for making a parameter pass by reference; it's just that arrays always, in effect, get passed by reference automatically.
vector<vector<double>> G1=connectivity( current_combination,M,q2+1,P );
vector<vector<double>> G2=connectivity( circshift_1_dexia(current_combination),M,q1+1,P );
This could also be copying vectors since connectivity returns a vector by value. In some cases, the compiler will optimize this out. To avoid this for sure though, you can pass the vector as non-const reference to connectivity rather than returning them. The return value of maxweight is a 3-dimensional vector returned by value (which may make a copy of it).
Vectors are only efficient for insert or erase at the end, and it is best to call reserve() if you are going to push_back a lot of values. You may be able to re-write it using list if you don't really need random access; with list you lose the subscript operator, but you can still make linear passes through, and save iterators to elements, rather than subscripts.
With some compilers, it can be faster to use pre-increment, rather than post-increment. Prefer ++i to i++ unless you actually need to use the post-increment. They are not the same.
Anyway, vector is going to be horribly slow if you are not compiling with optimization on. With optimization, it is close to built-in arrays. Built-in arrays can be quite slow without optimization on also, but not as bad as vector.
std::array has the at member function, which is safe. It also has begin, end and size, which you can use to make your code safer.
Raw arrays don't have that. (In particular, when raw arrays decay to pointers (e.g. when passed as arguments), you lose any size information, which is kept in the std::array type since it is a template with the size as an argument.)
And a good optimizing C++11 compiler will handle std::array (or references to them) as efficiently as raw arrays.
Built-in arrays are not inherently unsafe - if used correctly. But it is easier to use built-in arrays incorrectly than it is to use alternatives, such as std::array, incorrectly and these alternatives usually offer better debugging features to help you detect when they have been used incorrectly.
Built-in arrays are subtle. There are lots of aspects that behave unexpectedly, even to experienced programmers.
std::array<T, N> is essentially a wrapper around a T[N], but many of the aspects already mentioned are sorted out for free, which is exactly what you want.
These are some I haven't read:
Size: N shall be a constant expression; it cannot be variable, for both. However, with built-in arrays, there are VLAs (variable-length arrays), which allow that as well.
Officially, only C99 supports them. Still, many compilers allow that in previous versions of C and C++ as extensions. Therefore, you may have
int n; std::cin >> n;
int array[n]; // ill-formed but often accepted
that compiles fine. Had you used std::array, this could never work because N is required and checked to be an actual constant expression!
Expectations for arrays: A common flawed expectation is that the size of the array is carried along with the array itself, when in fact the array isn't really an array anymore, thanks to C's misleading syntax, as in:
void foo(int array[])
{
// iterate over the elements
for (int i = 0; i < sizeof(array); ++i)
array[i] = 0;
}
but this is wrong because array has already decayed to a pointer, which has no information about the size of the pointed-to area. That misconception triggers undefined behavior if array has fewer than sizeof(int*) elements, typically 8, apart from being logically erroneous.
Crazy uses: Even further on that, there are some quirks arrays have:
Whether you have array[i] or i[array], there is no difference. This is not true for std::array, because calling an overloaded operator is effectively a function call and the order of the parameters matters.
Zero-sized arrays: N shall be greater than zero, but a zero size is still allowed as an extension and, as before, often not warned about unless you ask for more pedantic diagnostics. Further information here.
std::array has different semantics:
There is a special case for a zero-length array (N == 0). In that
case, array.begin() == array.end(), which is some unique value. The
effect of calling front() or back() on a zero-sized array is
undefined.

Memory allocation of C++ vector<bool>

The vector<bool> class in the C++ STL is optimized for memory: it allocates one bit per stored bool rather than one byte. Every time I output sizeof(x) for a vector<bool> x, the result is 40 bytes for the vector structure itself. sizeof(x.at(0)) always returns 16 bytes, which must be the allocated memory for many bool values, not just the one at position zero. How many elements do the 16 bytes cover? Exactly 128? What if my vector has more or fewer elements?
I would like to measure the size of the vector and all of its contents. How would I do that accurately? Is there a C++ library available for viewing allocated memory per variable?
I don't think there's any standard way to do this. The only information a vector<bool> implementation gives you about how it works is the reference member type, but there's no reason to assume that this has any congruence with how the data are actually stored internally; it's just that you get a reference back when you dereference an iterator into the container.
So you've got the size of the container itself, and that's fine, but to get the amount of memory taken up by the data, you're going to have to inspect your implementation's standard library source code and derive a solution from that. Though, honestly, this seems like a strange thing to want in the first place.
Actually, using vector<bool> is kind of a strange thing to want in the first place. All of the above is essentially why its use is frowned upon nowadays: it's almost entirely incompatible with conventions set by other standard containers… or even those set by other vector specialisations.

C++ alignment of multidimensional array structure

In my code, I have to consider an array of arrays, where the inner arrays are of a fixed dimension. In order to make use of STL algorithms, it is useful to actually store the data as array of arrays, but I also need to pass that data to a C library, which takes a flattened C-style array.
It would be great to be able to convert (i.e. flatten) the multi-dimensional array cheaply and in a portable way. I will stick to a very simple case, the real problem is more general.
struct my_inner_array { int data[3]; };
std::vector<my_inner_array> x(15);
Is
&(x[0].data[0])
a pointer to a contiguous block of memory of size 45*sizeof(int) containing the same entries as x? Or do I have to worry about alignment? I am afraid that this will work for me (at least for certain data types and inner array sizes) but that it is not portable.
1. Is this code portable?
2. If not, is there a way to make it work?
3. If not, do you have any suggestions what I could do?
4. Does it change anything at all if my_inner_array is not a POD struct, but contains some methods (as long as the class does not contain any virtual methods)?
1. Theoretically, no. The compiler may decide to add padding to my_inner_array. In practice, I don't see a reason why the compiler would add padding to a struct that holds nothing but an array; in that case there's no alignment problem in creating an array of such structs. You can use a compile-time assert:
typedef int my_inner_array_array[3];
BOOST_STATIC_ASSERT(sizeof(my_inner_array) == sizeof(my_inner_array_array));
4. If there are no virtual methods, it shouldn't make any difference.

C++ STL's String equivalent for Binary Data

I am writing a C++ application and I was wondering what the conventional C++ way of storing a byte array in memory is.
Is there something like a string, except made specifically for binary data?
Right now I am using an unsigned char* array to store the data, but something more STL/C++-like would be better.
I'd use std::vector<unsigned char>. Most operations you need can be done using the STL with iterator ranges. Also, remember that if you really need the raw data &v[0] is guaranteed to give a pointer to the underlying array.
You can use std::string also for binary data. The length of the data in std::string is stored explicitly and not determined by null-termination, so null-bytes don't have special meaning in a std::string.
std::string is often more convenient than std::vector<char> because it provides many methods that are useful to work with binary data but not provided by vector. To parse/create binary data it is useful to have things like substr(), overloads for + and std::stringstream at your disposal. On vectors the algorithms from <algorithm> can be used to achieve the same effects, but it's more clumsy than the string methods. If you just act on "sequences of characters", std::string gives you the methods you usually want, even if these sequences happen to contain "binary" data.
You should use std::vector<unsigned char> or std::vector<uint8_t> (if you have a modern stdint.h header). There's nothing wrong with using unsigned char[] or uint8_t[] if you are working with fixed size buffers. Where std::vector really shines is when you need to grow or append to your buffers frequently. STL iterators have the same semantics as pointers, so STL algorithms will work equally well with std::vector and plain old arrays.
And as CAdaker pointed out, the expression &v[0] is guaranteed to give you the underlying pointer to the vector's buffer (and it's guaranteed to be one contiguous block of memory). This guarantee was added in an addendum to the C++ standard.
Personally, I'd avoid using std::string to manipulate arbitrary byte buffers, since I think it's potentially confusing, but it's not an unheard of practice.
There are multiple solutions, but the closest one (I feel) is std::vector<std::byte>, because it expresses the intent directly in code.
From : https://en.cppreference.com/w/cpp/types/byte
std::byte is a distinct type that implements the concept of byte as
specified in the C++ language definition.
Like char and unsigned char, it can be used to access raw memory
occupied by other objects (object representation), but unlike those
types, it is not a character type and is not an arithmetic type. A
byte is only a collection of bits, and the only operators defined for
it are the bitwise ones.
How about std::basic_string<uint8_t>?

vector and dumping

From what I know, a vector is guaranteed to be contiguous, so I can write a chunk of memory to it and use it with send or fwrite. All I need to do is make sure I call .resize() to force it to be the minimum length I need, and then I can use it as a normal char array? Would this code be correct?
v.resize(numOfElements);
v.clear(); // so I won't get numOfElements + len when I push back
vector<char> v2;
v2.resize(numOfElements * SizeOfType);
while (...)
{
    ...
    v.push_back(x);
}
compress(&v2[0], len, &v[0], len);
fwrite(&v2[0], ....)
Note that I never push_back or pop v2; I only resize it once and use it as a char array. Would this be safe? And if I also dumped v, would that also be safe? (I do push_back and clear; I may dump it for testing.)
v.resize(numOfElements);
v.clear(); // so I won't get numOfElements + len when I push back
Well, that above code snippet is in effect allocating and creating elements, just to destroy them again. It's in effect the same as:
v.reserve(numOfElements);
Just that this code is way faster. So, v.size() == 0 in both cases and v.capacity() might be the same as numOfElements in both cases too (although this is not guaranteed). In the second case, however, the capacity is at least numOfElements, which means the internal buffer will not be reallocated until you have push_back'ed that many elements to your vector. Note that in both cases it is invalid if you try accessing any elements - because there are zero elements actually contained.
Apart from that, I haven't found a problem in your code. It's safe, and I would encourage using it instead of a raw new or malloc because of the added safety it provides. I'm, however, not sure what you mean by "dump v".
Indeed, std::vector is guaranteed to be contiguous, in order to be layout-compatible with a C array. However, you must be aware that many operations of the vector invalidate all pointers pointing to its elements, so you'd better stick to one type of use: avoid mixing pointer arithmetic and method calls on the vector.
Apart from that, it is perfectly correct, except for the first line: what you want is
v.reserve(numOfElements);
which will allocate enough place to store numOfElements into the vector, whereas
v.resize(numOfElements);
will do the following:
// pseudo-code
if (v.size() < numOfElements)
    insert (numOfElements - v.size()) default-constructed
    elements at the end of the vector
if (v.size() > numOfElements)
    erase the last elements so that size == numOfElements
To sum up: after a reserve you are sure that the vector capacity is greater than or equal to numOfElements, and after a resize you are sure that the vector size is equal to numOfElements.
For something like this I would personally use a class like STLSoft's auto_buffer<>:
http://www.synesis.com.au/software/stlsoft/doc-1.9/classstlsoft_1_1auto__buffer.html
As a disclaimer - I don't use the actual STLSoft library version, I've adapted my own template that is quite similar - I started from the Matthew Wilson's (the STLSoft author's) book "Imperfect C++".
I find it useful when I really just want a plain-old C array, but the size must be dynamic at runtime. auto_buffer<> is safer than a plain old array, but once you've constructed it you have no worries about how many elements are there or not - it's always whatever you constructed it with, just like an array (so it's a bit less complex than vector<> - which is appropriate at times).
The major downside to auto_buffer<> is that it's not standard and it's not in Boost, so you either have to incorporate some of STLSoft into your project or roll your own version.
Yes, you can use a vector of char as a buffer for reading raw input.
// dynamically allocates a buffer of 10,000 char as buffer
std::vector<char> data(10000);
fread(&data[0], sizeof(char),data.size(),fp);
I would not use it for reading any non-POD data type directly into a vector, though.
You could potentially use a vector as a source for a write.
But I would be very careful how you read that back in (it may be easier to serialize it).
fwrite(&data[0], sizeof(char),data.size(),fp);
If you're replacing reserve() with resize(), you may as well replace
vector<char> v2
with
vector<Type> v2
This should simplify the code a tiny bit.
To be frank, it's the oddest use of vectors I've ever seen, but it probably will work. Are you sure you don't want to go with new char[size] and some sort of auto pointer from boost?