Filling an std::vector with raw data

Filling an std::vector with raw data - c++

I need to fill a vector with raw data, sometimes 2 bytes, sometimes 8... I ended up with this template function:
template <typename T>
void fillVector(std::vector<uint8_t>& dest, T t)
{
auto ptr = reinterpret_cast<uint8_t*>(&t);
dest.insert(dest.end(),ptr,ptr+sizeof(t));
}
with this I can fill the vector like this:
fillVector<uint32_t>(dst,32bitdata);
fillVector<uint16_t>(dst,16bitdata);
I was wondering if something more similar already exist in the standard library?

No, there's nothing in the standard library to achieve what you are after. So your solution is pretty much what you can currently go with (assuming your goal is to do some form of serialization).
The only point of improvement is that you are assuming uint8_t is a type that may be used to alias an object and inspect its bytes. That need not be the case. The only such types in C++11 are char and unsigned char. While uint8_t usually aliases the later in most modern architectures, that's not a hard requirement, it could alias a platform specific 8 bit unsigned integer type (the merits of that are outside the scope of this question). So to be standard conforming, either guard against it:
static_assert(std::is_same<unsigned char, std::uint8_t>::value, "Oops!");
Or use your own alias for valid "byte" type
namespace myapp { using byte = unsigned char; }
and deal in std::vector<myapp::byte>.

Related

Weird Data Types - C++

I know this sounds like a silly question, but I would like to know if it is possible in any way to make a custom variable size like this rather than using plain 8, 16, 32 and 64 bit integers:
uint15_t var; //as an example
I did some research online and I found nothing that works on all sizes (only stuff like 12 bit and 24 bit). (I was also wondering if bitfields would work with other data sizes too).
And yes, I know you could use a 16 bit integer to store a 15 bit value, but I just wanted to know if it is possible to do something like this.
How can I make and implement custom integer sizes?

Inside a struct or class, can use the bitfields feature to declare integers of the size you want for a member-variable:
unsigned int var : 15;
... it won't be very CPU-efficient (since the compiler will have to generate bitwise-operations on most accesses to that variable) but it will give you the desired 15-bit behavior.

To be able to use your bitfield int as a normal int but still get the behavior of a 15 bit int you can do it like this :
#include <cassert>
#include <utility>
template<typename type_t, std::size_t N>
struct bits_t final
{
bits_t() = default;
~bits_t() = default;
// explicitly implicit so it can convert automatically from underlying type
bits_t(const type_t& value) :
m_value{ value }
{
};
// implicit conversion back to underlying type
operator type_t ()
{
return m_value;
}
private:
type_t m_value : N;
};
int main()
{
bits_t<int, 15> value;
value = 16383; // 0x3FFF
assert(value == 16383);
// overflow now at bit 15 :)
value = 16384; // 0x4000, 16th bit is set.
assert(value == -16384);
return 0;
}

bitfields feature will do the trick ...
uint32_t customInt : 15;

You can try using bitfields, like many people mentioned. However, bitfields don't have a proper type. If you want to make your arbitrary-sized integers object-oriented, you can stuff the bitfield into a template:
template <int size> struct my_uint
{
uint32_t value: size;
};
typedef my_uint<13> uint13_t; // some people use "using" syntax to do this
typedef my_uint<14> uint14_t;
typedef my_uint<15> uint15_t;
However, now you lost arithmetic operators, and you have to implement (overload) them yourself. You have to ask yourself many questions about what you really want to do with these new types:
Do you want to overload operators like +, *, etc? Which ones?
Do you want to support arrays?
What is the maximum size you want to support? In my example, it's 32.
Do you want to support implicit constructors, e.g. uint15_t(uint32_t)?
How to support overflow?
There is no way to make your new types behave like built-in types - you can come close but cannot quite do it. That is, if you write a big program where you work with uint15_t and later you decide to switch to uint16_t, there will be subtle changes caused by uint16_t being a built-in type (e.g. consider rules about implicit conversions).

Portable bit fields for Handles

I want to use and store "Handles" to data in an object buffer to reduce allocation overhead. The handle is simply an index into an array with the object. However I need to detect use-after-reallocations, as this could slip in quite easily. The common approach seems to be using bit fields. However this leads to 2 problems:
Bit fields are implementation defined
Bit shifting is not portable across big/little endian machines.
What I need:
Store handle to file (file handler can manage either integer types (byte swapping) or byte arrays)
Store 2 values in the handle with minimum space
What I got:
template<class T_HandleDef, typename T_Storage = uint32_t>
struct Handle
{
typedef T_HandleDef HandleDef;
typedef T_Storage Storage;
Handle(): handle_(0){}
private:
const T_Storage handle_;
};
template<unsigned T_numIndexBits = 16, typename T_Tag = void>
struct HandleDef{
static const unsigned numIndexBits = T_numIndexBits;
};
template<class T_Handle>
struct HandleAccessor{
typedef typename T_Handle::Storage Storage;
typedef typename T_Handle::HandleDef HandleDef;
static const unsigned numIndexBits = HandleDef::numIndexBits;
static const unsigned numMagicBits = sizeof(Storage) * 8 - numIndexBits;
/// "Magic" struct that splits the handle into values
union HandleData{
struct
{
Storage index : numIndexBits;
Storage magic : numMagicBits;
};
T_Handle handle;
};
};
A usage would be for example:
typedef Handle<HandleDef<24> > FooHandle;
FooHandle Create(unsigned idx, unsigned m){
HandleAccessor<FooHandle>::HandleData data;
data.idx = idx;
data.magic = m;
return data.handle;
}
My goal was to keep the handle as opaque as possible, add a bool check but nothing else. Users of the handle should not be able to do anything with it but passing it around.
So problems I run into:
Union is UB -> Replace its T_Handle by Storage and add a ctor to Handle from Storage
How does the compiler layout the bit field? I fill the whole union/type so there should be no padding. So probably the only thing that can be different is which type comes first depending on endianess, correct?
How can I store handle_ to a file and load it from a possible different endianess machine and still have index and magic be correct? I think I can store the containing Storage 'endian-correct' and get correct values, IF both members occupy exactly half the space (2 Shorts in an uint) But I always want more space for the index than for the magic value.
Note: There are already questions about bitfields and unions. Summary:
Bitfields may have unexpected padding (impossible here as whole type occupied)
Order of "members" depend on compiler (only 2 possible ways here, should be save to assume order depends entirely on endianess, so this may or may not actually help here)
Specific binary layout of bits can be achieved by manual shifting (or e.g. wrappers http://blog.codef00.com/2014/12/06/portable-bitfields-using-c11/) -> Is not an answer here. I need also a specific layout of the values IN the bitfield. So I'm not sure what I get, if I e.g. create a handle as handle = (magic << numIndexBits) | index and save/load this as binary (no endianess conversion) Missing a BigEndian machine for testing.
Note: No C++11, but boost is allowed.

Answer is pretty simple (based on another question I forgot the link to and comments by #Jeremy Friesner ):
As "numbers" are already an abstraction in C++ one can be sure to always have the same bit representation when the variable is in a CPU register (when it is used for anything calculation like) Also bit shifts in C++ are defined in an endian-independent way. This means x << 1 is always equal x * 2 (and hence big-endian)
Only time one get endianess problems is when saving to file, send/recv over network or accessing it from memory differently (e.g. via pointers...)
One cannot use C++ bitfields here, as one cannot be 100% sure about the order of the "entries". Bitfield containers might be ok, if they allow access to the data as a "number".
Savest is (still) using bitshifts, which are very simple in this case (only 2 values) During storing/serialization the number must then be stored in an endian-agnostic way.

compact offset pointer , existing implementations?

Is there template in stl,boost or other LGPL open-source toolkit which behaves exactly like this:-
- a relative pointer with custom alignment,option to store fewer bits to reduce range.
a possible implementation to illustrate:-
template<typename T, typename OFFSET=int,
int ALIGN_SHIFT=2>
class OffsetPtr
{
OFFSET ofs;
public:
T* operator->() {
return (T*) (((((size_t)this)>>ALIGN_SHIFT)+ofs)<<ALIGN_SHIFT);
};
void operator=(T* src) {
size_t ofs_shifted = (((size_t) src)>>ALIGN_SHIFT) - (((size_t) this)>>ALIGN_SHIFT); //asserts..
ofs = (OFFSET) (ofs_shifted);
}
//...
};
Its something I would routinely create in the past (compact cache-friendly precompiled data-structures), e.g. for data broken into sub 128k chunks OFFSET=short
Another variation I'd use in ancient C #defines would use offsets from a header, where the alignments would be more useful.
I've seen something about an 'interprocess library' in boost having an 'offset_ptr', that looks very similar, so it seems likely there's an existing implementation including this exact pattern somewhere.
It's quick to write but there might be benefits to an existing implementation like a suite of associated stl compliant data structures built around the same concept - a 'near vector' with 16bit offset pointer & 16bit count for example

If you're using Visual C++, you might like to use __based pointers.

WinAPI _Interlocked* intrinsic functions for char, short

I need to use _Interlocked*** function on char or short, but it takes long pointer as input. It seems that there is function _InterlockedExchange8, I don't see any documentation on that. Looks like this is undocumented feature. Also compiler wasn't able to find _InterlockedAdd8 function.
I would appreciate any information on that functions, recommendations to use/not to use and other solutions as well.
update 1
I'll try to simplify the question.
How can I make this work?
struct X
{
char data;
};
X atomic_exchange(X another)
{
return _InterlockedExchange( ??? );
}
I see two possible solutions
Use _InterlockedExchange8
Cast another to long, do exchange and cast result back to X
First one is obviously bad solution.
Second one looks better, but how to implement it?
update 2
What do you think about something like this?
template <typename T, typename U>
class padded_variable
{
public:
padded_variable(T v): var(v) {}
padded_variable(U v): var(*static_cast<T*>(static_cast<void*>(&v))) {}
U& cast()
{
return *static_cast<U*>(static_cast<void*>(&var));
}
T& get()
{
return var;
}
private:
T var;
char padding[sizeof(U) - sizeof(T)];
};
struct X
{
char data;
};
template <typename T, int S = sizeof(T)> class var;
template <typename T> class var<T, 1>
{
public:
var(): data(T()) {}
T atomic_exchange(T another)
{
padded_variable<T, long> xch(another);
padded_variable<T, long> res(_InterlockedExchange(&data.cast(), xch.cast()));
return res.get();
}
private:
padded_variable<T, long> data;
};
Thanks.

It's pretty easy to make 8-bit and 16-bit interlocked functions but the reason they're not included in WinAPI is due to IA64 portability. If you want to support Win64 the assembler cannot be inline as MSVC no longer supports it. As external function units, using MASM64, they will not be as fast as inline code or intrinsics so you are wiser to investigate promoting algorithms to use 32-bit and 64-bit atomic operations instead.
Example interlocked API implementation: intrin.asm

Why do you want to use smaller data types? So you can fit a bunch of them in a small memory space? That's just going to lead to false sharing and cache line contention.
Whether you use locking or lockless algorithms, it's ideal to have your data in blocks of at least 128 bytes (or whatever the cache line size is on your CPU) that are only used by a single thread at a time.

Well, you have to make do with the functions available. _InterlockedIncrement and `_InterlockedCompareExchange are available in 16 and 32-bit variants (the latter in a 64-bit variant as well), and maybe a few other interlocked intrinsics are available in 16-bit versions as well, but InterlockedAdd doesn't seem to be, and there seem to be no byte-sized Interlocked intrinsics/functions at all.
So... You need to take a step back and figure out how to solve your problem without an IntrinsicAdd8.
Why are you working with individual bytes in any case? Stick to int-sized objects unless you have a really good reason to use something smaller.

Creating a new answer because your edit changed things a bit:
Use _InterlockedExchange8
Cast another to long, do exchange and cast result back to X
The first simply won't work. Even if the function existed, it would allow you to atomically update a byte at a time. Which means that the object as a whole would be updated in a series of steps which wouldn't be atomic.
The second doesn't work either, unless X is a long-sized POD type. (and unless it is aligned on a sizeof(long) boundary, and unless it is of the same size as a long)
In order to solve this problem you need to narrow down what types X might be. First, of course, is it guaranteed to be a POD type? If not, you have an entirely different problem, as you can't safely treat non-POD types as raw memory bytes.
Second, what sizes may X have? The Interlocked functions can handle 16, 32 and, depending on circumstances, maybe 64 or even 128 bit widths.
Does that cover all the cases you can encounter?
If not, you may have to abandon these atomic operations, and settle for plain old locks. Lock a Mutex to ensure that only one thread touches these objects at a time.

Where can I look up the definition of size_type for vectors in the C++ STL?

It seems safe to cast the result of my vector's size() function to an unsigned int. How can I tell for sure, though? My documentation isn't clear about how size_type is defined.

Do not assume the type of the container size (or anything else typed inside).
Today?
The best solution for now is to use:
std::vector<T>::size_type
Where T is your type. For example:
std::vector<std::string>::size_type i ;
std::vector<int>::size_type j ;
std::vector<std::vector<double> >::size_type k ;
(Using a typedef could help make this better to read)
The same goes for iterators, and all other types "inside" STL containers.
After C++0x?
When the compiler will be able to find the type of the variable, you'll be able to use the auto keyword. For example:
void doSomething(const std::vector<double> & p_aData)
{
std::vector<double>::size_type i = p_aData.size() ; // Old/Current way
auto j = p_aData.size() ; // New C++0x way, definition
decltype(p_aData.size()) k; // New C++0x way, declaration
}
Edit: Question from JF
What if he needs to pass the size of the container to some existing code that uses, say, an unsigned int? – JF
This is a problem common to the use of the STL: You cannot do it without some work.
The first solution is to design the code to always use the STL type. For example:
typedef std::vector<int>::size_type VIntSize ;
VIntSize getIndexOfSomeItem(const std::vector<int> p_aInt)
{
return /* the found value, or some kind of std::npos */
}
The second is to make the conversion yourself, using either a static_cast, using a function that will assert if the value goes out of bounds of the destination type (sometimes, I see code using "char" because, "you know, the index will never go beyond 256" [I quote from memory]).
I believe this could be a full question in itself.

According to the standard, you cannot be sure. The exact type depends on your machine. You can look at the definition in your compiler's header implementations, though.

I can't imagine that it wouldn't be safe on a 32-bit system, but 64-bit could be a problem (since ints remain 32 bit). To be safe, why not just declare your variable to be vector<MyType>::size_type instead of unsigned int?

It should always be safe to cast it to size_t. unsigned int isn't enough on most 64-bit systems, and even unsigned long isn't enough on Windows (which uses the LLP64 model instead of the LP64 model most Unix-like systems use).

The C++ standard only states that size_t is found in <cstddef>, which puts the identifiers in <stddef.h>. My copy of Harbison & Steele places the minimum and maximum values for size_t in <stdint.h>. That should give you a notion of how big your recipient variable needs to be for your platform.
Your best bet is to stick with integer types that are large enough to hold a pointer on your platform. In C99, that'd be intptr_t and uintptr_t, also officially located in <stdint.h>.

As long as you're sure that an unsigned int on your system will be large enough to hold the number of items you'll have in the vector you should be safe ;-)

I'm not sure how well this will work because I'm just thinking off the top of my head, but a compile-time assertion (such as BOOST_STATIC_ASSERT() or see Ways to ASSERT expressions at build time in C) might help. Something like:
BOOST_STATIC_ASSERT( sizeof( unsigned int) >= sizeof( size_type));

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Filling an std::vector with raw data - c++

Related

Weird Data Types - C++

Portable bit fields for Handles

compact offset pointer , existing implementations?

WinAPI _Interlocked* intrinsic functions for char, short

Where can I look up the definition of size_type for vectors in the C++ STL?

Categories

Resources