Is there a template in the STL, Boost, or another LGPL open-source toolkit which behaves exactly like this:
- a relative pointer with custom alignment, and the option to store fewer bits to reduce range.
A possible implementation to illustrate:
template<typename T, typename OFFSET = int, int ALIGN_SHIFT = 2>
class OffsetPtr
{
    OFFSET ofs;   // self-relative offset, stored in ALIGN_SHIFT-aligned units
public:
    T* operator->() {
        // reconstruct the absolute address from this object's own address plus the offset
        return (T*)(((((size_t)this) >> ALIGN_SHIFT) + ofs) << ALIGN_SHIFT);
    }
    void operator=(T* src) {
        // store the aligned distance from this object to the target; range/alignment asserts omitted
        size_t ofs_shifted = (((size_t)src) >> ALIGN_SHIFT) - (((size_t)this) >> ALIGN_SHIFT);
        ofs = (OFFSET)(ofs_shifted);
    }
    //...
};
It's something I would routinely create in the past (compact, cache-friendly precompiled data structures), e.g. for data broken into sub-128k chunks, OFFSET=short.
Another variation I'd use in ancient C #defines would use offsets from a header, where the alignments would be more useful.
I've seen something about an 'interprocess library' in Boost having an 'offset_ptr', which looks very similar, so it seems likely there's an existing implementation of this exact pattern somewhere.
It's quick to write, but there might be benefits to an existing implementation, like a suite of associated STL-compliant data structures built around the same concept - a 'near vector' with a 16-bit offset pointer & 16-bit count, for example.
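From what I can tell, usage of that boost::interprocess::offset_ptr looks roughly like this (a sketch only; the exact template parameters for offset type and alignment vary by Boost version):

#include <boost/interprocess/offset_ptr.hpp>

struct Node {
    // stores a self-relative offset rather than an absolute address, so a
    // block containing Nodes can be mapped or copied to a different base
    boost::interprocess::offset_ptr<Node> next;
    int value;
};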
If you're using Visual C++, you might like to use __based pointers.
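For illustration, a rough sketch of what that looks like (MSVC-specific extension, not portable; the member pointers are stored as offsets from the named base variable and resolved on access):

void* vpBuffer;   // base address of the chunk the data lives in

struct llist_t {
    // members are stored as offsets from vpBuffer
    void __based(vpBuffer) *vpData;
    struct llist_t __based(vpBuffer) *llNext;
};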
Let's consider the following task:
My C++ module, as part of an embedded system, receives 8 bytes of data, like: uint8_t data[8].
The value of the first byte determines the layout of the rest (20-30 different layouts). In order to access the data efficiently, I would create a different struct for each layout, put each one into a union, and read the data directly from the address of my input through a pointer, like this:
struct Interpretation_1 {
    uint8_t multiplexer;
    uint8_t timestamp;
    uint32_t position;
    uint16_t speed;
};

// and a lot of other struct like this (with bitfields, etc..., layout is not defined by me :( )
union DataInterpreter {
    Interpretation_1 movement;
    //Interpretation_2 temperatures;
    //etc...
};

...
uint8_t exampleData[8] {1u, 10u, 20u, 0u, 0u, 0u, 5u, 0u};
DataInterpreter* interpreter = reinterpret_cast<DataInterpreter*>(&exampleData);
std::cout << "position: " << +interpreter->movement.position << "\n";
The problem I have is that the compiler can insert padding bytes into the interpretation structs, and this kills my idea. I know I can use
- with gcc: struct MyStruct{} __attribute__((__packed__));
- with MSVC: #pragma pack(push, 1) struct MyStruct{}; #pragma pack(pop)
- with clang: ? (I could check it)
But is there any portable way to achieve this? I know C++11 has e.g. alignas for alignment control, but can I use it for this? I have to use C++11, but I would also be interested to know if there is a better solution in a later version of C++.
But is there any portable way to achieve this?
No, there is no (standard) way to "make" a type that would otherwise have padding not have padding in C++. All objects are aligned at least as much as their type requires, and if that alignment doesn't match up with the preceding sub-objects, then there will be padding, and that is unavoidable.
Furthermore, there is another problem: you're accessing through a reinterpreted pointer that doesn't point to an object of compatible type. The behaviour of the program is undefined.
We can conclude that classes are not generally useful for representing arbitrary binary data. Packed structures are non-standard, and they also aren't compatible across systems with different representations for integers (byte endianness).
There is a way to check whether a type contains padding: compare the combined size of the sub-objects to the size of the complete object, and do this recursively for each member. If the sizes don't match, then there is padding. This is quite tricky, however, because C++ has minimal reflection capabilities, so you need to resort either to hard-coding or to metaprogramming.
Given such a check, you can make the compilation fail on systems where the assumption doesn't hold.
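A hand-rolled sketch of that check for the Interpretation_1 above (the non-recursive, hard-coded variant; it assumes exactly the four members shown earlier):

// Fail the build if the compiler inserted padding into Interpretation_1
// (hard-coded per type, since C++11 offers no reflection to automate it):
static_assert(sizeof(Interpretation_1) ==
              sizeof(uint8_t) + sizeof(uint8_t) + sizeof(uint32_t) + sizeof(uint16_t),
              "Interpretation_1 contains padding bytes on this platform");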
Another handy tool is std::has_unique_object_representations (since C++17), which will always be false for types that have padding. But note that it will also be false for types that contain floats, for example. Only types for which it returns true can be meaningfully compared for equality with std::memcmp.
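For example (C++17 only, so outside the question's C++11 constraint):

#include <type_traits>

// rejects Interpretation_1 at compile time if it has padding
static_assert(std::has_unique_object_representations_v<Interpretation_1>,
              "Interpretation_1 must not contain padding");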
Reading from unaligned memory is undefined behavior in C++. In other words, the compiler is allowed to assume that every uint32_t is located at an alignof(uint32_t)-byte boundary and every uint16_t at an alignof(uint16_t)-byte boundary. This means that even if you somehow manage to pack your bytes portably, doing interpreter->movement.position will still trigger undefined behaviour.
(In practice, on most architectures, unaligned memory access will still work, albeit with a performance penalty.)
You could, however, write a wrapper, similar to how std::vector<bool>::operator[] works:
#include <cstdint>
#include <cstring>
#include <iostream>
#include <type_traits>

template <typename T>
struct unaligned_wrapper {
    static_assert(std::is_trivial<T>::value, "T must be trivial");

    // raw storage with alignment 1, so the wrapper imposes no alignment of its own
    typename std::aligned_storage<sizeof(T), 1>::type buf;

    operator T() const noexcept {
        T ret;
        std::memcpy(&ret, &buf, sizeof(T));  // bytewise read, no alignment requirement
        return ret;
    }
    unaligned_wrapper& operator=(T t) noexcept {
        std::memcpy(&buf, &t, sizeof(T));    // bytewise write
        return *this;
    }
};
struct Interpretation_1 {
    unaligned_wrapper<uint8_t> multiplexer;
    unaligned_wrapper<uint8_t> timestamp;
    unaligned_wrapper<uint32_t> position;
    unaligned_wrapper<uint16_t> speed;
};

// and a lot of other struct like this (with bitfields, etc..., layout is not defined by me :( )
union DataInterpreter {
    Interpretation_1 movement;
    //Interpretation_2 temperatures;
    //etc...
};

int main(){
    uint8_t exampleData[8] {1u, 10u, 20u, 0u, 0u, 0u, 5u, 0u};
    DataInterpreter* interpreter = reinterpret_cast<DataInterpreter*>(&exampleData);
    std::cout << "position: " << interpreter->movement.position << "\n";
}
This would ensure that every read or write to the unaligned integer is transformed to a bytewise memcpy, which does not have any alignment requirement. There might be a performance penalty for this on architectures with the ability to access unaligned memory quickly, but it would work on any conforming compiler.
I would like to know if it's possible in C++ to change the type of a std::vector already filled with values, exactly as a union works, i.e.:
- not changing a single bit of the binary content
- not computing any type cast (no mathematical operations)
- just reinterpreting the content of the binary data using a new type (e.g. uint16 or float32), without any memory copy or reallocation (as I would like to use vectors of several gigabytes in size)
For example, I have a vector filled with 20 values:
0x00, 0x01, 0x02, 0x03 ...
and I want to re-interpret it as a vector of 10 values, with the same overall binary content:
0x0001, 0x0203 (depending on the little endian / big endian convention)
The closest thing I could do is:
vector<uint8_t> test8(20);
uint16_t* pv16 = (uint16_t*) (&test8[0]);
vector<uint16_t> test16(pv16, pv16+10);
The result is exactly what I want, except that it makes a copy of the entire data, whereas I would like to use the existing data.
I would appreciate any help on this subject.
Thanks a lot for your answer.
You probably don't need a full-blown vector, just something that behaves like a container. You can create your own punned_view that just references the memory in the existing vector.
Please also read up on type punning and undefined behavior in C++, as it's quite a subtle topic. See https://blog.regehr.org/archives/959
#include <type_traits>
#include <cstring>
#include <cstdint>
#include <vector>

template <typename To>
class punned_view
{
    static_assert(std::is_trivial<To>::value, "To must be trivial");

    const char* begin_;
    const char* end_;
public:
    template <typename From>
    punned_view(From* begin, From* end)
        : begin_{reinterpret_cast<const char*>(begin)}
        , end_{reinterpret_cast<const char*>(end)}
    {
        static_assert(sizeof(To) >= sizeof(From),
                      "exercise to make it work with smaller types too");
        static_assert(std::is_trivial<From>::value, "From must be trivial");
        // add checks that size is a multiple of To here
    }

    std::size_t size() const noexcept
    {
        return (end_ - begin_) / sizeof(To);
    }

    class const_iterator
    {
        const char* current_;
    public:
        const_iterator(const char* current)
            : current_{current}
        { }

        const_iterator& operator++() noexcept
        {
            current_ += sizeof(To);
            return *this;
        }

        To operator*() const noexcept
        {
            To result;
            // only legal way to type pun in C++
            std::memcpy(&result, current_, sizeof(result));
            return result;
        }

        bool operator!=(const_iterator other) const noexcept
        {
            return current_ != other.current_;
        }
    };

    const_iterator begin() const noexcept { return {begin_}; }
    const_iterator end() const noexcept { return {end_}; }
};
uint16_t sum_example(const std::vector<uint8_t>& vec)
{
    punned_view<uint16_t> view{vec.data(), vec.data() + vec.size()};
    uint16_t sum = 0;
    for (uint16_t v : view)
        sum += v;
    return sum;
}
And thank you for all your quick and detailed answers. I was nicely surprised, as the last time I used a forum (Eclipse) I remember getting exactly zero answers after an entire month...
Anyway, before I can try to test the different solutions you suggested, I wanted first to react to the excellent point raised by David Schwartz: yes, my question is definitely an XY question, and yes, I completely omitted to mention the context that led me to this exotic situation and what my real needs are.
So to make a long story short, what I really want is to read the content of a TIFF image (a satellite image with only gray-scale values, no RGB or any color combination) using GDAL in C++, then perform some simple operations, some of them as basic as getting the right pixel values. Sounds simple as hell, doesn't it? Now in real life everything is a nightmare when using GDAL (which is as powerful as it is cryptic) and NOT knowing beforehand the actual pixel data type (which could basically be any kind of integer or floating-point type with any precision). As far as I could understand from tutorials, examples and forums, GDAL offers me only 2 (hardly satisfactory) ways of reading the content of a TIFF image:
1) Either I know exactly the pixel data type of my image (e.g. int16) and I have to hardcode it somewhere, which I cannot afford (and templates would not help here, as at a certain point I have to store the content of my image into a variable, which means I must know its precise type).
2) Or I can read an image of any pixel data type, but using an automatic conversion into a given target type (e.g. float64 to cover all possible value ranges). Sounds convenient and easy, but the downside is that this systematic conversion is a potentially huge waste of time and memory (think of uint8 in the source array converted into float64 in the target array!). An insane option for me, as I usually work with massively big images (like several giga-pixels!).
3) I kind of figured out by myself an ugly/clumsy alternative solution, where I let GDAL load the image content as a kind of "raw binary" content (officially an array of bytes), then eventually try to read it back by interpreting it according to the real data type (which GDAL can tell me afterwards). The good side is that the exact binary content of the image is loaded with no conversion whatsoever, so best speed and memory usage. The downside is that I end up trying to fiddle with this binary data in order to interpret it correctly, avoiding any copy or mathematical operations.
So that's what led me to this awkward attempt at "in-place re-interpretation" of my data, or whatever the proper name is, just because I thought it would be a very simple and final step to getting the job done. But I might be wrong, and I might have overlooked simpler/cleaner solutions (actually I wish I have!).
Some final thoughts in order to "de-Y" my XY question!!!
- Using the GDAL library seems almost mandatory here, for as far as I know it is the only library that can properly handle the kind of image I am dealing with, i.e. multi-band TIFF images (other libraries typically always assume 3 bands and interpret them blindly as RGB color components, which is absolutely not what I want here).
- Also, I gave it a quick try with GDAL for Python, but handling gigapixel images in Python definitely sounds like a wrong choice. Moreover, my next step here should be to make a basic interactive image viewer (probably using Qt), so execution speed really matters.
- I mentioned std::vector a lot because I thought it would be easier to play with, but probably an old-school C array would do the job.
- Finally, I saw many answers mentioning alignment issues; that's really something I am not so comfortable with and that I wouldn't like to mess with...
So again, any further advice is welcome, including throwing away some of my previous attempts if it can simplify the situation and come up with a more direct solution, which is really something I would dream of.
Thanks again.
Getting the data as another type can be achieved with some pointer and cast magic.
From C++11 onwards you can get a pointer to the raw data of a std::vector: http://www.cplusplus.com/reference/vector/vector/data/
void* p;
uint16_t* p2;
std::vector<uint32_t> myvector;
myvector.push_back(0x12345678);
myvector.push_back(400);

p = myvector.data();
p2 = (uint16_t*)p;
for (size_t i = 0; i < 2 * myvector.size(); i++) {
    std::cout << *p2++ << ",";
}
As always when casting pointers, you are telling the compiler that you know better than it does how to use and interpret the data, and it will happily permit you to ignore alignment and endianness and do all the harm you care to with it.
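If you want the same element-by-element read without relying on that cast, a sketch of a memcpy-based variant (in line with the punning advice elsewhere in this thread; read16 is a made-up helper name) might look like:

#include <cstring>

// Read the i-th 16-bit value out of the vector's bytes without ever forming
// a misaligned or aliasing uint16_t pointer:
uint16_t read16(const std::vector<uint32_t>& v, size_t i) {
    uint16_t out;
    std::memcpy(&out, reinterpret_cast<const char*>(v.data()) + i * sizeof(uint16_t), sizeof(out));
    return out;
}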
I need to fill a vector with raw data, sometimes 2 bytes, sometimes 8... I ended up with this template function:
template <typename T>
void fillVector(std::vector<uint8_t>& dest, T t)
{
    auto ptr = reinterpret_cast<uint8_t*>(&t);
    dest.insert(dest.end(), ptr, ptr + sizeof(t));
}
with this I can fill the vector like this:
fillVector<uint32_t>(dst, data32);
fillVector<uint16_t>(dst, data16);
I was wondering if something similar already exists in the standard library?
No, there's nothing in the standard library to achieve what you are after. So your solution is pretty much what you can currently go with (assuming your goal is to do some form of serialization).
The only point of improvement is that you are assuming uint8_t is a type that may be used to alias an object and inspect its bytes. That need not be the case. The only such types in C++11 are char and unsigned char. While uint8_t usually aliases the latter on most modern architectures, that's not a hard requirement; it could alias a platform-specific 8-bit unsigned integer type (the merits of that are outside the scope of this question). So to be standard-conforming, either guard against it:
static_assert(std::is_same<unsigned char, std::uint8_t>::value, "Oops!");
Or use your own alias for a valid "byte" type:
namespace myapp { using byte = unsigned char; }
and deal in std::vector<myapp::byte>.
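For illustration, the original helper rewritten against that alias (purely a sketch; the body is unchanged apart from the byte type):

namespace myapp { using byte = unsigned char; }

template <typename T>
void fillVector(std::vector<myapp::byte>& dest, T t)
{
    // unsigned char is always allowed to alias the object representation of t
    auto ptr = reinterpret_cast<const myapp::byte*>(&t);
    dest.insert(dest.end(), ptr, ptr + sizeof(t));
}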
I want to use and store "handles" to data in an object buffer to reduce allocation overhead. A handle is simply an index into an array containing the objects. However, I need to detect use-after-reallocation, as this could slip in quite easily. The common approach seems to be using bit fields. However, this leads to 2 problems:
- Bit fields are implementation-defined
- Bit shifting is not portable across big/little-endian machines.
What I need:
- Store the handle to a file (the file handler can manage either integer types (with byte swapping) or byte arrays)
- Store 2 values in the handle with minimum space
What I got:
template<class T_HandleDef, typename T_Storage = uint32_t>
struct Handle
{
    typedef T_HandleDef HandleDef;
    typedef T_Storage Storage;
    Handle(): handle_(0){}
private:
    const T_Storage handle_;
};

template<unsigned T_numIndexBits = 16, typename T_Tag = void>
struct HandleDef{
    static const unsigned numIndexBits = T_numIndexBits;
};

template<class T_Handle>
struct HandleAccessor{
    typedef typename T_Handle::Storage Storage;
    typedef typename T_Handle::HandleDef HandleDef;
    static const unsigned numIndexBits = HandleDef::numIndexBits;
    static const unsigned numMagicBits = sizeof(Storage) * 8 - numIndexBits;

    /// "Magic" struct that splits the handle into values
    union HandleData{
        struct
        {
            Storage index : numIndexBits;
            Storage magic : numMagicBits;
        };
        T_Handle handle;
    };
};
A usage would be, for example:

typedef Handle<HandleDef<24> > FooHandle;

FooHandle Create(unsigned idx, unsigned m){
    HandleAccessor<FooHandle>::HandleData data;
    data.index = idx;
    data.magic = m;
    return data.handle;
}
My goal was to keep the handle as opaque as possible: add a bool check but nothing else. Users of the handle should not be able to do anything with it but pass it around.
So the problems I run into:
- The union is UB -> replace its T_Handle by Storage and add a ctor to Handle from Storage
- How does the compiler lay out the bit field? I fill the whole union/type, so there should be no padding. So probably the only thing that can differ is which member comes first, depending on endianness, correct?
- How can I store handle_ to a file and load it on a machine with possibly different endianness and still have index and magic be correct? I think I can store the containing Storage 'endian-correct' and get correct values, IF both members occupy exactly half the space (2 shorts in a uint), but I always want more space for the index than for the magic value.
Note: There are already questions about bitfields and unions. Summary:
- Bitfields may have unexpected padding (impossible here, as the whole type is occupied)
- The order of the "members" depends on the compiler (only 2 possible ways here; it should be safe to assume the order depends entirely on endianness, so this may or may not actually help here)
- A specific binary layout of the bits can be achieved by manual shifting (or e.g. wrappers: http://blog.codef00.com/2014/12/06/portable-bitfields-using-c11/) -> this is not an answer here. I also need a specific layout of the values IN the bitfield. So I'm not sure what I get if I e.g. create a handle as handle = (magic << numIndexBits) | index and save/load it as binary (no endianness conversion). I'm missing a big-endian machine for testing.
Note: No C++11, but boost is allowed.
The answer is pretty simple (based on another question I forgot the link to and comments by @Jeremy Friesner):
As "numbers" are already an abstraction in C++, one can be sure to always have the same bit representation when the variable is in a CPU register (i.e. whenever it is used for anything calculation-like). Also, bit shifts in C++ are defined in an endian-independent way. This means x << 1 always equals x * 2 (and hence big-endian).
The only time one gets endianness problems is when saving to a file, sending/receiving over a network, or accessing the memory differently (e.g. via pointers...).
One cannot use C++ bitfields here, as one cannot be 100% sure about the order of the "entries". Bitfield containers might be OK, if they allow access to the data as a "number".
Safest is (still) using bit shifts, which are very simple in this case (only 2 values). During storing/serialization the number must then be stored in an endian-agnostic way.
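A minimal sketch of that bit-shift packing, assuming Storage is uint32_t and numIndexBits matches the question's HandleDef<24> (the pack/unpack helper names are made up for illustration):

typedef uint32_t Storage;
static const unsigned numIndexBits = 24;

// pack the index into the low bits, the magic value into the remaining high bits
Storage pack(Storage index, Storage magic) {
    return (magic << numIndexBits) | index;
}

Storage unpackIndex(Storage handle) {
    return handle & ((Storage(1) << numIndexBits) - 1);
}

Storage unpackMagic(Storage handle) {
    return handle >> numIndexBits;
}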
I need to use the _Interlocked*** functions on char or short, but they take a long pointer as input. It seems that there is a function _InterlockedExchange8, but I don't see any documentation for it; it looks like an undocumented feature. Also, the compiler wasn't able to find an _InterlockedAdd8 function.
I would appreciate any information on those functions, recommendations to use/not use them, and other solutions as well.
update 1
I'll try to simplify the question.
How can I make this work?
struct X
{
    char data;
};

X atomic_exchange(X another)
{
    return _InterlockedExchange( ??? );
}
I see two possible solutions
Use _InterlockedExchange8
Cast another to long, do exchange and cast result back to X
The first one is obviously a bad solution.
The second one looks better, but how do I implement it?
update 2
What do you think about something like this?
template <typename T, typename U>
class padded_variable
{
public:
    padded_variable(T v): var(v) {}
    padded_variable(U v): var(*static_cast<T*>(static_cast<void*>(&v))) {}
    U& cast()
    {
        return *static_cast<U*>(static_cast<void*>(&var));
    }
    T& get()
    {
        return var;
    }
private:
    T var;
    char padding[sizeof(U) - sizeof(T)];
};

struct X
{
    char data;
};

template <typename T, int S = sizeof(T)> class var;

template <typename T> class var<T, 1>
{
public:
    var(): data(T()) {}
    T atomic_exchange(T another)
    {
        padded_variable<T, long> xch(another);
        padded_variable<T, long> res(_InterlockedExchange(&data.cast(), xch.cast()));
        return res.get();
    }
private:
    padded_variable<T, long> data;
};
Thanks.
It's pretty easy to make 8-bit and 16-bit interlocked functions, but the reason they're not included in the WinAPI is IA64 portability. If you want to support Win64, the assembler cannot be inline, as MSVC no longer supports inline assembly there. As external function units, built with MASM64, they will not be as fast as inline code or intrinsics, so you are wiser to investigate promoting your algorithms to use 32-bit and 64-bit atomic operations instead.
Example interlocked API implementation: intrin.asm
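As a sketch of that "promote to 32-bit operations" route (not the linked assembly; a hypothetical helper built only on the documented _InterlockedCompareExchange intrinsic, assuming a little-endian target such as x86/x64):

#include <intrin.h>
#include <cstdint>

// Emulate an 8-bit atomic exchange with a CAS loop on the containing
// aligned 32-bit word. Assumes little-endian byte order within the word.
char interlocked_exchange8(volatile char* addr, char value)
{
    // round down to the aligned 32-bit word that contains the byte
    volatile long* word = reinterpret_cast<volatile long*>(
        reinterpret_cast<uintptr_t>(addr) & ~uintptr_t(3));
    const unsigned shift = (reinterpret_cast<uintptr_t>(addr) & 3) * 8;
    const unsigned long mask = 0xFFul << shift;

    long oldWord, newWord;
    do {
        oldWord = *word;
        // splice the new byte into its slot, leaving the other three bytes untouched
        newWord = long((static_cast<unsigned long>(oldWord) & ~mask)
                     | (static_cast<unsigned long>(static_cast<unsigned char>(value)) << shift));
    } while (_InterlockedCompareExchange(word, newWord, oldWord) != oldWord);

    return char((static_cast<unsigned long>(oldWord) >> shift) & 0xFF);
}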
Why do you want to use smaller data types? So you can fit a bunch of them into a small memory space? That's just going to lead to false sharing and cache-line contention.
Whether you use locking or lockless algorithms, it's ideal to have your data in blocks of at least 128 bytes (or whatever the cache-line size is on your CPU) that are only used by a single thread at a time.
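For instance, a sketch of padding per-thread data out to its own block (alignas is C++11; on older MSVC, __declspec(align(128)) plays the same role, and 128 simply follows the figure given above):

struct alignas(128) PerThreadCounter
{
    long value;   // the only field actually used
    // the remaining bytes up to 128 are padding, so no other thread's data
    // shares this cache line
};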
Well, you have to make do with the functions available. _InterlockedIncrement and _InterlockedCompareExchange are available in 16-bit and 32-bit variants (the latter in a 64-bit variant as well), and maybe a few other interlocked intrinsics are available in 16-bit versions too, but InterlockedAdd doesn't seem to be, and there seem to be no byte-sized interlocked intrinsics/functions at all.
So... you need to take a step back and figure out how to solve your problem without an _InterlockedAdd8.
Why are you working with individual bytes in any case? Stick to int-sized objects unless you have a really good reason to use something smaller.
Creating a new answer because your edit changed things a bit:
Use _InterlockedExchange8
Cast another to long, do exchange and cast result back to X
The first simply won't work. Even if the function existed, it would only allow you to atomically update a byte at a time, which means that the object as a whole would be updated in a series of steps that wouldn't be atomic.
The second doesn't work either, unless X is a long-sized POD type (and unless it is aligned on a sizeof(long) boundary, and unless it is of the same size as a long).
In order to solve this problem, you need to narrow down what types X might be. First, of course: is it guaranteed to be a POD type? If not, you have an entirely different problem, as you can't safely treat non-POD types as raw memory bytes.
Second, what sizes may X have? The interlocked functions can handle 16-, 32- and, depending on circumstances, maybe 64- or even 128-bit widths.
Does that cover all the cases you can encounter?
If not, you may have to abandon these atomic operations and settle for plain old locks. Lock a mutex to ensure that only one thread touches these objects at a time.
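For completeness, a sketch of that fallback applied to the atomic_exchange from the question (std::mutex requires C++11; on older toolchains a WinAPI CRITICAL_SECTION would play the same role):

#include <mutex>

struct X { char data; };

std::mutex x_mutex;   // guards every access to the shared X

X atomic_exchange(X& shared, X another)
{
    std::lock_guard<std::mutex> lock(x_mutex);
    X old = shared;
    shared = another;   // plain assignment is safe while the lock is held
    return old;
}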