Get byte representation of C++ class - c++

I have objects that I need to hash with SHA256. The object has several fields as follows:
class Foo {
// some methods
protected:
std::array<int, 32> x;
char y[32];
long z;
};
Is there a way I can directly access the bytes representing the 3 member variables in memory, as I would with a struct? These hashes need to be computed as quickly as possible, so I want to avoid malloc'ing a new set of bytes and copying to a heap-allocated array. Or is the answer simply to embed a struct within the class?
It is critical that I get the exact binary representation of these variables so that the SHA256 comes out exactly the same given that the 3 variables are equal (so I can't have any extra padding bytes etc. going into the hash function).

Most Hash classes are able to take multiple regions before returning the hash, e.g. as in:
class Hash {
public:
virtual void update(const void *data, size_t size) = 0;
virtual std::vector<uint8_t> digest() = 0;
};
So your hash method could look like this:
std::vector<uint8_t> Foo::hash(Hash *hash) const {
hash->update(&x, sizeof(x));
hash->update(&y, sizeof(y));
hash->update(&z, sizeof(z));
return hash->digest();
}
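A minimal sketch of the same idea that runs without a crypto library. The `Fnv1a` class is only a stand-in for a real incremental SHA-256 context (an assumption made to keep the example self-contained; it is not cryptographic), and this `Foo` mirrors the layout from the question with a made-up constructor. Because each member is fed to the hasher separately, any padding between members never reaches it.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in for an incremental SHA-256 context: 64-bit FNV-1a.
// It only illustrates the update()/digest() pattern.
class Fnv1a {
public:
    void update(const void* data, std::size_t size) {
        const unsigned char* p = static_cast<const unsigned char*>(data);
        for (std::size_t i = 0; i < size; ++i) {
            state_ ^= p[i];
            state_ *= 1099511628211ULL;  // FNV prime
        }
    }
    std::uint64_t digest() const { return state_; }

private:
    std::uint64_t state_ = 14695981039346656037ULL;  // FNV offset basis
};

class Foo {
public:
    // Hypothetical constructor, only here so the example can build objects.
    Foo(int fill, char c, long z) : z_(z) {
        x_.fill(fill);
        std::memset(y_, c, sizeof(y_));
    }

    // Feed each member separately; bytes between members are never read.
    std::uint64_t hash() const {
        Fnv1a h;
        h.update(x_.data(), x_.size() * sizeof(int));
        h.update(y_, sizeof(y_));
        h.update(&z_, sizeof(z_));
        return h.digest();
    }

protected:
    std::array<int, 32> x_;
    char y_[32];
    long z_;
};
```

No heap allocation or copying is involved: the hasher reads the members in place. The bytes are still in the machine's native byte order, so digests are only comparable between platforms with the same endianness and type sizes.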

You can solve this by making an iterator that knows the layout of your member variables. Make Foo::begin() and Foo::end() functions and you can even take advantage of range-based for loops.
If you can increment it and dereference it, you can use it any other place you're able to use a LegacyForwardIterator.
Add in comparison functions to get access to the common it = X.begin(); it != X.end(); ++it idiom.
Some downsides include: ugly library code, poor maintainability, and (in its current form) no regard for endianness.
The solution to the latter downside is left as an exercise to the reader.
#include <array>
#include <iostream>
class FooByteIter; // forward declaration so Foo can name the iterator type
class Foo {
friend class FooByteIter;
public:
FooByteIter begin() const;
FooByteIter end() const;
Foo(const std::array<int, 2>& x, const char (&y)[2], long z)
: x_{x}
, y_{y[0], y[1]}
, z_{z}
{}
protected:
std::array<int, 2> x_;
char y_[2];
long z_;
};
class FooByteIter {
public:
FooByteIter(const Foo& foo)
: ptr_{reinterpret_cast<const char*>(&(foo.x_))}
, x_end_{reinterpret_cast<const char*>(&(foo.x_)) + sizeof(foo.x_)}
, y_begin_{reinterpret_cast<const char*>(&(foo.y_))}
, y_end_{reinterpret_cast<const char*>(&(foo.y_)) + sizeof(foo.y_)}
, z_begin_{reinterpret_cast<const char*>(&(foo.z_))}
{}
static FooByteIter end(const Foo& foo) {
FooByteIter fbi{foo};
fbi.ptr_ = reinterpret_cast<const char*>(&foo.z_) + sizeof(foo.z_);
return fbi;
}
bool operator==(const FooByteIter& other) const { return ptr_ == other.ptr_; }
bool operator!=(const FooByteIter& other) const { return ! (*this == other); }
FooByteIter& operator++() {
ptr_++;
if (ptr_ == x_end_) {
ptr_ = y_begin_;
}
else if (ptr_ == y_end_) {
ptr_ = z_begin_;
}
return *this;
}
FooByteIter operator++(int) {
FooByteIter pre = *this;
++(*this); // use pre-increment here; (*this)++ would recurse forever
return pre;
}
char operator*() const {
return *ptr_;
}
private:
const char* ptr_;
const char* const x_end_;
const char* const y_begin_;
const char* const y_end_;
const char* const z_begin_;
};
FooByteIter Foo::begin() const {
return FooByteIter(*this);
}
FooByteIter Foo::end() const {
return FooByteIter::end(*this);
}
template <typename InputIt>
char checksum(InputIt first, InputIt last) {
char check = 0;
while (first != last) {
check += (*first);
++first;
}
return check;
}
int main() {
Foo f{{1, 2}, {3, 4}, 5};
for (const auto b : f) {
std::cout << (int)b << ' ';
}
std::cout << std::endl;
std::cout << "Checksum is: " << (int)checksum(f.begin(), f.end()) << std::endl;
}
You can generalize this further by making serialization functions for all data types you might care about, allowing serialization of classes that aren't plain-old-data types.
Warning
This code assumes that the member types being serialized have no internal padding of their own. It works for this class because every member is built from types that do not pad. To support members whose types contain padding, the same approach would have to be applied recursively, all the way down.

Just cast a pointer to the object to a pointer to char (or unsigned char). You can then iterate through the bytes by incrementing the pointer, using sizeof(foo) as the upper bound.
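One caveat is worth checking before hashing all sizeof(foo) raw bytes: the object representation includes any padding the compiler inserts, and padding bytes do not have reliable values. A small sketch, using a hypothetical `Padded` struct, that compares `sizeof` with the sum of the member sizes:

```cpp
#include <cstddef>

// Hypothetical struct: a char followed by a long usually forces padding
// after `tag` so that `value` is suitably aligned.
struct Padded {
    char tag;
    long value;
};

// Bytes occupied by the members themselves.
constexpr std::size_t member_bytes() {
    return sizeof(char) + sizeof(long);
}

// Bytes of the object representation that belong to no member.
constexpr std::size_t padding_bytes() {
    return sizeof(Padded) - member_bytes();
}

// Number of bytes a raw char-pointer walk over the object would visit.
inline std::size_t bytes_visited(const Padded& p) {
    const unsigned char* first = reinterpret_cast<const unsigned char*>(&p);
    const unsigned char* last = first + sizeof(p);
    return static_cast<std::size_t>(last - first);
}
```

If `padding_bytes()` is non-zero on your target, a raw walk over the object feeds those extra bytes into the hash, which is exactly what the question wants to avoid. The exact amount of padding depends on the ABI.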

As long as you're able to make your class an aggregate, i.e. std::is_aggregate_v<T> == true, you can actually sort-of reflect the members of the structure.
This allows you to easily hash the members without actually having to name them. (It also means you don't have to remember to update your hash function every time you add a new member.)
Step 1: Getting the number of members inside the aggregate
First we need to know how many members a given aggregate type has.
We can check this by (ab-)using aggregate initialization.
Example:
Given struct Foo { int i; int j; };:
Foo a{}; // ok
Foo b{{}}; // ok
Foo c{{}, {}}; // ok
Foo d{{}, {}, {}}; // error: too many initializers for 'Foo'
We can use this to get the number of members inside the struct, by trying to add more initializers until we get an error:
template<class T>
concept aggregate = std::is_aggregate_v<T>;
struct any_type {
template<class T>
operator T() {}
};
template<aggregate T>
consteval std::size_t count_members(auto ...members) {
if constexpr (requires { T{ {members}... }; } == false)
return sizeof...(members) - 1;
else
return count_members<T>(members..., any_type{});
}
Notice that I used {members}... instead of members....
This is because of arrays - a structure like struct Bar{int i[2];}; could be initialized with 2 elements, e.g. Bar b{1, 2}, so our function would have returned 2 for Bar if we had used members....
Step 2: Extracting the members
Now that we know how many members our structure has, we can use structured bindings to extract them.
Unfortunately there is no way in the current standard to create a structured binding expression with a variable amount of expressions, so we have to add a few extra lines of code for each additional member we want to support.
For this example I've only added a max of 4 members, but you can add as many as you like / need:
template<aggregate T>
constexpr auto tie_struct(T const& data) {
constexpr std::size_t fieldCount = count_members<T>();
if constexpr(fieldCount == 0) {
return std::tie();
} else if constexpr (fieldCount == 1) {
auto const& [m1] = data;
return std::tie(m1);
} else if constexpr (fieldCount == 2) {
auto const& [m1, m2] = data;
return std::tie(m1, m2);
} else if constexpr (fieldCount == 3) {
auto const& [m1, m2, m3] = data;
return std::tie(m1, m2, m3);
} else if constexpr (fieldCount == 4) {
auto const& [m1, m2, m3, m4] = data;
return std::tie(m1, m2, m3, m4);
} else {
static_assert(fieldCount!=fieldCount, "Too many fields for tie_struct! add more if statements!");
}
}
The fieldCount!=fieldCount in the static_assert is intentional: it makes the condition depend on the template parameter, which prevents the compiler from evaluating it prematurely (it only complains if the else case is actually hit).
Now we have a function that can give us references to each member of an arbitrary aggregate.
Example:
struct Foo {int i; float j; std::string s; };
Foo f{1, 2, "miau"};
// tup is of type std::tuple<int const&, float const&, std::string const&>
auto tup = tie_struct(f);
// this will output "12miau"
std::cout << std::get<0>(tup) << std::get<1>(tup) << std::get<2>(tup) << std::endl;
Step 3: hashing the members
Now that we can convert any aggregate into a tuple of its members, hashing it shouldn't be a big problem.
You can basically hash the individual types like you want and then combine the individual hashes:
// for merging two hash values
std::size_t hash_combine(std::size_t h1, std::size_t h2)
{
return (h2 + 0x9e3779b9 + (h1<<6) + (h1>>2)) ^ h1;
}
// Handling primitives
template <class T, class = void>
struct is_std_hashable : std::false_type { };
template <class T>
struct is_std_hashable<T, std::void_t<decltype(std::declval<std::hash<T>>()(std::declval<T>()))>> : std::true_type { };
template <class T>
concept std_hashable = is_std_hashable<T>::value;
template<std_hashable T>
std::size_t hash(T value) {
return std::hash<T>{}(value);
}
// Handling tuples
template<class... Members>
std::size_t hash(std::tuple<Members...> const& tuple) {
return std::apply([](auto const&... members) {
std::size_t result = 0;
((result = hash_combine(result, hash(members))), ...);
return result;
}, tuple);
}
template<class T, std::size_t I>
using Arr = T[I];
// Handling arrays
template<class T, std::size_t I>
std::size_t hash(Arr<T, I> const& arr) {
std::size_t result = 0;
for(T const& elem : arr) {
std::size_t h = hash(elem);
result = hash_combine(result, h);
}
return result;
}
// Handling structs
template<aggregate T>
std::size_t hash(T const& agg) {
return hash(tie_struct(agg));
}
This allows you to hash basically any aggregate struct, even with arrays and nested structs:
struct Foo{ int i; double d; std::string s; };
struct Bar { Foo k[10]; float f; };
std::cout << hash(Foo{1, 1.2f, "miau"}) << std::endl;
std::cout << hash(Bar{}) << std::endl;
full example on godbolt
Footnotes
This only works with aggregates
No need to worry about padding because we access the members directly.
You have to add a few more ifs into tie_struct if you need more than 4 members
The provided hash() function doesn't handle all types - if you need e.g. std::array, std::pair, etc... you need to add overloads for those.
It's a lot of boilerplate code, but it's insanely powerful.
You can also use Boost.PFR for the aggregate-to-tuple part, if you are allowed to use boost
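For comparison, here is a sketch of what Step 3 boils down to once the members are known, written so that it also compiles as C++11. The members are named by hand instead of being discovered through tie_struct, and `Point3` and `hash_point` are hypothetical names used only for this illustration. Note that the result is a std::size_t hash suitable for hash tables, not a SHA-256 digest.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Same mixing step as the hash_combine shown above.
inline std::size_t hash_combine(std::size_t h1, std::size_t h2) {
    return (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2)) ^ h1;
}

// Hypothetical aggregate used only for this example.
struct Point3 {
    int x;
    double y;
    std::string label;
};

// Hash each member with std::hash and fold the results together,
// in declaration order. Padding is never touched because only the
// member values are read.
inline std::size_t hash_point(const Point3& p) {
    std::size_t result = 0;
    result = hash_combine(result, std::hash<int>{}(p.x));
    result = hash_combine(result, std::hash<double>{}(p.y));
    result = hash_combine(result, std::hash<std::string>{}(p.label));
    return result;
}
```

The reflection machinery above generates this kind of fold automatically; the manual version has to be updated whenever a member is added.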

Related

Use struct member pointer to fill-in a struct in C++

So I have the following available:
struct data_t {
char field1[10];
char field2[20];
char field3[30];
};
const char *getData(const char *key);
const char *field_keys[] = { "key1", "key2", "key3" };
This code is given to me and I cannot modify it in any way. It comes from some old C project.
I need to fill in the struct using the getData function with the different keys, something like the following:
struct data_t my_data;
strncpy(my_data.field1, getData(field_keys[0]), sizeof(my_data.field1));
strncpy(my_data.field2, getData(field_keys[1]), sizeof(my_data.field2));
strncpy(my_data.field3, getData(field_keys[2]), sizeof(my_data.field3));
Of course, this is a simplification, and more things are going on in each assignment. The point is that I would like to represent the mapping between keys and struct members in a constant structure, and use that to turn the code above into a loop. I am looking for something like the following:
struct data_t {
char field1[10];
char field2[20];
char field3[30];
};
typedef char *(data_t:: *my_struct_member);
const std::vector<std::pair<const char *, my_struct_member>> mapping = {
{ "FIRST_KEY" , &my_struct_t::field1},
{ "SECOND_KEY", &my_struct_t::field2},
{ "THIRD_KEY", &my_struct_t::field3},
};
int main()
{
data_t data;
for (auto const& it : mapping) {
strcpy(data.*(it.second), getData(it.first));
// Ideally, I would like to do
// strlcpy(data.*(it.second), getData(it.first), <the right sizeof here>);
}
}
This, however, has two problems:
It does not compile :) But I believe that should be easy to solve.
I am not sure how to get the sizeof() argument for using strncpy/strlcpy instead of strcpy. I am using char * as the type of the members, so I am losing the type information about how long each array is. On the other hand, I am not sure how to use the specific char[N] types of each member, because if each struct member pointer has a different type I don't think I will be able to have them in a std::vector<T>.
As explained in my comment, if you can store enough information to process a field in a mapping, then you can write a function that does the same.
Therefore, write a function to do so, using array references to ensure what you do is safe, e.g.:
template <std::size_t N>
void process_field(char (&dest)[N], const char * src)
{
strlcpy(dest, getData(src), N);
// more work with the field...
}
And then simply, instead of your for loop:
process_field(data.field1, "foo");
process_field(data.field2, "bar");
// ...
Note that the number of lines is the same as with a mapping (one per field), so this is not worse than a mapping solution in terms of repetition.
Now, the advantages:
Easier to understand.
Faster: no memory needed to keep the mapping, more easily optimizable, etc.
Allows you to write different functions for different fields, easily, if needed.
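A compilable sketch of this approach. The `getData` below is a stub standing in for the legacy function (the strings it returns are made up), and `std::snprintf` is used instead of strlcpy only because strlcpy is not available in every C library:

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

struct data_t {
    char field1[10];
    char field2[20];
    char field3[30];
};

// Stub standing in for the legacy getData(); the values are invented.
inline const char* getData(const char* key) {
    if (std::strcmp(key, "key1") == 0) return "short";
    if (std::strcmp(key, "key2") == 0)
        return "a value that is longer than twenty characters";
    return "";
}

// N is deduced from the destination array, so the bound can never
// get out of sync with the field it belongs to.
template <std::size_t N>
void process_field(char (&dest)[N], const char* key) {
    // snprintf writes at most N - 1 characters and always terminates.
    std::snprintf(dest, N, "%s", getData(key));
}
```

Calling `process_field(d.field2, "key2")` on a `data_t d` truncates the long value to the 19 characters that fit in field2, with no explicit size written at the call site.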
Further, if both of your strings are known at compile-time, you can even do:
template <std::size_t N, std::size_t M>
void process_field(char (&dest)[N], const char (&src)[M])
{
static_assert(N >= M);
std::memcpy(dest, src, M);
// more work with the field...
}
Which will be always safe, e.g.:
process_field(data.field1, "123456789"); // just fits!
process_field(data.field1, "1234567890"); // error
Which has even more pros:
Way faster than any strcpy variant (if the call is done in run-time).
Guaranteed to be safe at compile-time instead of run-time.
A variadic templates based solution:
struct my_struct_t {
char one_field[30];
char another_field[40];
};
template<typename T1, typename T2>
void do_mapping(T1& a, T2& b) {
std::cout << sizeof(b) << std::endl;
strncpy(b, a, sizeof(b));
}
template<typename T1, typename T2, typename... Args>
void do_mapping(T1& a, T2& b, Args&... args) {
do_mapping(a, b);
do_mapping(args...);
}
int main()
{
my_struct_t ms;
do_mapping(
"FIRST_MAPPING", ms.one_field,
"SECOND_MAPPING", ms.another_field
);
return 0;
}
Since data_t is a POD structure, you can use offsetof() for this.
const std::vector<std::pair<const char *, std::size_t>> mapping = {
{ "FIRST_FIELD" , offsetof(data_t, field1)},
{ "SECOND_FIELD", offsetof(data_t, field2)}
};
Then the loop would be:
for (auto const& it : mapping) {
strcpy(reinterpret_cast<char*>(&data) + it.second, getData(it.first));
}
I don't think there's any way to get the size of the member similarly. You can subtract the offset of the current member from the next member, but this will include padding bytes. You'd also have to special-case the last member, subtracting the offset from the size of the structure itself, since there's no next member.
The mapping can be a function to write the data into the appropriate member
struct mapping_t
{
const char * name;
std::function<void(data_t &, const char *)> write;
};
const std::vector<mapping_t> mapping = {
{ "FIRST_KEY", [](data_t & data, const char * str) { strlcpy(data.field1, str, sizeof(data.field1)); } },
{ "SECOND_KEY", [](data_t & data, const char * str) { strlcpy(data.field2, str, sizeof(data.field2)); } },
{ "THIRD_KEY", [](data_t & data, const char * str) { strlcpy(data.field3, str, sizeof(data.field3)); } },
};
int main()
{
data_t data;
for (auto const& it : mapping) {
it.write(data, getData(it.name));
}
}
To iterate over struct members you need:
offset / pointer to the beginning of that member
size of that member
struct Map {
const char *key;
std::size_t offset;
std::size_t size;
};
std::vector<Map> map = {
{ field_keys[0], offsetof(data_t, field1), sizeof(data_t::field1), },
{ field_keys[1], offsetof(data_t, field2), sizeof(data_t::field2), },
{ field_keys[2], offsetof(data_t, field3), sizeof(data_t::field3), },
};
Once we have that, we need strlcpy:
std::size_t mystrlcpy(char *to, const char *from, std::size_t max)
{
char * const to0 = to;
if (max == 0)
return 0;
while (--max != 0 && *from) {
*to++ = *from++;
}
*to = '\0';
return to - to0; // number of characters copied, not counting the terminator
}
After having that, we can just:
data_t data;
for (auto const& it : map) {
mystrlcpy(reinterpret_cast<char*>(&data) + it.offset, getData(it.key), it.size);
}
That reinterpret_cast looks a bit ugly, but it just shifts the &data pointer to the needed field.
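Put together as one compilable unit, the offset/size table might look like the sketch below. `FieldInfo`, `kFields` and `fill` are hypothetical names, the `getData` stub simply echoes the key back, and `std::snprintf` stands in for the bounded copy; one key is deliberately too long to show the truncation.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

struct data_t {
    char field1[10];
    char field2[20];
    char field3[30];
};

// Stub for the legacy lookup: it just returns the key itself.
inline const char* getData(const char* key) { return key; }

// One table row per member: where it lives and how big it is.
struct FieldInfo {
    const char* key;
    std::size_t offset;
    std::size_t size;
};

static const FieldInfo kFields[] = {
    { "key1",                      offsetof(data_t, field1), sizeof(data_t::field1) },
    { "a-key-longer-than-field2!", offsetof(data_t, field2), sizeof(data_t::field2) },
    { "key3",                      offsetof(data_t, field3), sizeof(data_t::field3) },
};

// Fill every field from the table; each copy is bounded by that
// field's own size and is always null-terminated.
inline void fill(data_t& data) {
    char* base = reinterpret_cast<char*>(&data);
    for (const FieldInfo& f : kFields) {
        std::snprintf(base + f.offset, f.size, "%s", getData(f.key));
    }
}
```

Since `sizeof(data_t::field1)` is valid in an unevaluated context, the size can be recorded next to the offset and no arithmetic on neighbouring offsets is needed.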
We can also create a smarter container that takes a pointer to the variable on construction and is thus bound to an existing variable. It needs a little more writing:
struct Map2 {
static constexpr std::size_t max = sizeof(field_keys)/sizeof(*field_keys);
Map2(data_t* pnt) : mpnt(pnt) {}
char* getDest(std::size_t num) {
std::array<char*, max> arr = {
mpnt->field1,
mpnt->field2,
mpnt->field3,
};
return arr[num];
}
const char* getKey(std::size_t num) {
return field_keys[num];
}
std::size_t getSize(std::size_t num) {
std::array<std::size_t, max> arr = {
sizeof(mpnt->field1),
sizeof(mpnt->field2),
sizeof(mpnt->field3),
};
return arr[num];
}
private:
data_t* mpnt;
};
But it probably makes the iteration more readable:
Map2 m(&data);
for (std::size_t i = 0; i < m.max; ++i) {
mystrlcpy(m.getDest(i), getData(m.getKey(i)), m.getSize(i));
}
Live code available at onlinegdb.

Virtually turn vector of struct into vector of struct members

I have a function that takes a vector-like input. To simplify things, let's use this print_in_order function:
#include <iostream>
#include <vector>
template <typename vectorlike>
void print_in_order(std::vector<int> const & order,
vectorlike const & printme) {
for (int i : order)
std::cout << printme[i] << std::endl;
}
int main() {
std::vector<int> printme = {100, 200, 300};
std::vector<int> order = {2,0,1};
print_in_order(order, printme);
}
Now I have a vector<Elem> and want to print a single integer member, Elem.a, for each Elem in the vector. I could do this by creating a new vector<int> (copying a for all Elems) and passing this to the print function. However, I feel like there must be a way to pass a "virtual" vector that, when operator[] is used on it, returns only the member a. Note that I don't want to change the print_in_order function to access the member; it should remain general.
Is this possible, maybe with a lambda expression?
Full code below.
#include <iostream>
#include <vector>
struct Elem {
int a,b;
Elem(int a, int b) : a(a),b(b) {}
};
template <typename vectorlike>
void print_in_order(std::vector<int> const & order,
vectorlike const & printme) {
for (int i : order)
std::cout << printme[i] << std::endl;
}
int main() {
std::vector<Elem> printme = {Elem(1,100), Elem(2,200), Elem(3,300)};
std::vector<int> order = {2,0,1};
// how to do this?
virtual_vector X(printme) // behaves like a std::vector<Elem.a>
print_in_order(order, X);
}
It's not really possible to directly do what you want. Instead you might want to take a hint from the standard algorithm library, for example std::for_each where you take an extra argument that is a function-like object that you call for each element. Then you could easily pass a lambda function that prints only the wanted element.
Perhaps something like
template<typename vectorlike, typename functionlike>
void print_in_order(std::vector<int> const & order,
vectorlike const & printme,
functionlike func) {
for (int i : order)
func(printme[i]);
}
Then call it like
print_in_order(order, printme, [](Elem const& elem) {
std::cout << elem.a;
});
Since C++ has function overloading you can still keep the old print_in_order function for plain vectors.
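A self-contained sketch of this projection idea. To make the result easy to check, the hypothetical `collect_in_order` below returns the projected values instead of printing them; otherwise it follows the same shape as the three-argument print_in_order.

```cpp
#include <vector>

struct Elem {
    int a, b;
    Elem(int a, int b) : a(a), b(b) {}
};

// Visits `items` in the given order and collects proj(item) for each one.
// The caller decides, via `proj`, which part of the element is used.
template <typename VectorLike, typename Projection>
std::vector<int> collect_in_order(const std::vector<int>& order,
                                  const VectorLike& items,
                                  Projection proj) {
    std::vector<int> out;
    out.reserve(order.size());
    for (int i : order)
        out.push_back(proj(items[i]));
    return out;
}
```

With `printme = {Elem(1,100), Elem(2,200), Elem(3,300)}` and `order = {2,0,1}`, passing `[](const Elem& e) { return e.a; }` yields 3, 1, 2, and no intermediate vector of members is built before the traversal.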
Using member pointers you can implement a proxy type that will allow you to view a container of objects by substituting each object by one of its members (see pointer to data member) or by one of its getters (see pointer to member function). The first solution addresses only data members, the second accounts for both.
The container will necessarily need to know which container to use and which member to map, which will be provided at construction. The type of a pointer to member depends on the type of that member so it will have to be considered as an additional template argument.
template<class Container, class MemberPtr>
class virtual_vector
{
public:
virtual_vector(const Container & p_container, MemberPtr p_member_ptr) :
m_container(&p_container),
m_member(p_member_ptr)
{}
private:
const Container * m_container;
MemberPtr m_member;
};
Next, implement the operator[] operator, since you mentioned that it's how you wanted to access your elements. The syntax for dereferencing a member pointer can be surprising at first.
template<class Container, class MemberPtr>
class virtual_vector
{
public:
virtual_vector(const Container & p_container, MemberPtr p_member_ptr) :
m_container(&p_container),
m_member(p_member_ptr)
{}
// Dispatch to the right get method
auto operator[](const size_t p_index) const
{
return (*m_container)[p_index].*m_member;
}
private:
const Container * m_container;
MemberPtr m_member;
};
To use this implementation, you would write something like this :
int main() {
std::vector<Elem> printme = { Elem(1,100), Elem(2,200), Elem(3,300) };
std::vector<int> order = { 2,0,1 };
virtual_vector<decltype(printme), decltype(&Elem::a)> X(printme, &Elem::a);
print_in_order(order, X);
}
This is a bit cumbersome since there is no template argument deduction happening. So let's add a free function to deduce the template arguments.
template<class Container, class MemberPtr>
virtual_vector<Container, MemberPtr>
make_virtual_vector(const Container & p_container, MemberPtr p_member_ptr)
{
return{ p_container, p_member_ptr };
}
The usage becomes :
int main() {
std::vector<Elem> printme = { Elem(1,100), Elem(2,200), Elem(3,300) };
std::vector<int> order = { 2,0,1 };
auto X = make_virtual_vector(printme, &Elem::a);
print_in_order(order, X);
}
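The deduced `auto` return type of operator[] above needs C++14 and returns a copy of the member. A variant that also compiles as C++11 and returns a reference can be sketched by spelling out the member pointer's class and member types as template parameters. `member_view` and `make_member_view` are hypothetical names for this illustration.

```cpp
#include <cstddef>
#include <vector>

struct Elem {
    int a, b;
    Elem(int a, int b) : a(a), b(b) {}
};

// Read-only view of one data member across a container of Class objects.
template <class Container, class Class, class Member>
class member_view {
public:
    member_view(const Container& container, Member Class::* member)
        : container_(&container), member_(member) {}

    // Returns a reference into the underlying container; nothing is copied.
    const Member& operator[](std::size_t index) const {
        return (*container_)[index].*member_;
    }

    std::size_t size() const { return container_->size(); }

private:
    const Container* container_;
    Member Class::* member_;
};

// Helper so that all three template arguments are deduced from the call.
template <class Container, class Class, class Member>
member_view<Container, Class, Member>
make_member_view(const Container& container, Member Class::* member) {
    return member_view<Container, Class, Member>(container, member);
}
```

The view stores a pointer to the container, so it must not outlive it. This form only supports data members.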
If you want to support member functions, it's a little bit more complicated. First, the syntax to dereference a data member pointer is slightly different from calling a function member pointer. You have to implement two versions of the operator[] and enable the correct one based on the member pointer type. Luckily the standard provides std::enable_if and std::is_member_function_pointer (both in the <type_traits> header) which allow us to do just that. The member function pointer requires you to specify the arguments to pass to the function (none in this case) and an extra set of parentheses around the expression that would evaluate to the function to call (everything before the list of arguments).
template<class Container, class MemberPtr>
class virtual_vector
{
public:
virtual_vector(const Container & p_container, MemberPtr p_member_ptr) :
m_container(&p_container),
m_member(p_member_ptr)
{}
// For mapping to a method
template<class T = MemberPtr>
auto operator[](std::enable_if_t<std::is_member_function_pointer<T>::value == true, const size_t> p_index) const
{
return ((*m_container)[p_index].*m_member)();
}
// For mapping to a member
template<class T = MemberPtr>
auto operator[](std::enable_if_t<std::is_member_function_pointer<T>::value == false, const size_t> p_index) const
{
return (*m_container)[p_index].*m_member;
}
private:
const Container * m_container;
MemberPtr m_member;
};
To test this, I've added a getter to the Elem class, for illustrative purposes.
struct Elem {
int a, b;
int foo() const { return a; }
Elem(int a, int b) : a(a), b(b) {}
};
And here is how it would be used :
int main() {
std::vector<Elem> printme = { Elem(1,100), Elem(2,200), Elem(3,300) };
std::vector<int> order = { 2,0,1 };
{ // print member
auto X = make_virtual_vector(printme, &Elem::a);
print_in_order(order, X);
}
{ // print method
auto X = make_virtual_vector(printme, &Elem::foo);
print_in_order(order, X);
}
}
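As an alternative to the two enable_if overloads, `std::mem_fn` from `<functional>` wraps both pointers to data members and pointers to member functions behind the same call syntax, so a single operator[] can serve both cases. A sketch, where `mapped_view` and `make_mapped_view` are hypothetical names and `foo()` is changed to return a + b so the two cases are distinguishable:

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

struct Elem {
    int a, b;
    int foo() const { return a + b; }
    Elem(int a, int b) : a(a), b(b) {}
};

template <class Container, class MemberPtr>
class mapped_view {
    typedef typename Container::value_type value_type;

public:
    mapped_view(const Container& container, MemberPtr member)
        : container_(&container), member_(member) {}

    // std::mem_fn(member_)(element) reads a data member or calls a
    // member function, depending on what MemberPtr is.
    auto operator[](std::size_t index) const
        -> decltype(std::mem_fn(std::declval<MemberPtr>())(
               std::declval<const value_type&>())) {
        return std::mem_fn(member_)((*container_)[index]);
    }

private:
    const Container* container_;
    MemberPtr member_;
};

template <class Container, class MemberPtr>
mapped_view<Container, MemberPtr>
make_mapped_view(const Container& container, MemberPtr member) {
    return mapped_view<Container, MemberPtr>(container, member);
}
```

For a data member the result is a const reference into the container; for a member function it is whatever the function returns.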
You've got a choice of two data structures
struct Employee
{
std::string name;
double salary;
long payrollid;
};
std::vector<Employee> employees;
Or alternatively
struct Employees
{
std::vector<std::string> names;
std::vector<double> salaries;
std::vector<long> payrollids;
};
C++ is designed with the first option as the default. Other languages such as Javascript tend to encourage the second option.
If you want to find mean salary, option 2 is more convenient. If you want to sort the employees by salary, option 1 is easier to work with.
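To make the trade-off concrete, here is a sketch of the mean-salary computation for both layouts. The function name `mean_salary` is an assumption for this example; the struct definitions repeat the ones above so the block compiles on its own.

```cpp
#include <numeric>
#include <string>
#include <vector>

struct Employee {
    std::string name;
    double salary;
    long payrollid;
};

struct Employees {
    std::vector<std::string> names;
    std::vector<double> salaries;
    std::vector<long> payrollids;
};

// Option 1: the salary has to be picked out of each record while summing.
inline double mean_salary(const std::vector<Employee>& staff) {
    if (staff.empty()) return 0.0;
    double total = std::accumulate(
        staff.begin(), staff.end(), 0.0,
        [](double sum, const Employee& e) { return sum + e.salary; });
    return total / staff.size();
}

// Option 2: the salaries already form a flat vector of doubles.
inline double mean_salary(const Employees& staff) {
    if (staff.salaries.empty()) return 0.0;
    double total =
        std::accumulate(staff.salaries.begin(), staff.salaries.end(), 0.0);
    return total / staff.salaries.size();
}
```

The lambda in the first overload is the small projection that turns a vector of records into something that behaves like a vector of doubles.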
However, you can use lambdas to partially convert between the two. The lambda is a trivial little function which takes an Employee and returns their salary, effectively providing a flat vector of doubles we can take the mean of; or it takes an index and an Employees and returns an employee, doing a little bit of trivial data reformatting.
template<class F>
struct index_fake_t{
F f;
decltype(auto) operator[](std::size_t i)const{
return f(i);
}
};
template<class F>
index_fake_t<F> index_fake( F f ){
return{std::move(f)};
}
template<class F>
auto reindexer(F f){
return [f=std::move(f)](auto&& v)mutable{
return index_fake([f=std::move(f),&v](auto i)->decltype(auto){
return v[f(i)];
});
};
}
template<class F>
auto indexer_mapper(F f){
return [f=std::move(f)](auto&& v)mutable{
return index_fake([f=std::move(f),&v](auto i)->decltype(auto){
return f(v[i]);
});
};
}
Now, print in order can be rewritten as:
template <typename vectorlike>
void print(vectorlike const & printme) {
for (auto&& x:printme)
std::cout << x << std::endl;
}
template <typename vectorlike>
void print_in_order(std::vector<int> const& reorder, vectorlike const & printme) {
print(reindexer([&](auto i){return reorder[i];})(printme));
}
and printing .a as:
print_in_order( reorder, indexer_mapper([](auto&&x){return x.a;})(printme) );
there may be some typos.

Passing arguments to "array-like" container constructor

Background
I'm working with an embedded platform with the following restrictions:
No heap
No Boost libraries
C++11 is supported
I've dealt with the following problem a few times in the past:
Create an array of class type T, where T has no default constructor
The project only recently added C++11 support, and up until now I've been using ad-hoc solutions every time I had to deal with this. Now that C++11 is available, I thought I'd try to make a more general solution.
Solution Attempt
I copied an example of std::aligned_storage to come up with the framework for my array type. The result looks like this:
#include <type_traits>
template<class T, size_t N>
class Array {
// Provide aligned storage for N objects of type T
typename std::aligned_storage<sizeof(T), alignof(T)>::type data[N];
public:
// Build N objects of type T in the aligned storage using default CTORs
Array()
{
for(auto index = 0; index < N; ++index)
new(data + index) T();
}
const T& operator[](size_t pos) const
{
return *reinterpret_cast<const T*>(data + pos);
}
// Other methods consistent with std::array API go here
};
This is a basic type - Array<T,N> only compiles if T is default-constructible. I'm not very familiar with template parameter packing, but looking at some examples led me to the following:
template<typename ...Args>
Array(Args&&... args)
{
for(auto index = 0; index < N; ++index)
new(data + index) T(args...);
}
This was definitely a step in the right direction. Array<T,N> now compiles if passed arguments matching a constructor of T.
My only remaining problem is constructing an Array<T,N> where different elements in the array have different constructor arguments. I figured I could split this into two cases:
1 - User Specifies Arguments
Here's my stab at the CTOR:
template<typename U>
Array(std::initializer_list<U> initializers)
{
// Need to handle mismatch in size between arg and array
size_t index = 0;
for(auto arg : initializers) {
new(data + index) T(arg);
index++;
}
}
This seems to work fine, aside from needing to handle a dimension mismatch between the array and initializer list, but there are a number of ways to deal with that that aren't important. Here's an example:
struct Foo {
explicit Foo(int i) {}
};
void bar() {
// foos[0] == Foo(0)
// foos[1] == Foo(1)
// ..etc
Array<Foo,10> foos {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
}
2 - Arguments Follow Pattern
In my previous example, foos is initialized with an incrementing list, similar to std::iota. Ideally I'd like to support something like the following, where range(int) returns SOMETHING that can initialize the array.
// One of these should initialize foos with parameters returned by range(10)
Array<Foo,10> foosA = range(10);
Array<Foo,10> foosB {range(10)};
Array<Foo,10> foosC = {range(10)};
Array<Foo,10> foosD(range(10));
Googling has shown me that std::initializer_list isn't a "normal" container, so I don't think there's any way for me to make range(int) return a std::initializer_list depending on the function parameter.
Again, there are a few options here:
Parameters specified at run-time (function return?)
Parameters specified at compile-time (constexpr function return? templates?)
Questions
Are there any issues with this solution so far?
Does anyone have a suggestion to generate constructor parameters? I can't think of a solution at runtime or compile-time other than hard-coding an std::initializer_list, so any ideas are welcome.
If I understand your problem correctly, I've also stumbled across std::array's total inflexibility regarding element construction in favor of aggregate initialization (and the absence of a statically-allocated container with flexible element construction options). The best approach I came up with was creating a custom array-like container which accepts an iterator to construct its elements.
This is a totally flexible solution:
Works for both fixed-size and dynamic-sized containers
Can pass different or same parameters to element constructors
Can call constructors with one or multiple (tuple piecewise construction) arguments, or even different constructors for different elements (with inversion of control)
For your example it would be like:
const size_t SIZE = 10;
std::array<int, SIZE> params;
for (size_t c = 0; c < SIZE; c++) {
params[c] = c;
}
Array<Foo, SIZE> foos(iterator_construct, &params[0]); //iterator_construct is a special tag to call specific constructor
// also, we are able to pass a pointer as iterator, since it has both increment and dereference operators
Note: you can totally skip the parameter array allocation here by using a custom iterator class which calculates its value from its position on the fly.
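A sketch of how such a position-based iterator could look, together with a minimal container that constructs its elements from any iterator. The answer does not show its implementation, so everything here (`iterator_construct_t`, `counting_iterator`, and this `Array`) is a hypothetical illustration rather than the author's code. It uses no heap and only C++11 features, and it assumes T's constructor does not throw.

```cpp
#include <cstddef>
#include <new>
#include <type_traits>

// Tag type that selects the iterator-based constructor.
struct iterator_construct_t {};

// Iterator whose value is simply its position, so no parameter array
// has to be stored anywhere.
struct counting_iterator {
    int value;
    int operator*() const { return value; }
    counting_iterator& operator++() { ++value; return *this; }
};

template <class T, std::size_t N>
class Array {
public:
    // Construct element i from the i-th value the iterator produces.
    template <class Iterator>
    Array(iterator_construct_t, Iterator it) {
        for (std::size_t i = 0; i < N; ++i, ++it)
            new (static_cast<void*>(&data_[i])) T(*it);
    }

    // Destroy the elements in reverse order of construction.
    ~Array() {
        for (std::size_t i = N; i > 0; --i)
            (*this)[i - 1].~T();
    }

    // Copying is left out to keep the sketch short.
    Array(const Array&) = delete;
    Array& operator=(const Array&) = delete;

    T& operator[](std::size_t pos) {
        return *reinterpret_cast<T*>(&data_[pos]);
    }
    const T& operator[](std::size_t pos) const {
        return *reinterpret_cast<const T*>(&data_[pos]);
    }

private:
    typename std::aligned_storage<sizeof(T), alignof(T)>::type data_[N];
};

// Element type without a default constructor, as in the question.
struct Foo {
    explicit Foo(int i) : value(i) {}
    int value;
};
```

With these definitions, `Array<Foo, 10> foos(iterator_construct_t{}, counting_iterator{0});` constructs foos[i] as Foo(i), which corresponds to the range(10) pattern from the question.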
For multiple-argument constructor that would be:
const size_t SIZE = 10;
std::array<std::tuple<int, float>, SIZE> params; // will call Foo(int, float)
for (size_t c = 0; c < SIZE; c++) {
params[c] = std::make_tuple(c, 1.0f);
}
Array<Foo, SIZE> foos(iterator_construct, piecewise_construct, &params[0]);
A concrete implementation is a fairly big piece of code, so please let me know if you want more insight into the implementation details beyond the general idea; I will update my answer then.
I'd use a factory lambda.
The lambda takes a pointer to where to construct and an index, and is responsible for constructing.
This makes copy/move easy to write as well, which is a good sign.
template<class T, std::size_t N>
struct my_array {
T* data() { return (T*)&buffer; }
T const* data() const { return (T const*)&buffer; }
// basic random-access container operations:
T* begin() { return data(); }
T const* begin() const { return data(); }
T* end() { return data()+N; }
T const* end() const { return data()+N; }
T& operator[](std::size_t i){ return *(begin()+i); }
T const& operator[](std::size_t i)const{ return *(begin()+i); }
// useful utility:
bool empty() const { return N==0; }
T& front() { return *begin(); }
T const& front() const { return *begin(); }
T& back() { return *(end()-1); }
T const& back() const { return *(end()-1); }
std::size_t size() const { return N; }
// construct from function object:
template<class Factory,
typename std::enable_if<!std::is_same<typename std::decay<Factory>::type, my_array>::value, int>::type = 0
>
my_array( Factory&& factory ) {
std::size_t i = 0;
try {
for(; i < N; ++i) {
factory( (void*)(data()+i), i );
}
} catch(...) {
// throw during construction. Unroll creation, and rethrow:
for(std::size_t j = 0; j < i; ++j) {
(data()+i-j-1)->~T();
}
throw;
}
}
// other constructors, in terms of above naturally:
my_array():
my_array( [](void* ptr, std::size_t) {
new(ptr) T();
} )
{}
my_array(my_array&& o):
my_array( [&](void* ptr, std::size_t i) {
new(ptr) T( std::move(o[i]) );
} )
{}
my_array(my_array const& o):
my_array( [&](void* ptr, std::size_t i) {
new(ptr) T( o[i] );
} )
{}
my_array& operator=(my_array&& o) {
for (std::size_t i = 0; i < N; ++i)
(*this)[i] = std::move(o[i]);
return *this;
}
my_array& operator=(my_array const& o) {
for (std::size_t i = 0; i < N; ++i)
(*this)[i] = o[i];
return *this;
}
private:
using storage = typename std::aligned_storage< sizeof(T)*N, alignof(T) >::type;
storage buffer;
};
It defines my_array(), but that constructor is only instantiated if you actually use it, so T does not have to be default-constructible otherwise.
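To make the factory idea concrete, here is a minimal sketch that is separate from my_array above. The names raw_array, no_default and sum_of_three are made up for illustration, and the sketch assumes N > 0. It shows elements without a default constructor being built in place by a lambda that receives the target address and the index.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical element type with no default constructor.
struct no_default {
    explicit no_default(int v) : value(v) {}
    int value;
};

// Illustrative fixed-size container: elements live in raw storage and are
// constructed one at a time by a caller-supplied factory(void*, index).
template <class T, std::size_t N>
class raw_array {
public:
    template <class Factory>
    explicit raw_array(Factory&& factory) {
        std::size_t i = 0;
        try {
            for (; i < N; ++i) factory(static_cast<void*>(data() + i), i);
        } catch (...) {
            // Destroy the elements built so far, in reverse order, then rethrow.
            while (i > 0) (data() + --i)->~T();
            throw;
        }
    }
    ~raw_array() {
        for (std::size_t i = N; i > 0; --i) (data() + i - 1)->~T();
    }
    // Copying is left out to keep the sketch short.
    raw_array(raw_array const&) = delete;
    raw_array& operator=(raw_array const&) = delete;

    T& operator[](std::size_t i) { return data()[i]; }
    std::size_t size() const { return N; }

private:
    T* data() { return reinterpret_cast<T*>(buffer); }
    alignas(T) unsigned char buffer[sizeof(T) * N];
};

// Builds the values 0, 10, 20 without ever default-constructing no_default.
inline int sum_of_three() {
    raw_array<no_default, 3> a([](void* p, std::size_t i) {
        new (p) no_default(static_cast<int>(i) * 10);
    });
    return a[0].value + a[1].value + a[2].value;
}
```

The lambda is the only place that knows how to construct an element, which is why copy and move can be written as one-line factories in the full class.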
Supporting initializer list is relatively easy. Deciding what to do when the il isn't long enough, or too long, is hard. I think you might want:
template<class Fail>
my_array( std::initializer_list<T> il, Fail&& fail ):
my_array( [&](void* ptr, std::size_t i) {
if (i < il.size()) new(ptr) T(il.begin()[i]); // initializer_list has no operator[]
else fail(ptr, i);
} )
{}
which requires you pass in a "what to do on fail". We could default to throw by adding:
template<class WhatToThrow>
struct throw_when_called {
template<class...Args>
void operator()(Args&&...)const {
throw WhatToThrow{}; // WhatToThrow supplies its own message
}
};
struct list_too_short:std::length_error {
list_too_short():std::length_error("list too short") {}
};
template<class Fail=throw_when_called<list_too_short>>
my_array( std::initializer_list<T> il, Fail&& fail={} ):
my_array( [&](void* ptr, std::size_t i) {
if (i < il.size()) new(ptr) T(il.begin()[i]); // initializer_list has no operator[]
else fail(ptr, i);
} )
{}
which, if I wrote it right, makes a too-short initializer list throw with a meaningful message. If your platform has no exceptions, you could just exit(-1) instead.
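The "what to do on fail" policy can be tried out in isolation. The following is only a sketch: take_exactly is a hypothetical helper that is not part of my_array, it works on int for brevity, and throw_when_called here default-constructs the exception.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <stdexcept>
#include <vector>

struct list_too_short : std::length_error {
    list_too_short() : std::length_error("list too short") {}
};

// Callable that ignores its arguments and throws a default-constructed WhatToThrow.
template <class WhatToThrow>
struct throw_when_called {
    template <class... Args>
    void operator()(Args&&...) const { throw WhatToThrow{}; }
};

// Hypothetical helper: fills exactly n slots from the list and asks `fail`
// what to do for every slot the list is too short to fill.
template <class Fail = throw_when_called<list_too_short>>
std::vector<int> take_exactly(std::initializer_list<int> il, std::size_t n,
                              Fail fail = Fail{}) {
    std::vector<int> out(n, 0);
    for (std::size_t i = 0; i < n; ++i) {
        if (i < il.size()) out[i] = il.begin()[i];
        else fail(&out[i], i);
    }
    return out;
}
```

With the default policy, take_exactly({1, 2}, 3) throws list_too_short. Passing a lambda such as [](int* slot, std::size_t) { *slot = -1; } pads the missing slots instead.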

call vector::size() on vector stored in boost::any

I have a value stored in a boost::any, and I would like a function that returns the number of elements if the boost::any holds a std::vector.
Here is an example of use:
int a = 42;
vector<int> v = {1,2,3,4};
vector<int> w;
boost::any aa = a;
boost::any av = v;
boost::any aw = w;
// I would like to have this function `count`
count( aa ) // return 1
count( av ) // return 4
count( aw ) // return 0
// I can do the following, but I do not like the template argument.
count<int>( aa ) // return 1
count<int>( av ) // return 4
count<int>( aw ) // return 0
count<float>( aa ) // error
The problem is that I can't simply cast to vector<T> without specifying T. Is there a way around it?
A solution could be to use an intermediate container:
class vector_holder_base {
public:
virtual std::size_t size() = 0;
};
template <class T, class... Others>
class vector_holder : public vector_holder_base {
public:
vector_holder(const std::vector<T, Others...>& val) {...}
vector_holder(std::vector<T, Others...>&& val) {...}
vector_holder& operator=(const std::vector<T, Others...>& val) {...}
vector_holder& operator=(std::vector<T, Others...>&& val) {...}
std::size_t size() override {
return values.size();
}
private:
std::vector<T, Others...> values;
};
Then all you have to do is:
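// boost::any_cast needs the exact stored type, so store a pointer to the base class (needs <memory>)
boost::any av = std::shared_ptr<vector_holder_base>(new vector_holder<int>(v));
std::size_t count = boost::any_cast<std::shared_ptr<vector_holder_base> >(av)->size();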
As you can see with this trick you don't need to know the vector template type when you retrieve the size.
However, you need to think about the copies of your vector that this creates (when you pass it to a vector_holder, and again if the vector_holder itself is copied into a boost::any); move semantics can help here.
std::vector<int> and std::vector<float> are unrelated types at runtime.
boost::any type erases copying and extraction to its own type (exactly), and no more.
If you want to take unrelated (runtime) types and type erase additional properties, you should examine boost.TypeErasure, or do such erasure yourself.
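Doing the erasure yourself could look roughly like the sketch below. It does not use boost::any at all; counted_value, detail::counter and element_count are illustrative names, and the rule "non-vectors count as 1" is taken from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

namespace detail {
// Type-erased interface: the only property we keep is the element count.
struct counter_base {
    virtual ~counter_base() {}
    virtual std::size_t count() const = 0;
};
// Default: any non-vector value counts as one element.
template <class T>
struct counter : counter_base {
    explicit counter(T v) : value(std::move(v)) {}
    std::size_t count() const override { return 1; }
    T value;
};
// Partial specialization: vectors of any element type report their size.
template <class T, class A>
struct counter<std::vector<T, A>> : counter_base {
    explicit counter(std::vector<T, A> v) : value(std::move(v)) {}
    std::size_t count() const override { return value.size(); }
    std::vector<T, A> value;
};
} // namespace detail

// Holds a value of any copyable type and remembers how to count it.
class counted_value {
public:
    template <class T>
    counted_value(T v)
        : self_(std::make_shared<detail::counter<T>>(std::move(v))) {}
    std::size_t count() const { return self_->count(); }
private:
    std::shared_ptr<const detail::counter_base> self_;
};

inline std::size_t element_count(counted_value const& v) { return v.count(); }
```

The template argument is only needed when the value goes in. Reading the count back needs no knowledge of T.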
Alternatively, an augmented any (with type erased size) could work. Assuming C++11 support:
struct sized_any;
typedef std::size_t(*sizer_t)(sized_any const*);
template<class ValueType>
struct make_sizer {
sizer_t operator()() const {
return [](sized_any const*) -> std::size_t { return 1; };
}
};
template<class ValueType, class... Whatever>
struct make_sizer< std::vector<ValueType, Whatever...> > {
sizer_t operator()() const {
return [](sized_any const* n){
// convert n to a const std::vector<ValueType, Whatever...>*
// invoke .size()
};
}
};
struct sized_any : private boost::any {
sized_any( sized_any const& o ) = default;
sized_any( sized_any && o ) = default;
sized_any():boost::any(), sizer([](sized_any const*) -> std::size_t { return 0; }) {}
sized_any & operator=(const sized_any &) = default;
sized_any & operator=(sized_any &&) = default;
template<typename ValueType> sized_any(const ValueType &v):boost::any(v), sizer(make_sizer<ValueType>{}())
{}
template<typename ValueType> sized_any(ValueType &&v):boost::any(std::move(v)), sizer(make_sizer<ValueType>{}())
{}
template<typename ValueType> sized_any & operator=(const ValueType & v){
this->boost::any::operator=(v);
sizer=make_sizer<ValueType>{}();
return *this;
}
template<typename ValueType> sized_any & operator=(ValueType && v) {
this->boost::any::operator=(std::move(v));
sizer=make_sizer<ValueType>{}();
return *this;
}
~sized_any() = default;
// modifiers
sized_any & swap(sized_any & o) { this->boost::any::swap(o); std::swap( sizer, o.sizer ); return *this; }
std::size_t size() const { return sizer(this); }
private:
sizer_t sizer;
};
The private inheritance from boost::any is there to block direct calls to boost::any::swap, or to any other function that can change what type is stored in the boost::any. You have to reimplement or forward the functions that operate on boost::any so that they operate on sized_any.
The basic design is simple. We maintain a boost::any, and whenever its type changes, we build a new function pointer that can extract the proper size from it. The sizer takes a pointer to our sized_any to make our copy/assignment operators easier, plus that is all the state it needs (so we don't need to store any state in the function pointer).
The above is not a complete implementation, but a sketch.
The extra state (the function pointer) has to be maintained independently, so stateless modification of the any isn't possible.
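To show the function-pointer bookkeeping without pulling in Boost, here is a reduced sketch. It is an assumption-laden stand-in, not the sized_any above: sized_box is a made-up name, std::shared_ptr<void> plays the role of boost::any, and the sizer receives the stored object's address rather than the wrapper.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

typedef std::size_t (*sizer_t)(void const*);

// Default sizer: a non-vector value counts as one element.
template <class T>
struct make_sizer {
    static std::size_t call(void const*) { return 1; }
};
// Vector sizer: recover the concrete vector type and ask for its size.
template <class T, class A>
struct make_sizer<std::vector<T, A>> {
    static std::size_t call(void const* p) {
        return static_cast<std::vector<T, A> const*>(p)->size();
    }
};
inline std::size_t empty_sizer(void const*) { return 0; }

class sized_box {
public:
    sized_box() : data_(), sizer_(&empty_sizer) {}
    template <class T>
    explicit sized_box(T const& v)
        : data_(std::make_shared<T>(v)), sizer_(&make_sizer<T>::call) {}
    // Every change of the stored type also swaps in the matching sizer.
    template <class T>
    sized_box& operator=(T const& v) {
        data_ = std::make_shared<T>(v);
        sizer_ = &make_sizer<T>::call;
        return *this;
    }
    std::size_t size() const { return sizer_(data_.get()); }
private:
    std::shared_ptr<void> data_; // type-erased storage
    sizer_t sizer_;              // knows the real type of *data_
};
```

Copies of a sized_box share the stored object here, which differs from boost::any's value semantics. The point of the sketch is only that the storage and the sizer must always be updated together.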

Generic way to cast int to enum in C++

Is there a generic way to cast int to enum in C++?
If int falls in range of an enum it should return an enum value, otherwise throw an exception. Is there a way to write it generically? More than one enum type should be supported.
Background: I have an external enum type and no control over the source code. I'd like to store this value in a database and retrieve it.
The obvious thing is to annotate your enum:
// generic code
#include <algorithm>
template <typename T>
struct enum_traits {};
template<typename T, size_t N>
T *endof(T (&ra)[N]) {
return ra + N;
}
template<typename T, typename ValType>
T check(ValType v) {
typedef enum_traits<T> traits;
const T *first = traits::enumerators;
const T *last = endof(traits::enumerators);
if (traits::sorted) { // probably premature optimization
if (std::binary_search(first, last, v)) return T(v);
} else if (std::find(first, last, v) != last) {
return T(v);
}
throw "exception";
}
// "enhanced" definition of enum
enum e {
x = 1,
y = 4,
z = 10,
};
template<>
struct enum_traits<e> {
static const e enumerators[];
static const bool sorted = true;
};
// must appear in only one TU,
// so if the above is in a header then it will need the array size
const e enum_traits<e>::enumerators[] = {x, y, z};
// usage
int main() {
e good = check<e>(1); // returns x
e bad = check<e>(2); // throws: 2 is not an enumerator of e
}
You need the array to be kept up to date with e, which is a nuisance if you're not the author of e. As Sjoerd says, it can probably be automated with any decent build system.
In any case, you're up against 7.2/6:
For an enumeration where emin is the smallest enumerator and emax is the largest, the values of the enumeration are the values of the underlying type in the range bmin to bmax, where bmin and bmax are, respectively, the smallest and largest values of the smallest bit-field that can store emin and emax. It is possible to define an enumeration that has values not defined by any of its enumerators.
So if you aren't the author of e, you may or may not have a guarantee that valid values of e actually appear in its definition.
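A small illustration of that rule, using the enum e from above. Under the quoted wording its enumerators need a 4-bit field, so the value range is 0 to 15. The helper is_enumerator is a made-up name for this sketch.

```cpp
#include <cassert>

enum e { x = 1, y = 4, z = 10 };

// True only for values that are named enumerators of e.
inline bool is_enumerator(int v) { return v == x || v == y || v == z; }

// 7 lies inside the value range of e, so the cast is well defined even
// though no enumerator has that value.
inline e unnamed_value() { return static_cast<e>(7); }
```

So unnamed_value() is a perfectly legal e that a check against the enumerator list would reject. Whether that matters depends on how the author of e intended it to be used (bit flags are the usual case).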
Ugly.
enum MyEnum { one = 1, two = 2 };
MyEnum to_enum(int n)
{
switch( n )
{
case 1 : return one;
case 2 : return two;
}
throw something();
}
Now for the real question: why do you need this? The code is ugly, not easy to write (*?), not easy to maintain, and not easy to incorporate into your code. The code is telling you that it's wrong. Why fight it?
EDIT:
Alternatively, given that an integer can be explicitly converted to an enum type in C++:
MyEnum my_enum_val = static_cast<MyEnum>(my_int_val);
but this is even uglier than the above, much more prone to errors, and it won't throw as you desire.
If, as you describe, the values are in a database, why not write a code generator that reads this table and creates a .h and .cpp file with both the enum and a to_enum(int) function?
Advantages:
Easy to add a to_string(my_enum) function.
Little maintenance required
Database and code are in synch
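As a rough idea of what such a generator might emit, assume a table with the two rows (1, "one") and (2, "two"). Everything below is hypothetical output, including the choice of std::out_of_range as the exception type.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// --- generated from the database table; do not edit by hand ---
enum MyEnum { one = 1, two = 2 };

// Checked conversion: throws for any value that is not in the table.
inline MyEnum to_enum(int n) {
    switch (n) {
        case 1: return one;
        case 2: return two;
    }
    throw std::out_of_range("not a MyEnum value: " + std::to_string(n));
}

// Reverse mapping, generated from the same rows.
inline std::string to_string(MyEnum v) {
    switch (v) {
        case one: return "one";
        case two: return "two";
    }
    return "unknown";
}
```

Because both functions come from the same table, adding a row and regenerating keeps the enum, the check and the names consistent.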
No, there's no introspection in C++, nor is there any built-in "domain check" facility.
What do you think about this one?
#include <iostream>
#include <stdexcept>
#include <set>
#include <string>
using namespace std;
template<typename T>
class Enum
{
public:
static void insert(int value)
{
_set.insert(value);
}
static T buildFrom(int value)
{
if (_set.find(value) != _set.end()) {
T retval;
retval.assign(value);
return retval;
}
throw std::runtime_error("unexpected value");
}
operator int() const { return _value; }
private:
void assign(int value)
{
_value = value;
}
int _value;
static std::set<int> _set;
};
template<typename T> std::set<int> Enum<T>::_set;
class Apples: public Enum<Apples> {};
class Oranges: public Enum<Oranges> {};
class Proxy
{
public:
Proxy(int value): _value(value) {}
template<typename T>
operator T()
{
T theEnum;
return theEnum.buildFrom(_value);
}
int _value;
};
Proxy convert(int value)
{
return Proxy(value);
}
int main()
{
Apples::insert(4);
Apples::insert(8);
Apples a = convert(4); // works
std::cout << a << std::endl; // prints 4
try {
Apples b = convert(9); // throws
}
catch (std::exception const& e) {
std::cout << e.what() << std::endl; // prints "unexpected value"
}
try {
Oranges b = convert(4); // also throws
}
catch (std::exception const& e) {
std::cout << e.what() << std::endl; // prints "unexpected value"
}
}
You could then use code I posted here to switch on values.
You should not want something like what you describe to exist; I fear there are problems in your code design.
Also, you assume that enums come in a range, but that's not always the case:
enum Flags { one = 1, two = 2, four = 4, eight = 8, big = 2000000000 };
This is not in a range: even if it was possible, are you supposed to check every integer from 0 to 2^n to see if they match some enum's value?
If you are prepared to list your enum values as template parameters, you can do this in C++11 with variadic templates. You can look at this as a good thing, because it lets you accept subsets of the valid enum values in different contexts, which is often useful when parsing codes from external sources.
Perhaps not quite as generic as you'd like, but the checking code itself is generalised, you just need to specify the set of values. This approach handles gaps, arbitrary values, etc.
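#include <cstdio>      // printf
#include <stdexcept>   // std::runtime_error
#include <type_traits> // std::underlying_type
template<typename EnumType, EnumType... Values> class EnumCheck;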
template<typename EnumType> class EnumCheck<EnumType>
{
public:
template<typename IntType>
static bool constexpr is_value(IntType) { return false; }
};
template<typename EnumType, EnumType V, EnumType... Next>
class EnumCheck<EnumType, V, Next...> : private EnumCheck<EnumType, Next...>
{
using super = EnumCheck<EnumType, Next...>;
public:
template<typename IntType>
static bool constexpr is_value(IntType v)
{
return v == static_cast<typename std::underlying_type<EnumType>::type>(V) || super::is_value(v);
}
// throws if v is not one of the listed values
template<typename IntType>
static EnumType convert(IntType v)
{
if (!is_value(v)) throw std::runtime_error("Enum value out of range");
return static_cast<EnumType>(v);
}
};
enum class Test {
A = 1,
C = 3,
E = 5
};
using TestCheck = EnumCheck<Test, Test::A, Test::C, Test::E>;
void check_value(int v)
{
if (TestCheck::is_value(v))
printf("%d is OK\n", v);
else
printf("%d is not OK\n", v);
}
int main()
{
for (int i = 0; i < 10; ++i)
check_value(i);
}
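The "subsets in different contexts" point can be sketched as follows. This is a compact variant of the same recursive idea, not the EnumCheck class above: enum_set, AnyTest and EarlyTest are illustrative names, and it takes the value as the enum's underlying type.

```cpp
#include <cassert>
#include <stdexcept>
#include <type_traits>

template <typename Enum, Enum... Values>
struct enum_set;

// Empty set: nothing matches.
template <typename Enum>
struct enum_set<Enum> {
    static constexpr bool contains(typename std::underlying_type<Enum>::type) {
        return false;
    }
};

// Non-empty set: compare against the first value, then recurse on the rest.
template <typename Enum, Enum V, Enum... Rest>
struct enum_set<Enum, V, Rest...> {
    typedef typename std::underlying_type<Enum>::type raw_type;
    static constexpr bool contains(raw_type v) {
        return v == static_cast<raw_type>(V) || enum_set<Enum, Rest...>::contains(v);
    }
    // Checked conversion: throws for values outside this particular set.
    static Enum convert(raw_type v) {
        if (!contains(v)) throw std::out_of_range("value not in enum_set");
        return static_cast<Enum>(v);
    }
};

enum class Test { A = 1, C = 3, E = 5 };

using AnyTest   = enum_set<Test, Test::A, Test::C, Test::E>; // every enumerator
using EarlyTest = enum_set<Test, Test::A, Test::C>;          // narrower context

static_assert(AnyTest::contains(5), "E is valid in the full set");
static_assert(!EarlyTest::contains(5), "E is rejected in the narrower set");
```

The same enum gets two validators, and each call site picks the one whose set of values makes sense there.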
C++0x alternative to the "ugly" version, allows for multiple enums. Uses initializer lists rather than switches, a bit cleaner IMO. Unfortunately, this doesn't work around the need to hard-code the enum values.
#include <cassert> // assert
#include <initializer_list> // std::initializer_list
#include <stdexcept> // std::invalid_argument
namespace // unnamed namespace
{
enum class e1 { value_1 = 1, value_2 = 2 };
enum class e2 { value_3 = 3, value_4 = 4 };
template <typename T>
int valid_enum( const int val, const T& vec )
{
for ( const auto item : vec )
if ( static_cast<int>( item ) == val ) return val;
// std::exception has no standard string constructor, so throw a standard type that does
throw std::invalid_argument( "invalid enum value!" );
} // valid_enum
} // ns
int main()
{
// generate list of valid values
const auto e1_valid_values = { e1::value_1, e1::value_2 };
const auto e2_valid_values = { e2::value_3, e2::value_4 };
auto result1 = static_cast<e1>( valid_enum( 1, e1_valid_values ) );
assert( result1 == e1::value_1 );
auto result2 = static_cast<e2>( valid_enum( 3, e2_valid_values ) );
assert( result2 == e2::value_3 );
// test throw on invalid value
try
{
auto result3 = static_cast<e1>( valid_enum( 9999999, e1_valid_values ) );
assert( false );
}
catch ( ... )
{
assert( true );
}
}