I've been given the task of refactoring a lot of math-heavy C++ code that comes with no explanation of what it does.
To do that, I've set up a number of automated tests that feed random data to the old and new code and compare the results.
The problem is that, while it is simple to generate a random vector of any size, I have a lot of structs with many public fields (> 20), and I'm a bit tired of writing a custom function to fill each of them.
One could use some kind of script to parse the definition and automatically build the corresponding generator function.
Do you think this is a good idea?
Is there anything like that already done?
If you have only Plain Old Data, a struct is, roughly, merely a blob of memory with some meaning to the compiler.
This means you can treat it as such, and simply fill it with random bytes, using a union:
#include <iostream>
struct a {
int i;
char c;
float f;
double d;
};
union u {
char arr[sizeof(a)];
a record;
};
char generateRandomChar(); // implement some random char generation
int main() {
u foo;
for (char& c : foo.arr) {
c = generateRandomChar();
}
std::cout << "i:" << foo.record.i
<< "\nc:" << foo.record.c
<< "\nf:" << foo.record.f
<< "\nd:" << foo.record.d;
}
Technically, reading foo.record after writing only foo.arr is Undefined Behavior in C++. In practice, most compilers handle it as expected.
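If the undefined behavior is a concern, the same idea can be sketched with std::memcpy instead of a union, which is well defined for trivially copyable types. This is only a minimal sketch: `Record` and `fill_random` are made-up names, and the seeded std::mt19937 is an assumption chosen so that a failing comparison can be reproduced.

```cpp
#include <cassert>
#include <cstring>
#include <random>
#include <type_traits>

// Example record; any trivially copyable struct is handled the same way.
struct Record {
    int i;
    char c;
    float f;
    double d;
};

// Hypothetical helper: overwrite every byte of obj with pseudo-random data.
template <class T>
void fill_random(T& obj, std::mt19937& rng) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only trivially copyable types can be filled byte-wise");
    unsigned char bytes[sizeof(T)];
    std::uniform_int_distribution<int> dist(0, 255);
    for (unsigned char& b : bytes) {
        b = static_cast<unsigned char>(dist(rng));
    }
    std::memcpy(&obj, bytes, sizeof(T));  // copy the raw bytes into the object
}

// Usage sketch: the same seed yields the same bytes, so a run is repeatable.
inline void demo_reproducible_fill() {
    std::mt19937 first(1234), second(1234);
    Record x, y;
    fill_random(x, first);
    fill_random(y, second);
    assert(std::memcmp(&x, &y, sizeof(Record)) == 0);
}
```

Keep in mind that random bytes can produce NaN or infinite values in the float and double fields, so the old-versus-new comparison probably has to treat those cases explicitly.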
Consider the following structure:
struct S
{
int a;
int b;
double arr[0];
} __attribute__((packed));
As you can see, this structure is packed and has Zero sized array at the end.
I'd like to send this as binary data over the network (assume I took care of endianness).
In C/C++ I could just use malloc to allocate as much space as I want and use free later.
I'd like this memory to be handled by std::shared_ptr.
Is there a straight forward way of doing so without special hacks?
I'd like this memory to be handled by std::shared_ptr.
Is there a straight forward way of doing so without special hacks?
Sure, there is:
shared_ptr<S> make_buffer(size_t s)
{
auto buffer = malloc(s); // allocate as usual
auto release = [](void* p) { free(p); }; // a deleter
shared_ptr<void> sptr(buffer, release); // make it shared
return { sptr, new(buffer) S }; // an aliased pointer
}
This works with any objects that are placed in a malloced buffer, not just when there are zero-sized arrays, provided that the destructor is trivial (performs no action) because it is never called.
The usual caveats about zero-sized arrays and packed structures still apply as well, of course.
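For comparison, here is a rough sketch of the same aliasing-pointer technique without a zero-sized array or packing. `Packet`, `make_packet`, `set_payload` and `get_payload` are hypothetical names, and the sketch assumes the payload simply follows the header in the same malloc'ed block. Addressing the bytes behind the header like this is a common convention rather than something the standard spells out, so treat it with the same caution as the original.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <new>

// Hypothetical fixed-size header; the doubles follow it in the same buffer.
struct Packet {
    int a;
    int count;  // number of doubles stored after the header
};

// Allocate header + payload in one malloc'ed block owned by a shared_ptr.
std::shared_ptr<Packet> make_packet(int count) {
    assert(count >= 0);
    const std::size_t bytes =
        sizeof(Packet) + static_cast<std::size_t>(count) * sizeof(double);
    void* raw = std::malloc(bytes);
    if (raw == nullptr) throw std::bad_alloc();
    std::shared_ptr<void> owner(raw, [](void* p) { std::free(p); });
    Packet* header = new (raw) Packet{0, count};    // trivial destructor, never run
    return std::shared_ptr<Packet>(owner, header);  // aliasing constructor
}

// Copy one double into slot i; memcpy sidesteps any alignment concerns.
void set_payload(Packet& p, int i, double value) {
    assert(i >= 0 && i < p.count);
    unsigned char* base = reinterpret_cast<unsigned char*>(&p) + sizeof(Packet);
    std::memcpy(base + static_cast<std::size_t>(i) * sizeof(double),
                &value, sizeof value);
}

// Read slot i back out of the payload area.
double get_payload(const Packet& p, int i) {
    assert(i >= 0 && i < p.count);
    const unsigned char* base =
        reinterpret_cast<const unsigned char*>(&p) + sizeof(Packet);
    double value;
    std::memcpy(&value, base + static_cast<std::size_t>(i) * sizeof(double),
                sizeof value);
    return value;
}
```

The returned shared_ptr keeps the whole block alive and frees it with free() once the last copy goes away.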
double arr[0];
} __attribute__((packed));
Zero sized arrays are not allowed as data members (nor as any other variable) in C++. Furthermore, there is no such attribute as packed in C++; it is a language extension (as such, it may be considered to be a special hack). __attribute__ itself is a language extension. The standard attribute syntax uses nested square brackets like this: [[example_attribute]].
I'd like to send this as binary data over the network
You probably should serialise the data properly. There are many serialisation specifications, although none of them is ubiquitous and none is implemented in the C++ standard library. Indeed, there isn't a standard API for network communication either.
A straightforward solution is to pick an existing serialisation format and use an existing library that implements it.
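To give an idea of what explicit serialisation looks like when done by hand, here is a minimal sketch that writes two 32-bit fields in big-endian order regardless of the host. It is not a substitute for an established format; `Message`, `put_u32`, `get_u32`, `serialise` and `deserialise` are made-up names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical message with two fixed-width fields.
struct Message {
    std::uint32_t a;
    std::uint32_t b;
};

// Append v to out, most significant byte first.
void put_u32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    out.push_back(static_cast<std::uint8_t>(v >> 24));
    out.push_back(static_cast<std::uint8_t>(v >> 16));
    out.push_back(static_cast<std::uint8_t>(v >> 8));
    out.push_back(static_cast<std::uint8_t>(v));
}

// Read a big-endian 32-bit value starting at pos.
std::uint32_t get_u32(const std::vector<std::uint8_t>& in, std::size_t pos) {
    assert(pos + 4 <= in.size());
    return (static_cast<std::uint32_t>(in[pos]) << 24) |
           (static_cast<std::uint32_t>(in[pos + 1]) << 16) |
           (static_cast<std::uint32_t>(in[pos + 2]) << 8) |
           static_cast<std::uint32_t>(in[pos + 3]);
}

// Field-by-field encoding: no padding or host byte order leaks into the bytes.
std::vector<std::uint8_t> serialise(const Message& m) {
    std::vector<std::uint8_t> out;
    put_u32(out, m.a);
    put_u32(out, m.b);
    return out;
}

Message deserialise(const std::vector<std::uint8_t>& in) {
    return Message{get_u32(in, 0), get_u32(in, 4)};
}
```

Because each field is written explicitly, the struct itself no longer needs to be packed.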
First, let me explain again why I have this packed structure:
it is used for serialization of data over the network so there's a header file with all network packet structures.
I know it generates bad assembly due to alignment issues, but I guess that this problem persists with regular serialization (copy to char * buffer with memcpy).
Zero size arrays are supported both by gcc and clang which I use.
Here's an example of a full program with a solution to my question and its output (same output for gcc and g++).
compiled with -O3 -std=c++17 flags
#include <iostream>
#include <memory>
#include <type_traits>
#include <cstddef>
struct S1
{
~S1() {std::cout << "deleting s1" << std::endl;}
char a;
int b;
int c[0];
} __attribute__((packed));
struct S2
{
char a;
int b;
int c[0];
};
int main(int argc, char **argv)
{
auto s1 = std::shared_ptr<S1>(static_cast<S1 *>(::operator
new(sizeof(S1) + sizeof(int) * 1e6)));
std::cout << "is standart: " << std::is_standard_layout<S1>::value << std::endl;
for (int i = 0; i < 1e6; ++i)
{
s1->c[i] = i;
}
std::cout << sizeof(S1) << ", " << sizeof(S2) << std::endl;
std::cout << offsetof(S1, c) << std::endl;
std::cout << offsetof(S2, c) << std::endl;
return 0;
}
This is the output:
is standart: 1
5, 8
5
8
deleting s1
Is there anything wrong with doing this?
I made sure using valgrind all allocations/frees work properly.
Let's assume that A and B are two classes (or structures) having no inheritance relationships (thus, object slicing cannot work). I also have an object b of the type B. I would like to interpret its binary value as a value of type A:
A a = b;
I could use reinterpret_cast, but I would need to use pointers:
A a = reinterpret_cast<A>(b); // error: invalid cast
A a = *reinterpret_cast<A *>(&b); // correct [EDIT: see *footnote]
Is there a more compact way (without pointers) that does the same? (Including the case where sizeof(A) != sizeof(B))
Example of code that works using pointers: [EDIT: see *footnote]
#include <iostream>
using namespace std;
struct C {
int i;
string s;
};
struct S {
unsigned char data[sizeof(C)];
};
int main() {
C c;
c.i = 4;
c.s = "this is a string";
S s = *reinterpret_cast<S *>(&c);
C s1 = *reinterpret_cast<C *>(&s);
cout << s1.i << " " << s1.s << endl;
cout << reinterpret_cast<C *>(&s)->i << endl;
return 0;
}
*footnote: It worked when I tried it, but it is actually undefined behavior (which means that it may or may not work) - see comments below
No. I think there's nothing in the C++ syntax that allows you to implicitly ignore types. First, that's against the notion of static typing. Second, C++ lacks standardization at binary level. So, whatever you do to trick the compiler about the types you're using might be specific to a compiler implementation.
That being said, if you really want to do it, you should check how your compiler's data alignment/padding works (e.g. struct padding in C++) and whether there's a way to control it (e.g. What is the meaning of "__attribute__((packed, aligned(4)))"). If you're planning to do this across compilers (e.g. with data transmitted across the network), then you should be extra careful. There are also platform issues, like different addressing models and endianness.
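One way to check what your compiler actually does with padding is to ask it directly with sizeof, alignof and offsetof. The following is only a sketch with a made-up `Sample` struct; the exact numbers depend on the compiler and platform, so none are hard-coded here.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical struct whose layout we want to inspect.
struct Sample {
    char a;
    int b;
};

// offsetof is only guaranteed to work for standard-layout types.
static_assert(std::is_standard_layout<Sample>::value,
              "Sample must be standard layout to use offsetof");

// Number of bytes the compiler added on top of the members themselves.
inline std::size_t padding_bytes() {
    return sizeof(Sample) - (sizeof(char) + sizeof(int));
}

// Properties that hold regardless of how much padding was inserted.
inline void check_layout() {
    assert(offsetof(Sample, a) == 0);                 // first member starts the object
    assert(offsetof(Sample, b) % alignof(int) == 0);  // b sits on an int boundary
}
```

Printing `padding_bytes()` for each compiler you care about shows quickly whether two builds agree on the layout.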
Yes, you can do it without a pointer:
A a = reinterpret_cast<A &>(b); // note the '&'
Note that this may be undefined behaviour. Check out the exact conditions at http://en.cppreference.com/w/cpp/language/reinterpret_cast
struct student {
string name;
int age;
};
int main() {
student a1;
cout << a1[0] << endl; //Access the first variable of the struct
cout << a1[1] << endl; //Access the second variable of the struct
}
How could I access and retrieve value from the C++ struct using index instead of using "a1.name" ??
One way to do this is by creating a tuple from the member variables and using std::tie to get at the member by index. The index would have to be known at compile time however. You could wrap this inside a member function of your struct:
#include <tuple>
#include <iostream>
#include <string>
struct student {
std::string name;
int age;
template<size_t I>
auto& get() {
return std::get<I>(std::tie(name, age));
}
};
int main() {
student boy{ "Paul", 12 };
std::cout << "Name: " << boy.get<0>() << " Age: " << boy.get<1>() << std::endl;
//Change members
boy.get<0>() = "John";
boy.get<1>() = 14;
std::cout << "Name: " << boy.get<0>() << " Age: " << boy.get<1>() << std::endl;
}
(Requires at least C++14)
C++11 does not deduce the return type of an ordinary function automatically, so there you could use std::tuple_element to specify the return type instead:
#include <tuple>
#include <iostream>
#include <string>
struct student {
std::string name;
int age;
template<size_t I>
using T = typename std::tuple_element<I, std::tuple<std::string, int>>::type;
template<size_t I>
T<I>& get()
{
return std::get<I>(std::tie(name, age));
}
};
int main() {
student boy{ "Paul", 12 };
std::cout << "Name: " << boy.get<0>() << " Age: " << boy.get<1>() << std::endl;
//Change members
boy.get<0>() = "John";
boy.get<1>() = 14;
std::cout << "Name: " << boy.get<0>() << " Age: " << boy.get<1>() << std::endl;
}
You can't, at least not in the direct manner you want and not without partially redefining what a structure is. I will split my answer into two parts: the first explains possible ways to get at least close to what you want, and the second explains what you actually should do:
Getting down and dirty
There are two ways (that I can currently come up with) that might give you something to think about:
Use a wrapper class - while C++ does increase the flexibility of structures, it doesn't change their purpose as a simple heterogeneous data container. It does, however, allow operator overloading, including the [] operator. So if you create a class that contains the structure as a member (that is, it wraps around the structure), you can expose the structure's data using []. This comes as close to what you want as possible. It does, however, defeat the whole purpose of using a struct, since you can do the same with plain non-struct class members. I did see this not long ago in a C++ library that wrapped a previous C-based version of itself in order to provide more modern features without completely rewriting the C code.
Use a pointer with an offset - indexing generally suggests that the underlying container holds uniform blocks of data. The problem is that a structure doesn't necessarily obey this, since it can contain (just like in your case) multiple types of data. If you can sacrifice the heterogeneity of your structure and stick with a single data type (for example one or more doubles), you can use a pointer and an increasing/decreasing offset to access its members, as long as you always remember how many members the structure has. As with any data, a pointer to the structure holds the address of the beginning of the memory it occupies. It is common practice to use pointers to iterate through arrays, and this works the same way: create a pointer to your structure and then add +1, +2, ... (up to the number of members the struct has). This makes things overly complicated, though, and is prone to error. As mentioned, it also requires using the same type of data inside your structure. You can also create a set of functions that handle the offsets internally, but that idea is similar to the class wrapper proposed above.
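A hedged sketch of the single-type case: rather than raw pointer arithmetic, which silently relies on the members being laid out back to back, this variant uses a table of pointers to members behind an overloaded []. It is a swapped-in technique, not the exact offset approach described above, and `Vec3` is a made-up example type.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical struct whose members all share one type.
struct Vec3 {
    double x, y, z;

    // Indexed access through a table of pointers to members.
    double& operator[](std::size_t i) {
        static constexpr double Vec3::* members[] = {&Vec3::x, &Vec3::y, &Vec3::z};
        assert(i < 3);             // only three members exist
        return this->*members[i];  // resolve the member for this object
    }
};
```

The member names stay available (v.x), and v[0] refers to the same storage, so both styles can be mixed.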
The alternatives ...
From the information you have given, I think you are looking for a completely different type of data: a dictionary, map or list that contains some sort of generic data container which can hold any type of data but also stores that data's type, so that it can be cast back to its original form. Many libraries provide such containers: Qt has QVariant (part of the core module), Boost has boost::variant, standard C++ has std::tuple (since C++11; or, even better, named tuples), and so on. I can speak about Qt in greater detail since I have more experience with it. It offers QVariantList (a typedef for QList<QVariant>), which allows indexing. Of course, all this requires you to 1) abandon your structure and 2) use more advanced containers that may or may not introduce significant drawbacks for whatever you are working on, including licensing issues, memory overhead, larger binaries, handling a lot of extra library files, etc.
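As a rough illustration of the "generic container that remembers the type" idea without Qt or Boost, here is a sketch using std::variant from the standard library (this assumes a C++17 compiler). `Field`, `Row` and `make_student` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Each field holds either a string or an int and remembers which one.
using Field = std::variant<std::string, int>;
// A row is an indexable list of such fields.
using Row = std::vector<Field>;

// Build a row that mirrors the student struct from the question.
Row make_student(std::string name, int age) {
    assert(age >= 0);
    return Row{Field{std::move(name)}, Field{age}};
}
```

Reading a field back needs the type again, for example std::get<std::string>(row[0]), and asking for the wrong type throws std::bad_variant_access, so you trade the struct's compile-time member names for run-time checks.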
How to access C++ struct property value using index?
You can not. C++ language has no feature that would allow this. This could be possible in a language that supports (static) reflection.
You could choose to use a std::tuple instead, which does allow indexed member access, but that's a step down in readability since you don't get to name the members.
I tried to stay as close to your example as possible but I did have to convert the age from int to string. This works and I have found it useful in one application.
struct student
{
std::string name, age;
std::string *elemtnPtr[10];
student()
{
int i=0;
elemtnPtr[i++] = &name;
elemtnPtr[i++] = &age;
}
};
void demo()
{
student a1;
a1.name = "This Works";
a1.age = "99";
std::cout << *a1.elemtnPtr[0] << std::endl;
std::cout << *a1.elemtnPtr[1] << std::endl;
}
You cannot.
Not until reflection is introduced in C++. It was hoped for in C++20 but did not become part of that standard.
Some projects introduce tuples enhanced with names, but those are still not real structs.
Consider the trivial test of this swap function in C++ which uses pass by pointer.
#include <iostream>
using std::cout;
using std::endl;
void swap_ints(int *a, int *b)
{
int temp = *a;
*a = *b;
*b = temp;
return;
}
int main(void)
{
int a = 1;
int b = 0;
cout << "a = " << a << "\t" << "b = " << b << "\n\n";
swap_ints(&a, &b);
cout << "a = " << a << "\t" << "b = " << b << endl;
return 0;
}
Does this program use more memory than if I had passed by reference, as in this function declaration:
void swap_ints(int &a, int &b)
{
int temp = a;
a = b;
b = temp;
return;
}
Does this pass-by-reference version of the C++ function use less memory, by not needing to create the pointer variables?
And does C not have the same pass-by-reference ability that C++ does? If not, why not, given that it would mean more memory-efficient code? What is the pitfall that keeps C from adopting it? I suppose what I am not considering is that C++ probably creates pointers behind the scenes to achieve this. Is this what the compiler actually does, so that C++ really has no true advantage besides neater code?
The only way to be sure would be to examine the code the compiler generated for each and compare the two to see what you get.
That said, I'd be a bit surprised to see a real difference (at least when optimization was enabled), at least for a reasonably mainstream compiler. You might see a difference for a compiler on some really tiny embedded system that hasn't been updated in the last decade or so, but even there it's honestly pretty unlikely.
I should also add that in most cases I'd expect to see code for such a trivial function generated inline, so there is no function call or parameter passing involved at all. In a typical case, it's likely to come down to nothing more than a couple of loads and stores.
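If you want to run that comparison yourself, a small sketch like the following puts both versions side by side; compiling it with something like g++ -O2 -S lets you read the generated assembly for each. The function names are made up for this example, and no particular assembly output is claimed here.

```cpp
#include <cassert>

// Pointer version: the caller passes addresses explicitly.
void swap_by_pointer(int* a, int* b) {
    assert(a != nullptr && b != nullptr);
    int temp = *a;
    *a = *b;
    *b = temp;
}

// Reference version: same operations, the address-taking is implicit.
void swap_by_reference(int& a, int& b) {
    int temp = a;
    a = b;
    b = temp;
}
```

Both functions have the same observable effect; the practical difference is that the pointer version can be handed a null pointer, while a reference is expected to refer to a valid object.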
Don't confuse counting variables in your code with counting memory used by the processor. C++ has many abstractions that hide the inner workings of the compiler in order to make things simpler and easier for a human to follow.
By design, C does not have quite as many levels of abstractions as C++.
I've been reading about strict aliasing quite a lot lately. The C/C++ standards say that the following code is invalid (undefined behavior, to be precise), since the compiler might have the value of *a cached somewhere and would not recognize that it needs to update that value when I write through b:
float *a;
...
int *b = reinterpret_cast<int*>(a);
*b = 1;
The standard also says that char* can alias anything, so (correct me if I'm wrong) compiler would reload all cached values whenever a write access to a char* variable is made. Thus the following code would be correct:
float *a;
...
char *b = reinterpret_cast<char*>(a);
*b = 1;
But what about the cases when pointers are not involved at all? For example, I have the following code, and GCC throws warnings about strict aliasing at me.
float a = 2.4;
int32_t b = reinterpret_cast<int&>(a);
What I want to do is just to copy raw value of a, so strict aliasing shouldn't apply. Is there a possible problem here, or just GCC is overly cautious about that?
EDIT
I know there's a solution using memcpy, but it results in code that is much less readable, so I would like not to use that solution.
EDIT2
int32_t b = *reinterpret_cast<int*>(&a); also does not work.
SOLVED
This seems to be a bug in GCC.
If you want to copy some memory, you could just tell the compiler to do that:
Edit: added a function for more readable code:
#include <iostream>
using std::cout; using std::endl;
#include <string.h>
template <class T, class U>
T memcpy(const U& source)
{
T temp;
memcpy(&temp, &source, sizeof(temp));
return temp;
}
int main()
{
float f = 4.2;
cout << "f: " << f << endl;
int i = memcpy<int>(f);
cout << "i: " << i << endl;
}
Edit: As user/GMan correctly pointed out in the comments, a full-featured implementation could check that T and U are PODs. However, given that the name of the function is still memcpy, it might be OK to rely on your developers treating it as having the same constraints as the original memcpy. That's up to your organization. Also, use the size of the destination, not the source. (Thanks, Oli.)
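A sketch of what such a checked version might look like. `copy_bits` is a made-up name chosen to avoid overloading memcpy, and the equal-size requirement is an extra assumption of this sketch, not something the answer above demands. C++20 offers std::bit_cast in <bit> for the same job.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical checked variant: copy the bytes of source into a new To.
template <class To, class From>
To copy_bits(const From& source) {
    static_assert(sizeof(To) == sizeof(From),
                  "source and destination must have the same size");
    static_assert(std::is_trivially_copyable<From>::value,
                  "source must be trivially copyable");
    static_assert(std::is_trivially_copyable<To>::value,
                  "destination must be trivially copyable");
    To result;
    std::memcpy(&result, &source, sizeof(To));  // size of the destination
    return result;
}

// Usage sketch: a float survives a round trip through a 32-bit integer.
inline void demo_round_trip() {
    const float f = 4.2f;
    const std::uint32_t bits = copy_bits<std::uint32_t>(f);
    assert(copy_bits<float>(bits) == f);
}
```

A mismatch in size or a non-trivially-copyable type such as std::string is rejected at compile time instead of silently copying garbage.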
Basically, the strict aliasing rule is: "it is undefined to access memory through a type other than its declared one, except as an array of characters". So GCC isn't being overcautious.
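The character exception can be sketched like this: looking at an object's bytes through unsigned char is allowed, whereas reading the same float through an int reference is not. `object_bytes` is a hypothetical helper name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: return a copy of the bytes that make up obj.
template <class T>
std::array<unsigned char, sizeof(T)> object_bytes(const T& obj) {
    std::array<unsigned char, sizeof(T)> out{};
    // Accessing any object through unsigned char is permitted by the aliasing rule.
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&obj);
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        out[i] = p[i];
    }
    return out;
}

// Usage sketch: the bytes copied out rebuild the original float exactly.
inline void demo_bytes() {
    const float a = 2.4f;
    const auto bytes = object_bytes(a);
    float back;
    std::memcpy(&back, bytes.data(), sizeof back);
    assert(back == a);
}
```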
If this is something you need to do often, you can also just use a union, which IMHO is more readable than casting or memcpy for this specific purpose:
union floatIntUnion {
float a;
int32_t b;
};
int main() {
floatIntUnion fiu;
fiu.a = 2.4;
int32_t &x = fiu.b;
cout << x << endl;
}
I realize that this doesn't really answer your question about strict-aliasing, but I think this method makes the code look cleaner and shows your intent better.
And also realize that even doing the copies correctly, there is no guarantee that the int you get out will correspond to the same float on other platforms, so count any network/file I/O of these floats/ints out if you plan to create a cross-platform project.
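If you do need to rely on a particular representation, you can at least make the build fail where the assumption does not hold. A minimal sketch, assuming you want 32-bit IEEE 754 floats; `float_to_bits` and `bits_to_float` are made-up names, and byte order still has to be handled separately before anything goes on the wire.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Refuse to compile on platforms where the assumptions do not hold.
static_assert(std::numeric_limits<float>::is_iec559,
              "float is not IEEE 754 on this platform");
static_assert(sizeof(float) == sizeof(std::uint32_t),
              "float is not 32 bits wide on this platform");

// Copy the representation of f into an integer of the same size.
std::uint32_t float_to_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Inverse of float_to_bits.
float bits_to_float(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Usage sketch: the conversion is lossless in both directions.
inline void demo_float_bits() {
    assert(bits_to_float(float_to_bits(2.4f)) == 2.4f);
}
```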