Can I pass this through a function? - c++

I'm wrapping a scripting library and this macro exists.
#define asOFFSET(s,m) ((size_t)(&reinterpret_cast<s*>(100000)->m)-100000)
What type is m? It has this example:
struct MyStruct
{
int a;
};
asOFFSET(MyStruct,a)
I want to put this into a function.

The macro is (most likely) used by the scripting library to find out the internal layout of a class's members without making assumptions about its type, architecture or inheritance model.
(A simple example is discussed here).
For most C++ programs, this information (memory layout) should ideally not be needed at all. But on the off chance that you do need it (e.g. if you're writing an analyser / debugger), you would be better off retaining this macro as-is (or, preferably, replacing its usage in your code with offsetof, as Michael Anderson points out). There are compiler-specific implementations of offsetof which are:
more efficient
less likely to be reported as performing an invalid operation (e.g. dereferencing an invalid memory address when using tools like Valgrind).
With these equivalent options, a hand-spun alternative or wrapper should ideally not be needed.
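For instance, a minimal sketch (not the library's own code) of that replacement, using the standard macro from <cstddef>:
#include <cstddef> // offsetof
struct MyStruct
{
int a;
};
// offsetof(MyStruct, a) yields the same byte offset that asOFFSET(MyStruct, a)
// is meant to produce, and it is an integral constant expression, so it can be
// checked at compile time.
static_assert(offsetof(MyStruct, a) == 0, "a is the first member of MyStruct");
int main() { return 0; }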

This is an implementation of the offsetof macro. m is any member of s. It doesn't have a corresponding C or C++ type - but is closely related to the concept of pointer to member.

The purpose of the macro is to determine, given the name of a struct and the name of any member of that struct, the distance in memory from the beginning of an arbitrary instance of the struct to the location of that member in the same instance.
m does not have a "type", and neither does s. The entire concept of "type" goes out the window when you use macros. This stuff simply is not C++; it's basically a completely separate language that's used to edit C++ code in-place. When the preprocessor runs, asOFFSET(MyStruct, a) will be literally replaced with the text ((size_t)(&reinterpret_cast<MyStruct*>(100000)->a)-100000) before the compiler even begins its work.
((size_t)(&reinterpret_cast<MyStruct*>(100000)->a)-100000) is intended to evaluate to 0, because the a member of MyStruct instances appears at the beginning of each instance. I'm not actually 100% sure that this is legal behaviour per the specification, but the intent is as follows:
Pretend that there is an instance of MyStruct at the memory location 100000, by treating the number 100000 as if it were a pointer to a MyStruct.
Get the address in memory of the a member of this fake struct, and subtract 100000 again. That gives us the distance from the beginning of the fake struct to the specified member of that fake struct.
Cast that numeric value to size_t (the unsigned integer type used for measuring memory sizes and allocations).
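To answer the literal question (can this be put into a function?): yes, if the member is passed as a pointer-to-member. Below is a minimal sketch; member_offset is a name I made up, and it carries the same caveats as the macro itself (it relies on reinterpret_cast of an arbitrary address and assumes size_t is wide enough to hold a pointer), so offsetof remains the preferable option where the type qualifies.
#include <cstddef>
struct MyStruct
{
int a;
};
// Hypothetical wrapper: computes the same offset as asOFFSET(s, m), but takes
// the member as a pointer-to-member instead of as a macro argument.
template <typename S, typename M>
std::size_t member_offset(M S::*member)
{
return reinterpret_cast<std::size_t>(&(reinterpret_cast<S*>(100000)->*member)) - 100000;
}
int main()
{
std::size_t off = member_offset(&MyStruct::a); // same value as asOFFSET(MyStruct, a)
return off == offsetof(MyStruct, a) ? 0 : 1;
}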

Related

Flex/Bison: cannot use semantic_type

I'm trying to create a C++ flex/bison parser. I used this tutorial as a starting point and did not change any bison/flex configuration. I am now stuck at the point of trying to unit test the lexer.
I have a function in my unit tests that directly calls yylex, and checks the result of it:
private: static void checkIntToken(MyScanner &scanner, Compiler *comp, unsigned long expected, unsigned char size, char isUnsigned, unsigned int line, const std::string &label) {
yy::MyParser::location_type loc;
yy::MyParser::semantic_type semantic; // <---- it seems like the destructor of this variable causes the crash
int type = scanner.yylex(&semantic, &loc, comp);
Assert::equals(yy::MyParser::token::INT, type, label + "__1");
MyIntToken* token = semantic.as<MyIntToken*>();
Assert::equals(expected, token->value, label + "__2");
Assert::equals(size, token->size, label + "__3");
Assert::equals(isUnsigned, token->isUnsigned, label + "__4");
Assert::equals(line, loc.begin.line, label + "__5");
//execution reaches this point, and then the program crashes
}
The error message is:
program: ../src/__autoGenerated__/MyParser.tab.hh:190: yy::variant<32>::~variant() [S = 32]: Assertion `!yytypeid_' failed.
I have tried to follow the logic in the auto-generated bison files and make some sense out of it, but I did not succeed and ultimately gave up. I then searched the web for advice about this error message, but did not find any.
The location indicated by the error has the following code:
~variant (){
YYASSERT (!yytypeid_);
}
EDIT: The problem disappears only if I remove the
%define parse.assert
option from the bison file. But I am not sure if this is a good idea...
What is the proper way to obtain the value of the token generated by flex, for unit testing purposes?
Note: I've tried to explain bison variant types to the best of my knowledge. I hope it is accurate but I haven't used them aside from some toy experiments. It would be an error to assume that this explanation in any way implies an endorsement of the interface.
The so-called "variant" type provided by bison's C++ interface is not a general-purpose variant type. That was a deliberate decision based on the fact that the parser is always able to figure out the semantic type associated with a semantic value on the parser stack. (This fact also allows a C union to be used safely within the parser.) Recording type information within the "variant" would therefore be redundant. So they don't. In that sense, it is not really a discriminated union, despite what one might expect of a type named "variant".
(The bison variant type is a template with an integer (non-type) template argument. That argument is the size in bytes of the largest type which is allowed in the variant; it does not in any other way specify the possible types. The semantic_type alias serves to ensure that the same template argument is used for every bison variant object in the parser code.)
Because it is not a discriminated union, its destructor cannot destruct the current value; it has no way to know how to do that.
This design decision is actually mentioned in the (lamentably insufficient) documentation for the Bison "variant" type. (When reading this, remember that it was originally written before std::variant existed. These days, it would be std::variant which was being rejected as "redundant", although it is also possible that the existence of std::variant might have had the happy result of revisiting this design decision). In the chapter on C++ Variant Types, we read:
Warning: We do not use Boost.Variant, for two reasons. First, it appeared unacceptable to require Boost on the user’s machine (i.e., the machine on which the generated parser will be compiled, not the machine on which bison was run). Second, for each possible semantic value, Boost.Variant not only stores the value, but also a tag specifying its type. But the parser already “knows” the type of the semantic value, so that would be duplicating the information.
Therefore we developed light-weight variants whose type tag is external (so they are really like unions for C++ actually).
And indeed they are. So any use of a bison "variant" must have a definite type:
You can build a variant with an argument of the type to build. (This is the only case where you don't need a template parameter, because the type is deduced from the argument. You would have to use an explicit template parameter only if the argument were not of the precise type; for example, an integer of lesser rank.)
You can get a reference to the value of known type T with as<T>. (This is undefined behaviour if the value has a different type.)
You can destruct the value of known type T with destroy<T>.
You can copy or move the value from another variant of known type T with copy<T> or move<T>. (move<T> involves constructing and then destructing a T(), so you might not want to do it if T had an expensive default constructor. On the whole, I'm not convinced by the semantics of the move method. And its name conflicts semantically with std::move, but again it came first.)
You can swap the values of two variants which both have the same known type T with swap<T>.
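To make that list concrete, here is a minimal sketch reusing the question's types (how MyIntToken is constructed is an assumption on my part, and the fragment presumes the generated parser header is included):
yy::MyParser::semantic_type semantic; // empty; no value of any type stored yet
semantic.build<MyIntToken*>(new MyIntToken()); // the variant now holds a MyIntToken*
MyIntToken* token = semantic.as<MyIntToken*>(); // must name exactly the stored type
// ... use token ...
delete token; // the variant never owns the pointee
semantic.destroy<MyIntToken*>(); // must run before semantic's destructor does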
Now, the generated parser understands all these restrictions, and it always knows the real types of the "variants" it has at its disposal. But you might come along and try to do something with one of these objects in a way that violates a constraint. Since the object really doesn't have any way to check the constraint, you'll end up with undefined behaviour which will probably have some disastrous eventual consequence.
So they also implemented an option which allows the "variant" to check the constraints. Unsurprisingly, this consists of adding a discriminator. But since the discriminator is only used to validate and not to modify behaviour, it is not a small integer which chooses between a small number of known alternatives, but rather a pointer to a std::type_info (or NULL if the variant does not yet contain a value). (To be fair, in most cases alignment constraints mean that using a pointer for this purpose is no more expensive than using a small enum. All the same...)
So that's what you're running into. You enabled assertions with %define parse.assert; that option was provided specifically to prevent you from doing what you are trying to do, which is let the variant object's destructor run before the variant's value is explicitly destructed.
So the "correct" way to avoid the problem is to insert an explicit call at the end of the scope:
// execution comes to this point, and then, without the following
// call, the program will fail on an assertion
semantic.destroy<MyIntToken*>();
}
With the parse assertion enabled, the variant object will be able to verify that the types specified as template parameters to semantic.as<T> and semantic.destroy<T> are the same types as the value stored in the object. (Without parse.assert, that too is your responsibility.)
Warning: opinion follows.
In case anyone reading this cares, my preference for using real std::variant types comes from the fact that it is actually quite common for the semantic value of an AST node to require a discriminated union. The usual solution (in C++) is to construct a type hierarchy which is, in some ways, entirely artificial, and it is quite possible that std::variant can better express the semantics.
In practice, I use the C interface and my own discriminated union implementation.

Access to protected member through member-pointer: is it a hack?

We all know that members specified protected in a base class can only be accessed through a derived class's own instance. This is a feature of the Standard, and it has been discussed on Stack Overflow multiple times:
Cannot access protected member of another instance from derived type's scope
Why can't my object access protected members of another object defined in common base class?
And others.
But it seems possible to work around this restriction with member pointers, as user chtz has shown me:
struct Base { protected: int value; };
struct Derived : Base
{
void f(Base const& other)
{
//int n = other.value; // error: 'int Base::value' is protected within this context
int n = other.*(&Derived::value); // ok??? why?
(void) n;
}
};
Live demo on coliru
Why is this possible, is it a wanted feature or a glitch somewhere in the implementation or the wording of the Standard?
From comments emerged another question: if Derived::f is called with an actual Base, is it undefined behaviour?
The fact that a member is not accessible using a class member access expression (aclass.amember, [expr.ref]) due to access control ([class.access]) does not make this member inaccessible using other expressions.
The expression &Derived::value (whose type is int Base::*) is perfectly standard compliant, and it designates the member value of Base. Then the expression a_base.*p where p is a pointer to a member of Base and a_base an instance of Base is also standard compliant.
So any standard-compliant compiler shall treat the expression other.*(&Derived::value) as defined behaviour: it accesses the member value of other.
is it a hack?
In similar vein to using reinterpret_cast, this can be dangerous and may potentially be a source of hard to find bugs. But it's well formed and there's no doubt whether it should work.
To clarify the analogy: The behaviour of reinterpret_cast is also specified exactly in the standard and can be used without any UB. But reinterpret_cast circumvents the type system, and the type system is there for a reason. Similarly, this pointer to member trick is well formed according to the standard, but it circumvents the encapsulation of members, and that encapsulation (typically) exists for a reason (I say typically, since I suppose a programmer can use encapsulation frivolously).
[Is it] a glitch somewhere in the implementation or the wording of the Standard?
No, the implementation is correct. This is how the language has been specified to work.
A member function of Derived can obviously form &Derived::value, since value is a protected member of a base.
The result of that operation is a pointer to a member of Base. This can be applied to a reference to Base. Member access privileges do not apply to pointers to members: they apply only to the names of the members.
From comments emerged another question: if Derived::f is called with an actual Base, is it undefined behaviour?
Not UB. Base has the member.
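To make the name-versus-pointer distinction concrete, here is a small sketch (make_ptr and read_value are names I made up): the name &Derived::value is access-checked where it is written, but applying the resulting pointer-to-member is never access-checked.
struct Base { protected: int value; };
struct Derived : Base
{
// Naming the member is access-checked here; Derived may do it, provided the
// nested-name-specifier is Derived (or a class derived from it), not Base.
static int Base::* make_ptr() { return &Derived::value; }
};
// A free function has no special access rights, yet applying the pointer is fine:
int read_value(Base const& b)
{
return b.*(Derived::make_ptr()); // well-defined even if b is an actual Base
}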
Just to add to the answers and zoom in a bit on the horror I can read between your lines. If you see access specifiers as 'the law', policing you to keep you from doing 'bad things', I think you are missing the point. public, protected, private, const ... are all part of a system that is a huge plus for C++. Languages without it may have many merits but when you build large systems such things are a real asset.
Having said that: I think it's a good thing that it is possible to get around almost all the safety nets provided to you. As long as you remember that 'possible' does not mean 'good'. This is why it should never be 'easy'. But for the rest - it's up to you. You are the architect.
Years ago I could simply do this (and it may still work in certain environments):
#define private public
Very helpful for 'hostile' external header files. Good practice? What do you think? But sometimes your options are limited.
So yes, what you show is kind of a breach in the system. But hey, what keeps you from deriving and handing out public references to the member? If horrible maintenance problems turn you on - by all means, why not?
Basically what you're doing is tricking the compiler, and this is supposed to work. I often see this kind of question; sometimes people get bad results and sometimes it works, depending on how the code translates to assembly.
I remember seeing a case with a const keyword on an integer, but then with some trickery the guy was able to change the value and successfully circumvented the compiler's awareness. The result was a wrong value for a simple mathematical operation. The reason is simple: x86 assembly does make a distinction between constants and variables, because some instructions contain constants in their opcode. So, since the compiler believes it's a constant, it'll treat it as a constant and deal with it in an optimized way with the wrong CPU instruction, and bam, you have an error in the resulting number.
In other words: The compiler will try to enforce all the rules it can enforce, but you can probably eventually trick it, and you may or may not get wrong results based on what you're trying to do, so you better do such things only if you know what you're doing.
In your case, the pointer &Derived::value is essentially the offset in bytes of the member from the beginning of the class. This is basically how the compiler accesses it, so the compiler:
Doesn't see any problem with permissions, because you're accessing value through Derived at compile time.
Can do it, because you're applying a byte offset to an object that has the same layout as Derived (well, obviously, the base).
So, you're not violating any rules. You successfully circumvented the compilation rules. You shouldn't do it, exactly because of the reasons described in the links you attached, as it breaks OOP encapsulation, but, well, if you know what you're doing...

How does c++11 resolve constexpr into assembly?

The basic question:
Edit (the question follows):
class foo {
public:
constexpr foo() { }
constexpr int operator()(const int& i) { return int(i); }
};
Performance is a non-trivial issue. How does the compiler actually compile the above? I know how I want it to be resolved, but how does the specification actually specify it will be resolved?
1) Seeing the type int has a constexpr constructor, create an int object and compile the string of bytes that makes up the object directly into the code?
2) Replace any calls to the overload with a call to int's constructor, which for some unknown reason isn't constexpr? (Inlining the call.)
3) Create a function, call the function, and have that function call int's constructor?
Why I want to know, and how I plan to use the knowledge
Edit (background only follows):
The real library I'm working with uses template arguments to decide how a given type should be passed between functions. That is, by reference or by value, because the exact size of the type is unknown. It will be a user's responsibility to work within the limits I give them, but I want these limits to be as light and user-friendly as I can sanely make them.
I expect a simple single-byte character to be passed around, in which case it should be passed by value. But I do not bar a 300-megabyte behemoth that does several minutes of recalculation every time a copy constructor is invoked, in which case passing by reference makes more sense. I have only a list of requirements that a type must comply with, not a set cap on what a type can or cannot do.
The reason I want to know the answer to my question is so I can in good faith make a function object that accepts this unknown template argument and then decides how, when, or even how much of an object should be copied, via a virtual member function and a pointer allocated with new if so required. If the compiler resolves constexpr badly, I need to know so I can abandon this line of thought and/or find a new one. Again, it will be a user's responsibility to work within the limits I give them, but I want these limits to be as light and user-friendly as I can sanely make them.
Edit: Thank you for your answers. The only real question was the second sentence; it has now been answered. Everything else is background. If more background is required, allow me to restate the above:
I have a template with four arguments. The goal of the template is a routing protocol, be that TCP/IP (unlikely) or node-to-node within a game (possible). The first two arguments are for data storage; they have no requirement beyond a list of operators for each. The last two define how the data is passed within the template. By default this is by reference; for performance and freedom of use, these can be changed to pass information by value at a user's request.
Each is expected to be a single byte long. But in the case of a metric for an EIGRP- or OSPF-like protocol, the second template argument could be a compound of a dozen or more different variables, each taking non-trivial time to copy or recompute.
For ease of use I am investigating a function object that accepts the third and fourth template arguments to handle special cases and polymorphic classes that would otherwise fail to function or copy correctly. The goal is to not force a user to rebuild their objects from scratch. This would require planning for virtual functions to perform deep copies, or any number of other unknown oddities. The usefulness of the function object depends on how sanely the compiler can be depended on not to generate a cascade of function calls.
More helpful I hope?
The C++11 standard doesn't say anything about how constexpr will be compiled down to machine instructions. The standard just says that expressions that are constexpr may be used in contexts where a compile time constant value is required. How any particular compiler chooses to translate that to executable code is an implementation issue.
Now in general, with optimizations turned on you can expect a reasonable compiler to not execute any code at runtime for many uses of constexpr but there aren't really any guarantees. I'm not really clear on what exactly you're asking about in your example so it's hard to give any specifics about your use case.
constexpr expressions are not special. For all intents and purposes, they're basically const unless the context they're used in is constexpr and all variables/functions are also constexpr. It is implementation defined how the compiler chooses to handle this. The Standard never deals with implementation details because it speaks in abstract terms.
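A small sketch of the practical upshot (illustrative only; nothing here is mandated beyond the constant-expression case):
class foo {
public:
constexpr foo() { }
constexpr int operator()(const int& i) const { return int(i); } // const added so a constexpr object can call it
};
int main()
{
constexpr foo f;
constexpr int n = 42;
constexpr int a = f(n); // constant-expression context: evaluated at compile time, no runtime code required
int x = 7;
int b = f(x); // ordinary runtime context: the compiler may inline it, emit a call, or fold it away
return a + b;
}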

Does the "type" of a struct change from computer to computer?

Let's assume I have this code;
class Ingredients{
public:
Ingredients(int size,string name);
int getsize();
private:
string name;
int size;
};
struct Chain{
Ingredients* ing;
Chain* next;
};
And in my main;
int main()
{
Chain* chain = nullptr; //the operand of typeid is not evaluated for non-polymorphic types
cout<<typeid(chain).name()<<endl;
cout<<typeid(chain->ing).name()<<endl;
cout<<typeid(chain->next).name()<<endl;
}
my headers are;
#include <iostream>
#include <typeinfo>
using namespace std;
and finally outputs;
P8Chain
P12Ingredients
P8Chain
So my question is: are these types reliable to use in code? If the types change from computer to computer (because of the P8 and P12 prefixes I am not sure they stay the same), they wouldn't be reliable. What are your opinions?
Also, they do not change between runs.
They depend on your compiler, so don't use them inside your code.
The C++ standard says the following concerning typeid (section 5.2.8):
The result of a typeid expression is an lvalue of static type const std::type_info and dynamic type const std::type_info or const name where name is an implementation-defined class derived from std::type_info.
What you can do if you want some sort of RTTI is
if (typeid(myobject) == typeid(Chain)) {
do_something();
}
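As a self-contained sketch of the difference (myobject and do_something above are placeholders): comparing type_info objects is portable within a single program, while the name() text is whatever the implementation chooses.
#include <iostream>
#include <typeinfo>
using namespace std;
struct Ingredients; // only used through a pointer here
struct Chain{
Ingredients* ing;
Chain* next;
};
int main()
{
Chain* chain = nullptr; // typeid does not evaluate a non-polymorphic operand
cout << boolalpha << (typeid(chain->next) == typeid(Chain*)) << endl; // true, and portable
cout << typeid(Chain).name() << endl; // implementation-defined text
}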
It depends on what you mean by "type". The more or less standard definition of type is the set of values and operations the type can take, and this will change from one machine to the next, because the size of int will change, or the maximum length of a string. On the other hand, there is a very real sense in which type is what the compiler and the C++ standard consider it to be, which very roughly would correspond to, or at least be identified by, the scoped name. Finally, the std::type_info::name() function is seriously underspecified. At best, it can be useful for debugging (e.g. logging the actual derived class a function was called with), and not all compilers provide even that. As far as the standard is concerned, a compiler could always return an empty string and still be conforming.
According to the standard, name() is implementation-defined (see also Stroustrup's book, p. 415 in the 3rd edition).
The word "type" has multiple meanings.
In terms of type theory, a C++ struct doesn't really define a type at all.
More usefully, the C++ language standard talks about types in a way that can be taken rigorously, even if it never quite rigorously defines the term. In those terms, the struct declaration does define a unique and consistent type.
Maybe even more usefully, to your C++ compiler (and C linker), a type is represented by things like a memory layout, a mangled name, a list of member names and types, pointers to special and normal member functions, possibly pointers to vtable and/or rtti info, etc. This will not be the same from implementation to implementation. Between builds with the same implementation, some details (like where the pointers point) may change even if no relevant code changes, but you could probably define a useful subset of information that you could usefully call a "type" that doesn't change.
Beyond that, the type_info instance defined by section 18.5.1 of the standard cannot change within the bounds of what's explicitly defined, and the result of typeid as defined by section 5.2.8 has to be that instance or a compatible object that still compares equal to it. So, it sounds to me like, if it were possible to load up the type_info instances from two different runs at the same time, operator== would have to return true. However, it's not actually possible to load up type_info instances from two different runs (there's no requirement that they be serializable in any way, for example), so this may not be relevant.
Finally, the name displayed by typeid().name(), as defined by section 18.5.1, is just any implementation-defined NTBS. It could change between builds, runs, even calls within the same run. It could always be empty. Practically, it'll often be something vaguely useful for debugging, but that isn't guaranteed—and, even if it were, that wouldn't help, because "vaguely useful for debugging" doesn't have to mean "unique within a run and persistent across runs".
If you're asking about a specific compiler, the documentation for the compiler may give stricter guarantees than the standard requires. For example, I believe that on various platforms g++ guarantees that it'll use the C++ ABI defined at CodeSourcery, and will not change ABI versions within minor compiler versions, and will use the mangled names defined in the ABI as the type_info names. This means taking the binary to another computer won't affect the names, and even recompiling the source on another computer with the same platform and g++ version won't affect the names.

Casting big POD into small POD - guaranteed to work?

Suppose I have a POD struct which has more than 40 members. These members are not built-in types; rather, most of them are POD structs, which in turn have lots of members, most of which are POD structs again. This pattern goes on for many levels - POD has POD has POD and so on - up to even 10 or so levels.
I cannot post the actual code, so here is one very simple example of this pattern:
//POD struct
struct Big
{
A a[2]; //POD has POD
B b; //POD has POD
double dar[6];
int m;
bool is;
double d;
char c[10];
};
And A and B are defined as:
struct A
{
int i;
int j;
int k;
};
struct B
{
A a; //POD has POD
double x;
double y;
double z;
char *s;
};
It's really a very simplified version of the actual code, which was written (in C) almost 20 years back by Citrix Systems when they came up with the ICA protocol. Over the years, the code has changed a lot. Now we have the source code, but we cannot know which code is being used in the current version of ICA and which has been discarded, as the discarded parts are also present in the source code.
That is the background of the problem. The problem is: now that we have the source code and we're building a system on top of the ICA protocol, at some point we need to know the values of a few members of the big struct. A few members, not all. Fortunately, those members appear at the beginning of the struct, so can we write a struct which is part of the big struct, as:
//Part of struct B
//Whatever members it has, they are in the same order as they appear in Big.
struct partBig
{
A a[2];
B b;
double dar[6];
//rest is ignored
};
Now suppose we know a pointer to the Big struct (which we know by deciphering the protocol and data streams); then can we write this:
Big *pBig = GetBig();
partBig *part = (partBig*)pBig; //Is this safe?
/*here can we pretend that part is actually Big, so as to access
first few members safely (using part pointer), namely those
which are defined in partBig?*/
I don't want to define the entire Big struct in our code, as it has too many POD members, and if I define the struct entirely I have to first define hundreds of other structs. I don't want to do that, and even if I did, I doubt I could do it correctly, as I don't know all the structs correctly (as to which version is being used today, and which has been discarded).
I've done the casting already, and it has seemed to work for the last year; I haven't seen any problem with it. But now I thought, why not start a topic and ask everyone? Maybe I'll get a better solution, or at least collect some important notes.
Relevant references from the language specification will be appreciated. :-)
Here is a demo of such casting: http://ideone.com/c7SWr
The language specification has some features that are similar to what you are trying to do: 6.5.2.3/5 states that if you have several structs that share a common initial sequence of members, and those structs are members of the same union, you are allowed to inspect these common members through any of the structs in the union, regardless of which one is currently active. So, from this guarantee one can easily derive that structs with a common initial sequence of members should have identical memory layout for these common members.
However, the language does not seem to explicitly allow doing what you are doing, i.e. reinterpreting one struct object as another unrelated struct object through a pointer cast. (The language allows you to access the first member of a struct object through a pointer cast, but not what you do in your code). So, from that perspective, what you are doing might be a strict-aliasing violation (see 6.5/7). I'm not sure about this though, since it is not immediately clear to me whether the intent of 6.5/7 was to outlaw this kind of access.
However, in any case, I don't think that compilers that use strict-aliasing rules for optimization do it at aggregate type level. Most likely they only do it at fundamental type level, meaning that your code should work fine in practice.
Yes, assuming your small structure is laid out with the same packing/padding rules as the big one, you'll be fine. The memory layout order of fields in a structure is well-defined by the language specifications.
The only way to get partBig from Big is using reinterpret_cast (or a C-style cast, as in your question), and the standard does not specify what happens in that case. However, it should work as expected; it would probably fail only on some super-exotic platform.
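If you want a bit more confidence than "it has worked so far", a cheap compile-time sanity check is possible. This is a sketch that assumes the definitions of Big and partBig from the question are in scope and that the types are standard-layout, so offsetof applies:
#include <cstddef> // offsetof
static_assert(offsetof(partBig, a) == offsetof(Big, a), "prefix layout mismatch");
static_assert(offsetof(partBig, b) == offsetof(Big, b), "prefix layout mismatch");
static_assert(offsetof(partBig, dar) == offsetof(Big, dar), "prefix layout mismatch");
static_assert(sizeof(partBig) <= sizeof(Big), "partBig must not be larger than Big");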
The relevant type-aliasing rule is 3.10/10: "If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined: ...".
The list contains two cases that are relevant to us:
a type similar (as defined in 4.4) to the dynamic type of the object
an aggregate or union type that includes one of the aforementioned types among its elements or nonstatic data members (including, recursively, an element or non-static data member of a subaggregate or contained union)
The first case isn't sufficient: it merely covers cv-qualifications. The second case allows the union hack that AndreyT mentions (the common initial sequence), but not anything else.
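For completeness, here is a minimal sketch of the union form that the common-initial-sequence rule actually blesses (simplified stand-in types, not the real ICA structures):
struct A { int i; int j; int k; };
struct Big2 { A a[2]; double d; int m; }; // simplified stand-in for the question's Big
struct PartBig2 { A a[2]; double d; }; // same initial members, in the same order
union View { Big2 big; PartBig2 part; }; // both are standard-layout structs
int main()
{
View v = { { { {1, 2, 3}, {4, 5, 6} }, 7.0, 8 } }; // initializes v.big, the active member
// Inspecting the common initial sequence through the other member is permitted:
return (v.part.a[0].i == 1 && v.part.d == 7.0) ? 0 : 1;
}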