C++ alignment using pragma pack and inheritance

I am not too familiar with the concept of packing/alignment in C++. I did some reading about it recently and have a question.
I am deriving from a base class (written by somebody else; I have the header for it). The author of this class used pragma pack to align members to a 1-byte boundary. However, I am not sure whether it is necessary for the derived class to do the same or not. What are the consequences of packing/not packing the derived class with the same alignment as the base class?
Any help/suggestions will be greatly appreciated.
Thanks

In everyday, well-written C++ code it doesn't normally matter whether there's padding or not, though the choice may affect performance. So you should be able to derive from that base class without worrying about explicitly specifying any packing yourself. That said, the base class may be packed because there will be a massive number of instances in memory, or because instances are bitwise-copied to a file or network stream; in that case you'll want to consider whether instances of your new class may end up mixed in with that data, and whether you also want to pack the extra data members for the same reasons.
Not all code is well-written though. For example, if the program treats the objects as binary blobs of data and uses functions like memcmp on them, or does a byte-wise void*/size checksum, then garbage data in padding members may break the logic/behaviour. If the data is written object by object with particular separator or delimiter characters, then embedded garbage may inject unwanted separators/delimiters and break the reading/parsing logic. There's no way to assess these risks without doing an impact study on the existing code.
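To make that concrete, here is a minimal sketch (the type names are invented for illustration; the exact sizes are implementation-specific, though these values are typical for GCC/Clang/MSVC on common platforms) of how packing, or not packing, the derived class changes the layout:

#include <cstdint>
#include <iostream>

#pragma pack(push, 1)            // the base author packs to 1-byte boundaries
struct PackedBase
{
    std::uint8_t  tag;           // offset 0
    std::uint32_t value;         // offset 1 -- no padding inserted
};
#pragma pack(pop)                // sizeof(PackedBase) == 5

// Derived with default alignment: the base subobject keeps its packed
// 5-byte layout, but 'extra' wants 4-byte alignment, so padding appears.
struct DerivedDefault : PackedBase
{
    std::uint32_t extra;
};

#pragma pack(push, 1)
struct DerivedPacked : PackedBase   // same packing as the base
{
    std::uint32_t extra;            // offset 5 -- no padding
};
#pragma pack(pop)

int main()
{
    std::cout << sizeof(PackedBase)     << '\n'    // typically 5
              << sizeof(DerivedDefault) << '\n'    // typically 12 (3 padding bytes)
              << sizeof(DerivedPacked)  << '\n';   // typically 9
}

If instances of the derived class ever cross one of those serialization boundaries, the difference between the 9- and 12-byte layouts is exactly the kind of embedded garbage the paragraph above warns about.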

Related

Significance of classes over data-structures

What's the significance of classes over data structures, or of data structures over classes?
OK, so the most basic one can be that classes give us access specifiers, meaning we can prevent some code from accessing our data and allow other code to do so.
The next can be data hiding.
But what's the main thing that separates classes and data structures? I mean, why do we need data structures when we have classes, or vice versa?
C++ has fundamental types, and classes.
struct and class are both keywords that introduce a new class. There are slightly different defaults.
Data structures are an arrangement of data with some kind of invariant. They can be a class, they can contain classes, or they can be completely class-free.
They are different categories of thing. It is like asking what the difference is between steel and an automobile.
In a course assignment, what the teacher is asking is for you to know the definitions that the teacher or the text gave those terms. Terms mean what the context they are in tells them to mean. It is a matter of "are you paying attention", not "do you know this fact"; having asked it of the internet, you have already failed.
In terms of syntax, in C++ the only difference between a class and a struct is that members of a struct are public by default, while the members of a class are private by default.
From a perspective of implied design intent, however, there is a larger difference. struct was/is a feature of C, and was/is used (in both C and C++) to help the programmer organize Plain Old Data in useful ways. (For example, if you know every Person in your persons-database needs to store the person's first name, last name, and age, then you can put two char arrays and an int together in a struct Person, and thereby make it more convenient to track all of that data as a unit than if you had to store each of those fields separately.)
C++ continues to provide that C-style struct functionality, but then goes further by adding additional features to better support object-oriented-programming. In particular, C++ adds explicit support for encapsulation (via the private and protected keywords), functionality-extension via inheritance, the explicit tying-together of code and data via methods, and run-time polymorphism via virtual methods. Note that all of these features can be approximated in C by manually following certain coding conventions, but by explicitly supporting them as part of the language, C++ makes them easier to use correctly and consistently.
Having done that, C++ then goes on to muddy the waters a bit by making all of that new functionality available to structs as well as classes. (This is why the technical difference is so minor, as described in the first paragraph.) However, I believe that when most programmers see a struct defined, they tend to have an implicit expectation that it is intended to be used as a simple C-style data-storage/data-organization receptacle, whereas when they see a class, they expect it to include not just "some raw data" but also some associated business logic, as implemented in the class's methods, and that the class will enforce its particular rules/invariants by requiring the calling code to call those methods rather than allowing the calling code to read/write the class's member variables directly. (That's why public member variables are discouraged in a class, but less so in a struct -- a public member variable in a class object contradicts this expectation and violates the principle of least surprise.)
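As a minimal illustration of both the one-line syntactic difference and the design expectation (the type and member names here are invented):

// C-style receptacle: members are public by default and callers use them directly.
struct PersonRecord
{
    char first[32];
    char last[32];
    int  age;
};

// Class: members are private by default, and the invariant (age >= 0)
// is enforced by making callers go through the methods.
class Person
{
public:
    void SetAge(int a) { if (a >= 0) age = a; }
    int  GetAge() const { return age; }
private:
    char first[32];
    char last[32];
    int  age = 0;
};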

Design alternative for objects with mutually exclusive data members

It was tough to summarize the problem in the title; so allow me to clarify the situation here.
I have a class that I'm designing that represents a BER TLV structure. In this specification, the "data" portion of the TLV can contain raw bytes of data OR other nested TLVs. To support both forms, I use the same structure but with two vectors (only one will actually contain something, depending on what we find as we parse the TLV data):
class BerTlv
{
public:
    void Parse(std::vector<std::uint8_t> const& bytes_to_parse);
    // Assume relevant accessors are provided
private:
    // Will be m_data or m_nestedTlvs, but never both
    std::vector<std::uint8_t> m_data;
    std::vector<BerTlv> m_nestedTlvs;
};
From the outside, after this object is fully constructed (all TLV data parsed), the user will need to detect what kind of data they are dealing with. Basically, they'd have to check m_data.empty(), and if so, use m_nestedTlvs. I'm not really happy with this approach; it smells like there's a better design I'm missing.
I thought of some form of a union, although I do not think a real union would be appropriate here since vector data is heap allocated. So I thought of std::variant:
std::vector<std::variant<BerTlv, std::uint8_t>> m_data;
However, I'm worried this negatively impacts the std::uint8_t case, since that's literally just byte data: it would now become non-contiguous as well. The variant only benefits the nested-TLV case, and not by much.
Next I considered using the visitor pattern here, but I can't quite visualize what the interface would look like or how this would improve usability in both cases (raw data vs nested TLVs). Is visitor the right solution here?
Nothing I've thought of so far feels right, so I'm hoping for feedback on a better design approach to this problem. The general problem here is having data members that are sometimes unused or are mutually exclusive. It's a problem I run into in other contexts as well, so it would be great to have a general design approach to such a problem.
Note that I have access to C++14 features and below.
Basically, they'd have to check m_data.empty(), and if so, use m_nestedTlvs.
If the idea is that an object either has an array of bytes or has an array of other objects, then that's the variant you ought to use: variant<vector<std::uint8_t>, vector<BerTlv>>. A vector of variants does not match your specified use case.
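A sketch of what that could look like is below. Note that std::variant is C++17, so under the stated C++14 constraint boost::variant would play the same role; the alias and member names are invented, and the recursive std::vector<BerTlv> member relies on std::vector's C++17 support for incomplete element types:

#include <cstdint>
#include <variant>
#include <vector>

class BerTlv
{
public:
    using Primitive   = std::vector<std::uint8_t>;  // raw byte payload
    using Constructed = std::vector<BerTlv>;        // nested TLVs

    void Parse(std::vector<std::uint8_t> const& bytes_to_parse);

    bool IsPrimitive() const { return std::holds_alternative<Primitive>(m_content); }

    Primitive   const& Data()   const { return std::get<Primitive>(m_content); }
    Constructed const& Nested() const { return std::get<Constructed>(m_content); }

private:
    // Exactly one alternative is active at any time, so the "both empty"
    // and "both populated" states are unrepresentable by construction.
    std::variant<Primitive, Constructed> m_content;
};

The raw-byte case stays a plain, contiguous vector of uint8_t: the variant wraps the two vectors rather than sitting inside the element type, which is what made the vector-of-variants idea costly.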

Checking if a class has expected attributes

I am going to give some classes about C++ and data structures, and to check students' progress I'd like them to implement the structures I talk about. This is the common approach for data structures courses, I guess. But I want more: I want the students to get quick feedback on what they are missing, so I developed several unit tests for the classes that check the behaviour and give them instant results on what is wrong.
This has been working properly for the past two semesters, but I want to go a step further with that automated correction. I've been studying how to check what the internal components of a class are, so I can know whether someone has correctly implemented a tree with a node* root and a size_t size, for instance, and hasn't used additional, unnecessary attributes.
I know that I can get a rough approximation of an object's size with sizeof, but the results are not that precise. It is frequently different from what I expect; for example, I tested creating a class with a pointer (8 bytes) and an int (4 bytes), but the sizeof was 28. From what I've learnt, this probably has something to do with the virtual function table and other alignment stuff.
So, how far can I go in analyzing whether someone has coded a data structure in the proper and expected manner? How can I check that someone didn't just #include <list> and create an adaptor? (For this I know I can just strip the includes, but anyway.)
Let's break this answer into two parts; we'll split on the result of is_standard_layout.
1. Virtual Classes
If is_standard_layout returns false, a common reason is that the class is polymorphic (it has virtual functions), although other things, such as mixed access control on the members, can also cause it. A polymorphic class contains all the members from its parents plus a hidden virtual function pointer. Your best bet for estimating the size of the members here is to take sizeof the class in question reduced by sizeof(void*); that approximates the size of your virtual class's members (assuming a single vptr, which is implementation-specific).
2. Non-Virtual Classes
is_standard_layout will return true, meaning this is not a virtual class. In this case there is no hidden header, so the first member variable sits at offset 0 (which offsetof can confirm), and sizeof the class gives you the distance from that first member to the end of the object, i.e. the space occupied by the members plus any padding.
Both of these methods should yield the approximate size of the members in the classes. Determining an allowable range for class size is a matter of preference, but placing the evaluation in a static_assert will let you also provide a compile-time message indicating the reason for the assertion.
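A sketch of the non-virtual case, assuming the expected implementation is exactly one pointer plus one size_t (the names are invented, and padding can make sizeof comparisons approximate, so real grading code may want a tolerance range):

#include <cstddef>
#include <type_traits>

struct Node;  // hypothetical node type the students define

class Tree
{
private:
    Node*       root = nullptr;
    std::size_t size = 0;
};

// No virtual functions and uniform member access, so this should hold.
static_assert(std::is_standard_layout<Tree>::value,
              "Tree must be a plain, non-polymorphic data structure");

// Any extra data member would normally grow the object.
static_assert(sizeof(Tree) <= sizeof(Node*) + sizeof(std::size_t),
              "Tree appears to contain unexpected extra members");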

Datastructure Storage in Filesystem

I am trying to write a persistent datastructure in C++; however, I feel that I should be able to make it binary-compatible with various other implementations of my datastructure's readers, and hence my current idea is to declare the datastructure in raw memory, without any abstraction.
For example, I would allocate a linear block of memory as the datastructure (using the new keyword) and then describe what the first byte means, what the second byte means, and so on. I know I can do this using a struct, but then the datastructure would be bound to one language, and other languages would have to reproduce that structure. Also, the layout might then change from compiler to compiler. I would instead like it to be a memory standard.
Is what I am trying to do somewhat sensible? Or am I trying to over-simplify things and should really proceed with a struct? Now onto the C++ part: if you believe that I should be using a struct, then what are the disadvantages of using a full-fledged class?
(I am using a class anyway to wrap around the memory structure and provide functions to it since the datastructure is anyway persistent.)
EDIT
As justin has suggested, I do not need any such advanced interface wrapper around the memory structure, so my last point about the class wrapper was not stated properly. What I mean is that I would like to have a class interface for the memory representation; it does not necessarily have to be a wrapper.
Several file formats I have read/worked with do exactly that -- define a memory standard or layout, then typically back it up with a demonstration in C-like pseudo-structure. Sometimes they will provide struct or class representations, and some are completely abstracted by a library. Of course, these formats go on to document all fields, their sizes, the endianness of the data and so on.
I figure endian-related issues, padding, complexity (e.g. introduced by variations in the data structures), and proper versioning are the biggest sources of errors. Another issue I find is the use of data structures of yesteryear and inconsistency among the data structures used to represent similar functionality -- you may receive a spec and realize it contains several different string representations, all of which are archaic, and somebody has to go on to support all of them (bidirectionally).
Proceeding that route:
You should not commit to a binary representation (or a compilable program) if you don't want to support it; attempts at long-lived formats often fail or stumble along the way as platforms and toolsets change. Just commit to a formal memory standard at first, then build on top of that with tests and example input files to verify that the representation is serialized and deserialized correctly. A very basic test suite will help ensure your model is portable on all the systems you need, and can point out potential pitfalls or platform-specific considerations you may not have been aware of.
If you really want to provide a compilable representation, I'd stick with a very compliant struct representation -- clients can take that (in-memory) representation and turn it into any C++ abstraction/representation they like. That is to say, a serialized representation should probably not mirror a representation in memory, apart from trivially simple representations and the intermediate storage of such a representation (flattened and packed structs).
One of the important parts is that you should have tests which confirm that the in-memory object graph you create with these structs is forward- and backward-serializable and deserializable, and that proper versioning is supported -- it often takes a bit of work to make a complex serialized representation compatible, so this approach just introduces one abstraction layer on top of another. In this regard, you may want to give the C++ abstraction the ability to create itself from the packed in-memory representation, and to ensure that that representation can also correctly populate the packed structure without data loss.
Beyond that, is there any need to have a more advanced interface? If there is, then you may want to provide that information.
So yes, the memory standard is the part that you must get correct and stable, and to which all implementations should refer to and test against -- regardless of platform/architecture differences. IOW, you're on the right track ;)
In C++ there's no practical difference between struct and class (besides the default accessibility being public in struct). Traditionally, struct is used when a type only has (public) member variables and no member functions but this is only a convention, not a rule enforced by the compiler.
I'd certainly use a struct/class to describe the data. If someone wants to write a reader of your data structure, they can either import your header file or implement the data structure in their language of choice - in most programming languages this should be pretty simple.
I recommend you start your structure something like this:
typedef struct
{
    int Version;  // struct layout version
    int ByteSize; // byte size of structure for validation
    ...
} MYDATA;
This way when your data structure is being passed around, your code can verify that the allocated structure size matches with how many bytes you'd expect for a given version of your structure. You could then easily introduce new versions of your structure by simply updating the version field and checking for the new size.
When you save your data to disk, make sure that you write it out field-by-field rather than through a single write (using a pointer and sizeof()), to ensure that other languages won't have to deal with potential padding that your C++ compiler may decide to put in. It's possible to manually lay out the fields in the structure so that there's no padding, but you have to be very, very careful doing that, and it's easy to make mistakes.
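A sketch of that field-by-field approach (the Count field and the function name are invented; fixed-width integer types remove one more source of cross-platform variation, though byte order still has to be documented separately):

#include <cstdint>
#include <cstdio>

typedef struct
{
    std::int32_t Version;   // struct layout version
    std::int32_t ByteSize;  // expected byte size, for validation
    std::int32_t Count;     // hypothetical payload field
} MYDATA;

// Each field is written individually, so the file contains exactly the
// documented bytes regardless of any padding inside MYDATA itself.
bool WriteMyData(std::FILE* f, const MYDATA& d)
{
    return std::fwrite(&d.Version,  sizeof d.Version,  1, f) == 1
        && std::fwrite(&d.ByteSize, sizeof d.ByteSize, 1, f) == 1
        && std::fwrite(&d.Count,    sizeof d.Count,    1, f) == 1;
}

// By contrast, a single fwrite(&d, sizeof d, 1, f) would also dump
// whatever padding this particular compiler happened to insert.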

Access v-table at run-time

Is it possible to access a function's v-table at runtime? Can meta-information such as the number of different function versions be determined? This might be more of a theoretical question, but could a developer put a cap on the number of classes that can extend a given base class by making sure the v-table never exceeds a certain number of rows?
Is it possible to access a function's v-table at runtime? Can meta-information such as the number of different function versions be determined?
Not in a portable way. The standard does not even have the concept of a virtual table; it is more of an implementation detail than a requirement, even if all implementations I know of use vtables. In the general case there will not even be enough information available at runtime (i.e. the compiler does not need to store the number of entries in the vtable, as it sees the type and can count them).
Could a developer put a cap on the number of classes that can extend a given base class by making sure the v-table never exceeds a certain number of rows?
Again no, but since this shows a misconception, it might be worth treating separately. When a base class has any virtual functions, the compiler (in all implementations that use vtables) will create a vtable, and that table will have exactly one entry per virtual function in the base class (plus some additional data: typeinfo or a pointer to it, the offset to the beginning of the object, or other implementation details). When a class extends that base class, it will not add new elements to that vtable, but rather get a separate vtable (or more, depending on the type hierarchy). If the derived class does not add any new virtual functions, its vtable will contain exactly the number of elements that the original vtable had. That is, you can have a huge hierarchy of inheritance without affecting the vtable layout at all. What will change are the typeinfo data stored and the pointers to each virtual function, which will refer to the final overrider.
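A small demonstration of that point, assuming a typical single-vptr implementation (the exact numbers are implementation-specific):

#include <iostream>

struct Base
{
    virtual void f() {}
    virtual void g() {}
    virtual ~Base() {}
};

struct Derived : Base
{
    void f() override {}   // replaces a slot in the vtable; adds none
};

struct Derived2 : Derived
{
    void g() override {}   // likewise -- still the same number of slots
};

int main()
{
    // Each object stores one vptr, so deepening the hierarchy without
    // declaring new virtual functions does not grow the objects:
    std::cout << sizeof(Base) << ' '
              << sizeof(Derived) << ' '
              << sizeof(Derived2) << '\n';   // commonly "8 8 8" on 64-bit
}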
Can meta-information such as the number of different function versions be determined?
No, C++ doesn't support reflection. What you are trying to achieve is not possible in C++, AFAIK.
Theoretically, yes, because it's stored in memory and you have access to it. In practice, there is no sane, portable way to do it, because the compiler is free to implement virtual functions in any way it wants, so you would have to dig through your compiler's source code to find out how/where to access the desired information and how to interpret it.
The only glimmer of hope I could imagine for your effort, is the handling of dynamic_cast. Each compiler, with corresponding library support, has some concept of traversing a hierarchy to achieve a dynamic cast. If you could hook into that traversal, you might then know something about how many levels of inheritance you are dealing with. That said, even if you made this work, it would be compiler-specific (as others have said) since such implementation is proprietary.
You can use the Debug Interface Access SDK or other debug support interfaces (gdb) for this sort of thing.
RTTI data is more portable, but may not have sufficient detail for your project.
For your specific question on limiting the v-table and preventing it from being extended too far, you can try this method:
IDiaSymbol::get_classParent
Retrieves a reference to the class parent of the symbol.
HRESULT get_classParent (IDiaSymbol** pRetVal);
You can investigate all of the class-related symbol types in that SDK; what you might want to do is enumerate all loaded class types, call get_classParent recursively, and keep a tally of all of the classes which extend your base.
Your class could also require symbols to be available on startup to help with enforcement.