Accessing consecutive members with a single pointer - c++

I want to access consecutively declared member arrays of the same type through a single pointer.
For example, say I have:
struct S
{
    double m1[2];
    double m2[2];
};
int main()
{
    S obj;
    double *sp = obj.m1;
    // Code that may be unsafe!
    for (int i(0); i < 4; ++i)
        *(sp++) = i; // *
    return 0;
}
Under what circumstances is line (*) problematic?
I know there is certainly a problem when virtual functions are present, but I need a more structured answer than my own assumptions.

You can be sure that the members of the struct are stored in a contiguous block of bytes, in the order in which they appear. The elements of each array are also contiguous. So it seems that everything is fine.
The problem is that there is no standard way of knowing whether there are padding bytes between consecutive members of the struct.
So it is unsafe to assume that there are no padding bytes at all.
If you can be sure, for some particular reason, that there are no padding bytes, then the 4 double elements will be contiguous, as you want.
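One way to turn "being sure" into something the compiler checks is to compare member offsets. The following is only a sketch: the helper names members_are_adjacent, packed_size and check_layout are made up for illustration. Note that even when the check passes, it only tells you about the byte layout; the standard still does not promise that walking a pointer from m1 into m2 is well defined.

```cpp
#include <cassert>
#include <cstddef>  // offsetof, std::size_t

struct S
{
    double m1[2];
    double m2[2];
};

// True when m2 starts exactly where m1 ends, i.e. no padding in between.
constexpr bool members_are_adjacent()
{
    return offsetof(S, m2) == offsetof(S, m1) + sizeof(S::m1);
}

// Number of bytes both arrays cover when they are packed back to back.
constexpr std::size_t packed_size()
{
    return sizeof(S::m1) + sizeof(S::m2);
}

// Refuse to compile on a platform that inserts padding between the members.
static_assert(members_are_adjacent(), "padding between m1 and m2");

// Runtime variant of the same check, for code that prefers assert.
inline void check_layout()
{
    assert(members_are_adjacent());
}
```

If the static_assert fires, the code stops compiling on that platform instead of silently writing into padding.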

The C++ standard makes certain guarantees about the layout of "plain old data" (or in C++11, standard layout) types. For the most part, these inherit from how C treated such data.
What follows only applies to "plain old data"/"standard layout" structures and data.
If you have two structs whose initial members have the same order and types, casting a pointer to one to a pointer to the other and accessing their common initial prefix is valid, and will access the corresponding field. This is known as being "layout compatible". This also applies if you have a structure X and a structure Y, and X is the first member of the structure Y: a pointer to Y can be cast to a pointer to X, and it will access the fields of the X substructure in Y.
Now, while it is a common assumption, I am unaware of a requirement in either C or C++ that a structure starting with several fields of the same type be layout compatible with an array of that type and count.
Your case is somewhat similar, in that we have two arrays adjacent to each other in a structure, and you are treating them as one large array whose size is the sum of the two arrays' sizes. It is a relatively common and safe assumption that this works, but I am unaware of a guarantee in the standard that it actually works.
With this kind of undefined behavior, you have to examine your particular compiler's guarantees (de facto or explicit) about the layout of plain old data/standard layout data, as the C++ standard does not guarantee that your code does what you want it to do.
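If you control the struct definition, one way to avoid relying on compiler-specific layout is to make the storage a single array and expose the two halves by name. This is a sketch of that alternative, not what the question's code does; S2, m1(), m2() and fill_sequential are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical replacement for S: one real array, two named views into it.
struct S2
{
    double data[4];

    double *m1() { return data; }      // elements 0..1
    double *m2() { return data + 2; }  // elements 2..3
};

// Writes 0, 1, 2, 3 through a single pointer; every access stays inside data.
inline void fill_sequential(S2 &obj)
{
    double *sp = obj.data;
    for (std::size_t i = 0; i < 4; ++i)
        *(sp++) = static_cast<double>(i);
    assert(sp == obj.data + 4);  // one past the end, never dereferenced
}
```

Here the single-pointer loop is ordinary array traversal, so no layout assumption is needed.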

Related

memory of a reference variable in c++?

I've just started to learn C++ from the basics and got confused when I came across reference variables.
From what I have learnt, a reference variable is just an alias (another name for the same memory), so it should not need any memory of its own.
When I ran the code below:
#include <iostream>
using namespace std;

class sample_class
{
public:
    int num; //line1
    int& refnum = num; //line2
};
int main()
{
    sample_class sample_object;
    cout << "sample_class object size : " << sizeof(sample_object) << endl;
    return 0;
}
I got the output as:
sample_class object size : 8
==> Here, num takes 4 bytes (32-bit compiler) and refnum is simply an alias for num. Why, then, is the size of the object 8?
==> Also, if refnum really is just an alias, where is this information (that refnum aliases the memory address of num) stored?
Edited:
Now consider this case (changing the definition of sample_class):
class sample_class
{
public:
    char letter; //line3
    char& refletter = letter; //line4
    char letter_two; //line5
};
Here, if I print the size of a sample_class object, I get 12 (though letter, refletter and letter_two should each take 1 byte). But if I comment out line 4, the object size is just 2. How does this happen?
I'm interested in learning from the basics, so if I'm wrong anywhere, please correct me.
A reference is an alias, it should not be considered as a new variable. You cannot obtain its address and you cannot obtain its size. Any attempt to do so will instead obtain the address or size of the aliased object. In practice, most implementations implement them like pointers, but the standard does not require this. It makes no mention of the expected size for a reference.
From : http://en.cppreference.com/w/cpp/language/reference
References are not objects; they do not necessarily occupy storage, although the compiler may allocate storage if it is necessary to implement the desired semantics (e.g. a non-static data member of reference type usually increases the size of the class by the amount necessary to store a memory address).
Edit: The C++ standard gives implementations a lot of leeway to decide the size of types and classes, in order to accommodate the unique requirements of every architecture. In this case, padding is introduced between the members of your class. There is no requirement in C++ that a class's size equal the sum of its members' sizes. See Objects and alignment on cppreference.com for more information on the subject.
Edit 2 : There still seems to be some confusion regarding sizeof(T&).
From http://en.cppreference.com/w/cpp/language/sizeof :
When applied to a reference type, the result is the size of the referenced type.
The expression sizeof(T&) is treated as if you had written sizeof(T). This doesn't mean that the size of T& is equal to the size of T. It's simply that you cannot get the size of a reference directly with sizeof.
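A small sketch of both points follows: sizeof on a reference type reports the referenced type, while a class with a reference member still has to store something. The names Big, WithRef, extra_bytes and set_through_ref are made up for illustration, and how many extra bytes you see is up to the implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Big = std::array<int, 10>;

// sizeof applied to a reference type reports the size of the referenced type.
static_assert(sizeof(Big &) == sizeof(Big), "sizeof(T&) is sizeof(T)");
static_assert(sizeof(char &) == 1, "sizeof(char&) is sizeof(char)");

// Hypothetical class mirroring the question: a value plus a reference member.
struct WithRef
{
    int num = 0;
    int &refnum = num;
};

// Bytes the class occupies beyond its int member
// (storage for the reference plus any padding; implementation-defined).
constexpr std::size_t extra_bytes()
{
    return sizeof(WithRef) - sizeof(int);
}

// Writing through the reference member changes the aliased member.
inline void set_through_ref(WithRef &w, int value)
{
    w.refnum = value;
    assert(w.num == value);
}
```

Printing extra_bytes() on your own compiler shows what the reference member costs there; the standard does not fix that number.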
In addition to the answer already provided I would recommend having a read of the material regarding padding at:
Data structure padding
Which is a good basic discussion regarding padding of classes and structures in C++ with simple examples to consider.
A reference stores the address of the variable it refers to (like a pointer with stronger pre/post-conditions).
This means that sizeof(T&) == sizeof(T*) == 4 on a 32bit architecture.
Comment from @FrançoisAndrieux about the real size of T&:
@nefas You state "This means that sizeof(T&) == sizeof(T*) == 4 on a 32bit architecture." but this is not true. Try it with a larger T. For example, sizeof(std::array<int, 10>&) is much larger than a pointer. You are taking the size of T, not T&.
Another thing you have to take into account when calculating the size of a class/struct is padding: the size of the struct may be larger than the sum of the sizes of its members (I won't explain how padding works because I don't know enough about it).
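To make the padding visible, here is a rough sketch that models the question's second class with a pointer in place of the reference member (an assumption: most implementations store a reference member like a pointer, but the standard does not require it). Letters, sum_of_members, padding_bytes and check_layout are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>  // offsetof, std::size_t

// Hypothetical stand-in for the question's class: the reference member is
// modelled as a pointer, which is how most implementations store it.
struct Letters
{
    char letter;
    char *refletter;
    char letter_two;
};

// Sum of the member sizes, ignoring any padding.
constexpr std::size_t sum_of_members()
{
    return sizeof(char) + sizeof(char *) + sizeof(char);
}

// Padding bytes the compiler added to satisfy alignment.
constexpr std::size_t padding_bytes()
{
    return sizeof(Letters) - sum_of_members();
}

// The pointer member must sit at an offset that suits its alignment, and the
// whole object must be a multiple of its alignment so arrays of it work.
inline void check_layout()
{
    assert(offsetof(Letters, refletter) % alignof(char *) == 0);
    assert(sizeof(Letters) % alignof(Letters) == 0);
}
```

With 4-byte pointers the typical layout is 1 byte, 3 bytes of padding, 4 bytes, 1 byte, 3 bytes of padding, which gives the 12 reported in the question; with 8-byte pointers the same reasoning typically gives 24.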

Array of non-contiguous objects

#include <iostream>
#include <cstring>
#include <type_traits>
// This struct is not guaranteed to occupy contiguous storage
// in the sense of the C++ Object model (§1.8.5):
struct separated {
int i;
separated(int a, int b){i=a; i2=b;}
~separated(){i=i2=-1;} // nontrivial destructor --> not trivially copyable
private: int i2; // different access control --> not standard layout
};
int main() {
static_assert(not std::is_standard_layout<separated>::value,"sl");
static_assert(not std::is_trivial<separated>::value,"tr");
separated a[2]={{1,2},{3,4}};
std::memset(&a[0],0,sizeof(a[0]));
std::cout<<a[1].i;
// No guarantee that the previous line outputs 3.
}
// compiled with Debian clang version 3.5.0-10, C++14-standard
// (outputs 3)
What is the rationale behind weakening standard guarantees to the point that this program may show undefined behaviour?
The standard says:
"An object of array type contains a contiguously allocated non-empty set of N subobjects of type T." [dcl.array] §8.3.4.
If objects of type T do not occupy contiguous storage, how can an array of such objects do so?
edit: removed possibly distracting explanatory text
1.
This is an instance of Occam's razor as adopted by the dragons that actually write compilers: Do not give more guarantees than needed to solve the problem, because otherwise your workload will double without compensation. Sophisticated classes adapted to fancy hardware or to historic hardware were part of the problem. (hinting by BaummitAugen and M.M)
2.
(contiguous=sharing a common border, next or together in sequence)
First, it is not that objects of type T either always or never occupy contiguous storage. There may be different memory layouts for the same type within a single binary.
[class.derived] §10 (8): A base class subobject might have a layout different from ...
This would be enough to lean back and be satisfied that what is happening on our computers does not contradict the standard. But let's amend the question. A better question would be:
Does the standard permit arrays of objects that do not occupy contiguous storage individually, while at the same time every two successive subobjects share a common border?
If so, this would influence heavily how char* arithmetic relates to T* arithmetic.
Depending on whether you understand the OP's standard quote to mean that only the subobjects share a common border, or that the bytes within each subobject also share a common border, you may arrive at different conclusions.
Assuming the first, you find that
'contiguously allocated' or 'stored contiguously' may simply mean &a[n]==&a[0] + n (§23.3.2.1), which is a statement about subobject addresses that would not imply that the array resides within a single sequence of contiguous bytes.
If you assume the stronger version, you may arrive at the 'element offset==sizeof(T)' conclusion brought forward in T* versus char* pointer arithmetic
That would also imply that one could force otherwise possibly non-contiguous objects into a contiguous layout by declaring them T t[1]; instead of T t;
Now how to resolve this mess? There is a fundamentally ambiguous definition of the sizeof() operator in the standard that seems to be a relic of the time when, at least per architecture, type roughly equaled layout, which is not the case any more. (How does placement new know which layout to create?)
When applied to a class, the result [of sizeof()] is the number of bytes in an object of that class including any padding required for placing objects of that type in an array. [expr.sizeof] §5.3.3 (2)
But wait, the amount of required padding depends on the layout, and a single type may have more than one layout. So we're bound to add a grain of salt and take the minimum over all possible layouts, or do something equally arbitrary.
Finally, the array definition would benefit from a disambiguation in terms of char* arithmetic, in case this is the intended meaning. Otherwise, the answer to question 1 applies accordingly.
A few remarks related to now deleted answers and comments:
As is discussed in Can technically objects occupy non-contiguous bytes of storage?, non-contiguous objects actually exist. Furthermore, memsetting a subobject naively may invalidate unrelated subobjects of the containing object, even for perfectly contiguous, trivially copyable objects:
#include <iostream>
#include <cstring>
#include <type_traits>
struct A {
private: int a;
public: short i;
};
struct B : A {
short i;
};
int main()
{
static_assert(std::is_trivial<A>::value , "A not trivial.");
static_assert(not std::is_standard_layout<A>::value , "sl.");
static_assert(std::is_trivial<B>::value , "B not trivial.");
B object;
object.i=1;
std::cout<< object.B::i;
std::memset((void*)&(A&)object ,0,sizeof(A));
std::cout<<object.B::i;
}
// outputs 10 with g++/clang++, c++11, Debian 8, amd64
Therefore, it is conceivable that the memset in the question post might zero a[1].i, such that the program would output 0 instead of 3.
There are few occasions where one would use memset-like functions with C++-objects at all. (Normally, destructors of subobjects will fail blatantly if you do that.) But sometimes one wishes to scrub the contents of an 'almost-POD'-class in its destructor, and this might be the exception.

Storing a Dynamic Array of Structures

I have been working on a project which utilizes a dynamic array of structures. To avoid storing the number of structures in its own variables (the count of structures), I have been using an array of pointers to the structure variables with a NULL terminator.
For example, let's say my structure type is defined as:
typedef struct structure_item{
/* ... Structure Variables Here ... */
} item_t;
Now let's say my code has item_t **allItems = { item_1, item_2, item_3, ..., item_n, NULL }; and all item_#s are of the type item_t *.
Using this setup, I then do not have to keep track of another variable which tells me the total number of items. Instead, I can determine the total number of items as needed by saying:
int numberOfStructures;
for( numberOfStructures = 0;
*(allItems + numberOfStructures) != NULL;
numberOfStructures++
);
When this code executes, it counts the total number of pointers before NULL.
As a comparison, this system is similar to C-style strings, whereas tracking the total number of structures would be similar to a Pascal-style string. (C uses a null-terminated array of characters, while Pascal tracks the length of its array of characters.)
My question is rather simple: is an array of pointers (pointer to pointer to struct) really necessary, or could this be done with an array of structs (pointer to struct)? Can anybody suggest better ways to handle this?
Note: it is important that the solution is compatible with both C and C++. This is being used in a wrapper library which is wrapping a C++ library for use in standard C.
Thank you all in advance!
What you need is a sentinel value, a recognizable valid value that means "nothing". For pointers, the standard sentinel value is NULL.
If you want to use your structs directly, you will need to decide on a sentinel value of type item_t, and check for that. Your call.
Yes, it is possible to have an array of structs in which (at least) one element is a defined sentinel (the sentinel is the '\0' at the end of strings, and the NULL pointer in your case).
What you need to do, for your struct type, is reserve one or more possible values of that struct (composed of the set of values of its members) to indicate a sentinel.
For example, let's say we have a struct type
struct X {int a; char *p;};
then define a function
int is_sentinel(struct X x)
{
return x.p == NULL;
}
This will mean any struct X for which the member p is NULL can be used as a sentinel (and the member a would not matter in this case).
Then just loop looking for a sentinel.
Note: to be compatible in both C and C++, the struct type needs to be compatible (e.g. POD).
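The "loop looking for a sentinel" step could look roughly like this. It is a sketch written as C++; a C version would pass pointers instead of references and compare against NULL. count_items is a made-up helper name, and the convention that a null p marks the end is the assumption taken from the answer above.

```cpp
#include <cassert>
#include <cstddef>

struct X
{
    int a;
    char *p;
};

// An element whose p member is null marks the end of the array.
inline bool is_sentinel(const X &x)
{
    return x.p == nullptr;
}

// Counts the elements that precede the sentinel.
// The array must contain a sentinel, otherwise the loop runs off the end.
inline std::size_t count_items(const X *items)
{
    assert(items != nullptr);
    std::size_t n = 0;
    while (!is_sentinel(items[n]))
        ++n;
    return n;
}
```

The trade-off is the same as with C strings: you save a count variable but pay a linear scan each time you need the length, and one value of the struct can no longer be stored as real data.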

Only one array without a size allowed per struct?

I was writing a struct to describe a constant value I needed, and noticed something strange.
namespace res{
namespace font{
struct Structure{
struct Glyph{
int x, y, width, height, easement, advance;
};
int glyphCount;
unsigned char asciiMap[]; // <-- always generates an error
Glyph glyphData[]; // <-- never generates an error
};
const Structure system = {95,
{
// mapping data
},
{
// glyph spacing data
}
}; // system constructor
} // namespace font
} // namespace res
The last two members of Structure, the unsized arrays, do not stop the compiler if they are by themselves. But if they are both included in the struct's definition, it causes an error, saying the "type is incomplete".
This stops being a problem if I give the first array a size. Which isn't a problem in this case, but I'm still curious...
My question is, why can I have one unsized array in my struct, but two cause a problem?
In standard C++, you can't do this at all, although some compilers support it as an extension.
In C, every member of a struct needs to have a fixed position within the struct. This means that the last member can have an unknown size; but nothing can come after it, so there is no way to have more than one member of unknown size.
If you do take advantage of your compiler's non-standard support for this hack in C++, then beware that things may go horribly wrong if any member of the struct is non-trivial. An object can only be "created" with a non-empty array at the end by allocating a block of raw memory and reinterpreting it as this type; if you do that, no constructors or destructors will be called.
You are using a non-standard Microsoft extension. C (since C99; note: C, not C++) allows the last member of a structure to be an unsized array (read: at most one such array):
A Microsoft extension allows the last member of a C or C++ structure or class to be a variable-sized array. These are called unsized arrays. The unsized array at the end of the structure allows you to append a variable-sized string or other array, thus avoiding the run-time execution cost of a pointer dereference.
// unsized_arrays_in_structures1.cpp
// compile with: /c
struct PERSON {
unsigned number;
char name[]; // Unsized array
};
If you apply the sizeof operator to this structure, the ending array size is considered to be 0. The size of this structure is 2 bytes, which is the size of the unsigned member. To get the true size of a variable of type PERSON, you would need to obtain the array size separately.
The size of the structure is added to the size of the array to get the total size to be allocated. After allocation, the array is copied to the array member of the structure, as shown below:
The compiler needs to be able to decide on the offset of every member within the struct. That's why you're not allowed to place any further members after an unsized array. It follows from this that you can't have two unsized arrays in a struct.
It is an extension from Microsoft, and sizeof(structure) == sizeof(structure_without_variable_size_array).
I guess they use the initializer to find the size of the array. If you have two variable size arrays, you can't find it (equivalent to find one unique solution of a 2-unknown system with only 1 equation...)
Arrays without a dimension are not allowed in a struct,
period, at least in C++. In C, the last member (and only the
last) may be declared without a dimension, and some compilers
allow this in C++, as an extension, but you shouldn't count on
it (and in strict mode, they should at least complain about it).
Other compilers have implemented the same semantics if the last
element had a dimension of 0 (also an extension, requiring
a diagnostic in strict mode).
The reason for limiting incomplete array types to the last
element is simple: what would be the offset of any following
elements? Even when it is the last element, there are
restrictions to the use of the resulting struct: it cannot be
a member of another struct or an array, for example, and
sizeof ignores this last element.
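In standard C++, one way to keep both tables inside the struct is to give each array its length at compile time, so every member has a known offset. A sketch, assuming the sizes are known when the constant is defined: FontData and glyph_for are hypothetical names, and treating asciiMap as a table from character code to glyph index is a guess at what the question intends.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

struct Glyph
{
    int x, y, width, height, easement, advance;
};

// Hypothetical fixed-size variant: both arrays take their length from the
// template parameters, so nothing in the struct has an unknown size.
template <std::size_t MapSize, std::size_t GlyphCount>
struct FontData
{
    int glyphCount = static_cast<int>(GlyphCount);
    std::array<unsigned char, MapSize> asciiMap{};
    std::array<Glyph, GlyphCount> glyphData{};
};

// Looks up a glyph through the mapping table (assumed: code -> glyph index).
template <std::size_t MapSize, std::size_t GlyphCount>
const Glyph &glyph_for(const FontData<MapSize, GlyphCount> &font,
                       unsigned char code)
{
    assert(code < MapSize);
    const unsigned char index = font.asciiMap[code];
    assert(index < GlyphCount);
    return font.glyphData[index];
}
```

Because both sizes are part of the type, sizeof reports the full object and the struct can be copied or placed in arrays, which is not the case with a trailing unsized array.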

Two arrays in a union in C++

Is it possible to share two arrays in a union like this:
struct
{
union
{
float m_V[Height * Length];
float m_M[Height] [Length];
} m_U;
};
Do these two arrays share the same memory size or is one of them longer?
Both arrays are required to have the same size and layout. Of course,
if you initialize anything using m_V, then all accesses to m_M are
undefined behavior; a compiler might, for example, note that nothing in
m_V has changed, and return an earlier value, even though you've
modified the element through m_M. I've actually used a compiler which
did so, in the distant past. I would avoid accesses where the union
isn't visible, say by passing a reference to m_V and a reference to
m_M to the same function.
It is implicitly guaranteed that these will be the same size in memory. The compiler is not allowed to insert padding anywhere in either the 2D array or the 1D array, because everything must be compatible with sizeof.
[Of course, if you wrote to m_V and read from m_M (or vice versa), you'd still be type-punning, which technically invokes undefined behaviour. But that's a different matter.]
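If the goal is just to address the same storage both as a flat array and by row and column, a single array plus an index function avoids the union altogether. A minimal sketch, with Height and Length given arbitrary values for illustration; Matrix and at() are made-up names.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t Height = 3;  // assumed dimensions, for illustration only
constexpr std::size_t Length = 4;

struct Matrix
{
    float m_V[Height * Length];  // single flat array, no union

    // Row-major access that matches the layout of float[Height][Length].
    float &at(std::size_t row, std::size_t col)
    {
        assert(row < Height && col < Length);
        return m_V[row * Length + col];
    }
};

// Both array types occupy the same number of bytes.
static_assert(sizeof(float[Height * Length]) == sizeof(float[Height][Length]),
              "1D and 2D arrays have equal size");
```

Every access goes through m_V, so there is only one active object and no type-punning question to worry about.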