Form 16 bit words from a struct - c++

I'm creating an ICMPv4 echo request and decided to roll my own struct to hold the packet. To make the packet easy to identify in Wireshark, I put abcde into the data field.
struct icmpPacket{
u_int8_t icmp_type:8, icmp_code:8;
u_int16_t icmp_checksum:16, icmp_id:16, icmp_seqnum:16;
char icmp_data[6]; //cheat a little bit, set the field just large enough to store "abcde";
} __attribute__((aligned (16))) icmppckt; // icmp has an 8 byte header + 6 bytes of data
What I'm getting stuck on is how to make the compiler read the struct out as a series of 16-bit words.

The standard-compliant way to do this is via memcpy:
icmpPacket packet = { /* ... */ };
uint16_t buf[sizeof(icmpPacket) / sizeof(uint16_t)];
memcpy(buf, &packet, sizeof(icmpPacket));
/* Now use buf */
Modern compilers are clever enough to optimize this appropriately, without actually emitting a function call (see examples with clang and g++).
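As a rough illustration of "now use buf", here is a minimal sketch that copies the packet into 16-bit words with memcpy and folds them into a one's-complement checksum. It assumes the goal is the usual Internet checksum, which the question does not state; IcmpEcho and internet_checksum are made-up names, and the struct uses plain members rather than the question's bit-fields.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the question's struct, using plain members
// instead of bit-fields; it has no padding on mainstream ABIs.
struct IcmpEcho {
    std::uint8_t  type;
    std::uint8_t  code;
    std::uint16_t checksum;
    std::uint16_t id;
    std::uint16_t seqnum;
    char          data[6];
};
static_assert(sizeof(IcmpEcho) == 14, "layout assumption: no padding");

// Sums the packet as 16-bit words (read via memcpy, so no aliasing
// violation) and returns the one's complement of the folded sum.
inline std::uint16_t internet_checksum(const IcmpEcho& packet) {
    std::uint16_t words[sizeof(IcmpEcho) / sizeof(std::uint16_t)];
    std::memcpy(words, &packet, sizeof(IcmpEcho));

    std::uint32_t sum = 0;
    for (std::uint16_t w : words) sum += w;
    while (sum >> 16) sum = (sum & 0xFFFFu) + (sum >> 16);  // fold carries
    return static_cast<std::uint16_t>(~sum);
}
```

To use it, zero the checksum field, store the result of internet_checksum(packet) into it, and a second call over the finished packet should then return 0.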
A common compiler extension allows you to use unions, though this is undefined behavior under the C++ standard:
union packet_view{
icmpPacket packet;
uint16_t buf[sizeof(icmpPacket) / sizeof(uint16_t)];
};
icmpPacket packet = { /* ... */ };
packet_view view;
view.packet = packet;
/* Now read from view.buf. This is technically UB in C++ but most compilers define it. */
Using a reinterpret_cast<uint16_t*>(&packet) or its C equivalent would break strict aliasing rules and result in undefined behavior. §3.10 [basic.lval]/p10 of the C++ standard:
If a program attempts to access the stored value of an object through
a glvalue of other than one of the following types the behavior is
undefined:
the dynamic type of the object,
a cv-qualified version of the dynamic type of the object,
a type similar (as defined in 4.4) to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
an aggregate or union type that includes one of the aforementioned types among its elements or nonstatic data members (including,
recursively, an element or non-static data member of a subaggregate or
contained union),
a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
a char or unsigned char type.
Similarly, §6.5/p7 of C11 says:
An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:
a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type of the object,
a type that is the signed or unsigned type corresponding to the effective type of the object,
a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a
subaggregate or contained union), or
a character type.
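Both lists end with the character types, so reading the packet byte by byte is always allowed. A small sketch of that route, assuming you want the first byte of each pair in the high half of the word (network order); words_from_bytes is a made-up helper name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads any object's bytes through unsigned char (permitted by the
// aliasing rules) and pairs them into 16-bit words, first byte high.
// An odd trailing byte is padded with a zero low byte.
inline std::vector<std::uint16_t> words_from_bytes(const void* object, std::size_t size) {
    const unsigned char* bytes = static_cast<const unsigned char*>(object);
    std::vector<std::uint16_t> words;
    words.reserve((size + 1) / 2);
    for (std::size_t i = 0; i < size; i += 2) {
        const unsigned hi = bytes[i];
        const unsigned lo = (i + 1 < size) ? bytes[i + 1] : 0u;
        words.push_back(static_cast<std::uint16_t>((hi << 8) | lo));
    }
    return words;
}
```

Because the words are assembled arithmetically, the result does not depend on the host's byte order.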

You can use a 16-bit pointer for that,
but you need to align the structure members to 1 byte!
In C++ you can do it like this:
#pragma pack(1)
struct icmpPacket
{
u_int8_t icmp_type:8, icmp_code:8;
u_int16_t icmp_checksum:16, icmp_id:16, icmp_seqnum:16;
char icmp_data[6]; //cheat a little bit, set the field just large enough to store "abcde";
} icmppckt; // icmp has an 8 byte header + 6 bytes of data
WORD *picmppckt16=(WORD*)((void*)&icmppckt);
#pragma pack()
Change WORD to a 16-bit data type your compiler knows.
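If you go the packing route, it may be worth letting the compiler confirm the layout. The sketch below assumes a compiler that accepts #pragma pack(push, 1) (GCC, Clang and MSVC do); PackedIcmp and word_at are made-up names, plain members replace the bit-fields, and words are copied out with memcpy rather than read through a casted pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#pragma pack(push, 1)   // compiler extension: GCC, Clang and MSVC accept it
struct PackedIcmp {     // hypothetical name; plain members instead of bit-fields
    std::uint8_t  type;
    std::uint8_t  code;
    std::uint16_t checksum;
    std::uint16_t id;
    std::uint16_t seqnum;
    char          data[6];
};
#pragma pack(pop)

static_assert(sizeof(PackedIcmp) == 14, "packing should remove all padding");
static_assert(offsetof(PackedIcmp, data) == 8, "data follows the 8-byte header");

// Copies word number `index` out of the packet. memcpy is used instead of a
// casted pointer so the read stays valid regardless of alignment or aliasing.
inline std::uint16_t word_at(const PackedIcmp& packet, std::size_t index) {
    assert(index < sizeof(PackedIcmp) / 2);
    std::uint16_t word;
    std::memcpy(&word, reinterpret_cast<const unsigned char*>(&packet) + index * 2, sizeof word);
    return word;
}
```

The words come back in host byte order, so word_at(packet, 2) is simply the id field as stored.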

Related

Is the address of the first data member of an instance the same as the address of the instance? [duplicate]

Can I assume that a C/C++ struct pointer will always point to the first member?
Example 1:
typedef struct {
unsigned char array_a[2];
unsigned char array_b[5];
}test;
//..
test var;
//..
In the above example will &var always point to array_a?
Also in the above example is it possible to cast the pointer
to an unsigned char pointer and access each byte separately?
Example 2:
function((unsigned char *)&var,sizeof(test));
//...
//...
void function(unsigned char *array, int len){
int i;
for( i=0; i<len; i++){
array[i]++;
}
}
Will that work correctly?
Note: I know that chars are byte aligned in a struct therefore I assume the size of the above struct is 7 bytes.
For C structs, yes, you can rely on it. This is how almost all "object orientated"-style APIs work in C (such as GObject and GTK).
For C++, you can rely on it only for "plain old data" (POD) types, which are guaranteed to be laid out in memory the same way as C structs. Exactly what constitutes a POD type is a little complicated and has changed between C++03 and C++11, but the crux of it is that if your type has any virtual functions then it's not a POD.
(In C++11 you can use std::is_pod to test at compile-time whether a struct is a POD type.)
EDIT: This tells you what constitutes a POD type in C++: http://en.cppreference.com/w/cpp/concept/PODType
EDIT2: Actually, in C++11, it doesn't need to be a POD, just "standard layout", which is a slightly weaker condition. Quoth section 9.2 [class.mem] paragraph 20 of the standard:
A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its
initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [ Note:
There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning,
as necessary to achieve appropriate alignment. — end note ]
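A quick sketch of how that guarantee can be checked for the struct in the question, assuming C++11 or later. The helper names are made up; increment_bytes mirrors function() from Example 2.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

struct Test {                     // the struct from the question
    unsigned char array_a[2];
    unsigned char array_b[5];
};

static_assert(std::is_standard_layout<Test>::value,
              "the first-member guarantee needs a standard-layout type");
static_assert(offsetof(Test, array_a) == 0, "no padding before the first member");

// Increments every byte of the object through an unsigned char pointer,
// like function() in the question.
inline void increment_bytes(unsigned char* bytes, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) ++bytes[i];
}

// True when the struct's address and its first member's address coincide.
inline bool starts_at_first_member(Test& var) {
    return reinterpret_cast<unsigned char*>(&var) == &var.array_a[0];
}
```

Calling increment_bytes(reinterpret_cast<unsigned char*>(&var), sizeof var) touches every byte of var, including any padding the implementation might have added.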
From the C99 standard section 6.7.2.1 bullet point 13:
Within a structure object, the non-bit-field members and the units in
which bit-fields reside have addresses that increase in the order in
which they are declared. A pointer to a structure object, suitably
converted, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides), and vice versa.
There may be unnamed padding within a structure object, but not at its
beginning.
The answer to your question is therefore yes.
Reference (see page 103)
The compiler is free to add padding and reorganize the struct how it sees fit. Especially in C++ you can add (virtual) functions, and then chances are that the virtual table pointer is hidden before the first member. But of course those are implementation details.
For C this assumption is valid.
For C, it's largely implementation-specific, but in practice the rule (in the absence of #pragma pack or something likewise) is:
Struct members are stored in the order they are declared. (This is required by the C99 standard, as mentioned here earlier.)
If necessary, padding is added before each struct member, to ensure correct alignment.
So a struct like
struct test{
char ch;
int i;
};
will typically have ch at offset 0, followed by three padding bytes so that i lands at offset 4 (assuming a 4-byte int with 4-byte alignment). Padding bytes are also added at the end where needed to make the struct size a multiple of its strictest member alignment; here the size is already 8 bytes, so none are needed.
So at least in this case, for C, I think you can assume that the struct pointer will point to the first array.
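Rather than guessing, you can ask the compiler where it put things. A small sketch using offsetof and alignof; CharThenInt and the two helper functions are illustrative names, and the exact numbers depend on the implementation.

```cpp
#include <cassert>
#include <cstddef>

struct CharThenInt {   // same shape as `struct test` above; name is illustrative
    char ch;
    int  i;
};

// Number of padding bytes the compiler inserted between ch and i.
constexpr std::size_t padding_before_i() {
    return offsetof(CharThenInt, i) - (offsetof(CharThenInt, ch) + sizeof(char));
}

// Number of padding bytes after the last member.
constexpr std::size_t tail_padding() {
    return sizeof(CharThenInt) - (offsetof(CharThenInt, i) + sizeof(int));
}

static_assert(offsetof(CharThenInt, ch) == 0, "first member is at offset 0");
static_assert(offsetof(CharThenInt, i) % alignof(int) == 0, "i is aligned for int");
static_assert(sizeof(CharThenInt) % alignof(CharThenInt) == 0, "size is a multiple of alignment");
```

On x86-64 with GCC or Clang, padding_before_i() is typically 3 and tail_padding() is typically 0.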

Converting from std::string to const unsigned int

My payload is stored in a std::string xyz (holds binary data), and I need to pass it to a function that takes it as const unsigned int*. How would I convert from std::string to const unsigned int*?
I tried reinterpret_cast<const unsigned int*>(&xyz.front()) but it is not working!
The function prototype is as follows:
void roll(void *pdst, const unsigned int *psrc);
pdst will hold the results.
Don't use std::string to store binary data; that class is specifically designed for working with strings. It feels like there was original C code that was using a char array to store a sequence of bytes and translated that to std::string for C++. In this case, it's not being used as a string, so it doesn't make sense to store it in a std::string.
From there, translating to an unsigned int, well for starters, you can't simply cast it even if you were using a more primitive type such as a char *, as it would violate the rules of strict aliasing resulting in undefined behavior. What you want to do is create a new variable and memcpy the data into this new variable.
Here is the section from the C++14 standard working draft describing compatible types (3.10 p10):
If a program attempts to access the stored value of an object through a glvalue of other than one of the
following types the behavior is undefined:
— the dynamic type of the object,
— a cv-qualified version of the dynamic type of the object,
— a type similar (as defined in 4.4) to the dynamic type of the object,
— a type that is the signed or unsigned type corresponding to the dynamic type of the object,
— a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type
of the object,
— an aggregate or union type that includes one of the aforementioned types among its elements or non-
static data members (including, recursively, an element or non-static data member of a subaggregate
or contained union),
— a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
— a char or unsigned char type.
As you can see, it explicitly allows for accessing any object as a char or unsigned char, but it gives no such allowance to access a char or unsigned char as anything else.
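The memcpy approach could look roughly like the sketch below. to_uint_buffer is a made-up helper, and zero-padding the last element is an assumption about what roll expects, since the question does not say how odd lengths should be handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Copies the bytes of `payload` into properly aligned unsigned ints.
// If the length is not a multiple of sizeof(unsigned int), the last
// element is zero-padded (an assumption; roll() may expect otherwise).
inline std::vector<unsigned int> to_uint_buffer(const std::string& payload) {
    const std::size_t count =
        (payload.size() + sizeof(unsigned int) - 1) / sizeof(unsigned int);
    std::vector<unsigned int> words(count, 0u);
    if (!payload.empty()) {
        std::memcpy(words.data(), payload.data(), payload.size());
    }
    return words;
}
```

Keep the returned vector in a local variable and pass its data() pointer as psrc, so the buffer outlives the call to roll.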
The question is how you stored the binary data in the std::string. If you simply used the constructor, you can get your binary data back with xyz.data().

C struct elements alignment (ansi)

Just a simple question: what does the standard say about the alignment of structure members?
For example, with this one:
struct
{
uint8_t a;
uint8_t b;
/* other members */
} test;
Is it guaranteed that b is at offset 1 from the start of the struct?
Thanks
The standard (as of C99) doesn't really say anything.
The only real guarantees are that (void *)&test == (void *)&test.a, and that a is at a lower address than b. Everything else is up to the implementation.
C11 6.7.2.1 Structure and union specifiers p14 says
Each non-bit-field member of a structure or union object is aligned in
an implementation-defined manner appropriate to its type.
meaning that you can't make any portable assumptions about the difference between the addresses of a and b.
It should be possible to use offsetof to determine the offset of members.
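For instance, offsetof can turn a layout assumption into a compile-time check. A minimal sketch, with TwoBytes as an illustrative name for the question's anonymous struct; the third assertion is deliberately one the standard does not guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct TwoBytes {      // illustrative name for the question's anonymous struct
    std::uint8_t a;
    std::uint8_t b;
};

// The standard guarantees only these two facts about the layout.
static_assert(offsetof(TwoBytes, a) == 0, "no padding before the first member");
static_assert(offsetof(TwoBytes, a) < offsetof(TwoBytes, b), "members keep declaration order");

// This one is NOT guaranteed by the standard; it documents an assumption and
// stops compilation on any implementation where the assumption is false.
static_assert(offsetof(TwoBytes, b) == 1, "code below assumes b directly follows a");

// Distance in bytes between the two members on this implementation.
constexpr std::size_t gap_between_members() {
    return offsetof(TwoBytes, b) - offsetof(TwoBytes, a);
}
```

If the assumption ever breaks on some target, the build fails with the message instead of the program silently misreading data.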
For C the alignment is implementation-defined. We can see that in the draft C99 standard, section 6.7.2.1 Structure and union specifiers, paragraph 12 (in C11 it is paragraph 14), which says:
Each non-bit-field member of a structure or union object is aligned in an implementation defined manner appropriate to its type.
and paragraph 13 says:
Within a structure object, the non-bit-field members and the units in which bit-fields
reside have addresses that increase in the order in which they are declared. A pointer to a
structure object, suitably converted, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides), and vice versa. There may be unnamed
padding within a structure object, but not at its beginning.
For C++ we have similar quotes from the draft standard. Section 9.2 Class members, paragraph 13 says:
Nonstatic data members of a (non-union) class with the same access control (Clause 11) are allocated so that later members have higher addresses within a class object. The order of allocation of non-static data members with different access control is unspecified (Clause 11). Implementation alignment requirements might cause two adjacent members not to be allocated immediately after each other;
and paragraph 19 says:
A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its
initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [ Note:
There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning,
as necessary to achieve appropriate alignment. —end note ]
The case you're using is not really an edge case: both uint8_t members are small enough to fit in the same word in memory, and there would be no use in putting each uint8_t in its own uint16_t.
A more critical case would be something like:
struct
{
uint8_t a;
uint8_t b;
uint32_t c; // where is c, at &a+2 or &a+4?
/* other members */
} test;
And anyway, this will always depend on the target architecture and your compiler.
K&R second edition (ANSI C) in chapter 6.4 (page 138) says:
Don't assume, however, that the size of a structure is the sum of the sizes of its members. Because of alignment requirements for different objects, there may be unnamed "holes" in a structure.
So no, ANSI C does not guarantee that b is at offset 1.
The compiler is even permitted to put b at offset sizeof(int) so that it is aligned to a machine word, although mainstream compilers place consecutive uint8_t members at adjacent offsets.
Some compilers support pack-pragmas so that you can force that there are no such "holes" in the struct, but that is not portable.
What the C standard guarantees has already been mentioned in other answers.
However, to make sure b is at offset 1, your compiler might offer options to "pack" the structure, that is, to explicitly add no padding.
For gcc this can be achieved with #pragma pack().
#pragma pack(1)
struct
{
uint8_t a; /* Guaranteed to be at offset 0. */
uint8_t b; /* Guaranteed to be at offset 1. */
/* other members are guaranteed to start at offset 2. */
} test_packed;
#pragma pack()
struct
{
uint8_t a; /* Guaranteed to be at offset 0. */
uint8_t b; /* NOT guaranteed to be at offset 1. */
/* other members are NOT guaranteed to start at offset 2. */
} test_unpacked;
A portable (and safe) solution would be to simply use an array:
struct
{
uint8_t ab[2]; /* ab[0] is guaranteed to be at offset 0. */
/* ab[1] is guaranteed to be at offset 1. */
/* other members are NOT guaranteed to start at offset 2. */
} test_array;

Conversion between short* to int*

Assuming short is 2 bytes and int is 4 bytes on a 32 bit OS. Is the following an undefined behavior?
short s = 42;
int *p = (int*)(&s);
No, the code that you have posted does not exhibit undefined behavior, but attempting to read *p would. Also, depending on the alignment requirements of int and short, the result of the cast may be unspecified and irreversible (see 5.2.10 [expr.reinterpret.cast] / 7).
See ISO/IEC 14882:2011 3.10 [basic.lval] / 10:
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
the dynamic type of the object,
a cv-qualified version of the dynamic type of the object,
a type similar (as defined in 4.4) to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
a char or unsigned char type.
The object that you are trying to access is a short and *p is a glvalue of type int which doesn't meet any of the above descriptions.
Dereferencing p puts you directly in the realm of UB, as you would be reading two bytes that are not part of s.
However, the opposite,
int b;
short* f = (short *) &b;
will probably work due to the semantics of the little endian architecture.
(This is all assuming the compiler doesn't do anything stupid)
From wikipedia:
The little-endian system has the property that the same value can be
read from memory at different lengths without using different
addresses (even when alignment restrictions are imposed). For example,
a 32-bit memory location with content 4A 00 00 00 can be read at the
same address as either 8-bit (value = 4A), 16-bit (004A), 24-bit
(00004A), or 32-bit (0000004A), all of which retain the same numeric
value. Although this little-endian property is rarely used directly by
high-level programmers, it is often employed by code optimizers as
well as by assembly language programmers.
So, as long as you are little endian, the opposite direction should be fine.
Still undefined behavior though.
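If the aim is just to get at half of an integer, there are defined ways to do it. A short sketch, using fixed-width types instead of short and int so the sizes are explicit (that swap is my assumption); both function names are made up.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Portable: extracts the low 16 bits by arithmetic, independent of byte order.
inline std::uint16_t low_half(std::uint32_t value) {
    return static_cast<std::uint16_t>(value & 0xFFFFu);
}

// Defined but byte-order dependent: copies the first two bytes of the
// object's representation, which is what the pointer cast tries to read.
inline std::uint16_t first_two_bytes(std::uint32_t value) {
    std::uint16_t out;
    std::memcpy(&out, &value, sizeof out);
    return out;
}
```

On a little-endian machine the two functions agree; on a big-endian one first_two_bytes returns the high half instead.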