uint32_t pointer to the same location as uint8_t pointer - c++

#include <cstdint>
#include <iostream>
int main(){
uint8_t memory[1024];
memory[0] = 1;
memory[1] = 1;
uint32_t *test = memory;
//is it possible to get a value for *test that would be in this example 257?
}
I want to create a uint32_t pointer to the same address as the uint8_t pointer. Is this possible without using new(address)? I don't want to lose the information at the address. I know pointers are just addresses, and therefore I should be able to just set the uint32_t pointer to the same address.
This code produces an error:
invalid conversion from 'uint8_t*' to 'uint32_t*' in initialization

This would violate the strict aliasing rule, so it cannot be done. Sad, but true.
Use memcpy to copy the data instead. In many cases compilers optimize the copy away and generate the same code as they would for the cast, but in a standard-conforming way.
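A minimal sketch of the memcpy approach, assuming the byte buffer from the question. The helper name load_u32 is made up for illustration, and the value it returns depends on the machine's byte order.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reads four bytes starting at src into a uint32_t
// without ever forming a uint32_t* that points into the byte buffer.
std::uint32_t load_u32(const std::uint8_t* src) {
    std::uint32_t value;
    std::memcpy(&value, src, sizeof value);  // byte-wise copy, no aliasing violation
    return value;
}
```

With memory[0] = 1, memory[1] = 1 and the remaining bytes zero, load_u32(memory) gives 257 on a little-endian machine.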

As already mentioned, you cannot convert uint8_t * to uint32_t * because of the strict aliasing rule. You can convert uint32_t * to unsigned char *, though:
#include <cstdint>
#include <iostream>
int main(){
uint32_t test[1024/4] = {}; // initialize it!
auto memory = reinterpret_cast<unsigned char *>( test );
memory[0] = 1;
memory[1] = 1;
std::cout << test[0] << std::endl;
}
This code is not portable because of endianness, but at least it does not have UB.

This question completely ignores endianness. Your example happens to work because the lower and upper bytes hold the same value, so swapping the byte order makes no difference. When the bytes differ, the number will be unexpectedly wrong on a machine with the other byte order.
As such, there's no portable way to use the resulting number.
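If the byte order of the buffer is fixed by convention, the value can be assembled with shifts instead of reinterpreting memory. The sketch below assumes the buffer is little-endian; read_u32_le is a hypothetical name.

```cpp
#include <cstdint>

// Hypothetical helper: interprets four bytes as a little-endian 32-bit value.
// The result is the same on every host because it uses arithmetic on the
// individual bytes, not the host's memory layout.
std::uint32_t read_u32_le(const std::uint8_t* p) {
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}
```

For the bytes 1, 1, 0, 0 this returns 257 regardless of the host's endianness.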

You can do that with a union. As mentioned above, you have to be aware of the endianness of the target device, though in most cases it will be little-endian. Using a union this way is also controversial: reading a member other than the one most recently written is undefined behavior in standard C++, although GCC documents it as supported. For some uses that is good enough.
#include <cstdint>
#include <iostream>
int main(){
union {
uint8_t memory[1024] = {};
uint32_t test[1024/4];
};
memory[0] = 1;
memory[1] = 1;
std::cout << test[0]; // 257 on a little-endian machine
}

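uint32_t *test = (uint32_t*) memory;
The uint32_t* cast tells the compiler that the memory test points to should be treated as containing uint32_t values. Note that, as the answers above explain, dereferencing test afterwards still violates the strict aliasing rule.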

Related

Casting and writing in pointer array reports misaligned address with clang sanitizer

I'm using a char* array to store different data types, like in the next example:
int main()
{
char* arr = new char[8];
*reinterpret_cast<uint32_t*>(&arr[1]) = 1u;
return 0;
}
Compiling and running with clang UndefinedBehaviorSanitizer will report the following error:
runtime error: store to misaligned address 0x602000000011 for type 'uint32_t' (aka 'unsigned int'), which requires 4 byte alignment
I suppose I could do it another way, but why is this undefined behavior? What concepts are involved here?
You cannot cast an arbitrary char* to uint32_t*, even if it points to an array large enough to hold a uint32_t.
There are a couple of reasons why.
The practical answer:
uint32_t generally likes 4-byte alignment: its address should be a multiple of 4.
char does not have such a restriction. It can live at any address.
That means that an arbitrary char* is unlikely to be aligned properly for a uint32_t.
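The alignment requirement can be checked at run time. This is a sketch that assumes uintptr_t is available and maps addresses to integers in the usual linear way, which holds on mainstream platforms; is_aligned_for is a hypothetical name.

```cpp
#include <cstdint>

// Hypothetical helper: true if the address p is a multiple of alignof(T).
// Assumes converting a pointer to uintptr_t yields its numeric address.
template <typename T>
bool is_aligned_for(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}
```

For the array in the question, is_aligned_for<uint32_t>(&arr[1]) is false on a platform where uint32_t requires 4-byte alignment and arr itself is 4-byte aligned, which matches the sanitizer report.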
The Language Lawyer answer:
Aside from the alignment issue, your code exhibits undefined behavior because you're violating the strict aliasing rules. No uint32_t object exists at the address you're writing to, but you're treating it as if there is one there.
In general, while a char* may be used to point to any object and read its byte representation, a T* for an arbitrary type T cannot be used to point at an array of bytes and write the byte representation of an object into it.
No matter the reason for the error, the way to fix it is the same:
If you don't care about treating the bytes as a uint32_t and are just serializing them (to send over a network, or write to disk, for example), then you can std::copy the bytes into the buffer:
char buffer[BUFFER_SIZE] = {};
char* buffer_pointer = buffer;
uint32_t foo = 123;
char* pfoo = reinterpret_cast<char*>(&foo);
std::copy(pfoo, pfoo + sizeof(foo), buffer_pointer);
buffer_pointer += sizeof(foo);
uint32_t bar = 234;
char* pbar = reinterpret_cast<char*>(&bar);
std::copy(pbar, pbar + sizeof(bar), buffer_pointer);
buffer_pointer += sizeof(bar);
// repeat as needed
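The same byte-copy idea also covers the original case of writing at an odd offset such as &arr[1]. The sketch below uses std::memcpy rather than std::copy; store_u32 and load_u32_at are hypothetical names, and the caller is assumed to guarantee that offset + 4 bytes fit in the buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies a uint32_t into a byte buffer at any offset,
// aligned or not. No uint32_t* into the buffer is ever created.
void store_u32(char* buffer, std::size_t offset, std::uint32_t value) {
    std::memcpy(buffer + offset, &value, sizeof value);
}

// Hypothetical helper: copies four bytes back out into a real uint32_t object.
std::uint32_t load_u32_at(const char* buffer, std::size_t offset) {
    std::uint32_t value;
    std::memcpy(&value, buffer + offset, sizeof value);
    return value;
}
```

store_u32(arr, 1, 1u) performs the store from the question without a misaligned access.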
If you do want to treat those bytes as a uint32_t (if you're implementing a std::vector-like data structure, for example) then you will need to ensure the buffer is properly-aligned, and use placement-new:
std::aligned_storage_t<sizeof(uint32_t), alignof(uint32_t)> buffer[BUFFER_SIZE];
uint32_t foo = 123;
uint32_t* new_uint = new (&buffer[0]) uint32_t(foo);
uint32_t bar = 234;
uint32_t* another_new_uint = new (&buffer[1]) uint32_t(bar);
// repeat as needed

Cast char* to short*

I want to sum up all the bytes of my structure. I read that I should cast the pointer to my structure from char* to short*. Why?
Is casting from char* to short* with (short*) correct?
My code
#include <stdio.h>
#include <string.h>
struct pseudo_header
{
int var1;
int var2;
char name[25];
};
void csum(const unsigned short* ptr, int nbytes)
{
unsigned long sum = 0;
for(int i = 0; i < sizeof(struct pseudo_header); i++)
{
printf("%#8x\n", ptr[i]);
sum+= ptr[i];
}
printf("%#8x", sum);
}
int main() {
struct pseudo_header psh = {0};
char datagram[4096];
psh.var1 = 10;
psh.var2 = 20;
strcpy(psh.name, "Test");
memcpy(datagram, &psh, sizeof(struct pseudo_header));
csum((unsigned short*)datagram, sizeof(struct pseudo_header));
return 0;
}
It looks like it works, but I can't verify this. Any help is appreciated.
No, the behaviour on dereferencing a pointer that's been set to the result of a cast from a char* to a short* is undefined, unless the data to which char* is pointing was originally a short object or array; which yours isn't.
The well-defined way (in both C and C++) to analyse memory is to use an unsigned char*, but be careful not to traverse your memory so as to reach areas that are not owned by your program.
Basically this works because you zero-initialized the structure with = {0}.
You can give the function a pointer to the structure, struct pseudo_header *.
I would expect an alignment issue.
I would check whether sizeof(struct ..) has the expected value 33; if it does not, I would add a #pragma pack() directive before the structure and then cast to unsigned char* inside the function.
Test your function with a name that is 25 chars long.
short is at least two bytes. So if you want to sum all the bytes then casting to short* is wrong. Instead cast to unsigned char*.
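A minimal sketch of the byte-wise sum through unsigned char*. The function name sum_bytes is made up for illustration, and the example is written in C++ although the same loop works in C.

```cpp
#include <cstddef>

// Hypothetical helper: adds up every byte of an object of any type.
// Inspecting an object through unsigned char* is always permitted.
unsigned long sum_bytes(const void* object, std::size_t nbytes) {
    const unsigned char* bytes = static_cast<const unsigned char*>(object);
    unsigned long sum = 0;
    for (std::size_t i = 0; i < nbytes; ++i) {
        sum += bytes[i];
    }
    return sum;
}
```

It could be called as sum_bytes(&psh, sizeof psh). Any padding bytes inside the structure are included in the sum.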
Accessing an object through a pointer to a different type, other than a character type such as char *, violates the strict aliasing rule. GCC optimizes based on the assumptions of the strict aliasing rule, which can lead to interesting bugs. The only truly safe way around the strict aliasing rule is to use memcpy. GCC supports memcpy as a compiler intrinsic, so it can optimize the copy away.
Alternatively, you can disable strict aliasing with the -fno-strict-aliasing flag.
PS - I am unclear if a union provides a suitable way round the strict aliasing rule.
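If the goal really is to sum 16-bit words, as in an Internet-style checksum, memcpy can fetch each word without a short* cast. This is only a sketch: sum_words16 is a hypothetical name, the words are read in the host's byte order, and a trailing odd byte is simply added on its own.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: sums a buffer as native-endian 16-bit words.
unsigned long sum_words16(const void* data, std::size_t nbytes) {
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    unsigned long sum = 0;
    std::size_t i = 0;
    for (; i + 2 <= nbytes; i += 2) {
        std::uint16_t word;
        std::memcpy(&word, bytes + i, sizeof word);  // no short* cast needed
        sum += word;
    }
    if (i < nbytes) {
        sum += bytes[i];  // leftover byte when nbytes is odd
    }
    return sum;
}
```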

Portable tagged pointers

Is there a portable way to implement a tagged pointer in C/C++, like some documented macros that work across platforms and compilers? Or are you on your own when you tag your pointers? If such helper functions/macros exist, are they part of any standard, or are they just available as open source libraries?
For those who do not know what a tagged pointer is but are interested: it is a way to store some extra data inside a normal pointer. On most architectures some bits in pointers are always 0 or 1, so you keep your flags/types/hints in those extra bits and erase them right before you use the pointer to dereference some actual value.
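#include <stdint.h>
const uintptr_t gc_flag = 1;
const uintptr_t flag_mask = 7; // aka 0b00000000000111, because on some theoretical CPU under some arbitrary OS compiled with some random compiler and using some particular malloc last three bits are always zero in pointers.
struct value {
void *data;
};
int data = 42; // assumed to sit at an address whose last three bits are zero
struct value val;
val.data = (void*)((uintptr_t)&data | gc_flag); // store the flag in the spare bits
int copy = *(int*)((uintptr_t)val.data & ~flag_mask); // erase the flag bits before dereferencing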
https://en.wikipedia.org/wiki/Pointer_tagging
You can get the lowest N bits of an address for your personal use by guaranteeing that the objects are aligned to multiples of 1 << N. This can be achieved platform-independently by different ways (alignas and aligned_storage for stack-based objects or std::aligned_alloc for dynamic objects), depending on what you want to achieve:
struct Data { ... };
alignas(1 << 4) Data d; // 4-bits, 16-byte alignment
assert(reinterpret_cast<std::uintptr_t>(&d) % 16 == 0);
// dynamic (preferably with a unique_ptr or alike)
void* ptr = std::aligned_alloc(1 << 4, sizeof(Data));
auto obj = new (ptr) Data;
...
obj->~Data();
std::free(ptr);
You pay by throwing away a lot of memory, growing exponentially with the number of bits required. Also, if you plan to allocate many such objects contiguously, even a comparatively small array won't fit in the processor's cache lines, possibly slowing down the program considerably. This solution therefore does not scale.
If you're sure that the addresses you are passing around always have certain bits unused, then you could use uintptr_t as a transport type. This is an integer type that maps to pointers in the expected way (and will fail to exist on an obscure platform that offers no such possible map).
There aren't any standard macros but you can roll your own easily enough. The code (sans macros) might look like:
void T_func(uintptr_t t)
{
uint8_t tag = (t & 7);
T *ptr = (T *)(t & ~(uintptr_t)7);
// ...
}
int main()
{
T *ptr = new T;
assert( ((uintptr_t)ptr % 8) == 0 );
T_func( (uintptr_t)ptr + 3 );
}
This may defeat compiler optimizations that involve tracking pointer usage.
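The uintptr_t approach can be wrapped in a small class so the tagged value cannot be dereferenced by accident. This is a sketch, not a standard facility: TaggedPtr is a hypothetical name, and it assumes that uintptr_t exists, that T is aligned to at least 1 << TagBits, and that converting the masked integer back yields the original pointer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wrapper: stores a small tag in the low bits of a T*.
template <typename T, unsigned TagBits = 3>
class TaggedPtr {
    static constexpr std::uintptr_t kMask = (std::uintptr_t{1} << TagBits) - 1;
    std::uintptr_t bits_;

public:
    TaggedPtr(T* p, unsigned tag)
        : bits_(reinterpret_cast<std::uintptr_t>(p) | (tag & kMask)) {
        // The pointer itself must not use the tag bits.
        assert((reinterpret_cast<std::uintptr_t>(p) & kMask) == 0);
    }

    // Clears the tag bits before handing the pointer back.
    T* get() const { return reinterpret_cast<T*>(bits_ & ~kMask); }

    // Returns only the tag stored in the low bits.
    unsigned tag() const { return static_cast<unsigned>(bits_ & kMask); }
};
```

The object pointed to has to be over-aligned where necessary, for example with alignas(8), so that the low three bits of its address are zero.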
Well, GCC at least can compute the size of bit-fields, so you can get portability across platforms (I don't have an MSVC available to test with). You can use this to pack the pointer and tag into an intptr_t, and intptr_t is guaranteed to be able to hold a pointer.
#include <limits.h>
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>
#include <inttypes.h>
struct tagged_ptr
{
intptr_t ptr : (sizeof(intptr_t)*CHAR_BIT-3);
intptr_t tag : 3;
};
int main(int argc, char *argv[])
{
struct tagged_ptr p;
p.tag = 3;
p.ptr = (intptr_t)argv[0];
printf("sizeof(p): %zu <---WTF MinGW!\n", sizeof p);
printf("sizeof(p): %lu\n", (unsigned long int)sizeof p);
printf("sizeof(void *): %u\n", (unsigned int)sizeof (void *));
printf("argv[0]: %p\n", argv[0]);
printf("p.tag: %" PRIxPTR "\n", p.tag);
printf("p.ptr: %" PRIxPTR "\n", p.ptr);
printf("(void *)*(intptr_t*)&p: %p\n", (void *)*(intptr_t *)&p);
}
Gives:
$ ./tag.exe
sizeof(p): zu <---WTF MinGW!
sizeof(p): 8
sizeof(void *): 8
argv[0]: 00000000007613B0
p.tag: 3
p.ptr: 7613b0
(void *)*(intptr_t*)&p: 60000000007613B0
I've put the tag at the top, but changing the order of the struct would put it at the bottom. Then shifting the pointer-to-be-stored right by 3 would implement the OP's use case. Probably make macros for access to make it easier.
I also kinda like the struct because you can't accidentally dereference it as if it were a plain pointer.

Best way to set bits of fields in union

Let's say I have the following
struct S {
union {
uint8_t flags;
struct {
uint8_t flag2bits : 2;
uint8_t flag1bit : 1;
};
};
};
S s;
s.flag2bits = 2;
s.flag1bit = 1; // this will wipe out the values of other bits
What's the best way to assign value to a specific bit without affecting other bit fields?
I can shift around and then assign and then shift again but it means once someone changes the order of the bit fields, the code is broken....
"I can shift around and then assign and then shift again but it means once someone changes the order of the bit fields, the code is broken...."
No, it doesn't mean the code is broken. You can assign to the bit-fields however you like: in any order, and you can leave some of them unset.
In your example:
S s;
s.flag2bits = 2;
s.flag1bit = 1;
Changing flag2bits will not affect value stored in flag1bit.
However, your problem may be related to the union you hold in your struct. Writing to the flags variable will affect both of the bit-fields, because the nested struct that holds them shares its storage with flags.
I hope this example will explain the case here:
#include <iostream>
#include <cstdint>
struct S {
union {
uint8_t flags;
struct {
uint8_t flag2bits : 2;
uint8_t flag1bit : 1;
};
};
};
int main(int argc, char *argv[]) {
S s;
s.flag2bits = 2;
s.flag1bit = 1;
std::cout << int(s.flag2bits) << int(s.flag1bit) << std::endl;
s.flags = 4; // As you are using union, at this point you are overwriting
// values stored in your (nested) struct
std::cout << int(s.flag2bits) << int(s.flag1bit) << std::endl;
return 0;
}
EDIT: As @M.M points out, it's undefined behavior to read from a union member that wasn't the one most recently written. Still, at least on clang-3.5, the code above prints:
21
01
which illustrates the point I am trying to make (i.e. overwriting of union fields).
I would consider removing union from your struct S code, though I may not see the whole picture of what you are trying to achieve.
The C++ compiler will manage the bits for you. You can just set the values as you have it. Only the appropriate bits will be set.
Did you try it?
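A small sketch that checks this claim: assigning to one bit-field leaves its neighbour untouched. The struct mirrors the one in the question without the union; Flags and set_flag1 are hypothetical names.

```cpp
#include <cstdint>

// Same bit-field layout as in the question, without the union.
struct Flags {
    std::uint8_t flag2bits : 2;
    std::uint8_t flag1bit : 1;
};

// Hypothetical helper: sets only flag1bit. The compiler generates the
// masking, so the code keeps working if the fields are reordered.
void set_flag1(Flags& f, bool on) {
    f.flag1bit = on ? 1 : 0;
}
```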

Union hack for endian testing and byte swapping

For a union, writing to one member and reading from other member (except for char array) is UB.
// snippet 1 (testing for endianness):
union
{
int i;
char c[sizeof(int)];
} x;
x.i = 1; // writing to i
if(x.c[0] == 1) // reading from c[0]
{ printf("little-endian\n");
}
else
{ printf("big-endian\n");
}
// snippet 2 (swap bytes using union):
int swapbytes()
{
union // assuming 32bit, sizeof(int)==4
{
int i;
char c[sizeof(int)];
} x;
x.i = 0x12345678; // writing to member i
SWAP(x.c[0],x.c[3]); // writing to char array elements
SWAP(x.c[1],x.c[2]); // writing to char array elements
return x.i; // reading from x.i
}
Snippet 1 is legal C or C++ but snippet 2 is not. Am I correct? Can someone point to the section of the standard that says it's OK to write to one member of a union and read from another member that is a char array?
There is a really simple way to get around the undefined behaviour (well, undefined behaviour that is defined in pretty much every compiler out there ;)).
uint32_t i = 0x12345678;
char ch[4];
memcpy( ch, &i, 4 );
bool bLittleEndian = ch[0] == 0x78;
This has the added bonus that pretty much every compiler out there will see that you are memcpying a constant number of bytes and optimise out the memcpy completely resulting in exactly the same code as your snippet 1 while staying totally within the rules!
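Both snippets from the question can be written with memcpy instead of a union. The sketch below assumes a 32-bit value; is_little_endian and swap_bytes are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>

// Hypothetical helper: inspects the first byte of a known value via memcpy.
bool is_little_endian() {
    const std::uint32_t probe = 0x12345678;
    unsigned char bytes[sizeof probe];
    std::memcpy(bytes, &probe, sizeof probe);
    return bytes[0] == 0x78;
}

// Hypothetical helper: reverses the byte order of a 32-bit value.
std::uint32_t swap_bytes(std::uint32_t value) {
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);   // copy out the representation
    std::reverse(bytes, bytes + sizeof value);  // reverse the four bytes
    std::memcpy(&value, bytes, sizeof value);   // copy the result back
    return value;
}
```

swap_bytes(0x12345678) returns 0x78563412 on any host, because it only reverses whatever byte order the host uses.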
I believe it (snippet 1) is technically not allowed, but most compilers allow it anyway because people use this kind of code. GCC even documents that it is supported.
You will have problems on some machines where sizeof(int) == 1, and possibly on some that are neither big endian nor little endian.
Either use available functions that change words to the proper order, or set this with a configuration macro. You probably need to recognize compiler and OS anyway.