Cast char* to short* - c++

I want to sum up all the bytes of my structure. I read that I should cast the pointer to my structure from char* to short*. Why?
Is casting from char* to short* with a (short*) cast correct?
My code
#include <stdio.h>
#include <string.h>
struct pseudo_header
{
int var1;
int var2;
char name[25];
};
void csum(const unsigned short* ptr, int nbytes)
{
unsigned long sum = 0;
for(int i = 0; i < sizeof(struct pseudo_header); i++)
{
printf("%#8x\n", ptr[i]);
sum+= ptr[i];
}
printf("%#8x", sum);
}
int main() {
struct pseudo_header psh = {0};
char datagram[4096];
psh.var1 = 10;
psh.var2 = 20;
strcpy(psh.name, "Test");
memcpy(datagram, &psh, sizeof(struct pseudo_header));
csum((unsigned short*)datagram, sizeof(struct pseudo_header));
return 0;
}
It looks like it works, but I can't verify this. Any help is appreciated.

No, the behaviour on dereferencing a pointer that's been set to the result of a cast from a char* to a short* is undefined, unless the data to which char* is pointing was originally a short object or array; which yours isn't.
The well-defined way (in both C and C++) to analyse memory is to use an unsigned char*, but be careful not to traverse your memory so as to reach areas that are not owned by your program.
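A minimal sketch of that byte-wise traversal, assuming the goal is a plain sum of byte values. The helper name sum_bytes is hypothetical and does not come from the question; the caller passes the object's address and its sizeof so the loop stays inside memory the program owns.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: adds up every byte of an object representation.
// Reading any object through unsigned char* is well defined.
std::uint32_t sum_bytes(const void* data, std::size_t nbytes)
{
    const unsigned char* p = static_cast<const unsigned char*>(data);
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < nbytes; ++i) {
        sum += p[i];  // each byte is promoted before the addition
    }
    return sum;
}
```

For the structure in the question the call would be sum_bytes(&psh, sizeof psh). Any padding bytes inside the structure are part of that sum.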

Basically, this works because you zero-initialized the structure with = {0}.
You could pass the function a pointer to the structure (struct pseudo_header*) instead.
I would expect an alignment issue.
I would check whether sizeof(struct pseudo_header) has the expected value of 33 (4 + 4 + 25 bytes, assuming a 4-byte int). If padding makes it larger, I would add a #pragma pack directive (a compiler extension) before the structure and then cast to unsigned char* inside the function.
Test your function with a name that is 25 characters long.

short is at least two bytes. So if you want to sum all the bytes then casting to short* is wrong. Instead cast to unsigned char*.

Accessing an object through a pointer that was cast to a different type (aliasing) violates the strict aliasing rule, unless the pointer is a character type such as char*. GCC optimizes based on the assumptions of the strict aliasing rule, which can lead to interesting bugs. The only truly safe way around the strict aliasing rule is to use memcpy. GCC treats memcpy as a compiler intrinsic, so it can optimize the copy away.
Alternatively, you can disable strict aliasing with the -fno-strict-aliasing flag.
PS - I am unclear if a union provides a suitable way round the strict aliasing rule.
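The memcpy route can be sketched as follows, assuming the checksum is meant to add native-endian 16-bit words. The name sum_words16 and the handling of a trailing odd byte are assumptions made for illustration, not something the question specifies.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: sums a buffer as native-endian 16-bit words.
// Each word is copied out with memcpy, so no unsigned short* ever
// aliases the underlying bytes.
std::uint32_t sum_words16(const void* data, std::size_t nbytes)
{
    const unsigned char* p = static_cast<const unsigned char*>(data);
    std::uint32_t sum = 0;
    std::size_t i = 0;
    for (; i + sizeof(std::uint16_t) <= nbytes; i += sizeof(std::uint16_t)) {
        std::uint16_t word;
        std::memcpy(&word, p + i, sizeof word);
        sum += word;
    }
    if (i < nbytes) {
        sum += p[i];  // assumption: a trailing odd byte is added as-is
    }
    return sum;
}
```

The loop bound counts bytes, not words, which also avoids the over-read in the question's loop, where sizeof(struct pseudo_header) was used as a count of shorts.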

Related

Casting and writing in pointer array reports misaligned address with clang sanitizer

I'm using a char* array to store different data types, like in the next example:
int main()
{
char* arr = new char[8];
*reinterpret_cast<uint32_t*>(&arr[1]) = 1u;
return 0;
}
Compiling and running with clang UndefinedBehaviorSanitizer will report the following error:
runtime error: store to misaligned address 0x602000000011 for type 'uint32_t' (aka 'unsigned int'), which requires 4 byte alignment
I suppose I could do it another way, but why is this undefined behavior? What concepts are involved here?
You cannot cast an arbitrary char* to uint32_t*, even if it points to an array large enough to hold a uint32_t.
There are a couple of reasons why.
The practical answer:
uint32_t generally likes 4-byte alignment: its address should be a multiple of 4.
char does not have such a restriction. It can live at any address.
That means that an arbitrary char* is unlikely to be aligned properly for a uint32_t.
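To make the alignment point concrete, here is a small sketch that tests whether an address is suitably aligned for a type. The helper is_aligned_for is hypothetical, and it assumes a platform where converting a pointer to std::uintptr_t yields the numeric address (the mapping is implementation-defined).

```cpp
#include <cstdint>

// Hypothetical helper: true if p is a multiple of T's alignment requirement.
template <typename T>
bool is_aligned_for(const void* p)
{
    // Assumption: the integer value of the pointer is its address,
    // which holds on common flat-memory platforms.
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}
```

With a buffer that starts on a uint32_t boundary, is_aligned_for<std::uint32_t>(&arr[1]) would be false on a platform where alignof(uint32_t) is 4, which is the situation the sanitizer reports.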
The Language Lawyer answer:
Aside from the alignment issue, your code exhibits undefined behavior because you're violating the strict aliasing rules. No uint32_t object exists at the address you're writing to, but you're treating it as if there is one there.
In general, while a char* may be used to point to any object and read its byte representation, a T* for an arbitrary type T cannot be used to point at an array of bytes and write the byte representation of an object into it.
No matter the reason for the error, the way to fix it is the same:
If you don't care about treating the bytes as a uint32_t and are just serializing them (to send over a network, or write to disk, for example), then you can std::copy the bytes into the buffer:
char buffer[BUFFER_SIZE] = {};
char* buffer_pointer = buffer;
uint32_t foo = 123;
char* pfoo = reinterpret_cast<char*>(&foo);
std::copy(pfoo, pfoo + sizeof(foo), buffer_pointer);
buffer_pointer += sizeof(foo);
uint32_t bar = 234;
char* pbar = reinterpret_cast<char*>(&bar);
std::copy(pbar, pbar + sizeof(bar), buffer_pointer);
buffer_pointer += sizeof(bar);
// repeat as needed
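The same serialization idea can be wrapped in a pair of helpers that also read the values back. This is only a sketch: append_bytes and extract_bytes are hypothetical names, and memcpy is used in place of std::copy, which has the same effect for trivially copyable types.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: copies value's bytes to cursor, returns the next write position.
template <typename T>
char* append_bytes(char* cursor, const T& value)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "only trivially copyable types can be copied byte-wise");
    std::memcpy(cursor, &value, sizeof value);
    return cursor + sizeof value;
}

// Hypothetical helper: copies bytes at cursor into value, returns the next read position.
template <typename T>
const char* extract_bytes(const char* cursor, T& value)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "only trivially copyable types can be copied byte-wise");
    std::memcpy(&value, cursor, sizeof value);
    return cursor + sizeof value;
}
```

Because only bytes are copied, the cursor may sit at any offset, including the odd address &arr[1] from the question.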
If you do want to treat those bytes as a uint32_t (if you're implementing a std::vector-like data structure, for example) then you will need to ensure the buffer is properly-aligned, and use placement-new:
std::aligned_storage_t<sizeof(uint32_t), alignof(uint32_t)> buffer[BUFFER_SIZE];
uint32_t foo = 123;
uint32_t* new_uint = new (&buffer[0]) uint32_t(foo);
uint32_t bar = 234;
uint32_t* another_new_uint = new (&buffer[1]) uint32_t(bar);
// repeat as needed
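A variation on the placement-new approach, sketched with an alignas-qualified unsigned char array instead of std::aligned_storage_t (which is deprecated as of C++23). The U32Slots type, its capacity of 4, and the construct member are all hypothetical names chosen for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical fixed-capacity storage for uint32_t objects.
struct U32Slots
{
    static constexpr std::size_t capacity = 4;

    // Raw storage, aligned so that every slot is a valid uint32_t address.
    alignas(std::uint32_t) unsigned char storage[capacity * sizeof(std::uint32_t)];

    // Starts the lifetime of a uint32_t in the given slot.
    // The caller must keep index below capacity.
    std::uint32_t* construct(std::size_t index, std::uint32_t value)
    {
        void* where = storage + index * sizeof(std::uint32_t);
        return new (where) std::uint32_t(value);
    }
};
```

The pointer returned by construct refers to a real uint32_t object, so reading and writing through it raises neither the alignment nor the aliasing problem.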

The safe and standard-compliant way of accessing array of integral type as an array of another unrelated integral type?

Here's what I need to do. I'm sure it's a routine and recognizable coding task for many C++ developers out there:
void processAsUint16(const char* memory, size_t size) {
auto uint16_ptr = (const uint16_t*)memory;
for (size_t i = 0, n = size/sizeof(uint16_t); i < n; ++i) {
std::cout << uint16_ptr[i]; // Some processing of the other unrelated type
}
}
Problem: I'm developing with an IDE that integrates clang static code analysis, and every way of casting I tried, short of memcpy (which I don't want to resort to) is either discouraged or strongly discouraged. For example, reinterpret_cast is simply banned by the CPP Core Guidelines. C-style cast is discouraged. static_cast cannot be used here.
What's the right way of doing this that avoids type aliasing problems and other kinds of undefined behavior?
You use memcpy:
void processAsUint16(const char* memory, size_t size) {
    // Stop before a trailing odd byte so memcpy never reads past the buffer.
    for (size_t i = 0; i + sizeof(uint16_t) <= size; i += sizeof(uint16_t)) {
        uint16_t x;
        memcpy(&x, memory + i, sizeof(x));
        // do something with x
    }
}
uint16_t is trivially copyable, so this is fine.
Or, in C++20, with std::bit_cast (which awkwardly has to go through an array first):
void processAsUint16(const char* memory, size_t size) {
    // Stop before a trailing odd byte so memcpy never reads past the buffer.
    for (size_t i = 0; i + sizeof(uint16_t) <= size; i += sizeof(uint16_t)) {
        alignas(uint16_t) char buf[sizeof(uint16_t)];
        memcpy(buf, memory + i, sizeof(buf));
        auto x = std::bit_cast<uint16_t>(buf);
        // do something with x
    }
}
Practically speaking, compilers will just "do the right thing" if you just reinterpret_cast, even if it's undefined behavior. Perhaps something like std::bless will give us a more direct, non-copying, mechanism of doing this, but until then...
My preference would be to treat the array of char as a sequence of octets in a defined order. This obviously doesn't work if it actually can be either order depending on target architecture, but in practice, a memory buffer like this usually comes from a file or a network connection.
void processAsUint16(const char* memory, size_t size) {
    // i + 1 < size keeps the read of memory[i+1] inside the buffer.
    for (size_t i = 0; i + 1 < size; i += 2) {
        const unsigned char lo = memory[i];
        const unsigned char hi = memory[i+1];
        const uint16_t x = lo + hi*256; // or "lo | hi << 8"
        // do something with x
    }
}
Note that we do not use sizeof(uint16_t) here. memory is a sequence of octets, so even if CHAR_BIT is 16, two chars will be needed to hold a uint16_t.
This can be a little bit cleaner if memory can be declared as unsigned char - no need for the definition of lo and hi.
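A sketch of that cleaner form, assuming the buffer is little-endian (low octet first) and can be passed as unsigned char. The function name decode_le16 and the choice to return a std::vector are assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: decodes pairs of octets as little-endian 16-bit values.
// A trailing odd octet, if any, is ignored.
std::vector<std::uint16_t> decode_le16(const unsigned char* memory, std::size_t size)
{
    std::vector<std::uint16_t> out;
    out.reserve(size / 2);
    for (std::size_t i = 0; i + 1 < size; i += 2) {
        const unsigned lo = memory[i];
        const unsigned hi = memory[i + 1];
        out.push_back(static_cast<std::uint16_t>(lo | (hi << 8)));
    }
    return out;
}
```

The result depends only on the order of the octets in the buffer, not on the endianness of the machine running the code.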

uint32_t pointer to the same location as uint8_t pointer

#include <iostream>
int main(){
uint8_t memory[1024];
memory[0] = 1;
memory[1] = 1;
uint32_t *test = memory;
//is it possible to get a value for *test that would be in this example 257?
}
I want to create a uint32_t pointer to the same address as the uint8_t pointer. Is this possible without using new(address)? I don't want to lose the information at the address. I know pointers are just addresses, and therefore I should be able to just set the uint32_t pointer to the same address.
This code produces an error:
invalid conversion from 'uint8_t*' to 'uint32_t*' in initialization
This would be a violation of the so-called strict aliasing rule, so it cannot be done. Sad, but true.
Use memcpy to copy the data. In many cases compilers will optimize the copy away and generate the same code as they would for the cast, but in a standard-conforming way.
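A minimal sketch of the memcpy approach, next to an explicit little-endian variant for comparison. Both function names are hypothetical, and both assume at least four readable bytes at the given address.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reads four bytes using the machine's own byte order.
std::uint32_t load_u32_native(const std::uint8_t* bytes)
{
    std::uint32_t value;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}

// Hypothetical helper: reads four bytes as little-endian on any machine.
std::uint32_t load_u32_le(const std::uint8_t* bytes)
{
    return  static_cast<std::uint32_t>(bytes[0])
         | (static_cast<std::uint32_t>(bytes[1]) << 8)
         | (static_cast<std::uint32_t>(bytes[2]) << 16)
         | (static_cast<std::uint32_t>(bytes[3]) << 24);
}
```

With memory[0] and memory[1] set to 1 and the next two bytes set to 0 (the question leaves them uninitialized), load_u32_le(memory) is 257 everywhere, while load_u32_native(memory) is 257 only on a little-endian machine.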
As already mentioned you cannot convert uint8_t * to uint32_t * due to strict aliasing rule, you can convert uint32_t * to unsigned char * though:
#include <iostream>
int main(){
uint32_t test[1024/4] = {}; // initialize it!
auto memory = reinterpret_cast<unsigned char *>( test );
memory[0] = 1;
memory[1] = 1;
std::cout << test[0] << std::endl;
}
This is not portable code because of endianness, but at least it does not have UB.
This question completely ignores the concept of endianness. In your example the two bytes have the same value, so swapping them makes no difference; but in cases where the bytes differ, your number will unexpectedly be wrong on a machine with the other byte order.
As such, there's no portable way to use the resulting number.
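You can do that with a union. As mentioned above, you have to be aware of the endianness of the target device, but in most cases it will be little-endian. There is also some controversy about using unions this way (reading a union member other than the one last written is undefined behavior in C++, although compilers such as GCC document it as a supported extension), but FWIW it gets the job done and for some uses it is good enough.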
#include <iostream>
int main(){
union {
uint8_t memory[1024] = {};
uint32_t test[1024/4];
};
memory[0] = 1;
memory[1] = 1;
std::cout << test[0]; // 257
}
uint32_t *test = (uint32_t*) memory;
The uint32_t* type says that the memory pointed to by test should contain a uint32_t.

Typecast char[] into a structure and retrieve the values of the structure?

#include <iostream> // std::cout
using namespace std;
struct mystruct
{
unsigned int a;
unsigned char b;
unsigned long long c;
};
int main ()
{
unsigned char str[1];
unsigned int a,b,c;
str[0]=1; // str[0]=??????
mystruct* obj = (mystruct *)(&(str[0]));
c=obj->c;
a=(unsigned int)obj->a;
b=(unsigned int)obj->b;
cout<<"a="<<a<<"\t b="<<b<<"\t c="<<c<<endl;
}
Is it possible do the above thing? If yes, then:
What should I fill in str[0] so that I get value of a=1,b=257,c=1?
currently I'm getting below output:
a=1 b=0 c=8388449
Unless you are coding for a microcontroller on a compiler with very defined semantics, you shouldn't be doing that. The reason is that the struct could have paddings, the computer could be little or big endian, sizeof(int) is not the same on all computers, and char is not necessarily 8 bits either.
This is besides the fact that your str is too short anyway.
While this is undefined behavior in C, on microcontrollers these things are often well-defined and can be used. One example would be:
unsigned char str[sizeof(struct mystruct)];
struct mystruct* obj = (void *)str;
To know how the contents of str map onto obj, you would need to know exactly how your compiler pads the struct, as well as the size of each member and the endianness of the computer.
Again, unless in very specific locations, this kind of coding is plain wrong.
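Where the bytes really do have to become a structure, copying them into an existing object avoids the pointer cast. The following is only a sketch: from_bytes is a hypothetical helper, and the byte layout it expects is whatever padding, member sizes, and endianness the current compiler and target use.

```cpp
#include <cstddef>
#include <cstring>

struct mystruct
{
    unsigned int a;
    unsigned char b;
    unsigned long long c;
};

// Hypothetical helper: fills out from raw bytes laid out exactly as this
// compiler lays out mystruct. Returns false if the buffer is too short.
bool from_bytes(const unsigned char* raw, std::size_t size, mystruct& out)
{
    if (size < sizeof(mystruct)) {
        return false;  // e.g. the one-byte str from the question
    }
    std::memcpy(&out, raw, sizeof(mystruct));
    return true;
}
```

The size check rejects the one-byte array from the question, and the copy means the source buffer needs no particular alignment.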

Casting Function Pointer to Integer in C++

I have an array of unsigned integers that need to store pointers to data and functions as well as some data. In the device I am working with, the sizeof pointer is the same as sizeof unsigned int. How can I cast pointer to function into unsigned int? I know that this makes the code not portable, but it is micro controller specific. I tried this:
stackPtr[4] = reinterpret_cast<unsigned int>(task_ptr);
but it gives me the error "invalid type conversion".
Casting it to void pointer and then to int is messy.
stackPtr[4] = reinterpret_cast<unsigned int>(static_cast<void *> (task_ptr));
Is there a clean way of doing it?
Edit - task_ptr is function pointer void task_ptr(void)
Love Barmar's answer; it takes my portability shortcoming away. Also, an array of void pointers actually makes more sense than unsigned ints. Thank you Barmar and isaach1000.
EDIT 2: Got it, my compiler is using the large memory model, so it is using 32-bit pointers, not the 16-bit ones I was expecting (small micros with 17K total memory).
A C-style cast can fit an octagonal peg into a trapezoidal hole, so I would say that given your extremely specific target hardware and requirements, I would use that cast, possibly wrapped into a template for greater clarity.
Alternately, the double cast to void* and then int does have the advantage of making the code stand out like a sore thumb so your future maintainers know something's going on and can pay special attention.
EDIT for comment:
It appears your compiler may have a bug. The following code compiles on g++ 4.5:
#include <iostream>
int f()
{
return 0;
}
int main()
{
int value = (int)&f;
std::cout << value << std::endl;
}
EDIT2:
You may also wish to consider using the intptr_t type instead of int. It's an integral type large enough to hold a pointer.
In C++ a pointer can be converted to a value of an integral type large enough to hold it. The conditionally-supported type std::intptr_t is defined such that you can convert a void* to intptr_t and back to get the original value. If void* has a size equal to or larger than function pointers on your platform then you can do the conversion in the following way.
#include <cstdint>
#include <cassert>
void foo() {}
int main() {
void (*a)() = &foo;
std::intptr_t b = reinterpret_cast<std::intptr_t>(a);
void (*c)() = reinterpret_cast<void(*)()>(b);
assert(a==c);
}
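A variation on the example above that fails at compile time if a function pointer does not fit in the integer type, which is the kind of size mismatch described in the question's second edit. The names task_fn, to_word, from_word, and idle_task are hypothetical, and the sketch uses std::uintptr_t, the unsigned counterpart of std::intptr_t, which is likewise an optional type.

```cpp
#include <cstdint>

using task_fn = void (*)();  // hypothetical alias for the task signature

// Refuse to compile on targets where the pointer would be truncated.
static_assert(sizeof(task_fn) <= sizeof(std::uintptr_t),
              "function pointer does not fit in uintptr_t on this target");

// Hypothetical helpers wrapping the two casts in one place.
std::uintptr_t to_word(task_fn f)
{
    return reinterpret_cast<std::uintptr_t>(f);
}

task_fn from_word(std::uintptr_t w)
{
    return reinterpret_cast<task_fn>(w);
}

void idle_task() {}  // hypothetical sample task
```

Storing to_word(&idle_task) in the array and later calling from_word on it should give back the original pointer, since the integer type is large enough to hold it.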
This is ansi compliant:
int MyFunc(void* p)
{
return 1;
}
int main()
{
int arr[2];
int (*foo)(int*);
arr[0] = (int)(MyFunc);
foo = (int (*)(int*))(arr[0]);
arr[1] = (*foo)(NULL);
}