I've been getting warnings from Lint (740 at http://www.gimpel.com/html/pub/msg.txt) telling me not to cast a pointer to a union to a pointer to an unsigned long. I knew I was casting incompatible types, so I was using a reinterpret_cast, and I was surprised that I still got the warning.
Example:
// bar.h
void writeDWordsToHwRegister(unsigned long* ptr, unsigned long size)
{
    // write double word by double word to HW registers
    ...
}
// foo.cpp
#include "bar.h"
struct fooB
{
    ...
};
union A
{
    unsigned long dword1;
    fooB structMember; // Each translation unit has unique content in the union
    ...
};
void foo()
{
A a;
a = ...; // Set value of a
// Lint warning
writeDWordsToHwRegister(reinterpret_cast<unsigned long*> (&a), sizeof(A));
// My current triage, but a bad one: someone (like me) in a future refactoring
// might redefine union A to include a dword0 variable at the beginning and forget
// to change the statement below.
writeDWordsToHwRegister(reinterpret_cast<unsigned long*> (&(a.dword1)), sizeof(A));
}
Leaving aside exactly why I was doing it and how best to solve it (a void* in the interface, with a cast to unsigned long* inside writeDWordsToHwRegister?), the Lint warning's explanation says that on some machines there is a difference between a pointer to char and a pointer to word. Could someone explain how that difference manifests itself, and maybe give examples of processors that show these differences? Are we talking about alignment issues?
Since it's an embedded system, we do use exotic and in-house cores, so if bad things can happen, they probably will.
Generally, differences between pointers refer to the fact that different types have different sizes: if you do ptr += 1, you will get different results depending on whether ptr is a pointer to char or a pointer to word.
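To make that concrete, here is a minimal sketch (standard C++, just illustrating the arithmetic, nothing from the question assumed):
#include <iostream>
int main()
{
    unsigned long words[2] = {0, 0};
    char* cp = reinterpret_cast<char*>(words);
    unsigned long* wp = words;
    ++cp; // advances by exactly 1 byte
    ++wp; // advances by sizeof(unsigned long) bytes (typically 4 or 8)
    std::cout << (cp - reinterpret_cast<char*>(words)) << " byte(s) vs "
              << (reinterpret_cast<char*>(wp) - reinterpret_cast<char*>(words))
              << " byte(s)\n";
    return 0;
}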
The compiler assumes that pointers to As and pointers to longs (which are usually dwords, but might just be words in your case) do not point to the same area of memory. This makes a number of optimizations legal: for example, after writing somewhere through an A*, prior loads through a long* do not need to be refreshed. This is called aliasing (or in this case, the lack thereof). But in your case, it has the effect that the code produced might actually not work as expected.
To make this portable, you first have to copy your data through a char buffer, because char has an exception to the strict aliasing rule: char aliases with everything. So when it sees a char*, the compiler has to assume it can point to anything. For example, you could do this:
char buffer[sizeof(A)];
// char aliases with A
memcpy(buffer, reinterpret_cast<char*>(&a), sizeof(A));
// char also aliases with unsigned long
writeDWordsToHwRegister(reinterpret_cast<unsigned long*>(buffer), sizeof(A));
If you have any more questions, look up the "strict aliasing" rules. It is a pretty well-known issue by now.
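For a concrete picture of the optimization mentioned above, here is a hedged sketch (hypothetical function name) of the kind of code strict aliasing lets the compiler transform:
// Under strict aliasing the compiler may assume l and f never point to
// the same memory, so it can return 1 without re-reading *l, even if a
// caller passes the same address through both parameters.
long set_and_read(long* l, float* f)
{
    *l = 1;
    *f = 2.0f; // assumed not to modify *l
    return *l; // may be constant-folded to 1
}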
I know that on some machines, pointers to char and pointers to word are actually different, as a pointer to char needs extra bits due to the way memory is addressed.
There are some machines (mainly DSPs, but I think old DEC machines did this too) where this is the case.
This means that if you reinterpret_cast something to char on one of these machines, the bit pattern is not necessarily valid.
As a pointer to a union can in theory point to any member of it, a union pointer has to contain whatever is needed to let you successfully use it to point to either a char or a word. Which in turn means that reinterpret_casting it will end up with bits that mean something to the compiler being used as if they were part of a valid address.
For instance, suppose a pointer is 0xfffa, where the 'a' is some magic the compiler uses to help it work out what to do when you say unionptr->charmember (perhaps nothing) and something different when you say unionptr->wordmember (perhaps convert it to 0x3ff before using it). When you reinterpret_cast it to long*, you still have 0xfffa, because reinterpret_cast does nothing to the bit pattern.
Now you have something the compiler thinks is a pointer to long, containing 0xfffa, whereas it should be (say) 0x3ff.
Which is likely to result in a nasty crash.
A char* can be byte-aligned (anything!), whereas a long* generally needs to be aligned to a 4-byte boundary on any modern processor.
On bigger iron, you'll get a crash when you try accessing a long on a misaligned boundary (say, SIGBUS on *nix). However, on some embedded systems you can just quietly get odd results, which makes detection difficult.
I've seen this happen on ARM7, and yes, it was hard to see what was going on.
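A minimal sketch of the alignment-safe pattern (hypothetical helper name): let memcpy handle the potentially misaligned access instead of dereferencing a cast pointer.
#include <cstring>
unsigned long read_ulong_at(const char* buffer, unsigned offset)
{
    // Dereferencing *reinterpret_cast<const unsigned long*>(buffer + offset)
    // may trap (SIGBUS) or silently misbehave on strict-alignment hardware
    // if offset isn't a multiple of alignof(unsigned long).
    unsigned long value;
    std::memcpy(&value, buffer + offset, sizeof value); // alignment-safe
    return value;
}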
I'm not sure why you think a pointer to char is involved - you're casting a pointer to union A to a pointer to long. The best fix would probably be to change:
void writeDWordsToHwRegister(unsigned long* ptr, unsigned long size)
to:
void writeDWordsToHwRegister(const void * ptr, unsigned long size)
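As a sketch of what that might look like (keeping the question's function name; the register address HW_REG and its value are purely hypothetical):
// Hypothetical memory-mapped register address, for illustration only:
volatile unsigned long* const HW_REG =
    reinterpret_cast<volatile unsigned long*>(0x40000000);
void writeDWordsToHwRegister(const void* ptr, unsigned long size)
{
    // The one cast is confined to the place that knows the source is
    // suitably aligned for unsigned long access:
    const unsigned long* p = static_cast<const unsigned long*>(ptr);
    for (unsigned long i = 0; i < size / sizeof(unsigned long); ++i)
        *HW_REG = p[i]; // write double word by double word
}
The call site then needs no cast at all: writeDWordsToHwRegister(&a, sizeof(A));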
Related
I know legacy is always a justification, but I wanted to check out this example from MariaDB and see if I understand it enough to critique what's going on:
static int show_open_tables(THD *, SHOW_VAR *var, char *buff) {
    var->type = SHOW_LONG;
    var->value = buff;
    *((long *)buff) = (long)table_cache_manager.cached_tables();
    return 0;
}
Here they're taking in a char* and writing it to var->value, which is also a char*. Then they force a pointer to long into buff and set the type to SHOW_LONG to indicate it as such.
I'm wondering why they would use a char* for this, though, and not a uintptr_t -- especially when they're forcing pointers to longs and other types into it.
Wasn't the norm pre-uintptr_t to use void* for polymorphism in C++?
There seem to be two questions here, so I've split my answer up.
Using char*
Using a char* is fine. Character types (char, signed char, and unsigned char) are specially treated by the C and C++ standards. The C standard defines the following rules for accessing an object:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type of the object,
a type that is the signed or unsigned type corresponding to the effective type of the object,
a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
a character type.
This effectively means character types are the closest the standards come to defining a 'byte' type (std::byte in C++17 is just defined as enum class byte : unsigned char {}).
However, as per the above rules, casting a char* to a long* and then assigning through it is incorrect (although it generally works in practice). memcpy should be used instead. For example:
long cached_tables = table_cache_manager.cached_tables();
memcpy(buf, &cached_tables, sizeof(cached_tables));
void* would also be a legitimate choice. Whether it is better is a matter of opinion. I would say the clearest option would be to add a type alias for char to convey the intent to use it as a byte type (e.g. typedef char byte_t). Off the top of my head, though, I can think of several examples of prominent libraries which use char as-is as a byte type. For example, the Boost memory-mapped file code gives you a char*, and leveldb uses std::string as a byte buffer type (presumably to take advantage of SSO).
Regarding uintptr_t:
uintptr_t is an optional type defined as an unsigned integer capable of holding a pointer. If you want to store the address of a pointed-to object in an integer, then it is a suitable type to use. It is not a suitable type to use here.
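A minimal sketch of what uintptr_t is for, by contrast (round-tripping an address through an integer):
#include <cstdint>
#include <cassert>
int main()
{
    int object = 42;
    void* p = &object;
    // uintptr_t stores the address itself as an integer value...
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    // ...and converting back yields a pointer equal to the original.
    void* q = reinterpret_cast<void*>(addr);
    assert(p == q);
    return 0;
}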
they're taking in a char* and writing it to var->value, which is also a char*. Then they force a pointer to long into buff and set the type to SHOW_LONG to indicate it as such.
Or something. That code is hideous.
I'm wondering why they would use a char* for this, though, and not a uintptr_t -- especially when they're forcing pointers to longs and other types into it.
Who knows? Who knows what the guy was on when he wrote it? Who cares? That code is hideous, we certainly shouldn't be trying to learn from it.
Wasn't the norm pre-uintptr_t to use void* for polymorphism in C++?
Yes, and it still is. The purpose of uintptr_t is to define an integer type that is big enough to hold a pointer.
I wanted to check out this example from MariaDB and see if I understand it enough to critique what's going on
You might have reservations about doing so, but I certainly don't: that API is just a blatant lie. The way to do it (if you absolutely have to) would (obviously) be:
static int show_open_tables(THD *, SHOW_VAR *var, long *buff) {
    var->type = SHOW_LONG;
    var->value = (char *) buff;
    *buff = (long)table_cache_manager.cached_tables();
    return 0;
}
Then at least it is no longer a ticking time bomb.
Hmmm, OK, maybe (just maybe) that function is used in a dispatch table somewhere and therefore needs (unless you cast it) to have a specific signature. If so, I'm certainly not going to dig through 10,000 lines of code to find out (and anyway, I can't, it's so long it crashes my tablet).
But if anything, that would just make it worse. Now that timebomb has become a stealth bomber. And anyway, I don't believe it's that for a moment. It's just a piece of dangerous nonsense.
I have a function in my C++ application that needs an integer as input. Sadly, this integer is only available in the form of an unsigned char array, which inclines me to do this:
unsigned char c[4] = {'1','2','3','4'};
void myFuncThatBadlyNeedsInts(int i);
// compares some memory value (which is an int) with another one...
myFuncThatBadlyNeedsInts((int)c);
This gives me an error telling me that this is not allowed.
But if I decide to get tricky and do this:
myFuncThatBadlyNeedsInts(*((int*)&c));
Now the program goes ahead and always gives me the result I want. My question is: why is there a difference in the result of the two casts?
Shouldn't they both do the same thing, with the difference that I have two unnecessary pointers in the process?
Help, or guidance to an already existing answer to my question, is much appreciated.
EDIT (since I can't comment): The need for this admittedly silly conversion is inherited from a project which compares a specific memory location (as an int) with a DWORD which is retrieved from an FPGA and comes as an array. The DWORD gets read in the end as one hex number.
I'll try to get permission to change this, and THANK YOU ALL for the quick responses. I really didn't get this part of the program, nor did I understand why it worked like this in the first place. Now I know someone got lucky.
P.S.: Since I'm new here and this is my first question, please let me know what other specifics you might need, or just edit my newbie bad habits away.
When you do myFuncThatBadlyNeedsInts((int)c), the compiler first decays the array c to a pointer to the first element, i.e. &c[0], then casts this pointer to an int and passes that to the function.
When you do *((int*)&c), you take the address of the array (of type unsigned char (*)[4]) and tell the compiler that it's a pointer to an int (which is not correct), and then dereference that (incorrect) int*.
So both calls are actually incorrect. The casting just silences the compiler.
If you want to treat the four bytes of the array as a single 32-bit word, there are ways to do it, but they all break the strict aliasing rule.
The simplest way is very close to what you have now and is done with casting. Using a C-style cast, you cast the pointer that c decays to into a pointer to int and dereference that:
myFuncThatBadlyNeedsInts(*(int*)c);
Note that this is not the same thing as either of your attempts.
The second way is to use a union:
union my_union
{
    char bytes[sizeof(int)];
    int integer;
};
Then copy the data into your union's bytes member, and read out the integer.
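A minimal sketch of that, using the my_union above (and assuming sizeof(int) == 4 to match the question's array). Note that reading the non-active member is well-defined in C but technically undefined in C++, as already mentioned; most compilers support it anyway:
#include <cstring>
int bytes_to_int(const unsigned char c[4])
{
    my_union u;
    std::memcpy(u.bytes, c, sizeof u.bytes); // copy the raw bytes in
    return u.integer;                        // read the int view out
}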
In the first case you are trying to cast a char array to an int - this is obviously meaningless, in that a list of characters is quite different from an int.
In the second case you first take the address of the array; the & operator gives you a pointer to the array itself.
Specifically, the type of &c is unsigned char (*)[4]. It is legal (although dangerous) to cast between pointer types, thus the cast from unsigned char (*)[4] to int * is legal.
Then you dereference the pointer and get the integer that is at this spot, which is probably some nasty (meaningless) number derived from the bytes of the first few characters in the array.
So your second solution doesn't convert the char[] to an int, which is presumably what you want; instead it gives you an integer reinterpretation of the first bytes of the char array.
In the second case you take a pointer to unsigned char and cast it to a pointer to int, so in fact you always use your uchar plus the 3 bytes just after it (in this case the whole array c), because sizeof(int) is 4 (usually, but not always) and sizeof(unsigned char) is only 1. So don't do this unless you like to shoot yourself in the leg.
To be honest, I don't really understand what you are trying to achieve in this example.
I am trying to write C++ code to convert the assembly dq 3FA999999999999Ah into a C++ double. What do I type inside the asm block? I don't know how to get the value out.
int main()
{
    double x;
    asm
    {
        dq 3FA999999999999Ah
        mov x,?????
    }
    std::cout<<x<<std::endl;
    return 0;
}
From the comments it sounds a lot like you want to use a reinterpret cast here. Essentially, what this does is tell the compiler to treat the sequence of bits as if it were of the type it was cast to; it makes no attempt to convert the value.
#include <cstdint> // for uint64_t

uint64_t raw = 0x3FA999999999999A;
double x = reinterpret_cast<double&>(raw);
See this in action here: http://coliru.stacked-crooked.com/a/37aec366eabf1da7
Note that I've used the specific 64-bit integer type here to make sure the bit representation required matches that of the 64-bit double. Also, the cast has to be to double& because the C++ rules forbid the plain cast to double. This is because reinterpret_cast deals with memory, not type conversions; for more details see this question: Why doesn't this reinterpret_cast compile?. Additionally, you need to be sure that the representation of the 64-bit unsigned integer matches up with the bit reinterpretation of the double for this to work properly.
EDIT: Something worth noting is that the compiler warns about this breaking strict aliasing rules. The quick summary is that more than one name now refers to the same place in memory, and the compiler might not be able to tell which variables are changed if the change occurs via the other access path. In general you don't want to ignore this; I'd highly recommend reading the following article on strict aliasing to get to know why it is an issue. So while the intent of the code might be a little less clear, you might find that a better solution is to use memcpy to avoid the aliasing problems:
#include <cstdint>
#include <cstring>
#include <iostream>
int main()
{
    double x;
    const uint64_t raw = 0x3FA999999999999A;
    std::memcpy(&x, &raw, sizeof raw);
    std::cout<<x<<std::endl;
    return 0;
}
See this in action here: http://coliru.stacked-crooked.com/a/5b738874e83e896a
This avoids the aliasing issue because x is now a double with the correct constituent bits, but thanks to the memcpy it is not at the same memory location as the 64-bit int that was used to represent the bit pattern. Because memcpy treats the variables as if they were arrays of char, you still need to make sure you get any endianness considerations correct.
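A small sketch of the endianness point (just dumping the bytes; the order differs between little- and big-endian machines, which matters if the bit pattern came from a file or another system):
#include <cstdint>
#include <cstring>
#include <iostream>
int main()
{
    const std::uint64_t raw = 0x3FA999999999999A;
    unsigned char bytes[sizeof raw];
    std::memcpy(bytes, &raw, sizeof raw);
    // Little-endian machines print 9a first, big-endian print 3f first.
    for (unsigned char b : bytes)
        std::cout << std::hex << static_cast<int>(b) << ' ';
    std::cout << '\n';
    return 0;
}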
Is there a difference between pointer to integer-pointer (int**) and pointer to character-pointer (char**), and any other case of pointer to pointer?
Isn't the memory block size for any pointer the same, so the sub-datatype doesn't play a role here?
Is it just a semantic distinction with no other significance?
Why not just use void**?
Why should we use void** when you want a pointer to a char *? Why should we not use char **?
With char **, you have type safety. If the pointer is correctly initialized and not null, you know that by dereferencing it once you get a valid char * - and by dereferencing that pointer, in turn, you get a char.
Why should you ignore this advantage in type safety, and instead play pointer Russian roulette with void**?
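To make the contrast concrete, a minimal sketch (hypothetical variable names):
#include <iostream>
int main()
{
    char c = 'x';
    char* pc = &c;
    // With char**, both dereferences are checked by the compiler:
    char** ppc = &pc;
    std::cout << **ppc << '\n'; // prints x
    // With void**, every access needs a manual cast, and the compiler
    // cannot catch a wrong one:
    void* pv = pc;
    void** ppv = &pv;
    std::cout << *static_cast<char*>(*ppv) << '\n'; // prints x
    return 0;
}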
The difference is in type safety. Dereferencing a T** yields a T*, and dereferencing that yields a T, all checked by the compiler; a void** has to be manually cast first. And no, pointers are not all 4/8 bytes on 32/64-bit architectures respectively. Member function pointers, for instance, contain offset information too, which needs to be stored in the pointer itself (in the most common implementation).
Most C implementations use the same size and format for all pointers, but this is not required by the C standard.
Some machines do not have byte addressing, so the C implementation implements it by using shifts and other operations. In these implementations, pointers to larger types, such as int, may be normal addresses, but pointers to char would have to have both a machine address and a byte-within-word offset.
Additionally, C makes use of the type information for a variety of purposes, including reducing mistakes made by programmers (possibly giving warnings or errors when you attempt to use a pointer to int where a pointer to float is needed) and optimization. Regarding optimization, consider this example:
void foo(float *array, int *limit)
{
    for (int i = 0; i < *limit; ++i)
        array[i] = <some calculation>;
}
The C standard says a compiler may use the fact that array and limit are pointers to different types to conclude that they do not overlap. Given this rule, the C implementation may evaluate *limit once when the loop starts, because it knows it will not change during the loop. Without this rule, the compiler would have to assume that one of the assignments to array[i] might change *limit, and it would have to load *limit from memory in each iteration.
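For contrast, a hedged sketch of the char case: because character types alias everything, the compiler must assume each store through array could change *limit, and typically reloads *limit on every iteration:
void bar(char *array, int *limit)
{
    for (int i = 0; i < *limit; ++i)
        array[i] = 0; /* each store might alias *limit, forcing a reload */
}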
I know that for the code below, this_is_Illegal is undefined (while some compilers allow it), because union member a is active and then we read from union member b.
The question is: does the code in but_is_this_Legal fix it, or am I doing something scary and even more obscure? Can I use memcpy to achieve the same effect, or is there another undefined behaviour I am invoking there?
EDIT: Maybe the example is not clear enough. All I want to do is activate the other member.
So I am changing float to int. Although it seems dumb, it is closer to the real case. Read BELOW the code.
(Is it for some reason disallowed to copy one union member into another?)
#include <cstring>
#include <iostream>
struct Foo
{
    // anonymous union, so a and b are direct members of Foo
    // (a named union type alone would declare no member)
    union
    {
        int a[4];
        int b[4];
    };
    void this_is_Illegal()
    {
        a[0]=1;
        a[1]=2;
        a[2]=3;
        a[3]=4;
        std::cout<<b[0]<<b[1]<<b[2]<<b[3];
    }
    void but_is_this_Legal()
    {
        a[0]=1;
        a[1]=2;
        a[2]=3;
        a[3]=4;
        b[0]=a[0];
        b[1]=a[1];
        b[2]=a[2];
        b[3]=a[3];
        std::cout<<b[0]<<b[1]<<b[2]<<b[3];
    }
    void this_looks_scary_but_is_it()
    {
        a[0]=1;
        a[1]=2;
        a[2]=3;
        a[3]=4;
        //forget portability for this q, assume sizeof(int)==sizeof(float)
        //maybe memmove works here as well?
        std::memcpy(b, a, sizeof(int)*4);
        std::cout<<b[0]<<b[1]<<b[2]<<b[3];
    }
};
If all of the above does not sound very useful, consider that a is in truth an __m128 unioned with a float[4]. The bit representation is exact and correct, always.
At one point in time, you WILL need to actually use it, and you NEED to have it in main memory as an array of floats.
The "copy instruction" is in truth an _mm_store_ps from the _m128 union member to the float[4] member. Hence the question about the memset - maybe it is the more exact example to what I need...
The second function is perfectly legal (at least with your original float member) - but it doesn't do the same thing, since it will perform an int-to-float conversion rather than leaving the bits unchanged.
To be honest I would just stick with the first one - the behaviour is technically undefined, but I suspect it just does the right thing for you.
The third one switches one form of undefined behaviour for another (once you've written arbitrary bytes into a float, anything could happen). But if you know the bytes really represent a valid floating point value, it's fine.
this_is_Illegal and but_is_this_Legal are pretty much the standard way to use unions ;)
but the memcpy will not work, because &a and &b are at the same address thanks to the union, so the memcpy will do nothing
because &a and &b are at the same address you can do some interesting things with the union - in your case, interpreting a float as an integer is a built-in feature of your union, but automatic conversion can't be triggered, because they are at the same address
you might want to look at __attribute__((packed)) because it helps when declaring protocol structs/unions