Using mmap with /dev/mem - Is it okay to use reinterpret_cast - c++

I'm using mmap with /dev/mem. I've seen examples in C use the following pattern:
#define OFFSET = ...;
int fd = 0;
void* base;
fd = open("/dev/mem", ...);
base = mmap(..., fd, ...);
// Below is line of interest.
*((uint32_t*)(base + OFFSET)) = 23;
First question - What is happening here?
It looks like we are adding an offset value to a void*, then casting it to uint32_t* and then assigning a number to it. Why can't we just declare base as uint32_t*? Why cast it just before assigning it?
Second question - How would I do this in C++?
The following works from bits and pieces on the net. But it's basically me trying whack-a-mole with reinterpret_cast and static_cast and seeing which one gives me the right result and doesn't throw errors or warnings. Also replaced void* with uint8_t* to prevent compiler warning me of arithmetic on void pointer. I don't know why it works or if it's even the right way to do it. Help me not shoot myself in the foot?
#define OFFSET = ...;
int fd = 0;
uint8_t* base;
fd = open("/dev/mem", ...);
base = reinterpret_cast<uint8_t*>(mmap(..., fd, ...));
*(reinterpret_cast<uint32_t*>(base + OFFSET)) = 23;

base + OFFSET is a gcc extension where pointer arithmetic works on void *. "Why can't we just declare base as uint32_t*?" Because the arithmetic would come out wrong. void * arithmetic is in bytes.
There seems to be some confusion as to the meaning of arithmetic in bytes. OFFSET is the number of bytes (not 32 bit integers) from base where the value 23 needs to go; so casting to uint32_t first would move four times as far. This kind of code is typical of code that writes to heterogeneous data-structures directly rather than using a struct. There's pros and cons of declaring a struct vs. writing the accesses out directly. In the ancient days they usually declared a struct, but the pendulum has swung back because of alignment issues. It's easier to ensure the compiler doesn't hose the struct definition by inserting padding by writing it out longhand.
base = reinterpret_cast<uint8_t*>(mmap(..., fd, ...));
*(reinterpret_cast<uint32_t*>(base + OFFSET)) = 23;
This is indeed what you'd have to do to say it in standard C++; though for code like this people tend to use C style casts in C++ (not going to debate it; just pointing it out).

Related

What is the use of intptr_t?

I know it is an integer type that can be cast to/from pointer without loss of data, but why would I ever want to do this? What advantage does having an integer type have over void* for holding the pointer and THE_REAL_TYPE* for pointer arithmetic?
EDIT
The question marked as "already been asked" doesn't answer this. The question there is if using intptr_t as a general replacement for void* is a good idea, and the answers there seem to be "don't use intptr_t", so my question is still valid: What would be a good use case for intptr_t?
The primary reason, you cannot do bitwise operation on a void *, but you can do the same on a intptr_t.
On many occassion, where you need to perform bitwise operation on an address, you can use intptr_t.
However, for bitwise operations, best approach is to use the unsigned counterpart, uintptr_t.
As mentioned in the other answer by #chux, pointer comparison is another important aspect.
Also, FWIW, as per C11 standard, §7.20.1.4,
These types are optional.
There's also a semantic consideration.
A void* is supposed to point to something. Despite modern practicality, a pointer is not a memory address. Okay, it usually/probably/always(!) holds one, but it's not a number. It's a pointer. It refers to a thing.
A intptr_t does not. It's an integer value, that is safe to convert to/from a pointer so you can use it for antique APIs, packing it into a pthread function argument, things like that.
That's why you can do more numbery and bitty things on an intptr_t than you can on a void*, and why you should be self-documenting by using the proper type for the job.
Ultimately, almost everything could be an integer (remember, your computer works on numbers!). Pointers could have been integers. But they're not. They're pointers, because they are meant for different use. And, theoretically, they could be something other than numbers.
The uintptr_t type is very useful when writing memory management code. That kind of code wants to talk to its clients in terms of generic pointers (void *), but internally do all kinds of arithmetic on addresses.
You can do some of the same things by operating in terms of char *, but not everything, and the result looks like pre-Ansi C.
Not all memory management code uses uintptr_t - as an example, the BSD kernel code defines a vm_offset_t with similar properties. But if you are writing e.g. a debug malloc package, why invent your own type?
It's also helpful when you have %p available in your printf, and are writing code that needs to print pointer sized integral variables in hex on a variety of architectures.
I find intptr_t rather less useful, except possibly as a way station when casting, to avoid the dread warning about changing signedness and integer size in the same cast. (Writing portable code that passes -Wall -Werror on all relevant architectures can be a bit of a struggle.)
What is the use of intptr_t?
Example use: order comparing.
Comparing pointers for equality is not a problem.
Other compare operations like >, <= may be UB. C11dr §6.5.8/5 Relational operators.
So convert to intptr_t first.
[Edit] New example: Sort an array of pointers by pointer value.
int ptr_cmp(const void *a, const void *b) {
intptr_t ia = (intptr) (*((void **) a));
intptr_t ib = (intptr) (*((void **) b));
return (ia > ib) - (ia < ib);
}
void *a[N];
...
qsort(a, sizeof a/sizeof a[0], sizeof a[0], ptr_cmp);
[Former example]
Example use: Test if a pointer is of an array of pointers.
#define N 10
char special[N][1];
// UB as testing order of pointer, not of the same array, is UB.
int test_special1(char *candidate) {
return (candidate >= special[0]) && (candidate <= special[N-1]);
}
// OK - integer compare
int test_special2(char *candidate) {
intptr_t ca = (intptr_t) candidate;
intptr_t mn = (intptr_t) special[0];
intptr_t mx = (intptr_t) special[N-1];
return (ca >= mn) && (ca <= mx);
}
As commented by #M.M, the above code may not work as intended. But at least it is not UB. - just non-portably functionality. I was hoping to use this to solve this problem.
(u)intptr_t is used when you want to do arithmetic on pointers, specifically bitwise operations. But as others said, you'll almost always want to use uintptr_t because bitwise operations are better done in unsigned. However if you need to do an arithmetic right shift then you must use intptr_t1. It's usually used for storing data in the pointer, usually called tagged pointer
In x86-64 you can use the high 16/7 bits of the address for data, but you must do sign extension manually to make the pointer canonical because it doesn't have a flag for ignoring the high bits like in ARM2. So for example if you have char* tagged_address then you'll need to do this before dereferencing it
char* pointer = (char*)((intptr_t)tagged_address << 16 >> 16);
The 32-bit Chrome V8 engine uses smi (small integer) optimization where the low bits denote the type
|----- 32 bits -----|
Pointer: |_____address_____w1| # Address to object, w = weak pointer
Smi: |___int31_value____0| # Small integer
So when the pointer's least significant bit is 0 then it'll be right shifted to retrieve the original 31-bit signed int
int v = (intptr_t)address >> 1;
For more information read
Using the extra 16 bits in 64-bit pointers
Pointer magic for efficient dynamic value representations
Another usage is when you pass a signed integer as void* which is usually done in simple callback functions or threads
void* my_thread(void *arg)
{
intptr_t val = (intptr_t)arg;
// Do something
}
int main()
{
pthread_t thread1;
intptr_t some_val = -2;
int r = pthread_create(&thread1, NULL, my_thread, (void*)some_val);
}
1 When the implementation does arithmetic shift on signed types of course
2 Very new x86-64 CPUs may have UAI/LAM support for that

How to cast char array to int at non-aligned position?

Is there a way in C/C++ to cast a char array to an int at any position?
I tried the following, bit it automatically aligns to the nearest 32 bits (on a 32 bit architecture) if I try to use pointer arithmetic with non-const offsets:
unsigned char data[8];
data[0] = 0; data[1] = 1; ... data[7] = 7;
int32_t p = 3;
int32_t d1 = *((int*)(data+3)); // = 0x03040506 CORRECT
int32_t d2 = *((int*)(data+p)); // = 0x00010203 WRONG
Update:
As stated in the comments the input comes in tuples of 3 and I cannot
change that.
I want to convert 3 values to an int for further
processing and this conversion should be as fast as possible.
The
solution does not have to be cross platform. I am working with a very
specific compiler and processor, so it can be assumed that it is a 32
bit architecture with big endian.
The lowest byte of the result does not matter to me (see above).
My main questions at the moment are: Why has d1 the correct value but d2 does not? Is this also true for other compilers? Can this behavior be changed?
No you can't do that in a portable way.
The behaviour encountered when attempting a cast from char* to int* is undefined in both C and C++ (possibly for the very reasons that you've spotted: ints are possibly aligned on 4 byte boundaries and data is, of course, contiguous.)
(The fact that data+3 works but data+p doesn't is possibly due to to compile time vs. runtime evaluation.)
Also note that the signed-ness of char is not specified in either C or C++ so you should use signed char or unsigned char if you're writing code like this.
Your best bet is to use bitwise shift operators (>> and <<) and logical | and & to absorb char values into an int. Also consider using int32_tin case you build to targets with 16 or 64 bit ints.
There is no way, converting a pointer to a wrongly aligned one is undefined.
You can use memcpy to copy the char array into an int32_t.
int32_t d = 0;
memcpy(&d, data+3, 4); // assuming sizeof(int) is 4
Most compilers have built-in functions for memcpy with a constant size argument, so it's likely that this won't produce any runtime overhead.
Even though a cast like you've shown is allowed for correctly aligned pointers, dereferencing such a pointer is a violation of strict aliasing. An object with an effective type of char[] must not be accessed through an lvalue of type int.
In general, type-punning is endianness-dependent, and converting a char array representing RGB colours is probably easier to do in an endianness-agnostic way, something like
int32_t d = (int32_t)data[2] << 16 | (int32_t)data[1] << 8 | data[0];

Casting/dereferencing char pointers to a double array

Is there anything wrong with the casting a double pointer to a char pointer? Goal in the following code is to change the 1 element in three different ways.
double vec1[100];
double *vp = vec1;
char *yp = (char*) vp;
vp++;
vec1[1] = 19.0;
*vp = 12.0;
*((double*) (yp + (1*sizeof (vec1[0])))) = 34.0;
Casts of this type fall into the category of "OK if you know what you're doing but dangerous if you don't".
For example, in this case you already know the pointer value of "yp" (it was pointing to a double) so it is technically safe to increase its value by the size of a double and re-cast back to a double*.
A counter-example: suppose you didn't know where the char* came from...say, it was given to you as a function parameter. Now, your cast would be a big problem: since char* is technically 1-byte-aligned and a double is usually 8-byte-aligned, you can't be sure if you were given an 8-byte-aligned address. If it's aligned, your arithmetic would produce a valid double*; if not, it would crash when dereferenced.
This is just one example of how casts can go wrong. What you're doing (at first glance) looks like it will work but in general you really have to pay attention when you cast things.
With newer INTEL processors the main problem you can run into is alignment. Say you were to write something like this:
*((double*) (yp + 4)) = 34.0;
Then you are likely to have a runtime error because a double should be aligned on 8 bytes. This was also true on processors such as 68k, or MIPS.
This is similar to having a structure and doing casts on that structure. You are not unlikely to break things.
In most cases, if you can avoid such, your code will be a lot stronger. Personally, I do not even use such casts when reading a file. Instead, I get the data from the file and put it in a structure as required. Say I read 4 bytes in a buffer to convert to an integer, I'd write something like this:
unsigned char buf[4];
...
fread(buf, 1, 4, f);
my_struct.integer = buf[0] | (buf[1] << 8) | (buf[2] << 16) | (buf[3] << 24);
Now I did not do an ugly cast and I could control the endianess of the integer in the file whatever the endian of the processor you are running with.

Casting long int as a struct pointer

I know this is a rather noobish question, but no amount of googling or permutations of code seem to work.
I have a structure which is defined like this.
typedef struct
{
int rate;
int duration;
} DummyStructure;
Now, i try to use code similar to this.
//
DummyStructure* structure;
DummyStructure* structure2;
long int point;
//
structure = (DummyStructure*)malloc(sizeof(DummyStructure));
structure->rate = 19;
structure->duration = 92;
point = (long int)&structure;
// now i'd like to set structure2 to the same memory location as 1.
// point is the 8-byte int location (i assume?) of structure.
// so naturally i'd assume that by casting as a DummyStructure pointer
// that long int would act as a pointer to that 1.
// It doesn't.
structure2 = (DummyStructure*)point;
I stress that i've tried every permutation of ** and * that is possible. I just don't get it. Either it doesn't compile, or it does, and when it does i end up with seemingly random numbers for the fields contained in structure2. I assume that somehow i'm winding up with an incorrect memory location, but how else can you get it except from using the &?
I have the memory location, right? How do i set the structure to that location?
EDIT; I forgot to mention (and subsequent answers have asked) but i'm intending to use this to wrap libvorbis for jni. Using jni means that i can't pass-back any of the structs that libvorbis does, but it requires them for its core functions. Therefore my wrapper is going to use vorbis directly to make the structs, and i pass back to java the pointer to them so that when i need to fill the buffer with more sound, i can simply re-reference the struct objects from the integer value of the pointer.
Why are you trying to cast pointers to integers and back? Is it just to learn, to figure something out, to work around some (untold) restriction, or what? It's a rather strange thing to be doing in a "plain" program such as this, as there is no point.
One likely cause of your problems is that there's no guarantee that a pointer will even fit in a long int. You can check by adding this code:
printf("a regular pointer is %u bytes, long int is %u",
(unsigned int) sizeof structure, (unsigned int) sizeof point);
If the numbers printed are different, that's probably the largest cause of your problems.
If you're using C99, you should #include <stdint.h> and then use the intptr_t type instead of unsigned long to hold a pointer, in any case.
structure is already a pointer, so you don't have to take the address there:
long int point = reinterpret_cast<long int>(structure);
DummyStructure* structure2 = reinterpret_cast<DummyStructure*>(point);
structure is already a pointer. You just want to do point = (long int) structure; (although, realistically, why a long int is involved at all, I don't know. It's a lot easier to just do structure2=structure; which works fine since structure and structure2 are both pointers.)
When you do &structure you're getting the memory location where the pointer itself is stored, which is why it isn't the correct value. You really probably don't want to ever use &structure unless it's being passed into a function which is going to change which DummyStructure structure points to.
Others have answered your question, but I'd like to make a more general comment. You mention JNI; in this case, you don't want long int, but jlong (which will be a typedef to either long int or long long int, depending on the machine. The problem is that long will have a different size, depending on the machine, and will map to a different Java type. Of course, you're counting on the fact that jlong will be big enough to hold a pointer, but since jlong is 64 bits, this seems like a safe bet for the immediate future (and beyond—I don't see a time coming where 64 bits of addressing doesn't suffice).
Also, I would suggest you borrow an idea from Swig, and avoid the subtleties of pointer to integral conversions, by using something like the following:
jlong forJNI = 0;
*reinterpret_cast<DummyStructure*>( &forJNI ) = structure;
// ...
structure2 = *reinterpret_cast<DummyStructure*>( &forJNI );
This is ugly, but it is guaranteed to work (with one caveat) for all
systems where sizeof(DummyStructure*) <= 64.
Just be sure to compile with strict aliasing turned off. (You have to
do this anytime you cast between pointers and ints. IMHO, you shouldn't
have to in cases where the casts are visible to the compiler, but some
compiler writers prefer breaking code intentionally, even when the
intent is clear.)
Long ints aren't the same as pointers. Why don't you just do:
DummyStructure** point;
structure = malloc(sizeof(DummyStructure));
structure->rate = 19;
structure->duration = 92;
point = &structure;
structure2 = *point;
The problem is probably a combination of the fact that 1) you don't dereference point. structure2 is a pointer to structure which is itself a pointer. You'd have to do:
structure2 = *((DummyStructure*)point);
But on top of that is the fact that long ints aren't the same as pointers. There's probably also a signedness issue here.
point = (long int)&structure;
This takes the address of structure which is a DummyStructure* and assign it to point. So point should be a double pointer (pointer to pointer). And when you assign structure2, it should be properly type casted.
typedef struct
{
int rate;
int duration;
} DummyStructure;
DummyStructure* structure;
DummyStructure* structure2;
long int **point;
structure = (DummyStructure*)malloc(sizeof(DummyStructure));
structure->rate = 19;
structure->duration = 92;
point = (long int **)&structure;
structure2 = (DummyStructure*)*point;
If your intention is to make structure2 point to the same memory location as structure, why don't you directly assign it rather than having an intermediate long int **.
The bug is that point is the address of structure, which is itself a pointer to a DummyStructure. In order for structure2 to point to the same thing as structure, you need to dereference point. Ignoring for a second all length, signedness, and similar issues,
structure2 = *(DummyStructure**)point;
would fix your code. But why not just:
structure2 = structure;
If you really want to hold a pointer in something generic, hold it in a void*. At least that's the right size.

C++: how to cast 2 bytes in an array to an unsigned short

I have been working on a legacy C++ application and am definitely outside of my comfort-zone (a good thing). I was wondering if anyone out there would be so kind as to give me a few pointers (pun intended).
I need to cast 2 bytes in an unsigned char array to an unsigned short. The bytes are consecutive.
For an example of what I am trying to do:
I receive a string from a socket and place it in an unsigned char array. I can ignore the first byte and then the next 2 bytes should be converted to an unsigned char. This will be on windows only so there are no Big/Little Endian issues (that I am aware of).
Here is what I have now (not working obviously):
//packetBuffer is an unsigned char array containing the string "123456789" for testing
//I need to convert bytes 2 and 3 into the short, 2 being the most significant byte
//so I would expect to get 515 (2*256 + 3) instead all the code I have tried gives me
//either errors or 2 (only converting one byte
unsigned short myShort;
myShort = static_cast<unsigned_short>(packetBuffer[1])
Well, you are widening the char into a short value. What you want is to interpret two bytes as an short. static_cast cannot cast from unsigned char* to unsigned short*. You have to cast to void*, then to unsigned short*:
unsigned short *p = static_cast<unsigned short*>(static_cast<void*>(&packetBuffer[1]));
Now, you can dereference p and get the short value. But the problem with this approach is that you cast from unsigned char*, to void* and then to some different type. The Standard doesn't guarantee the address remains the same (and in addition, dereferencing that pointer would be undefined behavior). A better approach is to use bit-shifting, which will always work:
unsigned short p = (packetBuffer[1] << 8) | packetBuffer[2];
This is probably well below what you care about, but keep in mind that you could easily get an unaligned access doing this. x86 is forgiving and the abort that the unaligned access causes will be caught internally and will end up with a copy and return of the value so your app won't know any different (though it's significantly slower than an aligned access). If, however, this code will run on a non-x86 (you don't mention the target platform, so I'm assuming x86 desktop Windows), then doing this will cause a processor data abort and you'll have to manually copy the data to an aligned address before trying to cast it.
In short, if you're going to be doing this access a lot, you might look at making adjustments to the code so as not to have unaligned reads and you'll see a perfromance benefit.
unsigned short myShort = *(unsigned short *)&packetBuffer[1];
The bit shift above has a bug:
unsigned short p = (packetBuffer[1] << 8) | packetBuffer[2];
if packetBuffer is in bytes (8 bits wide) then the above shift can and will turn packetBuffer into a zero, leaving you with only packetBuffer[2];
Despite that this is still preferred to pointers. To avoid the above problem, I waste a few lines of code (other than quite-literal-zero-optimization) it results in the same machine code:
unsigned short p;
p = packetBuffer[1]; p <<= 8; p |= packetBuffer[2];
Or to save some clock cycles and not shift the bits off the end:
unsigned short p;
p = (((unsigned short)packetBuffer[1])<<8) | packetBuffer[2];
You have to be careful with pointers, the optimizer will bite you, as well as memory alignments and a long list of other problems. Yes, done right it is faster, done wrong the bug can linger for a long time and strike when least desired.
Say you were lazy and wanted to do some 16 bit math on an 8 bit array. (little endian)
unsigned short *s;
unsigned char b[10];
s=(unsigned short *)&b[0];
if(b[0]&7)
{
*s = *s+8;
*s &= ~7;
}
do_something_With(b);
*s=*s+8;
do_something_With(b);
*s=*s+8;
do_something_With(b);
There is no guarantee that a perfectly bug free compiler will create the code you expect. The byte array b sent to the do_something_with() function may never get modified by the *s operations. Nothing in the code above says that it should. If you don't optimize your code then you may never see this problem (until someone does optimize or changes compilers or compiler versions). If you use a debugger you may never see this problem (until it is too late).
The compiler doesn't see the connection between s and b, they are two completely separate items. The optimizer may choose not to write *s back to memory because it sees that *s has a number of operations so it can keep that value in a register and only save it to memory at the end (if ever).
There are three basic ways to fix the pointer problem above:
Declare s as volatile.
Use a union.
Use a function or functions whenever changing types.
You should not cast a unsigned char pointer into an unsigned short pointer (for that matter cast from a pointer of smaller data type to a larger data type). This is because it is assumed that the address will be aligned correctly. A better approach is to shift the bytes into a real unsigned short object, or memcpy to a unsigned short array.
No doubt, you can adjust the compiler settings to get around this limitation, but this is a very subtle thing that will break in the future if the code gets passed around and reused.
Maybe this is a very late solution but i just want to share with you. When you want to convert primitives or other types you can use union. See below:
union CharToStruct {
char charArray[2];
unsigned short value;
};
short toShort(char* value){
CharToStruct cs;
cs.charArray[0] = value[1]; // most significant bit of short is not first bit of char array
cs.charArray[1] = value[0];
return cs.value;
}
When you create an array with below hex values and call toShort function, you will get a short value with 3.
char array[2];
array[0] = 0x00;
array[1] = 0x03;
short i = toShort(array);
cout << i << endl; // or printf("%h", i);
static cast has a different syntax, plus you need to work with pointers, what you want to do is:
unsigned short *myShort = static_cast<unsigned short*>(&packetBuffer[1]);
Did nobody see the input was a string!
/* If it is a string as explicitly stated in the question.
*/
int byte1 = packetBuffer[1] - '0'; // convert 1st byte from char to number.
int byte2 = packetBuffer[2] - '0';
unsigned short result = (byte1 * 256) + byte2;
/* Alternatively if is an array of bytes.
*/
int byte1 = packetBuffer[1];
int byte2 = packetBuffer[2];
unsigned short result = (byte1 * 256) + byte2;
This also avoids the problems with alignment that most of the other solutions may have on certain platforms. Note A short is at least two bytes. Most systems will give you a memory error if you try and de-reference a short pointer that is not 2 byte aligned (or whatever the sizeof(short) on your system is)!
char packetBuffer[] = {1, 2, 3};
unsigned short myShort = * reinterpret_cast<unsigned short*>(&packetBuffer[1]);
I (had to) do this all the time. big endian is an obvious problem. What really will get you is incorrect data when the machine dislike misaligned reads! (and write).
you may want to write a test cast and an assert to see if it reads properly. So when ran on a big endian machine or more importantly a machine that dislikes misaligned reads an assert error will occur instead of a weird hard to trace 'bug' ;)
On windows you can use:
unsigned short i = MAKEWORD(lowbyte,hibyte);
I realize this is an old thread, and I can't say that I tried every suggestion made here. I'm just making my self comfortable with mfc, and I was looking for a way to convert a uint to two bytes, and back again at the other end of a socket.
There are alot of bit shifting examples you can find on the net, but none of them seemed to actually work. Alot of the examples seem overly complicated; I mean we're just talking about grabbing 2 bytes out of a uint, sending them over the wire, and plugging them back into a uint at the other end, right?
This is the solution I finally came up with:
class ByteConverter
{
public:
static void uIntToBytes(unsigned int theUint, char* bytes)
{
unsigned int tInt = theUint;
void *uintConverter = &tInt;
char *theBytes = (char*)uintConverter;
bytes[0] = theBytes[0];
bytes[1] = theBytes[1];
}
static unsigned int bytesToUint(char *bytes)
{
unsigned theUint = 0;
void *uintConverter = &theUint;
char *thebytes = (char*)uintConverter;
thebytes[0] = bytes[0];
thebytes[1] = bytes[1];
return theUint;
}
};
Used like this:
unsigned int theUint;
char bytes[2];
CString msg;
ByteConverter::uIntToBytes(65000,bytes);
theUint = ByteConverter::bytesToUint(bytes);
msg.Format(_T("theUint = %d"), theUint);
AfxMessageBox(msg, MB_ICONINFORMATION | MB_OK);
Hope this helps someone out.