Pointer arithmetic and portability - c++

I'm writing an application and I had to do some pointer arithmetic. However, this application will be running on different architectures! I wasn't really sure whether this would be problematic, but after reading this article I thought I had to change it.
Here was my original code that I didn't like much:
class Frame{
    /* ... */
protected:
    const u_char* const m_pLayerHeader; // Where header of this layer starts
    int m_iHeaderLength;                // Length of the header of this layer
    int m_iFrameLength;                 // Header + payloads length
};
/**
 * Get the pointer to the payload of the current layer
 * @return A pointer to the payload of the current layer
 */
const u_char* Frame::getPayload() const
{
    // FIXME : Pointer arithmetic, portability!
    return m_pLayerHeader + m_iHeaderLength;
}
Pretty bad, isn't it? Adding an int value to a u_char pointer! But then I changed it to this:
const u_char* Frame::getPayload() const
{
    return &m_pLayerHeader[m_iHeaderLength];
}
I think now the compiler is able to tell how far to jump! Right? Is the [] operation on an array considered pointer arithmetic? Does it fix the portability problem?

p + i and &p[i] are synonyms when p is a pointer and i a value of integral type. So much so that you can even write &i[p] and it's still valid (just as you can write i + p).
The portability issue in the example you link was coming from sizeof(int) varying across platforms. Your code is just fine, assuming m_iHeaderLength is the number of u_chars you want to skip.
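A tiny standalone sketch (names made up for illustration) showing that all three spellings denote the same address:
#include <cassert>

int main()
{
    const unsigned char buf[16] = {};   // stand-in for a frame buffer
    const unsigned char* p = buf;
    int i = 4;                          // stand-in for m_iHeaderLength

    assert(p + i == &p[i]);             // same address
    assert(p + i == &i[p]);             // still the same address
}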

In your code you are advancing m_pLayerHeader by m_iHeaderLength u_chars. As long as whatever wrote the data you are pointing into has the same size for u_char, and m_iHeaderLength is the number of u_chars in the header area, you are safe.
But if m_iHeaderLength really refers to bytes, and not u_chars, then you may have a problem if m_iHeaderLength is supposed to advance the pointer past types other than char.
Say you are sending data from a 32-bit system to a 16-bit system, and your header area is defined like this:
struct Header {
    int something;
    int somethingElse;
};
Assume that is only part of the total message defined by the struct Frame.
On the 32-bit machine you write the data out to a port that the 16-bit machine will read from.
port->write(myPacket, sizeof(Frame));
On the 16-bit machine you have the same Header definition, and try to read the information.
port->read(packetBuffer, sizeof(Frame));
You are already in trouble, because you've asked to read less data than the sender wrote: every int is half the size on the reading side. The size of int on the 16-bit machine doing the reading is two, so its header is four bytes, but the header was eight bytes on the sending machine, two ints of four bytes each.
Now you attempt to advance your pointer
m_iHeaderLength = sizeof(Header);
...
packetBuffer += m_iHeaderLength;
packetBuffer will still be pointing into the data that was the header of the frame sent by the originator, not at the start of the payload.
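If you control both ends, one common way to catch this class of problem at compile time (just a sketch, not part of the code above) is to describe the on-wire header with fixed-width types and assert its size:
#include <cstdint>

// Hypothetical wire header: fixed-width fields so every platform agrees on the layout.
struct WireHeader {
    std::int32_t something;
    std::int32_t somethingElse;
};

// Refuses to compile on any platform where the header is not exactly 8 bytes.
static_assert(sizeof(WireHeader) == 8, "unexpected WireHeader layout");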

If there is a portability problem, then no, that wouldn't fix it. m_pLayerHeader + m_iHeaderLength and &m_pLayerHeader[m_iHeaderLength] are completely equivalent (in this case).

Related

What does (reinterpret_cast<char*>(this), sizeof(*this)) mean?

I am a beginner and doing my first project. I cannot understand this line:
(reinterpret_cast<char*>(this), sizeof(*this))
I assume this is in the context of passing these two values as arguments to a function, certainly in the context of a class. Maybe you're just seeing this all around and wondering why it keeps coming up...
This is a relatively common pattern when serializing an instance of a class. Let's break it down into the two parts (aka "arguments").
reinterpret_cast<char*>(this)
reinterpret_cast<A>(B) means whatever B is, actually treat it as A
char* is a pointer to a char
char is basically just a byte
this is a "pointer" to the current class, of whatever type.
A "pointer" to something literally is the location in memory of the first byte that represents that something
So we're getting a pointer to the first byte, in memory, that represents this's class.
sizeof(*this)
sizeof(A) gets the number of bytes it takes to hold whatever type A is
*this dereferences the pointer, giving the object it points to
this (again) is a "pointer" to the current object, of whatever class type.
So we get the number of bytes, in memory, that are used to represent this object.
Why?
Serialization
There are lots of APIs in C/C++ that need to work with a block of bytes, in particular when interfacing with things outside the pure software world or when storing or transmitting blocks of data; this very often requires a way to "serialize" the data in question.
The location and the length/size (the arguments to the function) are the minimum bits of information needed to describe a block of memory, aka a block of bytes. You'll see this kind of thing when you're serializing a class.
Example class that might use this
These days it's common to need to send "pixel" data out over a serial line to control "addressable" LEDs. It might be helpful to do something like this:
// Some function that sends the raw bytes on the wire to the WS2812 pixels
extern void sendBytes(char const * bytes, size_t num);

struct Pixel {
    // WS2812 uses GRB order instead of RGB :(
    uint8_t g;
    uint8_t r;
    uint8_t b;

    void display() {
        sendBytes(reinterpret_cast<char*>(this), sizeof(*this));
    }

    // Other functions
};
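Usage might then look something like this (assuming the sendBytes function above is implemented somewhere):
Pixel p{0x10, 0xFF, 0x20};  // g, r, b
p.display();                // sends the 3 raw bytes of p over the wire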

Why reserve memory in the structure?

I often see structures in code that have reserved memory at the end.
struct STAT_10K4
{
    int32_t npos;   // position number
    ...
    float Plts;
    float Pxts;
    float Plto[NUM];
    uint32_t reserv[(NUM * 3) % 2 + 1];
};
Why do they do this?
Why are some of the reserve values dependent on constants?
What can happen if you do not make such reserves? Or make a mistake in their size?
This is a form of manual padding of a class to make its size a multiple of some number. In your case:
uint32_t reserv [(NUM * 3)% 2 + 1];
NUM * 3 % 2 is actually nonsensical, as it is equivalent to NUM % 2 (not considering overflow). So if NUM is odd, we pad the struct with one additional uint32_t, on top of the +1 additional one. This padding means that STAT_10K4's size is always a multiple of 8 bytes.
You will have to consult the documentation of your software to see why exactly this is done. Perhaps padding this struct with up to 8 bytes makes some algorithm easier to implement. Or maybe it has some perceived performance benefit. But this is pure speculation.
Typically, the compiler will pad your structs to 64-bit boundaries if you use any 64-bit types, so you don't need to do this manually.
Note: This answer is specific to mainstream compilers and x86. Obviously this does not apply to compiling for TI-calculators with 20-bit char & co.
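If the goal really is "size must be a multiple of 8 bytes", you can state that intent directly with a compile-time check. A minimal sketch (the hidden fields from the original struct are omitted and NUM is arbitrary here):
#include <cstdint>

constexpr int NUM = 5;  // arbitrary value for illustration

struct STAT_10K4_demo
{
    std::int32_t npos;
    float Plts;
    float Pxts;
    float Plto[NUM];
    std::uint32_t reserv[(NUM * 3) % 2 + 1];
};

// Documents (and enforces) what the manual padding is trying to achieve.
static_assert(sizeof(STAT_10K4_demo) % 8 == 0, "size should be a multiple of 8 bytes");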
This would typically be to support variable-length records. A couple of ways this could be used will be:
1 If the maximum number of records is known, then a simple structure definition can accommodate all cases.
2 In many protocols there is a "header-data" idiom. The header will be a fixed size but the data variable. The data will be received as a "blob". Thus the structure of the header can be declared and accessed by a pointer to the blob, and the data will follow on from that. For example:
typedef struct
{
    uint32_t messageId;
    uint32_t dataType;
    uint32_t dataLenBytes;
    uint8_t  data[MAX_PAYLOAD];
} tsMessageFormat;
The data is received in a blob, so a void* ptr, size_t len.
The buffer pointer is then cast so the message can be read as follows:
tsMessageFormat* pMessage = (tsMessageFormat*) ptr;
for (int i = 0; i < pMessage->dataLenBytes; i++)
{
    // do something with pMessage->data[i];
}
In some languages the "data" could be specified as being an empty record, but C++ does not allow this. Sometimes you will see the "data" omitted and you have to perform pointer arithmetic to access the data.
The alternative to this would be to use a builder pattern and/or streams.
Windows uses this pattern a lot; many structures have a cbSize field which allows additional data to be conveyed beyond the structure. The structure accommodates most cases, but having cbSize allows additional data to be provided if necessary.
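A hedged sketch of the cbSize idea (hypothetical structures, not actual Windows API definitions): the field is always first, so the receiver can tell how much of the structure the caller actually filled in.
#include <cstdint>
#include <cstring>

struct SettingsV1 {
    std::uint32_t cbSize;   // caller sets this to sizeof the struct it was compiled against
    std::uint32_t flags;
};

struct SettingsV2 {
    std::uint32_t cbSize;
    std::uint32_t flags;
    std::uint32_t extra;    // field added in a later version
};

void consume(const void* p)
{
    std::uint32_t cb;
    std::memcpy(&cb, p, sizeof cb);      // cbSize is always the first member
    if (cb >= sizeof(SettingsV2)) {
        // the V2 fields are present and may be read
    } else {
        // only the V1 fields are present
    }
}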

I2C byte write function, how does this code work? I can't understand it completely

Can someone explain to me how these lines work:
template <class T>
...const T& value)...
.
.
.
const uint8_t* p = (const uint8_t*)(const void*)&value;
in this code (I2C byte write for an EEPROM)?
template <class T>
uint16_t writeObjectSimple(uint8_t i2cAddr, uint16_t addr, const T& value){
    const uint8_t* p = (const uint8_t*)(const void*)&value;
    uint16_t i;
    for (i = 0; i < sizeof(value); i++){
        Wire.beginTransmission(i2cAddr);
        Wire.write((uint16_t)(addr >> 8));   // MSB
        Wire.write((uint16_t)(addr & 0xFF)); // LSB
        Wire.write(*p++);
        Wire.endTransmission();
        addr++;
        delay(5); // max time for writing in 24LC256
    }
    return i;
}
template <class T>
uint16_t readObjectSimple(uint8_t i2cAddr, uint16_t addr, T& value){
    uint8_t* p = (uint8_t*)(void*)&value;
    uint8_t objSize = sizeof(value);
    uint16_t i;
    for (i = 0; i < objSize; i++){
        Wire.beginTransmission(i2cAddr);
        Wire.write((uint16_t)(addr >> 8));   // MSB
        Wire.write((uint16_t)(addr & 0xFF)); // LSB
        Wire.endTransmission();
        Wire.requestFrom(i2cAddr, (uint8_t)1);
        if(Wire.available()){
            *p++ = Wire.read();
        }
        addr++;
    }
    return i;
}
I think these lines work like pointers?
I can't understand how the code correctly stores each type of data when I do this:
struct data{
    uint16_t yr;
    uint8_t mont;
    uint8_t dy;
    uint8_t hr;
    uint8_t mn;
    uint8_t ss;
};
.
.
.
data myString;
writeObjectSimple(0x50,0,myString);
And then recover the values correctly using
data myStringRead;
readObjectSimple(0x50,0,myStringRead)
Does the I2C byte write function detect some special character between each data type so it can store everything in the correct place?
Thanks
First I have to state that this code has been written by a person not fully familiar with the differences between how C++ and C deal with pointer types. My impression is that this person has a strong C background and was simply trying to keep a C++ compiler from throwing warnings.
Let's break down what this line of code does
const uint8_t* p = (const uint8_t*)(const void*)&value;
The intent here is to take a buffer of an arbitrary type – which we don't even know here, because it's a template type – and treat it as if it were a buffer of unsigned 8-bit integers. The reason for that is that later on the contents of this buffer are to be sent over a wire bit by bit (this is called "bit banging").
In C the way to do this would have been to write
const uint8_t* p = (const void*)&value;
This works because in C it is perfectly valid to assign a void*-typed pointer to a non-void pointer and vice versa. The important rule set by the C language however is that – technically – when you convert a void* pointer to a non-void type, the void* pointer must have been obtained by taking the address (& operator) of an object of the same type. In practice, however, implementations allow casting a void*-typed pointer to any type that is alignment-compatible with the original object, and for most – but not all! – architectures uint8_t buffers may be aligned to any address.
However in C++ this back-and-forth assignment of void* pointers is not allowed implicitly. C++ requires an explicit cast (which is also why you can often see C++ programmers writing in C code something like struct foo *p = (struct foo*)malloc(…)).
So what you'd write in C++ is
const uint8_t* p = (const uint8_t*)&value;
and that actually works and doesn't throw any warnings. However, some static linter tools will frown upon it. So the first cast (you have to read casts from right to left) discards the original typing by casting to void* to satisfy the linter, then the second cast casts to the target type to satisfy the compiler.
The proper C++ idiom however would have been to use a reinterpret_cast which most linters will also accept
const uint8_t* p = reinterpret_cast<const uint8_t*>(&value);
However, all this casting still invokes implementation-defined behavior, and when it comes to bit banging you will be hit by endianness issues (at the very least).
Bit banging itself works by extracting each bit of a value one by one and tickling the wires that go in and out of a processor's port accordingly. The operators used here are >> to shift bits around and binary & to "select" particular bits.
So for example when you see a statement like
(v & (1<<x))
then what it does is check whether bit number x is set in the variable v. You can also select whole subsets of the bits in a variable by masking (= applying the binary & operator – not to be confused with the unary "address of" operator that yields a pointer).
Similarly you can use the | operator to "overlay" the bits of several variables onto each other. Combined with the shift operators you can use this to build the contents of a variable bit-by-bit (with the bits coming in from a port).
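A small self-contained sketch of those operators (the values are chosen arbitrarily):
#include <cstdint>
#include <cstdio>

int main()
{
    std::uint8_t v = 0xB2;                    // 1011 0010

    bool bit1 = (v & (1u << 1)) != 0;         // test bit 1        -> true
    std::uint8_t low = v & 0x0F;              // mask the low nibble -> 0x02

    std::uint8_t built = 0;
    built |= 1u << 7;                         // set bit 7
    built |= 1u << 1;                         // set bit 1          -> 0x82

    std::printf("%d %02X %02X\n", (int)bit1, (unsigned)low, (unsigned)built);
}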
The target device is an I2C EEPROM, so the general form for writing is to send the destination address followed by some data. To read data from the EEPROM, you write the source address and then switch to a read mode to clock out the data.
First of all, the line:
const uint8_t* p = (const uint8_t*)(const void*)&value;
is simply taking the templated type T and casting away its type, and converting it to a byte array (uint8_t*). This pointer is used to advance one byte at a time through the memory containing value.
In the writeObjectSimple method, it first writes the 16-bit destination address (in big-endian format) followed by a data byte (where p is a data pointer into value):
Wire.write(*p++);
This writes the current byte from value and moves the pointer along one byte. It repeats this for however many bytes are in the type of T. After writing each byte, the destination address is also incremented, and it repeats.
When you code:
data myString;
writeObjectSimple(0x50,0,myString);
the templated writeObjectSimple will be instantiated over the data type, and will write its contents (one byte at a time) starting at address 0, to the device with address 0x50. It uses sizeof(data) to know how many bytes to iterate over.
The read operation works very much the same way, but writes the source address and then requests a read (which is implicit in the LSB of the I2C address) and reads one byte at a time back from the device.
Does the I2C byte write function detect some special character between each data type so it can store everything in the correct place?
Not really, each transaction simply contains the address followed by the data.
[addr_hi] [addr_lo] [data]
Having explained all that, operating one byte at a time is a very inefficient way of achieving this. The device is a 24LC256, and the 24LC family of EEPROMs supports sequential writes (up to a page in size) in a single I2C transaction. So you can easily send the entire data structure in one transfer and avoid having to retransmit the address (2 bytes for every byte of data). Have a look in the datasheet for the full details.
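A hedged sketch of what a page-oriented write could look like (it assumes the Arduino Wire API used above; real code must also keep each transfer inside a 64-byte page boundary, which is glossed over here):
// Sketch only: writes up to one page in a single transaction, assuming
// addr is page-aligned and len <= 64 (the 24LC256 page size).
uint16_t writePage(uint8_t i2cAddr, uint16_t addr, const uint8_t* p, uint8_t len)
{
    Wire.beginTransmission(i2cAddr);
    Wire.write((uint8_t)(addr >> 8));    // MSB of the start address
    Wire.write((uint8_t)(addr & 0xFF));  // LSB of the start address
    for (uint8_t i = 0; i < len; i++) {
        Wire.write(p[i]);                // data bytes follow the address
    }
    Wire.endTransmission();
    delay(5);                            // one write cycle covers the whole page
    return len;
}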

C++ casting a struct to std::vector<char> memory alignment

I'm trying to cast a struct into a char vector.
I want to send my struct, cast into a std::vector<char>, through a UDP socket and cast it back on the other side. Here is my struct with the PACK attribute.
#define PACK( __Declaration__ ) __pragma( pack(push, 1) ) __Declaration__ __pragma( pack(pop) )

PACK(struct Inputs
{
    uint8_t structureHeader;
    int16_t x;
    int16_t y;
    Key inputs[8];
});
Here is test code:
auto const ptr = reinterpret_cast<char*>(&in);
std::vector<char> buffer(ptr, ptr + sizeof in);
//send and receive via udp
Inputs* my_struct = reinterpret_cast<Inputs*>(&buffer[0]);
The issue is: everything works fine except my uint8_t or int8_t members. I don't know why, but whenever and wherever I put a 1-byte value in the struct, when I cast it back that value is not readable (but the others are). If I use only 16-bit values it works just fine, even with the maximum values, so all the bits are OK.
I think this is something to do with the alignment of the bytes in memory, but I can't figure out how to make it work.
Thank you.
I'm trying to cast a struct into a char vector.
You cannot cast an arbitrary object to a vector. You can cast your object to an array of char and then copy that array into a vector (which is actually what your code is doing).
auto const ptr = reinterpret_cast<char*>(&in);
std::vector<char> buffer(ptr, ptr + sizeof in);
That second line defines a new vector and initializes it by copying the bytes that represent your object into it. This is reasonable, but it's distinct from what you said you were trying to do.
I think this is something with the alignment of the bytes in the memory
This is good intuition. If you hadn't told the compiler to pack the struct, it would have inserted padding bytes to ensure each field starts at its natural alignment. The fact that the operation isn't reversible suggests that somehow the receiving end isn't packed exactly the same way. Are you sure the receiving program has exactly the same packing directive and struct layout?
On x86, you can get by with unaligned data, but you may pay a large performance cost whenever you access an unaligned member variable. With the packing set to one, and the first field being odd-sized, you've guaranteed that the next fields will be unaligned. I'd urge you to reconsider this. Design the struct so that all the fields fall at their natural alignment boundaries and that you don't need to adjust the packing. This may make your struct a little bigger, but it will avoid all the alignment and performance problems.
If you want to omit the padding bytes in your wire format, you'll have to copy the relevant fields byte by byte into the wire format and then copy them back out on the receiving end.
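A minimal sketch of that field-by-field approach for the scalar members of Inputs (the Key array is left out, and little-endian byte order is just the choice made here):
#include <cstdint>
#include <vector>

std::vector<char> serializeInputs(std::uint8_t header, std::int16_t x, std::int16_t y)
{
    std::vector<char> out;
    std::uint16_t ux = static_cast<std::uint16_t>(x);
    std::uint16_t uy = static_cast<std::uint16_t>(y);
    out.push_back(static_cast<char>(header));
    out.push_back(static_cast<char>(ux & 0xFF));   // x, low byte first
    out.push_back(static_cast<char>(ux >> 8));
    out.push_back(static_cast<char>(uy & 0xFF));   // y, low byte first
    out.push_back(static_cast<char>(uy >> 8));
    return out;
}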
An aside regarding:
#define PACK( __Declaration__ ) __pragma( pack(push, 1) ) __Declaration__ __pragma( pack(pop) )
Identifiers that begin with underscore and a capital letter or with two underscores are reserved for "the implementation," so you probably shouldn't use __Declaration__ as the macro's parameter name. ("The implementation" refers to the compiler, the standard library, and any other runtime bits the compiler requires.)
1 The vector class has dynamically allocated memory and uses pointers inside. So you can't send the vector itself (but you can send the underlying array).
2 SFML has a great class for doing this called sf::Packet. It's free, open source, and cross-platform.
I was recently working on a personal cross-platform socket library for use in other personal projects, and I eventually dropped it for SFML. There's just TOO much to test; I was spending all my time testing to make sure stuff worked and not getting any work done on the actual projects I wanted to do.
3 memcpy is your best friend. It is designed to be portable, and you can use that to your advantage.
You can use it to debug: memcpy the thing you want to see into a char array and check that it matches what you expect.
4 To save yourself from having to do tons of robustness testing, limit yourself to only chars, 32-bit integers, and 64-bit doubles. Are you using different compilers? Struct packing is compiler- and architecture-dependent. If you have to use a packed struct, you need to guarantee that the packing works as expected on all platforms you will be using, and that all platforms have the same endianness. Obviously that's what you're having trouble with, and I'm sorry I can't help you more with that. I would recommend regular serialization and would definitely avoid struct packing if I were trying to make portable sockets.
If you can make those guarantees that I mentioned, sending is really easy on LINUX.
// POSIX
void send(int fd, Inputs& input)
{
    int error = sendto(fd, &input, sizeof(input), ..., ..., ...);
    ...
}
winsock2 uses a char* instead of a void* :(
void send(int fd, Inputs& input)
{
    char buf[sizeof(input)];
    memcpy(buf, &input, sizeof(input));
    int error = sendto(fd, buf, sizeof(input), ..., ..., ...);
    ...
}
Did you try the simplest approach of:
unsigned char *pBuff = (unsigned char*)&in;
for (unsigned int i = 0; i < sizeof(Inputs); i++) {
    vecBuffer.push_back(*pBuff);  // vecBuffer is a std::vector<unsigned char>
    pBuff++;
}
This would work for both packed and non-packed structs, since you iterate using sizeof.

A Better Way To Build a Packet - Byte by Byte?

This is related to my question asked here today on SO. Is there a better way to build a packet to send over serial rather than doing this:
unsigned char buff[255];
buff[0] = 0x02;
buff[1] = 0x01;
buff[2] = 0x03;
WriteFile(.., buff,3, &dwBytesWrite,..);
Note: I have about twenty commands to send, so if there were a way to send these bytes to the serial device more concisely, rather than having to specify each byte, that would be great. Each byte is given in hexadecimal, with the last byte being the checksum. I should clarify that I know I will have to specify each byte to build the commands, but is there a better way than having to assign each array position individually?
You can initialize static buffers like so:
const unsigned char command[] = {0x13, 0x37, 0xf0, 0x0d};
You could even use these to initialize non-const buffers and then replace only changing bytes by index.
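For example (the command layout and the XOR checksum here are purely illustrative):
#include <cstddef>

// Start from a constant prototype, then patch only the bytes that change.
const unsigned char prototype[] = {0x02, 0x01, 0x00, 0x00};

void buildCommand(unsigned char payload, unsigned char out[sizeof prototype])
{
    for (std::size_t i = 0; i < sizeof prototype; ++i)
        out[i] = prototype[i];
    out[2] = payload;                   // the byte that varies per command
    out[3] = out[0] ^ out[1] ^ out[2];  // hypothetical checksum over the rest
}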
Not sure what you're asking. If you're asking about the problem of setting the bytes one by one and messing up the data, usually this is done with a packed struct whose members have meaningful names. Like:
#pragma pack(push)
#pragma pack(1)
struct FooHeader {
    uint someField;
    byte someFlag;
    dword someStatus;
};
#pragma pack(pop)
FooHeader hdr;
hdr.someField = 2;
hdr.someFlag = 3;
hdr.someStatus = 4;
WriteFile(..., &hdr, sizeof(hdr), ...);
Is there a better way to build a packet than assembling it byte by byte?
Yes, but it will require some thought and some careful engineering. Many of the other answers tell you other mechanisms by which you can put together a sequence of bytes in C++. But I suggest you design an abstraction that represents a part of a packet:
class PacketField {
public:
    virtual void add_to_packet(Packet& p) = 0;
};
Then you can define various subclasses:
Add a single byte to the packet
Add a 16-bit integer in big-endian order. Another for little-endian. Other widths besides 16.
Add a string to the packet; code the string by inserting the length and then the bytes.
You also can define a higher-order version:
PacketField sequence(PacketField first, PacketField second);
Returns a field that consists of the two arguments in sequence. If you like operator overloading you could overload this as + or <<.
Your underlying Packet abstraction will just be an extensible sequence of bytes (dynamic array) with some kind of write method.
If you wind up programming a lot of network protocols, you'll find this sort of design pays off big time.
Edit: The point of the PacketField class is composability and reuse:
By composing packet fields you can create more complex packet fields. For example, you could define "add a TCP header" as a function from PacketFields to PacketFields.
With luck you build up a library of PacketFields that are specific to your application or protocol family or whatever. Then you reuse the fields in the library.
You can create subclasses of PacketField that take extra parameters.
It's quite possible that you can do something equally nice without this extra level of indirection; I'm recommending it because I've seen it used effectively in other applications. You are decoupling the knowledge of how to build a packet (which can be applied to any packet, any time) from the act of actually building a particular packet. Separating concerns like this can help reuse.
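A minimal sketch of the idea (simplified here: std::function stands in for the subclasses, and a std::vector<unsigned char> is the Packet; all names are made up for illustration):
#include <cstdint>
#include <functional>
#include <vector>

using Packet = std::vector<unsigned char>;
using PacketField = std::function<void(Packet&)>;   // "add yourself to the packet"

PacketField byteField(std::uint8_t b) {
    return [b](Packet& p) { p.push_back(b); };
}

PacketField u16BigEndian(std::uint16_t v) {
    return [v](Packet& p) {
        p.push_back(static_cast<unsigned char>(v >> 8));
        p.push_back(static_cast<unsigned char>(v & 0xFF));
    };
}

PacketField sequence(PacketField first, PacketField second) {
    return [first, second](Packet& p) { first(p); second(p); };
}

// Usage: Packet p; sequence(byteField(0x02), u16BigEndian(0x0103))(p);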
Yes, there is a better method. Have your classes read from and write to a packed buffer. You could even implement this as an interface. Templates would help too.
An example of writing:
template <typename Member_Type>
void Store_Value_In_Buffer(const Member_Type& member,
                           unsigned char *& p_buffer)
{
    *((Member_Type *)(p_buffer)) = member;
    p_buffer += sizeof(Member_Type);
    return;
}
struct My_Class
{
    unsigned int datum;
    void store_to_buffer(unsigned char *& p_buffer)
    {
        Store_Value_In_Buffer(datum, p_buffer);
        return;
    }
};
//...
unsigned char buffer[256];
unsigned char * p_buffer(buffer);
My_Class object;
object.datum = 5;
object.store_to_buffer(p_buffer);
std::cout.write(reinterpret_cast<const char*>(buffer), p_buffer - buffer);
Part of the interface is also to query the objects for the size that they would occupy in the buffer, say a method size_in_buffer. This is left as an exercise for the reader. :-)
There is a much better way, which is to use structs to describe the layout. This is usually how network packets are built at a low level.
For example, say you have packets which have an id, length, flag byte, and data; you'd do something like this:
struct packet_header {
    int id;
    byte length;
    byte flags;
};
byte *my_packet = new byte[100];
packet_header *header = (packet_header *)my_packet;
header->id = 20;
header->length = 10; // This can be set automatically by a function, maybe?
// etc.
header++; // header now points just past the header, i.e. at the data section
Do note that you're going to have to make sure that the structures are "packed", i.e. when you write byte length, it really takes up a byte. Usually, you'd achieve this using something like #pragma pack or similar (you'll have to read about your compiler's pragma settings).
Also, note that you should probably use functions to do common operations. For example, create a function which gets as input the size, data to send, and other information, and fills out the packet header and data for you. This way, you can perform calculations about the actual size you want to write in the length field, you can calculate the CRC inside the function, etc.
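A hedged sketch of such a helper (the header layout matches the struct above; the trailing XOR checksum is just an illustrative choice):
#include <cstdint>
#include <cstring>

#pragma pack(push, 1)
struct packet_header {
    std::uint32_t id;
    std::uint8_t  length;
    std::uint8_t  flags;
};
#pragma pack(pop)

// Fills the header, copies the payload after it, appends a checksum byte and
// returns the total size. out must have room for sizeof(packet_header) + len + 1.
std::size_t build_packet(std::uint8_t* out, std::uint32_t id,
                         const std::uint8_t* data, std::uint8_t len)
{
    packet_header h;
    h.id = id;
    h.length = len;
    h.flags = 0;
    std::memcpy(out, &h, sizeof h);
    std::memcpy(out + sizeof h, data, len);

    std::uint8_t crc = 0;
    for (std::size_t i = 0; i < sizeof h + len; ++i)
        crc ^= out[i];                  // simple XOR "checksum" for illustration
    out[sizeof h + len] = crc;
    return sizeof h + len + 1;
}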
Edit: This is a C-centric way of doing things, which is the style of a lot of networking code. A more C++-centric (object oriented) approach could also work, but I'm less familiar with them.
You can also spell out a fixed byte sequence as a string literal with hex escapes:
const char *c = "\x02\x02\x03";